feat(python): Enable collection with gpu engine #17550
Conversation
For discussion whether this is the interface we want (cc @r-brink, @ritchie46). Lazy-loading the gpu engine keeps `import polars` fast even when the relevant packages are installed, but slows down the first query anyone runs. I'm not sure there's a good solution here: eager-loading is fiddly because then it's easy to get into circular import issues between cudf_polars and polars itself (since cudf_polars depends on polars).
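A minimal sketch of the lazy-loading pattern described here (names such as `execute_on_cpu` and the `cudf_polars.execute` entry point are illustrative, not the actual polars internals):

```python
def execute_on_cpu(plan):
    # Placeholder for polars' default CPU execution path.
    return f"cpu result for {plan!r}"


def collect(plan, *, use_gpu: bool = False):
    if not use_gpu:
        return execute_on_cpu(plan)
    # Deferred import: `import polars` stays fast even when the GPU packages
    # are installed, at the cost of a one-time import hit on the first GPU
    # query. An eager module-level import would risk circular imports,
    # since cudf_polars itself depends on polars.
    try:
        import cudf_polars  # noqa: PLC0415
    except ImportError as err:
        raise ModuleNotFoundError(
            "GPU engine requested but cudf_polars is not installed"
        ) from err
    return cudf_polars.execute(plan)  # hypothetical entry point, for illustration
```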
Codecov Report
Attention: Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff             @@
##             main   #17550      +/-   ##
==========================================
+ Coverage   80.49%   80.53%   +0.03%
==========================================
  Files        1503     1505       +2
  Lines      197054   197187     +133
  Branches     2805     2810       +5
==========================================
+ Hits       158615   158795     +180
+ Misses      37918    37873      -45
+ Partials      521      519       -2
```
This looks great. What about using it on a multi-GPU workstation or multi-GPU cloud images? Can we set a specific GPU when running on a multi-GPU workstation? Pytorch has a "cuda" device, but also "cuda:n" where n is the GPU index. The latter is very useful when one wants to run different pipelines on each available GPU.
Yes, this seems right to me @wence-. 👍 I hope we will be able to do mixed subplans soon. If we can, we must sell that in the docstrings as well.
I think we want to allow (as @jfpuget suggests) configuration of which device the query will execute on. Let's paint this shed. Some suggestions:

We could pick something like the pytorch approach and, rather than a boolean flag, accept an `engine` parameter. The downside of this approach is that it is (slightly) more complex, and if we use a string for the engine parameter we can't provide a fully-typed signature. We could get around that by accepting strings or small classes. For example (sketch):

```python
@dataclass
class GpuEngine(EngineConfig):
    device: int
    # Example other options; advanced users may wish to control
    # the memory allocator used (rather than using our default).
    ...  # other options here

@dataclass
class StreamingEngine(EngineConfig):
    ...  # streaming options

EngineOptions = Literal["default", "cpu", "streaming", "gpu"] | EngineConfig

def collect(..., engine: EngineOptions = "default"): ...
```

Or, we can stick with the current configuration through environment variables/config options.

Advantages: fits with the rest of the polars configuration setup, and keeps the default UX simple.
Disadvantages: passing "complicated" additional configuration options is more challenging, since we can only shove (effectively) strings in as values.
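For concreteness, a runnable sketch of how the typed variant above might look at the call site (`EngineConfig`, `GpuEngine`, and the dispatch logic are the sketched proposal, not a shipped API):

```python
from dataclasses import dataclass


@dataclass
class EngineConfig:
    """Base class for typed engine configuration (sketch)."""


@dataclass
class GpuEngine(EngineConfig):
    device: int = 0


def collect(plan, *, engine="default"):
    # Accept either a plain string name or a typed config object.
    if isinstance(engine, GpuEngine):
        print(f"would run {plan!r} on GPU {engine.device}")
    elif isinstance(engine, EngineConfig):
        print(f"would run {plan!r} with config {engine}")
    else:
        print(f"would run {plan!r} with engine {engine!r}")


collect("plan", engine="gpu")                # simple string form
collect("plan", engine=GpuEngine(device=1))  # typed form with extra knobs
```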
Pytorch uses either a string or a device object. The latter enables type checking. Seems similar to what you propose.
I think it is best to have this local (e.g. with arguments). That way you can collect multiple plans on different devices if you'd like. Let's go for the engine options API.
To be more complete, pytorch and others like vllm also allow environment variables. With the environment variable `CUDA_VISIBLE_DEVICES` you can restrict which GPUs the process sees. Then in the code you can create a device for one of the four GPUs that are accessible; see the sketch below.
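A minimal sketch of that workflow with pytorch (the specific device indices are illustrative, and this assumes a CUDA build of torch; the variable must be set before CUDA is initialised, in practice before importing torch):

```python
import os

# Expose four physical GPUs (0, 2, 5, 7) to this process; they become
# logical devices 0..3. This must happen before torch initialises CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,2,5,7"

import torch

# Two equivalent ways to select the second *visible* GPU (physical GPU 2):
dev = torch.device("cuda:1")
dev = torch.device(1)
```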
TBH I don't know if the `CUDA_VISIBLE_DEVICES` approach works here.
It depends on when torch initialises the cuda runtime/driver stack. However, in any case, in a single process, once the driver is initialised any subsequent modification of `CUDA_VISIBLE_DEVICES` has no effect. Moreover, the mapping is managed by the driver, so we don't need to do anything special here to support it: the ordinal device ID that one passes maps directly on to the ID in `CUDA_VISIBLE_DEVICES`.
Exactly, and I don't know when this happens. It is why I set the environment variable before importing torch in my code.
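One way to observe when the initialisation happens, using torch's own query function (assuming a CUDA build of torch on a machine with a GPU):

```python
import torch

print(torch.cuda.is_initialized())  # False: importing torch alone does not init CUDA
t = torch.zeros(1, device="cuda")   # the first CUDA operation initialises it lazily
print(torch.cuda.is_initialized())  # True: CUDA_VISIBLE_DEVICES changes are now ignored
```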
This is getting close I think; implemented the engine-config approach discussed above. @bdice, some suggestions on the pyproject deps side of things would be good. We are planning that:

`pip install polars[gpu]`

gets the CUDA-12 enabled package. We will provide fallback errors on our (cudf-polars) side at import if the host CTK is too old (and they should therefore install …). There is not (yet) a package for cudf-polars on pypi (that is coming as part of the rapids 24.08 release).
Introduce an option to collect a query using the cudf_polars gpu engine. By default this falls back transparently to the default cpu engine if the query cannot be executed. Configuration of the specifics of the GPU engine can be controlled by passing a GPUEngine object to the new `engine` argument of `collect`.

The import of cudf_polars is currently quite expensive (this will improve in the coming months), so since we lazy-load the gpu engine, the first query executed on gpu will pay this (one-time) cost.
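Based on that description, call-site usage looks roughly like this (a sketch; `device` and `raise_on_fail` are shown as assumed `GPUEngine` options, so check the polars docs for the exact names):

```python
import polars as pl

q = pl.LazyFrame({"key": ["a", "a", "b"], "value": [1, 2, 3]})
q = q.group_by("key").agg(pl.col("value").sum())

# Simple form: try the GPU engine, transparently falling back to the
# default CPU engine if the query is unsupported.
df = q.collect(engine="gpu")

# Configured form: pass a GPUEngine object through the new `engine`
# argument, e.g. to pin a device and surface failures instead of
# falling back.
df = q.collect(engine=pl.GPUEngine(device=0, raise_on_fail=True))
```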
Thanks for putting this up! Looking forward to collecting on GPU. Some packaging thoughts.
Alright. Thanks everyone! Going in.
Introduce an option to collect a query using the cudf_polars gpu engine. By default this falls back transparently to the default cpu engine if the query cannot be executed. This can be controlled by setting POLARS_GPU_DISABLE_FALLBACK (mostly useful for testing/debugging).
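A sketch of the testing/debugging workflow that fallback switch enables (assuming the variable is read when the query runs):

```python
import os

# Disable transparent CPU fallback so that queries the GPU engine cannot
# execute fail loudly instead of silently running on CPU; handy for
# verifying which queries really execute on the GPU.
os.environ["POLARS_GPU_DISABLE_FALLBACK"] = "1"

import polars as pl

q = pl.LazyFrame({"a": [1, 2, 3]}).select(pl.col("a").sum())
q.collect(engine="gpu")  # errors out if the plan is unsupported on GPU
```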