
feat(python): Enable collection with gpu engine #17550

Merged: 4 commits into pola-rs:main on Jul 25, 2024

Conversation

@wence- wence- (Collaborator) commented Jul 10, 2024

Introduce option to collect a query using the cudf_polars gpu engine. By default this falls back transparently to the default cpu engine if the query cannot be executed. This can be controlled by setting POLARS_GPU_DISABLE_FALLBACK (mostly useful for testing/debugging).

The import of cudf_polars is currently quite expensive (will improve in the coming months), so since we lazy-load the gpu engine, the first query executed on gpu will pay this (one-time) cost.
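A minimal usage sketch under the interface proposed here (a gpu=True flag on collect, later in this thread replaced by an engine= parameter); the assumption that POLARS_GPU_DISABLE_FALLBACK is read at collect time is mine:

import os
import polars as pl

# Optional: make unsupported queries raise instead of transparently
# falling back to the cpu engine (useful for testing/debugging).
os.environ["POLARS_GPU_DISABLE_FALLBACK"] = "1"

q = (
    pl.LazyFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
    .group_by("a")
    .agg(pl.col("b").sum())
)
df = q.collect(gpu=True)  # first gpu collect pays the one-time cudf_polars import cost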

@wence- wence- (Collaborator, Author) commented Jul 10, 2024

For discussion: is this the interface we want? (cc @r-brink, @ritchie46)

Lazy-loading the gpu engine keeps `import polars` fast even when the relevant packages are available, but slows down the first query anyone runs on gpu.

I'm not sure there's a good solution here: eager-loading is fiddly because then it's easy to get into circular import issues between cudf_polars and polars itself (since cudf_polars depends on polars).
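A hypothetical sketch of the lazy-import pattern being weighed here (names are illustrative, not the actual implementation):

_ENGINE = None

def _get_gpu_engine():
    # Import on first use: `import polars` stays fast, and the circular
    # import is avoided because cudf_polars (which depends on polars) is
    # only loaded once polars itself is fully initialised.
    global _ENGINE
    if _ENGINE is None:
        import cudf_polars  # one-time cost, paid by the first gpu query
        _ENGINE = cudf_polars
    return _ENGINE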

@wence- wence- changed the title feat(python): enable collection with gpu engine feat(python): Enable collection with gpu engine Jul 10, 2024
@github-actions github-actions bot added the enhancement (New feature or an improvement of an existing feature) and python (Related to Python Polars) labels and removed the title needs formatting label Jul 10, 2024
@wence- wence- force-pushed the wence/fea/polars-gpu-collect branch from d5eb4fd to ae70be9 on July 10, 2024 11:56
Two review threads on py-polars/polars/lazyframe/frame.py (outdated, resolved)
codecov bot commented Jul 10, 2024

Codecov Report

Attention: Patch coverage is 91.17647% with 3 lines in your changes missing coverage. Please review.

Project coverage is 80.53%. Comparing base (26a16e3) to head (06bb0bd).
Report is 16 commits behind head on main.

Files                                 Patch %   Lines
py-polars/polars/lazyframe/frame.py   83.33%    3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17550      +/-   ##
==========================================
+ Coverage   80.49%   80.53%   +0.03%     
==========================================
  Files        1503     1505       +2     
  Lines      197054   197187     +133     
  Branches     2805     2810       +5     
==========================================
+ Hits       158615   158795     +180     
+ Misses      37918    37873      -45     
+ Partials      521      519       -2     


@jfpuget jfpuget commented Jul 18, 2024

This looks great.

What about using it on a multi-GPU workstation or multi-GPU cloud image? Can we select a specific GPU when running on a multi-GPU workstation? PyTorch has a "cuda" device, but also "cuda:n" where n is the GPU index. The latter is very useful when one wants to run different pipelines on each available GPU.

@ritchie46 ritchie46 (Member) commented
Yes, this seems right to me, @wence-. 👍

I hope we will be able to do mixed subplans soon. If we can, we must sell that in the docstrings as well.

@wence- wence- (Collaborator, Author) commented Jul 19, 2024

I think we want to allow (as @jfpuget suggests) configuration of which device the query will execute on. Let's paint this shed:

Some suggestions:

We could pick something like the PyTorch approach and, rather than a boolean flag gpu=True/False, introduce an engine=... parameter. That way, we avoid some of the complexity around streaming/background mode not being compatible (for now) with the gpu engine.

The downside of this approach is that it is (slightly) more complex, and if we use a string for the engine parameter we can't provide a fully-typed signature. We could get around that by accepting strings or small classes.

For example (sketch):

from dataclasses import dataclass
from typing import Literal

class EngineConfig:
    ...  # common base class (sketch)

@dataclass
class GpuEngine(EngineConfig):
    device: int
    # example other options; advanced users may wish to control
    # the memory allocator used (rather than using our default)
    ...  # other options here

@dataclass
class StreamingEngine(EngineConfig):
    ...  # streaming options

...

EngineOptions = Literal["default", "cpu", "streaming", "gpu"] | EngineConfig

def collect(..., engine: EngineOptions = "default"): ...
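Hypothetical usage under that sketch, covering both the string shorthand and the typed configuration:

q.collect(engine="gpu")                # engine defaults
q.collect(engine=GpuEngine(device=1))  # typed, per-query device selection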

Or, we can stick with the current approach of configuration through environment variables/pl.Config, so we would have:

export POLARS_GPU_DEVICE=1
python run.py  # runs query on device 1

or, equivalently, in Python:

with pl.Config(gpu_device=1):
    q.collect(gpu=True)  # runs on device 1

Advantages: fits with the rest of the polars configuration setup, and keeps the default UX simple

Disadvantages: passing "complicated" additional configuration options is more challenging, since we can only shove (effectively) strings in as values.

@jfpuget jfpuget commented Jul 22, 2024

PyTorch uses either a string or a device object. The latter enables type checking. Seems similar to what you propose.

@ritchie46 ritchie46 (Member) commented
I think it is best to have this local (e.g. with arguments). That way you can collect multiple plans on different devices if you'd like.

Let's go for the engine options API.

@jfpuget jfpuget commented Jul 22, 2024

To be more complete, PyTorch and others like vLLM also allow environment variables.

With the environment variable CUDA_VISIBLE_DEVICES (if I remember the name correctly) you can select which GPUs a process sees, before importing pytorch, e.g.

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"
import torch

Then in the code you can create a device for one of the four GPUs that are accessible, e.g.

x = torch.tensor(data, device='cuda:2')

or

device = torch.device("cuda:2")
x = torch.tensor(data, device=device)

@jfpuget jfpuget commented Jul 22, 2024

TBH I don't know if the CUDA_VISIBLE_DEVICES variable is read when torch is imported or when a device is created.

@wence- wence- (Collaborator, Author) commented Jul 22, 2024

CUDA_VISIBLE_DEVICES affects which physical devices are accessible to the cuda driver and runtime. The mapping is initialised during the initialisation of the cuda driver. So in response to @jfpuget:

TBH I don't know if the CUDA_VISIBLE_DEVICES variable is read when torch is imported or when a device is created.

It depends on when torch initialises the cuda runtime/driver stack.

However, in any case, in a single process once the driver is initialised, any subsequent modification of CUDA_VISIBLE_DEVICES does nothing.

Moreover, the mapping is managed by the driver, so we don't need to do anything special here to support it: the ordinal device ID that one passes maps directly onto the ID in CUDA_VISIBLE_DEVICES order.
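To illustrate that ordinal mapping, a hypothetical sketch reusing the GpuEngine class from the earlier proposal:

import os

# Must be set before the cuda driver is initialised in this process;
# modifying it afterwards has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "4,5,6,7"

# Ordinal device IDs follow CUDA_VISIBLE_DEVICES order, so device=2
# selects physical GPU 6 (the third entry in the list above).
q.collect(engine=GpuEngine(device=2))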

@jfpuget jfpuget commented Jul 22, 2024

It depends on when torch initialises the cuda runtime/driver stack.

Exactly, and I don't know when this happens. That is why I set the environment variable before importing torch in my code.

@wence- wence- force-pushed the wence/fea/polars-gpu-collect branch 2 times, most recently from 90a5de7 to 7088992 on July 22, 2024 17:14
@wence- wence- marked this pull request as ready for review July 22, 2024 17:16
@wence- wence- (Collaborator, Author) commented Jul 22, 2024

This is getting close, I think; I've implemented the engine=SomeObject approach.

@bdice, some suggestions on the pyproject deps side of things would be good.

We are planning that:

pip install polars[gpu]

gets the CUDA-12 enabled package. We will provide fallback errors on our (cudf-polars) side at import if the host CTK is too old (and users should therefore install cudf-polars-cu11 instead).

There is not (yet) a package for cudf-polars on PyPI (that is coming as part of the RAPIDS 24.08 release).
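A sketch of the merged interface as described in the commit message below; the file name is illustrative and the exact GPUEngine options shown are assumptions:

import polars as pl

q = (
    pl.scan_parquet("data.parquet")  # illustrative input
    .group_by("key")
    .agg(pl.col("value").mean())
)

# Requires: pip install polars[gpu]  (CUDA 12; cudf-polars-cu11 for CUDA 11)
df = q.collect(engine=pl.GPUEngine(device=0))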

@wence- wence- force-pushed the wence/fea/polars-gpu-collect branch 2 times, most recently from df7e427 to 67065b8 on July 24, 2024 13:16
Introduce option to collect a query using the cudf_polars gpu engine.
By default this falls back transparently to the default cpu engine if
the query cannot be executed. Configuration of specifics of the GPU
engine can be controlled by passing a GPUEngine object to the new
`engine` argument to `collect`.

The import of cudf_polars is currently quite expensive (will improve
in the coming months), so since we lazy-load the gpu engine, the first
query executed on gpu will pay this (one-time) cost.
@wence- wence- force-pushed the wence/fea/polars-gpu-collect branch from 67065b8 to 98cc120 on July 24, 2024 13:18
@lithomas1 lithomas1 left a comment

Thanks for putting this up!

Looking forward to collecting on GPU.

Some packaging thoughts.

Review threads on py-polars/pyproject.toml (three) and py-polars/polars/lazyframe/frame.py (one), resolved
@wence- wence- force-pushed the wence/fea/polars-gpu-collect branch from 52da91f to 7fafb7c on July 25, 2024 10:49
@wence- wence- force-pushed the wence/fea/polars-gpu-collect branch from 7fafb7c to 06bb0bd on July 25, 2024 11:01
@ritchie46 ritchie46 (Member) commented Jul 25, 2024

Alright. Thanks everyone! Going in. :shipit:

@ritchie46 ritchie46 merged commit 4739460 into pola-rs:main Jul 25, 2024
14 checks passed
@wence- wence- deleted the wence/fea/polars-gpu-collect branch July 26, 2024 09:00