Optimize GPU components #489

Merged
merged 3 commits into main from optimize-gpu-components on Oct 5, 2023

Conversation

PhilippeMoussalli
Contributor

PR that modifies all current GPU components by:

  • Batching both the preprocessing and inference to avoid OOM issues (a minimal sketch follows below)
  • Disabling a pytorch API that caused illegal memory access issues. More on this issue here
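
A minimal sketch of the batching described above (illustrative only; model, preprocess, and batch_size are placeholder names, not the component's actual code):

import torch

def batched_inference(images, model, preprocess, batch_size=32, device="cuda"):
    results = []
    with torch.no_grad():
        for start in range(0, len(images), batch_size):
            batch = images[start:start + batch_size]
            inputs = preprocess(batch).to(device)  # preprocess only this batch
            outputs = model(inputs)                # run inference on the batch
            results.append(outputs.cpu())          # move results off the GPU right away
    return torch.cat(results)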

r, g, b = tuple(avg_color)
draw.rectangle(((x1, y1), (x2, y2)), fill=(int(r), int(g), int(b)))

if cropped_image.any():
Contributor Author

This is not really related to this PR but is needed to tackle edge cases

Member

@RobbeSneyders left a comment

Thanks @PhilippeMoussalli! We might want to create a model inference component in the future which packages the batching functionality, so users only need to implement their code per batch.
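
A possible shape for such a component (purely illustrative; the class and method names here are hypothetical, not an existing API):

from abc import ABC, abstractmethod

class BatchedInferenceComponent(ABC):
    """Base class that owns the batching loop; subclasses only implement per-batch logic."""

    def __init__(self, batch_size: int = 32):
        self.batch_size = batch_size

    @abstractmethod
    def process_batch(self, batch):
        """Run preprocessing + inference on a single batch."""

    def transform(self, data):
        # The base class slices the input into batches, so user code never sees the full partition.
        results = []
        for start in range(0, len(data), self.batch_size):
            results.extend(self.process_batch(data[start:start + self.batch_size]))
        return results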

@@ -12,75 +13,97 @@

logger = logging.getLogger(__name__)

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
Member

We no longer need this, right? It was just for debugging I think.
Same in the other components.

Contributor Author

Kept it in case we run into other issues later on and need further debugging. We can remove it once we've tested enough GPU components at scale and are sure that everything runs fine.

Member

I don't think it helped us, since we are already running single-threaded. It also didn't change the stack trace; the first one was already correct. And it seems like it really should not be used when not debugging.

Contributor Author

oh ok, I'll remove it then

Member

@RobbeSneyders left a comment

Thanks!

@PhilippeMoussalli PhilippeMoussalli merged commit 015dc0a into main Oct 5, 2023
8 checks passed
@PhilippeMoussalli PhilippeMoussalli deleted the optimize-gpu-components branch October 5, 2023 11:56
@PhilippeMoussalli
Contributor Author

When using the default Dask scheduler (threaded), it is important to take into account that all GPU-related processing (preprocessing, inference) has to be batched to avoid running into OOM issues.

To scale the model efficiently, inference can be run on multiple GPUs using PyTorch Data Parallelism (this does not work for every model), which parallelizes the batches across the available GPUs. One important consideration there is to either use a single-threaded scheduler (not recommended) or limit the number of Dask workers to the number of GPUs, dask.config.set(num_workers=<#GPU>), to avoid running into issues. Other alternatives could include assigning GPUs to spawned processes (not tested yet).
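
A minimal sketch of that setup (illustrative only; the Linear model is a stand-in for the actual inference model, and at least one CUDA device is assumed):

import dask
import torch

n_gpus = torch.cuda.device_count()
dask.config.set(num_workers=n_gpus)        # one Dask worker per GPU

model = torch.nn.Linear(512, 512)          # stand-in for the actual inference model
if n_gpus > 1:
    model = torch.nn.DataParallel(model)   # splits each batch across the visible GPUs; not supported by every model
model = model.to("cuda")                   # assumes a CUDA device is available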

To test and diagnose GPU components, both nvtop and htop can be used to monitor GPU and CPU usage. This can help identify bottlenecks and pinpoint whether a GPU component is compute- or memory-bound.

Further things that still need to be clarified:

  • Whether to run a model using the processes or the threaded scheduler (so far, the threaded scheduler has shown to be faster, and most resources seem to indicate using threads (link)); see the snippet after this list.
  • How to parallelize GPU and CPU tasks efficiently: limiting the number of workers can leave some workers/CPU cores idle (when the number of GPUs in one machine is smaller than the number of CPU cores). There is some room for optimization.
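
A small snippet for benchmarking both schedulers (the Dask dataframe here is a placeholder workload, not the actual component):

import dask
import dask.dataframe as dd
import pandas as pd

if __name__ == "__main__":
    ddf = dd.from_pandas(pd.DataFrame({"x": range(1_000)}), npartitions=4)  # placeholder workload

    with dask.config.set(scheduler="threads"):
        ddf.x.sum().compute()       # run with the threaded scheduler

    with dask.config.set(scheduler="processes"):
        ddf.x.sum().compute()       # run with the multiprocessing scheduler (needs the __main__ guard)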
