Implement scaling across multiple GPUs #566

Open
RobbeSneyders opened this issue Oct 29, 2023 · 1 comment
Labels
Infrastructure Infrastructure and deployment

Comments

RobbeSneyders commented Oct 29, 2023

By @PhilippeMoussalli:

Conclusions from #489

We need to find a way to scale across GPUs. Possible options:

  • Multiple GPUs can be used for inference with PyTorch data parallelism (this does not work for every model), which splits each batch across the GPUs. One important consideration is to either use the single-threaded scheduler (not recommended) or limit the number of workers to the number of GPUs with dask.config.set(num_workers=<#GPU>) to avoid running into issues. Another alternative could be to assign GPUs to spawned processes (not tested yet).

  • Another option is to try LocalCUDACluster, which seems to be the intended way to run GPU components with Dask. This still needs testing, including how it interacts with PyTorch.

  • It is still open whether to run a model with the processes scheduler or the threaded scheduler (so far, the threaded scheduler has proven faster). However, most resources seem to recommend using threads (link).
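
The untested alternative from the first option, assigning GPUs to spawned processes, could look roughly like the sketch below. It relies on the CUDA_VISIBLE_DEVICES environment variable to expose a single GPU to each process. The helper names (gpu_env_for_worker, spawn_workers, WORKER_CMD) are hypothetical and not part of this project, Dask, or PyTorch, and the worker command is a stand-in that only prints its assigned GPU instead of loading a model.

```python
import os
import subprocess
import sys


def gpu_env_for_worker(worker_index: int, num_gpus: int) -> dict:
    """Return a copy of the environment that exposes one GPU to one worker."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    env = os.environ.copy()
    # Round-robin assignment: worker i only sees GPU (i mod num_gpus).
    env["CUDA_VISIBLE_DEVICES"] = str(worker_index % num_gpus)
    return env


def spawn_workers(num_workers: int, num_gpus: int, command: list) -> list:
    """Start one process per worker and return each worker's stripped stdout."""
    procs = [
        subprocess.Popen(
            command,
            env=gpu_env_for_worker(i, num_gpus),
            stdout=subprocess.PIPE,
            text=True,
        )
        for i in range(num_workers)
    ]
    # Collect output in worker order once every process has finished.
    return [p.communicate()[0].strip() for p in procs]


# Stand-in worker (assumption): a real worker would load the model here
# and run inference on the single GPU it can see.
WORKER_CMD = [
    sys.executable,
    "-c",
    "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
]
```

Calling spawn_workers(2, 2, WORKER_CMD) starts two processes that see GPU 0 and GPU 1 respectively. Inside each process the visible GPU appears as device 0, so model code would not need to know which physical GPU it was given. Whether this plays well with the Dask schedulers discussed above is exactly what still needs testing.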

Open questions:

  • How to parallelize GPU and CPU tasks efficiently: limiting the number of workers to the number of GPUs can leave CPU cores idle (when the number of GPUs in one machine is smaller than the number of CPU cores). There is some room for optimization.
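
As a back-of-the-envelope illustration of this trade-off: if the worker count is capped at the GPU count and each worker is assumed to occupy one CPU core, the remaining cores sit idle during CPU-bound steps. The function name plan_workers and the one-core-per-worker assumption are hypothetical simplifications, not measurements from this project.

```python
def plan_workers(num_cpu_cores: int, num_gpus: int) -> dict:
    """Summarize what capping the worker count at the GPU count implies."""
    if num_cpu_cores < 1 or num_gpus < 1:
        raise ValueError("core and GPU counts must be at least 1")
    gpu_workers = num_gpus  # one worker per GPU
    # Assumption: each worker keeps exactly one CPU core busy.
    idle_cpu_cores = max(num_cpu_cores - gpu_workers, 0)
    return {"gpu_workers": gpu_workers, "idle_cpu_cores": idle_cpu_cores}


print(plan_workers(num_cpu_cores=16, num_gpus=4))
# → {'gpu_workers': 4, 'idle_cpu_cores': 12}
```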
@RobbeSneyders RobbeSneyders converted this from a draft issue Oct 29, 2023
@RobbeSneyders RobbeSneyders added the Infrastructure Infrastructure and deployment label Dec 18, 2023
Projects
Status: Breakdown