Will cuDLA Standalone mode occupy CUDA GPU resources? #18

Open
ou525 opened this issue Jan 26, 2024 · 4 comments


ou525 commented Jan 26, 2024

I want to deploy the yolov5 model on both the GPU and the DLA at the same time. Will there be resource contention between the two? From what I learned earlier, models such as yolov5 contain layers the DLA doesn't support, and those layers fall back to CUDA resources, causing a significant drop in efficiency.

lynettez self-assigned this Feb 1, 2024

lynettez commented Feb 1, 2024

Using cuDLA requires that every layer be supported by the DLA; we moved several unsupported layers into post-processing so that the network won't use GPU resources at runtime. Compared to cuDLA Hybrid mode, cuDLA Standalone mode doesn't create a CUDA context, so there is no CUDA context-switching overhead in the multi-process case.
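For reference, here's a minimal sketch of the difference at the cuDLA API level (my own illustration, not code from this repo; error handling and the NvSciBuf/NvSciSync plumbing that standalone-mode I/O requires are omitted):

```cpp
#include <cudla.h>
#include <cstdio>

int main() {
    cudlaDevHandle dev;

    // Standalone mode: no CUDA context is created. The DLA is driven
    // directly; inputs/outputs come in as NvSciBuf allocations
    // (imported via cudlaImportExternalMemory) and tasks are
    // synchronized with NvSciSync fences instead of CUDA streams.
    cudlaStatus err = cudlaCreateDevice(/*device=*/0, &dev, CUDLA_STANDALONE);
    if (err != cudlaSuccess) {
        std::printf("cudlaCreateDevice failed: %d\n", static_cast<int>(err));
        return 1;
    }

    // Hybrid mode would instead be:
    //     cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA);
    // which attaches to (creating if necessary) a CUDA context, so the
    // process pays CUDA context-switch overhead when sharing the GPU
    // with other CUDA processes.

    // ... cudlaModuleLoadFromMemory() to load the DLA loadable,
    //     then cudlaSubmitTask() per inference ...

    cudlaDestroyDevice(dev);
    return 0;
}
```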


ou525 commented Feb 1, 2024

If so, then once issue #15 is solved, I can safely run different models on the DLA and the GPU.


ou525 commented Feb 4, 2024

I ran some tests: when I executed the command built with USE_DLA_STANDALONE_MODE=1 and USE_DETERMINISTIC_SEMAPHORE=1 alongside another deep learning model program, the runtime increased significantly compared to running either one individually. So the two workloads do appear to interfere with each other even with these options enabled.


lynettez commented Sep 2, 2024

Then it is likely bandwidth-bound. The DLA and the GPU both consume the same resource: system DRAM. The more bandwidth-bound a workload is, the higher the chance that both the DLA and the GPU become bottlenecked on memory access when running in parallel.
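For intuition, a rough back-of-the-envelope with purely hypothetical numbers: suppose the SoC's DRAM sustains ~200 GB/s, the GPU model alone consumes ~150 GB/s, and the DLA model alone consumes ~80 GB/s. Run concurrently, they demand ~230 GB/s, more than the memory system can deliver, so both engines stall on DRAM and each finishes slower than it did in isolation, even though they share no compute units. Profiling memory utilization (e.g. with tegrastats) while each workload runs alone should show how close you already are to that ceiling.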
