Can't profile benchmark with ncu, nsys #183
Comments
@juney-nvidia Thanks, it is now possible to use the ncu profiler, but nsys still doesn't work. Is there any way to use the nsys profiler?
Have you tried running with a smaller batch size and smaller input/output lengths to see whether the issue still exists? And what hardware are you using? June
@juney-nvidia Running benchmarks with the on-the-fly engine build is much more convenient, so it would be great if there were a way to use it with nsys.
Thanks for the feedback, @WyldeCat. It would also mean that you would "pollute" your NSYS trace with a lot of extra kernel launches that are not relevant to your application (all the auto-tuning done by TensorRT), and you would end up with a much bigger NSYS output file. I'm pretty sure that would make the analysis of the NSYS report a lot harder. For now, I'm going to close the issue, but feel free to open a feature request if you think it's really a needed feature.
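For readers hitting the same concern about build-time kernels bloating the trace: one possible workaround, not confirmed anywhere in this thread, is to let nsys record only a marked region instead of the whole process. The sketch below just assembles such an `nsys profile` command line. It assumes the profiled program itself calls `cudaProfilerStart()` / `cudaProfilerStop()` around the region of interest, and `benchmark.py` and the output name are hypothetical placeholders, not the actual benchmark entry point.

```python
import shlex


def build_nsys_command(target_cmd, output="report", capture_range=True):
    """Return an argv list that wraps target_cmd with `nsys profile`.

    target_cmd: the command to profile, as a list of strings (placeholder).
    output: base name of the report file nsys writes.
    capture_range: if True, collect only between cudaProfilerStart() and
        cudaProfilerStop(); this ASSUMES the target program makes those calls.
    """
    cmd = [
        "nsys", "profile",
        "-o", output,                 # report file base name
        "--force-overwrite", "true",  # replace an existing report
        "-t", "cuda,nvtx",            # trace CUDA API/kernels and NVTX ranges
    ]
    if capture_range:
        # Start collection at cudaProfilerStart() and end it at
        # cudaProfilerStop(), so earlier work (e.g. engine building) is skipped.
        cmd += ["--capture-range=cudaProfilerApi", "--capture-range-end=stop"]
    return cmd + list(target_cmd)


if __name__ == "__main__":
    # Hypothetical target; substitute the real benchmark invocation.
    argv = build_nsys_command(["python", "benchmark.py"], output="llama_7b")
    print(shlex.join(argv))
```

Whether this helps depends on the benchmark exposing such start/stop markers; without them, nsys with a capture range would record nothing.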
I tried to profile the LLaMA 7B benchmark but failed to obtain reports.
When using nsys, the following error occurs
What do I need to do to get reports?