miniBUDE Timer Updates #5
pranav-sivaraman commented Feb 2, 2024 (edited)
- Double check data movement
- Add configurable warmup iterations
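A minimal sketch of what configurable warmup iterations could look like, assuming a `std::chrono::steady_clock`-based `TimePoint` and a hypothetical `runTimed` helper (neither the helper name nor its parameters come from this PR). The warmup runs execute the kernel but are not recorded; each timed run stores a (start, end) pair.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Assumption: the real TimePoint alias may use a different clock.
using Clock = std::chrono::steady_clock;
using TimePoint = Clock::time_point;

// Hypothetical helper: runs `kernel` for `warmup` untimed iterations, then
// `iterations` timed ones, returning one (start, end) pair per timed run.
template <typename Kernel>
std::vector<std::pair<TimePoint, TimePoint>> runTimed(std::size_t warmup,
                                                      std::size_t iterations,
                                                      Kernel &&kernel) {
  // Warmup runs absorb one-off costs (JIT, caches, lazy init) and are discarded.
  for (std::size_t i = 0; i < warmup; ++i) kernel();

  std::vector<std::pair<TimePoint, TimePoint>> times;
  times.reserve(iterations);
  for (std::size_t i = 0; i < iterations; ++i) {
    auto start = Clock::now();
    kernel();
    auto end = Clock::now();
    times.emplace_back(start, end);
  }
  assert(times.size() == iterations);
  return times;
}
```

With this shape, the warmup count can come straight from a command-line option, and setting it to zero reproduces the unwarmed behavior.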
RAJA Update
Add scope guards to Kokkos views so they are destroyed before Kokkos finalize is called.
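The scoping idea can be sketched without Kokkos itself. In the sketch below, `Runtime` and `Buffer` are hypothetical stand-ins for the Kokkos runtime and a view; the point is only that the inner braces force the buffer's destructor to run before `finalize()` is called.

```cpp
#include <cassert>

// Hypothetical stand-in for a runtime with explicit initialize/finalize calls.
struct Runtime {
  static bool &active() { static bool flag = false; return flag; }
  static void initialize() { active() = true; }
  static void finalize() { active() = false; }
};

// Hypothetical stand-in for a view: on destruction it records whether the
// runtime was still active, which is the condition the scope guard protects.
struct Buffer {
  explicit Buffer(bool &flag) : destroyedWhileActive(flag) {}
  ~Buffer() { destroyedWhileActive = Runtime::active(); }
  bool &destroyedWhileActive;
};

bool runWithScopeGuard() {
  bool destroyedInTime = false;
  Runtime::initialize();
  { // scope guard: everything declared in here dies before finalize()
    Buffer poses(destroyedInTime);
    // ... kernels using `poses` would run here ...
  }
  Runtime::finalize();
  return destroyedInTime;
}
```

Without the inner braces, `poses` would live until the end of the function and be destroyed after `finalize()`.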
Spack injects its own compiler wrappers, which add rpaths for shared libraries and apply any user-defined flags for us. Therefore we do not want to set the compiler ourselves in this scenario.
Looks good, two small changes to discuss
@joy-kitson would you be able to review this one as well?
It looks like most of the models time allocation under the umbrella of host-to-device data movement, unless I'm missing something. It might be worth discussing whether this is the intended behavior for our timers; if it is, I think I need to make some changes in Cloverleaf (and, if memory serves, in some of the other apps as well).
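For discussion, here is a rough sketch of what timing allocation separately from the copy could look like. Everything in it is an assumption for illustration: `TransferTimes`, its `allocation` field, and `upload` are hypothetical names, and a plain heap buffer stands in for device memory.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock; // assumption: actual clock may differ
using TimePoint = Clock::time_point;
using Span = std::pair<TimePoint, TimePoint>;

// Hypothetical record; `allocation` is not a field of the PR's Sample struct.
struct TransferTimes {
  std::optional<Span> allocation;
  std::optional<Span> hostToDevice;
};

// Stand-in for a device upload: `device` plays the role of device memory.
TransferTimes upload(const std::vector<float> &host,
                     std::unique_ptr<float[]> &device) {
  TransferTimes t;

  auto allocStart = Clock::now();
  device = std::make_unique<float[]>(host.size());
  auto allocEnd = Clock::now();
  t.allocation = Span{allocStart, allocEnd};
  assert(device != nullptr);

  // Only the copy itself is attributed to host-to-device movement.
  auto copyStart = Clock::now();
  std::copy(host.begin(), host.end(), device.get());
  auto copyEnd = Clock::now();
  t.hostToDevice = Span{copyStart, copyEnd};

  return t;
}
```

If the intended behavior is to fold allocation into the transfer, a single span from the first `now()` call to the last gives the combined figure.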
host(energies[:nposes])

auto deviceToHostEnd = now();
sample.deviceToHost = {deviceToHostStart, deviceToHostEnd};
Is there a reason we're saving the start and end times rather than just the time elapsed?
See below
@@ -85,9 +86,10 @@ struct Sample {
size_t ppwi, wgsize;
std::vector<float> energies;
std::vector<std::pair<TimePoint, TimePoint>> kernelTimes;
std::optional<std::pair<TimePoint, TimePoint>> contextTime;
std::optional<std::pair<TimePoint, TimePoint>> hostToDevice;
std::optional<std::pair<TimePoint, TimePoint>> deviceToHost;
I guess it makes sense to do it this way if that's what the existing code does, but it still seems rather unorthodox, and it doesn't quite line up with what we do in the other codes.
The final output is just the elapsed time, so I think copying the way they do it shouldn't be a problem.
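As a sketch of that reduction, assuming `TimePoint` is a `std::chrono::steady_clock` time point (the helper names `elapsedMs` and `optionalElapsedMs` are hypothetical, not taken from the codebase), a stored (start, end) pair collapses to an elapsed time at output:

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <utility>

using Clock = std::chrono::steady_clock; // assumption: actual clock may differ
using TimePoint = Clock::time_point;
using Span = std::pair<TimePoint, TimePoint>;

// Elapsed milliseconds between the start and end of a recorded span.
double elapsedMs(const Span &span) {
  assert(span.first <= span.second);
  return std::chrono::duration<double, std::milli>(span.second - span.first)
      .count();
}

// Optional spans (e.g. transfers a model never performs) stay empty.
std::optional<double> optionalElapsedMs(const std::optional<Span> &span) {
  if (!span) return std::nullopt;
  return elapsedMs(*span);
}
```

Keeping the raw pairs costs little and still allows derived quantities later, such as the gap between two recorded phases, which a stored duration alone would not.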