Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

miniBUDE Timer Updates #5

Merged
merged 24 commits into from
Feb 16, 2024
Merged

miniBUDE Timer Updates #5

merged 24 commits into from
Feb 16, 2024

Conversation

pranav-sivaraman
Copy link
Collaborator

@pranav-sivaraman pranav-sivaraman commented Feb 2, 2024

  • Double check data movement
  • Add configurable warmup iterations

@pranav-sivaraman pranav-sivaraman changed the title miniBUDE Timer and Tuning Updates miniBUDE Timer Updates Feb 7, 2024
@pranav-sivaraman pranav-sivaraman marked this pull request as ready for review February 7, 2024 22:50
Copy link

@jhdavis8 jhdavis8 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, two small changes to discuss

src/acc/fasten.hpp Show resolved Hide resolved
src/omp/fasten.hpp Show resolved Hide resolved
@jhdavis8
Copy link

@joy-kitson would you be able to review this one as well?

Copy link

@joy-kitson joy-kitson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like in most of the models you're timing allocation under the umbrella of host to device data movement, unless I'm missing something. I think it might be worth discussing whether or not this is the intended behavior for our timers, as if it is I think I need to make some changes in Cloverleaf (and if memory serves, we may need to do so in some of the other apps as well).

host(energies[:nposes])

auto deviceToHostEnd = now();
sample.deviceToHost = {deviceToHostStart, deviceToHostEnd};

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we're saving the start and end times rather than just the time elapsed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below

@@ -85,9 +86,10 @@ struct Sample {
size_t ppwi, wgsize;
std::vector<float> energies;
std::vector<std::pair<TimePoint, TimePoint>> kernelTimes;
std::optional<std::pair<TimePoint, TimePoint>> contextTime;
std::optional<std::pair<TimePoint, TimePoint>> hostToDevice;
std::optional<std::pair<TimePoint, TimePoint>> deviceToHost;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it makes sense to do it this way if that's what's in the existing code, but it does still seem rather unorthodox, and also doesn't quite line up with what we do in the other codes

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final output is just the elapsed time so I think it shouldn't be a problem copying the way they do it.

src/cuda/fasten.hpp Show resolved Hide resolved
@pranav-sivaraman pranav-sivaraman merged commit 0aca8e0 into v2 Feb 16, 2024
@pranav-sivaraman pranav-sivaraman deleted the timer-updates branch March 2, 2024 23:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants