Make LLVM Profiling robust for multithreaded programs #47778
Conversation
It's not super clear to me what the purpose of this PR is. Is the intent to allow access to these timings within Julia code? If so, why is the timing struct embedding a vector<> and not a Julia array? Also, the previous implementation of timing reporting should have been thread-safe, albeit not manipulable from Julia, since we locked before ever operating on the stream.
@pchintalapudi This is very much a work in progress :)
Exactly, we want to access the timings from within Julia while potentially having another thread do code generation. Interesting, we thought that we might run into a case where we unset the streams while codegen is trying to write to them and finds a null pointer. Where does the locking happen?
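A minimal sketch of that scenario, with illustrative names rather than the actual globals in Julia's C++ sources: one thread disables logging by clearing a shared stream pointer while another thread, mid-optimization, still tries to write through it.

```cpp
// Sketch only: dump_stream, disable_logging, and log_timing are hypothetical
// names, not the real symbols in the Julia runtime.
#include <atomic>
#include <cstdio>

static std::atomic<std::FILE *> dump_stream{nullptr};

// Thread A: turning logging off from Julia.
void disable_logging() {
    dump_stream.store(nullptr, std::memory_order_release);
}

// Thread B: logging from inside an ongoing optimization. If it read the
// pointer once at the start and reused it, or checked it non-atomically,
// it could race with the store above and write through a stale/null pointer.
void log_timing(const char *module_name, double time_ns) {
    std::FILE *s = dump_stream.load(std::memory_order_acquire);
    if (s != nullptr)
        std::fprintf(s, "- module: %s\n  time_ns: %.0f\n", module_name, time_ns);
}
```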
@pchintalapudi Does the actual optimization in
🤔 @bachdavi: now that I'm looking at the PR change, we maybe don't want to hold the lock over the whole LLVM optimization, since that would force serialization of two different threads doing LLVM optimization if we were profiling. So I think that the string-buffer approach is probably the best one. 👍 Sorry I didn't think of this when we discussed it earlier! Thanks for the quick change today :)
We actually specifically want to avoid holding a lock during optimization, so that in the future multiple modules can be optimizing at the same time. Rather than write to a string first, I would say holding the data in variables and dumping the entire yaml block post-optimization using those variables would be a better solution. Guess I got sniped by @NHDaly :)
💡 @pchintalapudi, yeah, just keeping the results in variables seems even better. 👍 👍
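As a rough sketch of that idea (illustrative names and a plain mutex, not the code this PR actually adds): the timings are collected in locals while no lock is held, the complete YAML record is assembled in a buffer, and the lock is taken only to dump the finished block.

```cpp
// Sketch, not the PR's implementation: collect results in locals, then emit
// the whole YAML record at once so concurrent optimizations never interleave
// partial output and no lock is held while LLVM passes run.
#include <chrono>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

static std::mutex dump_mutex;

void optimize_and_log(const std::string &module_name) {
    auto before = std::chrono::steady_clock::now();
    // run_llvm_passes(module);  // hypothetical: optimization runs lock-free
    auto after = std::chrono::steady_clock::now();
    double time_ns =
        std::chrono::duration<double, std::nano>(after - before).count();

    // Build the complete record in one buffer...
    std::ostringstream yaml;
    yaml << "- module: " << module_name << "\n"
         << "  time_ns: " << time_ns << "\n";

    // ...and lock only for the final write.
    std::lock_guard<std::mutex> lock(dump_mutex);
    std::cout << yaml.str();
}
```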
As an aside, we don't currently measure machine code optimization and emission time (this happens in jitlayers.cpp, CompilerT::operator()), although it probably constitutes a decent fraction of optimization time. That may also be something of interest if you're looking to collect optimization statistics.
I was wondering about that, @pchintalapudi: Does that get measured in
It does get measured by
This uses a
LGTM, thanks, @bachdavi!
I've updated the description as well, to explain this situation better.
@pchintalapudi: Can you do a final review pass as well?
Friendly ping for @pchintalapudi. Do you want to review? We've been using this in production at RAI and so far it seems to be working correctly. Can we merge this? CC: @IanButterworth as well, since this is profiling related.
Seems fine to me
Needs a rebase
Also, are the test failures here real? Or unrelated?
Mh, it seems so indeed!
Thanks Valentin and David. The remaining failures look unrelated:
and
* Use stringstream to atomically write LLVM opt timings
* Add boolean to ensure we don't _only_ write the after block
* Use ios_printf

Co-authored-by: Nathan Daly <NHDaly@gmail.com>
(cherry picked from commit 8985403)
This is a draft PR to make LLVM optimization logging correct for multithreaded programs.
It guards against the case where you enable or disable LLVM optimization logging (@snoopl) in one thread while another thread is midway through an optimization, or stops it before it has finished. Without this PR, this can result in a malformed .yaml file, since the logging might start with the "after" clause, or it might contain only the "begin" clause. After this PR, such cases will simply not be logged, and only LLVM optimizations that occurred entirely while logging was enabled will be logged.
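Roughly what that guard looks like, as a sketch under assumed names (dump_llvm_opt_stream, run_passes); the real change lives in Julia's C++ JIT code and, per the commit message above, uses a stringstream plus a boolean: the stream is checked before the passes run, the "before" clause is buffered, and the "after" clause is emitted only if the "before" clause was actually written.

```cpp
// Sketch with hypothetical names, not the exact code in this PR: buffer the
// record and only emit it when both halves were captured while logging was on.
#include <atomic>
#include <cstdio>
#include <sstream>

static std::atomic<std::FILE *> dump_llvm_opt_stream{nullptr};

void optimize_module_with_logging() {
    std::ostringstream record;
    bool wrote_before = false;

    if (dump_llvm_opt_stream.load() != nullptr) {
        record << "- before:\n    ...\n";   // "before" clause of the YAML entry
        wrote_before = true;
    }

    // run_passes(module);  // hypothetical: the LLVM pass pipeline runs here

    std::FILE *s = dump_llvm_opt_stream.load();
    // Emit the "after" clause only if the "before" clause was written; skip
    // the record entirely if logging was toggled mid-optimization.
    if (s != nullptr && wrote_before) {
        record << "  after:\n    ...\n";
        std::fputs(record.str().c_str(), s); // one write for the whole record
    }
}
```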