-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Minor Performance Regression Fixes #314
Minor Performance Regression Fixes #314
Conversation
MACHINE_FENCE macro. Somehow this results in better performance for some chapel benchmarks.
26f1469
to
59b2236
Compare
@@ -87,7 +87,7 @@ using std::memory_order_relaxed; | |||
|
|||
#include "macros.h" | |||
|
|||
#define MACHINE_FENCE atomic_thread_fence(memory_order_acq_rel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is change is needed for correctness or for performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surprisingly, performance. I'm still confused as to how/why this is happening, but for the chapel benchmark I'm using to track the regression there's a modest but consistent performance improvement from using the sequentially consistent fences instead of just acquire release consistent. This matches what the old assembly based version was doing prior to b122234 anyway, but it's still weird.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of anything in our codebase that ought to actually require sequential consistency for correctness' sake. The fact that sequential consistency is beneficial here seems to indicate that there's something suboptimal going on internally. This patch at least corrects the downstream performance regression though.
af98f00
into
sandialabs:release-1.22-pre
Backporting two more performance fixes from sandialabs/qthreads#314. [Contributed by @insertinterestingnamehere. Reviewed and merged by @jabraham17]
These are some minor performance fixes to help fix regressions downstream
In particular, this removes an unnecessary thread fence in the IO code, and updates the MACHINE_FENCE macro to use a stronger type of memory fence (since this somehow results in better performance in practice).
The trylock patch is just a simplification that I wrote while attempting to debug the performance regressions. That should be performance neutral.