Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor Performance Regression Fixes #314

Conversation

insertinterestingnamehere
Copy link
Collaborator

These are some minor performance fixes to help fix regressions downstream

In particular, this removes an unnecessary thread fence in the IO code, and updates the MACHINE_FENCE macro to use a stronger type of memory fence (since this somehow results in better performance in practice).

The trylock patch is just a simplification that I wrote while attempting to debug the performance regressions. That should be performance neutral.

@@ -87,7 +87,7 @@ using std::memory_order_relaxed;

#include "macros.h"

#define MACHINE_FENCE atomic_thread_fence(memory_order_acq_rel);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is change is needed for correctness or for performance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprisingly, performance. I'm still confused as to how/why this is happening, but for the chapel benchmark I'm using to track the regression there's a modest but consistent performance improvement from using the sequentially consistent fences instead of just acquire release consistent. This matches what the old assembly based version was doing prior to b122234 anyway, but it's still weird.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of anything in our codebase that ought to actually require sequential consistency for correctness' sake. The fact that sequential consistency is beneficial here seems to indicate that there's something suboptimal going on internally. This patch at least corrects the downstream performance regression though.

@insertinterestingnamehere insertinterestingnamehere merged commit af98f00 into sandialabs:release-1.22-pre Nov 26, 2024
293 of 295 checks passed
jabraham17 added a commit to chapel-lang/chapel that referenced this pull request Dec 12, 2024
Backporting two more performance fixes from
sandialabs/qthreads#314.

[Contributed by @insertinterestingnamehere. Reviewed and merged by @jabraham17]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants