feat: Improved FPE monitoring #2157
Co-authored-by: Andreas Stefl <stefl.andreas@gmail.com>
Codecov Report
@@ Coverage Diff @@
## main #2157 +/- ##
==========================================
- Coverage 49.37% 49.36% -0.02%
==========================================
Files 446 445 -1
Lines 25290 25259 -31
Branches 11657 11646 -11
==========================================
- Hits 12488 12468 -20
+ Misses 4515 4511 -4
+ Partials 8287 8280 -7
📊 Physics performance monitoring for 8298167: Summary, Vertexing, Seeding, CKF, Ambiguity resolution, Truth tracking (Kalman Filter), Truth tracking (GSF)
It's green. Let's quickly merge it before it breaks again @andiwand 😅
Need to debug the compiler segfault on Monday.
Let's get this in.
…roject#2086) Our full chain pulls are in a bad state. It looks like the reconstruction and simulation energy loss did not match up. This PR switches the Fatras interactions on, which should bring our pulls back to a standard normal distribution. Fixes - acts-project#1643 Blocked by - acts-project#2157 - acts-project#2239 - acts-project#2295 - acts-project#2293 - acts-project#2294
The overall goal is not to fail a job when an FPE occurs, but to mask that FPE type in the signal handler, take a stack trace, and resume execution. The sequencer can then unmask the type again for the next algorithm. I implemented the resuming based on discussion with @stephenswat, and only for x86_64 for now. It keeps stack traces, accumulates them across algorithms / events / threads, deduplicates them, and can print a summary at the end, looking something like this:
Currently, this doesn't fail the job. The plan is to implement a masking mechanism based on the source file and line of the top-level stack frame, as well as summation by algorithm / reader / writer, rather than a single global summary.
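To illustrate the accumulation and deduplication step described above, here is a minimal sketch, not the code in this PR. It assumes a stack trace can be reduced to a list of frame strings, and every name in it (`FpeType`, `StackTrace`, `FpeSummary`) is hypothetical. Identical traces for the same FPE type collapse into one entry with a count, and summaries from different threads or algorithms can be merged into a global one.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical FPE categories, for illustration only.
enum class FpeType { DivByZero, Overflow, Invalid };

// Assumption: a stack trace is reduced to frame descriptions like "file.cpp:42".
using StackTrace = std::vector<std::string>;

// Accumulates FPE occurrences keyed by (type, stack trace), so repeated hits
// from the same call chain are stored once together with a counter.
class FpeSummary {
 public:
  // Record `count` occurrences of `type` observed with the given trace.
  void add(FpeType type, const StackTrace& trace, std::size_t count = 1) {
    assert(count > 0);
    m_counts[std::make_pair(type, trace)] += count;
  }

  // Fold another summary (e.g. from another thread or algorithm) into this one.
  void merge(const FpeSummary& other) {
    for (const auto& entry : other.m_counts) {
      m_counts[entry.first] += entry.second;
    }
  }

  // Number of distinct (type, trace) combinations seen so far.
  std::size_t uniqueTraces() const { return m_counts.size(); }

  // Total number of occurrences of one FPE type across all traces.
  std::size_t count(FpeType type) const {
    std::size_t total = 0;
    for (const auto& entry : m_counts) {
      if (entry.first.first == type) {
        total += entry.second;
      }
    }
    return total;
  }

 private:
  std::map<std::pair<FpeType, StackTrace>, std::size_t> m_counts;
};
```

Keying on the full trace is the simplest choice for a sketch. A mask based only on the file and line of the top-level frame, as planned above, would instead compare just one frame of each trace.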