SYCL telescope Kalman fitter tests fail in SYCL with OneAPI 2024.2 #655
Comments
stephenswat added a commit to stephenswat/traccc that referenced this issue on Jul 26, 2024:
As shown in acts-project#655, this is creating a lot of headache. I am looking for a fix but in the meanwhile this is holding up acts-project#628, so I want to temporarily disable these tests.
This sounds similar to what I see in algebra plugins now. I ran the same instructions, but this time on algebra-plugins, and I see the following:
So, the problem might already be in the linear algebra implementation? Interestingly, only the |
Here is one of the most profoundly bewildering bugs I have ever seen. The Kalman fitter tests in telescope geometries don't work in SYCL with OneAPI 2024.2.
Reproduction
To reproduce the bug, perform the following set of actions:
This will produce the following error:
So, we have a segmentation fault in this executable.
Diagnostics
At this point you may, just like I did, naively assume that this is some memory error in our code. Wouldn't that be nice and easy to fix? But nothing could be further from the truth, as gdb shows us:
So the issue is not really on our end per se, it's happening in Intel's SPIR compiler. Aight.
Workarounds
This is where it gets truly spicy. I've been able to identify two different ways to avoid the segmentation fault (of course, both break the actual test, but they make it run):
- In simulation/include/traccc/simulation/simulator.hpp, comment out line 97 (p.propagate(propagation, actor_states);).
- In core/include/traccc/fitting/kalman_filter/kalman_fitter.hpp, comment out lines 185 (propagator.propagate(propagation, fitter_state());) and 188 (smooth(fitter_state);).
These functions are completely independent, and one of them runs on the host while the other runs on the device. Lmao.
Conclusion
I don't even know at this point, but most certainly there is something very funky happening in OneAPI right now. It could also be some subtle bug in our code, but I haven't been able to find it.