Copy (Test) Improvements, main branch (2024.02.29.) #270
Unfortunately the code created a "SEH exception" in the CUDA tests on Windows. I'll have to hunt that down before this would go in... 🤔
The "SEH exceptions" on Windows turned out to come from this: https://forums.developer.nvidia.com/t/accessing-managed-memory-during-asynchronous-copies/ So I just disabled some tests on Windows...
All of the "copy tests" are now instantiated in the same way, inside of single compilation units.
Unfortunately some issues still remain, which I'm in the process of finding / fixing.
Made the sycl_event and cuda_event types implicitly wait for their underlying events in their destructors, if the user did not wait for them explicitly. This is to avoid 99% of the asynchronous errors that I encountered during debugging. At the same time also made it possible to explicitly ignore such events, for the rare case where it may be needed. Finally, introduced VECMEM_FAIL_ON_ASYNC_ERRORS for building the project in a mode where asynchronous errors (users not explicitly ignoring or waiting for an event) would cause the program to terminate.
It seems that the implementation of CUDA managed memory is a bit more fragile on Windows than on Linux.
8c237f1 to 1007af4
With the conclusions coming out of https://forums.developer.nvidia.com/t/accessing-managed-memory-during-asynchronous-copies/ I considered for a bit if the …
Managed to hit "Enter" with my pinkie a bit too early... 😦
This is a follow-up from #268.

- Added the vecmem::abstract_event::ignore() function, to allow clients to explicitly ignore a given synchronization event in their code if they want to. (In some rare cases this can be useful.)
- Made ::sycl_event and ::cuda_event implicitly wait for their underlying events in their destructors, if the user didn't do so explicitly.
- Introduced the VECMEM_FAIL_ON_ASYNC_ERRORS CMake option for building the project in a way that it would call std::terminate() in case it detects such an error (an event that was neither waited on nor ignored explicitly). After considering throwing an exception at first, I went with calling std::terminate() in the end, because throwing exceptions from destructors "cleanly" is just a lot of hassle. I still needed to do something "drastic", since with the pretty aggressive symbol hiding that we build the project with, there is no easy way of debugging such issues in a client project otherwise. 🤔
- All of this results in vecmem::copy becoming yet a bit more complicated, in order to avoid "internal waits" in its functions as much as possible.

With all of this in place, all copy related issues are gone from my tests now... 🤞
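
For anyone reading along, here is a rough, self-contained sketch of the event behaviour described in this PR. It is not the actual vecmem code. The abstract_event stand-in only models wait() and ignore(), and the fake_async_event class, the start_fake_copy() helper and the SKETCH_FAIL_ON_ASYNC_ERRORS macro are all hypothetical names made up for the illustration. A plain counter takes the place of the real CUDA / SYCL synchronization call.

```cpp
#include <cstdio>
#include <exception>
#include <memory>

// Hypothetical stand-in for vecmem::abstract_event. Only wait() and ignore()
// are modelled here; the real interface may look different.
struct abstract_event {
    virtual ~abstract_event() = default;
    virtual void wait() = 0;
    virtual void ignore() = 0;
};

// Hypothetical event type imitating the behaviour described for
// sycl_event / cuda_event. Incrementing a counter stands in for the
// backend synchronization call.
class fake_async_event : public abstract_event {
public:
    explicit fake_async_event(int& sync_counter)
        : m_sync_counter(&sync_counter) {}

    ~fake_async_event() override {
        // Nothing to do if the user already waited on or ignored the event.
        if (m_handled) {
            return;
        }
#ifdef SKETCH_FAIL_ON_ASYNC_ERRORS
        // Strict mode (assumed macro name): treat the unhandled event as an
        // error. Terminating avoids throwing from a destructor.
        std::fputs("event was neither waited on nor ignored\n", stderr);
        std::terminate();
#else
        // Default mode: wait implicitly, so the operation is finished
        // before the event object goes away.
        synchronize();
#endif
    }

    // Explicit wait; calling it a second time is a no-op.
    void wait() override {
        if (!m_handled) {
            synchronize();
        }
    }

    // Explicitly give up on synchronizing with the underlying operation.
    void ignore() override { m_handled = true; }

private:
    void synchronize() {
        ++(*m_sync_counter);
        m_handled = true;
    }

    int* m_sync_counter;
    bool m_handled = false;
};

// Hypothetical helper imitating an asynchronous copy that returns an event.
std::unique_ptr<abstract_event> start_fake_copy(int& sync_counter) {
    return std::make_unique<fake_async_event>(sync_counter);
}

// Exercises the three ways of handling an event and returns how many
// synchronizations were performed in total.
int demo() {
    int syncs = 0;
    { auto e = start_fake_copy(syncs); e->wait(); }    // explicit wait
    { auto e = start_fake_copy(syncs); e->ignore(); }  // explicitly ignored
    { auto e = start_fake_copy(syncs); }               // implicit wait in dtor
    return syncs;
}
```

In this sketch the explicit wait and the forgotten event each synchronize once, while the ignored event never does. Compiling it with SKETCH_FAIL_ON_ASYNC_ERRORS defined would make the third case terminate the program instead, which is the kind of "drastic" behaviour meant above.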