Multi thread/device support #418
The current issue is that IO tests fail because the singletons have already been initialised when loading from file. As a result, the code that sets up environment variables loaded from file is not executed.
Full test suite now passes on Linux. Will add the multi-thread, multi-device tests tomorrow. Then test RTC.
One side-effect/by-product of multi-device is an issue that occurs when a thread exits while not on device 0, and a new CUDASim is then created in the same thread without applyConfig() being called to change the device back to device 0 (the default value found in config).
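The stale-device problem described above can be illustrated without a GPU. The sketch below is a CPU-only analogy, not the project's code: a `thread_local` integer stands in for CUDA's per-host-thread current device, and `FakeSim`, its `applyConfig()`, `activeDevice()` and `secondSimInSameThread()` are hypothetical names chosen for illustration.

```cpp
#include <cassert>

// Stand-in for CUDA's "current device", which is tracked per host thread.
thread_local int g_current_device = 0;

// Hypothetical simulation object with a configured device (default 0).
struct FakeSim {
    int config_device = 0;
    // Analogue of applyConfig(): makes the configured device current.
    void applyConfig() const {
        assert(config_device >= 0);
        g_current_device = config_device;
    }
    // Device that work issued right now would be sent to.
    int activeDevice() const { return g_current_device; }
};

// Device seen by the second sim before and after it applies its config.
struct Observed { int before; int after; };

// Runs two sims back to back on the calling thread.
Observed secondSimInSameThread() {
    FakeSim first;
    first.config_device = 1;
    first.applyConfig();                  // thread is left on device 1
    FakeSim second;                       // config says device 0
    Observed seen;
    seen.before = second.activeDevice();  // stale thread state from `first`
    second.applyConfig();
    seen.after = second.activeDevice();   // matches config once applied
    return seen;
}
```

In this analogy the second sim only lands on its configured device after `applyConfig()` runs, which mirrors why a sim created in a reused thread should not assume device 0.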
4e55957 to 3804cdd
Didn't make much progress today.
Spent most of the day chasing DeviceException test failures. There was some confusion with Mav, which showed different errors. The failures still (only?) occur on my laptop, and I have yet to find a reasonable explanation for why they disappear as soon as I make a change. This also affects RTCDeviceException. Should really debug them with tests_dev to get a much faster build time.
A bunch of message tests are failing/crashing out in release mode (on both Windows and Linux); need to investigate.
791af5f to 64d9112
Initial status of RTC tests: the first test (message) passed, the other 3 failed. The first does not use any environment variables, so environment variable handling is likely the cause.
Make Curve singleton thread-safe and unique per-device
Make EnvironmentManager singleton thread-safe and unique per-device
Add mutex locks around agent function execution (to prevent environment data being changed by defragment during or before execution).
Add nvcc arg --default-stream per-thread to CMake (and then comment it out as it would require far more testing)
Fix double inclusion of common.cmake if configuring with an example as root.
Modify initialisation of CUDASimulation so that EnvironmentManager is not used before device has been selected.
Add tests for multi-threading and multi-device CUDASimulation execution.
Add a reduced set of multi-thread and multi-device tests for RTC.
Fix a bug where rtc_offsets were reset to 0 when EnvironmentManager::defragment() was called.
Required for #245
Closes #395
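The "thread-safe and unique per-device" singleton pattern named in the commit message above can be sketched roughly as follows. This is a minimal illustration under assumptions, not the actual Curve or EnvironmentManager code: `PerDeviceManager`, `getInstance(int)`, `mutex()` and `distinctInstances()` are hypothetical names, and the device id is passed in explicitly rather than queried from CUDA.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <set>
#include <thread>
#include <vector>

// Hypothetical stand-in for a manager that must exist once per device.
class PerDeviceManager {
 public:
    // Returns the single instance for device_id, creating it on first use.
    // The registry lock makes concurrent first calls safe.
    static PerDeviceManager &getInstance(int device_id) {
        assert(device_id >= 0);
        std::lock_guard<std::mutex> guard(registryMutex());
        std::unique_ptr<PerDeviceManager> &slot = registry()[device_id];
        if (!slot) slot.reset(new PerDeviceManager(device_id));
        return *slot;
    }
    int device() const { return device_id_; }
    // Per-instance lock, to be held around operations that must not
    // interleave (for example a defragment and an agent function launch).
    std::mutex &mutex() { return mutex_; }

 private:
    explicit PerDeviceManager(int device_id) : device_id_(device_id) {}
    static std::mutex &registryMutex() {
        static std::mutex m;
        return m;
    }
    static std::map<int, std::unique_ptr<PerDeviceManager>> &registry() {
        static std::map<int, std::unique_ptr<PerDeviceManager>> r;
        return r;
    }
    int device_id_;
    std::mutex mutex_;
};

// Requests the instance for one device from several threads at once and
// returns how many distinct objects were handed out.
std::size_t distinctInstances(int device_id, int thread_count) {
    std::vector<PerDeviceManager *> seen(thread_count, nullptr);
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i)
        workers.emplace_back([&seen, device_id, i] {
            seen[i] = &PerDeviceManager::getInstance(device_id);
        });
    for (std::thread &w : workers) w.join();
    return std::set<PerDeviceManager *>(seen.begin(), seen.end()).size();
}
```

Keying the registry by device id gives each device its own instance, while the registry lock keeps first-time construction from racing when several simulation threads start together.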
2eabec2 to 515db48
ToDo
- `EnvironmentManager` thread-safe
- `Curve` thread-safe
- `CUDASimulation` instance id, into Curve hashes.
- `EnvironmentManager` multi-device aware
- `Curve` multi-device aware
- `CUDASimulation` multi-device aware
- `--default-stream per-thread` to CMake
- I think Pete said jitify complains about compiling in anything other than the main thread (RTC compile works and doesn't print any warning, it's just noticeably slower).
- `CUDASimulation` execution.
- `tests_dev` cmake
ChangeLog
- Make `Curve` singleton thread-safe and unique per-device
- Make `EnvironmentManager` singleton thread-safe and unique per-device
- Add nvcc arg `--default-stream per-thread` to CMake (and then comment out as it would require far more testing)
- Fix double inclusion of `common.cmake` if configuring with an example as root.
- Modify initialisation of `CUDASimulation` so that `EnvironmentManager` is not used before device has been selected.
- Add tests for multi-threading and multi-device `CUDASimulation` execution.
- Fix a bug where rtc_offsets were reset to 0 when `EnvironmentManager::defragment()` was called.
Known Issues
- `CUDASimulation` instance id incrementing each time, and this mutating the hashes used. The test does not collide in isolation, but at least 1 test from the series always seems to collide if run as a set. Might be trouble with hashes bunching, as there should only be a max of around 24/1024 hashes in use at any time during these tests.
- `nsys` timeline for one of the multi-device test cases appears to show the context creation for tertiary devices waiting for CUDA to become idle on already initialised devices. Seems important to initialise CUDA on all devices before launching sims. Calling `cudaFree(nullptr)` on every device at the start of the test appears to work.
- `--default-stream per-thread` removes some implicit device syncs that would otherwise be expected. Very important to sync on either side of agent functions.
Relevant to #245
Closes #395 (if done properly)
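As a rough sanity check on the hash collision note under Known Issues, the chance of a collision at that load can be estimated with the standard birthday calculation. The sketch below assumes hashes are uniform and independent, which may not hold for Curve's real hash function, and `collisionProbability` is a hypothetical helper, not part of the project.

```cpp
#include <cassert>

// Probability that at least two of `keys` uniformly random hashes fall in
// the same slot of a table with `slots` entries (birthday estimate).
double collisionProbability(int keys, int slots) {
    assert(keys >= 0 && slots > 0);
    if (keys > slots) return 1.0;  // pigeonhole: a collision is certain
    double all_distinct = 1.0;
    for (int i = 0; i < keys; ++i) {
        // The (i+1)-th key must avoid the i slots already taken.
        all_distinct *= static_cast<double>(slots - i) / slots;
    }
    return 1.0 - all_distinct;
}
```

Under these assumptions, `collisionProbability(24, 1024)` comes out a little under one in four, so a collision somewhere in a long series of tests could occur even without bunching, provided each test's hashes change with the instance id.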