Optimize compile times #179
Came across this post on how …
Also add a …
See also #239
As part of optimizing compile time, we tried the following approach, benchmarking with the Mushroombody model. Currently, a number of separate code object files are created. For the optimization, we merged the code object files together with some additional files (synapses_classes, network, objects, rand, run) in the parent directory, and updated the makefile accordingly. Although we tried to merge all the code objects, a particular subset of files was left untouched (file type: *_synapses_create_generator_codeobject), because merging them caused name clashes that would have required changing variable names and function definitions. Only 3 such files were present in our project. The compile time and its standard deviation (sample size 10) are noted below for the 3 modes: brian2cuda (post optimization), brian2cuda (pre optimization) and brian2genn. We also ran two modes of …
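One way a merge like the one described above can be realized is a "unity build" style translation unit that `#include`s the generated sources, so that nvcc is invoked once instead of once per code object. The sketch below is only illustrative and makes assumptions: the file names are hypothetical stand-ins, not necessarily the exact files brian2cuda generates.

```cuda
// merged_codeobjects.cu -- hypothetical single translation unit that pulls in
// the generated sources, so the makefile needs only one nvcc compile rule
// instead of one rule per code object (file names are illustrative).
#include "objects.cu"
#include "synapses_classes.cu"
#include "network.cu"
#include "rand.cu"
#include "run.cu"
#include "code_objects/neurongroup_stateupdater_codeobject.cu"
#include "code_objects/synapses_pre_codeobject.cu"
// ... remaining code objects, excluding the *_synapses_create_generator_codeobject
// files, which clash on variable and function names when merged into one unit.
```

The corresponding makefile rule would then compile only `merged_codeobjects.cu` (plus the excluded files) rather than each `.cu` file separately.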
Generally, compilation in brian2cuda seems to take significantly longer than in brian2 or brian2genn. Since this is essential for the user experience, I would argue it's worth investing some time into this.
I have very little experience with compilation optimization, so I would need to do some research. Currently, the compilation is implemented to "just work somehow" and is not optimized at all.
But here are a few things I came across in the past that could be relevant:
- Check what is `#include`ed where. Currently, it is quite a mess and there are certainly unnecessary includes here and there.
- `__noinline__` qualifiers to reduce compile times. But this could potentially reduce kernel runtimes. Check out this post for some discussion on this. If it is a trade-off of kernel execution time vs compile time, it might make sense to use `__noinline__` for short simulations (find some heuristic on runtime), where compile time is larger than simulation time, and to use inlining only for long simulations, where simulation time is longer than compile time. Maybe adding a `prefs` option would make sense here (see the sketch after this list).
- `__restrict__` keywords, see "Use nvcc compatible __restrict__ keyword for compiler optimisation" (#53).
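As a minimal, self-contained sketch of the `__noinline__` and `__restrict__` points above (this is not brian2cuda's actual generated code): the `BRIAN2CUDA_SHORT_RUN` macro is a hypothetical build flag that a `prefs` option could set to favour compile time over kernel runtime for short simulations.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical switch: define via -DBRIAN2CUDA_SHORT_RUN when the expected
// simulation time is shorter than the compile time.
#ifdef BRIAN2CUDA_SHORT_RUN
    #define MAYBE_NOINLINE __noinline__     // cheaper to compile, possibly slower kernels
#else
    #define MAYBE_NOINLINE __forceinline__  // favour kernel runtime for long simulations
#endif

// Stand-in for a generated per-neuron state update expression.
__device__ MAYBE_NOINLINE float update_state(float v, float dt)
{
    return v + dt * (1.0f - v);
}

// __restrict__ tells nvcc the pointers do not alias, allowing more aggressive
// optimisation (cf. #53).
__global__ void state_update_kernel(float* __restrict__ v,
                                    const float* __restrict__ dt,
                                    int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        v[i] = update_state(v[i], dt[0]);
}

int main()
{
    // Tiny smoke test: 4 "neurons", one integration step.
    const int n = 4;
    float h_v[n] = {0.0f, 0.25f, 0.5f, 0.75f};
    float h_dt = 0.1f;
    float *d_v, *d_dt;
    cudaMalloc(&d_v, n * sizeof(float));
    cudaMalloc(&d_dt, sizeof(float));
    cudaMemcpy(d_v, h_v, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_dt, &h_dt, sizeof(float), cudaMemcpyHostToDevice);
    state_update_kernel<<<1, 32>>>(d_v, d_dt, n);
    cudaMemcpy(h_v, d_v, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("v[0] after one step: %f\n", h_v[0]);
    cudaFree(d_v);
    cudaFree(d_dt);
    return 0;
}
```

Built once with `nvcc -DBRIAN2CUDA_SHORT_RUN` and once without, the same source trades compile time against kernel runtime, which is the kind of heuristic a `prefs` option could automate.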