Optimize compile times #179

Open
denisalevi opened this issue Aug 23, 2020 · 4 comments

@denisalevi
Member

denisalevi commented Aug 23, 2020

Generally, compilation in brian2cuda seems to take significantly longer than in brian2 or brian2genn. Since this is essential for the user experience, I would argue it's worth investing some time into this.

I have very little experience with compilation optimization, so I would need to do some research. Currently, the compilation is implemented to "just work somehow" and is not optimized at all.

But here are a few things I came across in the past that could be relevant:

  • Clean up what is #included where. Currently it is quite a mess and there are certainly unnecessary includes here and there.
  • Check out compilation with clang.
  • Check out how brian2genn / genn does it.
  • Check out how Spike does it; it seems to be pretty efficient at compiling and executing even small programs.
  • One can use __noinline__ qualifiers to reduce compile times, but this could potentially increase kernel runtimes. Check out this post for some discussion on this. If it is a trade-off between kernel execution time and compile time, it might make sense to use __noinline__ for short simulations (find some heuristic on runtime), where compile time is larger than simulation time, and to use inlining only for long simulations, where simulation time is longer than compile time. Maybe adding a preference would make sense here.
  • __restrict__ keywords, see Use nvcc compatible __restrict__ keyword for compiler optimisation #53. (A sketch of both qualifiers follows this list.)
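
A minimal sketch of what the last two bullets could look like in generated device code (the function and parameter names are made up for illustration, not taken from brian2cuda):

```cuda
// Hypothetical device helper, for illustration only.
// __noinline__ asks nvcc not to inline the body into every calling kernel,
// which can shorten compile times at a possible cost in kernel runtime.
// __restrict__ promises the pointers do not alias, enabling optimizations.
__device__ __noinline__ double update_state(const double* __restrict__ v,
                                            const double* __restrict__ dv,
                                            int i)
{
    return v[i] + 0.1 * dv[i];
}

__global__ void kernel_state_update(const double* __restrict__ v,
                                    const double* __restrict__ dv,
                                    double* __restrict__ out,
                                    int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = update_state(v, dv, i);
}
```
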
@denisalevi
Member Author

Came across this post on how nvcc calls gcc with -Wl,--start-group <archives> -Wl,--end-group, which might be quite inefficient. Maybe there is a way to avoid that? (-Wl, passes the following argument directly to the linker ld when given to gcc.)
https://forums.developer.nvidia.com/t/linking-process-with-nvcc-linker/63433/4
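
For illustration (the archive names are made up, and this is not the exact command nvcc emits), the difference between a grouped and an ordered link line looks roughly like this; inside --start-group/--end-group, ld rescans the archives until no new undefined symbols are resolved, whereas a correctly ordered archive list needs only a single pass:

```sh
# Hypothetical link lines; archive names are made up for illustration.

# Grouped form: ld keeps rescanning the archives inside the group until
# no further undefined symbols can be resolved.
g++ main.o -Wl,--start-group libcodeobjects.a libbrianlib.a -Wl,--end-group -o main

# Ordered form: each archive is scanned once, which can link faster,
# but requires knowing the dependency order between the archives.
g++ main.o libcodeobjects.a libbrianlib.a -o main
```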

@denisalevi
Member Author

Also add a make.deps dependency file such that recompilation can be sped up (I presume?). Just check out the C++ Standalone makefile, which does that. For CUDA, the options are different I think (-M instead of -MM), but I didn't look into it in detail. Alternatively, one could also pass the dependency file generation to the C++ compiler, but I guess that would only cover host code dependencies? See this post/comment
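
A minimal sketch of what such a dependency file setup could look like (the file names and targets are made up, this is not the actual brian2cuda makefile):

```make
# Hypothetical makefile fragment, for illustration only.
SRCS := main.cu objects.cu network.cu run.cu
OBJS := $(SRCS:.cu=.o)

main: $(OBJS)
	nvcc $(OBJS) -o main

%.o: %.cu
	nvcc -c $< -o $@

# Regenerate make.deps whenever a source changes. nvcc's -M option emits
# make-style dependency rules for a single .cu input file; the host
# compiler's -MM variant would skip system headers but only sees host code.
make.deps: $(SRCS)
	rm -f make.deps
	for f in $^; do nvcc -M $$f >> make.deps; done

# Pull the generated rules in so that touching a header recompiles exactly
# the objects that include it; the '-' avoids an error on the first build.
-include make.deps
```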

@denisalevi
Member Author

See also #239

@SudeshnaBora
Contributor

As part of optimizing compile times, we tried the following approach, using the Mushroombody model as the benchmark.

Currently, a number of different code object files are created. For the optimization, we merged the code object files together with some additional files (synapses_classes, network, objects, rand, run) into the parent directory; the makefile was updated accordingly. Although we tried to merge all code objects, one particular subset of files was left untouched (the *_synapses_create_generator_codeobject files), because merging them caused name clashes that would have required changing variable names and function definitions. In our project, only 3 such files were present. (A sketch of the merging idea is shown below.)
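
As a rough sketch of the merging idea (the file names are made up, not the real generated sources), the per-code-object translation units can be pulled into a single .cu file so that nvcc parses the shared headers only once:

```cuda
// Hypothetical merged translation unit, for illustration only.
// Instead of compiling each code object .cu file separately, one
// all_code_objects.cu includes them, so the shared headers are parsed
// and their templates instantiated only once per build.
#include "code_objects/neurongroup_stateupdater_codeobject.cu"
#include "code_objects/neurongroup_thresholder_codeobject.cu"
#include "code_objects/synapses_pre_codeobject.cu"
// ... remaining code objects, except the
// *_synapses_create_generator_codeobject files, which clash on
// variable and function names when merged.
```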

The compile time and its standard deviation (sample size 10) are listed below for three configurations: brian2cuda (pre optimization), brian2cuda (post optimization), and brian2genn. We ran make in two modes: plain make and make -j, which uses all available CPUs.

| | make: time (sec) | make: std dev (sec) | make -j: time (sec) | make -j: std dev (sec) |
| --- | --- | --- | --- | --- |
| brian2cuda (pre optimization) | 207.179 | 0.139 | 179.940 | 0.605 |
| brian2cuda (post optimization) | 49.287 | 0.119 | 42.423 | 0.113 |
| brian2genn | 11.419 | 0.014 | 11.149 | 0.01 |
