Theoretical Timings #8

Open
npadmana opened this issue Aug 9, 2019 · 0 comments

npadmana commented Aug 9, 2019

Using this issue to keep track of some notes on timings discussed on Gitter:

(from @ronawho)
As a rule of thumb -- ordered fine-grained comm can achieve ~80 MB/s per node injection rate, unordered fine-grained comm can achieve ~400 MB/s per node, and ordered bulk comm can achieve 8 GB/s per node.

Aries is capable of ~8 GB/s unidirectional, and I think 15 GB/s bidirectional.

"""For applications in which traffic is uniformly distributed from each node to each of the other nodes
(e.g., all-to-all), global bandwidth controls performance rather than the bisection — and all the optical
links contribute. Peak global bandwidth is 11.7 GB/s per node for a full network. With the payload
efficiency of 64 percent this equates to 7.5 GB/s per direction. """
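A quick way to get a feel for the numbers above is to turn them into transfer times. The following is only a sketch: the helper names, the decimal units (1 MB = 1e6 bytes), and the 1 GB payload are assumptions made for illustration, not anything measured here.

```python
# Rule-of-thumb per-node injection rates quoted above, in bytes per second.
# Decimal units (1 MB = 1e6 bytes) are an assumption of this sketch.
RATES = {
    "ordered fine-grained": 80e6,
    "unordered fine-grained": 400e6,
    "ordered bulk": 8e9,
}

# Figures from the Cray XC network quote above.
GLOBAL_PEAK_GB_PER_S = 11.7
PAYLOAD_EFFICIENCY = 0.64


def transfer_seconds(nbytes, rate):
    """Time to inject nbytes at a sustained rate (bytes/s), ignoring latency."""
    return nbytes / rate


def effective_global_gb_per_s():
    """Peak global bandwidth per node after applying payload efficiency."""
    return GLOBAL_PEAK_GB_PER_S * PAYLOAD_EFFICIENCY


if __name__ == "__main__":
    payload = 1e9  # hypothetical 1 GB payload, chosen only for illustration
    for name, rate in RATES.items():
        print(f"{name:>24}: {transfer_seconds(payload, rate):7.3f} s per GB")
    print(f"effective global bandwidth: {effective_global_gb_per_s():.2f} GB/s")
```

The two orders of magnitude between fine-grained and bulk transfers are why the estimate further down assumes bulk GET/PUT.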

Refs --
https://chapel-lang.org/perf/16-node-xc/?configs=gnuugniqthreads&graphs=smallarraygetperformance,largearraygetperformance,smallarrayputperformance,largearrayputperformance has those numbers

https://www.cray.com/sites/default/files/resources/CrayXCNetwork.pdf has more information on that than you'll ever want


The NPB-FT class D problem is a 32 GB array. Our default configuration happens to be 8 locales, so 4 GB per locale. A single FFT requires both GETting and PUTting this data twice, so a total of 4 x 4 GB, or 16 GB, per locale per FFT. If we don't overlap communication with computation, then at the ~8 GB/s bulk rate the best we can do (ignoring the actual FFT time) is 2s per FFT.

The YZ FFT (which is completely local) takes ~0.2s per iteration, so the compute time for the full XYZ transform might be estimated at 0.3s per FFT (scaling the two local dimensions up to three).

So the estimate is ~2.3s per FFT (2s of communication plus 0.3s of compute), or about 60s total. So maybe #4 is already at this limit, or very close.
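The back-of-the-envelope estimate above can be written out as a small script. This is a sketch under stated assumptions: the function and parameter names are made up for illustration, the 3/2 compute scaling is my reading of how 0.2s becomes 0.3s, and the count of 26 FFTs (one initial transform plus 25 iterations) is an assumption chosen because it is consistent with the ~60s total.

```python
def fft_time_estimate(array_gb=32.0, locales=8, transfers=4,
                      bulk_gb_per_s=8.0, yz_fft_s=0.2):
    """Return (comm_s, compute_s, total_s) per 3D FFT with no comm/compute overlap."""
    per_locale_gb = array_gb / locales        # 4 GB per locale
    comm_gb = transfers * per_locale_gb       # 2 GETs + 2 PUTs -> 16 GB
    comm_s = comm_gb / bulk_gb_per_s          # 2 s at 8 GB/s
    # Assumption: scale the measured 2D (YZ) time by 3/2 to cover X as well.
    compute_s = yz_fft_s * 3 / 2
    return comm_s, compute_s, comm_s + compute_s


if __name__ == "__main__":
    comm_s, compute_s, total_s = fft_time_estimate()
    n_ffts = 26  # assumed: 1 initial FFT + 25 iterations
    print(f"communication: {comm_s:.1f} s per FFT")
    print(f"compute:       {compute_s:.1f} s per FFT")
    print(f"total:         {total_s:.1f} s per FFT, "
          f"{n_ffts * total_s:.0f} s for {n_ffts} FFTs")
```

Changing `locales` or `bulk_gb_per_s` shows how the floor moves: communication dominates, so the bound scales almost directly with per-locale data volume over injection rate.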
