Using this issue to keep track of some notes on timings discussed on Gitter:
(from @ronawho)
As a rule of thumb -- ordered fine-grained comm can achieve an injection rate of ~80 MB/s per node, unordered fine-grained comm can achieve ~400 MB/s per node, and ordered bulk comm can achieve ~8 GB/s per node.
Aries is capable of ~8 GB/s unidirectional and, I think, ~15 GB/s bidirectional.
"""For applications in which traffic is uniformly distributed from each node to each of the other nodes
(e.g., all-to-all), global bandwidth controls performance rather than the bisection — and all the optical
links contribute. Peak global bandwidth is 11.7 GB/s per node for a full network. With the payload
efficiency of 64 percent this equates to 7.5 GB/s per direction. """
Refs --
https://chapel-lang.org/perf/16-node-xc/?configs=gnuugniqthreads&graphs=smallarraygetperformance,largearraygetperformance,smallarrayputperformance,largearrayputperformance has those numbers
https://www.cray.com/sites/default/files/resources/CrayXCNetwork.pdf has more information on that than you'll ever want
The NPB-FT class D problem is a 32 GB array. Our default configuration happens to be 8 locales, so 4 GB per locale. A single FFT requires both GET-ing and PUT-ing this data twice, for a total of 4 × 4 GB, or 16 GB of communication per locale per FFT. At ~8 GB/s of bulk bandwidth per node, if we don't overlap communication, the best we can do (ignoring the actual FFT time) is 2 s per FFT.
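For concreteness, here is a minimal back-of-envelope sketch of that communication lower bound; the 32 GB / 8-locale split and the ~8 GB/s bulk rate come from the numbers above, and everything else is plain arithmetic:

```python
# Back-of-envelope: communication lower bound per FFT for NPB-FT class D on 8 locales.
total_array_gb = 32.0        # class D problem size (from above)
num_locales = 8              # default configuration (from above)
bulk_bw_gb_per_s = 8.0       # ordered bulk comm rate per node (rule of thumb above)

per_locale_gb = total_array_gb / num_locales       # 4 GB per locale
# Each FFT both GETs and PUTs this data twice: 4 transfers of the local block.
comm_gb_per_fft = 4 * per_locale_gb                # 16 GB per locale per FFT

comm_time_s = comm_gb_per_fft / bulk_bw_gb_per_s   # 2.0 s per FFT with no comm overlap
print(f"communication lower bound: {comm_time_s:.1f} s per FFT")
```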
The YZ FFT (which is completely local) takes ~0.2 s per iteration, so the total compute time for the full XYZ FFT might be estimated at ~0.3 s per FFT.
So the estimated time is ~2.3 s per FFT, or about 60 s total. So maybe #4 is already at this limit, or very close.
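And a sketch of the corresponding total-runtime estimate; note that the 25-iteration count is my assumption based on the standard NPB class D setting, not something stated above:

```python
# Rough total-runtime estimate: (comm lower bound + local FFT time) per FFT, times iterations.
comm_s_per_fft = 2.0       # from the bandwidth estimate above
compute_s_per_fft = 0.3    # local YZ FFT ~0.2 s; full XYZ estimated at ~0.3 s (from above)
iterations = 25            # ASSUMPTION: the NPB class D default iteration count

total_s = iterations * (comm_s_per_fft + compute_s_per_fft)
print(f"estimated total: {total_s:.0f} s")  # prints ~58 s, in line with the ~60 s figure above
```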