
refactor sync_chunk_connections to remove barrier during fields::step #1635

Merged: 1 commit into master from sync_chunk_connections on Jun 30, 2021

Conversation

stevengj
Collaborator

Fixes #1632.
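For context, here is a minimal sketch of the kind of per-timestep check this PR removes and the flag-based alternative. This is not the actual Meep source; `fields_sketch`, `chunk_connections_valid`, and the helper methods are illustrative placeholders, not Meep's real API.

```cpp
// Sketch only -- not the actual Meep code.  Names such as
// chunk_connections_valid and connect_chunks() are placeholders used to
// illustrate the idea behind this PR.
#include <mpi.h>

struct fields_sketch {
  bool chunk_connections_valid = false; // purely local flag

  // BEFORE: every timestep asks "did *any* process invalidate its chunk
  // connections?" via an Allreduce, which also acts as a global barrier.
  void step_with_barrier() {
    int local_changed = chunk_connections_valid ? 0 : 1;
    int any_changed = 0;
    MPI_Allreduce(&local_changed, &any_changed, 1, MPI_INT, MPI_LOR,
                  MPI_COMM_WORLD);
    if (any_changed) connect_chunks(); // collective
    step_fields();
  }

  // AFTER: the operations that can invalidate the connections (changing the
  // structure, boundary conditions, etc.) are already collective, so every
  // process clears the flag at the same logical time.  step() can then
  // consult the local flag and skip the per-timestep Allreduce entirely.
  void step_without_barrier() {
    if (!chunk_connections_valid) connect_chunks(); // still collective
    step_fields();
  }

  void invalidate_chunk_connections() { chunk_connections_valid = false; }
  void connect_chunks() { chunk_connections_valid = true; /* ... */ }
  void step_fields() { /* ... time-step the fields ... */ }
};

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  fields_sketch f;
  f.invalidate_chunk_connections();
  for (int t = 0; t < 10; ++t) f.step_without_barrier();
  MPI_Finalize();
}
```

The key point is that the Allreduce in the first variant runs on every timestep even when nothing has changed, and that is the implicit barrier this PR eliminates.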

@stevengj requested a review from oskooi on June 24, 2021 at 21:48
@oskooi
Collaborator

oskooi commented Jun 25, 2021

I ran some benchmarks comparing this branch against master for a test problem based on a 3d OLED simulation, on a single machine with Intel Xeon 2.0 GHz processors using 10, 20, and 40 MPI processes. The benchmark time-stepped the fields for a fixed number of timesteps after the pulsed source had turned off, to ensure that the field arrays contain non-zero values. The timers were reset after the first timestep to exclude array initialization, etc. from the timing statistics. The results shown below are for the 10-process case; the results for 20 and 40 processes are similar.

There are two things to note from these results:

  1. first figure: there is almost no change in the time-stepping rate or in the total runtime of the simulation.
  2. second figure: the timing statistics for this branch (labelled "PR #1635") show that the time for all-to-all communication is zero for all processes, but the time for point-to-point communication has increased. This is probably just an artifact of how the timing statistics are attributed, but the key point is that the total MPI time does not change relative to master.

As a result, there do not appear to be any noticeable performance gains from this branch, which is unexpected.

[Figure: seconds per timestep vs. elapsed time, 10 processes (np10_sec_per_timestep_vs_time)]

[Figure: bar plot of MPI timing statistics per process, 10 processes (barplot_np10)]
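A measurement of this kind might look roughly like the following sketch. This is not the actual benchmark script: `step()` stands in for a Meep timestep, and the OLED geometry setup is omitted.

```cpp
// Rough sketch of a seconds-per-timestep measurement (not the actual
// benchmark).  step() is a placeholder for one Meep timestep.
#include <cstdio>
#include <mpi.h>

void step() { /* ... one Meep timestep ... */ }

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  const int nsteps = 1000;

  step(); // the first step pays for array initialization, connecting chunks, etc.

  // "Reset the timers" after the first step by starting the clock here.
  MPI_Barrier(MPI_COMM_WORLD); // align processes before timing starts
  double t0 = MPI_Wtime();
  for (int i = 0; i < nsteps; ++i) step();
  double elapsed = MPI_Wtime() - t0;

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0)
    std::printf("%.6f s/timestep\n", elapsed / nsteps);

  MPI_Finalize();
}
```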

@stevengj
Collaborator Author

stevengj commented Jun 25, 2021

It could be that to see a significant timing impact from a barrier you need to be on a large distributed-memory cluster?

It could be that the only reason the all-to-all communication (the barrier) takes 12 seconds is imperfect load balancing, i.e. waiting for the slowest process to catch up. In that case, it makes total sense that the barrier time simply shifts to the point-to-point communications, because ultimately all of the processes are synchronized by those anyway.
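A toy example of this accounting argument (not Meep code; just two ranks with an artificial imbalance, where the fast rank's wait time shows up either in the barrier or in the point-to-point exchange, but the total is the same):

```cpp
// Toy illustration: with imbalanced work, removing the barrier moves the
// fast rank's waiting time into the point-to-point exchange.  Run with
// 2 MPI processes; compile with -DWITH_BARRIER to enable the barrier.
#include <cstdio>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  // Imbalanced "work": rank 1 takes longer than rank 0.
  usleep(rank == 0 ? 100000 : 300000); // 0.1 s vs 0.3 s

#ifdef WITH_BARRIER
  // With a barrier, rank 0 spends ~0.2 s waiting here.
  double t0 = MPI_Wtime();
  MPI_Barrier(MPI_COMM_WORLD);
  double waited_in_barrier = MPI_Wtime() - t0;
#else
  double waited_in_barrier = 0.0;
#endif

  // Point-to-point boundary exchange: without the barrier, rank 0 now
  // spends the same ~0.2 s waiting inside MPI_Sendrecv instead.
  double t1 = MPI_Wtime();
  char sendbuf = 0, recvbuf = 0;
  MPI_Sendrecv(&sendbuf, 1, MPI_CHAR, 1 - rank, 0,
               &recvbuf, 1, MPI_CHAR, 1 - rank, 0,
               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  double waited_p2p = MPI_Wtime() - t1;

  std::printf("rank %d: barrier wait %.2f s, p2p wait %.2f s\n",
              rank, waited_in_barrier, waited_p2p);
  MPI_Finalize();
}
```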

(I thought you guys did some profiling which was why you were complaining about this barrier?)

@ahoenselaar
Contributor

We do see an improvement in our profiles. With the changes in this PR, cycles spent on connecting chunks effectively disappear.

@stevengj
Collaborator Author

> We do see an improvement in our profiles. With the changes in this PR, cycles spent on connecting chunks effectively disappear.

Are you sure that those cycles aren't simply shifting to someplace else, as in Ardavan's test?

@ahoenselaar
Contributor

Given the noise in these benchmarks and the small fraction of cycles involved (<1% before this PR), I cannot determine that with a reasonable investment of effort.

@stevengj
Collaborator Author

I'll go ahead and merge this, since it can't hurt to eliminate an Allreduce on every timestep. I'm not convinced that we're actually seeing a net performance benefit (as opposed to an accounting change), however.

@stevengj merged commit ff1b75f into master on Jun 30, 2021
@stevengj deleted the sync_chunk_connections branch on June 30, 2021 at 14:06
Linked issue (closed by this merge): remove barrier from fields::connect_chunks (#1632)