
Improve offsets-generation and uniformity of offset count for the N-1 case. #219

Closed

Conversation

pkoutoupis (Contributor)

Patch to improve the efficiency of the offsets-generation and uniformity of offset count for the N-1 case.

This patch addresses the following:

  1. IOR generates an array for the sequence of offsets. The GetOffsetArraySequential() and GetOffsetArrayRandom() calls are included in the timed test. For random offsets, this can add significant overhead when very small transfers are made to very large files (i.e., when the offset count is very large) and can distort the reported performance. Both GetOffsetArray functions are rewritten to improve efficiency (a sketch of the sequential layout follows this list).

  2. For the single-shared-file (N-1) case, the random offset array previously gave each process a random number of offsets, so the load across ranks was not uniform. With the change to GetOffsetArrayRandom(), the single-shared-file case now has a uniform number of offsets per rank.
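For orientation, here is a minimal sketch of the strided layout the sequential generator produces, reconstructed from the arithmetic visible in the diff below; the function name and signature are illustrative, not the actual IOR source:

    /* Illustrative reconstruction, not the actual IOR code: in the strided
     * (segmented) layout, the i-th offset of a given rank is
     *     i * numTasks * blockSize + rank * blockSize
     */
    #include <stdint.h>
    #include <stdlib.h>

    typedef int64_t IOR_offset_t;

    static IOR_offset_t *sequential_offsets(int numTasks, int rank,
                                            IOR_offset_t blockSize,
                                            int segmentCount)
    {
            IOR_offset_t *offsets = malloc(segmentCount * sizeof(*offsets));
            if (offsets == NULL)
                    return NULL;
            for (int i = 0; i < segmentCount; i++)
                    offsets[i] = (IOR_offset_t)i * numTasks * blockSize
                                 + (IOR_offset_t)rank * blockSize;
            return offsets;
    }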

@glennklockwood (Contributor) left a comment

Thanks for cleaning this up, Petros. The changes to GetOffsetArraySequential look like a big improvement in hygiene; thanks for contributing that. One minor typo in a variable name, but otherwise it all looks good to me.

I'm having a harder time understanding the changes that divide GetOffsetArrayRandom into two parts. I would feel better if someone else with more late-night brainpower could take a closer look to really understand the second phase. @JulianKunkel or @johnbent?

If nobody is available I'll see if someone at NERSC can look at it.

offsetArray[k] +=
        (i * test->numTasks * test->blockSize)
        + (pretendRank * test->blockSize);
IOR_offset_t next_offset, incrament, trips;
Contributor

incrament should be increment, no?

pkoutoupis (Contributor, Author)

Yup. Not sure what I was thinking. Must have been a bad substitution. Thanks for catching it.

                offsetArray[i] = next_offset;
                next_offset += incrament;
        }
} else {
Collaborator

Maybe add a '/* single shared file */' comment after '} else {'.

                offsetArray[i] = next_offset;
                next_offset += incrament;
        }
} else { /* usual case is segmentCount=1 */
Collaborator

Do we need an explicit test for segmentCount==1? What happens here if it isn't?

/* reorder array */
for (i = 0; i < offsets; i++) {
        value = rand() % offsets;
        if (offsets < 2*first) first = offsets;
Collaborator

What is special about the number 2 here?

        }
}
/* start with same array of offsets as in sequential case */
offsetArray = GetOffsetArraySequential(test, pretendRank);
Collaborator

Here we are in the random branch? But we start with sequential? Does this make it less random than it used to be?

value = start[j];
tmp = offsetArray[value+k];
offsetArray[value+k] = offsetArray[i+seq];
offsetArray[i+seq] = tmp;
Collaborator

I wonder if the three previous lines would look a bit cleaner with a helper function:
swapArrayElements(offsetArray, value+k, i+seq);
I think this would look better and be slightly easier to read.
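A minimal sketch of such a helper (the name follows the suggestion above; the signature and the use of size_t indices are assumptions, and IOR_offset_t is taken to come from ior.h):

    #include <stddef.h>  /* size_t */

    /* Hypothetical helper as suggested above; not present in the IOR source. */
    static void swapArrayElements(IOR_offset_t *array, size_t a, size_t b)
    {
            IOR_offset_t tmp = array[a];
            array[a] = array[b];
            array[b] = tmp;
    }

The three assignments above would then collapse to the single call shown in the suggestion.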

@JulianKunkel (Collaborator) left a comment

This is quite a bunch of changes.
I believe we should have a unit test for the function to allow proper testing of these changes.
That would help to verify in the long run that the code remains stable.
I would be grateful if you could add such a test.

@pkoutoupis (Contributor, Author)

@JulianKunkel Is there a document or format that I can reference for a unit test?

@glennklockwood (Contributor)

We've struggled to decide on a proper testing framework; at present it is limited to what's in testing/*-tests.sh, which essentially runs a full (but small) IOR test.

I am not enough of a computer scientist to know a reasonable way to test randomness in the offset generation subroutine. Hopefully someone else can help. Worst case, -vvv can be passed to dump exact offsets during a -z run, and we could add a postprocessor in the testing/ directory that asserts that the random offsets are random enough... maybe?

@JulianKunkel (Collaborator)

I had started the testing under src/test when the API was created. I have now simplified adding a test and added one for the "sequential" case. It should hopefully be easier to add more tests now.
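As a rough illustration only, a sequential-offsets check can compare each generated offset against the closed-form strided layout. This reuses the hypothetical sequential_offsets() sketch from earlier in this thread; the actual test under src/test may look quite different:

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical unit test, not the actual src/test code. */
    static void test_sequential_offsets(void)
    {
            const int numTasks = 4, rank = 2, segmentCount = 1000;
            const IOR_offset_t blockSize = 4096;
            IOR_offset_t *off = sequential_offsets(numTasks, rank,
                                                   blockSize, segmentCount);
            assert(off != NULL);
            for (int i = 0; i < segmentCount; i++)
                    assert(off[i] == (IOR_offset_t)i * numTasks * blockSize
                                     + (IOR_offset_t)rank * blockSize);
            free(off);
    }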

@JulianKunkel (Collaborator)

JulianKunkel commented Nov 5, 2020

To collect some more evidence, I have done some benchmarking on my laptop regarding the current strategy in order to reconsider it. At the moment all offsets are precomputed (even for sequential access), which increases the memory footprint and the runtime; when stonewalling is used with very high numbers, it may cause performance issues.
The test ran with 1,000,000,000 segments, which takes about 7.5 GByte of memory (8 bytes per offset).
Sequential takes 3.5 s to precompute.
Random (-z) takes 183 s, which seems significant at first glance.
However, that means 5,470,676 offsets are computed per second. Even for 4096-byte accesses, the file for this test case would be 3.73 TByte. That looks a bit extreme even for the next 10 years.

IMHO it would be best to integrate the calculation of the offsets into the processing instead of pre-computing them. That would avoid the memory consumption and the need to precompute at all.
For random access, though, this implies that a full shuffle of all segments can no longer be done. Similar to this patch, I consider it likely OK to shuffle a pattern with a user-defined number of segments (e.g., 1 million) and then repeat this pattern over and over; that would still result in random access over a range of 4 GByte, which seems OK to me. I'm inclined to test such a patchset.
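A minimal sketch of that repeated-pattern idea, under the stated assumptions (all names are hypothetical, and this is not the implementation in the follow-up PR): a pattern of patternLen segment slots is shuffled once, then tiled across the file, so the i-th offset can be computed on the fly without storing one entry per segment.

    #include <stdint.h>
    #include <stdlib.h>

    /* Build a shuffled pattern of patternLen slots once (Fisher-Yates). */
    static void build_pattern(size_t *pattern, size_t patternLen, unsigned seed)
    {
            srand(seed);
            for (size_t i = 0; i < patternLen; i++)
                    pattern[i] = i;
            for (size_t i = patternLen - 1; i > 0; i--) {
                    size_t j = (size_t)rand() % (i + 1);
                    size_t tmp = pattern[i];
                    pattern[i] = pattern[j];
                    pattern[j] = tmp;
            }
    }

    /* Compute the i-th "random" offset on the fly by tiling the pattern:
     * segments [0, patternLen) are shuffled among themselves, then
     * [patternLen, 2*patternLen), and so on, so memory stays bounded by
     * patternLen entries instead of one entry per segment. */
    static int64_t offset_at(const size_t *pattern, size_t patternLen,
                             size_t i, int64_t blockSize)
    {
            size_t tile = i / patternLen;           /* which repetition */
            size_t slot = pattern[i % patternLen];  /* shuffled slot within it */
            return (int64_t)(tile * patternLen + slot) * blockSize;
    }

With, e.g., patternLen = 1,000,000 and 4096-byte blocks, the shuffled window spans about 4 GByte, matching the range mentioned above.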

@JulianKunkel (Collaborator)

I have now implemented the new behavior in the PR listed above and done quite a bit of testing.
To make a note about performance:

  • there is now no storage needed for sequential access and no precompute.
  • the example I gave above with 1,000,000,000 segments could be rewritten to use, for example, a block size covering 1/100th of the listed segments; the time to pre-calculate offsets (needed only once, as before) would then be ~1.8 s, and the area that is truly random would be 1/100th of the overall file size, e.g. 37 GByte out of the 3.73 TByte. I personally consider that to be random enough; otherwise one could use a bigger block size, which then needs more time and memory to pre-calculate and store the offsets.

@JulianKunkel (Collaborator)

I'm closing this in favor of: Fix offset integration #270
