
Improve offsets-generation and uniformity of offset count for the N-1 case. #219

Closed

Conversation

pkoutoupis (Contributor)

Patch to improve the efficiency of the offsets-generation and uniformity of offset count for the N-1 case.

This patch addresses the following:

  1. IOR generates an array for the sequence of offsets. The GetOffsetArraySequential() and GetOffsetArrayRandom() calls are included in the timed test. For random offsets, this can add significant overhead when very small transfers are made to very large files (i.e., when the offset count is very large) and can distort the reported performance. Both GetOffsetArray functions are rewritten to improve efficiency (a sketch of the sequential layout follows this list).

  2. For the single-shared-file (N-1) case, the random offset array previously gave each process a random number of offsets, so the load across ranks was not uniform. With the change to GetOffsetArrayRandom(), the single-shared-file case now has a uniform number of offsets per rank.
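For orientation, here is a minimal sketch of the strided layout the sequential generator produces, reconstructed from the arithmetic visible in the diff below; the function name and signature are illustrative, not the actual IOR source:

    /* Illustrative reconstruction, not the actual IOR code: in the strided
     * (segmented) layout, the i-th offset of a given rank is
     *     i * numTasks * blockSize + rank * blockSize
     */
    #include <stdint.h>
    #include <stdlib.h>

    typedef int64_t IOR_offset_t;

    static IOR_offset_t *sequential_offsets(int numTasks, int rank,
                                            IOR_offset_t blockSize,
                                            int segmentCount)
    {
            IOR_offset_t *offsets = malloc(segmentCount * sizeof(*offsets));
            if (offsets == NULL)
                    return NULL;
            for (int i = 0; i < segmentCount; i++)
                    offsets[i] = (IOR_offset_t)i * numTasks * blockSize
                                 + (IOR_offset_t)rank * blockSize;
            return offsets;
    }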

@glennklockwood (Contributor) left a comment

Thanks for cleaning this up, Petros. The changes to GetOffsetArraySequential look like a big improvement in hygiene; thanks for contributing that. One minor typo in a variable name, but otherwise it all looks good to me.

I'm having a harder time understanding the changes that divide GetOffsetArrayRandom into two parts. I would feel better if someone else with more late-night brainpower could take a closer look to really understand the second phase. @JulianKunkel or @johnbent?

If nobody is available I'll see if someone at NERSC can look at it.

offsetArray[k] +=
        (i * test->numTasks * test->blockSize)
        + (pretendRank * test->blockSize);
IOR_offset_t next_offset, incrament, trips;
Contributor

incrament should be increment, no?

pkoutoupis (Contributor, Author)

Yup. Not sure what I was thinking. Must have been a bad substitution. Thanks for catching it.

                offsetArray[i] = next_offset;
                next_offset += incrament;
        }
} else {
Collaborator

Maybe add a '/* single shared file */' comment after '} else {'.

                offsetArray[i] = next_offset;
                next_offset += incrament;
        }
} else { /* usual case is segmentCount=1 */
Collaborator

Do we need an explicit test for segmentCount==1? What happens here if it isn't?

/* reorder array */
for (i = 0; i < offsets; i++) {
        value = rand() % offsets;
        if (offsets < 2*first) first = offsets;
Collaborator

What is special about the number 2 here?

        }
}
/* start with same array of offsets as in sequential case */
offsetArray = GetOffsetArraySequential(test, pretendRank);
Collaborator

Here we are in the random branch? But we start with sequential? Does this make it less random than it used to be?

value = start[j];
tmp = offsetArray[value+k];
offsetArray[value+k] = offsetArray[i+seq];
offsetArray[i+seq] = tmp;
Collaborator

I wonder if the three previous lines would look a bit cleaner with a helper function:
swapArrayElements(offsetArray, value+k, i+seq);
I think this would look better and be slightly easier to read.
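A minimal sketch of such a helper (the name follows the suggestion above; the signature and the use of size_t indices are assumptions, and IOR_offset_t is taken to come from ior.h):

    #include <stddef.h>  /* size_t */

    /* Hypothetical helper as suggested above; not present in the IOR source. */
    static void swapArrayElements(IOR_offset_t *array, size_t a, size_t b)
    {
            IOR_offset_t tmp = array[a];
            array[a] = array[b];
            array[b] = tmp;
    }

The three assignments above would then collapse to the single call shown in the suggestion.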

@JulianKunkel (Collaborator) left a comment

This is quite a bunch of changes.
I believe we should have a unit test for the function to allow proper testing of these changes.
That would help to verify in the long run that the code remains stable.
I would be grateful if you could add such a test.

@pkoutoupis (Contributor, Author)

@JulianKunkel Is there a document or format that I can reference for a unit test?

@glennklockwood (Contributor)

We've struggled to decide on a proper testing framework; at present it is limited to what's in testing/*-tests.sh, which essentially runs a full (but small) IOR test.

I am not enough of a computer scientist to know a reasonable way to test randomness in the offset generation subroutine. Hopefully someone else can help. Worst case, -vvv can be passed to dump exact offsets during a -z run, and we could add a postprocessor in the testing/ directory that asserts that the random offsets are random enough... maybe?

@JulianKunkel (Collaborator)

I had started the testing under src/test when the API was created. I have now simplified adding a test and added one for the "sequential" case. It should hopefully be easier to add more tests now.
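As a rough illustration only, a sequential-offsets check can compare each generated offset against the closed-form strided layout. This reuses the hypothetical sequential_offsets() sketch from earlier in this thread; the actual test under src/test may look quite different:

    #include <assert.h>
    #include <stdlib.h>

    /* Hypothetical unit test, not the actual src/test code. */
    static void test_sequential_offsets(void)
    {
            const int numTasks = 4, rank = 2, segmentCount = 1000;
            const IOR_offset_t blockSize = 4096;
            IOR_offset_t *off = sequential_offsets(numTasks, rank,
                                                   blockSize, segmentCount);
            assert(off != NULL);
            for (int i = 0; i < segmentCount; i++)
                    assert(off[i] == (IOR_offset_t)i * numTasks * blockSize
                                     + (IOR_offset_t)rank * blockSize);
            free(off);
    }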

@JulianKunkel (Collaborator)

JulianKunkel commented Nov 5, 2020

To collect some more evidence, I have done some benchmarking on my laptop regarding the current strategy in order to reconsider it. At the moment all offsets are precomputed (even for sequential access), which increases the memory footprint and the runtime; when stonewalling is used with very high numbers, it may cause performance issues.
The test ran with 1,000,000,000 segments, which takes about 7.5 GByte of memory (8 bytes per offset).
Sequential takes 3.5 s to precompute.
Random (-z) takes 183 s, which seems significant at first glance.
However, that means 5,470,676 offsets are computed per second. Even for 4096-byte accesses, the file for this test case would be 3.73 TByte. That looks a bit extreme even for the next 10 years.

IMHO it would be best to integrate the calculation of the offsets into the processing instead of pre-computing them. That would avoid the memory consumption and the need to precompute at all.
For random access, though, this implies that a full shuffle of all segments can no longer be done. Similar to this patch, I consider it likely OK to shuffle a pattern with a user-defined number of segments (e.g., 1 million) and then repeat this pattern over and over; that would still result in random access over a range of 4 GByte, which seems OK to me. I'm inclined to test such a patchset.
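A minimal sketch of that repeated-pattern idea, under the stated assumptions (all names are hypothetical, and this is not the implementation in the follow-up PR): a pattern of patternLen segment slots is shuffled once, then tiled across the file, so the i-th offset can be computed on the fly without storing one entry per segment.

    #include <stdint.h>
    #include <stdlib.h>

    /* Build a shuffled pattern of patternLen slots once (Fisher-Yates). */
    static void build_pattern(size_t *pattern, size_t patternLen, unsigned seed)
    {
            srand(seed);
            for (size_t i = 0; i < patternLen; i++)
                    pattern[i] = i;
            for (size_t i = patternLen - 1; i > 0; i--) {
                    size_t j = (size_t)rand() % (i + 1);
                    size_t tmp = pattern[i];
                    pattern[i] = pattern[j];
                    pattern[j] = tmp;
            }
    }

    /* Compute the i-th "random" offset on the fly by tiling the pattern:
     * segments [0, patternLen) are shuffled among themselves, then
     * [patternLen, 2*patternLen), and so on, so memory stays bounded by
     * patternLen entries instead of one entry per segment. */
    static int64_t offset_at(const size_t *pattern, size_t patternLen,
                             size_t i, int64_t blockSize)
    {
            size_t tile = i / patternLen;           /* which repetition */
            size_t slot = pattern[i % patternLen];  /* shuffled slot within it */
            return (int64_t)(tile * patternLen + slot) * blockSize;
    }

With, e.g., patternLen = 1,000,000 and 4096-byte blocks, the shuffled window spans about 4 GByte, matching the range mentioned above.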

@JulianKunkel (Collaborator)

I have now implemented the new behavior in the PR listed above and done quite a bit of testing.
To make a note about performance:

  • there is now no storage needed for sequential access and no precompute.
  • the example I gave above with 1,000,000,000 segments could be rewritten to use, for example, a block size covering 1/100th of the listed segments; the time to pre-calculate offsets (needed only once, as before) would then be ~1.8 s, and the area that is truly random would be 1/100th of the overall file size, e.g. 37 GByte out of the 3.73 TByte. I personally consider that to be random enough; otherwise one could use a bigger block size, which then needs more time and memory to pre-calculate and store the offsets.

@JulianKunkel (Collaborator)

I'm closing this in favor of: Fix offset integration #270
