enable shuffle of slice data source at the end of epoch #1160
This PR fixes the bug that the simple data source does not shuffle at the end of an epoch.
The following explains the change:
The behavior that must be kept is shown as follows.
For example, in shuffle mode, a data iterator uses a slice data source backed by a simple data source.
Here, `d_<epoch>_<slice>` represents the dataset retrieved by a specified slice iterator in a given epoch.

REQUIREMENT:
Requirement 1: d_0_1 & d_0_2 == empty, d_0_1 & d_0_3 == empty, d_0_2 & d_0_3 == empty (slices within the same epoch do not overlap).
Requirement 2: d_0_1 != d_1_1 != d_2_1, d_0_2 != d_1_2 != d_2_2, d_0_3 != d_1_3 != d_2_3 (the same slice position receives different data in different epochs).
The dropped requirement is shown as follows.

NO REQUIREMENT:
Requirement 3: d_0_1 & d_1_2 == empty, d_0_1 & d_2_2 == empty, and similarly for the other cross-epoch pairs (slices of different epochs are no longer guaranteed to be disjoint).
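Below is a minimal, self-contained Python sketch of the `d_<epoch>_<slice>` notation and the requirements above; the toy dataset size and the helper names `order_for_epoch` and `slice_of` are assumptions for illustration, not the actual slice data source implementation.

```python
import numpy as np

DATASET_SIZE = 12      # assumed toy dataset; indices 0..11
NUM_SLICES = 3         # Slice 1, Slice 2, Slice 3

def order_for_epoch(epoch, seed=313):
    # One shuffled order per epoch; re-shuffled when the epoch changes.
    rng = np.random.RandomState(seed + epoch)
    return rng.permutation(DATASET_SIZE)

def slice_of(epoch, slice_pos):
    # d_<epoch>_<slice_pos>: the part of that epoch's order given to one slice.
    order = order_for_epoch(epoch)
    size = DATASET_SIZE // NUM_SLICES
    start = (slice_pos - 1) * size
    return set(order[start:start + size])

# Requirement 1: slices of the same epoch are disjoint (all prints are empty sets).
print(slice_of(0, 1) & slice_of(0, 2))
print(slice_of(0, 1) & slice_of(0, 3))
print(slice_of(0, 2) & slice_of(0, 3))

# Requirement 2: the same slice position receives different data across epochs.
print(slice_of(0, 1) != slice_of(1, 1), slice_of(1, 1) != slice_of(2, 1))

# Dropped requirement 3: slices of different epochs are not guaranteed to be
# disjoint, so this intersection may be non-empty.
print(slice_of(0, 1) & slice_of(1, 2))
```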
The change resolves the problem that slices which advance through epochs at different times must still share the same random order for any given epoch.
It means the following:
When Slice 1, Slice 2 and Slice 3 are exactly synchronized, there is no problem. But if they are not exactly synchronized, for example Slice 1 is running on epoch 1 while Slice 2 is still running on epoch 0, the random order of epoch 0 MUST be kept so that Slice 2 can still access it. When Slice 1 advances from epoch 0 to epoch 1, a new random order is generated, but the old one must not be overwritten; it has to be kept. At that moment, two random orders coexist, as sketched below.
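As a minimal sketch of this behavior, assuming a hypothetical `EpochOrderCache` helper rather than the actual implementation, the shuffled order can be cached per epoch, so a slice that is still on an older epoch keeps reading the older order:

```python
import numpy as np

class EpochOrderCache:
    # Keeps one shuffled order per epoch instead of overwriting a single order,
    # so several orders can coexist while slices are out of sync.
    def __init__(self, size, seed=313):
        self._size = size
        self._seed = seed
        self._orders = {}          # epoch -> permutation

    def order(self, epoch):
        if epoch not in self._orders:
            rng = np.random.RandomState(self._seed + epoch)
            self._orders[epoch] = rng.permutation(self._size)
        return self._orders[epoch]

cache = EpochOrderCache(size=8)
order_1 = cache.order(1)   # Slice 1 has advanced to epoch 1: a new order is generated
order_0 = cache.order(0)   # Slice 2 is still on epoch 0: its order is still available
print(order_0, order_1)
```

In a real implementation the cache would also drop an epoch's order once no slice needs it any more.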
In fact, in multi-node distributed computation this situation never happens, because the nodes are kept exactly synchronized by MPI primitives and data has to be exchanged under strict synchronization. Our code change is still necessary, though, because a slice iterator cannot assume how developers will use it; the slice iterator is not intended only for distributed computation.