enable shuffle of slice data source at the end of epoch #1160

TomonobuTsujikawa · 2023-01-24T00:22:58Z

This PR fixes a bug in which the simple data source does not reshuffle at the end of an epoch.

The change is explained below.
The behavior that must be preserved is as follows:

For example, in shuffle mode (e.g. a data iterator that uses a slice data source, which in turn reads from a simple data source):

                  Slice 1    Slice 2    Slice 3
epoch_number 0    d_0_1      d_0_2      d_0_3
epoch_number 1    d_1_1      d_1_2      d_1_3
epoch_number 2    d_2_1      d_2_2      d_2_3

Here, d_e_s denotes the data retrieved by the iterator of slice s during epoch e.

REQUIREMENT:
Requirement 1: d_0_1 & d_0_2 == empty, d_0_1 & d_0_3 == empty, d_0_2 & d_0_3 == empty
Requirement 2: d_0_1 != d_1_1 != d_2_1, d_0_2 != d_1_2 != d_2_2, d_0_3 != d_1_3 != d_2_3
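
The two requirements above can be checked with a small standalone sketch. It does not use the library's data source classes. `epoch_order` and `slice_indices` are hypothetical helpers, and the strided split is only one possible way to cut an epoch's order into slices. Slices are 0-indexed here, while the table above counts from 1.

```python
import random

def epoch_order(num_samples, epoch, seed=0):
    """Return one shuffled index order for an epoch (hypothetical helper)."""
    # Assumption: a separate, reproducible random stream per epoch.
    rng = random.Random(seed * 1_000_003 + epoch)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order

def slice_indices(order, num_slices, slice_pos):
    """Take every num_slices-th index, so slices of one epoch cannot overlap."""
    return order[slice_pos::num_slices]

num_samples, num_slices, num_epochs = 12, 3, 3
d = {(e, s): slice_indices(epoch_order(num_samples, e), num_slices, s)
     for e in range(num_epochs) for s in range(num_slices)}

# Requirement 1: within one epoch, the slices are pairwise disjoint.
req1_holds = all(
    not (set(d[(e, a)]) & set(d[(e, b)]))
    for e in range(num_epochs)
    for a in range(num_slices)
    for b in range(a + 1, num_slices)
)

# Requirement 2: the same slice sees different data in different epochs.
# With a random shuffle this holds with high probability, not with certainty.
req2_holds = all(
    d[(e1, s)] != d[(e2, s)]
    for s in range(num_slices)
    for e1 in range(num_epochs)
    for e2 in range(e1 + 1, num_epochs)
)
print(req1_holds, req2_holds)
```

Requirement 1 follows from the split itself: all slices of an epoch are cut from the same order. Requirement 2 depends on the order being regenerated for each epoch, which is the behavior this PR restores.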

The following behavior is dropped:

NO LONGER REQUIRED:
Requirement 3: d_0_1 & d_1_2 == empty, d_0_1 & d_2_2 == empty, and similar constraints between different slices in different epochs.

The change ensures that all slices share one random order per epoch, even when the slices are on different epochs at the same moment.
That is:

                  Slice 1    Slice 2    Slice 3
epoch_number 0    << Random Order Generation 0 >>
epoch_number 1    << Random Order Generation 1 >>
epoch_number 2    << Random Order Generation 2 >>

When Slice 1, Slice 2, and Slice 3 are exactly synchronized, there is no problem. But if they are not, for example when Slice 1 is on epoch 1 while Slice 2 is still on epoch 0, random order 0 MUST be kept so that Slice 2 can still access it. When Slice 1 advances from epoch 0 to epoch 1, a new random order is generated, but the old random order must not be overwritten. At that point, two random orders coexist.
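
The coexistence of two random orders can be illustrated with a minimal sketch. It is not the code in this PR. `SharedOrders` and all of its methods are hypothetical names, slices are 0-indexed, and the rule for dropping old orders (once every slice has moved past that epoch) is an assumption.

```python
import random

class SharedOrders:
    """Hypothetical holder of one random order per epoch, shared by all slices."""

    def __init__(self, num_samples, num_slices, seed=0):
        self._num_samples = num_samples
        self._num_slices = num_slices
        self._rng = random.Random(seed)
        self._orders = {}                     # epoch -> shuffled index list
        self._slice_epoch = [0] * num_slices  # epoch each slice is reading

    def order_for(self, epoch):
        # Generate an epoch's order on first access; later slices reuse it.
        if epoch not in self._orders:
            order = list(range(self._num_samples))
            self._rng.shuffle(order)
            self._orders[epoch] = order
        return self._orders[epoch]

    def indices(self, slice_pos):
        """Indices this slice reads in the epoch it is currently on."""
        order = self.order_for(self._slice_epoch[slice_pos])
        return order[slice_pos::self._num_slices]

    def advance(self, slice_pos):
        """Move one slice to its next epoch without touching older orders
        that slower slices may still need."""
        self._slice_epoch[slice_pos] += 1
        oldest = min(self._slice_epoch)
        for epoch in [e for e in self._orders if e < oldest]:
            del self._orders[epoch]

    def cached_epochs(self):
        return sorted(self._orders)

orders = SharedOrders(num_samples=12, num_slices=3)
slice0_epoch0 = orders.indices(0)       # slice 0 reads epoch 0
orders.advance(0)                       # slice 0 moves on; slices 1 and 2 stay on epoch 0
slice0_epoch1 = orders.indices(0)       # generates order 1, order 0 is kept
cached_while_unsynced = orders.cached_epochs()
slice1_epoch0 = orders.indices(1)       # slice 1 still reads from order 0
orders.advance(1)
orders.advance(2)                       # every slice has now left epoch 0
cached_after_sync = orders.cached_epochs()
```

While slice 0 is ahead, both order 0 and order 1 are cached, so slice 1 still draws from the same order that slice 0 used in epoch 0 and Requirement 1 continues to hold.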

In practice, this situation never arises in multi-node distributed computation, because the nodes are kept exactly synchronized by MPI primitives and data is exchanged under strict synchronization. The change is still necessary, because a slice iterator cannot assume how developers will use it: slice iterators are not used only for distributed computation.

enable shuffle of slice data source at the end of epoch

124b03b

TomonobuTsujikawa added the release-note-utility Auto-release; Utilities label Jan 24, 2023

TomonobuTsujikawa assigned YukioOobuchi Jan 24, 2023

YukioOobuchi merged commit 76268f2 into master Jan 24, 2023

YukioOobuchi deleted the feature/20221222-fix-data-source-simple-shuffle branch January 24, 2023 00:39

TomonobuTsujikawa mentioned this pull request Apr 6, 2023

fix simple data source shuffle problem #1187

Merged
