-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add oversampling strategy iterable datasets interleave (#5036)
* add IterableDatasets oversampling strategy in interleave_datasets * add oversampling strategy for iterable datasets * update docs regarding interleave_datasets with oversampling strategy for IterableDatasets * correct the is_exhausted index in order to avoid unconsistent behavior * refactoring of RandomlyCyclingMultiSourcesExamplesIterable and CyclingMultiSourcesExamplesIterable to avoid code redundancy * add hardcoded test for interleaving IterableDatasets with oversampling strategy * add test for interleaving IterableDatasets with oversampling strategy * modify tests for cycling multiple IterableDatasets * update the interleave_datasets doc * make style following conflict * Update src/datasets/iterable_dataset.py Remove resetting of empty iterators Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com> Co-authored-by: Quentin Lhoest <42851186+lhoestq@users.noreply.github.com>
- Loading branch information
Showing
5 changed files
with
151 additions
and
42 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
1529bdc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Show benchmarks
PyArrow==6.0.0
Show updated benchmarks!
Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json
Show updated benchmarks!
Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json