Releases: huggingface/datasets

1.18.3

02 Feb 14:21

Bug fixes

  • Fix MP3 resampling when a dataset's audio files have different sampling rates by @lhoestq in #3665
  • Extend dataset builder for streaming in get_dataset_split_names by @mariosasko in #3657

Dataset changes

Other improvements

New Contributors

Full Changelog: 1.18.2...1.18.3

1.18.2

28 Jan 16:55

Bug fixes

  • Fix streaming datasets that are not reset correctly by @lhoestq in #3646
  • Fix numpy rngs when shuffling with seed=None by @mariosasko in #3641
  • Fix dataset slicing with negative bounds when indices mapping is not None by @mariosasko in #3642
  • Fix add_column on datasets with indices mapping by @mariosasko in #3647

Other improvements

New Contributors

Full Changelog: 1.18.1...1.18.2

1.18.1

26 Jan 14:23

Improvements

Bug fixes

Full Changelog: 1.18.0...1.18.1

1.18.0

21 Jan 16:46

Datasets Changes

Datasets Features

Metrics Changes

Dataset cards

Documentation

General improvements and bug fixes

New Contributors

Full Changelog: 1.17.0...1.18.0

1.17.0

21 Dec 17:41

Dataset Changes

Dataset Features

Dataset cards

Dataset Tasks

Metric Changes

Docs

Additional improvements and bug fixes

New Contributors

Full Changelog: 1.16.1...1.17.0

1.16.1

26 Nov 16:58

Bug fixes

1.16.0

26 Nov 14:22

Datasets Changes

Datasets Features

  • Push to hub capabilities for Dataset and DatasetDict by @LysandreJik in #3098:
    • Upload your dataset to the Hugging Face Hub with the push_to_hub() method!
    • See documentation here
  • 200+ datasets now support streaming
  • Resolve data_files by split name automatically by @lhoestq in #3221
    • It uses the file names to determine which file goes into which split
    • See documentation here
  • Filter method for batched=True by @thomasw21 in #3244
  • Adding with_rank arg to pass process rank to map by @TevenLeScao in #3314
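
To give a feel for what "resolve data_files by split name" (#3221) means, here is a minimal standalone sketch of grouping files into splits based on their names. It is an illustration only and not the library's implementation: the helper `infer_splits`, the keyword table, and the fallback of unmatched files to "train" are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Hypothetical keyword table: which file-name tokens hint at which split.
SPLIT_KEYWORDS = {
    "train": ("train",),
    "validation": ("validation", "valid", "dev"),
    "test": ("test",),
}


def infer_splits(file_names):
    """Group file names by the split keyword found in each name.

    Illustrative helper, not part of the datasets API.
    """
    splits = defaultdict(list)
    for name in file_names:
        # Break the name into lowercase alphanumeric tokens, e.g.
        # "data/train.csv" -> ["data", "train", "csv"].
        tokens = re.split(r"[^a-z0-9]+", name.lower())
        for split, keywords in SPLIT_KEYWORDS.items():
            if any(keyword in tokens for keyword in keywords):
                splits[split].append(name)
                break
        else:
            # Assumption for this sketch: unmatched files go to "train".
            splits["train"].append(name)
    return dict(splits)


files = ["data/train.csv", "data/dev.csv", "test-00.jsonl", "misc.csv"]
resolved = infer_splits(files)
```

Matching on whole tokens rather than substrings avoids, for example, treating a file named "latest.csv" as a test file.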

Dataset Cards

Metrics Changes

  • New: OpenAI's pass@k code evaluation metric by @lvwerra in #2916
  • Update: BLEURT - options to use updated bleurt checkpoints by @jaehlee in #3235
  • Update: CER - update to support latest release by @mariosasko in #3252
  • Update: WER - update to the documentation by @wooters in #3278
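
For readers unfamiliar with pass@k, the sketch below computes the unbiased estimator described in OpenAI's Codex paper: given n generated samples for a problem, of which c pass the unit tests, pass@k = 1 - C(n - c, k) / C(n, k). The function name `pass_at_k` and its argument checks are choices made for this example; it does not reproduce the interface of the metric added in #2916.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: number of samples generated
    c: number of those samples that pass the tests
    k: sample budget being evaluated
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # With fewer than k failing samples, every draw of k samples
    # contains at least one passing sample.
    if n - c < k:
        return 1.0
    # Probability that a random draw of k samples has no passing sample,
    # subtracted from 1.
    return 1.0 - comb(n - c, k) / comb(n, k)


score = pass_at_k(n=10, c=3, k=1)
```

With k=1 the estimate reduces to the fraction of passing samples, c / n. A benchmark-level score would average this value over all problems.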

Documentation

Additional improvements and bug fixes

Citation

Deprecations

Full Changelog: 1.15.1...1.16.0

1.15.1

02 Nov 21:47

Dependencies

1.15.0

02 Nov 21:22

Dataset Changes

Dataset Features

Dataset Cards

  • Fill in dataset card for NCBI disease dataset by @edugp in #3115

Metrics Changes

General improvements and bug fixes

1.14.0

19 Oct 16:46

Dataset changes

Dataset features

General improvements and bug fixes