Releases · lhotse-speech/lhotse

19 Nov 14:54

pzelasko

v1.28.0

1880fc1

v1.28.0 - Lurking Lizard

New features

Implement conversion from CutSet to HuggingFace dataset by @domklement in #1398
Add workflow: annotate DNSMOS P.835 by @yfyeung in #1406

New recipes

Add recipe for the Santa Barbara Corpus of Spoken American English (SBCSAE) by @mmaciej2 in #1395
Adds radio data recipe by @m-wiesner in #1400
Fleurs by @m-wiesner in #1402
Add the Emilia corpus by @csukuangfj in #1404

What's Changed

[spgispeech] Fix durations object is null issue by @frankyoujian in #1390
Fix backend to None while ffmpeg is unavailable. by @pengzhendong in #1392
Fix ksponspeech recipe by @yfyeung in #1394
Fix cli for ksponspeech by @yfyeung in #1393
[fix] fisher_english recipe by @pengzhendong in #1410
downgrading sphinx version from 7.2.6 to 7.1.2 by @annapovey in #1409
Update lhotse.py by @pengzhendong in #1414
Make torchaudio an optional dependency by @pzelasko in #1382
minor fix by @pengzhendong in #1418
Support for AIStore ObjectFile resilient reading when AIStore SDK version >=1.9.1 is present

New Contributors

@frankyoujian made their first contribution in #1390
@pengzhendong made their first contribution in #1392
@mmaciej2 made their first contribution in #1395
@domklement made their first contribution in #1398
@annapovey made their first contribution in #1409

Full Changelog: v1.27.0...v1.28.0

Contributors

csukuangfj, pengzhendong, and 7 other contributors

22 Aug 15:25

pzelasko

v1.27.0

170046f

v1.27.0 - Crispy Momo

New recipes

[Recipe] Wenetspeech4tts by @yuekaizhang in #1384
[Recipe] Spatial LibriSpeech by @JinZr in #1386

Other enhancements

Cap the 'trng' random seeds to 2**31 avoiding numpy error by @pzelasko in #1379
CutSet.prefetch() for background cuts loading during iteration by @pzelasko in #1380
Include a copyright NOTICE listing major copyright holders by @pzelasko in #1381
Added has_custom to MixedCut by @anteju in #1383
Fix to fixed batch size bucketing and audio loading network connectio… by @pzelasko in #1387
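The idea behind background loading during iteration, as in the CutSet.prefetch() entry above, can be illustrated with a generic read-ahead iterator. This is a minimal standard-library sketch, not Lhotse's implementation: the function name `prefetch`, the `buffer_size` parameter, and the thread-plus-queue design are assumptions made for illustration only.

```python
import queue
import threading
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the source iterable


def prefetch(items: Iterable[T], buffer_size: int = 4) -> Iterator[T]:
    """Yield items from `items` while a background thread reads ahead.

    Hypothetical helper for illustration; at most `buffer_size` items
    are held in the buffer at any time.
    """
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)

    def producer() -> None:
        try:
            for item in items:
                buf.put(item)  # blocks while the buffer is full
        finally:
            buf.put(_DONE)  # always signal completion to the consumer

    # Daemon thread: it will not keep the process alive if the consumer stops early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buf.get()
        if item is _DONE:
            return
        yield item


print(list(prefetch(range(5))))  # [0, 1, 2, 3, 4]
```

The consumer only waits when the buffer is empty, so slow item loading overlaps with whatever work happens between iterations. A production version would also need to forward exceptions raised in the producer thread, which this sketch does not do.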

New Contributors

@anteju made their first contribution in #1383

Full Changelog: v1.26.0...v1.27.0

Contributors

pzelasko, yuekaizhang, and 2 other contributors

26 Jul 15:58

pzelasko

v1.26.0

21b102c

v1.26.0 - Uranium Fever

What's Changed

Add EARS recipe by @Ryu1845 in #1375
Concurrent dynamic bucketing by @pzelasko in #1373
Refactor bucket selection for customization by @pzelasko in #1377

New Contributors

@Ryu1845 made their first contribution in #1375

Full Changelog: v1.25.0...v1.26.0

Contributors

pzelasko and Ryu1845

18 Jul 23:45

pzelasko

v1.25.0

18436e9

v1.25.0 - Himalayan Cat

What's Changed

[feature] Add .narrowband() effect (mulaw, lpc10 codecs) by @rouseabout in #1348
[feature/optimization] Support for pre-determined batch sizes in DynamicBucketingSampler by @pzelasko in #1372
[bug] Fix MixedCut transforms serialization by @pzelasko in #1370
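To illustrate what pre-determined batch sizes per bucket can look like, here is a small sketch that maps an example's duration to a bucket and then looks up that bucket's fixed batch size. The names `select_bucket`, `upper_bounds`, and `batch_sizes` are hypothetical and the values are made up; this is not the DynamicBucketingSampler API.

```python
from bisect import bisect_left
from typing import Sequence


def select_bucket(duration: float, upper_bounds: Sequence[float]) -> int:
    """Return the index of the first bucket whose upper bound is >= duration.

    `upper_bounds` must be sorted in ascending order.
    """
    idx = bisect_left(upper_bounds, duration)
    if idx == len(upper_bounds):
        raise ValueError(f"duration {duration} exceeds the largest bucket bound")
    return idx


# Hypothetical configuration: three duration buckets (in seconds), each with
# a batch size chosen ahead of time (e.g., by profiling GPU memory).
upper_bounds = [5.0, 10.0, 20.0]
batch_sizes = [64, 32, 16]

for d in (3.2, 10.0, 12.5):
    b = select_bucket(d, upper_bounds)
    print(f"{d}s -> bucket {b}, batch size {batch_sizes[b]}")
```

Fixing the batch size per bucket in advance means each bucket can use the largest batch that fits in memory for its longest example, instead of one global duration budget.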

Full Changelog: v1.24.2...v1.25.0

Contributors

pzelasko and rouseabout

25 Jun 15:59

pzelasko

v1.24.2

e76dc3c

v1.24.2

New recipes

Add KsponSpeech recipe by @whsqkaak in #1353

New features

Several new APIs for manifest classes added in #1361:

cut.iter_data() which iterates over (key, manifest) pairs of all data items attached to a given cut (e.g., ("recording", Recording(...)), ("custom_features", TemporalArray(...)))
is_in_memory property for all manifest types to indicate if it contains data that is held in memory
is_placeholder for non-cut manifests to indicate if a manifest is just a placeholder (has some metadata, but can't be used to load data)
cut.drop_in_memory_data() which converts manifests with in-memory data to placeholders (this is useful for manifests that live longer than just dataloading to avoid blowing up CPU memory and/or slowing down the program)
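The relationship between in-memory data and placeholders described above can be modeled with a toy class. This is only a conceptual sketch: `ToyRecording` and its fields are hypothetical stand-ins rather than Lhotse classes, and only the member names is_in_memory, is_placeholder, and drop_in_memory_data are borrowed from the release notes.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyRecording:
    """Toy stand-in for a manifest; NOT a Lhotse class."""

    id: str
    duration: float
    data: Optional[bytes] = None  # payload held in memory, if any
    path: Optional[str] = None  # location to load from, if any

    @property
    def is_in_memory(self) -> bool:
        return self.data is not None

    @property
    def is_placeholder(self) -> bool:
        # Metadata only: there is nothing to load data from.
        return self.data is None and self.path is None

    def drop_in_memory_data(self) -> "ToyRecording":
        # Return a copy without the payload; the metadata is preserved.
        return replace(self, data=None) if self.is_in_memory else self


rec = ToyRecording(id="rec-1", duration=2.5, data=b"\x00" * 16)
light = rec.drop_in_memory_data()
print(rec.is_in_memory, light.is_in_memory, light.is_placeholder)  # True False True
print(light.duration)  # 2.5
```

The point of the pattern is that an object which outlives dataloading keeps its metadata (ID, duration) but releases the bytes it was holding.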

Bug fixes

Restoring smart open for local files if available by @pzelasko in #1360
Fix Recording.to_dict() when transforms are dicts and transform pickling issues by @pzelasko in #1355
Utils for discovering attached data and dropping in-memory data by @pzelasko in #1361
Numpy 2.0 compatibility by @pzelasko in #1362

New Contributors

@whsqkaak made their first contribution in #1353

Full Changelog: v1.24.1...v1.24.2

Contributors

pzelasko and whsqkaak

10 Jun 20:35

pzelasko

v1.24.1

866e4a8

v1.24.1

What's Changed

Support for reading data from AIStore using Python SDK by @pzelasko in #1354

Full Changelog: v1.24...v1.24.1

Contributors

pzelasko

05 Jun 19:59

pzelasko

v1.24

4d57d53

v1.24 - The World's Highest Wingsuit Jump

What's Changed

New features

Notably, there is a new optimization for the dynamic bucketing sampler in multi-GPU training: it chooses the same (or the closest possible) bucket on each DDP rank so that training step times stay similar across ranks. The expected speedup depends on the model and the number of GPUs; we observed 8% and 13% speedups in two experiments compared to non-synchronized bucket selection. The new option is called sync_buckets and is enabled by default.

Dynamic bucket selection RNG sync by @pzelasko in #1341
Add new sampler: weighted sampler by @marcoyang1998 in #1344
reverb_rir: support Cut input and in memory data by @pzelasko in #1332
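The bucket synchronization described in this release can be pictured as every rank drawing bucket indices from a random number generator seeded with the same value. The sketch below simulates that with plain Python; `pick_buckets`, `closest_available`, and all numeric values are hypothetical, and the actual mechanism used by Lhotse's sampler may differ.

```python
import random
from typing import List


def pick_buckets(seed: int, num_buckets: int, num_steps: int) -> List[int]:
    """Draw one bucket index per training step from a seeded RNG."""
    rng = random.Random(seed)
    return [rng.randrange(num_buckets) for _ in range(num_steps)]


def closest_available(target: int, available: List[int]) -> int:
    """Fall back to the nearest bucket index that still has data on this rank."""
    return min(available, key=lambda b: abs(b - target))


shared_seed = 1234  # assumed to be identical on all ranks
world_size = 4

# Each simulated rank builds its own RNG from the shared seed.
per_rank = [
    pick_buckets(shared_seed, num_buckets=10, num_steps=5)
    for _ in range(world_size)
]

# All ranks select the same bucket at every step.
print(all(choices == per_rank[0] for choices in per_rank))  # True

# If bucket 3 is exhausted on some rank, that rank uses the nearest one.
print(closest_available(3, [0, 5, 9]))  # 5
```

Because buckets group examples of similar length, ranks that pick the same bucket process batches of similar size, so no rank waits long for the slowest one at each synchronization point.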

Recipes

Add the ReazonSpeech recipe by @Triplecq in #1330

Other improvements

Missing 'subset' parameter by @daniel-dona in #1336
Fix describe on cuts by @keeofkoo in #1340
Use libsndfile in recording chunk dataset by @pzelasko in #1335
Fix librispeech manifest caching by @haerski in #1343
Fix one-off edge case in split_lazy by @pzelasko in #1347
Increase the start diff tolerance for feature loading by @pzelasko in #1349
More test coverage for lhotse subset by @pzelasko in #1345

New Contributors

@keeofkoo made their first contribution in #1340
@haerski made their first contribution in #1343
@Triplecq made their first contribution in #1330

Full Changelog: v1.23...v1.24

Contributors

Triplecq, haerski, and 4 other contributors

30 Apr 18:43

pzelasko

v1.23

b2dce78

v1.23 - Snowdrop

What's Changed

Recipes

MDCC recipe by @JinZr in #1302
Updated text_norm for aishell recipe by @JinZr in #1305
Allow skipping missing files in AMI download by @pzelasko in #1318
Add Chinese TTS dataset baker. by @csukuangfj in #1304
In CommonVoice corpus, use .tsv headers to parse and not column index by @daniel-dona in #1328

Fixes to a regression in noise mixing augmentations

Enhance CutSet.mix() randomness and data utilization by @pzelasko in #1315
Fix randomness in CutMix transform by @pzelasko in #1316
select a random sub-region of the noise based on the delta duration by @osadj in #1317
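Selecting a random sub-region of a noise recording can be sketched as follows. The assumption here is that the "delta duration" is the difference between the noise duration and the duration that needs to be covered; the function name and signature are hypothetical and do not reflect Lhotse's code.

```python
import random
from typing import Optional, Tuple


def random_noise_region(
    noise_duration: float,
    target_duration: float,
    rng: Optional[random.Random] = None,
) -> Tuple[float, float]:
    """Return (start, duration) of a noise sub-region covering the target.

    If the noise is not longer than the target, the whole noise is used.
    """
    if rng is None:
        rng = random.Random()
    delta = noise_duration - target_duration
    if delta <= 0:
        return 0.0, noise_duration
    # Any start offset in [0, delta] keeps the region inside the noise.
    start = rng.uniform(0.0, delta)
    return start, target_duration


rng = random.Random(0)
start, duration = random_noise_region(noise_duration=30.0, target_duration=4.0, rng=rng)
print(0.0 <= start <= 26.0, duration)  # True 4.0
```

Drawing a fresh offset each time means a long noise recording contributes different segments across epochs, rather than always its first few seconds.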

Other improvements

Add dataset for audio tagging by @marcoyang1998 in #1241
Fix _get_strided_batch device by @lifeiteng in #1303
Fix typo in README.md by @yfyeung in #1308
Fix export of features/array to shar by @pzelasko in #1323
Fix trim_to_supervision_groups by @pzelasko in #1322

New Contributors

@daniel-dona made their first contribution in #1328

Full Changelog: v1.22...v1.23

Contributors

lifeiteng, csukuangfj, and 6 other contributors

07 Mar 19:38

pzelasko

v1.22

d26d476

v1.22 - Sherpa's Paradise

What's Changed

New features

Extending Lhotse dataloading to text/multimodal data by @pzelasko in #1295

As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. That means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data with some lightweight wrappers. Please refer to new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints
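As a rough illustration of a sampling constraint for text, the sketch below groups sentences into batches under a token budget, much like audio batches are capped by total duration. It is a conceptual stand-in and not the Lhotse API: `batch_by_token_budget`, `max_tokens`, and whitespace tokenization are assumptions chosen for simplicity.

```python
from typing import Iterable, Iterator, List


def batch_by_token_budget(texts: Iterable[str], max_tokens: int) -> Iterator[List[str]]:
    """Greedily group texts so each batch holds at most `max_tokens` tokens.

    Tokens are counted by whitespace splitting. A single text longer than
    the budget still forms its own batch.
    """
    batch: List[str] = []
    used = 0
    for text in texts:
        n = len(text.split())
        if batch and used + n > max_tokens:
            yield batch
            batch, used = [], 0
        batch.append(text)
        used += n
    if batch:
        yield batch


texts = ["a b c", "d e", "f g h i", "j"]
print(list(batch_by_token_budget(texts, max_tokens=5)))
# [['a b c', 'd e'], ['f g h i', 'j']]
```

Replacing "seconds of audio" with "number of tokens" as the measured quantity is what lets the same batching logic apply to text.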

Multi-channel support improvements
- Fix loading multi-channel custom recording fields in multi cuts by @pzelasko in #1298
- Channel selection for multi-channel custom recording fields by @pzelasko in #1299

Lhotse MultiCuts:

are now exportable into Lhotse Shar format
gained a new method cut = cut.with_channels([0, 1, ...]) to modify the channels they refer to
can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).

Recipes

Add new recipe: speechio by @yuekaizhang in #1297
tedlium2 recipe by @JinZr in #1296

Other improvements

Use audio backends and export custom fields in Lhotse Shar by @pzelasko in #1290
Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in #1291
Cutconcat fixed max duration by @swigls in #1292
Fix feature_dim of Spectrogram extractors. by @csukuangfj in #1294
fix whisper for multi-channel data by @yuekaizhang in #1289
Xfail flaky SileroVAD tests by @pzelasko in #1300

New Contributors

@swigls made their first contribution in #1292

Full Changelog: v1.21...v1.22

Contributors

csukuangfj, swigls, and 3 other contributors

13 Feb 19:57

pzelasko

v1.21

769c273

v1.21 - Glaciology

What's Changed

This release patches lhotse to handle cases where libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and libsndfile is now the preferred default backend since it showed faster audio decoding in our testing. Going forward, when LHOTSE_AUDIO_BACKEND is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). This release also adds support for Python 3.12 and PyTorch 2.2.

Add VAD to Supervisions in LibriLight Recipe by @yfyeung in #1280
Fixes for manifest validation and fixing by @pzelasko in #1284
Handle error with cachedir creation gracefully by @pzelasko in #1287
AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using libsndfile as preferred audio backend by @pzelasko in #1288

Full Changelog: v1.20...v1.21

Contributors

pzelasko and yfyeung
