Releases: lhotse-speech/lhotse
v1.28.0 - Lurking Lizard
New features
- Implement conversion from CutSet to HuggingFace dataset by @domklement in #1398
- Add workflow: annotate DNSMOS P.835 by @yfyeung in #1406
New recipes
- Add recipe for the Santa Barbara Corpus of Spoken American English (SBCSAE) by @mmaciej2 in #1395
- Adds radio data recipe by @m-wiesner in #1400
- Fleurs by @m-wiesner in #1402
- Add the Emilia corpus by @csukuangfj in #1404
What's Changed
- [spgispeech] Fix durations object is null issue by @frankyoujian in #1390
- Fix backend to None while ffmpeg is unavailable. by @pengzhendong in #1392
- Fix ksponspeech recipe by @yfyeung in #1394
- Fix cli for ksponspeech by @yfyeung in #1393
- [fix] fisher_english recipe by @pengzhendong in #1410
- downgrading sphinx version from 7.2.6 to 7.1.2 by @annapovey in #1409
- Update lhotse.py by @pengzhendong in #1414
- Make torchaudio an optional dependency by @pzelasko in #1382
- minor fix by @pengzhendong in #1418
- Support for AIStore ObjectFile resilient reading when AIStore SDK version >=1.9.1 is present
New Contributors
- @frankyoujian made their first contribution in #1390
- @pengzhendong made their first contribution in #1392
- @mmaciej2 made their first contribution in #1395
- @domklement made their first contribution in #1398
- @annapovey made their first contribution in #1409
Full Changelog: v1.27.0...v1.28.0
v1.27.0 - Crispy Momo
New recipes
- [Recipe] Wenetspeech4tts by @yuekaizhang in #1384
- [Recipe] Spatial LibriSpeech by @JinZr in #1386
Other enhancements
- Cap the 'trng' random seeds to 2**31 avoiding numpy error by @pzelasko in #1379
- CutSet.prefetch() for background cuts loading during iteration by @pzelasko in #1380
- Include a copyright NOTICE listing major copyright holders by @pzelasko in #1381
- Added has_custom to MixedCut by @anteju in #1383
- Fix to fixed batch size bucketing and audio loading network connectio… by @pzelasko in #1387
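The seed cap from #1379 can be sketched as follows. This is a minimal illustration, not Lhotse's code: `draw_capped_seed` and `MAX_SEED` are hypothetical names, and drawing from the OS entropy source is an assumption about what a 'trng' seed means. NumPy's legacy `np.random.seed` only accepts values between 0 and 2**32 - 1, so a seed kept below 2**31 is always in range.

```python
import secrets

# Hypothetical cap mirroring the release note; not Lhotse's actual constant.
MAX_SEED = 2**31


def draw_capped_seed() -> int:
    """Draw a non-deterministic seed in [0, 2**31) from the OS entropy source."""
    return secrets.randbelow(MAX_SEED)


seed = draw_capped_seed()
print(seed)
```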
New Contributors
Full Changelog: v1.26.0...v1.27.0
v1.26.0 - Uranium Fever
v1.25.0 - Himalayan Cat
What's Changed
- [feature] Add .narrowband() effect (mulaw, lpc10 codecs) by @rouseabout in #1348
- [feature/optimization] Support for pre-determined batch sizes in DynamicBucketingSampler by @pzelasko in #1372
- [bug] Fix MixedCut transforms serialization by @pzelasko in #1370
Full Changelog: v1.24.2...v1.25.0
v1.24.2
New recipes
New features
Several new APIs for manifest classes added in #1361:
- cut.iter_data() which iterates over (key, manifest) pairs of all data items attached to a given cut (e.g., ("recording", Recording(...)), ("custom_features", TemporalArray(...)))
- is_in_memory property for all manifest types to indicate if it contains data that is held in memory
- is_placeholder for non-cut manifests to indicate if a manifest is just a placeholder (has some metadata, but can't be used to load data)
- cut.drop_in_memory_data() which converts manifests with in-memory data to placeholders (this is useful for manifests that live longer than just dataloading to avoid blowing up CPU memory and/or slowing down the program)
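The relationship between in-memory data and placeholders can be pictured with a toy model. The sketch below does not use Lhotse: `ToyManifest` and its fields are hypothetical, and only the names `is_in_memory`, `is_placeholder`, and `drop_in_memory_data` are borrowed from the list above to show the intended semantics.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToyManifest:
    """Hypothetical stand-in for a manifest; not a Lhotse class."""

    id: str
    duration: float
    data: Optional[bytes] = None  # in-memory payload, if any
    path: Optional[str] = None  # on-disk location, if any

    @property
    def is_in_memory(self) -> bool:
        return self.data is not None

    @property
    def is_placeholder(self) -> bool:
        # Metadata only: nothing to load from memory or from disk.
        return self.data is None and self.path is None

    def drop_in_memory_data(self) -> "ToyManifest":
        # Keep the metadata, release the payload.
        return replace(self, data=None)


loaded = ToyManifest(id="rec-1", duration=2.5, data=b"\x00" * 16)
dropped = loaded.drop_in_memory_data()
print(loaded.is_in_memory, dropped.is_in_memory, dropped.is_placeholder)
```

The dropped copy still carries its id and duration, so code that only needs metadata keeps working while the payload can be garbage-collected.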
Bug fixes
- Restoring smart open for local files if available by @pzelasko in #1360
- Fix Recording.to_dict() when transforms are dicts and transform pickling issues by @pzelasko in #1355
- Utils for discovering attached data and dropping in-memory data by @pzelasko in #1361
- Numpy 2.0 compatibility by @pzelasko in #1362
New Contributors
Full Changelog: v1.24.1...v1.24.2
v1.24.1
v1.24 - The World's Highest Wingsuit Jump
What's Changed
New features
Notably, there's a new optimization for the dynamic bucketing sampler in multi-GPU training: it will choose the same (or the closest possible) bucket on each DDP rank to keep the total training step times closer. The expected speedup depends on the model and the number of GPUs. We observed 8% and 13% speedups across two experiments compared to non-synchronized bucket selection. The new option is called sync_buckets and is enabled by default.
- Dynamic bucket selection RNG sync by @pzelasko in #1341
- Add new sampler: weighted sampler by @marcoyang1998 in #1344
- reverb_rir: support Cut input and in memory data by @pzelasko in #1332
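The idea behind synchronized bucket selection can be sketched without any distributed setup. This is a toy model under stated assumptions, not the sampler's implementation: `select_bucket` and `SHARED_SEED` are hypothetical, and each "rank" is simulated by a list of remaining bucket sizes. Every rank seeds its bucket-selection RNG identically, so all ranks draw the same index; a rank whose chosen bucket is exhausted falls back to the closest non-empty one.

```python
import random
from typing import List, Optional


def select_bucket(bucket_sizes: List[int], bucket_rng: random.Random) -> Optional[int]:
    """Pick a bucket index; fall back to the closest non-empty bucket."""
    # Always consume exactly one random number so all ranks stay in step.
    choice = bucket_rng.randrange(len(bucket_sizes))
    non_empty = [i for i, n in enumerate(bucket_sizes) if n > 0]
    if not non_empty:
        return None
    return min(non_empty, key=lambda i: abs(i - choice))


SHARED_SEED = 1234  # hypothetical: the same value on every rank
rank_buckets = [
    [3, 0, 5, 2],  # rank 0: bucket 1 is already exhausted
    [3, 4, 5, 2],  # rank 1
]
rngs = [random.Random(SHARED_SEED) for _ in rank_buckets]
for step in range(5):
    picks = [select_bucket(sizes, rng) for sizes, rng in zip(rank_buckets, rngs)]
    print(step, picks)
```

Because buckets group examples of similar duration, ranks that pick the same or a neighboring bucket process batches of similar length, which is what keeps per-step times close.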
Recipes
Other improvements
- Missing 'subset' parameter by @daniel-dona in #1336
- Fix describe on cuts by @keeofkoo in #1340
- Use libsndfile in recording chunk dataset by @pzelasko in #1335
- Fix librispeech manifest caching by @haerski in #1343
- Fix one-off edge case in split_lazy by @pzelasko in #1347
- Increase the start diff tolerance for feature loading by @pzelasko in #1349
- More test coverage for lhotse subset by @pzelasko in #1345
New Contributors
- @keeofkoo made their first contribution in #1340
- @haerski made their first contribution in #1343
- @Triplecq made their first contribution in #1330
Full Changelog: v1.23...v1.24
v1.23 - Snowdrop
What's Changed
Recipes
- MDCC recipe by @JinZr in #1302
- Updated text_norm for aishell recipe by @JinZr in #1305
- Allow skipping missing files in AMI download by @pzelasko in #1318
- Add Chinese TTS dataset baker. by @csukuangfj in #1304
- In CommonVoice corpus, use .tsv headers to parse and not column index by @daniel-dona in #1328
Fixes to a regression in noise mixing augmentations
- Enhance CutSet.mix() randomness and data utilization by @pzelasko in #1315
- Fix randomness in CutMix transform by @pzelasko in #1316
- select a random sub-region of the noise based on the delta duration by @osadj in #1317
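One plausible reading of the last item is sketched below; it is an interpretation, not the code from #1317. The helper `random_noise_region` is hypothetical. It treats the "delta duration" as the amount by which the noise exceeds the target and draws the start offset uniformly from that range, so that different parts of a long noise recording get used across calls.

```python
import random
from typing import Tuple


def random_noise_region(
    noise_duration: float, target_duration: float, rng: random.Random
) -> Tuple[float, float]:
    """Return (start, duration) of a noise sub-region no longer than the target."""
    delta = noise_duration - target_duration
    if delta <= 0:
        # Noise is not longer than the target: use all of it.
        return 0.0, noise_duration
    # Noise is longer: slide a target-sized window to a random offset.
    start = rng.uniform(0.0, delta)
    return start, target_duration


rng = random.Random(0)
print(random_noise_region(noise_duration=10.0, target_duration=4.0, rng=rng))
print(random_noise_region(noise_duration=3.0, target_duration=4.0, rng=rng))
```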
Other improvements
- Add dataset for audio tagging by @marcoyang1998 in #1241
- Fix _get_strided_batch device by @lifeiteng in #1303
- Fix typo in README.md by @yfyeung in #1308
- Fix export of features/array to shar by @pzelasko in #1323
- Fix trim_to_supervision_groups by @pzelasko in #1322
New Contributors
- @daniel-dona made their first contribution in #1328
Full Changelog: v1.22...v1.23
v1.22 - Sherpa's Paradise
What's Changed
New features
As an experimental feature, we are extending the API of Lhotse samplers to enable key sampling features for non-audio data such as text. That means text (and other) data can be dynamically multiplexed and bucketed in the same way as audio data with some lightweight wrappers. Please refer to new documentation here: https://lhotse.readthedocs.io/en/latest/datasets.html#customizing-sampling-constraints
- Multi-channel support improvements
Lhotse MultiCuts:
- are now exportable into Lhotse Shar format
- gained a new method cut = cut.with_channels([0, 1, ...]) to modify the channels they refer to
- can have multi-channel custom Recordings with channels selectable via a special custom key (e.g., if defining cut.target_recording, audio can be read via cut.load_target_recording() and channels will be auto-selected by looking up cut.target_recording_channel_selector).
Recipes
- Add new recipe: speechio by @yuekaizhang in #1297
- tedlium2 recipe by @JinZr in #1296
Other improvements
- Use audio backends and export custom fields in Lhotse Shar by @pzelasko in #1290
- Documentation for random seeds in lhotse + extended support of lazy r… by @pzelasko in #1291
- Cutconcat fixed max duration by @swigls in #1292
- Fix feature_dim of Spectrogram extractors. by @csukuangfj in #1294
- fix whisper for multi-channel data by @yuekaizhang in #1289
- Xfail flaky SileroVAD tests by @pzelasko in #1300
New Contributors
Full Changelog: v1.21...v1.22
v1.21 - Glaciology
What's Changed
This release patches lhotse to handle cases when libsox is not available for torchaudio. The audio backend code went through an additional round of refactoring, and libsndfile is now preferred as the default since it showed faster audio decoding performance in our testing. Going forward, when LHOTSE_AUDIO_BACKEND is set, we will use the same backend for audio loading, audio saving, and reading audio metadata (if possible). This release also adds support for Python 3.12 and PyTorch 2.2.
- Add VAD to Supervisions in LibriLight Recipe by @yfyeung in #1280
- Fixes for manifest validation and fixing by @pzelasko in #1284
- Handle error with cachedir creation gracefully by @pzelasko in #1287
- AudioBackend specific save_audio and info, managing missing SoX in torchaudio, Python 3.12 / PyTorch 2.2 support, using libsndfile as preferred audio backend by @pzelasko in #1288
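Pinning the backend through LHOTSE_AUDIO_BACKEND might look like the sketch below. The value "LibsndfileBackend" is an assumption, not something stated in these notes; check which backend names your installed version accepts before relying on it. The variable is set before any audio is loaded so that loading, saving, and metadata reads all see the same choice.

```python
import os

# Assumption: "LibsndfileBackend" is a backend name accepted by the
# installed Lhotse version; replace it with a name your version reports.
os.environ.setdefault("LHOTSE_AUDIO_BACKEND", "LibsndfileBackend")

# Import and use lhotse only after the variable is set, for example:
# import lhotse
print(os.environ["LHOTSE_AUDIO_BACKEND"])
```

Using `setdefault` leaves a value already exported in the shell untouched, so a per-run override still wins over the in-script default.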
Full Changelog: v1.20...v1.21