Issues: mosaicml/streaming
Issues list
UnicodeDecodeError: ... Efficient way to debug the dataset with streaming?
enhancement
New feature or request
#820
opened Nov 1, 2024 by
TAYmit
Choose JPEG compression level
enhancement
New feature or request
#811
opened Oct 24, 2024 by
cabreraalex
Running into "FileExistsError: [Errno 17] File exists: '/000000_epoch_shape'" even with single GPU
bug
Something isn't working
#802
opened Oct 13, 2024 by
deepanshu-a2z
Support for on-the-fly filtering
enhancement
New feature or request
#800
opened Oct 9, 2024 by
ColinToft
Make epoch_sample_ids cachable
enhancement
New feature or request
#792
opened Sep 28, 2024 by
janEbert
Dataset does not work after stopping training
bug
Something isn't working
#781
opened Sep 15, 2024 by
AugustDev
JointWriter: Allow shard file appending
bug
Something isn't working
#775
opened Sep 5, 2024 by
janEbert
File exists: '/000000_epoch_shape' when using the ddp strategy from pytorch lightning
bug
Something isn't working
#767
opened Aug 25, 2024 by
elbamos
Estimate total shards at the beginning of data conversion
enhancement
New feature or request
#742
opened Aug 3, 2024 by
abhijithneilabraham
huge temp files while uploading data using MDS writer
bug
Something isn't working
#734
opened Jul 24, 2024 by
MaxxP0
Replication changes sample order
bug
Something isn't working
#725
opened Jul 15, 2024 by
CodeCreator
'File exists: "/00000_locals"' when integrated with deepspeed training scripts
bug
Something isn't working
#717
opened Jul 8, 2024 by
Clement25
All processes allocate memory on rank 0 during StreamingDataset initialization in a distributed setting
bug
Something isn't working
#716
opened Jul 2, 2024 by
ohallstrom
Optional dependency for different storages?
enhancement
New feature or request
#709
opened Jun 24, 2024 by
huxuan
Suboptimal usage of 8xH100 GPUs - Streaming dataloader speed significantly fluctuates across batches
bug
Something isn't working
#686
opened May 25, 2024 by
VSehwag
Last entry in the dataset is causing "Relative sample index $x is not present" error
bug
Something isn't working
#677
opened May 20, 2024 by
isidentical
clean_stale_shared_memory duplicating the master process when called in a train.py script
bug
Something isn't working
#663
opened Apr 26, 2024 by
antoinedandi
Support large size index.json (20GB +)
enhancement
New feature or request
#662
opened Apr 25, 2024 by
andreamad8
Does it support Preference data (for training Reward / DPO)?
enhancement
New feature or request
#656
opened Apr 17, 2024 by
ericxsun
Azure Databricks MDS write ops in error: MapInPandas write_mds gives message Spark higher-order functions are not supported in Unity Catalog
bug
Something isn't working
#655
opened Apr 15, 2024 by
wolliq