[docs] Compress data files #5691
Thanks a lot, @stevhliu, for the update on audio/image file extensions and the recommendation to compress text files (this will also reduce download time).
However, there is some confusion about the size limits:
- According to the GitHub docs, there are 2 limits on file sizes:
  - One for plain Git files (not tracked by Git LFS): up to 100 MB (see: https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-large-files-on-github#file-size-limits)
  - One for Git LFS files: up to 5 GB for GitHub Enterprise Cloud (see: https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-git-large-file-storage#about-git-large-file-storage)
These limits are enforced when pushing to GitHub, though. For our Hugging Face Hub, I think the size limits are different. I guess the comment you added should refer to the 100 MB file size limit (for files not tracked by Git LFS) instead of the 5 GB one (for files tracked by Git LFS). This should be confirmed.
Confirmed with the Hub team: the file size limit for the Hugging Face Hub is 10 MB :)
Thanks for the PR and also for the confirmation of file size limits. Great!
This PR addresses the comments in #5687 about compressing text files before uploading them to the Hub. It also clarifies what "too large" means, based on the Git LFS docs.
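
For illustration, compressing a text data file before upload could look roughly like the sketch below. It assumes gzip as the compression format and uses a hypothetical helper name, `compress_text_file`; neither is taken from this PR, and it relies only on the Python standard library.

```python
import gzip
import shutil
from pathlib import Path


def compress_text_file(path):
    """Gzip-compress `path` next to the original and return the new path.

    Hypothetical helper for illustration only; gzip is an assumed format.
    """
    src = Path(path)
    # Append ".gz" to the full file name, e.g. data.jsonl -> data.jsonl.gz
    dst = src.with_name(src.name + ".gz")
    # Stream the bytes so large files are never fully loaded into memory
    with src.open("rb") as f_in, gzip.open(dst, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dst
```

The original file is left in place, so the compressed size can be compared with the uncompressed one before deciding which file to upload.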