fix: image array should support other formats than uint8 #5365
Hi, thanks for working on this! I agree that the current type-casting (always cast to [...]
PS: To avoid the CI failures, we need to handle two more instances of the cast to [...]
I've made some changes to the PR. Now the encoding procedure behaves as follows:
Looks all good :)
Can you also mention in the docs which precisions are supported and which ones are downcast?
It could go in https://huggingface.co/docs/datasets/about_dataset_features, for example (there is a paragraph on audio but none for image yet).
Just added some docs :) let me know if it sounds good to you @mariosasko and then we can merge IMO
Two nits regarding the docs
Co-authored-by: Mario Šaško <mariosasko777@gmail.com>
Currently, images that are provided as ndarrays but not in uint8 format are going to lose data. For example, in a depth image where the data is in float32 format, the type-casting to uint8 will basically make the whole image blank. PIL.Image.fromarray does support mode F, although maybe some further metadata could be supplied via the Image object.
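To make the problem above concrete, here is a minimal sketch of the data loss and of the mode F alternative. The 4x4 depth map and its [0, 1) value range are assumptions made up for illustration; this is not the code changed in the PR.

```python
import numpy as np
from PIL import Image

# Hypothetical 4x4 depth map with values in [0, 1), stored as float32.
depth = np.linspace(0.0, 0.99, 16, dtype=np.float32).reshape(4, 4)

# Casting to uint8 truncates every value below 1.0 to 0, so the image is blank.
as_uint8 = depth.astype(np.uint8)

# PIL can hold the float32 data directly: fromarray picks mode "F"
# (32-bit floating point pixels) for a 2-D float32 array.
float_image = Image.fromarray(depth)

# Converting back to an ndarray returns the original float32 values.
round_trip = np.asarray(float_image)
```

Here as_uint8 contains only zeros, while round_trip keeps the dtype and values of the input, which is why keeping the array in mode F avoids the loss.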