When mapping some datasets with batched=True, datasets may raise an exception:
Traceback (most recent call last):
  File "/Users/codingl2k1/Work/datasets/venv/lib/python3.11/site-packages/multiprocess/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/utils/py_utils.py", line 1328, in _write_generator_to_queue
    for i, result in enumerate(func(**kwargs)):
  File "/Users/codingl2k1/Work/datasets/src/datasets/arrow_dataset.py", line 3483, in _map_single
    writer.write_batch(batch)
  File "/Users/codingl2k1/Work/datasets/src/datasets/arrow_writer.py", line 549, in write_batch
    array = cast_array_to_feature(col_values, col_type) if col_type is not None else col_values
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/table.py", line 1831, in wrapper
    return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/table.py", line 1831, in <listcomp>
    return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/table.py", line 2063, in cast_array_to_feature
    return feature.cast_storage(array)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/features/features.py", line 1098, in cast_storage
    if min_max["max"] >= self.num_classes:
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: '>=' not supported between instances of 'NoneType' and 'int'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/codingl2k1/Work/datasets/t1.py", line 33, in <module>
    ds = ds.map(transforms, num_proc=14, batched=True, batch_size=5)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/dataset_dict.py", line 850, in map
    {
  File "/Users/codingl2k1/Work/datasets/src/datasets/dataset_dict.py", line 851, in <dictcomp>
    k: dataset.map(
       ^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/arrow_dataset.py", line 577, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/arrow_dataset.py", line 542, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/src/datasets/arrow_dataset.py", line 3179, in map
    for rank, done, content in iflatmap_unordered(
  File "/Users/codingl2k1/Work/datasets/src/datasets/utils/py_utils.py", line 1368, in iflatmap_unordered
    [async_result.get(timeout=0.05) for async_result in async_results]
  File "/Users/codingl2k1/Work/datasets/src/datasets/utils/py_utils.py", line 1368, in <listcomp>
    [async_result.get(timeout=0.05) for async_result in async_results]
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/codingl2k1/Work/datasets/venv/lib/python3.11/site-packages/multiprocess/pool.py", line 774, in get
    raise self._value
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
Steps to reproduce the bug:

from datasets import load_dataset


def transforms(examples):
    # examples["pixel_values"] = [image.convert("RGB").resize((100, 100)) for image in examples["image"]]
    return examples


ds = load_dataset("scene_parse_150")
ds = ds.map(transforms, num_proc=14, batched=True, batch_size=5)
print(ds)
Expected behavior: map completes without raising an exception.
Environment info:
Datasets: b8067c0
Python: 3.11.4
System: macOS
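The last frame of the worker traceback fails on `min_max["max"] >= self.num_classes`, which means the computed maximum was None. One plausible cause, stated here as an assumption rather than something confirmed in this report, is a batch whose label column contains only null values. The sketch below reproduces that comparison error in plain Python and shows one way a range check could skip nulls. The helper names `max_label` and `validate_labels` are hypothetical and are not part of the datasets library.

```python
from typing import Optional, Sequence


def max_label(labels: Sequence[Optional[int]]) -> Optional[int]:
    """Return the largest non-null label, or None if every label is null."""
    present = [label for label in labels if label is not None]
    return max(present) if present else None


def validate_labels(labels: Sequence[Optional[int]], num_classes: int) -> None:
    """Hypothetical range check that tolerates batches made only of nulls."""
    largest = max_label(labels)
    if largest is None:
        # Nothing to validate: the batch holds no concrete label.
        return
    if largest >= num_classes:
        raise ValueError(
            f"Label {largest} is out of range for {num_classes} classes"
        )


# Comparing None with an int raises the TypeError seen in the traceback.
try:
    max_label([None, None]) >= 3
except TypeError as err:
    print(err)

# With the guard, an all-null batch passes validation.
validate_labels([None, None], num_classes=3)
```

Whether the library should accept, reject, or warn about such batches is a design decision for the maintainers; the guard only illustrates how to avoid comparing None with an integer.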
Thanks for reporting! I've opened a PR with a fix.