[Data] nyc_taxi_basic_processing example is broken #36618

Closed
bveeramani opened this issue Jun 20, 2023 · 4 comments
Labels
data Ray Data-related issues docs An issue or change related to documentation P1 Issue that should be fixed within a few weeks

Comments

@bveeramani
Member

bveeramani commented Jun 20, 2023

Description

(MapBatches(<lambda>) pid=28384) Traceback (most recent call last):
(MapBatches(<lambda>) pid=28384)   File "python/ray/_raylet.pyx", line 1073, in ray._raylet.execute_dynamic_generator_and_store_task_outputs
(MapBatches(<lambda>) pid=28384)   File "python/ray/_raylet.pyx", line 3291, in ray._raylet.CoreWorker.store_task_outputs
(MapBatches(<lambda>) pid=28384)   File "/ray/python/ray/data/_internal/execution/operators/map_operator.py", line 389, in _map_task
(MapBatches(<lambda>) pid=28384)     for b_out in fn(iter(blocks), ctx):
(MapBatches(<lambda>) pid=28384)   File "/ray/python/ray/data/_internal/execution/legacy_compat.py", line 305, in do_map
(MapBatches(<lambda>) pid=28384)     yield from block_fn(blocks, ctx, *fn_args, **fn_kwargs)
(MapBatches(<lambda>) pid=28384)   File "/ray/python/ray/data/_internal/planner/map_batches.py", line 118, in fn
(MapBatches(<lambda>) pid=28384)     yield from process_next_batch(batch)
(MapBatches(<lambda>) pid=28384)   File "/ray/python/ray/data/_internal/planner/map_batches.py", line 79, in process_next_batch
(MapBatches(<lambda>) pid=28384)     batch = batch_fn(batch, *fn_args, **fn_kwargs)
(MapBatches(<lambda>) pid=28384)   File "/tmp/tmpu96o5u2m", line 155, in <lambda>
(MapBatches(<lambda>) pid=28384)     ds = ds.map_batches(lambda df: df[df["passenger_count"] > 0])
(MapBatches(<lambda>) pid=28384) TypeError: unhashable type: 'numpy.ndarray'

Link

https://docs.ray.io/en/latest/data/examples/nyc_taxi_basic_processing.html
https://buildkite.com/ray-project/oss-ci-build-pr/builds/22898#018844e6-35c8-460d-a072-829b57ce9785

See also

#35618

@bveeramani bveeramani added P1 Issue that should be fixed within a few weeks docs An issue or change related to documentation data Ray Data-related issues labels Jun 20, 2023
@LarkinDeity

I get this error when I'm running the NYC_TAXI_DATA example from the Ray Data examples. How can I fix it?

@anyscalesam
Contributor

@bveeramani ?

@bveeramani
Member Author

This example has been removed from the documentation.

@LarkinDeity To fix, I think you'd just want to add batch_format="pandas" to the map_batches call.
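
A minimal sketch of why that change likely helps, assuming the batch arrives as a dict of NumPy arrays when no batch_format is passed (an assumption, but one that matches the TypeError in the traceback). It uses only NumPy and pandas, not Ray; the helper name keep_positive_passengers and the toy data are made up for illustration.

```python
import numpy as np
import pandas as pd


def keep_positive_passengers(batch):
    """Same filter as the lambda in the traceback."""
    return batch[batch["passenger_count"] > 0]


# Assumption: without batch_format="pandas", the batch is a dict of NumPy arrays.
# Indexing a dict with a boolean array is a key lookup, and arrays are not hashable.
numpy_batch = {"passenger_count": np.array([0, 1, 2])}
try:
    keep_positive_passengers(numpy_batch)
    numpy_error = None
except TypeError as exc:
    numpy_error = str(exc)

# With a pandas DataFrame, the same expression is boolean-mask row filtering.
pandas_batch = pd.DataFrame({"passenger_count": [0, 1, 2]})
filtered = keep_positive_passengers(pandas_batch)

print(numpy_error)
print(filtered["passenger_count"].tolist())

# In the example itself, the corresponding change would be:
# ds = ds.map_batches(lambda df: df[df["passenger_count"] > 0], batch_format="pandas")
```

The commented-out last line is the call from the traceback with the suggested batch_format="pandas" argument added.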

@LarkinDeity

@bveeramani After I added the batch_format param, it works fine now. Thank you so much!
