[ci] doc:source/ray-air/doc_code/computer_vision failure #33420

Closed
matthewdeng opened this issue Mar 17, 2023 · 4 comments · Fixed by #33422

@matthewdeng
Contributor

This test has started failing with ModuleNotFoundError: No module named 'tensorflow_metadata'

This looks like it was introduced in #32857.

Some initial ideas:

  1. Should we make this a lazy import?
  2. Should tensorflow_metadata be installed with tensorflow?
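A minimal sketch of idea 1, assuming the fix takes the form of a deferred import that also gives a clearer error message. `lazy_import` is a hypothetical helper, not part of Ray's API; it relies only on the standard library's `importlib.import_module`, so the optional dependency is loaded only when the code path that needs it actually runs.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, pip_name: str) -> ModuleType:
    """Hypothetical helper: import ``module_name`` at call time, not at load time.

    Callers that never reach this code path never need the optional
    dependency. When it is missing, the raised ImportError names the pip
    package to install instead of surfacing a bare ModuleNotFoundError.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as e:
        raise ImportError(
            f"{module_name!r} is required for this feature; "
            f"install it with `pip install {pip_name}`."
        ) from e


# Hypothetical usage at the point where the schema proto is needed:
# schema_pb2 = lazy_import(
#     "tensorflow_metadata.proto.v0.schema_pb2", "tensorflow-metadata"
# )
```

Keeping the import inside the function means the module-level import of the datasource no longer pulls in `tensorflow_metadata`.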

Trace:

(ReadTFRecord->MapBatches(decode_bytes) pid=4890)     from tensorflow_metadata.proto.v0 import schema_pb2
(ReadTFRecord->MapBatches(decode_bytes) pid=4890) ModuleNotFoundError: No module named 'tensorflow_metadata'
ReadTFRecord->MapBatches(decode_bytes) 0:   0%|          | 0/46 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "doc/source/ray-air/doc_code/computer_vision.py", line 430, in <module>
    main()
  File "doc/source/ray-air/doc_code/computer_vision.py", line 4, in main
    test(framework=framework, datasource=datasource)
  File "doc/source/ray-air/doc_code/computer_vision.py", line 20, in test
    dataset = dataset.limit(32)
  File "/ray/python/ray/data/dataset.py", line 2180, in limit
    block_list = self._plan.execute().truncate_by_rows(limit)
  File "/ray/python/ray/data/_internal/plan.py", line 577, in execute
    dataset_uuid=self._dataset_uuid,
  File "/ray/python/ray/data/_internal/execution/legacy_compat.py", line 101, in execute_to_legacy_block_list
    bundles = executor.execute(dag, initial_stats=stats)
  File "/ray/python/ray/data/_internal/execution/bulk_executor.py", line 85, in execute
    return OutputIterator(execute_recursive(dag))
  File "/ray/python/ray/data/_internal/execution/bulk_executor.py", line 66, in execute_recursive
    output = _naive_run_until_complete(op)
  File "/ray/python/ray/data/_internal/execution/bulk_executor.py", line 109, in _naive_run_until_complete
    op.notify_work_completed(ready)
  File "/ray/python/ray/data/_internal/execution/operators/task_pool_map_operator.py", line 65, in notify_work_completed
    task.output = self._map_ref_to_ref_bundle(ref)
  File "/ray/python/ray/data/_internal/execution/operators/map_operator.py", line 316, in _map_ref_to_ref_bundle
    all_refs = list(ray.get(ref))
  File "/ray/python/ray/_private/client_mode_hook.py", line 105, in wrapper
    return func(*args, **kwargs)
  File "/ray/python/ray/_private/worker.py", line 2426, in get
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(ModuleNotFoundError): ray::ReadTFRecord->MapBatches(decode_bytes)() (pid=4894, ip=172.18.0.3)
  File "/ray/python/ray/data/_internal/execution/operators/map_operator.py", line 374, in _map_task
    for b_out in fn(iter(blocks), ctx):
  File "/ray/python/ray/data/_internal/execution/legacy_compat.py", line 274, in do_map
    yield from block_fn(blocks, ctx, *fn_args, **fn_kwargs)
  File "/ray/python/ray/data/_internal/planner/map_batches.py", line 106, in fn
    for batch in formatted_batch_iter:
  File "/ray/python/ray/data/_internal/block_batching.py", line 172, in batch_blocks
    for formatted_batch in batch_iter:
  File "/ray/python/ray/data/_internal/block_batching.py", line 400, in _format_batches
    for block in block_iter:
  File "/ray/python/ray/data/_internal/block_batching.py", line 363, in _blocks_to_batches
    for block in block_iter:
  File "/ray/python/ray/data/_internal/plan.py", line 1309, in wrapper
    yield from fn(block, ctx, *args, **kwargs)
  File "/ray/python/ray/data/_internal/plan.py", line 1187, in block_fn
    for block in read_fn():
  File "/ray/python/ray/data/datasource/file_based_datasource.py", line 490, in read_files
    for data in read_stream(f, read_path, **reader_args):
  File "/ray/python/ray/data/datasource/tfrecords_datasource.py", line 50, in _read_stream
    yield pa.Table.from_pydict(_convert_example_to_dict(example, tf_schema))
  File "/ray/python/ray/data/datasource/tfrecords_datasource.py", line 94, in _convert_example_to_dict
    record[feature_name] = _get_feature_value(feature, schema_feature_type)
  File "/ray/python/ray/data/datasource/tfrecords_datasource.py", line 146, in _get_feature_value
    from tensorflow_metadata.proto.v0 import schema_pb2
ModuleNotFoundError: No module named 'tensorflow_metadata'
@scottjlee
Contributor

Ah yeah, sorry about that. I think we can import it lazily, only when a schema is specified. Thoughts?

@matthewdeng
Contributor Author

matthewdeng commented Mar 17, 2023 via email

@scottjlee
Contributor

@matthewdeng for option 2 above, do you mean that any time we install tensorflow (i.e., whenever it is listed in a requirements-*.txt file), we should also enforce installing tensorflow-metadata?
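For option 2, one way to enforce the pairing is a CI guard that fails when `tensorflow` is importable but `tensorflow_metadata` (pip package `tensorflow-metadata`) is not. This is a hypothetical sketch: `missing_companions` and `REQUIRED_COMPANIONS` are illustrative names, and `find_spec` is injectable so the check can be tested without either package installed.

```python
import importlib.util
from typing import Callable, Dict, List, Optional

# Hypothetical pairing: if the key module is importable, every listed
# companion module must be importable too.
REQUIRED_COMPANIONS: Dict[str, List[str]] = {
    "tensorflow": ["tensorflow_metadata"],
}


def missing_companions(
    pairs: Dict[str, List[str]] = REQUIRED_COMPANIONS,
    find_spec: Callable[[str], Optional[object]] = importlib.util.find_spec,
) -> List[str]:
    """Return companion modules that are absent even though their primary is present."""
    missing: List[str] = []
    for primary, companions in pairs.items():
        if find_spec(primary) is None:
            continue  # primary not installed, so nothing to enforce
        missing.extend(c for c in companions if find_spec(c) is None)
    return missing
```

A CI step could call `missing_companions()` and fail the job if the returned list is non-empty.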

@matthewdeng
Contributor Author

matthewdeng commented Mar 17, 2023 via email
