```yaml
# Process config example for dataset

# global parameters
project_name: 'ray-demo'
executor_type: 'ray'
dataset_path: './demos/process_video_on_ray/data/demo-dataset.jsonl'  # path to your dataset directory or file
ray_address: '<head_node_ip>:<port>'  # default: 'auto'; change to your Ray cluster address, e.g., ray://<hostname>:<port>
export_path: './outputs/demo/demo-processed-ray-videos'

# process schedule: a list of process operators with their arguments
process:
  # Filter ops
  - video_duration_filter:
      min_duration: 20
      max_duration: 100
  - video_resolution_filter:              # filter samples according to the resolution of their videos
      min_width: 200                      # min horizontal resolution of the filter range (unit: p)
      max_width: 4096                     # max horizontal resolution of the filter range (unit: p)
      min_height: 200                     # min vertical resolution of the filter range (unit: p)
      max_height: 4096                    # max vertical resolution of the filter range (unit: p)
      any_or_all: any                     # keep a sample if any (vs. all) of its videos pass the filter
  # Mapper ops
  - video_split_by_duration_mapper:       # mapper to split videos by duration
      split_duration: 10                  # duration of each video split, in seconds
      min_last_split_duration: 0          # minimum allowable duration (seconds) of the last split; shorter last splits are discarded
      keep_original_sample: true
  - video_resize_aspect_ratio_mapper:
      min_ratio: 1
      max_ratio: 1.1
      strategy: increase
  - video_split_by_key_frame_mapper:      # mapper to split videos by key frame
      keep_original_sample: true          # whether to keep the original sample; if false, only the cut samples remain in the final dataset (default: true)
  # Deduplicator ops
  - ray_video_deduplicator:               # simple multi-node video deduplicator using exact MD5-hash matching
      redis_host: '<head node ip>'        # host of the Redis instance
      redis_port: <port-1>                # port of the Redis instance; Redis defaults to 6379, which is also Ray's default port, so configure Redis to listen on another port
```
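As the last comment notes, Redis's default port 6379 collides with Ray's default port on the head node, so the dedup Redis instance must listen elsewhere. A minimal connectivity check before launching the pipeline, as a sketch assuming the redis-py package and a hypothetical port 6380 (substitute whatever port you actually configured):

```python
# Sketch: verify the dedup Redis instance is reachable before running the
# pipeline. redis-py and the port number 6380 are assumptions for illustration.
import redis

client = redis.Redis(host="<head node ip>", port=6380)  # not 6379, which Ray uses
print(client.ping())  # prints True if the instance answers
```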
Before Asking
I have read the README carefully.
I have pulled the latest code of the main branch and run it again; the problem still exists.
Search before asking
Question
Configuration:
Installation Method: from source
Data-Juicer Version: v1.0.2
Python Version: 3.9.15
Ray Version: 2.31.0
Ray Cluster Info:
Both machines have cloned the data-juicer repository and installed it (pip install -v -e .[dist]).
The error occurs when the Ray task is scheduled onto a non-head node; no error occurs when the task runs on the head node.
Command:
python tools/process_data.py --config ./demos/process_video_on_ray/configs/demo.yaml
demo.yaml (see the config at the top of this issue)
Error log:
2024-12-26 10:31:55 | WARNING | data_juicer.utils.resource_utils:28 - Command nvidia-smi is not found. There might be no GPUs on this machine.
2024-12-26 10:31:55 | INFO | data_juicer.core.ray_executor:42 - Initing Ray ...
2024-12-26 10:31:55,767 INFO worker.py:1586 -- Connecting to existing Ray cluster at address: 192.168.201.69:6379...
2024-12-26 10:31:55,772 INFO worker.py:1762 -- Connected to Ray cluster. View the dashboard at 192.168.201.69:8265
2024-12-26 10:31:55 | INFO | data_juicer.core.ray_executor:53 - Loading dataset with Ray...
2024-12-26 10:31:56,945 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:31:56,945 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
2024-12-26 10:31:57 | INFO | data_juicer.core.ray_executor:69 - Preparing process operators...
2024-12-26 10:31:57 | INFO | data_juicer.core.ray_executor:83 - Processing data...
2024-12-26 10:31:57 | INFO | data_juicer.core.ray_executor:87 - All Ops are done in 0.005s.
2024-12-26 10:31:57 | INFO | data_juicer.core.ray_executor:90 - Exporting dataset to disk...
2024-12-26 10:31:57,897 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:31:57,897 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON] -> TaskPoolMapOperator[MapBatches(partial)->MapBatches(process_batch_arrow)->MapBatches(compute_stats_single)->Filter(process_single)->MapBatches(compute_stats_single)->Filter(process_single)->MapBatches(process_batched)->MapBatches(process_single)->MapBatches(process_batched)->MapBatches(compute_stats_single)->Filter(process_single)->Write]
Running 0:   0%|          | 0/1 [00:00<?, ?it/s]
2024-12-26 10:32:36,127 ERROR streaming_executor_state.py:455 -- An exception was raised from a task of operator "MapBatches(partial)->MapBatches(process_batch_arrow)->MapBatches(compute_stats_single)->Filter(process_single)->MapBatches(compute_stats_single)->Filter(process_single)->MapBatches(process_batched)->MapBatches(process_single)->MapBatches(process_batched)->MapBatches(compute_stats_single)->Filter(process_single)->Write". Dataset execution will now abort. To ignore this exception and continue, set DataContext.max_errored_blocks.
2024-12-26 10:32:36,131 ERROR exceptions.py:73 -- Exception occurred in Ray Data or Ray Core internal code. If you continue to see this error, please open an issue on the Ray project GitHub page with the full stack trace below: https://github.com/ray-project/ray/issues/new/choose
2024-12-26 10:32:36,137 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:32:36,137 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
2024-12-26 10:32:36,157 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:32:36,157 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
2024-12-26 10:32:36,176 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:32:36,176 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
2024-12-26 10:32:36,195 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:32:36,195 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
--- Logging error in Loguru Handler #2 ---
Record was: {'elapsed': datetime.timedelta(seconds=50, microseconds=957976), 'exception': (type=<class 'ray.exceptions.RayTaskError(FileNotFoundError)'>, value=RayTaskError(FileNotFoundError)(FileNotFoundError(2, "Failed to open local file '/root/data-juicer/outputs/demo/demo-processed-ray-videos/560_000000_000000.json'. Detail: [errno 2] No such file or directory")), traceback=<traceback object at 0x7f2e4ca9ef40>), 'extra': {}, 'file': (name='process_data.py', path='/root/data-juicer/tools/process_data.py'), 'function': '<module>', 'level': (name='ERROR', no=40, icon='❌'), 'line': 19, 'message': "An error has been caught in function '<module>', process 'MainProcess' (3261981), thread 'MainThread' (139837157271360):", 'module': 'process_data', 'name': '__main__', 'process': (id=3261981, name='MainProcess'), 'thread': (id=139837157271360, name='MainThread'), 'time': datetime(2024, 12, 26, 10, 32, 36, 133482, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800), 'CST'))}
ray.data.exceptions.SystemException
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/data-juicer/djenv/lib/python3.9/site-packages/loguru/_logger.py", line 1297, in catch_wrapper
return function(*args, **kwargs)
File "/root/data-juicer/tools/process_data.py", line 15, in main
executor.run()
File "/root/data-juicer/data_juicer/core/ray_executor.py", line 91, in run
dataset.data.write_json(self.cfg.export_path, force_ascii=False)
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/dataset.py", line 2828, in write_json
self.write_datasink(
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/dataset.py", line 3544, in write_datasink
self._write_ds = Dataset(plan, logical_plan).materialize()
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/dataset.py", line 4502, in materialize
copy._plan.execute()
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/exceptions.py", line 86, in handle_trace
raise e.with_traceback(None) from SystemException()
ray.exceptions.RayTaskError(FileNotFoundError): ray::MapBatches(partial)->MapBatches(process_batch_arrow)->MapBatches(compute_stats_single)->Filter(process_single)->MapBatches(compute_stats_single)->Filter(process_single)->MapBatches(process_batched)->MapBatches(process_single)->MapBatches(process_batched)->MapBatches(compute_stats_single)->Filter(process_single)->Write() (pid=1620848, ip=192.168.201.68)
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/root/djenv/lib/python3.9/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 253, in call
yield from self._block_fn(input, ctx)
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/_internal/planner/plan_write_op.py", line 26, in fn
File "/root/djenv/lib/python3.9/site-packages/ray/data/datasource/file_datasink.py", line 128, in write
self.write_block(block_accessor, 0, ctx)
File "/root/djenv/lib/python3.9/site-packages/ray/data/datasource/file_datasink.py", line 254, in write_block
call_with_retry(
File "/root/djenv/lib/python3.9/site-packages/ray/data/_internal/util.py", line 986, in call_with_retry
raise e from None
File "/root/djenv/lib/python3.9/site-packages/ray/data/_internal/util.py", line 973, in call_with_retry
return f()
File "/root/djenv/lib/python3.9/site-packages/ray/data/datasource/file_datasink.py", line 250, in write_block_to_path
with self.open_output_stream(write_path) as file:
File "/root/djenv/lib/python3.9/site-packages/ray/data/datasource/file_datasink.py", line 79, in open_output_stream
return self.filesystem.open_output_stream(path, **self.open_stream_args)
File "pyarrow/_fs.pyx", line 887, in pyarrow._fs.FileSystem.open_output_stream
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
FileNotFoundError: [Errno 2] Failed to open local file '/root/data-juicer/outputs/demo/demo-processed-ray-videos/560_000000_000000.json'. Detail: [errno 2] No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/data-juicer/djenv/lib/python3.9/site-packages/loguru/_handler.py", line 204, in emit
self._queue.put(str_record)
File "/usr/local/lib/python3.9/multiprocessing/queues.py", line 371, in put
obj = _ForkingPickler.dumps(obj)
File "/usr/local/lib/python3.9/multiprocessing/reduction.py", line 51, in dumps
cls(buf, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <class 'ray.exceptions.RayTaskError(FileNotFoundError)'>: attribute lookup RayTaskError(FileNotFoundError) on ray.exceptions failed
--- End of logging error ---
2024-12-26 10:32:36,214 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:32:36,214 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
2024-12-26 10:32:36,233 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:32:36,233 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
2024-12-26 10:32:36,253 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:32:36,253 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
2024-12-26 10:32:36,272 INFO streaming_executor.py:108 -- Starting execution of Dataset. Full logs are in /tmp/ray/session_2024-12-25_15-41-56_306491_3210611/logs/ray-data
2024-12-26 10:32:36,272 INFO streaming_executor.py:109 -- Execution plan of Dataset: InputDataBuffer[Input] -> TaskPoolMapOperator[ReadJSON]
ray.data.exceptions.SystemException
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/root/data-juicer/tools/process_data.py", line 19, in
main()
File "/root/data-juicer/djenv/lib/python3.9/site-packages/loguru/_logger.py", line 1297, in catch_wrapper
return function(*args, **kwargs)
File "/root/data-juicer/tools/process_data.py", line 15, in main
executor.run()
File "/root/data-juicer/data_juicer/core/ray_executor.py", line 91, in run
dataset.data.write_json(self.cfg.export_path, force_ascii=False)
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/dataset.py", line 2828, in write_json
self.write_datasink(
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/dataset.py", line 3544, in write_datasink
self._write_ds = Dataset(plan, logical_plan).materialize()
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/dataset.py", line 4502, in materialize
copy._plan.execute()
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/exceptions.py", line 86, in handle_trace
raise e.with_traceback(None) from SystemException()
ray.exceptions.RayTaskError(FileNotFoundError): ray::MapBatches(partial)->MapBatches(process_batch_arrow)->MapBatches(compute_stats_single)->Filter(process_single)->MapBatches(compute_stats_single)->Filter(process_single)->MapBatches(process_batched)->MapBatches(process_single)->MapBatches(process_batched)->MapBatches(compute_stats_single)->Filter(process_single)->Write() (pid=1620848, ip=192.168.201.68)
for b_out in map_transformer.apply_transform(iter(blocks), ctx):
File "/root/djenv/lib/python3.9/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 253, in call
yield from self._block_fn(input, ctx)
File "/root/data-juicer/djenv/lib/python3.9/site-packages/ray/data/_internal/planner/plan_write_op.py", line 26, in fn
File "/root/djenv/lib/python3.9/site-packages/ray/data/datasource/file_datasink.py", line 128, in write
self.write_block(block_accessor, 0, ctx)
File "/root/djenv/lib/python3.9/site-packages/ray/data/datasource/file_datasink.py", line 254, in write_block
call_with_retry(
File "/root/djenv/lib/python3.9/site-packages/ray/data/_internal/util.py", line 986, in call_with_retry
raise e from None
File "/root/djenv/lib/python3.9/site-packages/ray/data/_internal/util.py", line 973, in call_with_retry
return f()
File "/root/djenv/lib/python3.9/site-packages/ray/data/datasource/file_datasink.py", line 250, in write_block_to_path
with self.open_output_stream(write_path) as file:
File "/root/djenv/lib/python3.9/site-packages/ray/data/datasource/file_datasink.py", line 79, in open_output_stream
return self.filesystem.open_output_stream(path, **self.open_stream_args)
File "pyarrow/_fs.pyx", line 887, in pyarrow._fs.FileSystem.open_output_stream
File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
FileNotFoundError: [Errno 2] Failed to open local file '/root/data-juicer/outputs/demo/demo-processed-ray-videos/560_000000_000000.json'. Detail: [errno 2] No such file or directory
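For context, the FileNotFoundError above is raised by pyarrow's local filesystem on the worker node (ip=192.168.201.68), which suggests the export directory does not exist on that node: Ray Data's write_json opens the output path on whichever node runs the Write task, so '/root/data-juicer/outputs/demo/demo-processed-ray-videos' would need to exist (or be a shared mount such as NFS) on every node, not just the head. A rough diagnostic sketch under that assumption, spreading tasks so at least one lands on each node (the task count and path are illustrative):

```python
# Sketch: check (and create) the export directory across the cluster's nodes.
import os
import socket

import ray

ray.init(address="auto")

@ray.remote(num_cpus=0.1)
def ensure_export_dir(path):
    os.makedirs(path, exist_ok=True)  # create the directory locally if missing
    return socket.gethostname(), os.path.abspath(path)

export_path = "/root/data-juicer/outputs/demo/demo-processed-ray-videos"
# spread several small tasks so at least one runs on each node
refs = [
    ensure_export_dir.options(scheduling_strategy="SPREAD").remote(export_path)
    for _ in range(8)
]
for host, path in set(ray.get(refs)):
    print(f"{host}: {path} ready")
```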
Additional information
Previously, when data-juicer was installed on the Ray head node but not on the non-head nodes, running a distributed Ray job failed on the non-head nodes with: No module named data-juicer.
So for distributed jobs, does data-juicer need to be installed on every node?