Error: TypeError: Couldn't cast array of type list<item: string> to null #52

Open
xxcoco763 opened this issue Feb 26, 2024 · 1 comment


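My server cannot connect to Hugging Face, so I only replaced the THU/Longbench path in pred.py with the local path /home/eval/LongBench/data. The model path has also been added to the config file. The error is as follows: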
CUDA_VISIBLE_DEVICES=7 python pred.py --model llama2-13b-chat-16k
Resolving data files: 100%|████████████████████████████████████| 34/34 [00:00<00:00, 149169.81it/s]
Downloading data files: 100%|██████████████████████████████████████| 1/1 [00:00<00:00, 1417.95it/s]
Extracting data files: 100%|█████████████████████████████████████████| 1/1 [00:00<00:00, 87.24it/s]
Generating train split: 2500 examples [00:00, 4816.93 examples/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1940, in _prepare_split_single
writer.write_table(table)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/arrow_writer.py", line 572, in write_table
pa_table = table_cast(pa_table, self._schema)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2328, in table_cast
return cast_table_to_schema(table, schema)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2287, in cast_table_to_schema
arrays = [cast_array_to_feature(table[name], feature) for name, feature in features.items()]
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2287, in <listcomp>
arrays = [cast_array_to_feature(table[name], feature) for name, feature in features.items()]
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1831, in wrapper
return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1831, in <listcomp>
return pa.chunked_array([func(chunk, *args, **kwargs) for chunk in array.chunks])
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2143, in cast_array_to_feature
return array_cast(array, feature(), allow_number_to_str=allow_number_to_str)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 1833, in wrapper
return func(array, *args, **kwargs)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/table.py", line 2028, in array_cast
raise TypeError(f"Couldn't cast array of type\n{array.type}\nto\n{pa_type}")
TypeError: Couldn't cast array of type
list<item: string>
to
null

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/root/zyx/eval/LongBench/pred.py", line 163, in <module>
data = load_dataset('/root/zyx/eval/LongBench/data/data', dataset, split='test')
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/load.py", line 2153, in load_dataset
builder_instance.download_and_prepare(
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 954, in download_and_prepare
self._download_and_prepare(
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1049, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1813, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/root/miniconda3/envs/zyx/lib/python3.10/site-packages/datasets/builder.py", line 1958, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

How can I resolve this?
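
For readers hitting the same error, here is a hedged diagnostic sketch. One plausible cause (an assumption, not confirmed in this thread) is that the 34 local files were resolved into a single dataset whose schema was inferred from rows where some field is always null, while other files store a list of strings in that field. The helper names `field_types_by_file` and `inconsistent_fields` are hypothetical, and the sketch assumes every non-blank line is a JSON object.

```python
import json
from collections import defaultdict
from pathlib import Path


def field_types_by_file(data_dir):
    """Map each field name to {file name: set of Python type names seen}."""
    summary = defaultdict(dict)
    for path in sorted(Path(data_dir).glob("*.jsonl")):
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue  # tolerate blank lines
                # Assumption: each line decodes to a dict of field -> value.
                for key, value in json.loads(line).items():
                    types = summary[key].setdefault(path.name, set())
                    types.add(type(value).__name__)
    return summary


def inconsistent_fields(summary):
    """Return the fields whose observed types differ from file to file."""
    return {
        key: per_file
        for key, per_file in summary.items()
        if len({frozenset(types) for types in per_file.values()}) > 1
    }
```

Calling `inconsistent_fields(field_types_by_file("/home/eval/LongBench/data"))` lists any field that is, for example, `NoneType` in one file and `list` in another. A field reported this way is a candidate for the column named in the cast error.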

Member

bys0318 commented Feb 27, 2024

If you have already downloaded LongBench's data/ folder locally, you can load the dataset by reading the files directly. Change line 166 of pred.py to:

data = [json.loads(line) for line in open(f"LongBench/data/{dataset}.jsonl", encoding="utf-8")]
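
The one-liner above can also be written as a small helper that closes the file handle and skips blank lines. This is a minimal sketch, not the code in pred.py: the function names `load_jsonl` and `load_longbench_local` are hypothetical, and the `{dataset}.jsonl` layout under the data directory is assumed from the path in the reply.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_longbench_local(data_dir, dataset):
    """Load one task from a local data folder (assumed layout: <data_dir>/<dataset>.jsonl)."""
    return load_jsonl(Path(data_dir) / f"{dataset}.jsonl")
```

In pred.py this would be used as `data = load_longbench_local("LongBench/data", dataset)`. Because each file is parsed with plain `json`, no shared schema is inferred across files, so a field can be null in one task and a list in another.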
