
llmuses 0.3.2: running a built-in dataset fails with ImportError: cannot import name '_datasets_server' from 'datasets.utils' (/data/anaconda3/envs/eval-scope/lib/python3.10/site-packages/datasets/utils/__init__.py) #76

Open
jackqdldd opened this issue Jul 11, 2024 · 12 comments


@jackqdldd

python -m llmuses.run --model qwen/Qwen2-7B-Instruct --template-type qwen --datasets trivia_qa limit 2

@slin000111 (Contributor)

Is the error ModuleNotFoundError: No module named 'llmuses.benchmarks.limit'?

@slin000111 (Contributor)

It should be --limit 2 (the -- prefix is missing from your command).

@jackqdldd (Author)

[screenshot of the error output]
python -m llmuses.run --model qwen/Qwen2-7B-Instruct --template-type qwen --datasets trivia_qa --limit 2

@jackqdldd (Author)

Running the bbh dataset gives the same error as above.

@slin000111 (Contributor)

Test environment on my side:
python 3.10
modelscope 1.16.0
git clone https://github.com/modelscope/eval-scope.git
cd eval-scope/
pip install -e .
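
As for the ImportError in the title: datasets.utils._datasets_server was renamed in newer datasets releases (around 2.19.0), so older code can no longer import it. A hedged workaround, assuming that is the cause, is to pin an earlier release and re-check the import:

pip install "datasets<2.19"
# check that the private module is importable again
python -c "from datasets.utils import _datasets_server; print('import ok')"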

@jackqdldd (Author)

My environment is the same:
modelscope Version: 1.16.0
llmuses 0.4.0
[screenshot showing a Floating point exception error]

python llmuses/run.py --model qwen/Qwen2-7B-Instruct --template-type qwen --datasets arc --dataset-hub Local --dataset-args '{"arc": {"local_path": "/root/eval-scope/data/arc"}}' --dataset-dir /root/eval-scope/data/

@slin000111 (Contributor)

The Floating point exception error seems related to your environment. Try running the example below to check:

from modelscope import AutoModelForCausalLM, AutoTokenizer
import torch


device = "cuda"  # the device to load the model onto

# Load the model in float16, letting device_map spread it across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "qwen/Qwen2-7B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("qwen/Qwen2-7B-Instruct")

prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the chat messages into the model's prompt format.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(device)

generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=512
)
# Drop the prompt tokens so only the newly generated text is decoded.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

@jackqdldd (Author)

[screenshot of the result]

@slin000111 (Contributor)

Does your GPU support bf16?
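
A quick way to check, assuming PyTorch is installed in the same environment:

# prints the GPU name and whether bf16 is supported (Ampere or newer)
python -c "import torch; print(torch.cuda.get_device_name(0), torch.cuda.is_bf16_supported())"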

@jackqdldd (Author)

It was indeed an environment problem. Does the tool support evaluating an already-deployed model? For example, if a large model is deployed on a remote machine, how do I evaluate it through its URL?

@slin000111 (Contributor)

The test below uses a model deployed with vLLM; swap in your own url, model, and dataset_path. You can first use curl to check that the remotely deployed model responds.

llmuses perf --url 'http://127.0.0.1:8000/v1/chat/completions' --parallel 1 --model '/mnt/workspace/qwen2-7b-instruct/qwen/Qwen2-7B-Instruct' --log-every-n-query 10 --read-timeout=120 --dataset-path '/mnt/workspace/HC3-Chinese/open_qa.jsonl' -n 50 --max-prompt-length 128000 --api openai --stream --dataset openqa
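
For the curl check, a minimal request against a vLLM OpenAI-compatible endpoint could look like this (a sketch; substitute your own url and model path):

curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "/mnt/workspace/qwen2-7b-instruct/qwen/Qwen2-7B-Instruct", "messages": [{"role": "user", "content": "Hello"}], "max_tokens": 32}'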

@jackqdldd (Author)

Thanks, I tried the method above and it works. But that is a performance test, right? What do I need to do to validate the model's outputs, i.e. evaluate the model's capabilities with the built-in datasets or a custom dataset, when the model is deployed on a remote machine?
