
Huggingface 版必须要安装flash attention? (Does the Huggingface version require flash attention?) #70

Open · Armod-I opened this issue Dec 25, 2023 · 8 comments


Armod-I commented Dec 25, 2023

The error is as follows:
ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn

Other models can run without flash attention.

zhaoxudong01 (Collaborator) commented

You can use the image we open-sourced; it has flash_attn installed.

Armod-I (Author) commented Dec 25, 2023

> You can use the image we open-sourced; it has flash_attn installed.

flash_attn does not support V100 GPUs.

I manually disabled flash attention and the model now runs, but so far I cannot reproduce the output of the Megatron version.
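(For reference, a minimal sketch of this kind of switch-off, assuming the checkpoint's config.json exposes a use_flash_attention flag — the flag name mirrors the kwarg tried later in this thread and is an assumption, not a confirmed interface. Flipping the flag alone may not silence the ImportError above: transformers also scans the remote modeling file's top-level imports before loading it, so the flash_attn import there may need to be removed or guarded as well.)

import json

# Path to a local copy of the checkpoint (placeholder).
config_path = '/path/to/Yuan2-2B-hf/config.json'

with open(config_path) as f:
    cfg = json.load(f)

# Assumed flag name: tells the remote modeling code to use the
# standard attention path instead of flash_attn kernels.
cfg['use_flash_attention'] = False

with open(config_path, 'w') as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)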

Model: Yuan 2.0 2B hf
Inference code: the code from the Huggingface model page
Input:

编写一个 Python 函数,它接受一个字符串作为参数,并返回该字符串的反转版本。
示例:
>>> string_reverse('hello')
olleh
代码如下:
```python

Output:

<s> 编写一个 Python 函数,它接受一个字符串作为参数,并返回该字符串的反转版本。
示例:
>>> string_reverse('hello')
olleh
代码如下:
```python
</s>100000000000000000000000000000000000000000000000000

zhaoxudong01 (Collaborator) commented

We ran the following test and the output was normal.
Please try adding <sep> right after "代码如下:" in the prompt.

Input:

编写一个 Python 函数,它接受一个字符串作为参数,并返回该字符串的反转版本。示例:string_reverse('hello') olleh 代码如下:

Output:

<sep> ```python
def string_reverse(string):
    return string[::-1]
```<eod>

Armod-I (Author) commented Dec 26, 2023

Model loading code:

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

yuan_path = '/path/to/Yuan2-2B-hf'  # local checkpoint directory (placeholder)

print("Create tokenizer...")
tokenizer = LlamaTokenizer.from_pretrained(yuan_path)
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>','<commit_before>','<commit_msg>','<commit_after>','<jupyter_start>','<jupyter_text>','<jupyter_code>','<jupyter_output>','<empty_output>'], special_tokens=True)

print("Create model...")
model = AutoModelForCausalLM.from_pretrained(yuan_path, torch_dtype=torch.bfloat16, trust_remote_code=True).to('cuda:1')

Inference code:

question = """编写一个 Python 函数,它接受一个字符串作为参数,并返回该字符串的反转版本。
示例:
>>> string_reverse('hello')
olleh
代码如下:<sep>
```python
"""

inputs = tokenizer(question, return_tensors="pt")["input_ids"].to("cuda:1")
outputs = model.generate(inputs,do_sample=False,max_length=200)
print(tokenizer.decode(outputs[0]))

Output:

<s> 编写一个 Python 函数,它接受一个字符串作为参数,并返回该字符串的反转版本。
示例:
>>> string_reverse('hello')
olleh
代码如下:<sep> ```python
</s>
```
# 单元测试用例:
```python
def test_string_reverse():
    assert string_reverse('hello') == 'olleh'
    assert string_reverse('world') == 'dlrow'
    assert string_reverse('python') == 'nohtyp'
```<eod>

The model never implements the string_reverse function; it only writes some test cases.

zhaoxudong01 (Collaborator) commented
@Hicollj

Hicollj (Collaborator) commented Dec 26, 2023

Please try the following input:

问题描述:编写一个 Python 函数,它接受一个字符串作为参数,并返回该字符串的反转版本。
示例:
>>> string_reverse('hello')
olleh
代码如下:
```python
def string_reverse(string):

Also, please be sure to generate the code with greedy decoding; you can set temperature=1, top_k=1.
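A minimal sketch of that generation call, reusing the tokenizer and model from the snippets above; with Hugging Face generate(), do_sample=False already takes the argmax token at every step, which matches the temperature=1, top_k=1 suggestion:

# Prompt suggested above, ending mid-definition so the model completes the function body.
prompt = (
    "问题描述:编写一个 Python 函数,它接受一个字符串作为参数,并返回该字符串的反转版本。\n"
    "示例:\n"
    ">>> string_reverse('hello')\n"
    "olleh\n"
    "代码如下:\n"
    "```python\n"
    "def string_reverse(string):\n"
)

inputs = tokenizer(prompt, return_tensors="pt")["input_ids"].to("cuda:1")
# Greedy decoding: no sampling, deterministic argmax at each step.
outputs = model.generate(inputs, do_sample=False, max_length=200)
print(tokenizer.decode(outputs[0]))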

Shawn-IEITSystems (Collaborator) commented

@Armod-I Has this issue been resolved?

pengyb2001 commented Jan 16, 2024

> flash_attn does not support V100 GPUs.
>
> I manually disabled flash attention and the model now runs, but so far I cannot reproduce the output of the Megatron version.


How did you manually disable flash_attn? I want to run this model on the CPU. I tried the usage shown in https://huggingface.co/IEITYuan/Yuan2-2B-hf/blob/main/README.md and modified it to:

import torch, transformers
import sys, os
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), os.path.pardir)))
from transformers import AutoModelForCausalLM, AutoTokenizer, LlamaTokenizer

print("Creating tokenizer...")
tokenizer = LlamaTokenizer.from_pretrained('/mnt/disk1/models/Yuan2-2B-hf', add_eos_token=False, add_bos_token=False, eos_token='<eod>')
tokenizer.add_tokens(['<sep>', '<pad>', '<mask>', '<predict>', '<FIM_SUFFIX>', '<FIM_PREFIX>', '<FIM_MIDDLE>','<commit_before>','<commit_msg>','<commit_after>','<jupyter_start>','<jupyter_text>','<jupyter_code>','<jupyter_output>','<empty_output>'], special_tokens=True)

print("Creating model...")
# Note: the GPU-specific arguments have been removed here
model = AutoModelForCausalLM.from_pretrained('/mnt/disk1/models/Yuan2-2B-hf', use_flash_attention=False)
print(model.config)

inputs = tokenizer("请问目前最先进的机器学习算法有哪些?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, do_sample=False, max_length=100)
print(tokenizer.decode(outputs[0])) 

But it still raises: ImportError: This modeling file requires the following packages that were not found in your environment: flash_attn. Run pip install flash_attn
Update: I manually disabled flash_attn; see issue #92.
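(For anyone hitting the same ImportError on CPU, a minimal sketch of the follow-up workaround; it assumes, per issue #92, that the downloaded modeling file has been edited locally so flash_attn is no longer a top-level import — transformers raises this error while scanning that file's imports, before any config flag or kwarg is consulted.)

import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

# Local checkpoint whose modeling file no longer hard-imports flash_attn
# (the edit itself is described in issue #92, not reproduced here).
yuan_path = '/mnt/disk1/models/Yuan2-2B-hf'

tokenizer = LlamaTokenizer.from_pretrained(yuan_path, add_eos_token=False, add_bos_token=False, eos_token='<eod>')
# Special-token registration as in the snippet above (omitted here for brevity).

model = AutoModelForCausalLM.from_pretrained(
    yuan_path,
    torch_dtype=torch.float32,  # full precision on CPU instead of bfloat16
    trust_remote_code=True,     # needed to load the custom Yuan modeling code
)

inputs = tokenizer("请问目前最先进的机器学习算法有哪些?", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, do_sample=False, max_length=100)
print(tokenizer.decode(outputs[0]))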
