
[Bug]: Dear teachers, when I try to use SPU on torch, I found a problem #771

Closed

zhangwaer opened this issue Jul 16, 2024 · 7 comments
zhangwaer commented Jul 16, 2024

Issue Type

Others

Modules Involved

Others

Have you reproduced the bug with SPU HEAD?

Yes

Have you searched existing issues?

Yes

SPU Version

spu 0.9.1b0

OS Platform and Distribution

Ubuntu 18.04

Python Version

3.10.14

Compiler Version

GCC 11.2.1

Current Behavior?

Dear teachers, when I try to use GPT-2 on SPU, I found that I cannot transfer the params successfully. Could you please give me some advice on how to transfer them? Thanks very much!

```
Traceback (most recent call last):
  File "/a-tmp/g/a.py", line 32, in
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 1302, in forward
    transformer_outputs = self.transformer(
                          ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 1002, in forward
    input_shape = input_ids.size()
                  ^^^^^^^^^^^^^^
AttributeError: 'collections.OrderedDict' object has no attribute 'size'
```

```python
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel, GPT2Config
import json
import urllib
from collections import OrderedDict
from jax.tree_util import tree_map

configuration = GPT2Config()
model = GPT2LMHeadModel(configuration)
pre_model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
params = pre_model.state_dict()
# Passing the state_dict as the first positional argument makes forward()
# treat it as input_ids, which raises the AttributeError above.
res = model(params, inputs)
print(res)
```

Standalone code to reproduce the issue

print("A bug")

Relevant log output

No response

@tpppppub (Member) commented:

Your code and error suggest that you are using the transformers GPT2 model incorrectly. I don't see you calling SPU anywhere.
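For reference, a minimal plaintext sketch (no SPU involved) of the standard transformers usage: weights are transferred with `load_state_dict`, and `forward()` receives the token tensor rather than the state dict. This assumes the default `GPT2Config()` matches the "gpt2" checkpoint, which holds for the small model:

```python
import torch
from transformers import AutoTokenizer, GPT2Config, GPT2LMHeadModel

configuration = GPT2Config()
model = GPT2LMHeadModel(configuration)
pre_model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")

# Transfer the pretrained weights into the locally built model ...
model.load_state_dict(pre_model.state_dict())
model.eval()

# ... and pass only the input ids to forward().
with torch.no_grad():
    res = model(inputs)
print(res.logits.shape)
```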

@zhangwaer (Author) commented Jul 16, 2024

> Your code and error suggest that you are using the transformers GPT2 model incorrectly. I don't see you calling SPU anywhere.

Sorry, I use the following code to run GPT-2 on SPU, but it outputs "I0000 00:00:1721093883.848833 2880 cpu_client.cc:405] TfrtCpuClient created" and then gets stuck for a long time, so I hope to get some advice on how to transfer the params. Thank you very much!

```python
import torch
from transformers import AutoTokenizer, GPT2LMHeadModel, GPT2Config
import spu.utils.distributed as ppd
import json
import urllib
from collections import OrderedDict
from jax.tree_util import tree_map

with open("3pc.json", "r") as file:
    conf = json.load(file)
ppd.init(conf["nodes"], conf["devices"], framework=ppd.Framework.EXP_TORCH)

configuration = GPT2Config()
model = GPT2LMHeadModel(configuration)
pre_model = GPT2LMHeadModel.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
inputs = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")

params = pre_model.state_dict()
# Move the weights to party P1 and the inputs to party P2 as numpy arrays.
params = ppd.device("P1")(lambda input: tree_map(lambda x: x.detach().numpy(), input))(params)
inputs = ppd.device("P2")(lambda x: x.detach().numpy())(inputs)
# Trace and run the model on the SPU virtual device.
res = ppd.device("SPU")(model)(params, inputs)
```

@tpppppub (Member) commented:

Replace `res = ppd.device("SPU")(model)(params, inputs)` with `res = ppd.device("SPU")(pre_model)(params, inputs)`.
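In context, a sketch of the fixed call, assuming the same setup as the snippet above (`ppd.get` is the helper SPU's distributed examples use to fetch a result back to the driver):

```python
# Run the pretrained model, whose traced graph matches the checkpoint weights.
res = ppd.device("SPU")(pre_model)(params, inputs)
# Fetch the result back from the SPU device to the driver process.
res = ppd.get(res)
print(res)
```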

zhangwaer reopened this Jul 16, 2024
@zhangwaer (Author) commented Jul 16, 2024

> Replace `res = ppd.device("SPU")(model)(params, inputs)` with `res = ppd.device("SPU")(pre_model)(params, inputs)`.

Thanks very much! But I have a question: pre_model can load the params through `res = ppd.device("SPU")(pre_model)(params, inputs)`, but model cannot load them. Why is that? Thanks very much!

@tpppppub (Member) commented:

A possible reason is that model, which is initialized with an empty config, has a different DAG from pre_model.
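One way to rule out config differences entirely, sketched here under the assumption that the two models should share the same architecture: build the local model from the checkpoint's own config and load the pretrained weights before tracing it on SPU:

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Use the checkpoint's config so the locally built model has the same
# architecture (and hence the same traced graph) as pre_model.
configuration = GPT2Config.from_pretrained("gpt2")
model = GPT2LMHeadModel(configuration)
model.load_state_dict(pre_model.state_dict())
model.eval()
```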

zhangwaer reopened this Jul 16, 2024
@tpppppub (Member) commented:

Currently, SPU for PyTorch is experimental and at an early stage, using some dirty hacks to support torch.nn.Module inference. As a result, for a PyTorch model (whether from Hugging Face or somewhere else), running it on SPU doesn't fully align with its plaintext version.

@zhangwaer (Author) commented Jul 16, 2024

> Currently, SPU for PyTorch is experimental and at an early stage, using some dirty hacks to support torch.nn.Module inference. As a result, for a PyTorch model (whether from Hugging Face or somewhere else), running it on SPU doesn't fully align with its plaintext version.

Thanks very much! Looking forward to the better version!
