Wrong device when using device="cpu" with torch.device #503
Comments
I understand the concern; however, I do not think this should be fixed.

The behavior could be changed by being extremely defensive about global modifiers such as these context managers. As a general rule, though, I'd avoid any logic that depends on torch internals, since they tend to change a lot (this context manager didn't exist when this library was created). Also, in the future there might be ways to skip CPU allocation entirely, meaning the internals would have to be rewritten yet again, introducing potential breaking changes (something this library goes to great lengths to avoid). Given all these considerations, I'm marking this as wontfix, and instead encourage you and other users not to request contradicting device locations from within a context manager. So, either:

```python
# Fix the context manager; that makes everything much faster.
with torch.device("cpu"):
    weights = load_file(filename)
```

Or:

```python
# Remove the context manager.
weights = load_file(filename, device="cpu")
```

Just to note that

```python
with torch.device("cuda:0"):
    weights = load_file(filename)
```

is totally valid, intended behavior and currently works transparently (contrary to what would happen should we be defensive).
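The two sanctioned patterns can be sanity-checked with a short sketch (assumptions not in the thread: a CUDA device is available, and `demo.safetensors` is a throwaway file created just for the example):

```python
import torch
from safetensors.torch import load_file, save_file

save_file({"w": torch.zeros(4)}, "demo.safetensors")

# Pattern 1: context manager only, no device argument.
with torch.device("cpu"):
    cpu_weights = load_file("demo.safetensors")

# Pattern 2: explicit device argument, no context manager.
gpu_weights = load_file("demo.safetensors", device="cuda:0")

assert all(t.device.type == "cpu" for t in cpu_weights.values())
assert all(t.device.type == "cuda" for t in gpu_weights.values())
```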
I agree with you about broken designs (context managers have all kinds of problems, and we are inventing high-performance compute here, where cycles matter), so it's better to say no. How viable would a deprecation warning be? If a context manager is detected: poof.
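For concreteness, such a guard might look like the sketch below. It is purely hypothetical and not safetensors code: the function name is invented, and it assumes torch >= 2.3 for `torch.get_default_device()`.

```python
import warnings

import torch


def warn_on_contradicting_device(requested: str) -> None:
    """Hypothetical guard: warn when the requested device contradicts
    torch's ambient default device (e.g. one set by a context manager)."""
    ambient = torch.get_default_device()  # requires torch >= 2.3
    if torch.device(requested).type != ambient.type:
        warnings.warn(
            f"device={requested!r} was requested while the default device "
            f"is {ambient}; the ambient device may take precedence.",
            DeprecationWarning,
            stacklevel=2,
        )
```

Of course, such a check would itself depend on torch internals staying stable, which is exactly the maintenance cost described above.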
@chrisdesa May I offer an interpretation of what is going on here? I believe your assertion about what is happening isn't quite right. Here are the steps, roughly: the safetensors file is mmapped into host memory, a CPU tensor is created over the mapped bytes, and the data is then copied to the destination device.

One of the problems with this flow is that the CPU is entirely responsible for pushing data around. Another problem is that the CPU is executing the copy itself. There are more advanced techniques that could involve no host-memory buffer copies at all (why copy, when you could map instead?). For more detail, see: on-demand paged loading of models in vllm.

Also, @Narsil, if you are the maintainer of safetensors, I am a super fan. Very nicely done. Now if we could only eject the 6 other serialization approaches we have in vllm ;-)
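To make the flow above concrete, here is a minimal sketch of the "CPU pushes everything" pattern. It is an illustration, not safetensors' actual implementation; it assumes a POSIX system, an available CUDA device, and a hypothetical file path.

```python
import mmap

import torch

# Map the file into the process's address space. No data is read yet;
# the kernel faults pages in lazily as they are touched.
with open("model.safetensors", "rb") as f:
    mapped = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)

# Wrap the mapping as a CPU tensor. This shares memory with the file,
# so there is still no host copy (torch warns that the buffer is
# read-only, which is fine for a load-only path).
cpu_view = torch.frombuffer(mapped, dtype=torch.uint8)

# Here the CPU does all the work: it faults the pages in and drives
# the host-to-device copy.
gpu_bytes = cpu_view.to("cuda:0")
```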
System Info
transformers version: 4.43.1
Reproduction
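A minimal sketch of the reported behavior (the file path is hypothetical, and a CUDA device is assumed):

```python
import torch
from safetensors.torch import load_file

with torch.device("cuda:0"):
    # device="cpu" should pin the result to the CPU, but the tensors
    # come back on cuda:0 instead.
    weights = load_file("model.safetensors", device="cpu")

print({name: tensor.device for name, tensor in weights.items()})
# observed: every tensor on cuda:0; expected: every tensor on cpu
```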
Expected behavior
The expected behavior is for the `device='cpu'` argument to override the torch default and load the tensors on the CPU, but the actual behavior is that it loads the tensors onto the `'cuda:0'` GPU. The `device='cpu'` argument seems to be interpreted as "whatever the default device is right now" rather than as the actual CPU.