-
Notifications
You must be signed in to change notification settings - Fork 744
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PyTorch] Update to 2.2.1 and minor changes #1466
Conversation
@HGuillemet thanks for the update and especially adding the AOTInductor mapping. I think that's an interesting new variant to be able to use the new optimizations at least for inference. Training still needs to happen in Python in that case but we can export it to a C++ lib and then use that from Java with the bindings. Have you been able to try it already?I'll give it a try myself, just curious if you got something working already. |
No I haven't, yet. |
I don't understand the linking error on Windows:
|
There's probably some template somewhere that requires them. You'll probably get the same error on Linux and Mac if you try to link with |
Thanks for the suggestion. Adding the linker option raised an error about cudnn not linked to jnitorch_cuda. Let's see but I doubt it's related to the error on windows. |
It seems it has been spotted and fixed upstream in pytorch/pytorch@79ba397 after 2.2.0 release. |
To enable the setting of hooks in autograd graphs, I need to virtualize new Info("std::vector<torch::Tensor>", "std::vector<at::Tensor>", "std::vector<torch::autograd::Variable>", "torch::autograd::variable_list")
.valueTypes("@Cast({\"\", \"std::vector<torch::Tensor>\"}) @StdMove TensorVector")
.pointerTypes("TensorVector").define()) I wonder why this There are some other types with this kind of |
If you're not getting any compile errors, then I guess PyTorch's API was improved so that we don't need them anymore, yes |
…ensorVector valueTypes.
@sbrunk could you run your tests on this PR ? |
Tests are looking good! |
If you're not planning on making more changes for now, we can merge this? |
Ok for me. |
Could you try to fix this? We probably just need to uninstall a couple of large unnecessary packages... |
I had already added a bunch of |
Really? Could you point me to that revert and I'll try it on the actions branch here |
I'm seeing it was on deploy-centos, in fact: I'm reviewing deploy-ubuntu and adding similar cleanup. Shall I push the commit here or on a new PR ? |
Ah, that won't be enough. We'll probably need to remove a lot more stuff. You can try it here, but we won't know if it works until actual deploy, so I don't know. Let's check how many more GB we can with |
I pushed aaa37a1 on my branch but it doesn't update this PR now that it's merged. Anyway, this commit will indeed only save ~ 700Mb for pytorch build, due to mkl archive removal. If it's not enough, what about a maven clean phase on main artifact after its deploy phase and before the deploy phase of the platform/ext artifact ? This should get rid of cppbuild directory (about 7 or 8G) |
That could work, yes
|
Work in Progress
Included in this PR:
ExampleStack
andTensorExampleStack
constructorsCUDAFunctions.h
(wrappers around commonly used CUDA API functions)AddAOTInductor
(new way to run models exported from Python)FunctionPreHook
andFunctionPostHook
to enable setting hooks on autograd graphModule.asXXX
to test if aModule
is of a specific subclass (and do the cast).Float8_e5m2fnuz
andFloat8_e4m3fnuz
with unsigned zeronvrtc-builtins
(fixes Problems deploying pytorch 2.1.2-1.5.10 #1468)