Use AWS Neuron sdk 2.18 #547
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
@JingyaHuang the neuronx SD cache test is failing.
@michaelbenayoun some trainium tests are failing. It may be related to changes in the way neuronx-distributed loads weights (safetensors-related error messages).
74d9cba to e5a72b5
@michaelbenayoun there is a newly failing distributed test with AWS 2.18, probably caused by your latest changes.
It's weird that it fails.
It might be linked to
c6f8fdd to fa21917
No regression found except for the SDXL separate weights: let's merge this!
LGTM, let's merge it! Thanks for taking care of the upgrade.
What does this PR do?
This pull-request bumps the AWS Neuron SDK version to 2.18.
It also bumps the TGI router version to 1.4.4 to fix build issues caused by updates to underlying Rust packages.
To update your local host, do the following:
$ sudo apt update
$ sudo apt install -u aws-neuronx-dkms aws-neuronx-runtime-lib aws-neuronx-collectives aws-neuronx-tools
$ pip install -U neuronx-cc "torch-neuronx==1.13.*" transformers-neuronx neuronx-distributed
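After upgrading, it can help to check what pip actually installed. The following is a minimal Python sketch, not part of this PR: the package list mirrors the pip command, and the `1.13.` prefix check for torch-neuronx reflects the `torch-neuronx==1.13.*` pin. The helper names (`installed_version`, `version_ok`, `check`) are hypothetical. The script does not confirm that the versions belong to Neuron SDK 2.18; compare them against the SDK 2.18 release notes for that.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages upgraded by the pip command; only torch-neuronx carries a version pin.
PACKAGES = ("neuronx-cc", "torch-neuronx", "transformers-neuronx", "neuronx-distributed")
EXPECTED_PREFIXES = {"torch-neuronx": "1.13."}  # derived from torch-neuronx==1.13.*


def installed_version(name):
    """Return the installed distribution version, or None if it is missing."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def version_ok(installed, prefix):
    """Pass if the package is installed and, when pinned, starts with the prefix."""
    if installed is None:
        return False
    return prefix is None or installed.startswith(prefix)


def check(packages=PACKAGES, expected=EXPECTED_PREFIXES):
    """Map each package name to a tuple (installed_version, ok)."""
    report = {}
    for name in packages:
        installed = installed_version(name)
        report[name] = (installed, version_ok(installed, expected.get(name)))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check().items():
        status = "OK" if ok else "CHECK"
        print(f"{name:22} {installed or 'not installed':16} {status}")
```

Run it with the same Python interpreter that `pip install` used; any row marked CHECK is either missing or outside the pinned version range.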