Add Neuronx compile cache proxy and use it for LLM decoder models #410
Conversation
Left a few comments, mostly nits.
It looks very promising!
docs/source/guides/cache_system.mdx (Outdated)
set           Set the name of the Neuron cache repo to use locally (trainium only).
add           Add a model to the cache of your choice (trainium only).
list          List models in a cache repo (trainium only).
synchronize   Synchronize local compiler cache with the hub cache.
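For context, a usage sketch of these subcommands (assuming the `optimum-cli neuron cache` entry point this docs file describes; the repo name is illustrative):

```bash
# Point the local setup at a Hub cache repo (illustrative repo name)
optimum-cli neuron cache set my-org/my-neuron-cache

# Push the local neuronx compiler cache to that repo
optimum-cli neuron cache synchronize
```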
Is it for both trainium and inferentia? Otherwise, I would specify that it's inferentia-only as well.
I wasn't sure: it uploads everything under the directory corresponding to `NEURON_COMPILE_CACHE_URL`. I will mark it as inferentia-only for now.
Alright.
Also, could you copy everything under a dedicated directory in the cache repo for now? Otherwise I do not know whether it will collide with the current system.
It does not collide, because the neuronx cached files are automatically put under a root `neuronxcc-<version>` directory.
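To illustrate the point (a hypothetical layout; the exact entry names depend on the compiler version and model):

```
aws-neuron/optimum-neuron-cache
├── ...                     # pre-existing training cache entries, untouched
└── neuronxcc-2.x.y.z/      # version root added automatically by the neuronx cache
    └── MODULE_<hash>/      # one entry per compiled graph
        └── graph.neff      # compiled Neuron executable artifact
```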
docs/source/guides/cache_system.mdx (Outdated)
These parameters are used to compute a hash. This hash is then used to compare local hashes for our training session against hashes stored on the Hugging Face Hub, and act accordingly (download or push).
**It is important to keep in mind that even a small change in the Neuron configuration will trigger a recompilation.**
Suggested change:
- **It is important to keep in mind that even a small change in the Neuron configuration will trigger a recompilation.**
+ **It is important to keep in mind that even a small change in the model configuration will trigger a recompilation.**
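For illustration, a minimal sketch of the hash-and-compare flow described above (the function name and parameter set are hypothetical, not the actual optimum-neuron API):

```python
import hashlib
import json

def compute_cache_hash(neuron_config: dict) -> str:
    # Hypothetical helper: serialize the compilation-relevant parameters
    # deterministically, so that any change in them yields a new hash
    # (and therefore a recompilation on a cache miss).
    serialized = json.dumps(neuron_config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()

local_hash = compute_cache_hash(
    {"model_type": "gpt2", "batch_size": 1, "sequence_length": 1024, "num_cores": 2}
)
# This hash is then compared against the hashes stored in the Hub cache repo:
# download the compiled artifacts on a hit, compile and push on a miss.
```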
import os

def get_hub_cache():
    HUB_CACHE = "aws-neuron/optimum-neuron-cache"
    return os.getenv("CUSTOM_CACHE_REPO", HUB_CACHE)
But that's expected, no? On staging, the `aws-neuron/optimum-neuron-cache` repo does not exist.
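As a usage sketch of the override (the repo name below is illustrative), the environment variable takes precedence over the default:

```python
import os

os.environ["CUSTOM_CACHE_REPO"] = "my-org/my-neuron-cache"  # illustrative repo name
assert get_hub_cache() == "my-org/my-neuron-cache"  # the override wins over the default
```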
This extends the current Hub compiler cache to allow any model compiled with `torch_neuronx` to use it transparently through a dedicated context. The `NeuronModelForCausalLM` class is modified to always use the Hub compiler cache. New API and CLI methods are added to synchronize the local neuronx compiler cache with the Hub compiler cache.
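Per that description, a minimal usage sketch (the model id and compilation arguments are illustrative; `export=True` is the usual optimum-neuron export flag, though the exact signature is not confirmed by this thread):

```python
from optimum.neuron import NeuronModelForCausalLM

# Compilation artifacts are looked up in the Hub compiler cache first;
# only a cache miss triggers a local torch_neuronx compilation, which can
# then be pushed back with the new synchronize API/CLI.
model = NeuronModelForCausalLM.from_pretrained(
    "gpt2",
    export=True,          # compile for Neuron if no cached artifacts match
    batch_size=1,
    sequence_length=1024,
)
```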