
Add Neuronx compile cache proxy and use it for LLM decoder models #410

Merged: 13 commits into main on Jan 17, 2024

Conversation

dacorvo (Collaborator) commented on Jan 15, 2024:

This extends the current Hub compiler cache to allow any model compiled with torch_neuronx to use it transparently through a dedicated context.

The NeuronModelForCausalLM class is modified to always use the Hub compiler cache.

New API and CLI methods are added to synchronize the local neuronx compiler cache with the hub compiler cache.
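The "dedicated context" mentioned in the description can be illustrated with a minimal sketch. This is not the PR's actual implementation: the function name `hub_compile_cache` is hypothetical, and only the `NEURON_COMPILE_CACHE_URL` environment variable (mentioned later in this review thread) is taken from the source. The real context additionally wires in Hub download/upload logic.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def hub_compile_cache(cache_dir=None):
    """Hypothetical sketch: temporarily redirect the neuronx compiler cache.

    torch_neuronx reads the cache location from NEURON_COMPILE_CACHE_URL,
    so pointing that variable at a managed directory for the duration of
    the context is enough for compiled artifacts to land there.
    """
    previous = os.environ.get("NEURON_COMPILE_CACHE_URL")
    cache_dir = cache_dir or tempfile.mkdtemp()
    os.environ["NEURON_COMPILE_CACHE_URL"] = cache_dir
    try:
        yield cache_dir
    finally:
        # Restore whatever was configured before entering the context.
        if previous is None:
            os.environ.pop("NEURON_COMPILE_CACHE_URL", None)
        else:
            os.environ["NEURON_COMPILE_CACHE_URL"] = previous
```

Any model compiled inside the `with` block would then populate the managed directory transparently, which is what lets `NeuronModelForCausalLM` always use the Hub compiler cache without changes to its compilation code path.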

dacorvo force-pushed the neuronx_cc_hub_cache branch 2 times, most recently from 544a734 to c16d996 on January 15, 2024 at 13:29

dacorvo marked this pull request as ready for review on January 15, 2024 at 14:00
dacorvo force-pushed the neuronx_cc_hub_cache branch from c16d996 to f274d5e on January 16, 2024 at 09:14
dacorvo force-pushed the neuronx_cc_hub_cache branch from 6c1ef86 to b78972c on January 16, 2024 at 10:11
michaelbenayoun (Member) left a comment:

Left a few comments, mostly nits.
It looks very promising!

(Resolved review comments on docs/source/guides/cache_system.mdx)
set Set the name of the Neuron cache repo to use locally (trainium only).
add Add a model to the cache of your choice (trainium only).
list List models in a cache repo (trainium only).
synchronize Synchronize local compiler cache with the hub cache.
michaelbenayoun (Member) commented:
Is it for both trainium and inferentia? Otherwise I would specify that it is inferentia only as well.

dacorvo (Collaborator, author) replied:

I wasn't sure: it uploads everything under the directory corresponding to NEURON_COMPILE_CACHE_URL. I will put inferentia for now.

michaelbenayoun (Member) replied:

Alright.

Also, could you copy everything under a dedicated directory in the cache repo for now? Otherwise I do not know how it would collide with the current system.

dacorvo (Collaborator, author) replied:

It does not collide, because the neuronx cached files are automatically put under a root neuronxcc-<version> directory.
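The non-collision argument above can be made concrete with a small sketch. The `neuronxcc-<version>` root directory is from the source; the version number, module name, and the pre-existing training-cache layout shown here are illustrative assumptions.

```python
from pathlib import Path

# Hypothetical cache repo layout. The neuronx compiler automatically
# nests its artifacts under a versioned "neuronxcc-<version>" root,
# so they cannot clash with entries from the pre-existing cache layout.
repo = Path("optimum-neuron-cache")
neuronx_entry = repo / "neuronxcc-2.12.54.0" / "MODULE_abc123" / "model.neff"
training_entry = repo / "inference" / "gpt2" / "config.json"  # illustrative

# The two trees share no top-level directory below the repo root:
assert neuronx_entry.parts[1] != training_entry.parts[1]
```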

(Resolved review comments on optimum/commands/neuron/cache.py, optimum/neuron/utils/hub_neuronx_cache.py, and tests/cache/test_neuronx_cache.py)

These parameters are used to compute a hash. This hash is then used to compare local hashes for our training session against hashes stored on the Hugging Face Hub, and act accordingly (download or push).
**It is important to keep in mind that even a small change in the Neuron configuration will trigger a recompilation.**
michaelbenayoun (Member) suggested a change:

- **It is important to keep in mind that even a small change in the Neuron configuration will trigger a recompilation.**
+ **It is important to keep in mind that even a small change in the model configuration will trigger a recompilation.**
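The hash-based lookup described in the quoted docs can be sketched as follows. This is an illustration, not the actual optimum-neuron implementation: the function name and the parameter set are assumptions; only the idea that the parameters are hashed into a cache key, and that any change forces a recompilation, comes from the source.

```python
import hashlib
import json


def neuron_config_hash(config: dict) -> str:
    """Illustrative sketch: serialize the compilation-relevant parameters
    deterministically, then hash them into a cache key. Comparing local
    keys against keys stored on the Hub decides download vs. recompile."""
    serialized = json.dumps(config, sort_keys=True).encode()
    return hashlib.sha256(serialized).hexdigest()


base = {"model": "gpt2", "batch_size": 1, "sequence_length": 128, "num_cores": 2}
changed = dict(base, sequence_length=256)

# A small change in the configuration yields a different key (recompilation);
# an identical configuration yields the same key (cache hit).
assert neuron_config_hash(base) != neuron_config_hash(changed)
assert neuron_config_hash(base) == neuron_config_hash(dict(base))
```

Sorting the keys before serializing is what makes the key deterministic: two configurations with the same entries in a different order hash identically.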

Comment on lines +179 to +181
def get_hub_cache():
    HUB_CACHE = "aws-neuron/optimum-neuron-cache"
    return os.getenv("CUSTOM_CACHE_REPO", HUB_CACHE)
michaelbenayoun (Member) commented:

But that's expected, no? On staging, the aws-neuron/optimum-neuron-cache repo does not exist.

(Resolved review comment on tests/cache/test_neuronx_cache.py)
dacorvo merged commit f81c365 into main on January 17, 2024 (6 of 7 checks passed), then deleted the neuronx_cc_hub_cache branch at 09:39.