Add Neuronx compile cache proxy and use it for LLM decoder models #410
Conversation
Left a few comments, mostly nits.
It looks very promising!
docs/source/guides/cache_system.mdx (Outdated)
set           Set the name of the Neuron cache repo to use locally (trainium only).
add           Add a model to the cache of your choice (trainium only).
list          List models in a cache repo (trainium only).
synchronize   Synchronize local compiler cache with the hub cache.
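For context, a usage sketch of these subcommands (assuming the `optimum-cli neuron cache` entry point this docs file describes; the repo name is illustrative):

```bash
# Point the local setup at a Hub cache repo (illustrative repo name)
optimum-cli neuron cache set my-org/my-neuron-cache

# Push the local neuronx compiler cache to that repo
optimum-cli neuron cache synchronize
```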
Is it for both trainium and inferentia? Otherwise, I would specify that it's inferentia-only as well.
I wasn't sure: it uploads everything under the directory corresponding to `NEURON_COMPILE_CACHE_URL`. I will mark it as inferentia-only for now.
Alright.
Also, could you copy everything under a dedicated directory in the cache repo for now? Otherwise I do not know whether it will collide with the current system.
It does not collide, because the neuronx cached files are automatically put under a root `neuronxcc-<version>` directory.
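To illustrate the point (a hypothetical layout; the exact entry names depend on the compiler version and model):

```
aws-neuron/optimum-neuron-cache
├── ...                     # pre-existing training cache entries, untouched
└── neuronxcc-2.x.y.z/      # version root added automatically by the neuronx cache
    └── MODULE_<hash>/      # one entry per compiled graph
        └── graph.neff      # compiled Neuron executable artifact
```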
docs/source/guides/cache_system.mdx (Outdated)
These parameters are used to compute a hash. This hash is then used to compare local hashes for our training session against hashes stored on the Hugging Face Hub, and act accordingly (download or push).
**It is important to keep in mind that even a small change in the Neuron configuration will trigger a recompilation.**
Suggested change:
- **It is important to keep in mind that even a small change in the Neuron configuration will trigger a recompilation.**
+ **It is important to keep in mind that even a small change in the model configuration will trigger a recompilation.**
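For illustration, a minimal sketch of the hash-and-compare flow described above (the function name and parameter set are hypothetical, not the actual optimum-neuron API):

```python
import hashlib
import json

def compute_cache_hash(neuron_config: dict) -> str:
    # Hypothetical helper: serialize the compilation-relevant parameters
    # deterministically, so that any change in them yields a new hash
    # (and therefore a recompilation on a cache miss).
    serialized = json.dumps(neuron_config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(serialized).hexdigest()

local_hash = compute_cache_hash(
    {"model_type": "gpt2", "batch_size": 1, "sequence_length": 1024, "num_cores": 2}
)
# This hash is then compared against the hashes stored in the Hub cache repo:
# download the compiled artifacts on a hit, compile and push on a miss.
```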
import os

def get_hub_cache():
    HUB_CACHE = "aws-neuron/optimum-neuron-cache"
    return os.getenv("CUSTOM_CACHE_REPO", HUB_CACHE)
But that's expected, no? On staging, the `aws-neuron/optimum-neuron-cache` repo does not exist.
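As a usage sketch of the override (the repo name below is illustrative), the environment variable takes precedence over the default:

```python
import os

os.environ["CUSTOM_CACHE_REPO"] = "my-org/my-neuron-cache"  # illustrative repo name
assert get_hub_cache() == "my-org/my-neuron-cache"  # the override wins over the default
```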
This extends the current Hub compiler cache to allow any model compiled with `torch_neuronx` to use it transparently through a dedicated context. The `NeuronModelForCausalLM` class is modified to always use the Hub compiler cache. New API and CLI methods are added to synchronize the local neuronx compiler cache with the Hub compiler cache.
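Per that description, a minimal usage sketch (the model id and compilation arguments are illustrative; `export=True` is the usual optimum-neuron export flag, though the exact signature is not confirmed by this thread):

```python
from optimum.neuron import NeuronModelForCausalLM

# Compilation artifacts are looked up in the Hub compiler cache first;
# only a cache miss triggers a local torch_neuronx compilation, which can
# then be pushed back with the new synchronize API/CLI.
model = NeuronModelForCausalLM.from_pretrained(
    "gpt2",
    export=True,          # compile for Neuron if no cached artifacts match
    batch_size=1,
    sequence_length=1024,
)
```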