
Add neuronx cache registry #442

Merged · 9 commits · Jan 26, 2024

Conversation

@dacorvo (Collaborator) commented Jan 25, 2024

This extends the neuronx caching for decoder models to store the cached neuron configurations under a specific registry folder for each neuronxcc compiler version.

aws-neuron/optimum-neuron-cache/                                                
└── neuronxcc-2.12.54.0+f631c2365
    └── registry
       ├── hf-internal-testing
       │   └── tiny-random-gpt2
       │       └── 8019d93dd8eda6d8da82.json
       └── mistralai
            └── Mistral-7B-Instruct-v0.1
                └── b12f53f65aebf02fdfa9.json
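The layout above can be sketched in code. The following is a minimal, hypothetical illustration of how an entry could be written under that registry structure; the hashed-filename scheme and the helper names (`registry_entry_path`, `store_entry`) are assumptions for illustration, not the actual optimum-neuron implementation.

```python
# Hypothetical sketch of storing a cached neuron configuration under the
# registry layout: <root>/neuronxcc-<version>/registry/<org>/<model>/<hash>.json
# The 20-hex-char hash of the config used as the filename is an assumption.
import hashlib
import json
from pathlib import Path


def registry_entry_path(cache_root: Path, compiler_version: str,
                        model_id: str, config: dict) -> Path:
    # Derive a stable filename from the serialized configuration.
    digest = hashlib.sha256(
        json.dumps(config, sort_keys=True).encode()
    ).hexdigest()[:20]
    return (cache_root / f"neuronxcc-{compiler_version}" / "registry"
            / model_id / f"{digest}.json")


def store_entry(cache_root: Path, compiler_version: str,
                model_id: str, config: dict) -> Path:
    # Create the per-compiler-version registry folders and write the entry.
    path = registry_entry_path(cache_root, compiler_version, model_id, config)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2))
    return path
```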

This also adds a get_hub_cached_entries helper and a CLI command to look up cached configurations for a specific model_id.

The output looks like this:

$ optimum-cli neuron cache lookup mistralai/Mistral-7B-Instruct-v0.1
*** 1 entrie(s) found in cache for mistralai/Mistral-7B-Instruct-v0.1 ***


task: text-generation
batch_size: 1
num_cores: 2
auto_cast_type: bf16
sequence_length: 2048
compiler_type: neuronx-cc
compiler_version: 2.12.54.0+f631c2365
checkpoint_id: mistralai/Mistral-7B-Instruct-v0.1
checkpoint_revision: 9ab9e76e2b09f9f29ea2d56aa5bd139e4445c59e
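A lookup over that registry can be sketched as follows. This is a hypothetical, self-contained illustration of what a helper like get_hub_cached_entries could do over the on-disk layout (scan every compiler-version folder for JSON entries under the given model_id); the function name `lookup_cached_entries` and the local-filesystem scan are assumptions, since the real helper queries the Hub cache repository.

```python
# Hypothetical sketch: collect all cached configuration entries for a model_id
# by globbing the registry layout shown earlier in the PR description.
import json
from pathlib import Path


def lookup_cached_entries(cache_root: Path, model_id: str) -> list:
    # Match entries across all neuronxcc compiler-version folders.
    pattern = f"neuronxcc-*/registry/{model_id}/*.json"
    return [json.loads(f.read_text())
            for f in sorted(cache_root.glob(pattern))]
```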

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Comment on lines +609 to +611
preprocess_logits_for_metrics=(
preprocess_logits_for_metrics if training_args.do_eval and not is_torch_tpu_available() else None
),
Member:
Is it just formatting?
Anyway, those examples are automatically generated from the Transformers library, so this change will be overridden when we synchronize with the next release.

Collaborator (Author):
If I don't apply them, the check-code-quality check fails. I could exclude the examples from it, though...

Member:
No worries, I just wanted to check if the change was only about styling. After reading the rest of the PR I understood it was automatic reformatting.

optimum/neuron/utils/hub_neuronx_cache.py (review thread resolved)
Co-authored-by: Michael Benayoun <mickbenayoun@gmail.com>
@michaelbenayoun (Member) left a comment:

LGTM thanks!

@dacorvo dacorvo merged commit 0f7bf4a into main Jan 26, 2024
8 checks passed
@dacorvo dacorvo deleted the cache_registry branch January 26, 2024 15:43
3 participants