Do not upload NeuronModelForCausalLM weights when they can be reconstructed from the hub #413

dacorvo · 2024-01-17T14:26:19Z

When serializing a model, the checkpoint files (i.e. the original
weights formatted for neuron) and the compiled artifacts are stored
in two separate folders in the model directory.
This modifies the NeuronModelForCausalLM.push_to_hub method to exclude
the folder containing the checkpoint files if they can instead be
reconstructed from a model on the hub.
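
As a rough illustration of the exclusion rule described above, the sketch below selects the files of a saved model directory that would be uploaded. It is not the code from this PR: the folder names `checkpoint` and `compiled` and the helper `files_to_upload` are assumptions made for the example.

```python
from pathlib import Path
from typing import List, Optional

# Assumed folder names for this sketch; the PR description only says that
# checkpoint files and compiled artifacts live in two separate folders.
CHECKPOINT_DIR = "checkpoint"
COMPILED_DIR = "compiled"


def files_to_upload(model_dir, checkpoint_id: Optional[str]) -> List[str]:
    """List the files to upload, relative to model_dir (hypothetical helper).

    The checkpoint folder is skipped only when checkpoint_id is set, i.e.
    when the weights can be reconstructed from a model on the hub.
    """
    root = Path(model_dir)
    selected = []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root)
        if checkpoint_id is not None and rel.parts[0] == CHECKPOINT_DIR:
            continue
        selected.append(rel.as_posix())
    return sorted(selected)
```

With a `checkpoint_id` the checkpoint folder is left out; with `None` everything in the directory is selected, which corresponds to the case where the weights cannot be fetched from the hub.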

Note that it is still possible to upload the checkpoint files by
running huggingface-cli directly on the saved model folder instead of
calling push_to_hub.

To allow optimum-neuron to fetch and reconstruct the checkpoint when
fetching a neuron model from the hub, the original repository that was
used to export the model and its revision are stored in the neuron config.

Example model here:

https://huggingface.co/dacorvo/llama-2-7b-chat-hf-neuronx-bs1-seq2048-no-checkpoint/tree/main

"neuron": {
  "auto_cast_type": "fp16",
  "batch_size": 1,
  "checkpoint_id": "meta-llama/Llama-2-7b-chat-hf",
  "checkpoint_revision": "c1b0db933684edbfe29a06fa47eb19cc48025e93",
  "compiler_type": "neuronx-cc",
  "compiler_version": "2.12.54.0+f631c2365",
  "num_cores": 2,
  "sequence_length": 2048,
  "task": "text-generation"
}
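
To make the role of these two fields concrete, here is a minimal sketch of how a loader might decide where to fetch the original weights from. The helper name `checkpoint_source` is hypothetical and this is not the optimum-neuron implementation; only the keys `checkpoint_id` and `checkpoint_revision` come from the config above.

```python
from typing import Optional, Tuple


def checkpoint_source(neuron_config: dict) -> Optional[Tuple[str, Optional[str]]]:
    """Return (repo_id, revision) if the checkpoint can be rebuilt from the hub.

    Hypothetical helper: returns None when no source repository is recorded,
    in which case the checkpoint files must ship with the model itself.
    """
    checkpoint_id = neuron_config.get("checkpoint_id")
    if checkpoint_id is None:
        return None
    return checkpoint_id, neuron_config.get("checkpoint_revision")


# Values copied from the example neuron config above.
example = {
    "checkpoint_id": "meta-llama/Llama-2-7b-chat-hf",
    "checkpoint_revision": "c1b0db933684edbfe29a06fa47eb19cc48025e93",
}
source = checkpoint_source(example)
```

Pinning the revision alongside the repository id means the reconstructed weights match the ones the compiled artifacts were exported from, even if the source repository is updated later.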

HuggingFaceDocBuilderDev · 2024-01-17T14:29:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

michaelbenayoun

LGTM

JingyaHuang

LGTM!

JingyaHuang · 2024-01-18T10:40:22Z

tests/generation/test_hub.py

+    "model_id, revision",
+    [
+        ["dacorvo/tiny-random-gpt2-neuronx", "1b3456cf877cc42c053ee8464f1067021eccde4b"],
+        ["dacorvo/tiny-random-gpt2-neuronx-no-checkpoint", "78eb2313ab7e149bbc22ff32257db93ba09e3033"],


For the test with no checkpoint, should we check that there is no checkpoint after pushing to the hub?

This test only downloads the model: the check is done in the other test.

dacorvo added 3 commits January 17, 2024 14:33

refactor(decoder): use huggingface_hub.get_token

49e5fac

refactor(decoder): prepare to exclude checkpoints from serialization

68a852b

test(hub): use common test helper

25d00d0

dacorvo force-pushed the do_not_upload_model_weights branch from 298c840 to 87c7341 Compare January 17, 2024 14:42

refactor(decoder): add require helper

f07d0b1

dacorvo force-pushed the do_not_upload_model_weights branch from 87c7341 to f07d0b1 Compare January 17, 2024 14:52

tests(pipelines): update model revision

71fed25

dacorvo requested review from JingyaHuang and michaelbenayoun January 17, 2024 20:27

dacorvo marked this pull request as ready for review January 17, 2024 20:27

michaelbenayoun approved these changes Jan 18, 2024

JingyaHuang approved these changes Jan 18, 2024

dacorvo merged commit c60935b into main Jan 18, 2024
6 of 7 checks passed

dacorvo deleted the do_not_upload_model_weights branch January 18, 2024 10:49

dacorvo mentioned this pull request Jan 30, 2024

Throttling issue when uploading compiled LLaMA 2 model for inference to HF Hub #311

Closed
