Add tools for auto filling traced models cache #537

Merged (12 commits into main from traced-cache-tools, Apr 3, 2024)
Conversation

@JingyaHuang (Collaborator) commented Mar 28, 2024

What does this PR do?

  • Add a tool to populate the cache of traced models.
  • Construct the configs to hash in the same way as the export does, to avoid hash key mismatches (see the sketch below).

Tests

  • Test with config file

    {
      "hf-internal-testing/tiny-stable-diffusion-torch": [
        { "batch_size": 1, "height": 64, "width": 64, "num_images_per_prompt": 1, "auto_cast": "matmul", "auto_cast_type": "bf16" }
      ],
      "hf-internal-testing/tiny-random-gpt2": [
        { "batch_size": 1, "sequence_length": 512, "num_cores": 1, "auto_cast_type": "fp16" }
      ],
      "hf-internal-testing/tiny-random-BertModel": [
        { "task": "text-classification", "batch_size": 1, "sequence_length": 16, "auto_cast": "matmul", "auto_cast_type": "fp16" }
      ]
    }


    Command:

    python tools/auto_fill_inference_cache.py --config_file inference_cache_test_config.json
  • Test to cache a single model

    • Encoder
    python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-random-BertModel --task text-classification --batch_size 1 --sequence_length 64 --auto_cast matmul --auto_cast_type bf16
    
    • Decoder
    python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-random-gpt2 --batch_size 1 --sequence_length 512 --num_cores 1 --auto_cast_type bf16
    
    • Stable Diffusion
    python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-stable-diffusion-torch --batch_size 1 --height 64 --width 64 --auto_cast matmul --auto_cast_type bf16
    

Next steps

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Review threads on files:

  • optimum/exporters/neuron/__main__.py (resolved)
  • tools/auto_fill_stable_diffusion_cache.py (3 threads, outdated, resolved)
  • tools/auto_fill_inference_cache.py (outdated, resolved)
compile_and_cache_model(
    hf_model_id=model_id,
    inference_type=model_config["inference_type"],
Collaborator:
This field does not exist in the model configs, and I think it will be a bit tedious to add it every single time. It should be deduced from the model_id IMHO.

See for instance the config for llama variants:

https://huggingface.co/aws-neuron/optimum-neuron-cache/blob/main/inference-cache-config/llama-variants.json

Collaborator:

So basically, what I am requesting is that the script continues to accept the existing config files with minimal changes.

@@ -111,23 +203,39 @@ def compile_and_cache_model(hf_model_id, batch_size, sequence_length, num_cores,
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Compile and cache a model to the Hugging Face Hub.")
    parser.add_argument("--hf_model_id", type=str, help="Hugging Face model ID to compile.")
    parser.add_argument(
Collaborator:

I think we could deduce this from the model_id.

Collaborator (Author):

Yeah, theoretically we can leverage TasksManager.infer_task_from_model, but it does not work for stable diffusion models at the moment; I made a pull request in Optimum: huggingface/optimum#1793. I can perhaps put a workaround here until that makes its way into a stable Optimum release.

Collaborator (Author):

I think that we need to at least specify in the config files that the task is "text-generation" for decoders if we want to keep just one script for encoder / decoder and stable diffusion. Even with the TasksManager, the default task inferred from e.g. gpt2 is feature-extraction.

Collaborator:

I don't think so. The neuronx exporter code works without specifying a task AFAIK, even for gpt2.


To figure out whether the model is a decoder, I get the task, then the config, all from the model_id.

Collaborator (Author):

Are you sure? Without specifying the task text-generation, I am getting the following error when trying to export a decoder:

optimum-cli export neuron --model hf-internal-testing/tiny-random-gpt2 --sequence_length 64 --batch_size 1 --auto_cast_type bf16 --num_cores 1 tiny_gpt
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 616, in <module>
    main()
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 567, in main
    input_shapes, neuron_config_class = get_input_shapes_and_config_class(task, args)
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 119, in get_input_shapes_and_config_class
    neuron_config_constructor = TasksManager.get_exporter_config_constructor(
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/tasks.py", line 1961, in get_exporter_config_constructor
    raise ValueError(
ValueError: gpt2 doesn't support task feature-extraction for the neuron backend. Supported tasks are: text-generation.
Traceback (most recent call last):
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/commands/export/neuronx.py", line 266, in run
    subprocess.run(full_command, shell=True, check=True)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model hf-internal-testing/tiny-random-gpt2 --sequence_length 64 --batch_size 1 --auto_cast_type bf16 --num_cores 1 tiny_gpt' returned non-zero exit status 1.

Collaborator:

For gpt2 it works, but not for the tiny model.

@michaelbenayoun (Member) left a comment:

+1 to @dacorvo's comment, LGTM otherwise


# Log time taken
logger.info(f"Compiled and cached model {hf_model_id} in {time.time() - start:.2f} seconds")


def infer_task_from_model_path(model_id: str):
@JingyaHuang (Collaborator, Author) commented Apr 3, 2024:

@dacorvo I put a workaround here to deduce whether the task is text-generation; in that case we can keep using the previous configs. It still feels suboptimal, though, since we are not really supposed to be able to infer the task of, say, gpt2 from its model id alone: it could also support tasks like "text-classification" besides "text-generation", even if that is not yet the case in Optimum Neuron.

Collaborator:

OK, so I should update the inference cache config files then.

@dacorvo (Collaborator) left a comment:

Thank you for the pull request!

@JingyaHuang merged commit 6856557 into main on Apr 3, 2024 (14 checks passed).
@JingyaHuang deleted the traced-cache-tools branch on April 3, 2024 at 19:15.