Add tools for auto filling traced models cache #537

Merged (12 commits into main from traced-cache-tools, Apr 3, 2024)
Conversation

@JingyaHuang (Collaborator) commented Mar 28, 2024

What does this PR do?

  • Add a tool to populate the cache of traced models.
  • Construct the configs to hash in the same way as the export does, to avoid hash key mismatches (see the sketch below).

Tests

  • Test with config file

    {
      "hf-internal-testing/tiny-stable-diffusion-torch": [
        { "batch_size": 1, "height": 64, "width": 64, "num_images_per_prompt": 1, "auto_cast": "matmul", "auto_cast_type": "bf16" }
      ],
      "hf-internal-testing/tiny-random-gpt2": [
        { "batch_size": 1, "sequence_length": 512, "num_cores": 1, "auto_cast_type": "fp16" }
      ],
      "hf-internal-testing/tiny-random-BertModel": [
        { "task": "text-classification", "batch_size": 1, "sequence_length": 16, "auto_cast": "matmul", "auto_cast_type": "fp16" }
      ]
    }


    Command:

    python tools/auto_fill_inference_cache.py --config_file inference_cache_test_config.json
  • Test to cache a single model

    • Encoder
    python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-random-BertModel --task text-classification --batch_size 1 --sequence_length 64 --auto_cast matmul --auto_cast_type bf16
    
    • Decoder
    python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-random-gpt2 --batch_size 1 --sequence_length 512 --num_cores 1 --auto_cast_type bf16
    
    • Stable Diffusion
    python tools/auto_fill_inference_cache.py --hf_model_id hf-internal-testing/tiny-stable-diffusion-torch --batch_size 1 --height 64 --width 64 --auto_cast matmul --auto_cast_type bf16
    

Next steps

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Review threads on files:

  • optimum/exporters/neuron/__main__.py (resolved)
  • tools/auto_fill_stable_diffusion_cache.py (3 threads, outdated, resolved)
  • tools/auto_fill_inference_cache.py (outdated, resolved)
compile_and_cache_model(
    hf_model_id=model_id,
    inference_type=model_config["inference_type"],
Collaborator:
This field does not exist in the model configs, and I think it will be a bit tedious to add it every single time. It should be deduced from the model_id IMHO.

See for instance the config for llama variants:

https://huggingface.co/aws-neuron/optimum-neuron-cache/blob/main/inference-cache-config/llama-variants.json

Collaborator:

So basically, what I am requesting is that the script continues to accept the existing config files with minimal changes.

@@ -111,23 +203,39 @@ def compile_and_cache_model(hf_model_id, batch_size, sequence_length, num_cores,
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Compile and cache a model to the Hugging Face Hub.")
    parser.add_argument("--hf_model_id", type=str, help="Hugging Face model ID to compile.")
    parser.add_argument(
Collaborator:

I think we could deduce this from the model_id.

Collaborator (Author):

Yeah, theoretically we can leverage TasksManager.infer_task_from_model, but it does not work for stable diffusion models at the moment; I made a pull request in Optimum: huggingface/optimum#1793. I can perhaps put a workaround here until that makes its way into a stable Optimum release.

Collaborator (Author):

I think that we need to at least specify in the config files that the task is "text-generation" for decoders if we want to keep just one script for encoder / decoder and stable diffusion. Even with the TasksManager, the default task inferred from e.g. gpt2 is feature-extraction.

Collaborator:

I don't think so. The neuronx exporter code works without specifying a task AFAIK, even for gpt2.


To figure out whether the model is a decoder, I get the task, then the config, all from the model_id.

Collaborator (Author):

Are you sure? Without specifying the task text-generation, I am getting the following error when trying to export a decoder:

optimum-cli export neuron --model hf-internal-testing/tiny-random-gpt2 --sequence_length 64 --batch_size 1 --auto_cast_type bf16 --num_cores 1 tiny_gpt
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 616, in <module>
    main()
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 567, in main
    input_shapes, neuron_config_class = get_input_shapes_and_config_class(task, args)
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 119, in get_input_shapes_and_config_class
    neuron_config_constructor = TasksManager.get_exporter_config_constructor(
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/tasks.py", line 1961, in get_exporter_config_constructor
    raise ValueError(
ValueError: gpt2 doesn't support task feature-extraction for the neuron backend. Supported tasks are: text-generation.
Traceback (most recent call last):
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/bin/optimum-cli", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/commands/optimum_cli.py", line 163, in main
    service.run()
  File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/commands/export/neuronx.py", line 266, in run
    subprocess.run(full_command, shell=True, check=True)
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model hf-internal-testing/tiny-random-gpt2 --sequence_length 64 --batch_size 1 --auto_cast_type bf16 --num_cores 1 tiny_gpt' returned non-zero exit status 1.

Collaborator:

For gpt2 it works, but not for the tiny model.

@michaelbenayoun (Member) left a comment:

+1 to @dacorvo's comment, LGTM otherwise


# Log time taken
logger.info(f"Compiled and cached model {hf_model_id} in {time.time() - start:.2f} seconds")


def infer_task_from_model_path(model_id: str):
@JingyaHuang (Collaborator, Author) commented Apr 3, 2024:

@dacorvo I put a workaround here to deduce whether the task is text-generation; in that case we can keep using the previous configs. It still feels suboptimal, though, since we are not really supposed to be able to infer the task of, say, gpt2 from its model id alone: it could also support tasks like "text-classification" besides "text-generation", even if that is not yet the case in Optimum Neuron.

Collaborator:

OK, so I should update the inference cache config files then.

@dacorvo (Collaborator) left a comment:

Thank you for the pull request!

@JingyaHuang merged commit 6856557 into main on Apr 3, 2024 (14 checks passed).
@JingyaHuang deleted the traced-cache-tools branch on April 3, 2024 at 19:15.