Add tools for auto filling traced models cache #537
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
tools/auto_fill_inference_cache.py
Outdated
compile_and_cache_model(
    hf_model_id=model_id,
    inference_type=model_config["inference_type"],
This field does not exist in the model configs, and I think it will be a bit tedious to add it every single time. It should be deduced from the model_id IMHO.
See for instance the config for llama variants:
So basically, what I am requesting is that the script continues to accept the existing config files with minimal changes.
tools/auto_fill_inference_cache.py
Outdated
@@ -111,23 +203,39 @@ def compile_and_cache_model(hf_model_id, batch_size, sequence_length, num_cores,
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Compile and cache a model to the Hugging Face Hub.")
    parser.add_argument("--hf_model_id", type=str, help="Hugging Face model ID to compile.")
    parser.add_argument(
I think we could deduce this from the model_id.
Yeah, theoretically we can leverage TasksManager.infer_task_from_model, but it's not working for stable diffusion models for now; I made a pull request in Optimum: huggingface/optimum#1793. I can perhaps put a workaround here until it makes its way to a stable Optimum release.
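For context, a minimal sketch of what that call looks like; the model id is just an example from this thread, and the returned value depends on the model's Hub metadata:

```python
from optimum.exporters import TasksManager

model_id = "hf-internal-testing/tiny-random-gpt2"  # example id used later in this thread
# infer_task_from_model looks at the model's metadata on the Hub (or a local
# checkpoint) and returns a task string such as "text-generation" or
# "feature-extraction"; for stable diffusion repos it currently needs the fix
# proposed in huggingface/optimum#1793.
task = TasksManager.infer_task_from_model(model_id)
print(task)
```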
I think that we need to at least specify in the config files that the task is "text-generation" for decoders if we want to keep just one script for encoder / decoder and stable diffusion models. Even with the TasksManager, the default task inferred from e.g. gpt2 is feature-extraction.
I don't think so. The neuronx exporter code works without specifying a task AFAIK, even for gpt2.
def main():
To figure out whether the model is a decoder, I get the task, then the config, all from the model_id.
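A rough sketch of that flow, assuming public transformers / optimum APIs; the helper name and the causal-LM check are my own illustration, not the PR's code:

```python
from optimum.exporters import TasksManager
from transformers import AutoConfig


def looks_like_decoder(model_id: str) -> bool:
    # Step 1: infer the task from the model id (Hub metadata or local checkpoint).
    task = TasksManager.infer_task_from_model(model_id)
    # Step 2: fetch the model config and inspect the declared architectures.
    config = AutoConfig.from_pretrained(model_id)
    architectures = getattr(config, "architectures", None) or []
    # Assumption: a text-generation task or a causal-LM head means "decoder".
    return task == "text-generation" or any(
        arch.endswith(("ForCausalLM", "LMHeadModel")) for arch in architectures
    )
```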
Are you sure? Without specifying the task text-generation, I am getting the following error when trying to export a decoder:
optimum-cli export neuron --model hf-internal-testing/tiny-random-gpt2 --sequence_length 64 --batch_size 1 --auto_cast_type bf16 --num_cores 1 tiny_gpt
Traceback (most recent call last):
File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 616, in <module>
main()
File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 567, in main
input_shapes, neuron_config_class = get_input_shapes_and_config_class(task, args)
File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/neuron/__main__.py", line 119, in get_input_shapes_and_config_class
neuron_config_constructor = TasksManager.get_exporter_config_constructor(
File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/exporters/tasks.py", line 1961, in get_exporter_config_constructor
raise ValueError(
ValueError: gpt2 doesn't support task feature-extraction for the neuron backend. Supported tasks are: text-generation.
Traceback (most recent call last):
File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/bin/optimum-cli", line 8, in <module>
sys.exit(main())
File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/commands/optimum_cli.py", line 163, in main
service.run()
File "/home/ubuntu/pyvenv/aws_neuron_venv2.18_pt212/lib/python3.8/site-packages/optimum/commands/export/neuronx.py", line 266, in run
subprocess.run(full_command, shell=True, check=True)
File "/usr/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python3 -m optimum.exporters.neuron --model hf-internal-testing/tiny-random-gpt2 --sequence_length 64 --batch_size 1 --auto_cast_type bf16 --num_cores 1 tiny_gpt' returned non-zero exit status 1.
For gpt2 it works, but not for the tiny model.
+1 to @dacorvo's comment, LGTM otherwise
# Log time taken
logger.info(f"Compiled and cached model {hf_model_id} in {time.time() - start:.2f} seconds")

def infer_task_from_model_path(model_id: str):
@dacorvo I put a workaround here to deduce if the task is text-generation; in this case we could continue using the previous configs. But I feel it is quite suboptimal, as we are not supposed to be able to infer the task of, let's say, a GPT model from its model id alone, since it could also support tasks like "text-classification" besides "text-generation", although that's not yet the case in Optimum Neuron.
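For readers following the thread, a sketch of what such a workaround might look like; this is a reconstruction under the assumptions discussed above, not the function actually committed in this PR:

```python
from optimum.exporters import TasksManager
from transformers import AutoConfig


def infer_task_from_model_path(model_id: str) -> str:
    """Reconstruction for illustration only, not the PR's implementation.

    Treat models whose Hub config declares a causal-LM architecture as
    "text-generation"; for everything else (encoders, stable diffusion),
    defer to TasksManager, whose stable diffusion support is being fixed
    in huggingface/optimum#1793.
    """
    try:
        config = AutoConfig.from_pretrained(model_id)
    except (OSError, ValueError):
        # e.g. stable diffusion repos expose model_index.json rather than a
        # single config.json, so AutoConfig cannot load them directly.
        return TasksManager.infer_task_from_model(model_id)
    architectures = getattr(config, "architectures", None) or []
    if any(arch.endswith(("ForCausalLM", "LMHeadModel")) for arch in architectures):
        return "text-generation"
    return TasksManager.infer_task_from_model(model_id)
```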
OK, so I should update the inference cache config files then.
Thank you for the pull request!
What does this PR do?
Tests
Test with config file
Command:
Test to cache a single model
Next steps
aws-neuron/optimum-neuron-cache