Export to ExecuTorch: Initial Integration #2090
base: main
Conversation
return self.et_model.forward(input_ids, cache_position)[0]

@classmethod
def _from_pretrained(
If I understand the example here https://github.com/huggingface/optimum/tree/7e8d857d1ed6be32046324bf8f424690f116b4e9?tab=readme-ov-file#run-the-exported-model-using-onnx-runtime correctly, this method is used to load an exported model from the local file system, correct?
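For comparison, a minimal sketch of the ONNX Runtime flow from that README (the directory path is a placeholder): the public from_pretrained call is what ends up dispatching to _from_pretrained when it points at an already-exported model.

# Minimal sketch (assumption: "./onnx_dir" already contains a model exported
# with `optimum-cli export onnx`). Loading from a local directory like this
# goes through the class's _from_pretrained under the hood.
from optimum.onnxruntime import ORTModelForSequenceClassification

model = ORTModelForSequenceClassification.from_pretrained("./onnx_dir")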
def generate(
    self,
    prompt_tokens: List[int],
) -> List[int]:
    """
    Generate a sequence of token ids using the ExecuTorchModule.

    `pipeline()` is where everything is put together. It consists of the tokenizer for encoding the inputs and decoding the model-generated outputs.
    """
    pass
Idk if this method belongs to this class. This should be the place where we execute the model with delegated kernel libraries via pybinding. For ONNX, it seems to be the ort.InferenceSession (I can't find its definition anywhere).
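For reference, this is roughly how the ONNX Runtime counterpart executes an exported graph (file path and input names below are placeholders for illustration):

# Rough ONNX Runtime counterpart: ort.InferenceSession is the object that
# actually executes the exported graph.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("model.onnx")            # placeholder path
dummy_input = {"input_ids": np.ones((1, 8), dtype=np.int64)}  # names depend on the exported model
logits = session.run(None, dummy_input)[0]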
def load_executorch_pipeline(
    model,
    targeted_task,
    load_tokenizer,
    tokenizer,
    feature_extractor,
    load_feature_extractor,
    SUPPORTED_TASKS,
    subfolder: str = "",
    token: Optional[Union[bool, str]] = None,
    revision: str = "main",
    model_kwargs: Optional[Dict[str, Any]] = None,
    config: AutoConfig = None,
    **kwargs,
):
    raise NotImplementedError("Executorch pipeline is not implemented yet.")


MAPPING_LOADING_FUNC = {
    "ort": load_ort_pipeline,
    "bettertransformer": load_bettertransformer,
    "executorch": load_executorch_pipeline,
}
Is this pipeline ever used? If yes, can you show me examples?
The example on the landing page uses the transformers pipeline directly: https://github.com/huggingface/optimum?tab=readme-ov-file#run-the-exported-model-using-onnx-runtime. I fail to understand how it is possible to run the exported model with the transformers pipeline: the exported model may have a modified signature that differs from eager, which won't work with the eager generate, I think?
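For context, the ONNX Runtime path referenced above works roughly like this (the model id is just an example); the question is whether an equivalent integration point makes sense for ExecuTorch given the modified signature:

# Sketch of how ONNX Runtime models plug into the transformers pipeline today.
from transformers import AutoTokenizer, pipeline
from optimum.onnxruntime import ORTModelForCausalLM

ort_model = ORTModelForCausalLM.from_pretrained("gpt2", export=True)  # example model id
tokenizer = AutoTokenizer.from_pretrained("gpt2")
generator = pipeline("text-generation", model=ort_model, tokenizer=tokenizer)
print(generator("Hello, my name is", max_new_tokens=20)[0]["generated_text"])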
I'll let @echarlaix answer here.
The pipelines should use the proper auto model class for Optimum. About your question on generate: that is true, I think we fail when we do not support exactly the same features as Transformers. Is that true, @echarlaix?
Yes, with the current integration ExecuTorch models will not be compatible with transformers pipelines, which is not a big deal as I don't think many people use pipelines with optimum models.
Fixed the export path.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Left a few comments. I have to say this is really great work, since it is not easy to go from our main ONNX codebase and generalize!
Do you think we could also write little tests in the same spirit as what's done for ONNX, where we validate the exported model against a vanilla PyTorch model?
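A sketch of what such a test could look like (the ExecuTorch-side class comes from this PR, but the constructor arguments and forward signature used here are assumptions; the model id is just an example):

# Hypothetical parity test: compare logits from the exported ExecuTorch model
# against the eager transformers model on the same prompt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def test_exported_model_matches_eager(model_id="hf-internal-testing/tiny-random-LlamaForCausalLM"):
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    eager = AutoModelForCausalLM.from_pretrained(model_id).eval()

    input_ids = tokenizer("Hello world", return_tensors="pt").input_ids
    with torch.no_grad():
        eager_logits = eager(input_ids).logits

    # Assumed API from this PR: export on the fly and run via the pybinding.
    from optimum.executorchruntime.modeling_executorch import ExecuTorchModelForCausalLM

    et_model = ExecuTorchModelForCausalLM.from_pretrained(model_id, export=True, recipe="xnnpack")
    cache_position = torch.arange(input_ids.shape[1])
    et_logits = et_model.forward(input_ids, cache_position)

    torch.testing.assert_close(et_logits, eager_logits, rtol=1e-3, atol=1e-3)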
"--recipe", | ||
type=str, | ||
default="xnnpack", | ||
help='Pre-defined recipes for export to ExecuTorch. Defaults to "xnnpack".', |
nit: can we link to the list of available recipes?
Would it make sense to limit the possibilities by adding a set of allowed choices?
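One possible way to do both, sketched with a standalone argparse parser; the recipe list here is a placeholder (only "xnnpack" is known from this PR) and would ideally come from the recipe registry:

import argparse

# Hypothetical sketch: restrict --recipe to a known set and list it in the help.
SUPPORTED_RECIPES = ("xnnpack",)  # placeholder; ideally derived from the recipe registry

parser = argparse.ArgumentParser()
parser.add_argument(
    "--recipe",
    type=str,
    default="xnnpack",
    choices=SUPPORTED_RECIPES,
    help=f"Pre-defined recipe for export to ExecuTorch ({', '.join(SUPPORTED_RECIPES)}). Defaults to 'xnnpack'.",
)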
):
    super().__init__(model, config)
    self.et_model = model
    print(f"DEBUG all static methods: {self.et_model.method_names()}")
nit: forgotten print?
if use_auth_token is not None:
    warnings.warn(
        "The `use_auth_token` argument is deprecated and will be removed soon. Please use the `token` argument instead.",
        FutureWarning,
    )
    if token is not None:
        raise ValueError("You cannot use both `use_auth_token` and `token` arguments at the same time.")
    token = use_auth_token
Should we support use_auth_token, or just skip it since it has been deprecated?
It was not yet deprecated in Transformers 4.26.0, which we officially support, but I just wanted to raise the question @echarlaix
    raise RuntimeError(f"The recipe '{recipe}' isn't registered. Detailed error: {e}")

executorch_prog = recipe_func(model, task, **kwargs)
# print(f"Exported program: {executorch_prog.exported_program().graph}")
Suggested change (remove the commented-out debug line):
- # print(f"Exported program: {executorch_prog.exported_program().graph}")
full_path = os.path.join(f"{output_dir}", "model.pte")
with open(full_path, "wb") as f:
    executorch_prog.write_to_file(f)
print(f"Saved exported program to {full_path}")
Can you replace this print with a logger.info, please?
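A sketch of the suggested change (the enclosing function name is hypothetical; the body mirrors the snippet above):

import logging
import os

logger = logging.getLogger(__name__)


def save_program(executorch_prog, output_dir):
    # Same behavior as above, but the status message goes through the logger.
    full_path = os.path.join(output_dir, "model.pte")
    with open(full_path, "wb") as f:
        executorch_prog.write_to_file(f)
    logger.info(f"Saved exported program to {full_path}")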
    task: str,
    **kwargs,
):
    print(f"DEBUG: model={model}, task={task}, kwargs={kwargs}")
print(f"DEBUG: model={model}, task={task}, kwargs={kwargs}") |
(On the load_executorch_pipeline hunk shown earlier) This file should be removed, right?
return self.et_model.forward((input_ids, cache_position))[0]

@classmethod
def from_pretrained(
This should be _from_pretrained, which is used to load the model once exported. Suggested change:
- def from_pretrained(
+ def _from_pretrained(
See optimum/optimum/modeling_base.py, line 436 (at a7a807c):
from_pretrained_method = cls._from_transformers if export else cls._from_pretrained
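For illustration, a rough sketch of what an ExecuTorch _from_pretrained could look like so that the dispatch above picks it up when export=False. The argument list is heavily trimmed, and _load_for_executorch is ExecuTorch's pybinding loader; the exact module path and signature should be double-checked:

# Hypothetical sketch, not the PR's actual implementation.
from pathlib import Path

from executorch.extension.pybindings.portable_lib import _load_for_executorch


class ExecuTorchModelForCausalLM:  # skeleton only, for illustration
    def __init__(self, et_module, config=None):
        self.et_model = et_module
        self.config = config

    @classmethod
    def _from_pretrained(cls, model_dir_path, config=None, **kwargs):
        # Load the already-exported .pte program via the ExecuTorch pybinding.
        et_module = _load_for_executorch(str(Path(model_dir_path) / "model.pte"))
        return cls(et_module, config=config)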
def from_pretrained(
    cls,
    model_dir_path: Union[str, Path],
    task: str,
I think we can remove it; it will always be "text-generation" in this case. Suggested change:
- task: str,
    cls,
    model_dir_path: Union[str, Path],
    task: str,
    recipe: str,
This can be removed as well, I think, as it will only be used for export. Suggested change:
- recipe: str,
What does this PR do?
This PR is the first initiative to create an e2e path for "Export to ExecuTorch". In this very first revision, I'm focusing on outlining the skeleton of the integration work:

Env setup
- ./setup.py: Specify the new dependencies needed to "Export to ExecuTorch". In this case, that means adding the new pip package executorch (Beta version) and requiring the latest released transformers (models from older versions may not work with ExecuTorch).

Export AOT (Ahead-of-time)
- ./optimum/commands/export/executorch.py: Supports running export with the ExecuTorch backend via the CLI.
- ./optimum/exporters/executorch/: Main entry point to export via ExecuTorch. converter.py defines the common workflow to export a transformers model from 🤗 to ExecuTorch. A CausalLM with cache must be loaded with generation_config and exported via transformers.integrations.executorch.

Run with ExecuTorch model
- optimum/executorchruntime/modeling_executorch.py defines the Python classes that wrap the Transformers AutoModelForXXX classes. We start with ExecuTorchModelForCausalLM in this file, which inherits from the base OptimizedModel and overrides all abstract methods. This is where the ExecuTorch pybinding and the runtime get integrated.
- optimum/pipelines/pipelines_base.py defines a pipeline for ExecuTorch where everything is put together, deciding which model/task/recipe to load and run. It also holds the tokenizers paired with the model for encoding/decoding.

CC: @mergennachin
Tests

Export AOT (Ahead-of-time)
Export to ExecuTorch via CLI:
optimum-cli export executorch --model "meta-llama/Llama-3.2-1B" --task "text-generation" --recipe "xnnpack" --output_dir="meta_llama3_2_1b"
It generates the ExecuTorch model at meta_llama3_2_1b/model.pte.

Run with ExecuTorch model (model.pte)
python test_executorch.py essentially runs the following two lines of code, and we got:
"Hey, can you tell me any fun things to do in New York? I’m going to be there for a week and I’m not sure what to do. I’m a little bit of a history buff and I’m also a little bit of a foodie. I’m not sure if I should go to the museum or the zoo or the aquarium. I’m not sure if I should go to the theater or the opera or the ballet..."
Before submitting
Who can review?
@michaelbenayoun @echarlaix