Export to ExecuTorch: Initial Integration #2090

Open. guangy10 wants to merge 1 commit into main.
Conversation


@guangy10 commented Nov 6, 2024

What does this PR do?

This PR is the first step toward creating an e2e path for "Export to ExecuTorch".

In this very first revision, I'm focusing on outlining the skeleton of the integration work:

Env setup

./setup.py: Specifies the new dependencies needed to "Export to ExecuTorch": a new pip package executorch (Beta version), plus the latest released transformers (models from older versions may not work with ExecuTorch).

Export AOT (Ahead-of-time)

./optimum/commands/export/executorch.py: Supports running the export with the ExecuTorch backend via the CLI.

./optimum/exporters/executorch/: Main entry point for exporting via ExecuTorch.

optimum/exporters/executorch/
├── __main__.py
├── causal_lm.py
├── convert.py
├── recipe_registry.py
├── task_registry.py
└── xnnpack.py
  • It contains convert.py, which defines the common workflow to export a 🤗 Transformers model to ExecuTorch.
  • The export workflow can be called with different recipes (e.g. quantize and delegate to XNNPACK, Core ML, QNN, MPS, etc.).
  • Models performing different tasks may require different patches; this is handled via registered tasks. For example, CausalLM with cache must be loaded with its generation_config and exported via transformers.integrations.executorch. (A registry sketch is shown after this list.)
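
For reference, the registries can be as simple as decorator-based lookup tables. The sketch below is illustrative only; names such as register_recipe and register_task are assumptions and may not match the actual recipe_registry.py / task_registry.py:

recipe_registry = {}
task_registry = {}


def register_recipe(name):
    # Decorator that records an export recipe under a string key, e.g. "xnnpack".
    def wrapper(func):
        recipe_registry[name] = func
        return func
    return wrapper


def register_task(name):
    # Decorator that records a task-specific model loader, e.g. "text-generation".
    def wrapper(func):
        task_registry[name] = func
        return func
    return wrapper


@register_recipe("xnnpack")
def export_with_xnnpack(model, task, **kwargs):
    # Export the model and quantize/delegate it to the XNNPACK backend here.
    ...


@register_task("text-generation")
def load_causal_lm(model_name_or_path, **kwargs):
    # Load the CausalLM with its generation_config and apply the required patches here.
    ...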

Run with ExecuTorch model

optimum/executorchruntime/modeling_executorch.py defines the Python classes that wrap the Transformers AutoModelForXXX classes. We start with ExecuTorchModelForCausalLM in this file, which inherits from the base OptimizedModel and overrides all abstract methods. This is where the ExecuTorch pybindings and the runtime get integrated. A rough sketch of the class shape is shown below.
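
This is only an illustrative sketch, assuming the forward calling convention visible in the review diff further down; the actual class in the PR may differ, and abstract methods such as _save_pretrained / _from_pretrained are omitted here:

from transformers import PreTrainedTokenizer

from optimum.modeling_base import OptimizedModel


class ExecuTorchModelForCausalLM(OptimizedModel):
    # Sketch: wraps an ExecuTorch program (.pte) behind the OptimizedModel interface.

    def __init__(self, model, config):
        super().__init__(model, config)
        self.et_model = model  # ExecuTorch module loaded via the pybindings

    def forward(self, input_ids, cache_position):
        # Delegate to the ExecuTorch runtime (calling convention taken from the diff below).
        return self.et_model.forward((input_ids, cache_position))[0]

    def text_generation(self, tokenizer: PreTrainedTokenizer, prompt: str) -> str:
        # Encode the prompt, run a decoding loop on top of forward(), decode the tokens.
        ...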

optimum/pipelines/pipelines_base.py defines a pipeline for ExecuTorch where everything is put together, deciding which model/task/recipe to load and run. It also wires in the tokenizer that is paired with the model for encoding/decoding.
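
Based on the MAPPING_LOADING_FUNC entry shown later in the diff, usage would presumably mirror the existing "ort" accelerator. Note that in this revision load_executorch_pipeline still raises NotImplementedError, so the following is a hypothetical example only:

from optimum.pipelines import pipeline

# Hypothetical: the "executorch" accelerator is registered in this PR but not implemented yet.
generator = pipeline(
    "text-generation",
    model="meta_llama3_2_1b",
    accelerator="executorch",
)
print(generator("Hey, can you tell me any fun things to do in New York?"))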

CC: @mergennachin

Tests

Export AOT (Ahead-of-time)

Export to ExecuTorch via CLI:
optimum-cli export executorch --model "meta-llama/Llama-3.2-1B" --task "text-generation" --recipe "xnnpack" --output_dir="meta_llama3_2_1b"
It generates the ExecuTorch model at meta_llama3_2_1b/model.pte.

Run with ExecuTorch model (model.pte)

Run python test_executorch.py, which essentially executes the following code:

from transformers import AutoTokenizer

# Import path assumed from the module layout described above.
from optimum.executorchruntime.modeling_executorch import ExecuTorchModelForCausalLM

model = ExecuTorchModelForCausalLM.from_pretrained(
    model="meta_llama3_2_1b",
    task="text-generation",
    recipe="xnnpack",
)
print(model.text_generation(
    tokenizer=AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B"),
    prompt="Hey, can you tell me any fun things to do in New York?",
))

And we got:
"Hey, can you tell me any fun things to do in New York? I’m going to be there for a week and I’m not sure what to do. I’m a little bit of a history buff and I’m also a little bit of a foodie. I’m not sure if I should go to the museum or the zoo or the aquarium. I’m not sure if I should go to the theater or the opera or the ballet..."

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

@michaelbenayoun @echarlaix

return self.et_model.forward(input_ids, cache_position)[0]

@classmethod
def _from_pretrained(
Author
If I understand the example here https://github.com/huggingface/optimum/tree/7e8d857d1ed6be32046324bf8f424690f116b4e9?tab=readme-ov-file#run-the-exported-model-using-onnx-runtime correctly, this method is used to load an exported model from the local file system, correct?

Comment on lines 149 to 158
def generate(
    self,
    prompt_tokens: List[int],
) -> List[int]:
    """
    Generate a sequence of token ids using the ExecuTorchModule.

    `pipeline()` is where everything puts together. It consists of the tokenizer for encoding the inputs and decoding the model generated outputs.
    """
    pass
Author

Idk if this method belongs to this class. This should be the place where we execute the model with delegated kernel libraries via the pybindings. For ONNX, it seems to be ort.InferenceSession (I can't find its definition anywhere).
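
For discussion, here is a rough sketch of what a greedy-decoding generate could look like on top of the pybinding call shown above. It is illustrative only (no incremental KV-cache reuse, no EOS handling, argmax sampling), and the names are assumptions:

from typing import List

import torch


def generate(self, prompt_tokens: List[int], max_new_tokens: int = 64) -> List[int]:
    # Sketch: re-run the ExecuTorch module over the full sequence at each step;
    # a real implementation would use the static KV cache incrementally.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        input_ids = torch.tensor([tokens], dtype=torch.long)
        cache_position = torch.arange(len(tokens), dtype=torch.long)
        logits = self.et_model.forward((input_ids, cache_position))[0]
        next_token = int(torch.argmax(logits[:, -1, :], dim=-1).item())
        tokens.append(next_token)
    return tokens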

Comment on lines +296 to 318
def load_executorch_pipeline(
    model,
    targeted_task,
    load_tokenizer,
    tokenizer,
    feature_extractor,
    load_feature_extractor,
    SUPPORTED_TASKS,
    subfolder: str = "",
    token: Optional[Union[bool, str]] = None,
    revision: str = "main",
    model_kwargs: Optional[Dict[str, Any]] = None,
    config: AutoConfig = None,
    **kwargs,
):
    raise NotImplementedError("Executorch pipeline is not implemented yet.")


MAPPING_LOADING_FUNC = {
    "ort": load_ort_pipeline,
    "bettertransformer": load_bettertransformer,
    "executorch": load_executorch_pipeline,
}
Author

Is this pipeline ever used? If yes, can you show me examples?

The example on the landing page uses the transformers pipeline directly: https://github.com/huggingface/optimum?tab=readme-ov-file#run-the-exported-model-using-onnx-runtime. I fail to understand how it is possible to run the exported model with the transformers pipeline. The exported model may have a modified signature that differs from eager mode, which won't work with the eager generate, I think?

Member

I'll let @echarlaix answer here.
The pipelines should use the proper auto model class for Optimum. About your question on generate: that is true, I think we fail when we do not support exactly the same features as Transformers, is that right, @echarlaix?

Collaborator

Yes, with the current integration ExecuTorch models will not be compatible with transformers pipelines, which is not a big deal as I don't think many people use pipelines with optimum models.

@guangy10 force-pushed the executorch_integration_skeleton branch 2 times, most recently from 4ad0b3d to 328069d, on November 12, 2024 00:28
@guangy10
Author

Fixed the export path. Now optimum-cli export executorch --model "meta-llama/Llama-3.2-1B" --task "text-generation" --recipe "xnnpack" --output_dir="meta_llama3_2_1b" works as expected.

@guangy10 force-pushed the executorch_integration_skeleton branch 3 times, most recently from f775391 to 6b38215, on November 14, 2024 00:14
@guangy10 force-pushed the executorch_integration_skeleton branch from 6b38215 to 757f152 on November 14, 2024 00:37
@guangy10 changed the title from "Export to ExecuTorch: Code Skeleton" to "Export to ExecuTorch: Initial Integration" on Nov 14, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@michaelbenayoun left a comment
Left a few comments. I have to say, this is really great work; it is not easy to go from our main ONNX codebase and generalize!

Do you think we could also write little tests in the same spirit as what's done for ONNX, where we validate the exported model against a vanilla PyTorch model?
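
For reference, such a test could be a simple logits-parity check between the eager Transformers model and the exported ExecuTorch model. The sketch below assumes the ExecuTorchModelForCausalLM API described in this PR (including an export flag along the lines of the base-class logic visible further down); the tiny test model, shapes, and tolerances are illustrative and may need adjustment:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

from optimum.executorchruntime.modeling_executorch import ExecuTorchModelForCausalLM


def test_xnnpack_export_matches_eager():
    # Illustrative tiny model to keep the test fast.
    model_id = "hf-internal-testing/tiny-random-LlamaForCausalLM"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    inputs = tokenizer("Hello world", return_tensors="pt")

    eager_model = AutoModelForCausalLM.from_pretrained(model_id)
    with torch.no_grad():
        eager_logits = eager_model(**inputs).logits

    et_model = ExecuTorchModelForCausalLM.from_pretrained(
        model=model_id, task="text-generation", recipe="xnnpack", export=True
    )
    cache_position = torch.arange(inputs["input_ids"].shape[1])
    et_logits = et_model.forward(inputs["input_ids"], cache_position)

    # Loose tolerance: delegation/quantization may change the numerics slightly.
    torch.testing.assert_close(et_logits, eager_logits, atol=1e-2, rtol=1e-2)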

"--recipe",
type=str,
default="xnnpack",
help='Pre-defined recipes for export to ExecuTorch. Defaults to "xnnpack".',
Member

nit: can we link to the list of available recipes?
Would it make sense to limit the possibilities by adding a set of allowed choices?

):
    super().__init__(model, config)
    self.et_model = model
    print(f"DEBUG all static methods: {self.et_model.method_names()}")
Member

nit: forgotten print?

Comment on lines +82 to +89
if use_auth_token is not None:
    warnings.warn(
        "The `use_auth_token` argument is deprecated and will be removed soon. Please use the `token` argument instead.",
        FutureWarning,
    )
    if token is not None:
        raise ValueError("You cannot use both `use_auth_token` and `token` arguments at the same time.")
    token = use_auth_token
Member

Should we support use_auth_token or just skip it since it has been deprecated?

It has not been deprecated in Transformers 4.26.0, which we officially support, but I just wanted to raise the question @echarlaix

raise RuntimeError(f"The recipe '{recipe}' isn't registered. Detailed error: {e}")

executorch_prog = recipe_func(model, task, **kwargs)
# print(f"Exported program: {executorch_prog.exported_program().graph}")
Member

Suggested change
# print(f"Exported program: {executorch_prog.exported_program().graph}")

full_path = os.path.join(f"{output_dir}", "model.pte")
with open(full_path, "wb") as f:
    executorch_prog.write_to_file(f)
print(f"Saved exported program to {full_path}")
Member

Can you replace this print with a logger.info please?
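
For example, something along these lines; the function wrapper is only there to make the snippet self-contained:

import logging
import os

logger = logging.getLogger(__name__)


def save_program(executorch_prog, output_dir: str) -> str:
    # Same save logic as above, with logger.info instead of print.
    full_path = os.path.join(output_dir, "model.pte")
    with open(full_path, "wb") as f:
        executorch_prog.write_to_file(f)
    logger.info(f"Saved exported program to {full_path}")
    return full_path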

task: str,
**kwargs,
):
print(f"DEBUG: model={model}, task={task}, kwargs={kwargs}")
Member

Suggested change
print(f"DEBUG: model={model}, task={task}, kwargs={kwargs}")


Member
This file should be removed, right?

return self.et_model.forward((input_ids, cache_position))[0]

@classmethod
def from_pretrained(
Collaborator

This should be _from_pretrained, which is used to load the model once exported.

Suggested change
def from_pretrained(
def _from_pretrained(

from_pretrained_method = cls._from_transformers if export else cls._from_pretrained

def from_pretrained(
cls,
model_dir_path: Union[str, Path],
task: str,
Collaborator

I think we can remove this; it will always be "text-generation" in this case.

Suggested change
task: str,

cls,
model_dir_path: Union[str, Path],
task: str,
recipe: str,
Collaborator

This can be removed as well I think, as it will only be used for export.

Suggested change
recipe: str,

