
Add pix2struct support to ONNX #962

Closed
wants to merge 1 commit into from

Conversation

arvisioncode
Contributor

@arvisioncode arvisioncode commented Apr 10, 2023

This PR adds support for pix2struct models to be exported to ONNX format.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@arvisioncode arvisioncode changed the title Add pix2struct support Add pix2struct support to ONNX Apr 10, 2023
@@ -636,6 +636,10 @@ class TasksManager:
"sequence-classification",
onnx="PerceiverOnnxConfig",
),
"pix2struct": supported_tasks_mapping(
"default",
Contributor

You should probably remove default here (AutoModel cannot load pix2struct) and keep only image-to-text.

The issue is that there is no auto class in transformers for image-to-text, so we will need a way to specify explicitly in tasks.py that for pix2struct the class to use is Pix2StructForConditionalGeneration. Let me do a PR for this.

Contributor Author

done

@@ -82,6 +82,7 @@
"hf-internal-testing/tiny-random-language_perceiver": ["masked-lm", "sequence-classification"],
"hf-internal-testing/tiny-random-vision_perceiver_conv": ["image-classification"],
},
"pix2struct": "google/pix2struct-base",
Contributor

It would be great if you could upload a tiny model to the Hub.

@fxmarty
Contributor

fxmarty commented Apr 11, 2023

@arvisioncode The mapping _CUSTOM_CLASSES introduced in #967 should help!

@fxmarty
Contributor

fxmarty commented Apr 12, 2023

Hi @arvisioncode, #967 was merged. You can see I added pix2struct to the custom class dict:

("pt", "pix2struct", "image-to-text"): ("transformers", "Pix2StructForConditionalGeneration"),
This allows it to be loaded with the correct class.

All you should need is to add the entry "pix2struct": supported_tasks_mapping("image-to-text", onnx="Pix2StructOnnxConfig") in the _SUPPORTED_MODEL_TYPE dict (as you did).

The only thing I am not sure of is whether having Pix2StructOnnxConfig inherit from ViTOnnxConfig is fine, but you will be able to find out if the tests fail.
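One likely wrinkle with inheriting from a ViT-style config: pix2struct's processor produces flattened_patches and an attention_mask rather than pixel_values, so the ONNX inputs would presumably need overriding. A minimal sketch of the idea (toy stand-in classes, not optimum's actual base classes):

```python
# Toy sketch: overriding the declared ONNX inputs for pix2struct.
# Class names are illustrative stand-ins, not optimum's real config classes.
from typing import Dict

class VisionOnnxConfigSketch:
    """Stand-in for a vision base config that assumes a pixel_values input."""
    @property
    def inputs(self) -> Dict[str, Dict[int, str]]:
        return {"pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}}

class Pix2StructOnnxConfigSketch(VisionOnnxConfigSketch):
    """Pix2Struct feeds flattened patches, not raw pixels, so the inputs differ."""
    @property
    def inputs(self) -> Dict[str, Dict[int, str]]:
        return {
            "flattened_patches": {0: "batch_size", 1: "max_patches"},
            "attention_mask": {0: "batch_size", 1: "max_patches"},
        }
```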

@arvisioncode arvisioncode closed this by deleting the head repository May 4, 2023
@arvisioncode
Contributor Author

Hello @fxmarty, and sorry for the delay in answering.
I have pulled the repo again to get your changes, and I have switched the task from default to image-to-text as you suggested.
However, the tests still fail. I have tried changing the class that Pix2StructOnnxConfig inherits from, using ViTOnnxConfig, TextAndVisionOnnxConfig, and VisionOnnxConfig, but it fails in all cases.

Here are the traces from running the tests with ViTOnnxConfig:

$ pytest tests/exporters/onnx/test_*.py -k "pix2struct" -s

======================================================================== test session starts ========================================================================
platform win32 -- Python 3.9.12, pytest-7.3.1, pluggy-1.0.0
rootdir: E:\INETUM\INETUM_Projects\Document Analysis\optimum
configfile: pyproject.toml
plugins: xdist-3.2.1
collected 3138 items / 3119 deselected / 19 selected

tests\exporters\onnx\test_exporters_onnx_cli.py sssFramework not specified. Using pt to export to ONNX.
FFramework not specified. Using pt to export to ONNX.
FFramework not specified. Using pt to export to ONNX.
Automatic task detection to image-to-text.
Fsssssssss
tests\exporters\onnx\test_onnx_export.py ssss

============================================================================= FAILURES ============================================================================== 
_________________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text _________________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:169: in main_export
    model = TasksManager.get_model_from_task(
optimum\exporters\tasks.py:1385: in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>, pretrained_model_name_or_path = 'google/pix2struct-base', model_args = ()
kwargs = {'torch_dtype': None}
config = Pix2StructConfig {
  "_commit_hash": "f17649865bf61db64bb697ed4a3da7e0bc7413d5",
  "_name_or_path": "google/pix2struct..."torchscript": false,
    "transformers_version": "4.29.0.dev0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}

trust_remote_code = False, hub_kwargs_names = ['cache_dir', 'force_download', 'local_files_only', 'proxies', 'resume_download', 'revision', ...]
hub_kwargs = {'cache_dir': None, 'force_download': False, 'local_files_only': False, 'revision': 'main', ...}, kwargs_copy = {'_from_auto': True, 'torch_dtype': None}
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True
        hub_kwargs_names = [
            "cache_dir",
            "force_download",
            "local_files_only",
            "proxies",
            "resume_download",
            "revision",
            "subfolder",
            "use_auth_token",
        ]
        hub_kwargs = {name: kwargs.pop(name) for name in hub_kwargs_names if name in kwargs}
        if not isinstance(config, PretrainedConfig):
            kwargs_copy = copy.deepcopy(kwargs)
            # ensure not to pollute the config object with torch_dtype="auto" - since it's
            # meaningless in the context of the config object - torch.dtype values are acceptable
            if kwargs_copy.get("torch_dtype", None) == "auto":
                _ = kwargs_copy.pop("torch_dtype")

            config, kwargs = AutoConfig.from_pretrained(
                pretrained_model_name_or_path,
                return_unused_kwargs=True,
                trust_remote_code=trust_remote_code,
                **hub_kwargs,
                **kwargs_copy,
            )
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    f"Loading {pretrained_model_name_or_path} requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            class_ref = config.auto_map[cls.__name__]
            model_class = get_class_from_dynamic_module(
                class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
            )
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq.
E       Model type should be one of BlipConfig, Blip2Config, VisionEncoderDecoderConfig.

.virtualenv\lib\site-packages\transformers\models\auto\auto_factory.py:471: ValueError
____________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith _____________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith>,), kw = {}  

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:169: in main_export
    model = TasksManager.get_model_from_task(
optimum\exporters\tasks.py:1385: in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>, pretrained_model_name_or_path = 'google/pix2struct-base', model_args = ()
kwargs = {'torch_dtype': None}
config = Pix2StructConfig {
  "_commit_hash": "f17649865bf61db64bb697ed4a3da7e0bc7413d5",
  "_name_or_path": "google/pix2struct..."torchscript": false,
    "transformers_version": "4.29.0.dev0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}

trust_remote_code = False, hub_kwargs_names = ['cache_dir', 'force_download', 'local_files_only', 'proxies', 'resume_download', 'revision', ...]
hub_kwargs = {'cache_dir': None, 'force_download': False, 'local_files_only': False, 'revision': 'main', ...}, kwargs_copy = {'_from_auto': True, 'torch_dtype': None}
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True
        hub_kwargs_names = [
            "cache_dir",
            "force_download",
            "local_files_only",
            "proxies",
            "resume_download",
            "revision",
            "subfolder",
            "use_auth_token",
        ]
        hub_kwargs = {name: kwargs.pop(name) for name in hub_kwargs_names if name in kwargs}
        if not isinstance(config, PretrainedConfig):
            kwargs_copy = copy.deepcopy(kwargs)
            # ensure not to pollute the config object with torch_dtype="auto" - since it's
            # meaningless in the context of the config object - torch.dtype values are acceptable
            if kwargs_copy.get("torch_dtype", None) == "auto":
                _ = kwargs_copy.pop("torch_dtype")

            config, kwargs = AutoConfig.from_pretrained(
                pretrained_model_name_or_path,
                return_unused_kwargs=True,
                trust_remote_code=trust_remote_code,
                **hub_kwargs,
                **kwargs_copy,
            )
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    f"Loading {pretrained_model_name_or_path} requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            class_ref = config.auto_map[cls.__name__]
            model_class = get_class_from_dynamic_module(
                class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
            )
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq.
E       Model type should be one of BlipConfig, Blip2Config, VisionEncoderDecoderConfig.

.virtualenv\lib\site-packages\transformers\models\auto\auto_factory.py:471: ValueError
____________________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_340_pix2struct_no_task ____________________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_340_pix2struct_no_task>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:289: in main_export
    models_and_onnx_configs = get_encoder_decoder_models_for_export(model, onnx_config)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model = Pix2StructForConditionalGeneration(
  (encoder): Pix2StructVisionModel(
    (embeddings): Pix2StructVisionEmbeddings(
...  (dropout): Dropout(p=0.2, inplace=False)
    (lm_head): Linear(in_features=768, out_features=50244, bias=False)
  )
)
config = <optimum.exporters.onnx.model_configs.Pix2StructOnnxConfig object at 0x000001607D07E220>

    def get_encoder_decoder_models_for_export(
        model: Union["PreTrainedModel", "TFPreTrainedModel"], config: "OnnxConfig"
    ) -> Dict[str, Tuple[Union["PreTrainedModel", "TFPreTrainedModel"], "OnnxConfig"]]:
        """
        Returns the encoder and decoder parts of the model and their subsequent onnx configs.

        Args:
            model ([`PreTrainedModel`] or [`TFPreTrainedModel`]):
                The model to export.
            config ([`~exporters.onnx.config.OnnxConfig`]):
                The ONNX configuration associated with the exported model.

        Returns:
            `Dict[str, Tuple[Union[`PreTrainedModel`, `TFPreTrainedModel`], `OnnxConfig`]: A Dict containing the model and
            onnx configs for the encoder and decoder parts of the model.
        """
        models_for_export = {}

        encoder_model = model.get_encoder()
>       encoder_onnx_config = config.with_behavior("encoder")
E       AttributeError: 'Pix2StructOnnxConfig' object has no attribute 'with_behavior'

optimum\exporters\onnx\utils.py:105: AttributeError
========================================================================= warnings summary ========================================================================== 
.virtualenv\lib\site-packages\diffusers\models\cross_attention.py:30
  E:\INETUM\INETUM_Projects\Document Analysis\optimum\.virtualenv\lib\site-packages\diffusers\models\cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
    deprecate(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================== short test summary info ====================================================================== 
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text - ValueError: Unrecognized 
configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoM...
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith - ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoM...
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_340_pix2struct_no_task - AttributeError: 'Pix2StructOnnxConfig' object has no attribute 'with_behavior'
===================================================== 3 failed, 16 skipped, 3119 deselected, 1 warning in 8.86s ===================================================== 

Do you know which class the pix2struct ONNX config should inherit from? Any idea how to fix these errors?
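For context, the last failure comes from the encoder/decoder export path calling config.with_behavior("encoder"), a method that optimum's seq2seq config bases provide but the plain vision configs do not, which suggests the config needs a seq2seq-style base. A toy sketch of the pattern (simplified illustration, not optimum's actual classes):

```python
# Toy sketch of the with_behavior pattern used to split an encoder-decoder
# export into separate encoder and decoder ONNX configs. Names mirror
# optimum's seq2seq config bases but are simplified illustrations.
from enum import Enum

class ConfigBehavior(str, Enum):
    ENCODER = "encoder"
    DECODER = "decoder"

class Seq2SeqOnnxConfigSketch:
    def __init__(self, behavior: ConfigBehavior = ConfigBehavior.ENCODER):
        self._behavior = behavior

    def with_behavior(self, behavior: str) -> "Seq2SeqOnnxConfigSketch":
        """Return a copy of this config restricted to one part of the model."""
        return Seq2SeqOnnxConfigSketch(ConfigBehavior(behavior))

    @property
    def inputs(self):
        # Each behavior declares only the inputs of its own sub-model.
        if self._behavior is ConfigBehavior.ENCODER:
            return {"flattened_patches": {0: "batch_size", 1: "max_patches"}}
        return {"decoder_input_ids": {0: "batch_size", 1: "decoder_sequence_length"}}
```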

Thank you so much!
