
Add pix2struct to ONNX support (v2) #1034

Merged
merged 11 commits into huggingface:main on Jun 14, 2023

Conversation

arvisioncode
Contributor

@arvisioncode arvisioncode commented May 5, 2023

What does this PR do?

Add support for pix2struct to ONNX. Continuation of #962 as it was closed by removing the forked repository.

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@arvisioncode
Contributor Author

Hello @fxmarty, and sorry for the delay in replying.
I have pulled the repo again to pick up your changes, and I have applied the change you suggested, from default to image-to-text.
However, the tests still fail when I run them. I have tried changing the class that Pix2StructConfig inherits from, using ViTOnnxConfig, TextAndVisionOnnxConfig, and VisionOnnxConfig, but it fails in every case.

Here are the traces from running the tests with ViTOnnxConfig:

$ pytest tests/exporters/onnx/test_*.py -k "pix2struct" -s

======================================================================== test session starts ========================================================================
platform win32 -- Python 3.9.12, pytest-7.3.1, pluggy-1.0.0
rootdir: E:\INETUM\INETUM_Projects\Document Analysis\optimum
configfile: pyproject.toml
plugins: xdist-3.2.1
collected 3138 items / 3119 deselected / 19 selected

tests\exporters\onnx\test_exporters_onnx_cli.py sssFramework not specified. Using pt to export to ONNX.
FFramework not specified. Using pt to export to ONNX.
FFramework not specified. Using pt to export to ONNX.
Automatic task detection to image-to-text.
Fsssssssss
tests\exporters\onnx\test_onnx_export.py ssss

============================================================================= FAILURES ============================================================================== 
_________________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text _________________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:169: in main_export
    model = TasksManager.get_model_from_task(
optimum\exporters\tasks.py:1385: in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>, pretrained_model_name_or_path = 'google/pix2struct-base', model_args = ()
kwargs = {'torch_dtype': None}
config = Pix2StructConfig {
  "_commit_hash": "f17649865bf61db64bb697ed4a3da7e0bc7413d5",
  "_name_or_path": "google/pix2struct..."torchscript": false,
    "transformers_version": "4.29.0.dev0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}

trust_remote_code = False, hub_kwargs_names = ['cache_dir', 'force_download', 'local_files_only', 'proxies', 'resume_download', 'revision', ...]
hub_kwargs = {'cache_dir': None, 'force_download': False, 'local_files_only': False, 'revision': 'main', ...}, kwargs_copy = {'_from_auto': True, 'torch_dtype': None}
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True
        hub_kwargs_names = [
            "cache_dir",
            "force_download",
            "local_files_only",
            "proxies",
            "resume_download",
            "revision",
            "subfolder",
            "use_auth_token",
        ]
        hub_kwargs = {name: kwargs.pop(name) for name in hub_kwargs_names if name in kwargs}
        if not isinstance(config, PretrainedConfig):
            kwargs_copy = copy.deepcopy(kwargs)
            # ensure not to pollute the config object with torch_dtype="auto" - since it's
            # meaningless in the context of the config object - torch.dtype values are acceptable
            if kwargs_copy.get("torch_dtype", None) == "auto":
                _ = kwargs_copy.pop("torch_dtype")

            config, kwargs = AutoConfig.from_pretrained(
                pretrained_model_name_or_path,
                return_unused_kwargs=True,
                trust_remote_code=trust_remote_code,
                **hub_kwargs,
                **kwargs_copy,
            )
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    f"Loading {pretrained_model_name_or_path} requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            class_ref = config.auto_map[cls.__name__]
            model_class = get_class_from_dynamic_module(
                class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
            )
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq.
E       Model type should be one of BlipConfig, Blip2Config, VisionEncoderDecoderConfig.

.virtualenv\lib\site-packages\transformers\models\auto\auto_factory.py:471: ValueError
____________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith _____________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith>,), kw = {}  

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:169: in main_export
    model = TasksManager.get_model_from_task(
optimum\exporters\tasks.py:1385: in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>, pretrained_model_name_or_path = 'google/pix2struct-base', model_args = ()
kwargs = {'torch_dtype': None}
config = Pix2StructConfig {
  "_commit_hash": "f17649865bf61db64bb697ed4a3da7e0bc7413d5",
  "_name_or_path": "google/pix2struct..."torchscript": false,
    "transformers_version": "4.29.0.dev0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}

trust_remote_code = False, hub_kwargs_names = ['cache_dir', 'force_download', 'local_files_only', 'proxies', 'resume_download', 'revision', ...]
hub_kwargs = {'cache_dir': None, 'force_download': False, 'local_files_only': False, 'revision': 'main', ...}, kwargs_copy = {'_from_auto': True, 'torch_dtype': None}
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True
        hub_kwargs_names = [
            "cache_dir",
            "force_download",
            "local_files_only",
            "proxies",
            "resume_download",
            "revision",
            "subfolder",
            "use_auth_token",
        ]
        hub_kwargs = {name: kwargs.pop(name) for name in hub_kwargs_names if name in kwargs}
        if not isinstance(config, PretrainedConfig):
            kwargs_copy = copy.deepcopy(kwargs)
            # ensure not to pollute the config object with torch_dtype="auto" - since it's
            # meaningless in the context of the config object - torch.dtype values are acceptable
            if kwargs_copy.get("torch_dtype", None) == "auto":
                _ = kwargs_copy.pop("torch_dtype")

            config, kwargs = AutoConfig.from_pretrained(
                pretrained_model_name_or_path,
                return_unused_kwargs=True,
                trust_remote_code=trust_remote_code,
                **hub_kwargs,
                **kwargs_copy,
            )
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    f"Loading {pretrained_model_name_or_path} requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            class_ref = config.auto_map[cls.__name__]
            model_class = get_class_from_dynamic_module(
                class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
            )
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq.
E       Model type should be one of BlipConfig, Blip2Config, VisionEncoderDecoderConfig.

.virtualenv\lib\site-packages\transformers\models\auto\auto_factory.py:471: ValueError
____________________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_340_pix2struct_no_task ____________________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_340_pix2struct_no_task>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:289: in main_export
    models_and_onnx_configs = get_encoder_decoder_models_for_export(model, onnx_config)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model = Pix2StructForConditionalGeneration(
  (encoder): Pix2StructVisionModel(
    (embeddings): Pix2StructVisionEmbeddings(
...  (dropout): Dropout(p=0.2, inplace=False)
    (lm_head): Linear(in_features=768, out_features=50244, bias=False)
  )
)
config = <optimum.exporters.onnx.model_configs.Pix2StructOnnxConfig object at 0x000001607D07E220>

    def get_encoder_decoder_models_for_export(
        model: Union["PreTrainedModel", "TFPreTrainedModel"], config: "OnnxConfig"
    ) -> Dict[str, Tuple[Union["PreTrainedModel", "TFPreTrainedModel"], "OnnxConfig"]]:
        """
        Returns the encoder and decoder parts of the model and their subsequent onnx configs.

        Args:
            model ([`PreTrainedModel`] or [`TFPreTrainedModel`]):
                The model to export.
            config ([`~exporters.onnx.config.OnnxConfig`]):
                The ONNX configuration associated with the exported model.

        Returns:
            `Dict[str, Tuple[Union[`PreTrainedModel`, `TFPreTrainedModel`], `OnnxConfig`]: A Dict containing the model and
            onnx configs for the encoder and decoder parts of the model.
        """
        models_for_export = {}

        encoder_model = model.get_encoder()
>       encoder_onnx_config = config.with_behavior("encoder")
E       AttributeError: 'Pix2StructOnnxConfig' object has no attribute 'with_behavior'

optimum\exporters\onnx\utils.py:105: AttributeError
========================================================================= warnings summary ========================================================================== 
.virtualenv\lib\site-packages\diffusers\models\cross_attention.py:30
  E:\INETUM\INETUM_Projects\Document Analysis\optimum\.virtualenv\lib\site-packages\diffusers\models\cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
    deprecate(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================== short test summary info ====================================================================== 
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text - ValueError: Unrecognized 
configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoM...
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith - ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoM...
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_340_pix2struct_no_task - AttributeError: 'Pix2StructOnnxConfig' object has no attribute 'with_behavior'
===================================================== 3 failed, 16 skipped, 3119 deselected, 1 warning in 8.86s ===================================================== 

Do you know which class the pix2struct config should inherit from? Any idea how to fix these errors?

Thank you so much!

@arvisioncode
Contributor Author

I have another question: if I want to convert other pix2struct models to ONNX, such as google/pix2struct-textcaps-base, google/pix2struct-chartqa-base, google/pix2struct-docvqa-base, google/pix2struct-screen2words-base, google/pix2struct-ai2d-base, google/deplot ... would I have to make any other changes, for example adding more supported tasks here?

        "pix2struct": supported_tasks_mapping(
            "image-to-text",
            onnx="Pix2StructOnnxConfig",
        ),
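For what it's worth, since the registration above is keyed on the `pix2struct` model type rather than on a specific checkpoint, other checkpoints of the same architecture should in principle export with the same command, only the model id changing. A sketch, not verified against each checkpoint (output directory names are arbitrary):

```shell
# hypothetical invocations; checkpoint names taken from the list above
optimum-cli export onnx --model google/pix2struct-docvqa-base --task image-to-text pix2struct_docvqa_onnx/
optimum-cli export onnx --model google/deplot --task image-to-text deplot_onnx/
```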

arvisioncode and others added 2 commits May 8, 2023 09:52
@fxmarty
Contributor

fxmarty commented May 30, 2023

Hi @arvisioncode, apologies for my late reply - I was off the past few weeks. I believe there was a bug in a previous PR of mine; it should be fixed in #1075, which should spare you the Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq error.

Looking into what the ONNX config should look like.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 30, 2023

The documentation is not available anymore as the PR was closed or merged.

@fxmarty
Contributor

fxmarty commented May 31, 2023

One specificity of pix2struct seems to be that its inputs are flattened_patches and attention_mask, i.e. the patches are generated in the preprocessing rather than in the model itself, unlike ViT.

Moreover, pix2struct appears to be a seq2seq model, so its ONNX config should be closer to TextSeq2SeqOnnxConfig (and inherit from OnnxSeq2SeqConfigWithPast), except that it takes image inputs.
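To make the flattened_patches idea concrete, here is an illustrative plain-Python sketch (not the transformers implementation, which also resizes the image and normalizes patch values): the image is cut into fixed-size patches during preprocessing, each patch is flattened with its (row, column) position prepended, and an attention_mask marks real patches versus padding up to a fixed max_patches.

```python
def flatten_patches(image, patch_size, max_patches):
    """image: H x W grid of pixel values (list of lists).
    Returns (patches, mask): each patch is [row, col, *pixels], padded to max_patches."""
    h, w = len(image), len(image[0])
    rows, cols = h // patch_size, w // patch_size
    patches = []
    for r in range(rows):
        for c in range(cols):
            flat = [float(r), float(c)]  # positional prefix
            for i in range(patch_size):
                for j in range(patch_size):
                    flat.append(float(image[r * patch_size + i][c * patch_size + j]))
            patches.append(flat)
    mask = [1] * len(patches)
    # pad to a fixed max_patches so the model sees a static input shape
    pad = [0.0] * (2 + patch_size * patch_size)
    while len(patches) < max_patches:
        patches.append(pad[:])
        mask.append(0)
    return patches, mask

image = [[p for p in range(4)] for _ in range(4)]  # tiny 4x4 "image"
patches, mask = flatten_patches(image, patch_size=2, max_patches=8)
print(len(patches), sum(mask))  # → 8 4 (8 slots, 4 real patches)
```

This is only the shape of the preprocessing contract: the model then consumes (flattened_patches, attention_mask) as its encoder inputs.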

@fxmarty fxmarty mentioned this pull request May 31, 2023
@fxmarty
Contributor

fxmarty commented Jun 1, 2023

@arvisioncode I took the liberty of implementing the config; it is a bit painful. We will need huggingface/transformers#23932 to be released first. Next up is support for the visual-question-answering pipeline, which is not yet supported.

Contributor

@fxmarty fxmarty left a comment


LGTM @arvisioncode, thank you for your contribution! I worked a bit on your PR to get it merged.

@fxmarty fxmarty merged commit 1dfd3ac into huggingface:main Jun 14, 2023
@arvisioncode
Contributor Author

Thank you very much for your help @fxmarty! I'm glad it's working now.

@harsh1509c

Hey @arvisioncode @fxmarty, I would like to know how you converted the pix2struct base model to ONNX, step by step.
I have the task of converting pix2struct to ONNX.
It would be a great help.
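For readers landing here with the same question: after this PR merged, the export can be driven from the optimum CLI. A minimal sketch, assuming a recent optimum installed with the ONNX exporters extra (the output directory name is arbitrary):

```shell
# install optimum with ONNX export support
pip install "optimum[exporters]"

# export the model (encoder/decoder ONNX files land in ./pix2struct_onnx/)
optimum-cli export onnx \
  --model google/pix2struct-base \
  --task image-to-text \
  pix2struct_onnx/
```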

6 participants