
Add pix2struct to ONNX support (v2) #1034

Merged
merged 11 commits into huggingface:main on Jun 14, 2023

Conversation

arvisioncode
Contributor

@arvisioncode arvisioncode commented May 5, 2023

What does this PR do?

Add support for pix2struct to ONNX. Continuation of #962 as it was closed by removing the forked repository.

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@arvisioncode
Contributor Author

Hello @fxmarty, and sorry for the delay in replying.
I have pulled the repo again to pick up your changes, and I have applied the change you suggested, from default to image-to-text.
However, the tests still fail when I run them. I have tried changing the class that Pix2StructConfig inherits from, using ViTOnnxConfig, TextAndVisionOnnxConfig, and VisionOnnxConfig, but it fails in every case.

Here are the traces from running the tests with ViTOnnxConfig:

$ pytest tests/exporters/onnx/test_*.py -k "pix2struct" -s

======================================================================== test session starts ========================================================================
platform win32 -- Python 3.9.12, pytest-7.3.1, pluggy-1.0.0
rootdir: E:\INETUM\INETUM_Projects\Document Analysis\optimum
configfile: pyproject.toml
plugins: xdist-3.2.1
collected 3138 items / 3119 deselected / 19 selected

tests\exporters\onnx\test_exporters_onnx_cli.py sssFramework not specified. Using pt to export to ONNX.
FFramework not specified. Using pt to export to ONNX.
FFramework not specified. Using pt to export to ONNX.
Automatic task detection to image-to-text.
Fsssssssss
tests\exporters\onnx\test_onnx_export.py ssss

============================================================================= FAILURES ============================================================================== 
_________________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text _________________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:169: in main_export
    model = TasksManager.get_model_from_task(
optimum\exporters\tasks.py:1385: in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>, pretrained_model_name_or_path = 'google/pix2struct-base', model_args = ()
kwargs = {'torch_dtype': None}
config = Pix2StructConfig {
  "_commit_hash": "f17649865bf61db64bb697ed4a3da7e0bc7413d5",
  "_name_or_path": "google/pix2struct..."torchscript": false,
    "transformers_version": "4.29.0.dev0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}

trust_remote_code = False, hub_kwargs_names = ['cache_dir', 'force_download', 'local_files_only', 'proxies', 'resume_download', 'revision', ...]
hub_kwargs = {'cache_dir': None, 'force_download': False, 'local_files_only': False, 'revision': 'main', ...}, kwargs_copy = {'_from_auto': True, 'torch_dtype': None}
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True
        hub_kwargs_names = [
            "cache_dir",
            "force_download",
            "local_files_only",
            "proxies",
            "resume_download",
            "revision",
            "subfolder",
            "use_auth_token",
        ]
        hub_kwargs = {name: kwargs.pop(name) for name in hub_kwargs_names if name in kwargs}
        if not isinstance(config, PretrainedConfig):
            kwargs_copy = copy.deepcopy(kwargs)
            # ensure not to pollute the config object with torch_dtype="auto" - since it's
            # meaningless in the context of the config object - torch.dtype values are acceptable
            if kwargs_copy.get("torch_dtype", None) == "auto":
                _ = kwargs_copy.pop("torch_dtype")

            config, kwargs = AutoConfig.from_pretrained(
                pretrained_model_name_or_path,
                return_unused_kwargs=True,
                trust_remote_code=trust_remote_code,
                **hub_kwargs,
                **kwargs_copy,
            )
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    f"Loading {pretrained_model_name_or_path} requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            class_ref = config.auto_map[cls.__name__]
            model_class = get_class_from_dynamic_module(
                class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
            )
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq.
E       Model type should be one of BlipConfig, Blip2Config, VisionEncoderDecoderConfig.

.virtualenv\lib\site-packages\transformers\models\auto\auto_factory.py:471: ValueError
____________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith _____________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith>,), kw = {}  

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:169: in main_export
    model = TasksManager.get_model_from_task(
optimum\exporters\tasks.py:1385: in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>, pretrained_model_name_or_path = 'google/pix2struct-base', model_args = ()
kwargs = {'torch_dtype': None}
config = Pix2StructConfig {
  "_commit_hash": "f17649865bf61db64bb697ed4a3da7e0bc7413d5",
  "_name_or_path": "google/pix2struct..."torchscript": false,
    "transformers_version": "4.29.0.dev0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}

trust_remote_code = False, hub_kwargs_names = ['cache_dir', 'force_download', 'local_files_only', 'proxies', 'resume_download', 'revision', ...]
hub_kwargs = {'cache_dir': None, 'force_download': False, 'local_files_only': False, 'revision': 'main', ...}, kwargs_copy = {'_from_auto': True, 'torch_dtype': None}
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True
        hub_kwargs_names = [
            "cache_dir",
            "force_download",
            "local_files_only",
            "proxies",
            "resume_download",
            "revision",
            "subfolder",
            "use_auth_token",
        ]
        hub_kwargs = {name: kwargs.pop(name) for name in hub_kwargs_names if name in kwargs}
        if not isinstance(config, PretrainedConfig):
            kwargs_copy = copy.deepcopy(kwargs)
            # ensure not to pollute the config object with torch_dtype="auto" - since it's
            # meaningless in the context of the config object - torch.dtype values are acceptable
            if kwargs_copy.get("torch_dtype", None) == "auto":
                _ = kwargs_copy.pop("torch_dtype")

            config, kwargs = AutoConfig.from_pretrained(
                pretrained_model_name_or_path,
                return_unused_kwargs=True,
                trust_remote_code=trust_remote_code,
                **hub_kwargs,
                **kwargs_copy,
            )
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    f"Loading {pretrained_model_name_or_path} requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            class_ref = config.auto_map[cls.__name__]
            model_class = get_class_from_dynamic_module(
                class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
            )
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq.
E       Model type should be one of BlipConfig, Blip2Config, VisionEncoderDecoderConfig.

.virtualenv\lib\site-packages\transformers\models\auto\auto_factory.py:471: ValueError
____________________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_340_pix2struct_no_task ____________________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_340_pix2struct_no_task>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:289: in main_export
    models_and_onnx_configs = get_encoder_decoder_models_for_export(model, onnx_config)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model = Pix2StructForConditionalGeneration(
  (encoder): Pix2StructVisionModel(
    (embeddings): Pix2StructVisionEmbeddings(
...  (dropout): Dropout(p=0.2, inplace=False)
    (lm_head): Linear(in_features=768, out_features=50244, bias=False)
  )
)
config = <optimum.exporters.onnx.model_configs.Pix2StructOnnxConfig object at 0x000001607D07E220>

    def get_encoder_decoder_models_for_export(
        model: Union["PreTrainedModel", "TFPreTrainedModel"], config: "OnnxConfig"
    ) -> Dict[str, Tuple[Union["PreTrainedModel", "TFPreTrainedModel"], "OnnxConfig"]]:
        """
        Returns the encoder and decoder parts of the model and their subsequent onnx configs.

        Args:
            model ([`PreTrainedModel`] or [`TFPreTrainedModel`]):
                The model to export.
            config ([`~exporters.onnx.config.OnnxConfig`]):
                The ONNX configuration associated with the exported model.

        Returns:
            `Dict[str, Tuple[Union[`PreTrainedModel`, `TFPreTrainedModel`], `OnnxConfig`]: A Dict containing the model and
            onnx configs for the encoder and decoder parts of the model.
        """
        models_for_export = {}

        encoder_model = model.get_encoder()
>       encoder_onnx_config = config.with_behavior("encoder")
E       AttributeError: 'Pix2StructOnnxConfig' object has no attribute 'with_behavior'

optimum\exporters\onnx\utils.py:105: AttributeError
========================================================================= warnings summary ========================================================================== 
.virtualenv\lib\site-packages\diffusers\models\cross_attention.py:30
  E:\INETUM\INETUM_Projects\Document Analysis\optimum\.virtualenv\lib\site-packages\diffusers\models\cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
    deprecate(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================== short test summary info ====================================================================== 
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text - ValueError: Unrecognized 
configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoM...
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith - ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoM...
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_340_pix2struct_no_task - AttributeError: 'Pix2StructOnnxConfig' object has no attribute 'with_behavior'
===================================================== 3 failed, 16 skipped, 3119 deselected, 1 warning in 8.86s ===================================================== 

Do you know which class the pix2struct config should inherit from? Any idea how to fix these errors?

Thank you so much!

@arvisioncode
Contributor Author

I have another question: if I want to convert other pix2struct models to ONNX, such as google/pix2struct-textcaps-base, google/pix2struct-chartqa-base, google/pix2struct-docvqa-base, google/pix2struct-screen2words-base, google/pix2struct-ai2d-base, google/deplot ... would I have to make any other changes, for example adding more supported tasks here?

        "pix2struct": supported_tasks_mapping(
            "image-to-text",
            onnx="Pix2StructOnnxConfig",
        ),
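For what it's worth, since the registration above is keyed on the `pix2struct` model type rather than on a specific checkpoint, other checkpoints of the same architecture should in principle export with the same command, only the model id changing. A sketch, not verified against each checkpoint (output directory names are arbitrary):

```shell
# hypothetical invocations; checkpoint names taken from the list above
optimum-cli export onnx --model google/pix2struct-docvqa-base --task image-to-text pix2struct_docvqa_onnx/
optimum-cli export onnx --model google/deplot --task image-to-text deplot_onnx/
```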

arvisioncode and others added 2 commits May 8, 2023 09:52
@fxmarty
Contributor

fxmarty commented May 30, 2023

Hi @arvisioncode, apologies for my late reply - I was off the past few weeks. I believe there was a bug in a previous PR of mine; it should be fixed in #1075, which should spare you the Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq error.

Looking into what the ONNX config should look like.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 30, 2023

The documentation is not available anymore as the PR was closed or merged.

@fxmarty
Contributor

fxmarty commented May 31, 2023

One specificity of pix2struct seems to be that its inputs are flattened_patches and attention_mask, i.e. the patches are generated in the preprocessing rather than in the model itself, unlike ViT.

Moreover, pix2struct appears to be a seq2seq model, so its ONNX config should be closer to TextSeq2SeqOnnxConfig (and inherit from OnnxSeq2SeqConfigWithPast), except that it takes image inputs.
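To make the flattened_patches idea concrete, here is an illustrative plain-Python sketch (not the transformers implementation, which also resizes the image and normalizes patch values): the image is cut into fixed-size patches during preprocessing, each patch is flattened with its (row, column) position prepended, and an attention_mask marks real patches versus padding up to a fixed max_patches.

```python
def flatten_patches(image, patch_size, max_patches):
    """image: H x W grid of pixel values (list of lists).
    Returns (patches, mask): each patch is [row, col, *pixels], padded to max_patches."""
    h, w = len(image), len(image[0])
    rows, cols = h // patch_size, w // patch_size
    patches = []
    for r in range(rows):
        for c in range(cols):
            flat = [float(r), float(c)]  # positional prefix
            for i in range(patch_size):
                for j in range(patch_size):
                    flat.append(float(image[r * patch_size + i][c * patch_size + j]))
            patches.append(flat)
    mask = [1] * len(patches)
    # pad to a fixed max_patches so the model sees a static input shape
    pad = [0.0] * (2 + patch_size * patch_size)
    while len(patches) < max_patches:
        patches.append(pad[:])
        mask.append(0)
    return patches, mask

image = [[p for p in range(4)] for _ in range(4)]  # tiny 4x4 "image"
patches, mask = flatten_patches(image, patch_size=2, max_patches=8)
print(len(patches), sum(mask))  # → 8 4 (8 slots, 4 real patches)
```

This is only the shape of the preprocessing contract: the model then consumes (flattened_patches, attention_mask) as its encoder inputs.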

@fxmarty fxmarty mentioned this pull request May 31, 2023
@fxmarty
Contributor

fxmarty commented Jun 1, 2023

@arvisioncode I took the liberty of implementing the config; it is a bit painful. We will need huggingface/transformers#23932 to be released first. Next up is support for the visual-question-answering pipeline, which is not yet supported.

Contributor

@fxmarty fxmarty left a comment


LGTM @arvisioncode, thank you for your contribution! I worked a bit on your PR to get it merged.

@fxmarty fxmarty merged commit 1dfd3ac into huggingface:main Jun 14, 2023
@arvisioncode
Contributor Author

Thank you very much for your help @fxmarty! I'm glad it's working now.

@harsh1509c

Hey @arvisioncode @fxmarty, I would like to know how you converted the pix2struct base model to ONNX, step by step.
I have the task of converting pix2struct to ONNX.
It would be a great help.
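For readers landing here with the same question: after this PR merged, the export can be driven from the optimum CLI. A minimal sketch, assuming a recent optimum installed with the ONNX exporters extra (the output directory name is arbitrary):

```shell
# install optimum with ONNX export support
pip install "optimum[exporters]"

# export the model (encoder/decoder ONNX files land in ./pix2struct_onnx/)
optimum-cli export onnx \
  --model google/pix2struct-base \
  --task image-to-text \
  pix2struct_onnx/
```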

6 participants