
Add pix2struct support to ONNX #962

Closed
wants to merge 1 commit into from

Conversation

arvisioncode
Contributor

@arvisioncode arvisioncode commented Apr 10, 2023

This PR adds support for pix2struct models to be exported to ONNX format.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@arvisioncode arvisioncode changed the title Add pix2struct support Add pix2struct support to ONNX Apr 10, 2023
@@ -636,6 +636,10 @@ class TasksManager:
"sequence-classification",
onnx="PerceiverOnnxConfig",
),
"pix2struct": supported_tasks_mapping(
"default",
Contributor

You should probably remove default here (AutoModel cannot load pix2struct) and keep only image-to-text.

The issue is that there is no auto class in transformers for image-to-text, so we will need a way to specify explicitly in tasks.py that for pix2struct the class to use is Pix2StructForConditionalGeneration. Let me do a PR for this.

Contributor Author

done

@@ -82,6 +82,7 @@
"hf-internal-testing/tiny-random-language_perceiver": ["masked-lm", "sequence-classification"],
"hf-internal-testing/tiny-random-vision_perceiver_conv": ["image-classification"],
},
"pix2struct": "google/pix2struct-base",
Contributor

It would be great if you could upload a tiny model to the Hub.

@fxmarty
Contributor

fxmarty commented Apr 11, 2023

@arvisioncode The mapping _CUSTOM_CLASSES introduced in #967 should help!

@fxmarty
Contributor

fxmarty commented Apr 12, 2023

Hi @arvisioncode, #967 was merged. You can see I added pix2struct to the custom class dict:

("pt", "pix2struct", "image-to-text"): ("transformers", "Pix2StructForConditionalGeneration"),
This allows it to be loaded with the correct class.

All you should need is to add the entry "pix2struct": supported_tasks_mapping("image-to-text", onnx="Pix2StructOnnxConfig") in the _SUPPORTED_MODEL_TYPE dict (as you did).

The only thing I am not sure of is whether having Pix2StructOnnxConfig inherit from ViTOnnxConfig is fine, but you will be able to find out if the tests fail.
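One likely wrinkle with inheriting from a ViT-style config: pix2struct's processor produces flattened_patches and an attention_mask rather than pixel_values, so the ONNX inputs would presumably need overriding. A minimal sketch of the idea (toy stand-in classes, not optimum's actual base classes):

```python
# Toy sketch: overriding the declared ONNX inputs for pix2struct.
# Class names are illustrative stand-ins, not optimum's real config classes.
from typing import Dict

class VisionOnnxConfigSketch:
    """Stand-in for a vision base config that assumes a pixel_values input."""
    @property
    def inputs(self) -> Dict[str, Dict[int, str]]:
        return {"pixel_values": {0: "batch_size", 1: "num_channels", 2: "height", 3: "width"}}

class Pix2StructOnnxConfigSketch(VisionOnnxConfigSketch):
    """Pix2Struct feeds flattened patches, not raw pixels, so the inputs differ."""
    @property
    def inputs(self) -> Dict[str, Dict[int, str]]:
        return {
            "flattened_patches": {0: "batch_size", 1: "max_patches"},
            "attention_mask": {0: "batch_size", 1: "max_patches"},
        }
```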

@arvisioncode arvisioncode closed this by deleting the head repository May 4, 2023
@arvisioncode
Contributor Author

Hello @fxmarty, and sorry for the delay in answering.
I have pulled the repo again to get your changes, and I have switched the task from default to image-to-text as you suggested.
However, the tests still fail. I have tried changing the class that Pix2StructOnnxConfig inherits from, using ViTOnnxConfig, TextAndVisionOnnxConfig, and VisionOnnxConfig, but it fails in all cases.

Here are the traces from running the tests with ViTOnnxConfig:

$ pytest tests/exporters/onnx/test_*.py -k "pix2struct" -s

======================================================================== test session starts ========================================================================
platform win32 -- Python 3.9.12, pytest-7.3.1, pluggy-1.0.0
rootdir: E:\INETUM\INETUM_Projects\Document Analysis\optimum
configfile: pyproject.toml
plugins: xdist-3.2.1
collected 3138 items / 3119 deselected / 19 selected

tests\exporters\onnx\test_exporters_onnx_cli.py sssFramework not specified. Using pt to export to ONNX.
FFramework not specified. Using pt to export to ONNX.
FFramework not specified. Using pt to export to ONNX.
Automatic task detection to image-to-text.
Fsssssssss
tests\exporters\onnx\test_onnx_export.py ssss

============================================================================= FAILURES ============================================================================== 
_________________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text _________________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:169: in main_export
    model = TasksManager.get_model_from_task(
optimum\exporters\tasks.py:1385: in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>, pretrained_model_name_or_path = 'google/pix2struct-base', model_args = ()
kwargs = {'torch_dtype': None}
config = Pix2StructConfig {
  "_commit_hash": "f17649865bf61db64bb697ed4a3da7e0bc7413d5",
  "_name_or_path": "google/pix2struct..."torchscript": false,
    "transformers_version": "4.29.0.dev0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}

trust_remote_code = False, hub_kwargs_names = ['cache_dir', 'force_download', 'local_files_only', 'proxies', 'resume_download', 'revision', ...]
hub_kwargs = {'cache_dir': None, 'force_download': False, 'local_files_only': False, 'revision': 'main', ...}, kwargs_copy = {'_from_auto': True, 'torch_dtype': None}
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True
        hub_kwargs_names = [
            "cache_dir",
            "force_download",
            "local_files_only",
            "proxies",
            "resume_download",
            "revision",
            "subfolder",
            "use_auth_token",
        ]
        hub_kwargs = {name: kwargs.pop(name) for name in hub_kwargs_names if name in kwargs}
        if not isinstance(config, PretrainedConfig):
            kwargs_copy = copy.deepcopy(kwargs)
            # ensure not to pollute the config object with torch_dtype="auto" - since it's
            # meaningless in the context of the config object - torch.dtype values are acceptable
            if kwargs_copy.get("torch_dtype", None) == "auto":
                _ = kwargs_copy.pop("torch_dtype")

            config, kwargs = AutoConfig.from_pretrained(
                pretrained_model_name_or_path,
                return_unused_kwargs=True,
                trust_remote_code=trust_remote_code,
                **hub_kwargs,
                **kwargs_copy,
            )
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    f"Loading {pretrained_model_name_or_path} requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            class_ref = config.auto_map[cls.__name__]
            model_class = get_class_from_dynamic_module(
                class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
            )
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq.
E       Model type should be one of BlipConfig, Blip2Config, VisionEncoderDecoderConfig.

.virtualenv\lib\site-packages\transformers\models\auto\auto_factory.py:471: ValueError
____________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith _____________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith>,), kw = {}  

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:169: in main_export
    model = TasksManager.get_model_from_task(
optimum\exporters\tasks.py:1385: in get_model_from_task
    model = model_class.from_pretrained(model_name_or_path, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'transformers.models.auto.modeling_auto.AutoModelForVision2Seq'>, pretrained_model_name_or_path = 'google/pix2struct-base', model_args = ()
kwargs = {'torch_dtype': None}
config = Pix2StructConfig {
  "_commit_hash": "f17649865bf61db64bb697ed4a3da7e0bc7413d5",
  "_name_or_path": "google/pix2struct..."torchscript": false,
    "transformers_version": "4.29.0.dev0",
    "typical_p": 1.0,
    "use_bfloat16": false
  }
}

trust_remote_code = False, hub_kwargs_names = ['cache_dir', 'force_download', 'local_files_only', 'proxies', 'resume_download', 'revision', ...]
hub_kwargs = {'cache_dir': None, 'force_download': False, 'local_files_only': False, 'revision': 'main', ...}, kwargs_copy = {'_from_auto': True, 'torch_dtype': None}
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *model_args, **kwargs):
        config = kwargs.pop("config", None)
        trust_remote_code = kwargs.pop("trust_remote_code", False)
        kwargs["_from_auto"] = True
        hub_kwargs_names = [
            "cache_dir",
            "force_download",
            "local_files_only",
            "proxies",
            "resume_download",
            "revision",
            "subfolder",
            "use_auth_token",
        ]
        hub_kwargs = {name: kwargs.pop(name) for name in hub_kwargs_names if name in kwargs}
        if not isinstance(config, PretrainedConfig):
            kwargs_copy = copy.deepcopy(kwargs)
            # ensure not to pollute the config object with torch_dtype="auto" - since it's
            # meaningless in the context of the config object - torch.dtype values are acceptable
            if kwargs_copy.get("torch_dtype", None) == "auto":
                _ = kwargs_copy.pop("torch_dtype")

            config, kwargs = AutoConfig.from_pretrained(
                pretrained_model_name_or_path,
                return_unused_kwargs=True,
                trust_remote_code=trust_remote_code,
                **hub_kwargs,
                **kwargs_copy,
            )
        if hasattr(config, "auto_map") and cls.__name__ in config.auto_map:
            if not trust_remote_code:
                raise ValueError(
                    f"Loading {pretrained_model_name_or_path} requires you to execute the modeling file in that repo "
                    "on your local machine. Make sure you have read the code there to avoid malicious use, then set "
                    "the option `trust_remote_code=True` to remove this error."
                )
            class_ref = config.auto_map[cls.__name__]
            model_class = get_class_from_dynamic_module(
                class_ref, pretrained_model_name_or_path, **hub_kwargs, **kwargs
            )
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
        elif type(config) in cls._model_mapping.keys():
            model_class = _get_model_class(config, cls._model_mapping)
            return model_class.from_pretrained(
                pretrained_model_name_or_path, *model_args, config=config, **hub_kwargs, **kwargs
            )
>       raise ValueError(
            f"Unrecognized configuration class {config.__class__} for this kind of AutoModel: {cls.__name__}.\n"
            f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapping.keys())}."
        )
E       ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoModelForVision2Seq.
E       Model type should be one of BlipConfig, Blip2Config, VisionEncoderDecoderConfig.

.virtualenv\lib\site-packages\transformers\models\auto\auto_factory.py:471: ValueError
____________________________________________ OnnxCLIExportTestCase.test_exporters_cli_pytorch_cpu_340_pix2struct_no_task ____________________________________________ 

a = (<tests.exporters.onnx.test_exporters_onnx_cli.OnnxCLIExportTestCase testMethod=test_exporters_cli_pytorch_cpu_340_pix2struct_no_task>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

.virtualenv\lib\site-packages\parameterized\parameterized.py:620:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
tests\exporters\onnx\test_exporters_onnx_cli.py:145: in test_exporters_cli_pytorch_cpu
    self._onnx_export(model_name, task, monolith, no_post_process)
tests\exporters\onnx\test_exporters_onnx_cli.py:116: in _onnx_export
    main_export(
optimum\exporters\onnx\__main__.py:289: in main_export
    models_and_onnx_configs = get_encoder_decoder_models_for_export(model, onnx_config)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

model = Pix2StructForConditionalGeneration(
  (encoder): Pix2StructVisionModel(
    (embeddings): Pix2StructVisionEmbeddings(
...  (dropout): Dropout(p=0.2, inplace=False)
    (lm_head): Linear(in_features=768, out_features=50244, bias=False)
  )
)
config = <optimum.exporters.onnx.model_configs.Pix2StructOnnxConfig object at 0x000001607D07E220>

    def get_encoder_decoder_models_for_export(
        model: Union["PreTrainedModel", "TFPreTrainedModel"], config: "OnnxConfig"
    ) -> Dict[str, Tuple[Union["PreTrainedModel", "TFPreTrainedModel"], "OnnxConfig"]]:
        """
        Returns the encoder and decoder parts of the model and their subsequent onnx configs.

        Args:
            model ([`PreTrainedModel`] or [`TFPreTrainedModel`]):
                The model to export.
            config ([`~exporters.onnx.config.OnnxConfig`]):
                The ONNX configuration associated with the exported model.

        Returns:
            `Dict[str, Tuple[Union[`PreTrainedModel`, `TFPreTrainedModel`], `OnnxConfig`]: A Dict containing the model and
            onnx configs for the encoder and decoder parts of the model.
        """
        models_for_export = {}

        encoder_model = model.get_encoder()
>       encoder_onnx_config = config.with_behavior("encoder")
E       AttributeError: 'Pix2StructOnnxConfig' object has no attribute 'with_behavior'

optimum\exporters\onnx\utils.py:105: AttributeError
========================================================================= warnings summary ========================================================================== 
.virtualenv\lib\site-packages\diffusers\models\cross_attention.py:30
  E:\INETUM\INETUM_Projects\Document Analysis\optimum\.virtualenv\lib\site-packages\diffusers\models\cross_attention.py:30: FutureWarning: Importing from cross_attention is deprecated. Please import from diffusers.models.attention_processor instead.
    deprecate(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
====================================================================== short test summary info ====================================================================== 
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_338_pix2struct_image_to_text - ValueError: Unrecognized 
configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoM...
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_339_pix2struct_image_to_text_monolith - ValueError: Unrecognized configuration class <class 'transformers.models.pix2struct.configuration_pix2struct.Pix2StructConfig'> for this kind of AutoModel: AutoM...
FAILED tests/exporters/onnx/test_exporters_onnx_cli.py::OnnxCLIExportTestCase::test_exporters_cli_pytorch_cpu_340_pix2struct_no_task - AttributeError: 'Pix2StructOnnxConfig' object has no attribute 'with_behavior'
===================================================== 3 failed, 16 skipped, 3119 deselected, 1 warning in 8.86s ===================================================== 

Do you know which class the pix2struct ONNX config should inherit from? Any idea how to fix these errors?
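For context, the last failure comes from the encoder/decoder export path calling config.with_behavior("encoder"), a method that optimum's seq2seq config bases provide but the plain vision configs do not, which suggests the config needs a seq2seq-style base. A toy sketch of the pattern (simplified illustration, not optimum's actual classes):

```python
# Toy sketch of the with_behavior pattern used to split an encoder-decoder
# export into separate encoder and decoder ONNX configs. Names mirror
# optimum's seq2seq config bases but are simplified illustrations.
from enum import Enum

class ConfigBehavior(str, Enum):
    ENCODER = "encoder"
    DECODER = "decoder"

class Seq2SeqOnnxConfigSketch:
    def __init__(self, behavior: ConfigBehavior = ConfigBehavior.ENCODER):
        self._behavior = behavior

    def with_behavior(self, behavior: str) -> "Seq2SeqOnnxConfigSketch":
        """Return a copy of this config restricted to one part of the model."""
        return Seq2SeqOnnxConfigSketch(ConfigBehavior(behavior))

    @property
    def inputs(self):
        # Each behavior declares only the inputs of its own sub-model.
        if self._behavior is ConfigBehavior.ENCODER:
            return {"flattened_patches": {0: "batch_size", 1: "max_patches"}}
        return {"decoder_input_ids": {0: "batch_size", 1: "decoder_sequence_length"}}
```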

Thank you so much!
