Add Idefics 3! #32473

andimarafioti · 2024-08-06T15:30:35Z

What does this PR do?

Adding the Idefics 3 model.

There are still a few things to do before merging this PR. The results are not exactly the same as with our codebase and the tests are not done. We are opening it to unblock our release.

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
- Did you write any new necessary tests?

Who can review?

Models:

vision models: @amyeroberts

HuggingFaceDocBuilderDev · 2024-08-06T15:50:19Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

leloykun

Just a few comments on consistency with other VLMs such as Chameleon, plus some nits.

src/transformers/models/idefics3/processing_idefics3.py

src/transformers/models/idefics3/convert_idefics3_weights_to_hf.py

src/transformers/models/idefics3/configuration_idefics3.py

src/transformers/models/idefics3/processing_idefics3.py

src/transformers/models/idefics3/modeling_idefics3.py

zucchini-nlp

Yaaay, Idefics3 is here!

I just left a few comments to make our ongoing work on unifying VLMs a bit easier, but I didn't really review the whole PR.

src/transformers/models/idefics3/configuration_idefics3.py

src/transformers/models/idefics3/modeling_idefics3.py

zucchini-nlp · 2024-08-07T04:28:23Z

src/transformers/models/idefics3/modeling_idefics3.py

+        past_seen_tokens = 0
+        return_legacy_cache = False
+        if use_cache:
+            if not isinstance(past_key_values, Cache):  # kept for BC (non `Cache` `past_key_values` inputs)
+                return_legacy_cache = True
+                past_key_values = DynamicCache.from_legacy_cache(past_key_values)
+            past_seen_tokens = past_key_values.get_seq_length()
+
+        if inputs_embeds is not None and input_ids is None and past_seen_tokens == 0:


We don't have to support the old-style cache for new models; we can go directly with DynamicCache. The tuple cache is finally deprecated in all decoder-only models, yay :)

And btw, the past cache length should be obtained from cache_position; afaik we'll stop using get_seq_length() at some point in the future.

Awesome! I removed the support for the old-style cache.

But I tried to use cache_position and got this error:
AttributeError: 'DynamicCache' object has no attribute 'cache_position'
So I reverted to get_seq_length()

Because past_key_values can be None, it results in an exception if we load the model without specifying use_cache=False. E.g.,

Idefics3ForConditionalGeneration.from_pretrained(base_model_id, use_cache=False)

I'll take care of cache-related changes after the PR is merged, as part of work going on "new-cache compatibility"

And yes, past_kv can be None, but the logic with get_seq_length() should work as long as we check for Noneness :)
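A minimal sketch of the None check discussed above, assuming only that the cache object exposes get_seq_length() as mentioned in this thread. The helper name count_past_tokens and the FakeCache class are hypothetical stand-ins for illustration; they are not part of transformers or of this PR.

```python
from typing import Optional, Protocol


class SupportsSeqLength(Protocol):
    """Any cache-like object that can report how many tokens it holds."""

    def get_seq_length(self) -> int: ...


def count_past_tokens(past_key_values: Optional[SupportsSeqLength], use_cache: bool) -> int:
    """Return the number of cached tokens, or 0 when caching is off or no cache exists."""
    # Guard against None first: calling get_seq_length() on None is the
    # exception reported in the thread.
    if not use_cache or past_key_values is None:
        return 0
    return past_key_values.get_seq_length()


class FakeCache:
    """Hypothetical stand-in for a real cache object, used only for this example."""

    def __init__(self, length: int) -> None:
        self._length = length

    def get_seq_length(self) -> int:
        return self._length
```

With this guard, a first forward pass with use_cache enabled and no cache yet simply reports zero past tokens instead of raising.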

src/transformers/models/idefics3/modeling_idefics3.py

src/transformers/models/idefics3/processing_idefics3.py

amyeroberts

Thanks for all the work adding this model!

The main comment is for the tests to be properly aligned with the new model behaviours, in particular the processor and image processor.

Some general nits for the modeling file - the main one being that all classes and methods which come from idefics2 should have # Copied from comments.

Before the PR can be merged, all slow tests will need to be run & pass. These should be triggered in subsequent commits (I might need to approve the workflow for them to run)

docs/source/en/model_doc/idefics3.md

tests/models/idefics3/test_image_processing_idefics3.py

src/transformers/models/idefics3/configuration_idefics3.py

src/transformers/models/idefics3/modeling_idefics3.py

src/transformers/models/idefics3/processing_idefics3.py

amyeroberts · 2024-08-08T10:12:17Z

src/transformers/models/idefics3/image_processing_idefics3.py

+    if isinstance(image, Image.Image):
+        width, height = image.size


The images should never be Image.Image here

I left this here as it is a custom transform for idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.

I'm also raising a warning once if the inputs are not PIL images. But it will still work for numpy arrays or other types of inputs.

Removed the warning as now it works perfectly with numpy arrays :)

This still should be removed - we shouldn't be processing PIL images

amyeroberts · 2024-08-08T10:13:12Z

src/transformers/models/idefics3/image_processing_idefics3.py

+                if isinstance(image, Image.Image):
+                    cropped_image = image.crop((start_x, start_y, end_x, end_y))


Same here - there shouldn't be any PIL code for the transformations

I left this here as it is a custom transform for idefics3. If the images are passed as PIL objects, we don't convert them to numpy arrays until later in the processing pipeline.

After further discussion with Amy, I added a large change to support processing the images as numpy arrays. Here: 5a0c0f4
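As an illustration of what doing these two steps on numpy arrays can look like, here is a rough sketch. It is not the code from commit 5a0c0f4: the function names and the channels_first flag are assumptions made for this example, and only plain numpy slicing is used.

```python
from typing import Tuple

import numpy as np


def get_height_width(image: np.ndarray, channels_first: bool) -> Tuple[int, int]:
    """Read (height, width) from an array laid out as (C, H, W) or (H, W, C).

    Note that PIL's image.size returns (width, height), the opposite order.
    """
    if channels_first:
        return image.shape[1], image.shape[2]
    return image.shape[0], image.shape[1]


def crop_array(
    image: np.ndarray,
    start_x: int,
    start_y: int,
    end_x: int,
    end_y: int,
    channels_first: bool,
) -> np.ndarray:
    """Array counterpart of image.crop((start_x, start_y, end_x, end_y))."""
    # x indexes columns (width) and y indexes rows (height).
    if channels_first:
        return image[:, start_y:end_y, start_x:end_x]
    return image[start_y:end_y, start_x:end_x, :]
```

The slices return views rather than copies, so a caller that modifies a crop in place would need to call .copy() first.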

src/transformers/models/idefics3/configuration_idefics3.py

src/transformers/models/idefics3/processing_idefics3.py

tests/models/idefics3/test_processing_idefics3.py

src/transformers/models/idefics3/processing_idefics3.py

andimarafioti · 2024-08-15T11:22:23Z

updated main

amyeroberts · 2024-08-15T15:45:21Z

@andimarafioti I can see that you re-requested review, but there are still some debugging commits being pushed, so I will hold off reviewing until this has been resolved. I'm going to unsubscribe to prevent getting notifications for every push - as soon as you have a question or want to let me know it's ready, just ping me with @amyeroberts and I'll get a notification :)

andimarafioti · 2024-08-16T08:07:20Z

@amyeroberts ready to review! The multi-GPU tests are still queued, but if those fail I would skip them. They run OOM in the CI on a single GPU and there is an issue open for the same on idefics2: #32288. If my fix here works, I already opened a PR for that fix as well: #32840

andimarafioti · 2024-08-16T12:47:20Z

Talked to @molbap and he said that there is an issue with the multi-gpu workers getting stuck in the queue. But it's not related to this PR.

amyeroberts · 2024-08-20T11:15:35Z

@andimarafioti Yes, unfortunately we're having issues at the moment with single GPU and multi-GPU runners taking a long time to run / never running cc @ydshieh.

It looks like the multi-GPU tests did eventually run, and at least one test is currently failing and needs to be addressed.

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

andimarafioti · 2024-09-18T12:31:16Z

I rebased on main

emanuelevivoli · 2024-09-23T09:21:27Z

src/transformers/models/idefics3/modeling_idefics3.py

+
+    # Copied from transformers.models.idefics2.modeling_idefics2.Idefics2PreTrainedModel._init_weights
+    def _init_weights(self, module):
+        std = (


As it is, this always assigns self.config.text_config.initializer_range, while, from what I understand, it should assign self.config.initializer_range when hasattr(self.config, "initializer_range") is true. Is that possible?

yes, you're right. This seems to also be a mistake on idefics2. Thanks!
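The fallback described in this exchange can be sketched as follows. This is an illustration under stated assumptions, not the code that was merged: resolve_init_std is a hypothetical helper, and SimpleNamespace objects stand in for the real config classes.

```python
from types import SimpleNamespace


def resolve_init_std(config) -> float:
    """Prefer the top-level initializer_range, else fall back to the text config's."""
    if hasattr(config, "initializer_range"):
        return config.initializer_range
    return config.text_config.initializer_range


# Hypothetical configs used only to show both branches.
with_top_level = SimpleNamespace(
    initializer_range=0.01,
    text_config=SimpleNamespace(initializer_range=0.02),
)
text_only = SimpleNamespace(text_config=SimpleNamespace(initializer_range=0.02))
```

The reported bug is the case where both branches of the conditional read text_config, so a top-level value would be silently ignored.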

amyeroberts

Looks great - I think we're good to go!

@zucchini-nlp

* Add Idefics 3!
* fixes to make both pipelines identical
* fix for quantized models
* First pass at the review
* remove vocab size from the main config (it's still in the text_config)
* hot fix for merve
* Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* re-add model_type for text_config
* remove support for old_cache
* remove hidden_size from main config
* rename idefics3 HF repo
* few changes suggested in the PR
* fix to input_data_format computation
* remove overwrite of _autoset_attn_implementation following @zucchini-nlp suggestion
* improve example
* few improvements from amy's review
* big change to enable processing input images as numpy arrays
* Changes to the code to uniformize processor kwargs
* image processing tests
* image processing tests fixes and some bugs they discovered
* addressed review comments from Yoni
* fix modeling tests
* remove special tokens that are not special
* fixes tests
* skip failing tests - they also fail for idefics2
* added paper and readded the tests with multi gpu, who knows
* Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Apply suggestions from code review Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* review amy until image_processing_idefics3
* last comments from Amy
* review amy
* Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update docs/source/en/model_doc/idefics3.md Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* doc improvement - amy review
* fix runtime error during fine-tuning
* amy's review
* Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update src/transformers/models/idefics3/modeling_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* ruff
* amy's comment on the order
* ruff ruff
* fix copies
* square images when they are not splitted
* ruff :(
* Update src/transformers/models/idefics3/image_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* Update tests/models/idefics3/test_processing_idefics3.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix small bug introduced in refactor
* amy's image processing changes
* fixes peft tests and ruff
* modify to_pil_image from transformers. and review from emanuele.
* add modified to_pil_image

---------

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

ywang96 mentioned this pull request Aug 6, 2024

[Model]Refactor MiniCPMV vllm-project/vllm#7020

Merged

leloykun reviewed Aug 6, 2024

snova-jonathanl reviewed Aug 7, 2024

src/transformers/models/idefics3/modeling_idefics3.py (resolved)

zucchini-nlp reviewed Aug 7, 2024

amyeroberts reviewed Aug 7, 2024

amyeroberts added the run-slow label Aug 7, 2024

amyeroberts reviewed Aug 8, 2024

src/transformers/models/idefics3/configuration_idefics3.py (outdated, resolved)

andimarafioti commented Aug 9, 2024

src/transformers/models/idefics3/processing_idefics3.py (outdated, resolved)

andimarafioti commented Aug 9, 2024

src/transformers/models/idefics3/processing_idefics3.py (outdated, resolved)

amitbcp mentioned this pull request Aug 12, 2024

Idefics3 Addition open-compass/VLMEvalKit#379

Merged

andimarafioti force-pushed the idefics3 branch from 1c10a48 to 5a0c0f4 Compare August 13, 2024 08:50

andimarafioti linked an issue Aug 13, 2024 that may be closed by this pull request

Can MPS use FP16 when training?Why I can't? #32648

Closed

4 tasks

andimarafioti removed a link to an issue Aug 13, 2024

Can MPS use FP16 when training?Why I can't? #32648

Closed

4 tasks

yonigozlan reviewed Aug 13, 2024

src/transformers/models/idefics3/processing_idefics3.py (outdated, resolved)

yonigozlan reviewed Aug 13, 2024

tests/models/idefics3/test_processing_idefics3.py (outdated, resolved)

EricLBuehler reviewed Aug 13, 2024

src/transformers/models/idefics3/processing_idefics3.py (resolved)

andimarafioti force-pushed the idefics3 branch from f67ed1e to e71711d Compare August 15, 2024 11:20

andimarafioti requested a review from amyeroberts August 15, 2024 11:22

andimarafioti force-pushed the idefics3 branch from 0f7a8e6 to 4756044 Compare August 15, 2024 13:06

andimarafioti force-pushed the idefics3 branch 3 times, most recently from ef576cb to 483d5d8 Compare August 16, 2024 07:36

andimarafioti and others added 13 commits September 18, 2024 12:31

fix runtime error during fine-tuning

6325fbc

amy's review

76b8892

Update src/transformers/models/idefics3/image_processing_idefics3.py

9a20306

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Update src/transformers/models/idefics3/image_processing_idefics3.py

3129920

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Update src/transformers/models/idefics3/modeling_idefics3.py

e1a10b3

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

ruff

4c3756f

amy's comment on the order

fbaf07e

ruff ruff

87fa179

fix copies

23d4cf8

square images when they are not splitted

9e925b9

ruff :(

215b636

Update src/transformers/models/idefics3/image_processing_idefics3.py

2967974

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

Update tests/models/idefics3/test_processing_idefics3.py

ee041bf

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

andimarafioti force-pushed the idefics3 branch from 7ee2ec8 to ee041bf Compare September 18, 2024 12:31

andimarafioti added 3 commits September 18, 2024 12:35

fix small bug introduced in refactor

4aad266

amy's image processing changes

f1ae8ae

fixes peft tests and ruff

39d88b2

andimarafioti force-pushed the idefics3 branch from 1bbf7ba to 39d88b2 Compare September 20, 2024 09:32

andimarafioti requested a review from amyeroberts September 20, 2024 09:36

emanuelevivoli reviewed Sep 23, 2024

andimarafioti added 2 commits September 23, 2024 13:17

modify to_pil_image from transformers. and review from emanuele.

383f0db

add modified to_pil_image

682b82b

amyeroberts mentioned this pull request Sep 25, 2024

[DO NOT MERGE] Idefics3 - resolve data_format for idefics3 image processing #33599

Closed

amyeroberts approved these changes Sep 25, 2024

andimarafioti merged commit f2c388e into huggingface:main Sep 25, 2024
24 checks passed

andimarafioti deleted the idefics3 branch November 21, 2024 13:49

HuggingFaceDocBuilderDev commented Aug 6, 2024

leloykun left a comment

zucchini-nlp left a comment

zucchini-nlp Aug 7, 2024

andimarafioti Aug 8, 2024

fsommers Sep 1, 2024

zucchini-nlp Sep 2, 2024

amyeroberts left a comment

amyeroberts Aug 8, 2024

andimarafioti Aug 12, 2024

andimarafioti Aug 12, 2024

andimarafioti Aug 15, 2024

amyeroberts Aug 28, 2024

andimarafioti Aug 30, 2024

amyeroberts Aug 8, 2024

andimarafioti Aug 12, 2024

andimarafioti Aug 13, 2024

andimarafioti commented Aug 15, 2024

amyeroberts commented Aug 15, 2024

andimarafioti commented Aug 16, 2024

andimarafioti commented Aug 16, 2024

amyeroberts commented Aug 20, 2024

andimarafioti commented Sep 18, 2024

emanuelevivoli Sep 23, 2024

andimarafioti Sep 23, 2024

amyeroberts left a comment
