Add Kosmos-2 model #24709

Merged
merged 91 commits into from
Oct 30, 2023


ydshieh
Collaborator

@ydshieh ydshieh commented Jul 7, 2023

What does this PR do?

Add KOSMOS-2 model.

  • I decided not to expose Kosmos2TextModel and Kosmos2VisionModel in the main __init__ file:
    • as they are really only building blocks. Moreover, loading checkpoints of Kosmos2ForConditionalGeneration into those 2 models won't work.
    • loading Kosmos2ForConditionalGeneration into Kosmos2Model works.
    • (and therefore no corresponding tests for those 2 models)

TODO (follow-up PRs):

  • add a checkpoint conversion script in a follow-up PR. (It's there, I just need to clean up the messy code.)
  • upload the checkpoint to the microsoft org and change the checkpoint repo id that is used.

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented Jul 7, 2023

The documentation is not available anymore as the PR was closed or merged.

@Rajmehta123

any updates?

@ydshieh
Collaborator Author

ydshieh commented Jul 19, 2023

WIP, but at a bit of a slow pace

@ydshieh ydshieh force-pushed the add_kosmos_2 branch 2 times, most recently from 76d619f to 6edb6e2 Compare August 25, 2023 12:50
@ydshieh ydshieh force-pushed the add_kosmos_2 branch 3 times, most recently from fe01858 to 2ab66a4 Compare September 4, 2023 17:47
@ydshieh ydshieh force-pushed the add_kosmos_2 branch 5 times, most recently from c484300 to 92f6cef Compare September 8, 2023 08:48
@ydshieh ydshieh changed the title from "[WIP] Add kosmos-2" to "Add Kosmos-2 model" Sep 8, 2023
@ydshieh ydshieh marked this pull request as ready for review September 8, 2023 15:12
Constructs a KOSMOS-2 processor which wraps a KOSMOS-2 image processor and a KOSMOS-2 tokenizer into a single
processor.

[`Kosmos2Processor`] offers all the functionalities of [`Kosmos2ImageProcessor`] and some functionalities of
Collaborator Author

@ydshieh ydshieh Sep 8, 2023

If you are interested, here are some details:

Kosmos2Processor applies some changes to the input string (adding extra tokens related to image information, etc.), and it needs to return an image_features_mask.

It also allows adding an EOS token depending on whether we are doing generation. For this to work consistently with both the slow and fast tokenizers, we need to decide at the string level whether to add the BOS token.

To make these two things work correctly together with padding, it calls the tokenizer without padding, computes image_features_mask, and then adds pad tokens if necessary.

However, this approach can't provide all of the (very many) features that our base tokenizer class provides.
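
The padding order described above can be sketched as follows. This is a minimal illustration in plain Python, not the code from this PR: the token ids, the helper names `add_image_slots` and `pad_batch`, and the choice of right padding are all assumptions made for the example, and BOS/EOS handling is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical token ids, chosen only for this illustration.
PAD_ID, BOI_ID, EOI_ID, PLACEHOLDER_ID = 1, 100, 101, 102


def add_image_slots(text_ids: List[int], num_image_tokens: int) -> Tuple[List[int], List[int]]:
    """Prepend image slots to un-padded text ids and build the matching mask."""
    input_ids = [BOI_ID] + [PLACEHOLDER_ID] * num_image_tokens + [EOI_ID] + text_ids
    # 1 marks the positions that would later receive image features.
    mask = [0] + [1] * num_image_tokens + [0] + [0] * len(text_ids)
    return input_ids, mask


def pad_batch(examples: List[Tuple[List[int], List[int]]], pad_id: int = PAD_ID) -> Dict[str, List[List[int]]]:
    """Right-pad ids and masks only after the masks have been computed."""
    max_len = max(len(ids) for ids, _ in examples)
    batch: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "image_features_mask": [],
    }
    for ids, mask in examples:
        n_pad = max_len - len(ids)
        batch["input_ids"].append(ids + [pad_id] * n_pad)
        batch["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        # Pad positions never hold image features, so the mask is padded with 0.
        batch["image_features_mask"].append(mask + [0] * n_pad)
    return batch


# Two examples of different text length, each with two image slots.
examples = [add_image_slots([5, 6, 7], 2), add_image_slots([8], 2)]
batch = pad_batch(examples)
```

Computing the mask on the un-padded sequence keeps the three lists aligned by construction: every pad position gets a 0 in both `attention_mask` and `image_features_mask`.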

Collaborator Author

@ydshieh ydshieh Sep 8, 2023

model_name="ydshieh/temp-testing-kosmos-2",
# TODO (ydshieh): add a revision once we push to `microsoft` org
revision=None,
)
Collaborator Author

The thorough tests in Kosmos2ProcessorTest cover the real use cases, but I can duplicate them here if @ArthurZucker prefers.

Collaborator

@amyeroberts amyeroberts left a comment

Thank you for adding this model!

Incredible piece of work. Thank you for taking the time to make the complex processing code so well tested and documented with comments etc and clearly laid out. Really excited to have this model added! :D

It would be great to have a second review from @ArthurZucker on this, particularly for the tokenizer and processor.

cc @rafaelpadilla for reference


## Overview

The KOSMOS-2 model was proposed in [Kosmos-2: Grounding Multimodal Large Language Models to the World]
Collaborator

Could you add a few sentences about the model and what it does here? A good example is MusicGen.

src/transformers/models/kosmos2/processing_kosmos2.py (outdated, resolved)
src/transformers/models/kosmos2/processing_kosmos2.py (outdated, resolved)
@ydshieh ydshieh force-pushed the add_kosmos_2 branch 3 times, most recently from b4fe493 to ad15fb2 Compare September 21, 2023 07:40
Collaborator

@ArthurZucker ArthurZucker left a comment

If you want a good example, you can check out pegasus or byteT5.

src/transformers/models/kosmos2/tokenization_kosmos2.py (outdated, resolved)
src/transformers/models/kosmos2/tokenization_kosmos2.py (outdated, resolved)
src/transformers/models/kosmos2/tokenization_kosmos2.py (outdated, resolved)
@ydshieh
Collaborator Author

ydshieh commented Sep 26, 2023

Hi @ArthurZucker, this is ready for you to review again :-)

Would be great if we can merge before Thursday 🙏 (if everything is good) as we want to have an announcement with the original Kosmos-2 authors/team. Thank you in advance!

@ydshieh ydshieh merged commit 691fd8f into main Oct 30, 2023
23 checks passed
@ydshieh ydshieh deleted the add_kosmos_2 branch October 30, 2023 12:32
@BIGBALLON

@ydshieh great work !!!! thanks again!

EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023
* Add KOSMOS-2 model

* update

* update

* update

* address review comment - 001

* address review comment - 002

* address review comment - 003

* style

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* fix

* address review comment - 004

* address review comment - 005

* address review comment - 006

* address review comment - 007

* address review comment - 008

* address review comment - 009

* address review comment - 010

* address review comment - 011

* update readme

* fix

* fix

* fix

* [skip ci] fix

* revert the change in _decode

* fix docstring

* fix docstring

* Update docs/source/en/model_doc/kosmos-2.md

Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>

* no more Kosmos2Tokenizer

* style

* remove "returned when being computed by the model"

* Apply suggestions from code review

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* UTM5 Atten

* fix attn mask

* use present_key_value_states instead of next_decoder_cache

* style

* conversion scripts

* conversion scripts

* conversion scripts

* Add _reorder_cache

* fix doctest and copies

* rename 1

* rename 2

* rename 3

* make fixup

* fix table

* fix docstring

* rename 4

* change repo_id

* remove tip

* update md file

* make style

* update md file

* put docs/source/en/model_doc/kosmos-2.md to slow

* update conversion script

* Use CLIPImageProcessor in Kosmos2Processor

* Remove Kosmos2ImageProcessor

* Remove to_dict in Kosmos2Config

* Remove files

* fix import

* Update conversion

* normalized=False

* Not using hardcoded values like <image>

* elt --> element

* Apply suggestion

* Not using hardcoded values like </image>

* No assert

* No nested functions

* Fix md file

* copy

* update doc

* fix docstring

* fix name

* Remove _add_remove_spaces_around_tag_tokens

* Remove dummy docstring of _preprocess_single_example

* Use `BatchEncoding`

* temp

* temp

* temp

* Update

* Update

* Make Kosmos2ProcessorTest a bit pretty

* Update gradient checkpointing

* Fix gradient checkpointing test

* Remove one liner remove_special_fields

* Simplify conversion script

* fix add_eos_token

* update readme

* update tests

* Change to microsoft/kosmos-2-patch14-224

* style

* Fix doc

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

7 participants