Add Kosmos-2 model #24709
The documentation is not available anymore as the PR was closed or merged.
Any updates?
WIP, but at a bit of a slow pace.
Force-pushed from 76d619f to 6edb6e2
Force-pushed from fe01858 to 2ab66a4
Force-pushed from c484300 to 92f6cef
Force-pushed from 5b0eec5 to 9c6e954
Constructs a KOSMOS-2 processor which wraps a KOSMOS-2 image processor and a KOSMOS-2 tokenizer into a single
processor.

[`Kosmos2Processor`] offers all the functionalities of [`Kosmos2ImageProcessor`] and some functionalities of
If you are interested, here are some details:
`Kosmos2Processor` applies some changes to the input string (adding extra tokens related to image information, etc.), and it needs to return an `image_features_mask`.
It also allows adding an `EOS` token depending on whether we are doing generation or not. For this to work consistently between the slow and fast tokenizers, we need to add the `BOS` token (or not) at the string level.
To make these two things work correctly together with padding, it calls the `tokenizer` without padding, computes `image_features_mask`, then adds pad tokens if necessary.
However, this can't provide all of the (very numerous) functionalities our base tokenizer class provides.
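The "tokenize without padding, compute the mask, then pad" order described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `pad_with_image_mask`, `IMAGE_PLACEHOLDER_ID`, and `PAD_ID` are hypothetical names, the image slots are assumed to be marked by a placeholder id, and right-padding is assumed.

```python
from typing import Dict, List

IMAGE_PLACEHOLDER_ID = -1  # hypothetical id marking slots reserved for image features
PAD_ID = 0                 # hypothetical pad token id


def pad_with_image_mask(batch: List[List[int]]) -> Dict[str, List[List[int]]]:
    """Compute the image mask on unpadded ids, then right-pad every field together."""
    # Step 1: derive the mask from the *unpadded* sequences.
    masks = [[1 if tok == IMAGE_PLACEHOLDER_ID else 0 for tok in ids] for ids in batch]
    max_len = max(len(ids) for ids in batch)  # assumes a non-empty batch

    out: Dict[str, List[List[int]]] = {
        "input_ids": [],
        "attention_mask": [],
        "image_features_mask": [],
    }
    for ids, mask in zip(batch, masks):
        n_pad = max_len - len(ids)
        # Step 2: pad all three fields by the same amount so they stay aligned.
        out["input_ids"].append(ids + [PAD_ID] * n_pad)
        out["attention_mask"].append([1] * len(ids) + [0] * n_pad)
        out["image_features_mask"].append(mask + [0] * n_pad)
    return out


example = pad_with_image_mask([[5, -1, -1, 7], [5, 9]])
```

Computing the mask before padding keeps the mask logic independent of the padding side; the pad positions simply receive zeros in both masks.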
model_name="ydshieh/temp-testing-kosmos-2",
# TODO (ydshieh): add a revision once we push to `microsoft` org
revision=None,
)
The thorough tests in `Kosmos2ProcessorTest` cover the real use cases, but I can duplicate the tests here if @ArthurZucker prefers.
Thank you for adding this model!
Incredible piece of work. Thank you for taking the time to make the complex processing code so well tested and documented with comments etc and clearly laid out. Really excited to have this model added! :D
It would be great to have a second review from @ArthurZucker on this, particularly for the tokenizer and processor.
cc @rafaelpadilla for reference

## Overview

The KOSMOS-2 model was proposed in [Kosmos-2: Grounding Multimodal Large Language Models to the World]
Could you add a few sentences about the model and what it does here? A good example is MusicGen.
Force-pushed from b4fe493 to ad15fb2
If you want a good example, you can check out pegasus or byteT5.
Hi @ArthurZucker, ready for you to review again :-) It would be great if we could merge before Thursday 🙏 (if everything is good), as we want to have an announcement with the original Kosmos-2 authors/team. Thank you in advance!
@ydshieh great work!!!! Thanks again!
* Add KOSMOS-2 model
* update
* update
* update
* address review comment - 001
* address review comment - 002
* address review comment - 003
* style
* Apply suggestions from code review
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* address review comment - 004
* address review comment - 005
* address review comment - 006
* address review comment - 007
* address review comment - 008
* address review comment - 009
* address review comment - 010
* address review comment - 011
* update readme
* fix
* fix
* fix
* [skip ci] fix
* revert the change in _decode
* fix docstring
* fix docstring
* Update docs/source/en/model_doc/kosmos-2.md
  Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
* no more Kosmos2Tokenizer
* style
* remove "returned when being computed by the model"
* Apply suggestions from code review
  Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
* UTM5 Atten
* fix attn mask
* use present_key_value_states instead of next_decoder_cache
* style
* conversion scripts
* conversion scripts
* conversion scripts
* Add _reorder_cache
* fix doctest and copies
* rename 1
* rename 2
* rename 3
* make fixup
* fix table
* fix docstring
* rename 4
* change repo_id
* remove tip
* update md file
* make style
* update md file
* put docs/source/en/model_doc/kosmos-2.md to slow
* update conversion script
* Use CLIPImageProcessor in Kosmos2Processor
* Remove Kosmos2ImageProcessor
* Remove to_dict in Kosmos2Config
* Remove files
* fix import
* Update conversion
* normalized=False
* Not using hardcoded values like <image>
* elt --> element
* Apply suggestion
* Not using hardcoded values like </image>
* No assert
* No nested functions
* Fix md file
* copy
* update doc
* fix docstring
* fix name
* Remove _add_remove_spaces_around_tag_tokens
* Remove dummy docstring of _preprocess_single_example
* Use `BatchEncoding`
* temp
* temp
* temp
* Update
* Update
* Make Kosmos2ProcessorTest a bit pretty
* Update gradient checkpointing
* Fix gradient checkpointing test
* Remove one liner remove_special_fields
* Simplify conversion script
* fix add_eos_token
* update readme
* update tests
* Change to microsoft/kosmos-2-patch14-224
* style
* Fix doc

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: NielsRogge <48327001+NielsRogge@users.noreply.github.com>
Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
Add `KOSMOS-2` model.
[…] `Kosmos2TextModel` and `Kosmos2VisionModel` to the main `__init__` file: […] `Kosmos2ForConditionalGeneration` into those 2 models won't work. […] `Kosmos2ForConditionalGeneration` into `Kosmos2Model` works.
TODO (follow-up PRs):
[…] `microsoft` and change the used checkpoint repo id.