[DataComp] Add T-MARS #374
Force-pushed from a5dd774 to 71f705a.
Do all the model and config files need to be included in the |
Force-pushed from 647c9ab to 131d27b.
Force-pushed from 188562e to 76a0172.
Force-pushed from 4d32e54 to 95a3889.
Thanks @NielsRogge!
Looks good overall, but I think many of the files added for the text detection component can be removed (that will make the review easier). Maybe let's stick to one model for now and only keep the configs necessary for that one.
examples/pipelines/datacomp/components/detect_text/src/models/head/psenet_head.py
examples/pipelines/datacomp/components/detect_text/src/models/loss/dice_loss.py
examples/pipelines/datacomp/components/detect_text/src/models/neck/fpem_v2.py
examples/pipelines/datacomp/components/detect_text/src/models/neck/fpn.py
@@ -32,6 +32,13 @@ produces:
clipl14score:
type: float32
textembedding:
Wondering whether it wouldn't be more intuitive to have the embeddings as a field rather than a subset. That makes more sense, since different modalities (subsets) can have different embeddings. Wdyt?
Yes I know, however image data and image embeddings are both heavy in terms of data, which is probably why the subsets were created in the first place. So if both image data and embeddings are part of the same subset, then Fondant will always load both first, and only then provide just the embeddings if that is what the user specified, right?
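To make the trade-off in this exchange concrete, here is a toy sketch of the cost argument. It is not Fondant's API: the function, the `imageembedding` subset name, and the byte sizes are all made up for illustration, and it assumes the premise of the question above (that a subset is the smallest unit that gets loaded), which is not confirmed in this thread.

```python
# Toy model only: these names are hypothetical and are not Fondant's API.
# Assumption: a subset is the smallest unit that gets loaded, so every field
# stored in a subset is read whenever any one of its fields is requested.

def bytes_loaded(subsets, requested):
    """Return the bytes read per row to serve `requested` (subset, field) pairs."""
    touched = {subset for subset, _field in requested}
    return sum(sum(subsets[name].values()) for name in touched)

# Illustrative per-row sizes in bytes (made-up numbers).
IMAGE_BYTES = 100_000
EMBEDDING_BYTES = 3_072  # e.g. 768 float32 values

# Option A: the embedding is a field inside the images subset.
as_field = {"images": {"data": IMAGE_BYTES, "embedding": EMBEDDING_BYTES}}

# Option B: the embedding lives in its own subset.
as_subset = {
    "images": {"data": IMAGE_BYTES},
    "imageembedding": {"data": EMBEDDING_BYTES},
}

# A downstream component that only needs the embedding.
cost_field = bytes_loaded(as_field, [("images", "embedding")])
cost_subset = bytes_loaded(as_subset, [("imageembedding", "data")])
print(cost_field, cost_subset)
```

Under that assumption, the field layout pays for the image bytes on every read of the embedding, while the separate-subset layout does not. If fields can in fact be loaded individually, the two layouts cost the same and the more intuitive field layout wins.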
Force-pushed from 72160f6 to 7cdc018.
Force-pushed from ae6df4a to 7f56da0.
Force-pushed from 7f56da0 to e52dfda.
Thanks Niels!
Left a few additional comments. Can you please revert the changes to the Dockerfiles/images? Otherwise the CI/CD pipelines will fail to update them.
# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=6f8d908c4231785afab8c2b3e87f66f9abf0452b
revert to main
# Install Fondant
# This is split from other requirements to leverage caching
ARG FONDANT_VERSION=6f8d908c4231785afab8c2b3e87f66f9abf0452b
Same here
node_pool_label="node_pool",
node_pool_name="model-inference-mega-pool",
number_of_gpus=1,
cache=False,
You can remove cache=False now
@@ -16,8 +19,8 @@
pipeline_name="datacomp-filtering-pipeline",
pipeline_description="A pipeline for filtering the Datacomp dataset",
base_path=PipelineConfigs.BASE_PATH,
# base_path="/Users/nielsrogge/Documents/fondant_artifacts_datacomp",
This can be removed
Force-pushed from 2fcc090 to 2b6e2c5.
Force-pushed from 2b6e2c5 to a751cac.
This PR adds the [T-MARS](https://tmars-clip.github.io/) pipeline to the Datacomp folder. Initial discussion at locuslab/T-MARS#3.

To do:
- remove commit hashes and replace with dev tag before merging

Co-authored-by: Philippe Moussalli <philippe.moussalli95@gmail.com>