diff --git a/README.md b/README.md
index 38fc1c81f..eabcd3b62 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ Fondant comes with a library of reusable components, which can jumpstart your pi
 | [embedding_based_laion_retrieval](https://github.com/ml6team/fondant/tree/main/components/embedding_based_laion_retrieval) | Retrieve images-text pairs from LAION using embedding similarity |
 | [download_images](https://github.com/ml6team/fondant/tree/main/components/download_images) | Download images from urls |
 | **Image processing** | |
-| [image_embedding](https://github.com/ml6team/fondant/tree/main/components/image_embedding) | Create embeddings for images using a model from the HF Hub |
+| [embed_images](https://github.com/ml6team/fondant/tree/main/components/embed_images) | Create embeddings for images using a model from the HF Hub |
 | [image_resolution_extraction](https://github.com/ml6team/fondant/tree/main/components/image_resolution_extraction) | Extract the resolution from images |
 | [filter_image_resolution](https://github.com/ml6team/fondant/tree/main/components/filter_image_resolution) | Filter images based on their resolution |
 | [caption images](https://github.com/ml6team/fondant/tree/main/components/caption_images) | Generate captions for images using a model from the HF Hub |
diff --git a/components/image_embedding/Dockerfile b/components/embed_images/Dockerfile
similarity index 100%
rename from components/image_embedding/Dockerfile
rename to components/embed_images/Dockerfile
diff --git a/components/embed_images/README.md b/components/embed_images/README.md
new file mode 100644
index 000000000..126e6844b
--- /dev/null
+++ b/components/embed_images/README.md
@@ -0,0 +1,9 @@
+# Embed images
+
+### Description
+This component takes images as input and embeds them using a CLIP model from Hugging Face.
+The embeddings are stored in a new column as arrays of floats.
+
+### **Inputs/Outputs**
+
+See [`fondant_component.yaml`](fondant_component.yaml) for a more detailed description of all the input/output parameters.
\ No newline at end of file
diff --git a/components/image_embedding/fondant_component.yaml b/components/embed_images/fondant_component.yaml
similarity index 72%
rename from components/image_embedding/fondant_component.yaml
rename to components/embed_images/fondant_component.yaml
index e4bd7a9c6..d56868031 100644
--- a/components/image_embedding/fondant_component.yaml
+++ b/components/embed_images/fondant_component.yaml
@@ -1,6 +1,6 @@
-name: Image embedding
+name: Embed images
 description: Component that embeds images using CLIP
-image: ghcr.io/ml6team/image_embedding:dev
+image: ghcr.io/ml6team/embed_images:dev
 
 consumes:
   images:
@@ -18,7 +18,7 @@ produces:
 
 args:
   model_id:
-    description: Model id on the Hugging Face hub (e.g. "openai/clip-vit-large-patch14")
+    description: Model id of a CLIP model on the Hugging Face hub
     type: str
     default: "openai/clip-vit-large-patch14"
   batch_size:
diff --git a/components/image_embedding/requirements.txt b/components/embed_images/requirements.txt
similarity index 100%
rename from components/image_embedding/requirements.txt
rename to components/embed_images/requirements.txt
diff --git a/components/image_embedding/src/main.py b/components/embed_images/src/main.py
similarity index 97%
rename from components/image_embedding/src/main.py
rename to components/embed_images/src/main.py
index 1483cc8c6..9e4f9d240 100644
--- a/components/image_embedding/src/main.py
+++ b/components/embed_images/src/main.py
@@ -51,7 +51,7 @@ def embed_image_batch(image_batch: pd.DataFrame, *, model: CLIPVisionModelWithPr
 
 
 class EmbedImagesComponent(PandasTransformComponent):
-    """Component that captions images using a model from the Hugging Face hub."""
+    """Component that embeds images using a CLIP model from the Hugging Face hub."""
 
     def __init__(
         self,
diff --git a/docs/getting_started.md b/docs/getting_started.md
index 6009aa767..58a87954a 100644
--- a/docs/getting_started.md
+++ b/docs/getting_started.md
@@ -295,7 +295,7 @@ You will see that the components runs sequentially and that each has its own log
 Note that with custom components the image will be built as part of running the pipeline by leveraging a `build` spec in
 the docker-compose file. This means that you can change the code of your component and run the pipeline again without
 having to rebuild the image manually.
 
-We now have a simple pipeline that downloads a dataset from huggingface hub and extracts the width and height of the images. A possible next step is to create a component that [filters the data based on the aspect ratio](https://github.com/ml6team/fondant/tree/main/components/filter_image_resolution) ? Or run a [clip model on the images to get captions](https://github.com/ml6team/fondant/tree/main/components/image_embedding)?
+We now have a simple pipeline that downloads a dataset from the Hugging Face Hub and extracts the width and height of the images. A possible next step is to create a component that [filters the data based on the aspect ratio](https://github.com/ml6team/fondant/tree/main/components/filter_image_resolution)? Or run a [CLIP model on the images to get embeddings](https://github.com/ml6team/fondant/tree/main/components/embed_images)?
 
 ## Explore the data
diff --git a/examples/pipelines/datacomp/pipeline.py b/examples/pipelines/datacomp/pipeline.py
index 546ddfa84..010d7b65d 100644
--- a/examples/pipelines/datacomp/pipeline.py
+++ b/examples/pipelines/datacomp/pipeline.py
@@ -76,7 +76,7 @@
     cache=False,
 )
 embed_images_op = ComponentOp.from_registry(
-    name="image_embedding",
+    name="embed_images",
     arguments={
         "batch_size": 2,
     },
diff --git a/examples/pipelines/finetune_stable_diffusion/pipeline.py b/examples/pipelines/finetune_stable_diffusion/pipeline.py
index 4914124fa..1e598121a 100644
--- a/examples/pipelines/finetune_stable_diffusion/pipeline.py
+++ b/examples/pipelines/finetune_stable_diffusion/pipeline.py
@@ -36,8 +36,8 @@
     name="image_resolution_extraction"
 )
 
-image_embedding_op = ComponentOp.from_registry(
-    name="image_embedding",
+embed_images_op = ComponentOp.from_registry(
+    name="embed_images",
     arguments={
         "model_id": "openai/clip-vit-large-patch14",
         "batch_size": 10,
@@ -90,8 +90,8 @@
 
 pipeline.add_op(load_from_hub_op)
 # pipeline.add_op(image_resolution_extraction_op, dependencies=load_from_hub_op)
-# pipeline.add_op(image_embedding_op, dependencies=image_resolution_extraction_op)
-# pipeline.add_op(laion_retrieval_op, dependencies=image_embedding_op)
+# pipeline.add_op(embed_images_op, dependencies=image_resolution_extraction_op)
+# pipeline.add_op(laion_retrieval_op, dependencies=embed_images_op)
 # pipeline.add_op(download_images_op, dependencies=laion_retrieval_op)
 # pipeline.add_op(caption_images_op, dependencies=download_images_op)
 # pipeline.add_op(write_to_hub, dependencies=caption_images_op)
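
For context, here is a minimal sketch of how the renamed component is consumed after this change. Only the `embed_images` registry name, the `model_id`/`batch_size` arguments, and the `ComponentOp.from_registry` / `pipeline.add_op` calls are taken from the diffs above; the import path, the `Pipeline` constructor arguments, and the pipeline name and base path values are illustrative assumptions rather than part of this change.

```python
# Minimal usage sketch, assuming the fondant.pipeline API used by the
# example pipelines in this diff. The pipeline name and base_path below
# are hypothetical placeholders, not taken from the repository.
from fondant.pipeline import ComponentOp, Pipeline

pipeline = Pipeline(
    pipeline_name="embed_images_demo",  # hypothetical
    base_path="./fondant_artifacts",    # hypothetical
)

# Resolve the component from the registry under its new name;
# the old name "image_embedding" no longer exists after this rename.
embed_images_op = ComponentOp.from_registry(
    name="embed_images",
    arguments={
        # Default model per components/embed_images/fondant_component.yaml.
        "model_id": "openai/clip-vit-large-patch14",
        "batch_size": 10,
    },
)

pipeline.add_op(embed_images_op)
```

Since `from_registry` resolves components by name, any pipeline still passing `name="image_embedding"` would fail to find the component after this rename, which is why both example pipelines are updated in the same change.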