Don't use from_registry for generic components #285
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,148 @@ | ||
# Components | ||
|
||
Fondant makes it easy to build data preparation pipelines leveraging reusable components. Fondant | ||
provides a lot of components out of the box | ||
([overview](https://github.com/ml6team/fondant/tree/main/components)), but you can also define your | ||
own custom components. | ||
|
||
## The anatomy of a component | ||
|
||
A component is completely defined by its [component specification](component_spec.md) and a | ||
docker image. The specification defines the docker image fondant should run to execute the | ||
component, which data it consumes and produces, and which arguments it takes. | ||
|
||
## Component types | ||
|
||
We can distinguish three different types of components: | ||
|
||
- **Reusable components** can be used out of the box and can be loaded from the fondant | ||
component registry | ||
- **Custom components** are completely defined and implemented by the user | ||
- **Generic components** leverage a reusable implementation, but require a custom component | ||
specification | ||
|
||
### Reusable components | ||
|
||
Reusable components are completely defined and implemented by fondant. You can easily add them | ||
to your pipeline by creating an operation using `ComponentOp.from_registry()`. | ||
|
||
```python | ||
from fondant.pipeline import ComponentOp | ||
|
||
component_op = ComponentOp.from_registry( | ||
name="reusable_component", | ||
arguments={ | ||
"arg": "value" | ||
} | ||
) | ||
``` | ||
|
||
??? "fondant.pipeline.ComponentOp.from_registry" | ||
Review comment: are those meant to end up here? there are quite a few of them
Reply: Yes, you can see the result in the built documentation.
Reply: looks neat ✨
||
|
||
::: fondant.pipeline.ComponentOp.from_registry | ||
handler: python | ||
options: | ||
show_source: false | ||
|
||
You can find an overview of the reusable components offered by fondant | ||
[here](https://github.com/ml6team/fondant/tree/main/components). Check their | ||
`fondant_component.yaml` file for information on which arguments they accept and which data they | ||
consume and produce. | ||
|
||
### Custom components | ||
|
||
To define your own custom component, you can build your code into a docker image and write an | ||
accompanying component specification that refers to it. | ||
|
||
A typical file structure for a custom component looks like this: | ||
``` | ||
|- components | ||
| |- custom_component | ||
| |- src | ||
| | |- main.py | ||
| |- Dockerfile | ||
| |- fondant_component.yaml | ||
|- pipeline.py | ||
``` | ||
|
||
The `Dockerfile` is used to build the code into a docker image, which is then referred to in the | ||
`fondant_component.yaml`. | ||
|
||
```yaml title="components/custom_component/fondant_component.yaml" | ||
name: Custom component | ||
description: This is a custom component | ||
image: custom_component:latest | ||
``` | ||
|
||
You can add a custom component to your pipeline by creating a `ComponentOp` and passing in the path | ||
to the directory containing your `fondant_component.yaml`. | ||
|
||
```python title="pipeline.py" | ||
from fondant.pipeline import ComponentOp | ||
|
||
component_op = ComponentOp( | ||
component_dir="components/custom_component", | ||
arguments={ | ||
"arg": "value" | ||
} | ||
) | ||
``` | ||
|
||
??? "fondant.pipeline.ComponentOp" | ||
|
||
::: fondant.pipeline.ComponentOp | ||
handler: python | ||
options: | ||
members: [] | ||
show_source: false | ||
|
||
See our [best practices on creating a custom component](custom_component.md). | ||
|
||
### Generic components | ||
|
||
A generic component is a component leveraging a reusable docker image, but requiring a custom | ||
`fondant_component.yaml` specification. | ||
|
||
Since a generic component only requires a custom `fondant_component.yaml`, its file structure | ||
looks like this: | ||
``` | ||
|- components | ||
| |- generic_component | ||
| |- fondant_component.yaml | ||
|- pipeline.py | ||
``` | ||
|
||
The `fondant_component.yaml` refers to the reusable image it leverages: | ||
|
||
```yaml title="components/generic_component/fondant_component.yaml" | ||
name: Generic component | ||
description: This is a generic component | ||
image: reusable_component:latest | ||
``` | ||
|
||
You can add a generic component to your pipeline by creating a `ComponentOp` and passing in the path | ||
to the directory containing your custom `fondant_component.yaml`. | ||
|
||
```python title="pipeline.py" | ||
from fondant.pipeline import ComponentOp | ||
|
||
component_op = ComponentOp( | ||
component_dir="components/generic_component", | ||
arguments={ | ||
"arg": "value" | ||
} | ||
) | ||
``` | ||
|
||
??? "fondant.pipeline.ComponentOp" | ||
|
||
::: fondant.pipeline.ComponentOp | ||
handler: python | ||
options: | ||
members: [] | ||
show_source: false | ||
|
||
An example of a generic component is the | ||
[`load_from_hf_hub`](https://github.com/ml6team/fondant/tree/main/components/load_from_hf_hub) | ||
component. It can read any dataset from the HuggingFace hub, but it requires the user to define | ||
the schema of the produced dataset in a custom `fondant_component.yaml` specification. |
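
Reader's note on the documentation above: the custom and generic layouts differ only in whether the component directory ships its own `Dockerfile` next to `fondant_component.yaml`. The following is a minimal sketch of that distinction. `classify_component_dir` and `demo` are hypothetical helpers written for illustration; they are not part of fondant, and the file contents written in `demo` are placeholders.

```python
import tempfile
from pathlib import Path


def classify_component_dir(component_dir):
    """Classify a component directory by the files it contains.

    Hypothetical helper for illustration only; it is not part of fondant.
    """
    component_dir = Path(component_dir)
    if not (component_dir / "fondant_component.yaml").is_file():
        raise FileNotFoundError(f"No fondant_component.yaml in {component_dir}")
    # A custom component builds its own image, so it ships a Dockerfile.
    if (component_dir / "Dockerfile").is_file():
        return "custom"
    # A generic component only ships a spec that refers to a reusable image.
    return "generic"


def demo():
    """Recreate both documented layouts in a temp dir and classify them."""
    with tempfile.TemporaryDirectory() as tmp:
        custom = Path(tmp) / "components" / "custom_component"
        generic = Path(tmp) / "components" / "generic_component"
        for directory in (custom, generic):
            directory.mkdir(parents=True)
            # Placeholder spec content, not a complete component specification.
            (directory / "fondant_component.yaml").write_text("name: example\n")
        (custom / "Dockerfile").write_text("FROM python:3.8-slim\n")
        return classify_component_dir(custom), classify_component_dir(generic)


if __name__ == "__main__":
    print(demo())
```

In both cases the operation is created the same way, by passing `component_dir` to `ComponentOp`, as the documentation above shows.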
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -9,5 +9,5 @@ root_path=$(dirname "$scripts_path") | |
|
||
pushd "$root_path" | ||
rm -rf src/fondant/components | ||
cp -r components src/fondant/ | ||
find components/ -type f | grep -i yaml$ | xargs -i cp --parents {} src/fondant/ | ||
Review comment: Only copy the component specifications and keep the same structure.
Reply: Can you validate this on mac @GeorgesLorre?
||
popd |
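
Reader's note: the `find | grep | xargs cp --parents` line above relies on GNU options, which is presumably why the reviewer asks for validation on a Mac. As a rough, platform-independent illustration of the same idea (copy only files whose name ends in `yaml`, keeping the `components/...` directory layout), here is a sketch in Python. `copy_component_specs` and `demo` are hypothetical names; this is not the script used in the PR.

```python
import shutil
import tempfile
from pathlib import Path


def copy_component_specs(components_dir, target_dir):
    """Copy files whose name ends in 'yaml' (case-insensitive) into target_dir.

    The relative layout, including the top-level directory name, is kept,
    which mirrors what `cp --parents` does in the shell version.
    Hypothetical helper for illustration only.
    """
    components_dir = Path(components_dir)
    target_dir = Path(target_dir)
    copied = []
    for path in sorted(components_dir.rglob("*")):
        if path.is_file() and path.name.lower().endswith("yaml"):
            relative = Path(components_dir.name) / path.relative_to(components_dir)
            destination = target_dir / relative
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(path, destination)
            copied.append(relative.as_posix())
    return copied


def demo():
    """Copy the specs of a small fake component tree and report the result."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        component = root / "components" / "custom_component"
        (component / "src").mkdir(parents=True)
        (component / "fondant_component.yaml").write_text("name: example\n")
        (component / "Dockerfile").write_text("FROM python:3.8-slim\n")
        (component / "src" / "main.py").write_text("print('hello')\n")

        target = root / "src" / "fondant"
        copied = copy_component_specs(root / "components", target)
        main_py_copied = (
            target / "components" / "custom_component" / "src" / "main.py"
        ).exists()
        return copied, main_py_copied


if __name__ == "__main__":
    print(demo())
```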
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -170,10 +170,11 @@ def _generate_spec(self, pipeline: Pipeline, extra_volumes: list) -> dict: | |
"volumes": volumes, | ||
} | ||
|
||
if component_op.local_component: | ||
services[safe_component_name][ | ||
"build" | ||
] = f"./{Path(component_op.component_spec_path).parent}" | ||
if component_op.dockerfile_path is not None: | ||
logger.info( | ||
f"Found Dockerfile for {component_name}, adding build step.", | ||
) | ||
services[safe_component_name]["build"] = str(component_op.component_dir) | ||
Review comment: The previous implementation failed for absolute paths, this works for both absolute and relative paths.
||
else: | ||
services[safe_component_name][ | ||
"image" | ||
|
Review comment: I like the descriptions, quite clear
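
Reader's note on the compiler hunk above: the sketch below illustrates why prefixing `./` to the spec's parent directory breaks for absolute paths, and how a service entry might choose between `build` and `image`. It is a simplified stand-in, not the PR's code. `old_build_context`, `new_build_context`, `service_entry`, and `demo` are hypothetical names, and the assumption that the `else` branch uses the image named in the component specification is mine, since the hunk is truncated at that point.

```python
import tempfile
from pathlib import Path, PurePosixPath


def old_build_context(component_spec_path):
    """Build context in the style of the removed lines: './' + spec parent."""
    return f"./{PurePosixPath(component_spec_path).parent}"


def new_build_context(component_dir):
    """Build context in the style of the added line: the directory as given."""
    return str(PurePosixPath(component_dir))


def service_entry(component_dir, image):
    """Return the compose keys for one component (hypothetical helper).

    Assumption: without a Dockerfile, the image from the spec is used.
    """
    component_dir = Path(component_dir)
    if (component_dir / "Dockerfile").is_file():
        return {"build": str(component_dir)}
    return {"image": image}


def demo():
    """Return the keys chosen for a directory with and without a Dockerfile."""
    with tempfile.TemporaryDirectory() as tmp:
        custom = Path(tmp) / "custom_component"
        generic = Path(tmp) / "generic_component"
        custom.mkdir()
        generic.mkdir()
        (custom / "Dockerfile").write_text("FROM python:3.8-slim\n")
        with_dockerfile = service_entry(custom, "custom_component:latest")
        without_dockerfile = service_entry(generic, "reusable_component:latest")
        return sorted(with_dockerfile), without_dockerfile


if __name__ == "__main__":
    # Relative paths look fine with the old approach...
    print(old_build_context("components/custom_component/fondant_component.yaml"))
    # ...but an absolute path ends up as './/opt/...', which is not the intended directory.
    print(old_build_context("/opt/components/custom_component/fondant_component.yaml"))
    print(new_build_context("/opt/components/custom_component"))
    print(demo())
```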