-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't use from_registry for generic components #285
Conversation
6364f0a
to
6673b4c
Compare
@@ -9,5 +9,5 @@ root_path=$(dirname "$scripts_path") | |||
|
|||
pushd "$root_path" | |||
rm -rf src/fondant/components | |||
cp -r components/ src/fondant/ | |||
find components/ -type f | grep -i yaml$ | xargs -i cp --parents {} src/fondant/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only copy the component specifications and keep the same structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you validate this on mac @GeorgesLorre?
"build" | ||
] = f"./{Path(component_op.component_spec_path).parent}" | ||
if component_op.dockerfile_path is not None: | ||
services[safe_component_name]["build"] = str(component_op.component_dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous implementation failed for absolute paths, this works for both absolute and relative paths.
assert ( | ||
load_from_hub_custom_op.component_spec | ||
== load_from_hub_default_op.component_spec | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these checks were trivial but harder to replicate, so only kept the one that is non-trivial.
self.component_spec.specification, | ||
) | ||
def dockerfile_path(self) -> t.Optional[Path]: | ||
path = self.component_dir / "Dockerfile" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dockerfile
is the default and should be used but I have seen people use dockerfile
before (and it also works) so maybe we should also check for that ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have to, since we can expect these components to be implemented for fondant
specifically. So we can impose our own expectations. I don't think that's a bad thing to do here, since we adhere to the standard and keep things simple.
Did you check the docs for needed changes ? I think at least the getting-started might need some updates aswel. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Robbe!
I just have a few doubts about whether this can be clear for the end user. Right now we have:
- Custom components
ComponentOp(component_dir="")
- Resuable components
ComponentOp.from_registry(name="")
- Generic components
Can be considered to be a hybrid of custom component (require a customized component spec) and a reusable component (images are already built and available in the registry).
However, right now it seems like we're defining them as if they're fully custom
ComponentOp(component_dir="")
At first glance, it seems like the directory should contain all the component files (src_code, Dockerfile, ...) similar to the custom components where they are used to build the images for the local runner. But in fact, we're just looking for the spec file within the specified directory.
Wondering whether it wouldn't better to have generic components defined more explicitly by just passing in the component spec path since the users will anyway have to create a custom component spec for that component:
ComponentOp.from_spec(component_spec_path="")
I think it would be nice to have separate constructors for each component type, Wdyt?
Not yet, I'll have a look.
I partially agree. I agree that we need to make sure it is clear to the user, but on the other hand, there's not really that big of a difference between a custom and a generic component. Both only need a component specification referring to an existing image to work. You could for instance define a custom component as follows and it would work as well:
So while splitting the constructors kind of makes sense logically, it actually creates two ways to achieve the same thing. You could use both constructors to create either a custom or a generic component. component_op(component_dir="components/custom_component")
component_op.from_spec(component_spec_path="components/custom_component/fondant_component.yaml") The only functionality for which we need more than the component specification, is to automatically build the images, which we currently only do in the local runner. That's the only reason why it might make sense to split the constructors, but this doesn't really matter for your pipeline definition, only execution. So I would move this responsibility to the runner and let it decide when to build (based on presence of a So I would select only one of the two interfaces, and then I see the following (dis)advantages:
I don't have a strong preference, but since we have to choose, I would go with: componentOp(component_dir="components/custom_component") and clearly document the expected structure of the
(All the other files are actually don't matter to Happy to hear other opinions. |
966cc5a
to
2f93a47
Compare
Thanks for the detailed explanation and options: I suggest we go for (also not a strong preference) The key will be documenting it very well and maybe expanding the logging with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Robbe! Looks good overall and the documentation is clear.
Just a few minor things to adjust in the documentation
) | ||
``` | ||
|
||
??? "fondant.pipeline.ComponentOp.from_registry" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are those meant to end up here? there are quite a few of them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you can see the result in the built documentation:
https://fondant--285.org.readthedocs.build/en/285/components/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks neat ✨
docker image. The specification defines the docker image fondant should run to execute the | ||
component, which data it consumes and produces, and which arguments it takes. | ||
|
||
## Component types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the descriptions, quite clear
docs/components.md
Outdated
from fondant.pipeline import ComponentOp | ||
|
||
component_op = ComponentOp( | ||
component_dir="components/custom_component", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
custom_component-> generic_component
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch
Fixes ml6team#251, ml6team#252, ml6team#253 Before this PR, the operation for a generic component (a reusable component that dynamically takes a component specification), needed to be created as follows: ```python component_op = ComponentOp.from_registry( name="generic_component", component_spec_path="components/custom_generic_component/fondant_component.yaml", ) ``` But the provided name wasn't actually used, since the component specification already contains a reference to the reusable image that it should use. Now we can define both custom and generic components as follows: ```python component_op = ComponentOp( component_dir="components/custom_generic_component", ) ``` There is still a difference in how we want to handle custom and generic components though. Custom components should be built by the local runner, while generic components should not, since they use a reusable image. To make this differentiation, we now simply check if a `Dockerfile` is present in the provided `Component_dir`. This will be the case for a custom component, but not for a generic component. --- This has a nice implicit result for reusable components as well, which can still be defined as: ```python component_op = ComponentOp.from_registry( name="reusable_component", ) ``` Which `fondant` now resolves to: ```python component_op = ComponentOp( component_dir="{fondant_install_dir}/components/custom_generic_component", ) ``` I added a change to this PR which no longer packages the reusable component code with the `fondant` package, but only the component specifications, as those are all that is needed since they contain a reference to the reusable image on the registry. This means that the `component_dir` above doesn't contain a `Dockerfile` when `fondant` is installed from PyPI, but does when you locally install `fondant` using `poetry install`. So the local runner doesn't build reusable components when users install `fondant` from PyPI, but it does when you're working on the `fondant` repo, which is useful for us `fondant` developers. --- The only thing we still need is an option on the runner to provide `build_args`, so we can pass in the `fondant` version to build into the image. But I'll open a separate PR for that.
Fixes #251, #252, #253 Before this PR, the operation for a generic component (a reusable component that dynamically takes a component specification), needed to be created as follows: ```python component_op = ComponentOp.from_registry( name="generic_component", component_spec_path="components/custom_generic_component/fondant_component.yaml", ) ``` But the provided name wasn't actually used, since the component specification already contains a reference to the reusable image that it should use. Now we can define both custom and generic components as follows: ```python component_op = ComponentOp( component_dir="components/custom_generic_component", ) ``` There is still a difference in how we want to handle custom and generic components though. Custom components should be built by the local runner, while generic components should not, since they use a reusable image. To make this differentiation, we now simply check if a `Dockerfile` is present in the provided `Component_dir`. This will be the case for a custom component, but not for a generic component. --- This has a nice implicit result for reusable components as well, which can still be defined as: ```python component_op = ComponentOp.from_registry( name="reusable_component", ) ``` Which `fondant` now resolves to: ```python component_op = ComponentOp( component_dir="{fondant_install_dir}/components/custom_generic_component", ) ``` I added a change to this PR which no longer packages the reusable component code with the `fondant` package, but only the component specifications, as those are all that is needed since they contain a reference to the reusable image on the registry. This means that the `component_dir` above doesn't contain a `Dockerfile` when `fondant` is installed from PyPI, but does when you locally install `fondant` using `poetry install`. So the local runner doesn't build reusable components when users install `fondant` from PyPI, but it does when you're working on the `fondant` repo, which is useful for us `fondant` developers. --- The only thing we still need is an option on the runner to provide `build_args`, so we can pass in the `fondant` version to build into the image. But I'll open a separate PR for that.
Fixes #251, #252, #253
Before this PR, the operation for a generic component (a reusable component that dynamically takes a component specification), needed to be created as follows:
But the provided name wasn't actually used, since the component specification already contains a reference to the reusable image that it should use. Now we can define both custom and generic components as follows:
There is still a difference in how we want to handle custom and generic components though. Custom components should be built by the local runner, while generic components should not, since they use a reusable image. To make this differentiation, we now simply check if a
Dockerfile
is present in the providedComponent_dir
. This will be the case for a custom component, but not for a generic component.This has a nice implicit result for reusable components as well, which can still be defined as:
Which
fondant
now resolves to:I added a change to this PR which no longer packages the reusable component code with the
fondant
package, but only the component specifications, as those are all that is needed since they contain a reference to the reusable image on the registry. This means that thecomponent_dir
above doesn't contain aDockerfile
whenfondant
is installed from PyPI, but does when you locally installfondant
usingpoetry install
. So the local runner doesn't build reusable components when users installfondant
from PyPI, but it does when you're working on thefondant
repo, which is useful for usfondant
developers.The only thing we still need is an option on the runner to provide
build_args
, so we can pass in thefondant
version to build into the image. But I'll open a separate PR for that.