feat(sdk): Enable containerizing v2 Python components #6417
Most of the changes are around imports and restructuring of the codebase. While it looks like a lot of code was added, most of the code already existed and was simply moved or copied over to v2. The only exceptions are:
- under kfp/v2/components/component_factory.py: some helper functions were copied with simplification from _python_op.py
- we no longer strip the `_path` suffix in v2 components.

Note: there is still some duplication of code (particularly between component_factory.py and _python_op.py), but it's ok for now since we intend to replace some of this with v2 ComponentSpec + BaseComponent.
The CLI will search for components in all Python files by default. It can also search for a specific filepattern (as supported by pathlib.Path objects). Also add a bunch of tests.
@chensun thanks for your patience. As discussed offline, moving to a model where we package any component found under a directory. The components will be listed in a config file, which the executor will use at runtime to find and load the component. PTAL.
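To make the config-driven lookup concrete, here is a minimal sketch of how a runtime could map a component name to its source file and import it, using only the standard library. The `[Components]` section name, the relative-path layout, and the helper names `write_component_config` and `load_component_function` are assumptions for illustration; the PR's executor may be structured differently.

```python
import configparser
import importlib.util
import pathlib
import tempfile


def write_component_config(config_path, components):
    """Write a component-name -> relative-file-path mapping to an INI file.

    The "Components" section name is an assumption, not confirmed by the PR.
    Note that configparser lowercases option names by default.
    """
    config = configparser.ConfigParser()
    config["Components"] = components
    with open(config_path, "w") as f:
        config.write(f)


def load_component_function(config_path, component_name):
    """Look up a component's file in the config, import it, and return the function."""
    config_path = pathlib.Path(config_path)
    config = configparser.ConfigParser()
    config.read(config_path)
    # Paths in the config are treated as relative to the config file's directory.
    module_file = config_path.parent / config["Components"][component_name]

    spec = importlib.util.spec_from_file_location(module_file.stem, module_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, component_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = pathlib.Path(tmp)
        (root / "preprocess").mkdir()
        (root / "preprocess" / "component.py").write_text(
            "def preprocess(text):\n    return text.strip().lower()\n"
        )
        config_file = root / "kfp_config.ini"
        write_component_config(config_file, {"preprocess": "preprocess/component.py"})
        func = load_component_function(config_file, "preprocess")
        print(func("  Hello  "))
```

Keeping the mapping in a plain config file means the container only needs the component name at runtime; the file location is resolved from the config rather than being baked into the command line.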
Thank you @neuromage, this looks great!
I only left a couple nitpicks, otherwise LGTM.
# print('EXCEPTION: ', result.exception)
# print('EC: ', result.exit_code)
# self.assertNotEqual(result.exit_code, 0)
# self.assertIn("A target_image must be specified", result.stdout)
nit: seems like a valid test we should keep?
/test kubeflow-pipelines-samples-v2
/lgtm Thanks!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: chensun The full list of commands accepted by this bot can be found here. The pull request process is described here
This PR introduces a new command in KFP's CLI, `components`, that enables users to manage and build v2 components in a container with Docker.

TODOs
Example usage journey
Let's say the user has the following directory:
$ tree
.
├── my_components
│   ├── preprocess
│   │   └── component.py
│   └── train
│       └── component.py
└── pipeline.py
And in `preprocess/component.py`, we've defined a component like so:

Specifying `target_image` in the `@component` decorator indicates to KFP that the component will run in the container named `'gcr.io/ml-pipeline/custom-component'`, which needs to be built. In `train/component.py`, we similarly have:

Note that this component has the same `target_image` specified as the `preprocess` component.

The user can now run the following command to build a container for both of these components:
$ kfp components build components
INFO: Building component using KFP package path: kfp==1.7.1
INFO: Found 2 components in components.py
...
INFO: Built and pushed component container gcr.io/ml-pipeline/custom-component
After running the command above, a container image called `gcr.io/ml-pipeline/custom-component` will be built and pushed to the remote repository. The command will also generate a number of files:

$ tree .
my_components
├── Dockerfile
├── component_metadata
│   ├── preprocess.yaml
│   └── train.yaml
├── kfp_config.ini
├── preprocess
│   └── component.py
├── requirements.txt
└── train
    └── component.py
The CLI generates a `Dockerfile`, `requirements.txt` and `.dockerignore`, all of which can be modified by the user. Component YAMLs are also generated in a directory called `component_metadata`. A file called `kfp_config.ini` is also generated, which contains a mapping from components to files containing them:

The user can now include the component in their pipeline either by loading the generated YAML files, or directly importing the `components.py` file. E.g.:

Alternatively, users can load a component using the YAML definition: