feat(sdk): Enable containerizing v2 Python components #6417

Merged: 27 commits, Oct 12, 2021

@neuromage neuromage commented Aug 24, 2021

This PR introduces a new command group in KFP's CLI, components, that lets users manage v2 components and build them into a container image with Docker.

TODOs

  • Update release notes.
  • Add tests

Example usage journey

Let's say the user has the following directory:

$ tree .
├── my_components
│   ├── preprocess
│   │   └── component.py
│   └── train
│       └── component.py
├── pipeline.py

And in preprocess/component.py, we've defined a component like so:

from kfp.v2.dsl import (
    component,
    InputPath,
    OutputPath,
    Input,
    Output,
    Artifact,
    Dataset,
)

@component(
  base_image='python:3.9',
  target_image='gcr.io/ml-pipeline/custom-component',
  output_component_file='preprocess.yaml')
def preprocess(...):
    ...

Specifying target_image in the @component decorator tells KFP that the component will run in the container image named 'gcr.io/ml-pipeline/custom-component', which needs to be built. In train/component.py, we similarly have:

...
@component(
  base_image='python:3.9',
  target_image='gcr.io/ml-pipeline/custom-component',
  output_component_file='train.yaml')
def train(...):
    ...

Note that this component has the same target_image specified as the preprocess component.
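To picture why a shared target_image matters, here is a minimal sketch that groups components by the image they name, so that each distinct image only needs one build. The records and the group_by_target_image helper are hypothetical illustrations, not part of the KFP SDK, and this is not necessarily how the CLI tracks images internally.

```python
from collections import defaultdict

# Hypothetical records standing in for the two decorated components above.
COMPONENTS = [
    {"name": "preprocess", "target_image": "gcr.io/ml-pipeline/custom-component"},
    {"name": "train", "target_image": "gcr.io/ml-pipeline/custom-component"},
]


def group_by_target_image(components):
    """Map each target image to the names of the components that use it."""
    groups = defaultdict(list)
    for comp in components:
        groups[comp["target_image"]].append(comp["name"])
    return dict(groups)


images = group_by_target_image(COMPONENTS)
for image, names in images.items():
    print(f"{image}: {', '.join(names)}")
```

Both components land under the same key, which matches the single image that the build step in this example produces.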

The user can now run the following command to build a container for both of these components:

$ kfp components build my_components
INFO: Building component using KFP package path: kfp==1.7.1
INFO: Found 2 components in components.py 
...
INFO: Built and pushed component container gcr.io/ml-pipeline/custom-component

After running the command above, a container image called gcr.io/ml-pipeline/custom-component will be built and pushed to the remote repository. The command will also generate a number of files:

$ tree .
my_components
├── Dockerfile
├── component_metadata
│   ├── preprocess.yaml
│   └── train.yaml
├── kfp_config.ini
├── preprocess
│   └── component.py
├── requirements.txt
└── train
    └── component.py

The CLI generates a Dockerfile, requirements.txt, and .dockerignore, all of which the user can modify. It also writes the component YAMLs to a directory called component_metadata, and a kfp_config.ini file that maps each component to the file containing it:

[Components]
preprocess = preprocess/component.py
train = train/component.py
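Since kfp_config.ini uses standard INI syntax, the mapping can be read with Python's built-in configparser. The sketch below parses the example contents shown above; the read_component_map helper is a hypothetical name for illustration, not a KFP function.

```python
import configparser

# Contents mirroring the kfp_config.ini example above.
CONFIG_TEXT = """
[Components]
preprocess = preprocess/component.py
train = train/component.py
"""


def read_component_map(text):
    """Return {component name: relative file path} from the [Components] section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # Note: configparser lowercases option names by default.
    return dict(parser["Components"])


component_map = read_component_map(CONFIG_TEXT)
for name, path in component_map.items():
    print(f"{name} -> {path}")
```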

The user can now include the components in their pipeline either by loading the generated YAML files or by importing them directly from the component.py files. E.g.:

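from kfp.v2 import dsl
# Import the decorated functions themselves, not the modules.
from my_components.preprocess.component import preprocess as preprocess_component
from my_components.train.component import train as train_component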

@dsl.pipeline(...)
def pipeline(...):
  preprocess_task = preprocess_component(...)
  train_task = train_component(...)

Alternatively, users can load a component from its YAML definition:

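import kfp
from kfp.v2 import dsl
from my_components.train.component import train as train_component

# The YAML is a local file, so load it from a file path rather than a URL.
preprocess_component = kfp.components.load_component_from_file(
    'my_components/component_metadata/preprocess.yaml')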

@dsl.pipeline(...)
def pipeline(...):
  preprocess_task = preprocess_component(...)
  train_task = train_component(...)

Most of the changes are around imports and restructuring of the
codebase. While it looks like a lot of code was added, most of the code
already existed and was simply moved or copied over to v2. The only
exceptions are:
- under kfp/v2/components/component_factory.py: some helper functions
  were copied with simplification from _python_op.py
- we no longer strip the `_path` suffix in v2 components.

Note: there is still some duplication of code (particularly between
component_factory.py and _python_op.py), but it's ok for now since we
intend to replace some of this with v2 ComponentSpec + BaseComponent.
The CLI will search for components in all Python files by default. It
can also search using a specific file pattern (as supported by
pathlib.Path objects).
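The search behaviour described above can be approximated with pathlib.Path.glob. This is a minimal sketch assuming a default pattern of "**/*.py"; the find_component_files helper and its file_pattern parameter are hypothetical names, not the CLI's actual implementation or flags.

```python
import pathlib
import tempfile


def find_component_files(root, file_pattern="**/*.py"):
    """List files under root that match a pathlib-style glob pattern."""
    root = pathlib.Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob(file_pattern)
        if p.is_file()
    )


# Build a throwaway directory that mirrors the example layout.
with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    for rel in ("preprocess/component.py", "train/component.py", "README.md"):
        target = base / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text("# placeholder\n")

    all_python = find_component_files(base)               # default: every .py file
    only_train = find_component_files(base, "train/*.py")  # narrower pattern

print(all_python)
print(only_train)
```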

Also add a bunch of tests.
@neuromage

@chensun thanks for your patience. As discussed offline, moving to a model where we package any component found under a directory. The components will be listed in a config file, which the executor will use at runtime to find and load the component. PTAL.
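As a rough illustration of the runtime lookup described here, the sketch below loads a function from a Python file by path using importlib. The load_function helper is a hypothetical name, and the real executor may resolve and load components differently.

```python
import importlib.util
import pathlib
import tempfile


def load_function(module_path, function_name):
    """Import the file at module_path and return the named function from it."""
    module_path = pathlib.Path(module_path)
    spec = importlib.util.spec_from_file_location(module_path.stem, str(module_path))
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, function_name)


# Stand-in for a component file that a config entry would point to.
with tempfile.TemporaryDirectory() as tmp:
    component_file = pathlib.Path(tmp) / "component.py"
    component_file.write_text("def preprocess(x):\n    return x * 2\n")

    func = load_function(component_file, "preprocess")
    result = func(21)

print(result)
```

Given a config entry such as preprocess = preprocess/component.py, the path and the component name are enough for a loader like this to locate the function.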


@chensun chensun left a comment

Thank you @neuromage, this looks great!
I only left a couple nitpicks, otherwise LGTM.

# print('EXCEPTION: ', result.exception)
# print('EC: ', result.exit_code)
# self.assertNotEqual(result.exit_code, 0)
# self.assertIn("A target_image must be specified", result.stdout)
nit: seems like a valid test we should keep?

chensun commented Oct 12, 2021

/test kubeflow-pipelines-samples-v2

chensun commented Oct 12, 2021

/lgtm
/approve

Thanks!

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: chensun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
