
First implementation of DockerCompiler #194

Merged · 6 commits merged into main on Jun 14, 2023
Conversation

GeorgesLorre (Collaborator):

Known TODOs:

  • support for `build` in docker compose
  • update documentation
  • update examples to include the DockerCompiler
  • update pipeline.py and move the compile functionality to a KubeflowCompiler

@PhilippeMoussalli (Contributor) left a comment:

I really like the implementation! Looks quite clean ✨

Left a few minor comments, excited to see where this is heading :)

"""Compiler that creates a docker-compose spec from a pipeline."""

def compile(self, package_path: str = "docker-compose.yml") -> None:
"""Compile a pipeline to docker-compose spec that is saved on the package_path."""
Contributor:

Suggested change:
- """Compile a pipeline to docker-compose spec that is saved on the package_path."""
+ """Compile a pipeline to docker-compose spec and save it to a specified package_path."""

"""
base_path = Path(self.pipeline.base_path)
# check if base path is an existing local folder
if base_path.exists():
Contributor:

What if the path is local but does not exist? It will assume it's remote in this case.

@GeorgesLorre (Collaborator, Author), Jun 13, 2023:

Yes, even if it is not remote, we can't mount it if it does not exist locally.

Contributor:

Alright, but maybe we can improve the error handling or try creating a new local path? Right now it assumes that if the path is local and does not exist, it's remote (you assign it as the base path).
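One way to make this distinction explicit, sketched here as a hypothetical helper (the function name and error message are assumptions, not the actual implementation): treat any path without a URL scheme as local and fail fast instead of silently falling back to "remote".

```python
from pathlib import Path

def resolve_base_path(base_path: str) -> str:
    """Hypothetical helper: treat any path without a URL scheme as local
    and fail fast instead of silently assuming it is remote."""
    if "://" in base_path:
        # Looks remote (e.g. gs://bucket/path or s3://bucket/path); keep as-is.
        return base_path
    path = Path(base_path)
    if not path.exists():
        raise ValueError(
            f"Local base_path '{base_path}' does not exist; "
            "create it first or pass a remote URL."
        )
    return str(path.resolve())
```

This keeps the "local and missing" case a hard error instead of a silent misclassification, which is what the comment above is asking for.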

"""
services = {}
for component_name, component in self.pipeline._graph.items():
safe_component_name = self._safe_component_name(component_name)
Contributor:

Might be worth moving the safe-name functionality into the pipeline when building the graph, since we might need to do the renaming across different runners.

Collaborator (Author):

Yes, this will probably happen once I implement the KubeflowCompiler.
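For illustration, a runner-agnostic safe-name function might look like this (a sketch, not the actual `_safe_component_name`; the normalization rules are assumptions chosen to satisfy both docker-compose service names and Kubeflow task names):

```python
import re

def safe_component_name(name: str) -> str:
    # Hypothetical sketch: lowercase the name and replace anything that is
    # not a letter, digit, or dash, so the result is valid both as a
    # docker-compose service name and as a Kubeflow task name.
    return re.sub(r"[^a-z0-9-]", "-", name.lower()).strip("-")

safe_component_name("Load_from_hub")  # "load-from-hub"
```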

command = ["--metadata", json.dumps(metadata)]

# add in and out manifest paths to command
command.extend(["--output_manifest_path", f"{self.path}/manifest.txt"])
Contributor:

I would maybe give the manifest a unique identifier, or write it within the component folder, to avoid overwriting it.

Member:

Would be ideal if we can reuse the same path logic across compilers. @PhilippeMoussalli I'm having a hard time figuring out where the output_manifest_path is currently built when running with kubeflow. Can you point me to it?

Contributor:

The output manifest path is created internally and stored in Kubernetes with MinIO. When deploying Kubeflow we specify a bucket and blob storage (in our case `soy-audio-379412_kfp-artifacts/artifacts`) where Kubeflow will write the artifacts. The specific write folder is handled internally within Kubeflow and cannot be modified, afaik
(e.g. `soy-audio-379412_kfp-artifacts/artifacts/controlnet-pipeline-4pmng/2023/05/10/controlnet-pipeline-4pmng-1483856016/retrieve-images-output_manifest_path.tgz`).

Collaborator (Author):

I will look more into this once I write the KubeflowCompiler, which will hopefully clarify this.

depends_on = {}
if dependencies:
# there is only an input manifest if the component has dependencies
command.extend(["--input_manifest_path", f"{self.path}/manifest.txt"])
Contributor:

same here
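The per-component manifest suggestion above could be sketched like this (the helper name and file layout are illustrative assumptions): each component gets its own folder under the shared base path, so a component's output manifest and its dependents' input manifests line up without clobbering each other.

```python
def manifest_path(base: str, component_name: str) -> str:
    # Illustrative sketch: give every component its own manifest location
    # under the shared base path, so components do not overwrite each
    # other's manifests.
    return f"{base}/{component_name}/manifest.json"

# A dependent component reads the manifest its dependency wrote:
output_manifest = manifest_path("/data", "load-images")
input_manifest = manifest_path("/data", "load-images")  # same path, read side
```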

@RobbeSneyders (Member) left a comment:

Thanks @GeorgesLorre! Really nice to have this. Left some comments 🙂

class Compiler(ABC):
"""Abstract base class for a compiler."""

def __init__(self, pipeline: Pipeline):
Member:

Why is the pipeline passed as an initialization argument instead of to the compile method? From a logical perspective, I wouldn't expect a compiler instance to be specific to a single pipeline.

self.pipeline = pipeline

@abstractmethod
def compile(self):
Member:

This probably needs `*args, **kwargs` if we change the signature in the subclasses.
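Combining the two comments above, the suggested shape might look like the following sketch (not the merged implementation; the stored attribute is just for illustration): the compiler is no longer bound to one pipeline, and the abstract method accepts `*args, **kwargs` so subclasses can extend the signature.

```python
import typing as t
from abc import ABC, abstractmethod

class Compiler(ABC):
    """Sketch: a pipeline-agnostic compiler base class."""

    @abstractmethod
    def compile(self, pipeline: t.Any, *args, **kwargs) -> None:
        ...

class DockerCompiler(Compiler):
    def compile(self, pipeline: t.Any, *, package_path: str = "docker-compose.yml") -> None:
        # A real implementation would generate and write the spec here;
        # we just record the target path for illustration.
        self.last_package_path = package_path

compiler = DockerCompiler()
compiler.compile("my-pipeline")  # uses the default package_path
```

One compiler instance can then compile any number of pipelines, which matches the "not specific to a single pipeline" expectation.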

class DockerCompiler(Compiler):
"""Compiler that creates a docker-compose spec from a pipeline."""

def compile(self, package_path: str = "docker-compose.yml") -> None:
Member:

Where does the name `package_path` come from? I.e., what's the package it refers to?

return {"version": "3.8", "services": services}

def compose_service(
self, component_op: ComponentOp, dependencies: t.List[str]
Member:

Suggested change:
- self, component_op: ComponentOp, dependencies: t.List[str]
+ self, component_op: ComponentOp, *, dependencies: t.List[str]
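The bare `*` in the suggestion makes `dependencies` keyword-only, so call sites must spell it out. A minimal standalone illustration (simplified stand-in, not the actual `compose_service`):

```python
def compose_service(component_op, *, dependencies):
    # The bare * makes everything after it keyword-only, so callers
    # must write dependencies=... explicitly.
    return {"op": component_op, "depends_on": dependencies}

compose_service("load_op", dependencies=["download"])  # OK
# compose_service("load_op", ["download"])  # would raise TypeError
```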

"source": str(base_path),
"target": f"/{base_path.stem}",
}
self.path = f"/{base_path.stem}"
Member:

Pylint would throw an `attribute-defined-outside-init` here :)

It's a bit hard to follow currently. Can we move the `_patch_path()` call to the `_generate_spec` method and pass its results to `compose_service`? This is the only place they're used.

So something like:

```python
def _generate_spec(self) -> dict:
    ...
    path, volume = self._patch_path()
    for component_name, component in self.pipeline._graph.items():
        ...
        services[safe_component_name] = self.compose_service(
            component["fondant_component_op"],
            dependencies=component["dependencies"],
            path=path,
            volume=volume,
        )
```

pipeline.add_op(component_op, dependencies=prev_comp)
prev_comp = component_op

pipeline.compile()
Member:

It's a bit strange that we compile the pipeline here and then pass it to a compiler again. Will this stay the case or will this change?

Collaborator (Author):

This should change. I did not touch the current Kubeflow compiling. Once we have a dedicated KubeflowCompiler we can clean up the pipeline.py code.

assert src.read() == truth.read()


def test_docker_local_path(pipeline, tmp_path_factory):
Member:

Is there a reason to call the internals in this test and the one below, or can we just call the compile method as you did above and check the output? I think that would be a bit nicer as the tests then don't depend on the internal implementation.

Collaborator (Author):

Yes we can; the downside is that I then need to store more files to compare against.

Collaborator (Author):

fixed
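The black-box "golden file" style agreed on above might look like this sketch (the helper names are assumptions, and it uses JSON instead of the real YAML output purely to stay stdlib-only): call the compile entry point, then compare the parsed output against a stored truth file.

```python
import json
import tempfile
from pathlib import Path

def compile_to(path: Path) -> None:
    # Stand-in for DockerCompiler.compile(); writes a deterministic spec.
    spec = {"version": "3.8", "services": {}}
    path.write_text(json.dumps(spec, sort_keys=True, indent=2))

def assert_matches_golden(generated: Path, golden: Path) -> None:
    # Compare parsed content rather than raw text, so formatting-only
    # changes don't break the test.
    assert json.loads(generated.read_text()) == json.loads(golden.read_text())

with tempfile.TemporaryDirectory() as tmp:
    out, truth = Path(tmp, "out.json"), Path(tmp, "truth.json")
    compile_to(out)
    truth.write_text('{"services": {}, "version": "3.8"}')
    assert_matches_golden(out, truth)
```

The trade-off mentioned in the thread is visible here: every scenario needs its own stored truth file, but the tests no longer depend on internals.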

docs/pipeline.md (Outdated)
compiler.compile()
```

The DockerCompiler will try to see if the `base_path` of the pipeline is local or remote. If local the base_path will be mounted as a bind volume on every service/component.
Contributor:

Suggested change:
- The DockerCompiler will try to see if the `base_path` of the pipeline is local or remote. If local the base_path will be mounted as a bind volume on every service/component.
+ The DockerCompiler will try to see if the `base_path` of the pipeline is local or remote. If local the `base_path` will be mounted as a bind volume on every service/component.



@dataclass
class DockerVolume:
Contributor:

Could you add a docstring explaining the class variables?
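Such a docstring might look like the following sketch (the field names are inferred from the `source`/`target` keys in the diff above and from the "bind volume" behavior described in the docs; treat them as assumptions):

```python
from dataclasses import dataclass

@dataclass
class DockerVolume:
    """A docker-compose volume mount for a service.

    Args:
        type: the mount type, e.g. "bind" for mounting a local folder.
        source: the path on the host machine to mount.
        target: the path inside the container where it will be mounted.
    """
    type: str
    source: str
    target: str
```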



@dataclass
class MetaData:
Contributor:

same here

@PhilippeMoussalli (Contributor) left a comment:

Thanks Georges :)

@RobbeSneyders (Member) left a comment:

Thanks @GeorgesLorre!




## Compiling a pipeline
Member:

We should probably explain when a user should use which compiler.

@GeorgesLorre GeorgesLorre merged commit 86d001f into main Jun 14, 2023
@GeorgesLorre GeorgesLorre deleted the feature/docker-compiler branch June 14, 2023 12:43
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
Known TODOs:

- support for `build` in docker compose
- update documentation
- update examples to include the DockerCompiler
- update pipeline.py and move the compile functionality to a KubeflowCompiler