-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First implementation of DockerCompiler #194
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I Really like the implementation! looks quite clean ✨
Left a few minor comments, excited to see where this is heading :)
fondant/compiler.py
Outdated
"""Compiler that creates a docker-compose spec from a pipeline.""" | ||
|
||
def compile(self, package_path: str = "docker-compose.yml") -> None: | ||
"""Compile a pipeline to docker-compose spec that is saved on the package_path.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Compile a pipeline to docker-compose spec that is saved on the package_path.""" | |
"""Compile a pipeline to docker-compose spec and save it to a specified package_path.""" |
fondant/compiler.py
Outdated
""" | ||
base_path = Path(self.pipeline.base_path) | ||
# check if base path is an existing local folder | ||
if base_path.exists(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if the path is local but does not exist? it will assume it's remote In this case
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, even if it is not remote if it does not exist locally we can't mount it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright but maybe we can improve the error handling or try creating a new local path? now it assumes that if it's local and that it does not exist that it's remote (you assign it the base path)
""" | ||
services = {} | ||
for component_name, component in self.pipeline._graph.items(): | ||
safe_component_name = self._safe_component_name(component_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be worth moving the safe name functionality into the pipeline when building the graph since we might need to do the renaming across different runners
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will probably happen once I implement the KubeflowCompiler
fondant/compiler.py
Outdated
command = ["--metadata", json.dumps(metadata)] | ||
|
||
# add in and out manifest paths to command | ||
command.extend(["--output_manifest_path", f"{self.path}/manifest.txt"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would maybe give a unique identifier to the manifest or write it within the component folder to avoid overriding it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be ideal if we can reuse the same path logic across compilers. @PhilippeMoussalli I'm having a hard time figuring out where the output_manifest_path
is currently built when running with kubeflow. Can you point me to it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The path of the output manifest path is created internally and stored in kubernetes with minio. When deploying kubeflow we specify a bucket and blob storage (in our case soy-audio-379412_kfp-artifacts/artifacts
) where kubeflow will write the artifacts. The specific write folder is handled internally within kubeflow and cannot be modified afaik
(e.g soy-audio-379412_kfp-artifacts/artifacts/controlnet-pipeline-4pmng/2023/05/10/controlnet-pipeline-4pmng-1483856016/retrieve-images-output_manifest_path.tgz
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look more into this once I write the KubeflowCompiler
which will hopefully clarify this
fondant/compiler.py
Outdated
depends_on = {} | ||
if dependencies: | ||
# there is only an input manifest if the component has dependencies | ||
command.extend(["--input_manifest_path", f"{self.path}/manifest.txt"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @GeorgesLorre! Really nice to have this. Left some comments 🙂
fondant/compiler.py
Outdated
class Compiler(ABC): | ||
"""Abstract base class for a compiler.""" | ||
|
||
def __init__(self, pipeline: Pipeline): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the pipeline passed as an initialization argument instead of to the compile
method. From a logical perspective, I wouldn't expect a compiler instance to be specific to a single pipeline.
fondant/compiler.py
Outdated
self.pipeline = pipeline | ||
|
||
@abstractmethod | ||
def compile(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably needs *args, **kwargs
if we change the signature in the subclasses.
fondant/compiler.py
Outdated
class DockerCompiler(Compiler): | ||
"""Compiler that creates a docker-compose spec from a pipeline.""" | ||
|
||
def compile(self, package_path: str = "docker-compose.yml") -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where does the name package_path
come from? i.e. what's the package it refers to?
fondant/compiler.py
Outdated
return {"version": "3.8", "services": services} | ||
|
||
def compose_service( | ||
self, component_op: ComponentOp, dependencies: t.List[str] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self, component_op: ComponentOp, dependencies: t.List[str] | |
self, component_op: ComponentOp, *, dependencies: t.List[str] |
fondant/compiler.py
Outdated
"source": str(base_path), | ||
"target": f"/{base_path.stem}", | ||
} | ||
self.path = f"/{base_path.stem}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pylint would throw an attribute-defined-outside-init
here :)
It's a bit hard to follow currently. Can we move the _patch_path()
call to the _generate_spec
method and pass them to compose_service
? This is the only place they're used.
So something like:
def _generate_spec(self) -> dict:
...
path, volume = self._patch_path()
for component_name, component in self.pipeline._graph.items():
...
services[safe_component_name] = self.compose_service(
component["fondant_component_op"],
dependencies=component["dependencies"],
path=path,
volume=volume,
)
fondant/compiler.py
Outdated
command = ["--metadata", json.dumps(metadata)] | ||
|
||
# add in and out manifest paths to command | ||
command.extend(["--output_manifest_path", f"{self.path}/manifest.txt"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be ideal if we can reuse the same path logic across compilers. @PhilippeMoussalli I'm having a hard time figuring out where the output_manifest_path
is currently built when running with kubeflow. Can you point me to it?
pipeline.add_op(component_op, dependencies=prev_comp) | ||
prev_comp = component_op | ||
|
||
pipeline.compile() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit strange that we compile the pipeline here and then pass it to a compiler again. Will this stay the case or will this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should change. I did not touch the current kubeflow compiling. Once we have a dedicated KubefowCompiler we can clean up the pipeline.py code.
assert src.read() == truth.read() | ||
|
||
|
||
def test_docker_local_path(pipeline, tmp_path_factory): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to call the internals in this test and the one below, or can we just call the compile method as you did above and check the output? I think that would be a bit nicer as the tests then don't depend on the internal implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we can, downside is that I need to then store more files to compare against
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
docs/pipeline.md
Outdated
compiler.compile() | ||
``` | ||
|
||
The DockerCompiler will try to see if the `base_path` of the pipeline is local or remote. If local the base_path will be mounted as a bind volume on every service/component. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DockerCompiler will try to see if the `base_path` of the pipeline is local or remote. If local the base_path will be mounted as a bind volume on every service/component. | |
The DockerCompiler will try to see if the `base_path` of the pipeline is local or remote. If local the `base_path` will be mounted as a bind volume on every service/component. |
|
||
|
||
@dataclass | ||
class DockerVolume: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you add a docstring explaining the class variables
|
||
|
||
@dataclass | ||
class MetaData: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Georges :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @GeorgesLorre!
|
||
|
||
|
||
## Compiling a pipeline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably explain when a user should use which compiler.
Known TODO's: - support for build in docker compose - update documentation - update examples to include dockercompiler - update pipeline.py and move compile functionality to a KubeflowCompiler
Known TODO's: