Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DataComp] Update pipeline name, remove DockerCompiler #340

Merged
merged 7 commits into from
Aug 8, 2023

Conversation

NielsRogge
Copy link
Contributor

@NielsRogge NielsRogge commented Aug 8, 2023

This PR updates the pipeline name of DataComp, and makes sure it can run fine both locally and on GCP.

Switching between local vs GCP is currently done as follows:

  • I need to manually comment out the base path I don't want to use
  • it's either running fondant run pipeline:pipeline --local or python pipeline.py
Screenshot 2023-08-08 at 13 03 05

@@ -67,15 +66,7 @@
pipeline.add_op(load_from_hub_op)
pipeline.add_op(filter_image_resolution_op, dependencies=load_from_hub_op)
pipeline.add_op(filter_complexity_op, dependencies=filter_image_resolution_op)
pipeline.add_op(cluster_image_embeddings_op, dependencies=filter_complexity_op)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cluster image embeddings component won't be used for now, which is why it is removed here.

@PhilippeMoussalli
Copy link
Contributor

Switching between local vs GCP is currently done as follows:

  • I need to manually comment out the base path I don't want to use

Isn't this just changing the base path in PipelineConfigs? you can use a local base path or a remote one for the local runner. For the remote runner, it's always the remote one.

  • it's either running fondant run pipeline:pipeline --local or python pipeline.py

I think you can keep the interface and code of the pipeline as is, for now only fondant run pipeline:pipeline --local running the pipeline with kubeflow using a fondant run should be soon available with this PR #291 (@GeorgesLorre can you confirm?).

@NielsRogge
Copy link
Contributor Author

Isn't this just changing the base path in PipelineConfigs? you can use a local base path or a remote one for the local runner. For the remote runner, it's always the remote one.

Yes but as I'm switching regularly between both, it's handy to specify both a local and a remote base path rather than having to type it everytime. cc @GeorgesLorre

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @NielsRogge!

examples/pipelines/datacomp/pipeline.py Outdated Show resolved Hide resolved
examples/pipelines/datacomp/pipeline.py Outdated Show resolved Hide resolved
@NielsRogge
Copy link
Contributor Author

Feel free to approve :)

@GeorgesLorre
Copy link
Collaborator

GeorgesLorre commented Aug 8, 2023

Isn't this just changing the base path in PipelineConfigs? you can use a local base path or a remote one for the local runner. For the remote runner, it's always the remote one.

Yes but as I'm switching regularly between both, it's handy to specify both a local and a remote base path rather than having to type it everytime. cc @GeorgesLorre

I'm working on making the interface for docker and kfp more similar (TBD).

But you can alter your pipeline.py already to make this easier:

if __name__ == "__main__":
    # assuming kfp run
    client = Client(host=PipelineConfigs.HOST)
    # override the base_path
    pipeline.base_path = "some/remote/path"
    client.compile_and_run(pipeline=pipeline)

So if you call your pipeline.py it will assume it is a kfp run and override the base_path and if you use fondant run pipeline:pipeline --local it should not run the code under the if statement.

@NielsRogge NielsRogge merged commit 9fcc994 into main Aug 8, 2023
5 checks passed
@NielsRogge NielsRogge deleted the fix_datacomp branch August 8, 2023 14:51
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
This PR updates the pipeline name of DataComp, and makes sure it can run
fine both locally and on GCP.

Switching between local vs GCP is currently done as follows:
- I need to manually comment out the base path I don't want to use
- it's either running `fondant run pipeline:pipeline --local` or `python
pipeline.py`

<img width="391" alt="Screenshot 2023-08-08 at 13 03 05"
src="https://github.com/ml6team/fondant/assets/48327001/3fc14f9a-23be-4d0b-bb39-022ae42e69a3">
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants