Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor local runner and add run command #234

Merged
merged 7 commits into from
Jun 28, 2023
Merged

Conversation

GeorgesLorre
Copy link
Collaborator

No description provided.

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgesLorre!

Can you explain a bit why you switched to this approach? One downside I see is that we're no longer building the parser hierarchy up front, which might not allow for cli autocompletion (to be checked).

fondant/cli.py Outdated
"""Check if the next command is help and print help if it is.
Otherwise parse the known args.
"""
if input_args[0] in ["help", "-h", "--help"]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we rely on the built-in behavior of argparse here?

Copy link
Collaborator Author

@GeorgesLorre GeorgesLorre Jun 26, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parse_known_args() will always consume the help. So if you do fondant compile local --help you will get the help page for just fondant because the first parse_know_args() will consume it.

fondant/cli.py Outdated
parser = argparse.ArgumentParser(add_help=False)

parser.add_argument(
"mode",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's a more logical user interface if this is a flag instead of a subcommand.

So:

fondant compile <pipeline> --local

instead of

fondant compile local <pipeline>

I think this can be achieved in the current implementation. We might just have to lift the pipeline parameter from compile_{mode} to this function, which I think makes sense.

fondant/cli.py Outdated
parser = argparse.ArgumentParser(add_help=False)

parser.add_argument(
"mode",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here as above.

@GeorgesLorre
Copy link
Collaborator Author

Thanks @GeorgesLorre!

Can you explain a bit why you switched to this approach? One downside I see is that we're no longer building the parser hierarchy up front, which might not allow for cli autocompletion (to be checked).

Because of multiple subcommand levels fondant(1) compile(2) local(3) (see our chat). But we could change it back and build the hierarchy up front.

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the --local and --kubeflow flags instead of subcommands as the user interface. But can we now still define separate flags for each of them?

def run(args):
if args.local:
try:
pipeline = pipeline_from_string(args.ref)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only need the try except around this line I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could move the logic around but I believe this is the clearest. Try to see if it is an importstring if it is then compile it otherwise assume it is a reference to an already compiled pipeline.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use

try:
    ...
except:
    ...
else:
    ...
finally:
    ...

fondant/cli.py Outdated
sys.path.append(".")
args = cli.parse_args()

args = parser.parse_args(sys.argv[1:])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
args = parser.parse_args(sys.argv[1:])
args = parser.parse_args()

I don't think explicitly providing the arguments is necessary.

README.md Outdated
We have compiled some references and created some scripts to [get you started](https://fondant.readthedocs.io/en/latest/infrastructure) with setting up the required infrastructure.
There are 2 ways of using fondant:

- Leveraging [kubeflow pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/) on any kubernetes cluster. All fondant needs is an url pointing to the kubeflow pipeline host and a Object Storage provider (S3, GCS, etc) to store data produced in the pipeline between steps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Leveraging [kubeflow pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/) on any kubernetes cluster. All fondant needs is an url pointing to the kubeflow pipeline host and a Object Storage provider (S3, GCS, etc) to store data produced in the pipeline between steps.
- Leveraging [kubeflow pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/) on any kubernetes cluster. All that fondant needs is an url pointing to the kubeflow pipeline host and an Object Storage provider (S3, GCS, etc) to store data produced in the pipeline between steps.

README.md Outdated
- Or locally by using [docker compose](https://docs.docker.com/compose/). This way is mainly aimed at helping you develop fondant pipelines and components faster by making it easier to run things on a smaller scale.


The same pipeline can be used in both variant allowing you to quickly develop and iterate using the local docker compose implementation and then using the power of kubeflow pipelines to run a large scale pipeline.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The same pipeline can be used in both variant allowing you to quickly develop and iterate using the local docker compose implementation and then using the power of kubeflow pipelines to run a large scale pipeline.
The same pipeline can be used in both variants, allowing you to quickly develop and iterate using the local docker compose implementation and then using the power of kubeflow pipelines to run a large scale pipeline.

README.md Outdated

### Running your pipeline

Once you have a pipeline you can easily run (and compile it) using the builtin CLI:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Once you have a pipeline you can easily run (and compile it) using the builtin CLI:
Once you have a pipeline you can easily run (and compile) it by using the built-in CLI:

README.md Outdated
Once you have a pipeline you can easily run (and compile it) using the builtin CLI:

```bash
fondant run foo/bar.py:pipeline --local
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for the --local, should we add both?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the kubeflow command is there but raises NotImplementedError's

Copy link
Member

@RobbeSneyders RobbeSneyders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @GeorgesLorre! 2 small comments.

README.md Outdated
Once you have a pipeline you can easily run (and compile it) using the builtin CLI:

```bash
fondant run foo/bar.py:pipeline --local
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fondant run foo/bar.py:pipeline --local
fondant run foo.bar:pipeline --local

You should pass the file as a module, right?

def run(args):
if args.local:
try:
pipeline = pipeline_from_string(args.ref)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bump

README.md Outdated
We have compiled some references and created some scripts to [get you started](https://fondant.readthedocs.io/en/latest/infrastructure) with setting up the required infrastructure.
There are 2 ways of using fondant:

- Leveraging [kubeflow pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/) on any kubernetes cluster. All fondant needs is an url pointing to the kubeflow pipeline host and a Object Storage provider (S3, GCS, etc) to store data produced in the pipeline between steps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Leveraging [kubeflow pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/) on any kubernetes cluster. All fondant needs is an url pointing to the kubeflow pipeline host and a Object Storage provider (S3, GCS, etc) to store data produced in the pipeline between steps.
- Leveraging [Kubeflow pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/) on any Kubernetes cluster. All Fondant needs is a url pointing to the Kubeflow pipeline host and an Object Storage provider (S3, GCS, etc) to store data produced in the pipeline between steps.


- Leveraging [kubeflow pipelines](https://www.kubeflow.org/docs/components/pipelines/v1/introduction/) on any kubernetes cluster. All fondant needs is an url pointing to the kubeflow pipeline host and a Object Storage provider (S3, GCS, etc) to store data produced in the pipeline between steps.
We have compiled some references and created some scripts to [get you started](https://fondant.readthedocs.io/en/latest/infrastructure) with setting up the required infrastructure.
- Or locally by using [docker compose](https://docs.docker.com/compose/). This way is mainly aimed at helping you develop fondant pipelines and components faster by making it easier to run things on a smaller scale.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Or locally by using [docker compose](https://docs.docker.com/compose/). This way is mainly aimed at helping you develop fondant pipelines and components faster by making it easier to run things on a smaller scale.
- Or locally by using [Docker Compose](https://docs.docker.com/compose/). This way is mainly aimed at helping you develop Fondant pipelines and components faster by making it easier to run things on a smaller scale.

README.md Outdated
- Or locally by using [docker compose](https://docs.docker.com/compose/). This way is mainly aimed at helping you develop fondant pipelines and components faster by making it easier to run things on a smaller scale.


The same pipeline can be used in both variant allowing you to quickly develop and iterate using the local docker compose implementation and then using the power of kubeflow pipelines to run a large scale pipeline.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The same pipeline can be used in both variant allowing you to quickly develop and iterate using the local docker compose implementation and then using the power of kubeflow pipelines to run a large scale pipeline.
The same pipeline can be used in both variant allowing you to quickly develop and iterate using the local Docker Compose implementation and then using the power of Kubeflow pipelines to run a large scale pipeline.

@RobbeSneyders RobbeSneyders changed the title Feature/local runner Refactor local runner and add run command Jun 28, 2023
README.md Outdated
Once you have a pipeline you can easily run (and compile) it by using the built-in CLI:

```bash
fondant run foo.bar.py:pipeline --local
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fondant run foo.bar.py:pipeline --local
fondant run foo.bar:pipeline --local

The .py should also be removed.

"""Run a docker-compose spec."""
cmd = ["docker", "compose", "-f", input_spec, "up", "--build"]

subprocess.call(cmd, stdout=subprocess.PIPE) # nosec
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
subprocess.call(cmd, stdout=subprocess.PIPE) # nosec
subprocess.call(cmd) # nosec

As discussed, this intercepts the build logs and docker compose log colors.

@GeorgesLorre GeorgesLorre merged commit 01cb057 into main Jun 28, 2023
@GeorgesLorre GeorgesLorre deleted the feature/local-runner branch June 28, 2023 10:05
Hakimovich99 pushed a commit that referenced this pull request Oct 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants