Update docs with the new CLI commands #370

Merged
merged 2 commits into from
Aug 18, 2023
7 changes: 7 additions & 0 deletions docs/getting_started.md
@@ -309,3 +309,10 @@
```bash
fondant explore --data-directory "path/to/your/data"
```

Note that if you use a remote path (S3, GCS) you can also pass credentials using the `--credentials` flag. For all the options of the data explorer run `fondant explore --help`.



## Running at scale

For more information on how to configure and run your pipeline on different runners, see the [pipeline documentation](pipeline.md).

69 changes: 64 additions & 5 deletions docs/pipeline.md
@@ -115,15 +115,67 @@ where processing one row significantly increases the number of rows in the datas
By setting a lower value for input partition rows, you can mitigate issues where the processed data
grows larger than the available memory before being written to disk.
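
As a rough illustration of the reasoning above, the sketch below estimates an upper bound for the number of input partition rows from a memory budget. The function name, the expansion factor, and the byte sizes are all hypothetical and not part of Fondant; the only point is that a larger row expansion calls for fewer input rows per partition.

```python
def max_input_partition_rows(
    memory_budget_bytes: int,
    bytes_per_output_row: int,
    expansion_factor: float,
) -> int:
    """Estimate how many input rows fit in memory after processing.

    Hypothetical helper for illustration only, not a Fondant API.
    `expansion_factor` is the assumed number of output rows per input row.
    """
    if memory_budget_bytes <= 0 or bytes_per_output_row <= 0 or expansion_factor <= 0:
        raise ValueError("all arguments must be positive")
    # Memory needed for the output produced by a single input row.
    bytes_per_processed_input_row = bytes_per_output_row * expansion_factor
    # Always allow at least one row per partition.
    return max(1, int(memory_budget_bytes // bytes_per_processed_input_row))


# Assumed numbers: 1 GB budget, 1 kB per output row, 100 output rows per input row.
print(max_input_partition_rows(1_000_000_000, 1_000, 100))
```

With these assumed numbers, a component that expands each row a hundredfold would need partitions roughly a hundred times smaller than one that keeps the row count unchanged.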

## Compiling a pipeline
## Compiling and Running a pipeline

Once all your components are added to your pipeline, you can use different compilers to compile it for the runner of your choice:

!!! note "IMPORTANT"

@PhilippeMoussalli (Contributor), Aug 18, 2023

This will render the text as a highlighted and indented blockquote, making it stand out as an important note.

IMPORTANT

Currently Fondant supports linear DAGs with single dependencies. Support for non-linear DAGs will be available in future releases.

It's done like this

> **IMPORTANT**
> 
> Currently Fondant supports linear DAGs with single dependencies. Support for non-linear DAGs will be available in future releases.

Contributor: my bad, I guess it's only in the GitHub README where they're not properly rendered

When using other runners, make sure that your new environment has access to:
- The `base_path` of your pipeline (this can be a storage bucket such as S3 or GCS)
- The images used in your pipeline (make sure you have access to the registries where the images are stored)
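
A local `base_path` is only visible on the machine that holds it, so it cannot be reached from a remote cluster. The following is a minimal sketch of how one might check this up front; the helper name and the set of schemes are assumptions made for illustration and are not part of Fondant.

```python
from urllib.parse import urlparse

# Assumed set of remote storage schemes, for illustration only.
REMOTE_SCHEMES = {"s3", "gs", "abfs"}


def is_remote_base_path(base_path: str) -> bool:
    """Return True if the base path points at remote storage.

    Hypothetical helper, not a Fondant API. It only inspects the URL scheme
    and does not check credentials or whether the bucket exists.
    """
    scheme = urlparse(base_path).scheme.lower()
    return scheme in REMOTE_SCHEMES


for candidate in ("gs://my-bucket/artifacts", "./local_artifacts"):
    print(candidate, is_remote_base_path(candidate))
```

This check says nothing about permissions; the new environment still needs credentials for the bucket and for the image registries.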

### Kubeflow
TODO: update this once kubeflow compiler is implemented

~~Once the pipeline is built, you need to initialize the client with the kubeflow host path (more info about the host path can be found in the [infrastructure documentation](https://github.com/ml6team/fondant/blob/main/docs/infrastructure.md))
and use it to compile and run the pipeline with the `compile_and_run()` method. This performs static checking to ensure that all required arguments are provided to the components and that the required input data subsets are available. If the checks pass, a URL will be provided, allowing you to visualize and monitor the execution of your pipeline.~~

The Kubeflow compiler takes your pipeline and compiles it to a Kubeflow pipeline spec, which can then be used to run your pipeline on a Kubeflow cluster. There are two ways to compile your pipeline to a Kubeflow spec:

- Using the CLI:
```bash
fondant compile <pipeline_ref> --kubeflow --output <path_to_output>
```

- Using the compiler directly:
```python
from fondant.compiler import KubeFlowCompiler


pipeline = ...

compiler = KubeFlowCompiler()
compiler.compile(pipeline=pipeline, output_path="pipeline.yaml")
```

Both options produce a Kubeflow specification file. If you also want to start a run immediately, you can use the runner we provide (see below).
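
If the CLI compile step needs to be scripted, for example from a build script, one possible approach is to assemble the command in Python. This sketch uses only the flags shown above; the helper name and the `my_pipeline_ref` placeholder are hypothetical.

```python
import shlex
from typing import List


def build_compile_command(pipeline_ref: str, output_path: str) -> List[str]:
    """Build the argument list for `fondant compile` targeting Kubeflow.

    Hypothetical helper; it only mirrors the CLI invocation shown above.
    """
    return ["fondant", "compile", pipeline_ref, "--kubeflow", "--output", output_path]


# "my_pipeline_ref" is a placeholder for your own pipeline reference.
command = build_compile_command("my_pipeline_ref", "pipeline.yaml")
print(shlex.join(command))
# To execute it, pass the list to subprocess.run(command, check=True).
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.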

### Running a Kubeflow compiled pipeline

You will need a Kubeflow cluster to run your pipeline on, and you must specify the host of that cluster. More information on setting up a Kubeflow Pipelines deployment and the host path can be found in the [infrastructure documentation](infrastructure.md).

There are two ways to run a Kubeflow compiled pipeline:

- Using the CLI:
```bash
fondant run <pipeline_ref> --kubeflow --host <kubeflow_host>
```
NOTE: the pipeline ref is either the path to a compiled pipeline spec or a reference to a Fondant pipeline, in which case the pipeline is compiled first before running.


- Using the compiler and runner directly:
```python
from fondant.compiler import KubeFlowCompiler
from fondant.runner import KubeflowRunner

pipeline = ...  # Your pipeline definition here

if __name__ == "__main__":
    compiler = KubeFlowCompiler()
    compiler.compile(pipeline=pipeline, output_path="pipeline.yaml")
    runner = KubeflowRunner(
        host="YOUR KUBEFLOW HOST",
    )
    runner.run(input_spec="pipeline.yaml")
```

Once your pipeline is running, you can monitor it using the Kubeflow UI.
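
The note above says that a pipeline ref can be either a compiled spec or a reference to a Fondant pipeline. The sketch below shows one way such a distinction could be made. It is a hypothetical heuristic written for illustration and does not describe how Fondant itself resolves a ref.

```python
from pathlib import Path


def looks_like_compiled_spec(pipeline_ref: str) -> bool:
    """Guess whether a pipeline ref is the path to a compiled spec.

    Hypothetical heuristic, not Fondant's own logic: an existing file with
    a .yaml or .yml suffix is treated as a compiled spec; anything else is
    assumed to be a reference that still needs compiling.
    """
    path = Path(pipeline_ref)
    return path.suffix.lower() in {".yaml", ".yml"} and path.is_file()


ref = "pipeline.yaml"
if looks_like_compiled_spec(ref):
    print(f"{ref}: run the compiled spec as is")
else:
    print(f"{ref}: compile first, then run")
```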

### Docker-Compose

@@ -188,4 +240,11 @@ Navigate to the folder where your docker compose is located and run (you need to
```bash
docker compose up
```

Alternatively, you can use the Fondant CLI to run the pipeline:
```bash
fondant run <pipeline_ref> --local
```

NOTE: the pipeline ref is either the path to a compiled pipeline spec or a reference to a Fondant pipeline, in which case the pipeline is compiled first before running.

This will start the pipeline and provide logs per component (service).
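
The local and Kubeflow `fondant run` invocations differ only in their flags, so a script that supports both targets can build the command from one function. This is a minimal sketch using only the flags shown in this document; the helper name, the `my_pipeline_ref` placeholder, and the example host are hypothetical.

```python
import shlex
from typing import List, Optional


def build_run_command(pipeline_ref: str, kubeflow_host: Optional[str] = None) -> List[str]:
    """Build the `fondant run` argument list for a local or Kubeflow run.

    Hypothetical helper: without a host the local runner flag is used,
    with a host the Kubeflow flags are used.
    """
    command = ["fondant", "run", pipeline_ref]
    if kubeflow_host is None:
        command.append("--local")
    else:
        command.extend(["--kubeflow", "--host", kubeflow_host])
    return command


# Placeholder values; replace them with your own ref and host.
print(shlex.join(build_run_command("my_pipeline_ref")))
print(shlex.join(build_run_command("my_pipeline_ref", "https://kubeflow.example.com")))
# To execute a command, pass the list to subprocess.run(command, check=True).
```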