Update vertex and general docs #526

Merged 2 commits on Oct 18, 2023

Conversation

PhilippeMoussalli
Contributor

PR to add documentation for Vertex, plus some slight modifications: removing outdated docs and some restructuring.

Member

RobbeSneyders left a comment

Thanks @PhilippeMoussalli, nice improvements!

README.md Outdated
- Or locally by using [docker compose](https://docs.docker.com/compose/). This way is mainly aimed at helping you develop fondant pipelines and components faster by making it easier to run things on a smaller scale.
- [**Local runner**](https://github.com/ml6team/fondant/blob/main/docs/pipeline.md#local-runner): leverages [docker compose](https://docs.docker.com/compose/). The local runner is mainly aimed
at helping you develop fondant pipelines and components faster by making it easier to run things on a smaller scale
and iterate quickly on your pipeline. Once you have a pipeline developed, you can use the other runners mentioned below
Member

I would mention that you can easily use it on a VM as well, as that's a workflow we've noticed people using.

README.md Outdated
@@ -308,7 +308,7 @@ speed up your data preparation work.
- Data lineage and experiment tracking
- Distributed execution, both on and off cluster
- Support other dataframe libraries such as HF Datasets, Polars, Spark
- Move reusable components into a decentralized component registry
Member

Why was this removed? This is still on the roadmap :)

Contributor Author

Wasn't this the Docker Hub?

Member

No, that's just for the images. This is about something like https://llamahub.ai/.

docs/pipeline.md Outdated

## Setting Custom partitioning parameters
This local runner is mainly aimed at local development and quick iterations, there is no scaling so using small slices of your data is advised.
Member

It does scale, right? Especially on a larger VM. I would mention that it only scales to the machine you're running on, and that switching to Vertex / KfP has the advantages that you can choose hardware per component and get better monitoring, reproducibility, etc.

docs/pipeline.md Outdated

**2) Repartitioning the Written DataFrame:** The written dataframe is also repartitioned into
smaller sizes (default 250MB) to enable the next component to load these partitions into memory.
In order to compile your pipeline to a `docker-compose` spec you need to import the `DockerCompiler`
Member

Should we document the Python API here? I would focus on using the CLI, since including the Python API here might make it unclear whether users need to add this to their code somewhere.

Member

I would even just document the run command, not the compile command.

Member

Same for the other runners below.

Member

Can we update this page to mention Vertex and the LocalRunner at the top? And remove the statement that Fondant is built on KfP :)

Contributor Author

Maybe this page should just be specific to setting up the infra required for KfP. Since Vertex and Docker don't really require any infrastructure setup, I don't see what we can add here. All the info on setting them up is in the pipeline page.

Member

RobbeSneyders left a comment

Thanks!

RobbeSneyders merged commit 5cc8c24 into main on Oct 18, 2023
RobbeSneyders deleted the add-vertex-docs branch on October 18, 2023, 09:29

2 participants