Update vertex and general docs #526
Thanks @PhilippeMoussalli, nice improvements!
README.md
Outdated
- Or locally by using [docker compose](https://docs.docker.com/compose/). This way is mainly aimed at helping you develop fondant pipelines and components faster by making it easier to run things on a smaller scale.
- [**Local runner**](https://github.com/ml6team/fondant/blob/main/docs/pipeline.md#local-runner): leverages [docker compose](https://docs.docker.com/compose/). The local runner is mainly aimed
at helping you develop fondant pipelines and components faster by making it easier to run things on a smaller scale
and iterate quickly on your pipeline. Once you have a pipeline developed, you can use the other runners mentioned below
I would mention that you can easily use it on a VM as well, since that's a workflow we've noticed people using.
README.md
Outdated
@@ -308,7 +308,7 @@ speed up your data preparation work.
- Data lineage and experiment tracking
- Distributed execution, both on and off cluster
- Support other dataframe libraries such as HF Datasets, Polars, Spark
- Move reusable components into a decentralized component registry |
why was this removed? This is still on the roadmap :)
Wasn't this the docker hub?
No, that's just for the images. This is about something like https://llamahub.ai/.
docs/pipeline.md
Outdated
## Setting Custom partitioning parameters
This local runner is mainly aimed at local development and quick iterations, there is no scaling so using small slices of your data is advised. |
It does scale, right? Especially on a larger VM. I would mention that it only scales to the machine you're running on, and that switching to Vertex / KfP has the advantage that you can choose hardware per component, get better monitoring, reproducibility, etc.
docs/pipeline.md
Outdated
**2) Repartitioning the Written DataFrame:** The written dataframe is also repartitioned into
smaller sizes (default 250MB) to enable the next component to load these partitions into memory.
In order to compile your pipeline to a `docker-compose` spec you need to import the `DockerCompiler` |
Should we document the Python API here? I would focus on using the CLI, and including it here might make it unclear if they need to add this to their code somewhere.
I would even just document the `run` command, not the `compile` command.
Same for the other runners below.
docs/infrastructure.md
Outdated
Can we update this page to mention Vertex and the LocalRunner at the top? And remove that Fondant is built on KfP :)
Maybe this page should just be specific to setting up the infra required for KFP. Since Vertex and docker don't really require any infrastructure, I don't see what we can add here. All the info on setting them up is in the pipeline page.
6a90360 to ee4e486
Thanks!
PR to add documentation for Vertex, plus slight modifications to remove outdated docs and some restructuring.