Releases: ml6team/fondant
0.12.1
0.12.0
⚡️ Introducing the dataset-first interface
We have removed the pipeline interface and redesigned the dataset class. Datasets can still be built using load components as before, but you now have to use the Dataset class instead of the Pipeline class.
from fondant.dataset import Dataset

dataset = Dataset.create(
    "load_from_parquet",
    arguments={
        ...
    },
)
dataset = dataset.apply(...)
Additionally, we now support initializing datasets from previous workflow runs, which allows you to share your Fondant datasets. Datasets can be initialized from manifests, so sharing a dataset only requires sharing its manifest file.
from fondant.dataset import Dataset
dataset = Dataset.read("gs://.../manifest.json")
dataset = dataset.apply(...)
🛠️ Working directory
Since the pipeline doesn't exist anymore, we added a new CLI option to define a working directory. All workflow-related artifacts are stored in the working directory.
fondant run local dataset --working-directory ./data
Fondant pipelines created with previous Fondant versions are no longer compatible with >=0.12.0. To migrate your existing pipelines, initialize your dataset using Dataset.create(...) instead of Pipeline.read(...) and use the former base_path as the working directory when you materialize your dataset.
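The migration step above, reusing the former base_path as the working directory, can be sketched as a small helper that assembles the `fondant run local` call shown earlier. This is only an illustration: the function name `migration_run_command` and its parameters are hypothetical and not part of Fondant, and the sketch assumes the dataset reference is passed exactly as in the release note (`dataset`).

```python
import shlex


def migration_run_command(dataset_ref: str, former_base_path: str) -> list[str]:
    """Build a `fondant run local` call that reuses an old base_path.

    Hypothetical helper, not part of Fondant. `dataset_ref` is whatever
    reference you pass to `fondant run local`; `former_base_path` is the
    base_path your pre-0.12.0 pipeline used.
    """
    if not former_base_path:
        raise ValueError("former_base_path must not be empty")
    return [
        "fondant", "run", "local", dataset_ref,
        # The former base_path becomes the working directory.
        "--working-directory", former_base_path,
    ]


command = migration_run_command("dataset", "./data")
# Print the shell-quoted form; pass `command` to subprocess.run to execute it.
print(shlex.join(command))
```

Keeping the command as an argument list avoids shell-quoting problems if the former base_path contains spaces.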
What's Changed
- Refactor pipeline interface by @mrchtr in #901
- Update dataset documentation by @mrchtr in #918
- Remove pipeline references by @mrchtr in #923
- Update documentation dataset first interface by @mrchtr in #921
- Empty produces leading into list index out of range by @mrchtr in #924
- Remove working directory from user arguments by @mrchtr in #925
- Fix navigation documentation by @mrchtr in #926
- Fix link in the README file by @Philmod in #930
- Update readme with dataset focus by @GeorgesLorre in #928
- Mount absolute path of working dir to local runner by @mrchtr in #931
- Fixing cicd by @mrchtr in #929
- Fix arch link in readme by @GeorgesLorre in #933
- Set session duration to 5h in prep release pipeline by @mrchtr in #934
Full Changelog: 0.11.2...0.12.0
0.12.dev0
What's Changed
- Refactor pipeline interface by @mrchtr in #901
- Update dataset documentation by @mrchtr in #918
- Remove pipeline references by @mrchtr in #923
- Update documentation dataset first interface by @mrchtr in #921
- Empty produces leading into list index out of range by @mrchtr in #924
- Remove working directory from user arguments by @mrchtr in #925
- Fix navigation documentation by @mrchtr in #926
Full Changelog: 0.11.2...0.12.dev0
0.11.2
What's Changed
- Bug fix for retrieve from faiss by prompt by @mrchtr in #914
- Skip transformation of partition if partition is empty by @mrchtr in #908
- Edit prep release pipeline - refresh aws token by @mrchtr in #919
- Revert changes in release pipeline by @mrchtr in #922
Full Changelog: 0.11.1...0.11.2
0.11.1
What's Changed
- Don't run build action on every PR by @RobbeSneyders in #898
- Add datacomp CLIP index announcement by @RobbeSneyders in #897
- Fix list formatting in CLIP announcement by @RobbeSneyders in #899
- Update documentation lightweight_components by @mrchtr in #903
- Add string as fallback index type when writing data by @RobbeSneyders in #904
- Add resource requirements to the retrieve from faiss component by @mrchtr in #905
Full Changelog: 0.11.0...0.11.1
0.11.0
What's Changed
- Add docker default platform to data explorer by @mrchtr in #841
- Fix exception when invoke consumes with invalid field schema by @mrchtr in #842
- Update readme index weaviate component by @mrchtr in #843
- Create local artifact directory if it does not exist by @mrchtr in #847
- Remove RAG use case custom components by @mrchtr in #848
- Make Fondant installable via git by @RobbeSneyders in #849
- Fix hub generation with new components location by @RobbeSneyders in #851
- Move Dask Client configuration to Component class and use multi-GPU in embed_images component by @RobbeSneyders in #852
- Update component dir in build script by @RobbeSneyders in #856
- Make unique index sorted by @RobbeSneyders in #855
- Validate docker versions by @GeorgesLorre in #854
- Infer consume operation if not present in dataset interface by @mrchtr in #859
- Change dask_client to general setup method by @RobbeSneyders in #861
- Add gpu extra with dask-cuda and bump minimum Python version to 3.9 by @RobbeSneyders in #862
- Write metadata file by @RobbeSneyders in #864
- Update component directory in tag component script by @RobbeSneyders in #866
- Add pre- and post-build script to work around Poetry bug by @RobbeSneyders in #868
- Catch Dask client shutdown error by @RobbeSneyders in #869
- Move convert-string into component setup method by @RobbeSneyders in #871
- Specify schema when writing to parquet in write_to_file component by @RobbeSneyders in #873
- Don't write metadata file by @RobbeSneyders in #875
- Allow more supported docker versions by @RobbeSneyders in #878
- Fix docs build for Poetry >=1.8 by @RobbeSneyders in #881
- Update Fondant version in load_from_hub component by @RobbeSneyders in #880
- Add test for consume name to name mapping by @mrchtr in #867
- Add struct to types by @PhilippeMoussalli in #879
- Bugfix remote file detection by @PhilippeMoussalli in #883
- Fix default lightweight component by @PhilippeMoussalli in #884
- Enable timezone type for timestamp pyarrow type by @PhilippeMoussalli in #888
- Add dask and docker as default dependencies by @GeorgesLorre in #893
- Add image retrieval from FAISS index by @mrchtr in #876
- Implement non-blocking to_parquet by @RobbeSneyders in #892
- Add retrieve from faiss by embedding component by @mrchtr in #894
- Remove docker image after build and push from local device by @mrchtr in #895
- Temporarily disabled exposing the Dask diagnostic dashboard by @mrchtr in #872
- Align retrieve_from_faiss_by_prompt component name by @RobbeSneyders in #896
Full Changelog: 0.10.1...0.11.0
0.11.dev5
What's Changed
- Allow more supported docker versions by @RobbeSneyders in #878
- Fix docs build for Poetry >=1.8 by @RobbeSneyders in #881
- Update Fondant version in load_from_hub component by @RobbeSneyders in #880
- Add test for consume name to name mapping by @mrchtr in #867
- Add struct to types by @PhilippeMoussalli in #879
- Bugfix remote file detection by @PhilippeMoussalli in #883
- Fix default lightweight component by @PhilippeMoussalli in #884
- Enable timezone type for timestamp pyarrow type by @PhilippeMoussalli in #888
Full Changelog: 0.11.dev4...0.11.dev5
0.11.dev4
What's Changed
- Specify schema when writing to parquet in write_to_file component by @RobbeSneyders in #873
- Don't write metadata file by @RobbeSneyders in #875
Full Changelog: 0.11.dev3...0.11.dev4
0.11.dev3
What's Changed
- Move convert-string into component setup method by @RobbeSneyders in #871
Full Changelog: 0.11.dev2...0.11.dev3
0.11.dev2
What's Changed
- Catch Dask client shutdown error by @RobbeSneyders in #869
Full Changelog: 0.11.dev1...0.11.dev2