[DataComp pipeline] Add first 2 components #223
e6d13c2 to fe4f8ef
Thanks @NielsRogge!
The pipeline is failing on pre-commit. If you haven't yet, you can run `pre-commit install` so that it runs automatically on every commit.
Some small comments. Can you address them and then merge the PR so you can open new PRs for additional components?
examples/pipelines/datacomp/components/filter_text_complexity/src/main.py
Local pipeline is running fine until writing of the data of the second component, where it gives:
=> the
Yes, that's correct. The [...] I think I know a way around this, though. Will open a PR.
OK, I guess we should wait to merge this PR until that is resolved? Otherwise we can merge with the column being dropped.
Let's merge now and I'll update it in my PR.
This PR filters out data from the pandas dataframe returned by the user that is not defined in the component spec. Previously, returning additional columns would raise an error. (see #223 (comment))
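The filtering described above could look roughly like the following. This is only a sketch of the idea, not the code from that PR: the helper name `drop_undeclared_columns`, the `declared` set, and the column names are all hypothetical, and the real component spec is not modelled here.

```python
import pandas as pd


def drop_undeclared_columns(df: pd.DataFrame, declared: set) -> pd.DataFrame:
    """Keep only the columns named in `declared` (hypothetical helper).

    `declared` stands in for the column names a component spec defines;
    how those names are read from the spec is not shown here.
    """
    keep = [col for col in df.columns if col in declared]
    return df[keep]


# A user-returned dataframe with one extra column that the spec does not declare.
df = pd.DataFrame({"text_data": ["a", "b"], "extra": [1, 2]})
filtered = drop_undeclared_columns(df, {"text_data"})
```

With this approach, the extra column is dropped silently instead of raising an error, which matches the behaviour change the commit message describes.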
This PR adds the first two components of the DataComp pipeline. It reuses the `load_from_hf_hub` component, but with a different `fondant_component.yaml` file.

Co-authored-by: Niels Rogge <nielsrogge@Nielss-MacBook-Pro.local>