Implement Dataset from CSV #1946
src/datasets/io/csv.py
Outdated
self,
path,
name=None,
data_dir=None,
@lhoestq do we need data_dir for Dataset.from_csv()?
Awesome :)
Nice ! 🔥
I think the only thing left to do is to add tests for Dataset.from_csv and DatasetDict.from_csv in test_arrow_dataset.py and test_dataset_dict.py
Good job !
@lhoestq question about public API:
For consistency I'd say
Looks all good ! Thanks for adding this 🎉
Feel free to also add from_csv to the list of documented methods in the documentation in main_classes.rst.
@lhoestq done!
Implement Dataset.from_csv. Analogous to #1943.
If we eventually decide that the scripts should be used instead, at least we can reuse the tests here.