Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prefer split patterns from directories over split patterns from filenames #4985

Conversation

polinaeterna
Copy link
Contributor

@polinaeterna polinaeterna commented Sep 16, 2022

related to #4895

@polinaeterna polinaeterna marked this pull request as ready for review September 16, 2022 11:20
@polinaeterna polinaeterna changed the title [WIP] Prefer splits patterns from directories over split patterns from filenames [WIP] Prefer split patterns from directories over split patterns from filenames Sep 16, 2022
@polinaeterna polinaeterna marked this pull request as draft September 16, 2022 11:22
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Sep 16, 2022

The documentation is not available anymore as the PR was closed or merged.

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks ! adding a test just for this is maybe overkill but as you want

@@ -73,10 +73,10 @@ class Url(str):
Split.TRAIN: ["**"],
}

ALL_SPLIT_PATTERNS = [SPLIT_PATTERN_SHARDED]
ALL_SPLIT_PATTERNS = [SPLIT_PATTERN_SHARDED] # TODO: should this be changed?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine this way. This corresponds to the patterns that contain custom split names (other than train/dev/test)

@mariosasko
Copy link
Collaborator

Can we merge this one since the issue this PR fixes was reported for the second time? I also think we don't need a test for this simple change.

@polinaeterna polinaeterna marked this pull request as ready for review September 27, 2022 18:10
@polinaeterna
Copy link
Contributor Author

@mariosasko sure! could you please approve it?

@polinaeterna polinaeterna changed the title [WIP] Prefer split patterns from directories over split patterns from filenames Prefer split patterns from directories over split patterns from filenames Sep 28, 2022
Copy link
Collaborator

@mariosasko mariosasko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! Thanks for fixing this.

@polinaeterna polinaeterna merged commit c00fdac into huggingface:main Sep 29, 2022
@shaneacton
Copy link

shaneacton commented Oct 6, 2022

Hi there @polinaeterna @mariosasko! I have installed 5.2.3.dev0, which should have this fix. Unfortunately, I am still getting the error:
ValueError: Unknown split "validation". Should be one of ['train']. When I call load_dataset("csv", data_files=files, split=split)

Any help would be greatly appreciated!

@polinaeterna polinaeterna deleted the prefer-splits-in-dirs-over-filenames branch November 2, 2022 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants