-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make inspect.get_dataset_config_names always return a non-empty list #3159
Conversation
This reverts commit 90e6ca2.
This PR is already working (although not very beautiful; see below): the idea was to have the I have a plan for a follow-up refactoring (same functionality, better implementation, much nicer), but I think we could already merge this, so that @severo can test it in the datasets previewer and report any potential issues. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm not sure that storing the kwargs in a class attribute is a good idea.
Indeed the kwargs depend on how the builder is loaded, and therefore you could overwrite another builder's kwargs:
from datasets import *
b1 = load_dataset_builder("text", data_files="setup.py")
print(b1.kwargs["data_files"])
# {'train': [PosixPath('/Users/quentinlhoest/Desktop/hf/datasets/setup.py')]}
b2 = load_dataset_builder("text", data_files="setup.cfg")
print(b2.kwargs["data_files"])
# {'train': [PosixPath('/Users/quentinlhoest/Desktop/hf/datasets/setup.cfg')]}
print(b1.kwargs["data_files"])
# {'train': [PosixPath('/Users/quentinlhoest/Desktop/hf/datasets/setup.cfg')]}
The issue can also happen for community datasets without scripts, and so it can break the dataset viewer
Though you mentioned that you have a better implementation in mind so that's great :)
Here are my thoughts if it can help:
The default config name doesn't depend on the builder class alone - but also how it is imported. For example the name of the configuration of lhoestq/demo1
is lhoestq___demo1
, and the builder class is the Csv
builder.
Therefore I would let the responsibility of the default config name to the load.py file, and not add this complexity to the builder class logic.
In particular I think you can actually get the default config name with dataset_module.builder_kwargs.get("name", "default")
if builder_cls.builder_configs
is empty.
Yes @lhoestq you are completely right. Indeed I was exclusively using In your example, the {"name": "lhoestq___demo1",...} I'm going to refactor all the approach... as I only need the name for this specific case ;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently, there is an incident on the CircleCI side: VMs failing to be created.
It works well, thanks! |
Make all named configs cases, so that no special unnamed config case needs to be handled differently.
Fix #3135.