Verify ignore_attributes and row_id_attribute on create_dataset #964

Closed
PGijsbers opened this issue Oct 22, 2020 · 4 comments

Comments

@PGijsbers
Collaborator

From #960.

When create_dataset is called, the user can specify ignore_attribute and row_id_attribute. These name attributes in the dataset that should be ignored when building models, because they represent, for example, an id field or a feature with label leakage. However, there is currently no check that the listed attributes are actually present in the data.

Using the dataset upload tutorial, if we replace row_id_attribute=None or ignore_attribute=None in the create_dataset calls with row_id_attribute="not exist" or ignore_attribute="not exist", no error is raised, even though the dataset has no attribute named "not exist".

We expect either scenario to raise a ValueError clearly stating which id or ignored attribute is not present in the dataset.
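
A minimal sketch of the kind of check requested here, for illustration only. It is not the openml-python implementation: the helper name `check_attributes_exist` and its signature are hypothetical, and it assumes the dataset's column names are available as plain strings.

```python
from typing import Iterable, List, Optional, Union


def check_attributes_exist(
    column_names: Iterable[str],
    row_id_attribute: Optional[str] = None,
    ignore_attribute: Optional[Union[str, List[str]]] = None,
) -> None:
    """Raise ValueError if a named attribute is not a column of the dataset.

    Hypothetical helper for illustration; not part of openml-python.
    """
    available = set(column_names)

    # The row id must be a single existing column (or None).
    if row_id_attribute is not None and row_id_attribute not in available:
        raise ValueError(
            f"row_id_attribute {row_id_attribute!r} is not present in the dataset."
        )

    if ignore_attribute is None:
        return

    # Accept either a single name or a list of names.
    if isinstance(ignore_attribute, str):
        names = [ignore_attribute]
    else:
        names = list(ignore_attribute)

    missing = [name for name in names if name not in available]
    if missing:
        raise ValueError(
            f"ignore_attribute contains names not present in the dataset: {missing}"
        )
```

With columns `["id", "x", "y"]`, passing `row_id_attribute="not exist"` or `ignore_attribute="not exist"` raises a ValueError that names the missing attribute, while `row_id_attribute="id"` passes silently.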

@PGijsbers added the Good First Issue label (issues suitable for people new to contributing to openml-python!) on Oct 22, 2020
@ArlindKadra self-assigned this on Oct 26, 2020
@joaquinvanschoren
Contributor

FYI, when you pass ignore_attribute="attribute1,attribute2" as a string, the server stores the single value "attribute1,attribute2" rather than the two attributes "attribute1" and "attribute2". That's no good. Hence, if one uses a comma-separated string instead of a list, create_dataset should either raise an error or auto-parse the string and check that all attributes exist.
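
The two options could look roughly like the sketch below. This is an assumption-laden illustration, not openml-python code: the function name `normalize_ignore_attribute` and the `auto_parse` flag are hypothetical, and rejecting commas assumes that no legitimate attribute name contains one.

```python
from typing import List, Optional, Union


def normalize_ignore_attribute(
    ignore_attribute: Optional[Union[str, List[str]]],
    auto_parse: bool = False,
) -> List[str]:
    """Return the ignored attributes as a list of names.

    Hypothetical helper for illustration; not part of openml-python.
    """
    if ignore_attribute is None:
        return []

    if isinstance(ignore_attribute, str):
        if "," in ignore_attribute:
            if not auto_parse:
                # Option 1: refuse comma-separated strings outright.
                raise ValueError(
                    "ignore_attribute looks like a comma-separated string; "
                    "pass a list of attribute names instead."
                )
            # Option 2: split on commas and drop surrounding whitespace.
            return [part.strip() for part in ignore_attribute.split(",") if part.strip()]
        return [ignore_attribute]

    return list(ignore_attribute)
```

Either way, the resulting list would still need to be checked against the dataset's columns before upload.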

@PGijsbers
Collaborator Author

I think this will automatically be caught if this issue is resolved, as the dataset will not have an attribute named "attribute1,attribute2".

@a-moadel
Contributor

@PGijsbers I would like to start working on this task.

@PGijsbers
Collaborator Author

#978 added client-side checks
