Verify `ignore_attributes` and `row_id_attribute` on `create_dataset` #964

PGijsbers · 2020-10-22T08:13:48Z

From #960.

When `create_dataset` is called, the user can specify `ignore_attribute` and `row_id_attribute`. These name attributes in the dataset that should be ignored when building models, for example because they represent an id field or a feature with label leakage. However, there is currently no check that the listed attributes are actually present in the data.

Using the dataset upload tutorial, if we change `row_id_attribute=None` or `ignore_attribute=None` in the `create_dataset` calls to `row_id_attribute="not exist"` or `ignore_attribute="not exist"`, no error is raised, even though the dataset has no "not exist" attribute.

We expect either scenario to raise a ValueError clearly stating which id or ignored attribute is not present in the dataset.
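
A minimal sketch of the kind of check requested above, not the openml-python implementation. The helper name `check_attributes_exist` and its `column_names` argument are hypothetical; only the `row_id_attribute` and `ignore_attribute` parameter names come from this issue.

```python
from typing import Iterable, List, Optional, Union


def check_attributes_exist(
    column_names: Iterable[str],
    row_id_attribute: Optional[str] = None,
    ignore_attribute: Optional[Union[str, List[str]]] = None,
) -> None:
    """Hypothetical helper: raise ValueError if a named attribute is not a column."""
    columns = set(column_names)

    if row_id_attribute is not None and row_id_attribute not in columns:
        raise ValueError(
            f"row_id_attribute {row_id_attribute!r} is not present in the dataset."
        )

    if ignore_attribute is None:
        return

    # A single string is treated as one attribute name; it is not split on commas.
    if isinstance(ignore_attribute, str):
        names = [ignore_attribute]
    else:
        names = list(ignore_attribute)

    missing = [name for name in names if name not in columns]
    if missing:
        raise ValueError(
            f"ignore_attribute contains names not present in the dataset: {missing}"
        )
```

With this sketch, `check_attributes_exist(["id", "a"], row_id_attribute="not exist")` raises a `ValueError` that names the missing attribute, while valid names pass silently.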

joaquinvanschoren · 2020-10-26T15:44:09Z

FYI, when you use ignored_attributes="attribute1,attribute2" as a string, it will store "attribute1,attribute2" on the server, and not "attribute1","attribute2". That's no good. Hence, if one uses a comma-separated string instead of a list, it should either return an error or auto-parse and check if all attributes exist.
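
The auto-parse option could look roughly like the sketch below. The function name `parse_ignore_attribute` and the `column_names` argument are hypothetical, and splitting on commas assumes that no attribute name itself contains a comma.

```python
from typing import Iterable, List, Union


def parse_ignore_attribute(
    ignore_attribute: Union[str, Iterable[str]],
    column_names: Iterable[str],
) -> List[str]:
    """Hypothetical helper: split a comma-separated string and verify each name."""
    if isinstance(ignore_attribute, str):
        # Assumption: commas only ever separate names and never occur inside one.
        names = [part.strip() for part in ignore_attribute.split(",") if part.strip()]
    else:
        names = list(ignore_attribute)

    columns = set(column_names)
    missing = [name for name in names if name not in columns]
    if missing:
        raise ValueError(f"Attributes not present in the dataset: {missing}")
    return names
```

The stricter alternative is to skip the split and reject the string outright, which avoids guessing whether a comma is a separator or part of a name.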

PGijsbers · 2020-10-26T16:08:13Z

I think this will automatically be caught if this issue is resolved, as the dataset will not have an attribute named "attribute1,attribute2".

a-moadel · 2020-10-27T08:30:40Z

@PGijsbers I would like to start working on this task.

PGijsbers · 2020-10-29T13:59:44Z

#978 added client-side checks

PGijsbers added the Good First Issue label (issues suitable for people new to contributing to openml-python!) Oct 22, 2020

ArlindKadra self-assigned this Oct 26, 2020

PGijsbers assigned a-moadel and unassigned ArlindKadra Oct 27, 2020

a-moadel mentioned this issue Oct 28, 2020

add validation for ignore_attributes and default_target_attribute at … #978

Merged

PGijsbers closed this as completed Oct 29, 2020
