Verify ignore_attributes and row_id_attribute on create_dataset
#964
Labels: Good First Issue
From #960.
When create_dataset is called, the user can specify ignore_attribute and row_id_attribute. These are attributes in the dataset that should be ignored when building models, because they represent, e.g., an ID field or a feature with label leakage. However, there is currently no check that the listed attributes are actually present in the data.

Using the dataset upload tutorial, if we change row_id_attribute=None or ignore_attribute=None in the create_dataset calls to row_id_attribute="not exist" or ignore_attribute="not exist", no error is raised, even though the dataset has no "not exist" attribute.

We expect either scenario to raise a ValueError that clearly states which ID or ignored attribute is not present in the dataset.