-
Notifications
You must be signed in to change notification settings - Fork 905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update error message when kedro-datasets
is not installed or DataSet
spelling is used
#3952
Conversation
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
kedro/io/core.py
Outdated
hint = ( | ||
"Hint: If you are trying to use a dataset from `kedro-datasets`, " | ||
"make sure that the package is installed in your current environment. " | ||
"You can do so by running `pip install kedro-datasets`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"You can do so by running `pip install kedro-datasets`." | |
"You can do so by running `pip install kedro-datasets` or `pip install kedro-datasets[polars] for a specific back-end`." |
wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @datajoely 's suggestion, but I wouldn't use the wording "back-end" and just say for a specific dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this with a slightly different wording
"You can do so by running `pip install kedro-datasets`." | ||
) | ||
raise DatasetError( | ||
f"Class '{dataset_type}' not found, is this a typo?" f"\n{hint}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder:
- if we could aid debugging by spitting out the full importlib classpath at this point
- if we could check if users are missing an
__int__.py
a common issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we could check if users are missing an int.py a common issue
Is that Kedro's responsibility though?
I'm happy with this approach 👍 |
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a user is trying to use a Kedro (not custom) dataset and -
There is a typo in the dataset type name
Old spelling "DataSet" is used instead of "Dataset"
kedro-dataset is not installed
This is missing one case where load_obj
currently handle. Which is kedro-datasets
is installed but the specific dependencies is not.
i.e. you are using polars.EagerDataset
but only installed kedro-datasets
instead of kedro-datasets[polars]
.
The complexity of mainly comes from kedro.io.core where we try to be smart about the imports.
_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]
The result is that when there is a ModuleNotFoundError, we don't know whether it is:
- Typo (i.e. the module doesn't exist, wrong path etc)
- kedro-datasets is not installed
- kedro-datasets is installed, but the necessary dependencies are missing (handled by
load_obj
currently)
In the past we try to improve (3) but (1)/(2) remains confusing. I wonder if it is simpler to list out all 3 cases explicitly and avoid trying to be too smart?
Explicit is better than implicit |
@noklam It seems like if an optional requirement is missing, this error is not triggered. It is when trying to run the project, the error that shows up is when the dataset is being loaded(or saved) -
|
This is great - in the past this was an issue because sometimes it was stupid to install all of spark just to just run the Pandas parts. I wonder if it's finally time for us to may something like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a neat improvement over what we have now, and the phrasing is helpful enough while not too prescriptive to help folks figure out what else to try out (provided that they read it!)
LGTM!
@ankatiyar yes, the error is triggered in different place. Since this issue only focus on the typo / kedro-datasets missing case, I have approved this now. |
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
…et` spelling is used (kedro-org#3952) * Update error message when kedro-datasets is not installed Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Release notes, update error hint Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Skip test for python 3.8 Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Ignore test coverage Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: bpmeek <bpmeek.developer@gmail.com>
…et` spelling is used (kedro-org#3952) * Update error message when kedro-datasets is not installed Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Release notes, update error hint Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Skip test for python 3.8 Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Ignore test coverage Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: bpmeek <bpmeek.developer@gmail.com>
…et` spelling is used (kedro-org#3952) * Update error message when kedro-datasets is not installed Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Release notes, update error hint Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Skip test for python 3.8 Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> * Ignore test coverage Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> --------- Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com> Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com> Signed-off-by: bpmeek <bpmeek.developer@gmail.com>
Description
Resolve #3911, resolve #3909
Development notes
The error message below shows up in a number of cases:
When a user is trying to use a Kedro (not custom) dataset and -
kedro-dataset
is not installedSince not having
kedro-datasets
is not the only way this error is triggered, I thought I'd update the error message with a "Hint" like in #3944 which suggests it could be a potential solution without assuming it is the ONLY solution.The hint is different (related to #3909) when "DataSet" spelling is used.
I didn't change the
"is this a typo?"
part because that very well might be a case, specially if it's a custom dataset which still should work with the "DataSet" spelling unless the user has changed it.I'd like to see what the team thinks about the messaging of the error! There's still some ongoing discussion on #3909 so I can revert this part for now.
Developer Certificate of Origin
We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a
Signed-off-by
line in the commit message. See our wiki for guidance.If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.
Checklist
RELEASE.md
file