Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update error message when kedro-datasets is not installed or DataSet spelling is used #3952

Merged
merged 5 commits into from
Jun 19, 2024

Conversation

ankatiyar
Copy link
Contributor

@ankatiyar ankatiyar commented Jun 13, 2024

Description

Resolve #3911, resolve #3909

Development notes

The error message below shows up in a number of cases:

Class '{dataset_name}' not found, is this a typo?

When a user is trying to use a Kedro (not custom) dataset and -

  • There is a typo in the dataset type name
  • Old spelling "DataSet" is used instead of "Dataset"
  • kedro-dataset is not installed

Since not having kedro-datasets is not the only way this error is triggered, I thought I'd update the error message with a "Hint" like in #3944 which suggests it could be a potential solution without assuming it is the ONLY solution.
The hint is different (related to #3909) when "DataSet" spelling is used.
I didn't change the "is this a typo?" part because that very well might be a case, specially if it's a custom dataset which still should work with the "DataSet" spelling unless the user has changed it.

I'd like to see what the team thinks about the messaging of the error! There's still some ongoing discussion on #3909 so I can revert this part for now.

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@ankatiyar ankatiyar marked this pull request as ready for review June 13, 2024 15:24
@ankatiyar ankatiyar requested a review from merelcht as a code owner June 13, 2024 15:24
kedro/io/core.py Outdated
hint = (
"Hint: If you are trying to use a dataset from `kedro-datasets`, "
"make sure that the package is installed in your current environment. "
"You can do so by running `pip install kedro-datasets`."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"You can do so by running `pip install kedro-datasets`."
"You can do so by running `pip install kedro-datasets` or `pip install kedro-datasets[polars] for a specific back-end`."

wdyt?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like @datajoely 's suggestion, but I wouldn't use the wording "back-end" and just say for a specific dataset.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this with a slightly different wording

"You can do so by running `pip install kedro-datasets`."
)
raise DatasetError(
f"Class '{dataset_type}' not found, is this a typo?" f"\n{hint}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder:

  • if we could aid debugging by spitting out the full importlib classpath at this point
  • if we could check if users are missing an __int__.py a common issue

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we could check if users are missing an int.py a common issue

Is that Kedro's responsibility though?

@merelcht
Copy link
Member

I'm happy with this approach 👍

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When a user is trying to use a Kedro (not custom) dataset and -
There is a typo in the dataset type name
Old spelling "DataSet" is used instead of "Dataset"
kedro-dataset is not installed

This is missing one case where load_obj currently handle. Which is kedro-datasets is installed but the specific dependencies is not.

i.e. you are using polars.EagerDataset but only installed kedro-datasets instead of kedro-datasets[polars].

The complexity of mainly comes from kedro.io.core where we try to be smart about the imports.

_DEFAULT_PACKAGES = ["kedro.io.", "kedro_datasets.", ""]

The result is that when there is a ModuleNotFoundError, we don't know whether it is:

  1. Typo (i.e. the module doesn't exist, wrong path etc)
  2. kedro-datasets is not installed
  3. kedro-datasets is installed, but the necessary dependencies are missing (handled by load_obj currently)

In the past we try to improve (3) but (1)/(2) remains confusing. I wonder if it is simpler to list out all 3 cases explicitly and avoid trying to be too smart?

@datajoely
Copy link
Contributor

In the past we try to improve (3) but (1)/(2) remains confusing. I wonder if it is simpler to list out all 3 cases explicitly and avoid trying to be too smart?

Explicit is better than implicit

@ankatiyar
Copy link
Contributor Author

@noklam It seems like if an optional requirement is missing, this error is not triggered. It is when trying to run the project, the error that shows up is when the dataset is being loaded(or saved) -

kedro.io.core.DatasetError: Failed while loading data from data set ExcelDataset(filepath=/Users/ankita_katiyar/kedro_projects/demo/data/01_raw/shuttles.xlsx, load_args={'engine': openpyxl}, protocol=file, save_args={'index': False}, writer_args={'engine': openpyxl}).
Missing optional dependency 'openpyxl'.  Use pip or conda to install openpyxl.

@datajoely
Copy link
Contributor

This is great - in the past this was an issue because sometimes it was stupid to install all of spark just to just run the Pandas parts.

I wonder if it's finally time for us to may something like kedro run --dry a 1st class command for validating if a session is 'runnable'. It would be useful for our own testing and for users to build an integration test.

Copy link
Member

@astrojuanlu astrojuanlu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a neat improvement over what we have now, and the phrasing is helpful enough while not too prescriptive to help folks figure out what else to try out (provided that they read it!)

LGTM!

@noklam
Copy link
Contributor

noklam commented Jun 19, 2024

@ankatiyar yes, the error is triggered in different place. Since this issue only focus on the typo / kedro-datasets missing case, I have approved this now.

Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
@ankatiyar ankatiyar enabled auto-merge (squash) June 19, 2024 11:11
@ankatiyar ankatiyar merged commit 1c95627 into main Jun 19, 2024
41 checks passed
@ankatiyar ankatiyar deleted the error-message-datasets-not-installed branch June 19, 2024 11:29
bpmeek pushed a commit to bpmeek/kedro that referenced this pull request Jun 20, 2024
…et` spelling is used (kedro-org#3952)

* Update error message when kedro-datasets is not installed

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Release notes, update error hint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Skip test for python 3.8

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Ignore test coverage

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: bpmeek <bpmeek.developer@gmail.com>
bpmeek pushed a commit to bpmeek/kedro that referenced this pull request Jun 20, 2024
…et` spelling is used (kedro-org#3952)

* Update error message when kedro-datasets is not installed

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Release notes, update error hint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Skip test for python 3.8

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Ignore test coverage

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: bpmeek <bpmeek.developer@gmail.com>
bpmeek pushed a commit to bpmeek/kedro that referenced this pull request Jul 18, 2024
…et` spelling is used (kedro-org#3952)

* Update error message when kedro-datasets is not installed

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Release notes, update error hint

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Skip test for python 3.8

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Ignore test coverage

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: bpmeek <bpmeek.developer@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants