Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document "magic" behavior of filepath and other attributes #2942

Open
astrojuanlu opened this issue Aug 17, 2023 · 2 comments
Open

Document "magic" behavior of filepath and other attributes #2942

astrojuanlu opened this issue Aug 17, 2023 · 2 comments
Labels
Component: Documentation 📄 Issue/PR for markdown and API documentation Hacktoberfest Issue suitable for hacktoberfest

Comments

@astrojuanlu
Copy link
Member

Description

https://linen-slack.kedro.org/t/12920230/hi-all-the-other-day-i-was-making-a-custom-dataset-for-the-h#4537cfc5-4656-4b9f-9330-e199d48ac8fa

Hi All,
The other day i was making a custom dataset for the Huggingface AudioFolder dataset, which takes a folder as an argument. As such, i gave it the parameter data_dir as input, instead of filepath, it took me roughly an hour of debugging to figure out why i loading the dataset was now dependant on the current working directory, and just wouldn't load if i gave it a relative path (data/01_raw/..) instead of workspace/project_name/data/01_raw/….
Anyway, the issue was that filepath has a (buried) custom resolver in AbstractDataSet baseclass.

Indeed, Kedro has a hardcoded list of special properties of datasets that get special treatment:

# only check a few conf keys that are known to specify a path string as value
conf_keys_with_filepath = ("filename", "filepath", "path")

As noted by @noklam in #2819 (comment), this does not happen when one uses the DataCatalog as a standalone library component.

Context

Possible Implementation

Possible Alternatives

An alternative is to remove this behavior or make it configurable. This will probably come up as one of the work items of #2819. In the meantime though, we should document it somewhere.

@astrojuanlu astrojuanlu added Component: Documentation 📄 Issue/PR for markdown and API documentation Issue: Feature Request New feature or improvement to existing feature labels Aug 17, 2023
@noklam
Copy link
Contributor

noklam commented Aug 17, 2023

I think this need to be included in the documentation. This is part of the artifact of CustomDataSet is bind with an implicit contract (by convention, rather than fixed interface).

Note for this example, it will be hard to figure out all possible "file" attribute if we allowed any kind of keywords. It is not a bad thing.

Maybe the alternative is having a generic way of handling path, or may need to move these logic around to have consistent behavior across framework and library?

@stichbury
Copy link
Contributor

Thanks for this. Let me know when/where this should go but I will discount including it in the current round of updates to the data catalog docs in #2888 🙏

@stichbury stichbury removed the Issue: Feature Request New feature or improvement to existing feature label Aug 18, 2023
@stichbury stichbury moved this to To Do in Kedro Framework Aug 18, 2023
@stichbury stichbury removed the status in Kedro Framework Oct 2, 2023
@ankatiyar ankatiyar added the Hacktoberfest Issue suitable for hacktoberfest label Sep 30, 2024
@ankatiyar ankatiyar moved this to To Do in Kedro Framework Sep 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Documentation 📄 Issue/PR for markdown and API documentation Hacktoberfest Issue suitable for hacktoberfest
Projects
Status: To Do
Status: Todo
Development

No branches or pull requests

4 participants