Using paths relative to project_root in dataset implementation #3149

Open
jasonmhite opened this issue Oct 9, 2023 · 1 comment
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@jasonmhite
Contributor

jasonmhite commented Oct 9, 2023

Description

I have a custom dataset implementation that is supposed to fetch some data into a cache folder specified in the dataset configuration. If the data isn't in the folder, it fetches it and then loads it from disk. Simple enough.
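For readers unfamiliar with the pattern, here is a minimal sketch of the fetch-if-missing logic described above. It is not the actual dataset code: `load_with_cache` and its `fetch` callback are hypothetical names used only for illustration, and a real dataset would wrap something like this in its load method.

```python
from pathlib import Path
from typing import Callable


def load_with_cache(cache_dir: Path, name: str, fetch: Callable[[], bytes]) -> bytes:
    """Return the cached file's contents, fetching the data first if it is missing.

    `fetch` is a hypothetical callback standing in for the real download step.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Cache miss: produce the data and store it in the cache folder.
        target.write_bytes(fetch())
    # Always read from disk, so cache hits and misses return the same thing.
    return target.read_bytes()
```

The important detail for this issue is that `cache_dir` is taken at face value: if it is relative, whatever directory the process happens to be in decides where the cache lives.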

I want to be able to specify a path relative to the project root, e.g. data/01_raw, but I can't figure out how to access the project_root directory at runtime. Things work fine if I run from the CLI, since it starts in the project root. But if I work with the catalog from, say, kedro jupyter lab and create a notebook at notebooks/my_notebook.ipynb, the working directory in that notebook is notebooks. If I then load my dataset from the catalog, the cache folder resolves to notebooks/data/01_raw and all of my datasets are downloaded again.
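The behaviour can be reproduced without Kedro at all. The sketch below builds a throwaway directory layout (a temporary folder standing in for the project, with a notebooks subfolder) and resolves the same relative path from two working directories. `resolved_cache_dir` is a hypothetical stand-in for whatever the dataset does with the configured path.

```python
import os
import tempfile
from pathlib import Path


def resolved_cache_dir(relative: str) -> Path:
    """Resolve a relative path the way a naive dataset would: against the cwd."""
    return Path(relative).resolve()


original_cwd = Path.cwd()
with tempfile.TemporaryDirectory() as tmp:
    # Temporary folder standing in for the project root.
    project_root = Path(tmp).resolve()
    (project_root / "notebooks").mkdir()
    try:
        os.chdir(project_root)  # like running from the CLI
        from_cli = resolved_cache_dir("data/01_raw")
        os.chdir(project_root / "notebooks")  # like a notebook kernel
        from_notebook = resolved_cache_dir("data/01_raw")
    finally:
        # Restore the cwd before the temporary folder is deleted.
        os.chdir(original_cwd)

print(from_cli.relative_to(project_root).as_posix())       # data/01_raw
print(from_notebook.relative_to(project_root).as_posix())  # notebooks/data/01_raw
```

The same string in the catalog points at two different folders, which is why the notebook session sees an empty cache.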

As far as I can tell, however, there is no good way to get the project_root in a dataset implementation, because you need to already know the project_root folder to instantiate the config/context.

Context

This could be worked around by using an absolute path, but I want to be able to redistribute my project to share with other users. Being able to specify a path that is always interpreted as relative to project_root would be helpful.

Possible Implementation

Expose project_root in a way that can be accessed from a dataset implementation. Maybe there is already a way, but I can't seem to work it out.
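In the meantime, one stopgap that needs no Kedro API is to locate the root from inside the dataset by walking up the directory tree until a marker file appears. The sketch below assumes the project root is the nearest ancestor containing a pyproject.toml. Both `find_project_root` and `resolve_against_root` are hypothetical helpers, not part of Kedro, and the marker heuristic would pick the wrong folder if a subdirectory carried its own pyproject.toml.

```python
from pathlib import Path
from typing import Optional


def find_project_root(start: Optional[Path] = None, marker: str = "pyproject.toml") -> Path:
    """Walk upwards from `start` (default: cwd) to the first directory holding `marker`."""
    start = (start or Path.cwd()).resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found in {start} or any parent directory")


def resolve_against_root(path: str, root: Path) -> Path:
    """Leave absolute paths alone; anchor relative ones at the project root."""
    p = Path(path)
    return p if p.is_absolute() else root / p
```

A dataset could call `resolve_against_root(cache_dir, find_project_root())` in its constructor, so that data/01_raw means the same folder whether the process starts in the project root or in notebooks. This is a guess at the root rather than the value Kedro itself uses, which is why having it exposed properly would be preferable.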

Possible Alternatives

Workaround is to specify an absolute path but then users need to remember to fix the path when they clone my project.

I suppose in my notebooks I could do something like a %cd context.project_root, since I would have access to it there, but that doesn't seem like a great solution.
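For completeness, the change-directory workaround can be scoped so it does not leak into the rest of the notebook. This is only a sketch: `working_directory` is a hypothetical helper, and the path passed to it would be whatever project root value the notebook session has available.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def working_directory(path):
    """Temporarily switch the working directory, restoring it afterwards."""
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore the original cwd even if the body raises.
        os.chdir(previous)
```

Catalog loads would then go inside a `with working_directory(...)` block. On Python 3.11 and later, `contextlib.chdir` offers the same behaviour out of the box. Either way, every notebook has to remember to do this, so it remains a workaround rather than a fix.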

Thanks for your help as always.

@jasonmhite jasonmhite added the Issue: Feature Request New feature or improvement to existing feature label Oct 9, 2023
@astrojuanlu
Member

Hi @jasonmhite, thanks for flagging this. We're having a related discussion in #2965, but the solution is still not clear. Could you have a look and tell us how it relates to your feature request?
