[DataCatalog2.0]: Extend KedroDataCatalog with dict interface #4175

Open
ElenaKhaustova opened this issue Sep 18, 2024 · 4 comments
Assignees
Labels
Issue: Feature Request New feature or improvement to existing feature

Comments

@ElenaKhaustova
Contributor

ElenaKhaustova commented Sep 18, 2024

Description

As a continuation of the work on the DataCatalog redesign (#3995 (comment)), we suggest simplifying the KedroDataCatalog interface and making it similar to UserDict.

The reasoning: the catalog would then function as a collection of datasets with an interface that is already familiar to users.

Related tickets that will be addressed:

Context

Possible Implementation

Change the KedroDataCatalog API to support __iter__, __getitem__, __setitem__, keys(), and items(), similar to dict.

Replace the old API, such as save(), load(), and list().

The first iteration can be done without breaking changes if we keep the old interface and have it reuse the newly added methods.
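
A minimal sketch of what this first iteration could look like, for illustration only. `DictLikeCatalog` and `MemoryDataset` are hypothetical names and not the real Kedro classes, and the sketch assumes that item access loads data and item assignment saves it, which is one possible reading of the proposal. The old methods are kept and simply delegate to the new dict-style ones.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


class MemoryDataset:
    """Hypothetical stand-in for a dataset: keeps its data in memory."""

    def __init__(self, data: Any = None) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data

    def save(self, data: Any) -> None:
        self._data = data


class DictLikeCatalog:
    """Illustrative catalog with a dict-like interface (not KedroDataCatalog)."""

    def __init__(self, datasets: Optional[Dict[str, MemoryDataset]] = None) -> None:
        self._datasets: Dict[str, MemoryDataset] = dict(datasets or {})

    # --- new dict-like interface -------------------------------------
    def __iter__(self) -> Iterator[str]:
        return iter(self._datasets)

    def __len__(self) -> int:
        return len(self._datasets)

    def __contains__(self, name: object) -> bool:
        return name in self._datasets

    def __getitem__(self, name: str) -> Any:
        # Assumption: item access loads the data of the named dataset.
        return self._datasets[name].load()

    def __setitem__(self, name: str, data: Any) -> None:
        # Assumption: unknown names get a new in-memory dataset.
        if name not in self._datasets:
            self._datasets[name] = MemoryDataset()
        self._datasets[name].save(data)

    def keys(self) -> List[str]:
        return list(self)

    def items(self) -> List[Tuple[str, Any]]:
        return [(name, self[name]) for name in self]

    # --- old method-based interface, reusing the new methods ---------
    def list(self) -> List[str]:
        return self.keys()

    def load(self, name: str) -> Any:
        return self[name]

    def save(self, name: str, data: Any) -> None:
        self[name] = data
```

Because `load()`, `save()`, and `list()` only forward to the dict-style methods, existing calls keep working while new code can use `catalog["name"]` and `list(catalog)`.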

@ElenaKhaustova ElenaKhaustova added the Issue: Feature Request New feature or improvement to existing feature label Sep 18, 2024
@datajoely
Contributor

datajoely commented Sep 18, 2024

I'm actually less excited about this than a lot of the other proposed ideas for this redesign.

What about putting it inside a property or something like a .dict_like() method?
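
One way to picture this alternative, as a rough sketch only: the catalog keeps its method-based API and exposes the dict-style access through a separate view object. `MethodCatalog`, `CatalogView`, and the exact shape of `dict_like()` are hypothetical names made up for this example; nothing here is an existing Kedro API.

```python
from collections.abc import MutableMapping
from typing import Any, Dict, Iterator, List


class MethodCatalog:
    """Hypothetical catalog that only exposes the method-based API."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def list(self) -> List[str]:
        return list(self._data)

    def load(self, name: str) -> Any:
        return self._data[name]

    def save(self, name: str, data: Any) -> None:
        self._data[name] = data

    def dict_like(self) -> "CatalogView":
        # Opt-in dict-style access, kept off the main class.
        return CatalogView(self)


class CatalogView(MutableMapping):
    """Dict-style view that forwards every operation to the catalog."""

    def __init__(self, catalog: MethodCatalog) -> None:
        self._catalog = catalog

    def __getitem__(self, name: str) -> Any:
        return self._catalog.load(name)

    def __setitem__(self, name: str, data: Any) -> None:
        self._catalog.save(name, data)

    def __delitem__(self, name: str) -> None:
        # Assumption: removing datasets is out of scope for this sketch.
        raise NotImplementedError("deleting datasets is not supported here")

    def __iter__(self) -> Iterator[str]:
        return iter(self._catalog.list())

    def __len__(self) -> int:
        return len(self._catalog.list())
```

With this layout, `catalog.dict_like()["my_dataset"] = data` and `catalog.save("my_dataset", data)` would be equivalent, and the main class stays free of dunder methods.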

@ElenaKhaustova
Contributor Author

> I'm actually less excited about this than a lot of the other proposed ideas for this redesign.
>
> What about putting it inside a property or something like a .dict_like() method?

This is an attempt to align on the interface changes before introducing new features, so it doesn't cancel the rest of the redesign ideas. Since this is only a proposal for a possible interface, it would be helpful at this stage if you could expand on why it doesn't feel right to you.

@noklam
Contributor

noklam commented Sep 23, 2024

Related idea discussed in the past: #3914 (comment)

@ElenaKhaustova ElenaKhaustova self-assigned this Sep 23, 2024
@astrojuanlu
Member

astrojuanlu commented Sep 23, 2024

We discussed this in backlog grooming. To clarify, the user impact of this one is that rather than doing

>>> catalog.list()
>>> catalog.load("my_dataset")
>>> catalog.save("my_dataset", data)
>>> for dataset in catalog.datasets: pass

the user would do

>>> list(catalog)  # __iter__
>>> catalog["my_dataset"]  # __getitem__
>>> catalog["my_dataset"] = data  # __setitem__
>>> for dataset in catalog: pass  # __iter__

While I'm not opposed to having the dict-like interface, I think it's worth discussing whether we should keep the method-based interface. We're not forced to keep it, because it's a new class and we allow ourselves to make breaking changes between minor releases. But if we do include it in the new class, I expect migration between 0.19.* and 0.20.* will be easier.

Projects
Status: In Progress
Development

No branches or pull requests

4 participants