This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Improve `_FrozenDatasets` class #3610
Description
The `_FrozenDatasets` class is the type of object returned by `catalog.datasets`, which is supposed to be the official (and public) way for users to access datasets from the catalog through the API. However, several team members have flagged that this class isn't very easy to work with:

- Fetching a dataset by name requires `vars(catalog.datasets)[dataset_name]` or `catalog.datasets.__dict__[dataset_name]`.
- A lot happens in `__init__`; delegating most of this to a few new, well-documented methods would also make this class much easier to understand.

On top of that, we have evidence that users frequently resort to private methods and attributes to access datasets, e.g. `catalog._datasets` and `_get_dataset()`. So it also seems like the class doesn't sufficiently allow users to access datasets through the API. See e.g. https://github.com/Galileo-Galilei/kedro-mlflow/blob/e88679938b1d4c7633c3f631f6b402ff11ab61fe/kedro_mlflow/framework/hooks/mlflow_hook.py#L148

Observations about `_FrozenDatasets`
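To make the access patterns from the description concrete, here is a minimal sketch of the behaviour. `FrozenDatasetsSketch` is a hypothetical stand-in, not the Kedro source, and it assumes the name conversion replaces each run of non-word characters with `__`; the dataset names and string values are made up for illustration.

```python
import re


class FrozenDatasetsSketch:
    """Hypothetical stand-in for `_FrozenDatasets`; not the Kedro source."""

    def __init__(self, datasets: dict):
        # Assumption: runs of non-word characters become `__` so that every
        # dataset name is usable as an attribute (and shows up in tab completion).
        for name, dataset in datasets.items():
            self.__dict__[re.sub(r"\W+", "__", name)] = dataset

    def __setattr__(self, key, value):
        # Frozen: reject attribute assignment after construction.
        raise AttributeError("datasets are read-only")


datasets = FrozenDatasetsSketch({"cars": "cars_ds", "ns.boats@pandas": "boats_ds"})

# Attribute access works, which is what enables tab completion.
by_attribute = datasets.cars

# Looking up a name held in a variable needs vars() or __dict__.
by_vars = vars(datasets)["cars"]

# Only the converted names are stored; the original names are not kept.
converted_names = sorted(vars(datasets))
```

In this sketch, `datasets["cars"]` and `for name in datasets` both raise `TypeError`, because the class defines neither `__getitem__` nor `__iter__`.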
`__`: https://github.com/kedro-org/kedro/blob/main/kedro/io/data_catalog.py#L86-L95
`_FrozenDatasets` seems to be access, because of the name conversions.

Context
It's not straightforward to fetch datasets from the catalog directly because the catalog was designed to hide dataset details and implementation. It's meant for loading and saving data, not for modifying it in any way. The `_FrozenDatasets` class was added to make tab completion possible for catalog datasets in IPython or Jupyter sessions. The PR that added this functionality is on private-kedro: https://github.com/quantumblacklabs/private-kedro/pull/84/files. Note that `_FrozenDatasets` needs to be immutable; users who want to inject data should use hooks.

Improvement suggestions
- Make `_FrozenDatasets` inherit from `UserDict`
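As a rough illustration of the `UserDict` idea, the sketch below shows one way a read-only, dict-like container could look. Everything in it is an assumption for discussion rather than a proposed Kedro implementation: the class name `FrozenDatasets`, the choice to raise `TypeError` on mutation, and the decision to keep attribute access for names that are valid identifiers.

```python
from collections import UserDict


class FrozenDatasets(UserDict):
    """Hypothetical read-only mapping of dataset name to dataset."""

    def __init__(self, datasets):
        super().__init__()
        # Fill the backing dict directly, because __setitem__ is blocked below.
        self.data.update(datasets)

    def _read_only(self, *args, **kwargs):
        raise TypeError("datasets are read-only; use hooks to inject data")

    # update(), pop(), clear() and setdefault() are built on these two methods,
    # so blocking them covers the inherited mutating methods as well.
    __setitem__ = _read_only
    __delitem__ = _read_only
    # UserDict's |= writes to self.data directly, so block it separately.
    __ior__ = _read_only

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. The guard on "data"
        # avoids infinite recursion if the backing dict is not set yet.
        if name == "data":
            raise AttributeError(name)
        try:
            return self.data[name]
        except KeyError:
            raise AttributeError(name) from None

    def __dir__(self):
        # Advertise dataset names so tab completion keeps working.
        return [*super().__dir__(), *self.data]


datasets = FrozenDatasets({"cars": "cars_ds", "ns.boats@pandas": "boats_ds"})

names = list(datasets)               # iteration yields the original names
boats = datasets["ns.boats@pandas"]  # dict-style lookup, no name conversion
cars = datasets.cars                 # attribute access for identifier-like names
```

The mapping interface is read-only here, but `datasets.data` is still a plain mutable dict, so this gives immutability by convention only. Whether that is strict enough is one of the questions the user research would need to answer.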
Important
The above suggestions are based on ideas from several Kedro engineers (see e.g. #1778). They are mostly aimed at improving the developer experience, though, and we need a clear view of user needs as well. Any implementation should be preceded by user research: #1978