[DataCatalog2.0]: KedroDataCatalog (#4151)
* Added a skeleton for AbstractDataCatalog and KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed from_config method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented _init_datasets method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented get dataset

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Started resolve_patterns implementation

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented resolve_patterns

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed credentials resolving

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated match pattern

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented add from dict method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated io __init__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added list method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented _validate_missing_keys

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added datasets access logic

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __contains__ and comments on lazy loading

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed dataset_name to ds_name

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated some docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed _update_ds_configs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed _init_datasets

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented add_runtime_patterns

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed runtime patterns usage

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Moved pattern logic out of data catalog, implemented KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* KedroDataCatalog updates

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added property to return config

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added list patterns method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed and moved ConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed ConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Cleaned KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Cleaned up DataCatalogConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Docs build fix attempt

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* KedroDataCatalog draft

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated from_config method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated constructor and add methods

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated _get_dataset method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated __contains__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated __eq__ and shallow_copy

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __iter__ and __getitem__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed unused imports

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added TODO

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated runner.run()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated session

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added config_resolver property

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog list command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog create command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog rank command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog resolve command

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Remove some methods

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed ds configs from catalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed typo

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added module docstring

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renaming methods

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed None from Pattern type

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs failing to find class reference

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs failing to find class reference

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated Patterns type

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fix tests (#4149)

* Fix most tests

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Fix most tests

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

* Returned constants to avoid breaking changes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated KedroDataCatalog for recent changes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Minor fix

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated test_sorting_order_with_other_dataset_through_extra_pattern

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed odd properties

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed None from _fetch_credentials input

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated specs and context

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated runners

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated default catalog validation

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated default catalog validation

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated contains and added exists methods for KedroDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixing docs and lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed unit tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __eq__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed DataCatalogConfigResolver to CatalogConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed _init_configs to _resolve_config_credentials

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Moved functions to the class

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Refactored resolve_dataset_pattern

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed refactored part

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Changed the order of arguments for DataCatalog constructor

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Replaced __getitem__ with .get()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated catalog commands

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Moved warm up block outside of the try block

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed linter

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed odd copying

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed DataCatalogConfigResolver to CatalogConfigResolver

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed AbstractDataCatalog to BaseDataCatalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Moved validate_dataset_config inside catalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed _init_dataset to _add_from_config

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fix lint

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Returned DatasetError

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added _dataset_patterns and _default_pattern to _config_resolver to avoid breaking change

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Made resolve_dataset_pattern return just dict

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed linter

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added CatalogProtocol draft

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Implemented CatalogProtocol

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated types

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed linter

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added _ImplementsCatalogProtocolValidator

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Excluded Protocol from coverage

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed catalog source to kedro_data_catalog

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed data set to dataset in docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated add_from_dict

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Revised comments and TODOs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated error message to point to specific catalog type

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Merged with protocol

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed reference to DataCatalog in docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docs

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Reordered methods

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed add_all from protocol

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Changed the order of arguments

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __repr__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Made __getitem__ return deepcopy

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed bug in get_dataset()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed __eq__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixed docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added __setitem__

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Unit tests for `KedroDataCatalog` (#4171)

* Added KedroDataCatalog tests template

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test save/load unregistered dataset

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test_feed_dict

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added exists tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added tests for list()

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test_eq

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added test init/add datasets

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated test_adding_datasets_not_allowed

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added shallow copy tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added TestKedroDataCatalogFromConfig

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added missing tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated RELEASE.md

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed deep copies

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed some interface that will be changed in the next version

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed key completions

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Fixing typos

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Removed key completions test

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Replaced data set with dataset

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Added docstring for get_dataset() method

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed pytest fixture

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Addressed review comments

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated _assert_requirements_ok starters test

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Revert "Updated _assert_requirements_ok starters test"

This reverts commit 5208321.

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated error message

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Replaced typo

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Replaced data set with dataset in docstrings

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated tests

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Made KedroDataCatalog subclass from CatalogProtocol

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Updated release notes

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

* Renamed resolve_dataset_pattern to resolve_pattern

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>

---------

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
ElenaKhaustova and ankatiyar committed Sep 24, 2024
1 parent 5147dfb commit 53280bd
Showing 10 changed files with 1,149 additions and 148 deletions.
5 changes: 5 additions & 0 deletions RELEASE.md
@@ -1,6 +1,11 @@
 # Upcoming Release

 ## Major features and improvements
+* Implemented `KedroDataCatalog` repeating `DataCatalog` functionality with a few API enhancements:
+  * Removed `_FrozenDatasets` and access datasets as properties;
+  * Added get dataset by name feature;
+  * `add_feed_dict()` was simplified and renamed to `add_data()`;
+  * Datasets' initialisation was moved out from `from_config()` method to the constructor.
 * Moved development requirements from `requirements.txt` to the dedicated section in `pyproject.toml` for project template.
 * Implemented `Protocol` abstraction for the current `DataCatalog` and adding new catalog implementations.
 * Refactored `kedro run` and `kedro catalog` commands.
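The release notes above mention a `Protocol` abstraction that lets alternative catalog implementations stand in for `DataCatalog`. The general technique can be sketched with the standard library alone. This is a toy illustration, not Kedro's code: `ToyCatalogProtocol`, `InMemoryCatalog` and `validate_catalog_class` are hypothetical names, and the real `CatalogProtocol` may require a different set of methods.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ToyCatalogProtocol(Protocol):
    """Hypothetical minimal catalog interface (not Kedro's CatalogProtocol)."""

    def load(self, name: str) -> Any: ...

    def save(self, name: str, data: Any) -> None: ...

    def __contains__(self, name: str) -> bool: ...


class InMemoryCatalog:
    """Dict-backed catalog. It does not inherit from the protocol."""

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def load(self, name: str) -> Any:
        return self._data[name]

    def save(self, name: str, data: Any) -> None:
        self._data[name] = data

    def __contains__(self, name: str) -> bool:
        return name in self._data


def validate_catalog_class(cls: type) -> type:
    """Reject classes that lack any method the protocol declares."""
    # issubclass() works here because the protocol declares methods only.
    if not issubclass(cls, ToyCatalogProtocol):
        raise TypeError(f"{cls.__name__} does not implement ToyCatalogProtocol")
    return cls


catalog = validate_catalog_class(InMemoryCatalog)()
catalog.save("cars", [1, 2, 3])
print("cars" in catalog, catalog.load("cars"))
```

The check is structural: any class with matching method names passes, whether or not it subclasses anything. Note that a runtime protocol check only confirms the methods exist; it does not verify their signatures.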
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -134,6 +134,7 @@
         "kedro.io.core.DatasetError",
         "kedro.io.core.Version",
         "kedro.io.data_catalog.DataCatalog",
+        "kedro.io.kedro_data_catalog.KedroDataCatalog",
         "kedro.io.memory_dataset.MemoryDataset",
         "kedro.io.partitioned_dataset.PartitionedDataset",
         "kedro.pipeline.pipeline.Pipeline",
@@ -172,6 +173,7 @@
         "Patterns",
         "CatalogConfigResolver",
         "CatalogProtocol",
+        "KedroDataCatalog",
     ),
     "py:data": (
         "typing.Any",
6 changes: 2 additions & 4 deletions kedro/framework/cli/catalog.py
@@ -95,9 +95,7 @@ def list_datasets(metadata: ProjectMetadata, pipeline: str, env: str) -> None:

         for ds_name in default_ds:
             if data_catalog.config_resolver.match_pattern(ds_name):
-                ds_config = data_catalog.config_resolver.resolve_dataset_pattern(
-                    ds_name
-                )
+                ds_config = data_catalog.config_resolver.resolve_pattern(ds_name)
                 factory_ds_by_type[ds_config.get("type", "DefaultDataset")].append(
                     ds_name
                 )
@@ -250,7 +248,7 @@ def resolve_patterns(metadata: ProjectMetadata, env: str) -> None:
         if ds_name in explicit_datasets or is_parameter(ds_name):
             continue

-        ds_config = data_catalog.config_resolver.resolve_dataset_pattern(ds_name)
+        ds_config = data_catalog.config_resolver.resolve_pattern(ds_name)

         # Exclude MemoryDatasets not set in the catalog explicitly
         if ds_config:
2 changes: 2 additions & 0 deletions kedro/io/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
Version,
)
from .data_catalog import DataCatalog
from .kedro_data_catalog import KedroDataCatalog
from .lambda_dataset import LambdaDataset
from .memory_dataset import MemoryDataset
from .shared_memory_dataset import SharedMemoryDataset
Expand All @@ -30,6 +31,7 @@
"DatasetAlreadyExistsError",
"DatasetError",
"DatasetNotFoundError",
"KedroDataCatalog",
"LambdaDataset",
"MemoryDataset",
"SharedMemoryDataset",
Expand Down
2 changes: 1 addition & 1 deletion kedro/io/catalog_config_resolver.py
@@ -229,7 +229,7 @@ def _resolve_config_credentials(

         return resolved_configs

-    def resolve_dataset_pattern(self, ds_name: str) -> dict[str, Any]:
+    def resolve_pattern(self, ds_name: str) -> dict[str, Any]:
         """Resolve dataset patterns and return resolved configurations based on the existing patterns."""
         matched_pattern = self.match_pattern(ds_name)
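The renamed `resolve_pattern` takes a dataset name and returns a resolved configuration dict. The callers in this diff treat an empty dict as "no match" (`if ds_config:`). The body of the method is not shown in this diff, so the following is only a rough sketch of the idea under stated assumptions: `toy_resolve_pattern` and its helpers are hypothetical, the first matching pattern wins in dict order, and the pattern ranking behind `kedro catalog rank` is not modelled. Each placeholder name may appear only once per pattern in this toy version.

```python
import re
from typing import Any


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn '{name}_csv' into a regex with one named group per placeholder."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # re.split with a capturing group alternates literal text (even indexes)
    # and placeholder names (odd indexes).
    regex = "".join(
        f"(?P<{part}>.+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    return re.compile(regex)


def _fill(value: Any, fields: dict[str, str]) -> Any:
    """Recursively substitute '{placeholder}' inside the strings of a config."""
    if isinstance(value, dict):
        return {k: _fill(v, fields) for k, v in value.items()}
    if isinstance(value, list):
        return [_fill(v, fields) for v in value]
    if isinstance(value, str):
        for key, replacement in fields.items():
            value = value.replace("{" + key + "}", replacement)
    return value


def toy_resolve_pattern(
    ds_name: str, patterns: dict[str, dict[str, Any]]
) -> dict[str, Any]:
    """Return the filled-in config of the first matching pattern, else {}."""
    for pattern, config in patterns.items():
        match = _pattern_to_regex(pattern).fullmatch(ds_name)
        if match:
            return _fill(config, match.groupdict())
    return {}


patterns = {
    "{name}_csv": {"type": "pandas.CSVDataset", "filepath": "data/{name}.csv"},
}
print(toy_resolve_pattern("reviews_csv", patterns))
print(toy_resolve_pattern("model", patterns))
```

Returning an empty dict rather than `None` keeps call sites simple: both `ds_config.get("type", "DefaultDataset")` and `if ds_config:` from the hunks in this commit work without a separate `None` check.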
4 changes: 2 additions & 2 deletions kedro/io/data_catalog.py
@@ -75,7 +75,7 @@ def __setattr__(self, key: str, value: Any) -> None:
         if key == "_original_names":
             super().__setattr__(key, value)
             return
-        msg = "Operation not allowed! "
+        msg = "Operation not allowed. "
         if key in self.__dict__:
             msg += "Please change datasets through configuration."
         else:
@@ -324,7 +324,7 @@ def _get_dataset(
         version: Version | None = None,
         suggest: bool = True,
     ) -> AbstractDataset:
-        ds_config = self._config_resolver.resolve_dataset_pattern(dataset_name)
+        ds_config = self._config_resolver.resolve_pattern(dataset_name)

         if dataset_name not in self._datasets and ds_config:
             ds = AbstractDataset.from_config(