[DRAFT] Dataset factories (new version) #2743
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
I try to use your repository This doesn't work and I am not sure why Steps:
}
# Already add explicit entry datasets
for ds_name, ds_config in catalog.items():
    if "}" not in ds_name and cls._is_full_config(ds_config):
I think we need a small helper function here, i.e. _is_pattern, even if it's a one-liner, as the semantics make it easier to read.
@@ -311,6 +428,28 @@ def _get_dataset(

        return data_set

    def _resolve_config(self, data_set_name, data_set_pattern) -> dict[str, Any]:
Suggested change:
- def _resolve_config(self, data_set_name, data_set_pattern) -> dict[str, Any]:
+ def _resolve_config(self, data_set_name: str, data_set_pattern: str) -> dict[str, Any]:
# Merge config with entry created containing load and save versions
config_copy.update(self._raw_catalog[data_set_name])
for key, value in config_copy.items():
    if isinstance(value, Iterable) and "}" in value:
Why do we check for "{" sometimes but "}" at other times? Again, can we just use one function to check if something is a pattern?
I am not sure why we are checking for Iterable here?
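One plausible reason for the guard (an assumption, not confirmed by the author): config values are not always strings, and `"}" in value` raises TypeError for non-iterable values such as integers. The config below is hypothetical and only illustrates the effect of the check.

```python
from collections.abc import Iterable

# Hypothetical dataset config mixing string and non-string values.
config = {"type": "pandas.CSVDataSet", "filepath": "data/{country}.csv", "index": 0}


def has_placeholder(value) -> bool:
    # Guard first: `"}" in 0` would raise TypeError because int is not iterable.
    return isinstance(value, Iterable) and "}" in value


flags = {key: has_placeholder(value) for key, value in config.items()}
```

Note that for a dict or list value the `in` test checks keys or elements rather than substrings, so `isinstance(value, str)` might express the intent more precisely.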
string_value = str(value)
# result.named: gives access to all dict items in the match result.
# format_map fills in dict values into a string with {...} placeholders
# of the same key name.
try:
    config_copy[key] = string_value.format_map(result.named)
Suggested change:
- string_value = str(value)
- # result.named: gives access to all dict items in the match result.
- # format_map fills in dict values into a string with {...} placeholders
- # of the same key name.
- try:
-     config_copy[key] = string_value.format_map(result.named)
+ # result.named: gives access to all dict items in the match result.
+ # format_map fills in dict values into a string with {...} placeholders
+ # of the same key name.
+ try:
+     config_copy[key] = str(value).format_map(result.named)
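For reference, a small sketch of what str.format_map does here. The dict `named` stands in for result.named and the file paths are made up for illustration; the KeyError branch shows the failure case that the try block presumably exists to handle.

```python
# Stand-in for result.named: placeholder values extracted from the dataset name.
named = {"country": "france"}

# Every {placeholder} in the string is looked up in the mapping by key.
filepath = "data/01_raw/{country}_companies.csv".format_map(named)

# A placeholder with no matching key raises KeyError carrying the missing name.
missing = None
try:
    "data/{region}/{country}.csv".format_map(named)
except KeyError as exc:
    missing = exc.args[0]
```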
if unsatisfied:
    raise ValueError(
        f"Pipeline input(s) {unsatisfied} not found in the DataCatalog"
    )

free_outputs = pipeline.outputs() - set(catalog.list())
unregistered_ds = pipeline.data_sets() - set(catalog.list())
free_outputs = pipeline.outputs() - set(registered_ds)
At this point, are all the pattern datasets already materialised in the catalog?
@classmethod
def _match_name_against_pattern(
    cls, raw_catalog: dict[str, Any], data_set_name: str
Is raw_cata
Suggested change:
- cls, raw_catalog: dict[str, Any], data_set_name: str
+ cls, raw_catalog: dict[str, dict[str, Any]], data_set_name: str
Is this the correct typing? When I read through it I have a hard time mapping all the types. It may be worth creating a TypeAlias.
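A sketch of how aliases could make these signatures easier to follow. The alias names DatasetConfig and RawCatalog and the function first_pattern are hypothetical, and plain assignments are used so the snippet runs on older Python versions; on Python 3.10+ the aliases could also be annotated with typing.TypeAlias.

```python
from typing import Any, Dict, Optional

# Hypothetical aliases: one dataset's config, and the whole raw catalog.
DatasetConfig = Dict[str, Any]
RawCatalog = Dict[str, DatasetConfig]


def first_pattern(raw_catalog: RawCatalog) -> Optional[str]:
    """Return the first catalog key that looks like a pattern, if any."""
    for name in raw_catalog:
        if "{" in name:
            return name
    return None
```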
@staticmethod
def _is_full_config(config: dict[str, Any]) -> bool:
    """Check if the config is a full config"""
    remaining = set(config.keys()) - {"load_version", "save_version"}
    return bool(remaining)
Why does subtracting load_version and save_version from config.keys() tell us that this is a full config?
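One possible reading, based only on the "entry created containing load and save versions" comment elsewhere in the diff: an entry that holds nothing but version keys is a stub, while any other key means a real dataset definition. The standalone function and sample configs below are illustrative assumptions.

```python
from typing import Any, Dict


def is_full_config(config: Dict[str, Any]) -> bool:
    # Anything left after removing the version keys counts as a real definition.
    remaining = set(config.keys()) - {"load_version", "save_version"}
    return bool(remaining)


# Hypothetical stub entry that carries only version information.
version_only = {"load_version": "2023-06-12T10.01.52.889Z"}

# Hypothetical explicit entry that also happens to pin a load version.
full = {
    "type": "pandas.CSVDataSet",
    "filepath": "data/companies.csv",
    "load_version": "2023-06-12T10.01.52.889Z",
}
```

Under this reading, `is_full_config(version_only)` is False and `is_full_config(full)` is True.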
def __contains__(self, item):
    """Check if an item is in the catalog as a materialised dataset or pattern"""
    if item in self._data_sets or self._match_name_against_pattern(
        self._raw_catalog, item
    ):
        return True
    return False
👍🏼 I like this
Description
Slightly different implementation of dataset factories that works with versions and credentials
UPDATE:
This now works with
kedro run --load-versions="france_companies:2023-06-12T10.01.52.889Z"
when france_companies might not exist as an explicit catalog entry but only as the pattern {country}_companies.
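To illustrate the idea, here is a rough sketch of how a name like france_companies could be resolved against {country}_companies. The diff reads placeholder values from result.named; this sketch swaps in the standard-library re module so it is self-contained. The names match_pattern and resolve and the sample catalog are hypothetical.

```python
import re
from typing import Any, Dict, Optional


def match_pattern(pattern: str, name: str) -> Optional[Dict[str, str]]:
    """Return the placeholder values if name matches pattern, else None."""
    # re.split with a capture group puts literal text at even indices
    # and placeholder names at odd indices.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, name)
    return match.groupdict() if match else None


def resolve(name: str, raw_catalog: Dict[str, Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Fill the first matching entry's placeholders with values taken from name."""
    for pattern, config in raw_catalog.items():
        named = match_pattern(pattern, name)
        if named is not None:
            return {
                key: str(value).format_map(named) if "{" in str(value) else value
                for key, value in config.items()
            }
    return None


# Hypothetical catalog containing a single dataset factory pattern.
raw_catalog = {
    "{country}_companies": {
        "type": "pandas.CSVDataSet",
        "filepath": "data/01_raw/{country}_companies.csv",
    }
}
resolved = resolve("france_companies", raw_catalog)
```

The sketch takes the first matching entry and does not handle a pattern that repeats the same placeholder name.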
Development notes
Checklist
RELEASE.md file