-
Notifications
You must be signed in to change notification settings - Fork 893
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve add_feed_dict
docstrings
#4009
Changes from 4 commits
cbd73ee
0e6e60b
d0420f1
232b1b1
ee563c1
dada3d4
390dfd1
da0eda8
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -681,29 +681,41 @@ def add_all( | |||||||||||
self.add(name, dataset, replace) | ||||||||||||
|
||||||||||||
def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> None: | ||||||||||||
"""Adds instances of ``MemoryDataset``, containing the data provided | ||||||||||||
through feed_dict. | ||||||||||||
"""This function adds datasets to the ``DataCatalog`` using the data provided through the `feed_dict`. | ||||||||||||
ElenaKhaustova marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||
|
||||||||||||
`feed_dict` dictionary key is used as a dataset name, and a value is used to create an instance of | ||||||||||||
``MemoryDataset`` before adding to the ``DataCatalog`` for all the value types except of ``AbstractDataset``. | ||||||||||||
In the last case, the ``AbstractDataset`` is added as it is. | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What about
Suggested change
? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I actually don't think mentioning There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That's a good point. I rephrased it, but I still think it's useful to distinguish between the two cases when adding raw data and a dataset. |
||||||||||||
|
||||||||||||
Has `from_dict` alias, so it can be called using `from_dict` name instead. | ||||||||||||
Will be fully renamed to `from_dict` in the `0.20.0` version. | ||||||||||||
|
||||||||||||
Args: | ||||||||||||
feed_dict: A feed dict with data to be added in memory. | ||||||||||||
replace: Specifies whether to replace an existing dataset | ||||||||||||
with the same name is allowed. | ||||||||||||
feed_dict: A feed dict with data to be added to the ``DataCatalog``. | ||||||||||||
replace: Specifies whether to replace an existing dataset with the same name in the ``DataCatalog``. | ||||||||||||
|
||||||||||||
Example: | ||||||||||||
:: | ||||||||||||
|
||||||||||||
>>> from kedro_datasets.pandas import CSVDataset | ||||||||||||
>>> import pandas as pd | ||||||||||||
>>> | ||||||||||||
>>> df = pd.DataFrame({'col1': [1, 2], | ||||||||||||
>>> 'col2': [4, 5], | ||||||||||||
>>> 'col3': [5, 6]}) | ||||||||||||
>>> df = pd.DataFrame({"col1": [1, 2], | ||||||||||||
>>> "col2": [4, 5], | ||||||||||||
>>> "col3": [5, 6]}) | ||||||||||||
>>> | ||||||||||||
>>> io = DataCatalog() | ||||||||||||
>>> io.add_feed_dict({ | ||||||||||||
>>> 'data': df | ||||||||||||
>>> catalog = DataCatalog() | ||||||||||||
>>> catalog.add_feed_dict({ | ||||||||||||
>>> "data_df": df | ||||||||||||
>>> }, replace=True) | ||||||||||||
>>> | ||||||||||||
>>> assert io.load("data").equals(df) | ||||||||||||
>>> assert catalog.load("data_df").equals(df) | ||||||||||||
>>> | ||||||||||||
>>> csv_dataset = CSVDataset(filepath="test.csv") | ||||||||||||
>>> csv_dataset.save(df) | ||||||||||||
>>> catalog.add_feed_dict({"data_csv_dataset": csv_dataset}) | ||||||||||||
>>> | ||||||||||||
>>> assert catalog.load("data_csv_dataset").equals(df) | ||||||||||||
""" | ||||||||||||
for dataset_name in feed_dict: | ||||||||||||
if isinstance(feed_dict[dataset_name], AbstractDataset): | ||||||||||||
|
@@ -713,6 +725,9 @@ def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> Non | |||||||||||
|
||||||||||||
self.add(dataset_name, dataset, replace) | ||||||||||||
|
||||||||||||
# Alias for add_feed_dict method | ||||||||||||
from_dict = add_feed_dict | ||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pedantic nitpick, but when I read class DataCatalog:
@classmethod
def from_dict(cls, data_dict: dict[str, AbstractDataset]):
...
return cls So not sure if I'd add this alias. (I know this comes from #3612 (comment) and that I asserted, but my response there is incomplete) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is actually quite a significant change. I think we should discuss first what a good alternative is, before just adding an alias. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This has always been an undiscovered part of the API, I agree with @merelcht that we should think really hard / research what a better name should be. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I removed an alias until we are sure about adding it and agree on its name. To @astrojuanlu's point, if we decide to keep an alias, we can use the Or, as an alternative, we can add a new implementation ( |
||||||||||||
|
||||||||||||
def list(self, regex_search: str | None = None) -> list[str]: | ||||||||||||
""" | ||||||||||||
List of all dataset names registered in the catalog. | ||||||||||||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a style guide for docstrings yet? In any case, following Pydocstyle D404 and D401,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add additional ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't fully understand why we need
additional
here, as we can add them to the empty catalog as well. So, it looks like we do not need to specify any adding conditions.