Improve add_feed_dict docstrings #4009

Merged
ElenaKhaustova merged 8 commits into main from feature/3612-add-feed-dict-docstrings
Jul 17, 2024

Conversation

Contributor

@ElenaKhaustova ElenaKhaustova commented Jul 15, 2024

Description

Solves #3612

Development notes

  • Updated the docstrings for DataCatalog's add_feed_dict method and extended them with an additional example.
  • Created a from_dict alias for the add_feed_dict method and added a note to the docstring that the method will be renamed in the 0.20.0 release.
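
For readers unfamiliar with class-level aliases, here is a minimal sketch of what the second note describes. `MiniCatalog` is a hypothetical stand-in written for this illustration and is not Kedro's `DataCatalog`; only the names `add_feed_dict` and `from_dict` come from the PR.

```python
from __future__ import annotations

from typing import Any


class MiniCatalog:
    """Hypothetical stand-in for DataCatalog (not Kedro's implementation)."""

    def __init__(self) -> None:
        self._datasets: dict[str, Any] = {}

    def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> None:
        """Register every key/value pair, refusing silent overwrites."""
        for name, value in feed_dict.items():
            if name in self._datasets and not replace:
                raise ValueError(f"Dataset '{name}' is already registered")
            self._datasets[name] = value

    # Class-level alias: binds a second name to the same function object.
    from_dict = add_feed_dict


catalog = MiniCatalog()
catalog.from_dict({"params": {"alpha": 0.1}})  # same effect as add_feed_dict
```

Because the alias is the same function object, it shares the docstring and its `__name__` is still `add_feed_dict`, which is one reason an alias alone does not amount to a rename.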

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Contributor Author

ElenaKhaustova commented Jul 15, 2024

I would also suggest replacing io with catalog in the rest of the DataCatalog docstrings, as the latter feels more intuitive. But in a separate PR.

@ElenaKhaustova ElenaKhaustova marked this pull request as ready for review July 15, 2024 16:59
@@ -681,29 +681,41 @@ def add_all(
             self.add(name, dataset, replace)

     def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> None:
-        """Adds instances of ``MemoryDataset``, containing the data provided
-        through feed_dict.
+        """This function adds datasets to the ``DataCatalog`` using the data provided through the `feed_dict`.
Member

I don't think we have a style guide for docstrings yet? In any case, following Pydocstyle D404 and D401,

Suggested change
-        """This function adds datasets to the ``DataCatalog`` using the data provided through the `feed_dict`.
+        """Add datasets to the ``DataCatalog`` using the data provided through the `feed_dict`.

Contributor

Add additional ?

Suggested change
-        """This function adds datasets to the ``DataCatalog`` using the data provided through the `feed_dict`.
+        """Add additional datasets to the ``DataCatalog`` using the data provided through the `feed_dict`.

Contributor Author

I don't fully understand why we need "additional" here, since datasets can be added to an empty catalog as well. So it looks like we do not need to qualify when they are added.

@@ -713,6 +725,9 @@ def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> Non

             self.add(dataset_name, dataset, replace)

+    # Alias for add_feed_dict method
+    from_dict = add_feed_dict
Member

Pedantic nitpick, but when I read from_dict I assume it's a classmethod, like

class DataCatalog:
  @classmethod
  def from_dict(cls, data_dict: dict[str, AbstractDataset]):
    ...
    return cls

So not sure if I'd add this alias.

(I know this comes from #3612 (comment) and that I asserted, but my response there is incomplete)
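
To make the naming concern concrete, here is a rough sketch of the two shapes being contrasted. `MiniCatalog` and its attributes are hypothetical and written only for this illustration; in this sketch the classmethod returns a new instance, which is the convention followed by standard-library alternative constructors such as `dict.fromkeys` and `datetime.fromisoformat`.

```python
from __future__ import annotations

from typing import Any


class MiniCatalog:
    """Hypothetical stand-in used to contrast the two method shapes."""

    def __init__(self, datasets: dict[str, Any] | None = None) -> None:
        self._datasets: dict[str, Any] = dict(datasets or {})

    @classmethod
    def from_dict(cls, data_dict: dict[str, Any]) -> MiniCatalog:
        """Alternative constructor: builds and returns a new catalog."""
        return cls(datasets=data_dict)

    def add_feed_dict(self, feed_dict: dict[str, Any]) -> None:
        """Instance method: mutates an existing catalog and returns None."""
        self._datasets.update(feed_dict)


built = MiniCatalog.from_dict({"a": 1})   # called on the class, yields an instance
result = built.add_feed_dict({"b": 2})    # called on an instance, yields None
```

A reader who expects the first shape from the name `from_dict` would be surprised to get the second, which is the ambiguity raised in the comment above.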

Member

This is actually quite a significant change. I think we should discuss first what a good alternative is, before just adding an alias.

Contributor

This has always been an undiscovered part of the API. I agree with @merelcht that we should think really hard about, and research, what a better name would be.

Contributor Author

I removed the alias until we are sure about adding it and agree on its name.

To @astrojuanlu's point, if we decide to keep an alias, we can use the add_from_dict name instead.

Or, as an alternative, we can add a new implementation (add_from_dict / add_datasets) and use it inside add_feed_dict with a deprecation warning.
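
A minimal sketch of that last alternative, assuming the new method ends up being called `add_from_dict` (the name is only a proposal in this thread). `MiniCatalog` is a hypothetical stand-in, and the warning text is illustrative rather than Kedro's wording.

```python
from __future__ import annotations

import warnings
from typing import Any


class MiniCatalog:
    """Hypothetical stand-in for DataCatalog (not Kedro's implementation)."""

    def __init__(self) -> None:
        self._datasets: dict[str, Any] = {}

    def add_from_dict(self, datasets: dict[str, Any], replace: bool = False) -> None:
        """New implementation (proposed name, not an existing Kedro method)."""
        for name, value in datasets.items():
            if name in self._datasets and not replace:
                raise ValueError(f"Dataset '{name}' is already registered")
            self._datasets[name] = value

    def add_feed_dict(self, feed_dict: dict[str, Any], replace: bool = False) -> None:
        """Old name: warn, then delegate to the new implementation."""
        warnings.warn(
            "add_feed_dict is deprecated; use add_from_dict instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        self.add_from_dict(feed_dict, replace)
```

Unlike a plain alias, this keeps a single implementation while giving callers of the old name a visible migration signal.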

Comment on lines 686 to 688
`feed_dict` dictionary key is used as a dataset name, and a value is used to create an instance of
``MemoryDataset`` before adding to the ``DataCatalog`` for all the value types except of ``AbstractDataset``.
In the last case, the ``AbstractDataset`` is added as it is.
Member

What about

Suggested change
-        `feed_dict` dictionary key is used as a dataset name, and a value is used to create an instance of
-        ``MemoryDataset`` before adding to the ``DataCatalog`` for all the value types except of ``AbstractDataset``.
-        In the last case, the ``AbstractDataset`` is added as it is.
+        `feed_dict` keys are used as dataset names, and values can either be raw data or instances of ``AbstractDataset``.
+        In the former case, instances of ``MemoryDataset`` are automatically created before adding to the ``DataCatalog``.

?

Contributor

I actually don't think mentioning AbstractDataset is useful here. We shouldn't really have instances of an interface, right? Does the user gain anything from understanding this implementation detail?

Contributor Author

That's a good point. I rephrased it, but I still think it's useful to distinguish between the two cases when adding raw data and a dataset.
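
The two cases can be sketched roughly as follows. The classes below are simplified stand-ins that only borrow the names `AbstractDataset` and `MemoryDataset` from the docstring under review; `FileLikeDataset` and the free function `add_feed_dict` are hypothetical, and none of this is Kedro's actual code.

```python
from __future__ import annotations

from typing import Any


class AbstractDataset:
    """Simplified stand-in for the dataset interface."""

    def load(self) -> Any:
        raise NotImplementedError


class MemoryDataset(AbstractDataset):
    """Simplified stand-in that keeps raw data in memory."""

    def __init__(self, data: Any) -> None:
        self._data = data

    def load(self) -> Any:
        return self._data


class FileLikeDataset(AbstractDataset):
    """Hypothetical concrete dataset, used only for this illustration."""

    def load(self) -> Any:
        return "rows loaded from somewhere"


def add_feed_dict(
    datasets: dict[str, AbstractDataset],
    feed_dict: dict[str, Any],
    replace: bool = False,
) -> None:
    """Keys become dataset names; raw values are wrapped, datasets kept as is."""
    for name, value in feed_dict.items():
        if isinstance(value, AbstractDataset):
            dataset = value                      # case 1: already a dataset
        else:
            dataset = MemoryDataset(data=value)  # case 2: raw data gets wrapped
        if name in datasets and not replace:
            raise ValueError(f"Dataset '{name}' is already registered")
        datasets[name] = dataset


registry: dict[str, AbstractDataset] = {}
existing = FileLikeDataset()
add_feed_dict(registry, {"raw": [1, 2, 3], "table": existing})
```

Here `registry["raw"]` ends up as a `MemoryDataset` wrapping the list, while `registry["table"]` is the very object that was passed in.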

kedro/io/data_catalog.py (outdated; resolved)
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Signed-off-by: Elena Khaustova <ymax70rus@gmail.com>
Member

@astrojuanlu astrojuanlu left a comment

LGTM!

Contributor

@ankatiyar ankatiyar left a comment

much clearer!

Member

@merelcht merelcht left a comment

LGTM! ⭐

@ElenaKhaustova ElenaKhaustova enabled auto-merge (squash) July 17, 2024 15:45
@ElenaKhaustova ElenaKhaustova merged commit ea54d60 into main Jul 17, 2024
34 checks passed
@ElenaKhaustova ElenaKhaustova deleted the feature/3612-add-feed-dict-docstrings branch July 17, 2024 16:04
5 participants