Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add documentation about nodes with generator functions #2302

Merged
merged 15 commits into from
Feb 13, 2023

Conversation

SajidAlamQB
Copy link
Contributor

@SajidAlamQB SajidAlamQB commented Feb 9, 2023

Signed-off-by: SajidAlamQB 90610031+SajidAlamQB@users.noreply.github.com

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Related to: #2170

After introducing the ability to wrap generator functions in Kedro in #2161, we should add an example in the docs how this can be leveraged to process large datasets in chunks. The example can show a repurposed split_dataset function to process chunk-wise data: https://github.com/kedro-org/kedro-starters/blob/main/pandas-iris/%7B%7B%20cookiecutter.repo_name%20%7D%7D/src/%7B%7B%20cookiecutter.python_package%20%7D%7D/nodes.py#L13

In the example, we should also implement a custom DataSet, which saves the data in an append-or-create mode (a+ mode).

Development notes

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
@SajidAlamQB SajidAlamQB self-assigned this Feb 9, 2023
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
@SajidAlamQB
Copy link
Contributor Author

@idanov @merelcht would love to hear your early thoughts on it.

Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some small comments, but in general I think this looks good 🙂

docs/source/nodes_and_pipelines/nodes.md Outdated Show resolved Hide resolved
docs/source/nodes_and_pipelines/nodes.md Outdated Show resolved Hide resolved
docs/source/nodes_and_pipelines/nodes.md Outdated Show resolved Hide resolved
docs/source/nodes_and_pipelines/nodes.md Outdated Show resolved Hide resolved
Copy link
Member

@idanov idanov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a good start and I think it should read like a mini-tutorial, obviously skipping the parts of creating the example project, but mentioning that we are working on the pandas-iris example for demonstrating the functionality. The goal here is to enable the users to be able to reproduce what you have done to get it working.

docs/source/nodes_and_pipelines/nodes.md Outdated Show resolved Hide resolved
docs/source/nodes_and_pipelines/nodes.md Outdated Show resolved Hide resolved
docs/source/nodes_and_pipelines/nodes.md Outdated Show resolved Hide resolved
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
@SajidAlamQB SajidAlamQB marked this pull request as ready for review February 10, 2023 12:44
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
…com/kedro-org/kedro into docs/nodes-with-generator-functions

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great explanation! 👍 I'd maybe ask @stichbury to have a look at this as well.

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
Copy link
Contributor

@stichbury stichbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you! Just a couple of extra suggestions ⭐ 🌟 ⭐

Signed-off-by: SajidAlamQB <90610031+SajidAlamQB@users.noreply.github.com>
@SajidAlamQB SajidAlamQB added this pull request to the merge queue Feb 13, 2023
Merged via the queue into main with commit c83ce9a Feb 13, 2023
@SajidAlamQB SajidAlamQB deleted the docs/nodes-with-generator-functions branch February 13, 2023 12:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add an example in the documentation about nodes with generator functions
4 participants