
This issue was moved to a discussion.

The development of custom templated kedro starters is impractical #1961

Closed
foxale opened this issue Oct 21, 2022 · 7 comments
Labels
Community Issue/PR opened by the open-source community Issue: Feature Request New feature or improvement to existing feature

Comments

foxale commented Oct 21, 2022

Description and context

So to give you some background, we work with a bunch of business use cases involving data processing and machine learning (customer segmentation, next best offer, etc.) and for each business problem we want to have a generalized but customizable solution.

When I first read about kedro starters, it sounded like a perfect match: you can create a custom prompt asking users (data scientists) for the name of the project they're working on, the data source where their data is stored, and the credentials for this data source, and... boom! Automagically the project is all set up and ready to run.

But after a few sprints of starter development we started realizing how much harder and slower the development and maintenance of a starter is compared to a regular, non-templated project. The cons we came across:

  1. Introducing code changes (PRs) is a nightmare
    The flow goes like this:
    1. Create a new project instance from a starter
    2. Develop a feature there
    3. Manually copy-paste it back to the local starter
    4. Create a project instance one more time to test the changes
    5. If it's ok, push the changes, otherwise go back a few steps and repeat
      That is five painful steps, as opposed to just creating a new branch + (2) + (5)
  2. Reviewing code changes (PRs) is a nightmare
    • Each time you have to create a new project instance from a starter
    • You pretty much can't switch between branches / PRs or do any kinds of git operations, as you're working on a separate project instance
  3. Problems with linting templated projects
    Should testing and linting dependencies and configuration be dropped from starters and / or templates? #1849
  4. Problems with integrating with other templated packages
    kedro-dbt plugin #1813
  5. Integration with poetry is problematic as you already need kedro to execute kedro new
    Poetry Support for Kedro Projects #1722
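
For illustration, steps 1 and 4 of the flow in point 1 could be scripted so that changes are made directly in the starter and each change is rendered into a throwaway directory for testing. This is only a rough sketch under several assumptions: the function names are hypothetical, the config file passed to `--config` answers all prompts and sets `output_dir: .`, and the rendered project ships tests that `pytest` can run. It does not remove the need to edit templated files by hand.

```python
import subprocess
import tempfile
from pathlib import Path


def kedro_new_command(starter_dir: Path, config_file: Path) -> list[str]:
    """Build a non-interactive `kedro new` call that renders a local starter."""
    return [
        "kedro", "new",
        "--starter", str(starter_dir),
        "--config", str(config_file),
    ]


def render_and_test(starter_dir: Path, config_file: Path) -> int:
    """Render the starter into a temporary directory and run its tests there.

    Assumption: the config file sets `output_dir: .`, so the project is
    created inside the temporary working directory.
    """
    starter_dir = starter_dir.resolve()
    config_file = config_file.resolve()
    with tempfile.TemporaryDirectory() as tmp:
        subprocess.run(kedro_new_command(starter_dir, config_file), cwd=tmp, check=True)
        # The rendered project is assumed to be the only directory created.
        project_dir = next(p for p in Path(tmp).iterdir() if p.is_dir())
        # Assumption: pytest is installed and the rendered project has tests.
        return subprocess.run(["pytest"], cwd=project_dir).returncode
```

Calling `render_and_test(Path("my-starter"), Path("ci-config.yml"))` from the starter repository (both paths are placeholders) would return the pytest exit code for the freshly rendered project.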

And the pros? So far it looks like the whole idea of (at least custom templated) starters could be reduced to a global config file and git operations.

I wouldn't be surprised if it turned out I'm missing one or more crucial details that make starters valuable, so if that's the case just fill me in please :)

Possible Implementation

I can see how the templated starters look cool, but what’s the real benefit here?

Possible Alternatives

Project config + git CLI
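
A minimal sketch of what "project config + git CLI" might look like: clone one regular (non-templated) base project and write the answers that a starter prompt would otherwise collect into a config file. Everything here is an assumption for illustration, including the function names, the `project_config.json` file name, and the keys in the answers; credentials are deliberately left out of the written file.

```python
import json
import subprocess
from pathlib import Path


def clone_command(base_repo: str, target_dir: Path) -> list[str]:
    """Build the git call that copies the base project."""
    return ["git", "clone", base_repo, str(target_dir)]


def write_project_config(target_dir: Path, answers: dict) -> Path:
    """Store per-project settings in a (hypothetical) project_config.json."""
    config_path = target_dir / "project_config.json"
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(answers, indent=2, sort_keys=True))
    return config_path


def bootstrap(base_repo: str, target_dir: Path, answers: dict) -> Path:
    """Clone the base project, then write the project-specific config."""
    subprocess.run(clone_command(base_repo, target_dir), check=True)
    return write_project_config(target_dir, answers)
```

Because the base project is an ordinary repository, branches, PR reviews, and linting work on it directly; only the config file differs between project instances.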

@foxale foxale added the Issue: Feature Request New feature or improvement to existing feature label Oct 21, 2022

noklam commented Oct 24, 2022

@foxale Thank you for your thoughtful and well-written issue.

Is the goal to develop a template for individual use cases? The template is meant to be a stable structure for projects. If you are developing a new template, I think the making of the template would be the final step instead of the first step.

You will start to develop a new project, and iterate it until you find a reusable structure. Once you have this structure, you will create a template out of it and share it across different projects.

  1. Introducing code changes (PRs) is a nightmare
    The flow goes like this:

    1. Create a new project instance from a starter
    2. Develop a feature there
    3. Manually copy-paste it back to the local starter
    4. Create a project instance one more time to test the changes
    5. If it's ok, push the changes, otherwise, go back a few steps and repeat
      Which is five painful steps as opposed to just creating a new branch + (2) + (5)

Does "local starter" mean your custom template? I think there are 2 levels of code sharing, and starters shouldn't be used as a replacement for a Python Library.

about name of the project they're working on, the data source where their data is stored, credentials for this data source

This is indeed the purpose of a starter, and it shouldn't change rapidly during project development. What kind of code changes are you introducing?

IMO, starters are good for the code/files that you need to copy-paste over and over again, but they are not meant for modules/functions (source code).

@merelcht merelcht added the Community Issue/PR opened by the open-source community label Nov 8, 2022
merelcht commented

Hi @foxale, do you have any more thoughts or ideas to add here, also following on from @noklam's comment? We really appreciate the feedback and want to improve your experience using Kedro. However, starter template improvements aren't a huge priority at the moment, so understanding your pain points completely will help us give this issue the right attention.

astrojuanlu commented

(1) could be solved by the update capabilities of https://github.com/copier-org/copier, and (5) could be solved by copier's ability to apply a template to the current directory (which cookiecutter cannot do).
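
As a rough sketch of the two copier features mentioned here, the snippet below builds the `copier copy` and `copier update` CLI calls from Python. The wrapper function names are hypothetical, and it assumes copier is installed, that the template is a git repository, and that the generated project is git-tracked and keeps the answers file copier writes on the first render, which the update step relies on.

```python
import subprocess
from pathlib import Path


def copier_copy_command(template: str, destination: Path) -> list[str]:
    """Build the call that renders a template into `destination`.

    Passing Path(".") applies the template to the current directory.
    """
    return ["copier", "copy", template, str(destination)]


def copier_update_command() -> list[str]:
    """Build the call that pulls upstream template changes into a project."""
    return ["copier", "update"]


def update_project(project_dir: Path) -> None:
    """Run `copier update` inside an existing, previously rendered project."""
    subprocess.run(copier_update_command(), cwd=project_dir, check=True)
```

With this kind of flow, a change would be developed in the template repository itself, and existing project instances would pick it up through `update_project` rather than by copy-pasting files back and forth.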

merelcht commented Mar 14, 2024

We haven't heard from the original author for a while and I'm wondering if this is something we want to tackle any time soon? Replacing cookiecutter would be quite the undertaking and I don't think that's where we can add the most value currently. Thoughts @astrojuanlu ?

astrojuanlu commented

I agree we have other areas to focus on currently, but I think we should keep this issue open. There's a long list of problems with our current kedro new flow: the flow is confusing, new tools cannot be added after project creation, templates cannot be updated when there are upstream changes, and finally there are the problems @foxale mentioned originally. This is nudging intermediate and advanced users towards creating their own templates, and beginner users to sidestep or ignore kedro new altogether. In fact, I have anecdotal evidence that lots of teams with years of Kedro experience don't know kedro new exists.

There's enough evidence that we have to do something, but whatever we do must not be part of the pip install kedro experience (precisely to avoid the global + local installation problem, hence if anything it should be a separate tool) and it should take project maintenance into account and not just project creation.

sfc-gh-plis commented Mar 18, 2024

Apologies, shortly after writing the post I moved on to a different engagement and eventually to a new job.

Is the goal to develop a template for individual use cases? The template is meant to be a stable structure for projects. If you are developing a new template, I think the making of the template would be the final step instead of the first step.

Right, but in order to have it you first need to create it. And even after, let's say, completing the template, there is the usual maintenance effort: version bumps, bugfixes, adding new extensions, etc. At that time our templates were quite sophisticated, and we wanted our end users (other data scientists) to use them while we were still working on them, so there was no way to make the "project to template" conversion the final step.

I remember that around the time I wrote the original post, I came to the realization that it would probably be easier to maintain a codebase composed of a single project instead of a template. With the addition of proper config it would have the exact same capabilities but be way easier to maintain.

astrojuanlu commented

I still think there are some very valid concerns raised here, but I'm turning this into a discussion to continue the conversation there.

@kedro-org kedro-org locked and limited conversation to collaborators Nov 8, 2024
@astrojuanlu astrojuanlu converted this issue into discussion #4312 Nov 8, 2024
