Create docs for best practices in Kedro pipeline deployment #2712

Open
noklam opened this issue Jun 21, 2023 · 6 comments
Labels
Component: Documentation 📄 Issue/PR for markdown and API documentation Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation

Comments

@noklam
Contributor

noklam commented Jun 21, 2023

Description

Is your feature request related to a problem? A clear and concise description of what the problem is: "I'm always frustrated when ..."

Related:

Create a single set of best practices for deployment
There should be a clear mapping from "best-practice Kedro pipeline" to "best-practice Kedro deployment"

This was raised at the end of last year; this ticket is the follow-up action.

The core of this is explaining that the mapping between a Kedro pipeline and its deployment shouldn't be 1:1 (one Kedro node per orchestrator task). In the mid/long term we can provide better tooling. Meanwhile we should recommend mapping a modular pipeline -> Airflow's task / Prefect's task / Compute Node on AWS/GCP/Azure
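
A minimal sketch of that grouping, in plain Python with no Kedro import. It assumes node names carry their modular pipeline's namespace as a dotted prefix (such as `data_processing.clean`), and that each top-level namespace becomes one orchestrator task. The node names and the `group_nodes_by_namespace` helper are hypothetical and only for illustration; they are not part of Kedro or any plugin.

```python
from collections import defaultdict


def group_nodes_by_namespace(node_names):
    """Group dotted node names by their top-level namespace.

    Each resulting group stands for one orchestrator task
    (for example one Airflow task) instead of one task per node.
    Nodes without a namespace each get a group of their own.
    """
    groups = defaultdict(list)
    for name in node_names:
        namespace, sep, _ = name.partition(".")
        # No dot means no namespace: fall back to the node name itself.
        key = namespace if sep else name
        groups[key].append(name)
    return dict(groups)


# Hypothetical node names from two modular pipelines plus one loose node.
nodes = [
    "data_processing.clean",
    "data_processing.join",
    "data_science.split",
    "data_science.train",
    "report",
]

tasks = group_nodes_by_namespace(nodes)
for task_name, members in tasks.items():
    print(task_name, "->", members)
```

Here five nodes collapse into three tasks. A real deployment would still need to derive the dependencies between those tasks from the datasets they exchange.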

Context

Why is this change important to you? How would you use it? How can it benefit other users?

Possible Implementation

(Optional) Suggest an idea for implementing the addition or change.

Possible Alternatives

(Optional) Describe any alternative solutions or features you've considered.

@noklam noklam added the Issue: Feature Request New feature or improvement to existing feature label Jun 21, 2023
@noklam
Contributor Author

noklam commented Jun 21, 2023

I would really like to push this one, as we get asked about it often, and the question comes up even more often with the deployment plugins.

Cc @marrrcin

@noklam noklam added the Component: Documentation 📄 Issue/PR for markdown and API documentation label Jun 21, 2023
@marrrcin
Contributor

How do you want to proceed with this one @noklam ?

@noklam
Contributor Author

noklam commented Jun 27, 2023

To start with, I want to create documentation that guides users through these common challenges. As the Kedro team, we may build specific plugins for certain platforms (Databricks). After that, the next step may be #2058 or something else.

What are the remaining challenges when deploying a Kedro pipeline? I don't have much real-world experience and need more of your input here. I'd focus on the pipeline and performance. There is a lot of I/O overhead when people simply do a 1:1 mapping. We have been repeating answers like this on Slack: "You should deploy a modular pipeline to a Task rather than having a 1:1 mapping between Kedro's node to orchestrator".

What do you need to do differently when moving a Kedro pipeline to Azure? How different is it if it is SageMaker instead?
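
To make the I/O overhead concrete, here is a rough sketch in plain Python. It assumes each orchestrator task runs in its own process, so any dataset passed between two different tasks has to be saved and reloaded, while datasets that stay inside one task can remain in memory. The pipeline, the dataset names, and the `persisted_datasets` helper are all made up for illustration.

```python
def persisted_datasets(nodes, task_of):
    """Return the datasets that cross a task boundary.

    nodes:   {node_name: (input_datasets, output_datasets)}
    task_of: {node_name: task_name}
    """
    # Record which node produces each dataset.
    producer = {}
    for node, (_, outputs) in nodes.items():
        for dataset in outputs:
            producer[dataset] = node

    crossing = set()
    for node, (inputs, _) in nodes.items():
        for dataset in inputs:
            source = producer.get(dataset)
            # Raw inputs have no producer; they are loaded from storage
            # in either mapping, so they are not counted here.
            if source is not None and task_of[source] != task_of[node]:
                crossing.add(dataset)
    return crossing


# Hypothetical five-node pipeline.
nodes = {
    "clean": (["raw"], ["cleaned"]),
    "join": (["cleaned"], ["joined"]),
    "split": (["joined"], ["train_set", "test_set"]),
    "train": (["train_set"], ["model"]),
    "evaluate": (["model", "test_set"], ["metrics"]),
}

# 1:1 mapping: every node is its own task.
one_to_one = {name: name for name in nodes}

# Grouped mapping: one task per (assumed) modular pipeline.
grouped = {
    "clean": "data_processing",
    "join": "data_processing",
    "split": "data_science",
    "train": "data_science",
    "evaluate": "data_science",
}

print(sorted(persisted_datasets(nodes, one_to_one)))
print(sorted(persisted_datasets(nodes, grouped)))
```

In this toy example the 1:1 mapping forces five intermediate datasets through storage, while the grouped mapping only needs `joined` to be persisted between the two tasks.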

@noklam
Contributor Author

noklam commented Jul 3, 2023

  • Aligning best practice
  • Requires modification of deployment guide

@deepyaman
Member

How do you want to proceed with this one @noklam ?

@marrrcin We just had some discussion on this in backlog grooming. In short, we need to:

  1. Align on what we consider the "best practice" approach to deployment (in particular, how to map nodes to orchestrators)
  2. Create a document explaining the best option(s) (and likely the rationale?)
  3. Update existing deployment guides to be in line with what we say is best practice
    • It's possible we could initially mark deployment guides that aren't aligned with what we consider best practice as outdated, but it's unclear what users would do with that information, if they're not comfortable modifying the deployment code themselves

I think step 1 is especially important, because I think the recommendation above

Meanwhile we should recommend mapping a modular pipeline -> Airflow's task / Prefect's task / Compute Node on AWS/GCP/Azure

largely comes from my push (and maybe @datajoely's? I can't remember), though I at least haven't had much chance to implement it in practice. And now I'm not totally sure this is ideal, and I think you may have better thoughts (as in https://kedro-org.slack.com/archives/C03RKPCLYGY/p1688043383962069?thread_ts=1687854961.990649&cid=C03RKPCLYGY). We should also get input from the broader team, and it would be a good topic for tech design to reach alignment on what we consider "best practice".

So... would you be willing to lead a tech design session on this? :)

@marrrcin
Contributor

marrrcin commented Jul 3, 2023

Sure, but not this week.

@noklam noklam added Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation and removed Issue: Feature Request New feature or improvement to existing feature labels Jul 3, 2023
@stichbury stichbury changed the title [Docs] - Create docs for best practice Kedro pipeline to Best practice Kedro deployment [Docs] - Create docs for best practices in Kedro pipeline deployment Jul 17, 2023
@stichbury stichbury changed the title [Docs] - Create docs for best practices in Kedro pipeline deployment Create docs for best practices in Kedro pipeline deployment Jul 17, 2023

4 participants