Parent task: Content on Kedro vs complementary tools #3012

Open
merelcht opened this issue Nov 23, 2022 · 21 comments
Labels
Component: Documentation 📄 Issue/PR for markdown and API documentation

Comments

@merelcht
Member

merelcht commented Nov 23, 2022

Description (edit 06/09/2023)

The Kedro docs are missing a clear description of Kedro's value proposition compared with other tools.

A related topic is migration guides that explain how to move from tool X to Kedro.

Ideas

  • article pages in docs
  • content pages for the website to explain where Kedro fits in the "ecosystem" of tools (see https://ably.com/topics). I already have a ticket in the kedro-website project to design this so that we can publish them in Contentful.
  • a matrix of Kedro compared to X for various technologies that we may be compared to (rightly or wrongly). See https://ably.com/compare
  • videos (script them, work out who makes them later)
  • blogs
  • partner with plugin developer teams on content creation and run webinar showcases
@stichbury
Contributor

Let's compile a list of these "competitor/complementary" platforms.

Category 1:

Category 2:

Category 3:

Category 4:

This is something I'll do this week; I've earmarked some time...

@stichbury stichbury transferred this issue from kedro-org/kedro Feb 23, 2023
@astrojuanlu
Member

This is from some recent slide decks.

image

@astrojuanlu
Member

Evidence that this could be useful for some users (private communication):

like it [Kedro] a lot, it's very versatile and interesting and above all the way it works, when you take the roll it speeds up [the development process] a lot (I think that's its goal, to make it reproducible). What I would like to have clearer is how it fits or differs from mlflow

@astrojuanlu
Member

I think we should abstain from doing blog posts or promotional content about this. People ask very frequently about Kedro vs MLflow (happened to me last week), Kedro vs dbt (happened to me a minute ago), and Kedro vs DVC, and this should be explained more prominently in the documentation.

I'm advocating for moving this to https://github.com/kedro-org/kedro/ and raising its priority.

@stichbury
Contributor

stichbury commented Sep 6, 2023

Sure, let's do this.

  • 1. Move this ticket as a "Kedro vs comparable tools" ticket, and make it a parent with a prioritized list of comparable tools
  • 2. Create a set of child tickets for each tool and execute according to the priority in the parent. Each "article" (could be a video, graphic, whatever, but let's assume text for now) needs to have sections on similarities, differences, pros and cons, and how to migrate to Kedro from the other tool
  • 3. Create a new parent ticket "Kedro + tools" where we write about complementary products as opposed to comparable products. Likewise prioritize what we'll add as complementary tools. This is Parent task: Content on Kedro + complementary tools (integrations with other tools, best practices and tutorials) #2817
  • 4. Create child tickets as per 2.

@astrojuanlu Could you assist me with the lists? I have this big set of potential tools but need help deciding whether they belong in group 2 or 4, and what their priorities are.

  • Build-your-own <-- comparable (Kedro vs. X)
  • Cookiecutter <-- comparable (Kedro vs. X)
  • Dagster
  • dbt
  • DVC
  • Great Expectations <-- complementary (Kedro + X)
  • Hamilton <-- comparable (Kedro vs. X)
  • Intake
  • MLflow <-- complementary (Kedro + X)
  • Orchestration platforms (various) <-- complementary (Kedro + X)
  • Pachyderm
  • Ploomber
  • ZenML

@stichbury stichbury transferred this issue from kedro-org/kedro-devrel Sep 6, 2023
@stichbury stichbury removed this from Roadmap Sep 6, 2023
@stichbury stichbury added the Component: Documentation 📄 Issue/PR for markdown and API documentation label Sep 6, 2023
@stichbury stichbury changed the title from "Content on Kedro vs. other tools" to "Parent task: Content on Kedro vs. other tools" Sep 6, 2023
@astrojuanlu
Member

Let's start with MLflow, dbt, DVC. The other ones are smaller and can be tackled at a later stage I think.

@stichbury
Contributor

Could you help me categorise the others? MLflow isn't a comparable tool but a complementary one. I'll jot down which I think are which, and that'll help with deciding on the template for each type of article.

@astrojuanlu
Member

Note that MLflow now has MLflow Recipes (previously MLflow Pipelines), see https://mlflow.org/docs/latest/recipes.html, so it can be considered a comparable tool.

image

See also the official announcement https://www.databricks.com/blog/2022/06/29/introducing-mlflow-pipelines-with-mlflow-2-0.html

@stichbury
Contributor

Also adding smart notebooks viz https://deepnote.com/blog/jupyter-notebook-alternative and https://hex.tech/

@stichbury stichbury self-assigned this Nov 6, 2023
@astrojuanlu
Member

astrojuanlu commented Jan 22, 2024

Google's opinion:

image

So let's do:

  • MLflow
  • Airflow
  • dbt
  • DVC
  • Prefect
  • and maybe Dagster next

@merelcht
Member Author

Could we take some of the content that @NeroOkwa presented in his competitor analysis for this?

@astrojuanlu
Member

I think it's much better to focus first on "how to use Kedro and X" (#3012 (comment)) rather than "why to use Kedro instead of X/differences & similarities between Kedro and X" (@NeroOkwa's competitor analysis).

@astrojuanlu
Member

MLflow is done; Airflow is sufficiently covered in https://docs.kedro.org/en/stable/deployment/airflow.html

I'm shifting my focus to MLOps integrations for the next couple of months before coming back to this. Will add more details later.

@astrojuanlu
Member

Maybe Kedro and SQLMesh as an alternative to dbt?

image

(source https://juhache.substack.com/p/multi-engine-stacks-deserve-to-be)

@deepyaman
Member

Maybe Kedro and SQLMesh as an alternative to dbt?

image

(source https://juhache.substack.com/p/multi-engine-stacks-deserve-to-be)

Just to confirm, Kedro and SQLMesh as an alternative to dbt, or Kedro as an alternative to SQLMesh and dbt?

SQLMesh is already a direct competitor to dbt, so I think the latter makes sense. From the linked article:

If you think this could easily be run as a vanilla python function outside of SQLMesh: You’re right!

But what’s nice about SQLMesh is that you can add audits to run built-in data tests based on the pandas dataframe this returns.

I think Kedro could definitely be a great fit in these situations, or in general for Python projects, and we should push that. I like the approach of showing the similarities, but focusing on how you can get similar values while working with Python. If we can get it used in something like the above project, that would be amazing!

@astrojuanlu
Member

astrojuanlu commented Jul 30, 2024

Today we explored the possibility of showcasing how to use dlt with Kedro. Let's do it next.

@astrojuanlu
Member

And let's include Delta & Iceberg too, which aren't by any means similar tools but can be used alongside Kedro successfully.

@astrojuanlu astrojuanlu changed the title from "Parent task: Content on Kedro vs. other tools" to "Parent task: Content on Kedro vs complementary tools" Aug 1, 2024
@astrojuanlu
Member

astrojuanlu commented Aug 1, 2024

Summary: in the coming months let's document

  • Delta (& Apache Iceberg)
  • DVC
  • OpenTelemetry w/ Logfire
  • dlt

And for full clarity, we're focusing on complementary, and not competitive, tools for now. I think unbiased comparisons are very hard to get right and the onus should be on the user to do their due diligence and reach their own conclusions.

@deepyaman
Member

Today we explored the possibility of showcasing how to use dlt with Kedro. Let's do it next.

Issue documenting initial options: #4057

@astrojuanlu
Member

Adding DVC #2691

@astrojuanlu
Member

Tweaking the OpenTelemetry work item to explicitly include Logfire #3978


4 participants