What is the difference with luigi? #660

Closed
pietermarsman opened this issue Jan 7, 2021 · 6 comments
Labels
documentation (Improvements or additions to documentation), question (Further information is requested)

Comments

@pietermarsman

Hi there,

I'm considering multiple tools to orchestrate a machine-learning training and deployment pipeline. So far I've looked extensively at Argo, Prefect, Luigi and Airflow.

I'm not sure what the added benefits of Flyte are compared to, e.g., Luigi. Can you add more documentation about how Flyte compares to other similar workflow managers?

@kumare3
Contributor

kumare3 commented Jan 7, 2021

@pietermarsman thank you for the question; this is one of the things we will be adding soon. I will let the Spotify folks (@honnix / @kanterov / @narape), who are now using Flyte extensively, answer in more detail. But here is a quick summary, IMO:

Model:
Luigi is a Python-based workflow engine that has data flow as a first-class citizen. The scheduler runs locally in the same process as the workflow. Thus no distributed fault tolerance or hosted experience is possible.
Flyte is a specification-based workflow engine. It borrows ideas from both Airflow and Luigi in terms of extensibility: you can easily add new extensions, much like Airflow operators. It is also a dataflow-based orchestrator that deeply understands the data flowing through the system. The big difference is that workflows can be authored in any language, are converted to a common protobuf-based specification, and are uploaded to the Flyte service. From then on, execution happens on a Kubernetes cluster. You can visualize the DAGs and execute them from a UI or a CLI.
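
To make the specification-based model concrete, here is a minimal sketch in plain Python. It is not Flyte's API or its protobuf schema: `TaskSpec`, `WorkflowSpec`, and `to_spec_bytes` are hypothetical names, the image names are made up, and JSON stands in for protobuf. It only illustrates the idea that the authoring language emits a language-neutral document that a separate service could store and execute.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TaskSpec:
    # Hypothetical task definition: a name, a container image, typed inputs and outputs.
    name: str
    image: str
    inputs: dict
    outputs: dict


@dataclass
class WorkflowSpec:
    name: str
    tasks: list = field(default_factory=list)
    # Edges are (upstream, downstream) task-name pairs describing the DAG.
    edges: list = field(default_factory=list)


def to_spec_bytes(wf: WorkflowSpec) -> bytes:
    """Serialize the workflow to a language-neutral byte string (JSON here, as an assumption)."""
    return json.dumps(asdict(wf), sort_keys=True).encode("utf-8")


wf = WorkflowSpec(
    name="train_pipeline",
    tasks=[
        TaskSpec("prepare", "example/prepare:v1", {"path": "string"}, {"rows": "integer"}),
        TaskSpec("train", "example/train:v1", {"rows": "integer"}, {"model": "blob"}),
    ],
    edges=[("prepare", "train")],
)
spec = to_spec_bytes(wf)

# Any other language could decode the same bytes without importing this Python code.
decoded = json.loads(spec)
```

Because the specification is plain data, the service that receives it does not need to know which SDK produced it.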

Scale:
Flyte is a distributed scheduler that schedules pipelines through a variety of backend plugins: on Kubernetes (pods, containers, Spark jobs, TensorFlow training, etc.) and on hosted services such as EMR, Databricks, and AWS Batch. The scheduler is fault tolerant and resilient to machine crashes. At Lyft we run more than a million pipelines a month and have more than 10k unique pipelines.

Versioning:
Flyte versions all workflows and tasks (a task is an individual unit of execution in a workflow; the workflow is the DAG). So you can go back in time to any execution, retrieve its outputs, and run it again immediately (as long as the containers are available).
All inputs and outputs are also cached and can be retrieved from one central API using the various clients or the UI.

Match the developer's workflow:
Users can write a single task and execute it on its own; once happy, they can combine multiple tasks into a workflow and attach multiple schedules to that workflow. Developers can be notified on success or failure of a workflow or task.

Easy GitOps:
At Lyft, for every new commit to a repo, we build all the workflows and tasks in that repository, which makes it possible to refer back to the code. This does not mean you have to build containers: for quick iteration we have a fast-register mode that makes it possible to iterate on the code directly from a laptop in a matter of seconds.

Catalog Caching and Lineage tracking:
Every execution is recorded, along with its intermediate steps. Thus if the same execution is observed again (same version of the code and identical inputs, for deterministic algorithms), Flyte will reuse the outputs from the previous execution. This makes it possible to fix bugs in a DAG without having to redo all of the computation.

UI/CLI/SDKs:
The Python SDK or the Java SDK can be used to author workflows and tasks.
Here are some examples of using the Python SDK: https://flytecookbook.readthedocs.io/en/latest/auto_recipes/index.html
The UI automatically generates a form to handle all inputs, because we natively understand the data.
You can also use the CLI to interact with all your workflows.

Write your own plugins (Flytekit Python or backend):
Flytekit allows you to write new operators in Python that users can use. You can also extend the Flyte backend to add platform-level capabilities; this makes it possible to write global extensions within a company, which can be deployed without having to fix the libraries.

Backend:
The scheduler and backend are all written in Go.

@kumare3
Contributor

kumare3 commented Jan 7, 2021

Also, please join our Slack channel here, and feel free to ping me so we can have a longer discussion.

@kumare3
Contributor

kumare3 commented Jan 10, 2021

Also look at https://flytecookbook.readthedocs.io/en/latest/ to see the new Python programming model.

@honnix
Member

honnix commented Jan 11, 2021

@kumare3 Thanks for answering.

Luigi is a Python-based workflow engine that has data flow as a first-class citizen. The scheduler runs locally in the same process as the workflow.

Small corrections here.

  • If data flow here refers to Google Dataflow, Luigi doesn't do anything special for it; if data flow refers to the generic concept, Luigi is task-centric: a workflow is merely a DAG of tasks that can be traversed/deduced at runtime, and Luigi doesn't have the workflow as a physical entity/model.
  • Luigi has an optional global scheduler that ensures the same task (identified by name + input values) is not executed more than once simultaneously. There is some documentation on this: https://luigi.readthedocs.io/en/latest/central_scheduler.html

And yes, Luigi runs all tasks deduced from an entry task in the same process (or on the same computer to be more precise, because multiple processes could be forked by Luigi to run tasks in parallel).
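
The task-centric model described in the points above can be sketched in a few lines of plain Python. This does not use Luigi itself: the `Task` base class, the `build` function, and the example tasks are hypothetical stand-ins that only mimic the idea of declaring dependencies per task, deducing the DAG from an entry task at runtime, and identifying a task by its name plus parameter values.

```python
class Task:
    """Hypothetical base class; a task's identity is its class name plus parameter values."""

    def __init__(self, **params):
        self.params = params

    @property
    def task_id(self):
        return (type(self).__name__, tuple(sorted(self.params.items())))

    def requires(self):
        return []

    def run(self):
        raise NotImplementedError


executed = []  # order in which tasks actually ran


class Extract(Task):
    def run(self):
        executed.append("Extract")


class Clean(Task):
    def requires(self):
        return [Extract(**self.params)]

    def run(self):
        executed.append("Clean")


class Report(Task):
    def requires(self):
        # Both dependencies lead back to the same Extract task.
        return [Clean(**self.params), Extract(**self.params)]

    def run(self):
        executed.append("Report")


def build(entry, done=None):
    """Depth-first walk from the entry task; each task_id runs at most once."""
    done = set() if done is None else done
    if entry.task_id in done:
        return done
    for dep in entry.requires():
        build(dep, done)
    entry.run()
    done.add(entry.task_id)
    return done


finished = build(Report(date="2021-01-07"))
```

There is no workflow object anywhere in this sketch: the DAG exists only as the set of tasks reachable from `Report`, which is the point made above about Luigi having no physical workflow entity.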

@kumare3
Contributor

kumare3 commented Jan 29, 2021

@pietermarsman does this help?

kumare3 added the documentation and question labels Feb 8, 2021
@kumare3
Contributor

kumare3 commented Feb 23, 2021

@pietermarsman I am closing this issue for now. Thank you. Please re-open it if you want any more clarification.

kumare3 closed this as completed Feb 23, 2021