What is the difference with luigi? #660

pietermarsman · 2021-01-07T15:32:50Z

Hi there,

I'm considering multiple tools to orchestrate a machine-learning training and deployment pipeline. Up until now I've been looking at Argo, Prefect, Luigi and Airflow extensively.

I'm not sure what the added benefits of Flyte are compared to, e.g., Luigi. Can you add more documentation about how Flyte compares to other similar workflow managers?

kumare3 · 2021-01-07T19:47:19Z

@pietermarsman, thank you for the question; this is one of the things we will be adding soon. I will let the Spotify folks (@honnix / @kanterov / @narape), who are now using Flyte extensively, answer in more detail. But here is a quick summary, IMO:

Model:
Luigi is a Python-based workflow engine that has data flow as a first-class citizen. The scheduler runs locally in the same process as the workflow. Thus, distributed fault tolerance or a hosted experience is not possible.
Flyte is a specification-based workflow engine. It borrows ideas from both Airflow and Luigi in terms of extensibility: you can easily add new extensions, like AirflowOperators. It is also a dataflow-based orchestrator that deeply understands the data flowing through the system. The big difference is that workflows can be authored in any language, are converted to a common protobuf-based specification, and are uploaded to the Flyte service. From then on, execution happens on a Kubernetes cluster. You can visualize the DAGs and execute them from a UI or a CLI.
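As a rough illustration of what "specification based" means, here is a minimal Python sketch. It assumes a hypothetical, heavily simplified spec (`TaskSpec`, `NodeSpec`, `WorkflowSpec` are not Flyte's real types) serialized to JSON as a stand-in for Flyte's protobuf specification. Once any SDK has emitted such a spec, the orchestrator can recover the DAG from the data bindings alone, without running user code.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TaskSpec:
    # Hypothetical, simplified stand-in for a registered task definition.
    name: str
    version: str
    inputs: dict   # input name -> type name
    outputs: dict  # output name -> type name


@dataclass
class NodeSpec:
    node_id: str
    task: str
    # Maps each task input to its source: "wf.<input>" for a workflow input,
    # or "<node_id>.<output>" for an upstream node's output.
    bindings: dict


@dataclass
class WorkflowSpec:
    name: str
    version: str
    tasks: list = field(default_factory=list)
    nodes: list = field(default_factory=list)

    def to_json(self) -> str:
        # Language-neutral serialization (JSON here; Flyte itself uses protobuf).
        return json.dumps(asdict(self), sort_keys=True)

    def edges(self) -> list:
        # The DAG is derived purely from data bindings; no user code runs here.
        found = set()
        for node in self.nodes:
            for source in node.bindings.values():
                producer = source.split(".", 1)[0]
                if producer != "wf":
                    found.add((producer, node.node_id))
        return sorted(found)


spec = WorkflowSpec(
    name="training_pipeline",
    version="abc123",
    tasks=[
        TaskSpec("train", "abc123", {"data_path": "str"}, {"model": "blob"}),
        TaskSpec("evaluate", "abc123", {"model": "blob"}, {"accuracy": "float"}),
    ],
    nodes=[
        NodeSpec("n0", "train", {"data_path": "wf.data_path"}),
        NodeSpec("n1", "evaluate", {"model": "n0.model"}),
    ],
)
print(spec.edges())  # [('n0', 'n1')]
```

Because the spec is plain data, a Java or Go SDK could emit the same structure and the backend would treat it identically.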

Scale:
Flyte is a distributed scheduler and schedules pipelines using a variety of backend plugins: on k8s (pods, containers, Spark jobs, TensorFlow training, etc.) and on other hosted services like EMR, Databricks, AWS Batch, etc. The scheduler is fault tolerant and resilient to machine crashes and similar failures. At Lyft we run more than a million pipelines a month and have more than 10k unique pipelines.

Versioning:
Flyte versions all workflows and tasks (a task is an individual unit of execution in a workflow; a workflow is the DAG). So you can go back in time to any execution, retrieve its outputs, and rerun it immediately (as long as the containers are still available).
All inputs and outputs are also stored and can be retrieved from one central API using the various clients or the UI.
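A minimal sketch of the versioning idea, assuming a hypothetical in-memory `Registry` (not Flyte's API): each (name, version) pair is registered once and never overwritten, and every execution records the exact version and inputs it used, so a past execution's outputs can be looked up and the same run reproduced later.

```python
class Registry:
    """Hypothetical in-memory model of versioned entities and recorded executions."""

    def __init__(self):
        self._entities = {}    # (name, version) -> callable
        self._executions = {}  # execution id -> record

    def register(self, name, version, fn):
        key = (name, version)
        if key in self._entities:
            # Registered versions are immutable; a code change needs a new version.
            raise ValueError(f"{name}@{version} is already registered")
        self._entities[key] = fn

    def execute(self, exec_id, name, version, **inputs):
        outputs = self._entities[(name, version)](**inputs)
        self._executions[exec_id] = {
            "entity": (name, version),
            "inputs": dict(inputs),
            "outputs": outputs,
        }
        return outputs

    def outputs_of(self, exec_id):
        return self._executions[exec_id]["outputs"]

    def rerun(self, old_exec_id, new_exec_id):
        # Re-run the exact version and inputs recorded for a past execution.
        record = self._executions[old_exec_id]
        name, version = record["entity"]
        return self.execute(new_exec_id, name, version, **record["inputs"])


reg = Registry()
reg.register("scale", "v1", lambda x: x * 2)
reg.register("scale", "v2", lambda x: x * 3)
reg.execute("exec-1", "scale", "v1", x=10)
reg.execute("exec-2", "scale", "v2", x=10)
print(reg.outputs_of("exec-1"), reg.rerun("exec-1", "exec-3"))  # 20 20
```

Even after `v2` exists, `exec-1` still points at `v1`, which is what makes "going back in time" to an old execution possible.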

Match the developer's workflow:
Users can write a single task, execute it on its own and, once happy, combine multiple tasks into a workflow, which can then have multiple schedules. Developers can be notified on success or failure of a workflow or task.

Easy GitOps:
At Lyft, for every new commit to a repo, we build all the workflows and tasks in that repository, which makes it possible to refer back to the code. This does not mean you have to build containers for every change: for quick iteration we have a fast-register mode that makes it possible to iterate on the code directly from a laptop in a matter of seconds.

Catalog Caching and Lineage tracking:
Every execution, including its intermediate steps, is recorded. Thus, if the same execution is observed again (same version of code, identical inputs, and assuming deterministic algorithms), Flyte will reuse outputs from the previous execution. This makes it possible to fix bugs in a DAG without having to redo all of the computation.
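The reuse described above is essentially memoization keyed by task name, code version and inputs. The sketch below assumes a hypothetical `OutputCache` class and uses a SHA-256 hash of canonical JSON as the input fingerprint; it illustrates the idea and is not how Flyte's catalog is actually implemented. Fixing the second step of a two-step pipeline bumps only that step's version, so the first step is served from the cache.

```python
import hashlib
import json


class OutputCache:
    """Hypothetical memoization store keyed by task name, code version and inputs."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def key(task_name, version, inputs):
        # Canonical JSON so that {"a": 1, "b": 2} and {"b": 2, "a": 1} hash the same.
        canonical = json.dumps(inputs, sort_keys=True)
        return (task_name, version, hashlib.sha256(canonical.encode()).hexdigest())

    def run(self, task_name, version, fn, **inputs):
        k = self.key(task_name, version, inputs)
        if k in self._store:
            self.hits += 1
            return self._store[k]
        self.misses += 1
        result = fn(**inputs)
        self._store[k] = result
        return result


def pipeline(cache, clean_version, clean_fn):
    # Step 1 is unchanged between runs; step 2 is the one being fixed.
    raw = cache.run("load", "v1", lambda n: list(range(n)), n=5)
    return cache.run("clean", clean_version, clean_fn, rows=raw)


cache = OutputCache()
pipeline(cache, "v1", lambda rows: [r for r in rows if r > 0])  # buggy first attempt
fixed = pipeline(cache, "v2", lambda rows: [r for r in rows if r >= 0])
print(fixed, cache.hits, cache.misses)  # [0, 1, 2, 3, 4] 1 3
```

This only pays off for deterministic steps: if a step's output can differ for identical inputs, reusing a cached result silently changes behavior.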

UI/CLI/SDKs:
The Python SDK or the Java SDK can be used to author workflows and tasks.
Here are some examples of using the Python SDK: https://flytecookbook.readthedocs.io/en/latest/auto_recipes/index.html
The UI automatically generates a form to handle all inputs, because we natively understand the data.
You can also use the CLI to interact with all your workflows.

Write your own plugins (Flytekit Python or backend):
Flytekit lets you write new operators in Python that users can then use. You can also extend the Flyte backend to add platform-level capabilities; this makes it possible to write company-wide extensions that can be deployed without having to change the client libraries.

Backend:
The scheduler and backend are written entirely in Golang.

kumare3 · 2021-01-07T19:48:12Z

Also, please join our Slack channel here and feel free to ping me; we can have a longer discussion.

kumare3 · 2021-01-10T18:14:47Z

Also, take a look at https://flytecookbook.readthedocs.io/en/latest/ to see the new Python programming model.

honnix · 2021-01-11T10:53:17Z

@kumare3 Thanks for answering.

"Luigi is a Python-based workflow engine that has data flow as a first-class citizen. The scheduler runs locally in the same process as the workflow."

A couple of small corrections here:

If "data flow" here refers to Google Dataflow, Luigi doesn't do anything special for it; if it refers to the generic concept, Luigi is task-centric: a workflow is merely a DAG of tasks that is traversed/deduced at runtime, and Luigi has no workflow as a physical entity/model.
Luigi has an optional global scheduler that ensures the same task (identified by name + input values) is not executed more than once simultaneously. Some documentation on this is here: https://luigi.readthedocs.io/en/latest/central_scheduler.html

And yes, Luigi runs all tasks deduced from an entry task in the same process (or, more precisely, on the same computer, because Luigi can fork multiple processes to run tasks in parallel).
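The task-centric model and the central scheduler's de-duplication can be sketched in plain Python. This is a simplified illustration, not Luigi's API: real Luigi tasks subclass `luigi.Task` and declare dependencies in `requires()`, whereas `Task`, `deduce_run_order` and `CentralScheduler` here are hypothetical names. A task's identity is its name plus parameter values, so a dependency reached along two paths is scheduled once, and the scheduler refuses to start a task id that is already running.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a Luigi-style task: identity = name + parameters."""
    name: str
    params: tuple = ()    # e.g. (("date", "2021-01-07"),)
    requires: tuple = ()  # upstream Task objects

    @property
    def task_id(self):
        return (self.name, self.params)


def deduce_run_order(entry):
    """Walk requirements from the entry task at runtime (there is no stored
    workflow object) and return task ids with dependencies first."""
    order, seen = [], set()

    def visit(task):
        if task.task_id in seen:
            return  # same name + params reached twice: schedule it once
        seen.add(task.task_id)
        for dep in task.requires:
            visit(dep)
        order.append(task.task_id)

    visit(entry)
    return order


class CentralScheduler:
    """Hypothetical model of the optional central scheduler's guarantee:
    the same task id never runs twice at the same time."""

    def __init__(self):
        self._running = set()

    def try_start(self, task_id):
        if task_id in self._running:
            return False
        self._running.add(task_id)
        return True

    def finish(self, task_id):
        self._running.discard(task_id)


day = (("date", "2021-01-07"),)
extract = Task("Extract", day)
clean = Task("Clean", day, requires=(extract,))
features = Task("Features", day, requires=(extract,))
train = Task("Train", day, requires=(clean, features))
print([name for name, _ in deduce_run_order(train)])
# ['Extract', 'Clean', 'Features', 'Train']
```

Note that the "workflow" exists only as the traversal from `train`; nothing like Flyte's registered workflow entity is created.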

kumare3 · 2021-01-29T18:24:19Z

@pietermarsman does this help?

kumare3 · 2021-02-23T21:31:52Z

@pietermarsman I am closing this issue for now. Thank you. Please reopen if you want any more clarification.


kumare3 added the documentation (Improvements or additions to documentation) and question (Further information is requested) labels Feb 8, 2021

kumare3 closed this as completed Feb 23, 2021

eapolinario pushed a commit to eapolinario/flyte that referenced this issue Dec 20, 2022

Added reuse able workflow (flyteorg#660)

f962bfc

Signed-off-by: Yuvraj <code@evalsocket.dev>

hamersaw added a commit that referenced this issue Feb 18, 2025

update action/cache@v3 (#660)

7c71fbe

Signed-off-by: Daniel Rammer <hamersaw@protonmail.com>
