Demonstration of a simple ELT pipeline with Sling and Dagster

sidataplus/demo-dagster-sling

Demo Dagster Sling

In this demo, we will create a simple ELT pipeline using Sling and Dagster.

We will extract data from Eunomia, a SQLite database of OMOP CDM data stored in data/src, and load it into Parquet files in data/tgt with Sling. We will then use Dagster to schedule the pipeline and monitor its runs.

We will follow the Dagster documentation on Embedded ELT to create the pipeline. See how-to.md for detailed step-by-step instructions.
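
To make the extract-and-load step more concrete, here is a minimal sketch that lists the tables in a SQLite database and builds a Sling-style replication config (source, target, defaults, streams) for them. It uses only the Python standard library and does not call Sling itself. The connection names `EUNOMIA_SQLITE` and `LOCAL`, the `main.` schema prefix, and the `file://` object pattern are assumptions for illustration; check them against the Sling documentation and how-to.md before use.

```python
import json
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the user tables in a SQLite database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def build_replication(
    tables: list[str],
    source: str = "EUNOMIA_SQLITE",  # hypothetical Sling connection name
    target: str = "LOCAL",           # assumed name for the local file system
    target_dir: str = "data/tgt",
) -> dict:
    """Build a replication config with one stream per table."""
    return {
        "source": source,
        "target": target,
        "defaults": {
            "mode": "full-refresh",
            # Assumed pattern: one Parquet file per source table.
            "object": f"file://{target_dir}/{{stream_table}}.parquet",
        },
        # An empty dict means "no per-stream overrides".
        "streams": {f"main.{table}": {} for table in tables},
    }


if __name__ == "__main__":
    # Stand-in for the Eunomia database: two empty tables in memory.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (person_id INTEGER)")
    conn.execute("CREATE TABLE observation (observation_id INTEGER)")
    print(json.dumps(build_replication(list_tables(conn)), indent=2))
```

The resulting dictionary could be written out as a replication file for the Sling CLI or passed to the Python wrapper; which of the two the demo uses is described in how-to.md.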

Why Dagster & Sling

  • Sling is a simple EL tool that can be used from the CLI or through a Python wrapper.
  • Dagster is an asset-based orchestration tool, whereas Airflow is task-based. Read more
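
The asset-based idea can be illustrated with a toy sketch: you declare what each data asset is derived from, ask for an asset, and the orchestrator works out what has to run. This is not Dagster's API; it uses only `graphlib` from the standard library, and the asset names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical assets; each maps to the assets it is derived from.
ASSET_DEPS = {
    "eunomia_sqlite": set(),
    "person_parquet": {"eunomia_sqlite"},
    "observation_parquet": {"eunomia_sqlite"},
}


def plan_for(target: str, deps: dict[str, set[str]]) -> list[str]:
    """Return the assets needed to produce `target`, upstream first."""
    needed: set[str] = set()
    stack = [target]
    while stack:
        asset = stack.pop()
        if asset not in needed:
            needed.add(asset)
            stack.extend(deps[asset])
    # Sort only the relevant part of the graph.
    subgraph = {asset: deps[asset] for asset in needed}
    return list(TopologicalSorter(subgraph).static_order())


if __name__ == "__main__":
    print(plan_for("person_parquet", ASSET_DEPS))
```

In a task-based tool you would instead define the steps and wire their order by hand; here the order falls out of the declared dependencies, and unrelated assets such as `observation_parquet` are left out of the plan.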

Why not

  • Sling is still new, although version 1 is available. Watch for updates on GitHub.
  • Dagster offers role-based access control (RBAC), but only in Dagster Cloud. Airflow has RBAC.

What's next

This demo uses SQLite for simplicity. Sling can connect to many other databases, such as MS SQL Server, PostgreSQL, Oracle, and the major cloud databases. See the Sling documentation for more details.

The Parquet files can then be loaded into a data lake or, even better, a data lakehouse. Parquet files can be managed by Apache Iceberg via Dremio or Apache Spark.
