In this demo, we will create a simple ELT pipeline using Sling and Dagster.
We will use Sling to extract data from a SQLite database of OMOP CDM data called Eunomia, stored in data/src,
and load it into Parquet files at data/tgt.
We will then use Dagster to schedule the pipeline and monitor its runs.
We will follow the Dagster "Embedded ELT" documentation to create the pipeline. See detailed step-by-step instructions in how-to.md.
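The extract-and-load step above can be sketched as a Sling replication config. This is a minimal sketch, not the demo's actual file: the connection names (`EUNOMIA_SQLITE`, `LOCAL`), the table names, and the file paths are assumptions for illustration; check the Sling documentation for the exact keys.

```yaml
# replication.yaml — sketch of a SQLite -> Parquet replication
# EUNOMIA_SQLITE is an assumed connection name pointing at the Eunomia DB;
# LOCAL is Sling's built-in local-filesystem connection.
source: EUNOMIA_SQLITE
target: LOCAL

defaults:
  mode: full-refresh   # re-extract the whole table on each run

streams:
  person:
    object: file://data/tgt/person.parquet
  observation_period:
    object: file://data/tgt/observation_period.parquet
```

A config like this would be run with `sling run -r replication.yaml`.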
- Sling is a simple EL (extract-load) tool that can be used as a CLI or as a Python wrapper.
- Dagster is an asset-based orchestration tool, while Airflow is a task-based tool. Read more.
- Sling is still new, although version 1 is available. Watch its GitHub repository for updates.
- Dagster has role-based access control (RBAC), but only in Dagster Cloud. Airflow has RBAC built in.
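To make the asset-based model concrete, here is a sketch of how the Sling replication could be wired into Dagster with the embedded ELT integration (`dagster-embedded-elt`). The connection names, table names, and file paths are assumptions for illustration, not the demo's actual code; see the Dagster Embedded ELT docs for the authoritative API.

```python
# Sketch: orchestrating a Sling replication as Dagster assets.
# Assumes `dagster` and `dagster-embedded-elt` are installed; the
# connection name EUNOMIA_SQLITE and all paths are illustrative.
from dagster import Definitions
from dagster_embedded_elt.sling import (
    SlingConnectionResource,
    SlingResource,
    sling_assets,
)

# Same shape as a replication.yaml, expressed as a Python dict.
replication_config = {
    "source": "EUNOMIA_SQLITE",
    "target": "LOCAL",
    "streams": {
        "person": {"object": "file://data/tgt/person.parquet"},
    },
}

sling_resource = SlingResource(
    connections=[
        SlingConnectionResource(
            name="EUNOMIA_SQLITE",
            type="sqlite",
            instance="data/src/eunomia.sqlite",  # assumed path to the Eunomia DB
        ),
    ]
)

@sling_assets(replication_config=replication_config)
def eunomia_assets(context, sling: SlingResource):
    # Runs the replication and streams Sling's logs into the Dagster UI.
    yield from sling.replicate(context=context)

defs = Definitions(assets=[eunomia_assets], resources={"sling": sling_resource})
```

Each stream in the replication config shows up as its own asset in the Dagster UI, which is what makes scheduling and monitoring per-table runs straightforward.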
This demo uses SQLite for simplicity, but Sling can connect to many other databases, such as MS SQL Server, PostgreSQL, Oracle, and the major cloud databases. See the Sling documentation for details.
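Swapping the source database is mostly a matter of defining a different connection. As a sketch, a connection to another database might be declared in Sling's `env.yaml`; the names, host, and credentials below are placeholders, and the exact keys per database type are in the Sling connection docs.

```yaml
# env.yaml — sketch of Sling connections (values are placeholders)
connections:
  EUNOMIA_SQLITE:
    type: sqlite
    instance: data/src/eunomia.sqlite

  MY_POSTGRES:
    type: postgres
    host: localhost
    port: 5432
    database: omop
    user: my_user
    password: my_password
```

Pointing the replication's `source:` at `MY_POSTGRES` instead of `EUNOMIA_SQLITE` would then reuse the same streams against the new database.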
From the Parquet files, we can load the data into a data lake or, even better, a data lakehouse. Parquet files can be managed as Apache Iceberg tables via Dremio or Apache Spark.