This repository serves as a template for orchestrating Prefect and DBT within Docker containers. It is tailored as a starting point for data engineering projects, leveraging Prefect for workflow orchestration and DBT for data transformation. In this setup, DBT is configured to work with Google BigQuery, but the template can be easily adapted to other databases.
The project is designed for local development and single-server deployment but can be extended for Kubernetes cluster deployment if required.
- Docker
- Docker Compose
- Python 3.9 with conda/miniconda (for local development)
- Google Cloud Platform (GCP) account
- /dbt: DBT project folder
- /prefect: Prefect project folder
- /prefect/flows/src/<subfolder>: Prefect flow scripts
- docker-compose.yml: Docker Compose file for running Prefect and DBT in development
- docker-compose-prod.yml: Docker Compose file for running Prefect and DBT in production
- Makefile: Utility commands for running the project
-
Clone this repository:
git clone https://github.com/david-mcbacon/prefect-dbt-docker-template.git
cd prefect-dbt-docker-template
-
Create a new Google Cloud Platform project.
-
Enable the BigQuery API in GCP.
-
Create a new service account with the BigQuery Admin role.
-
Download the service account key:
- Save it as bq-credentials.json in the root of the project.
- Copy the same file to the dbt folder.
-
Create environment files:
- Create .env, dev-worker.env, and prod.env in the root of the project.
- Use the corresponding .example files as templates.
-
Configure DBT:
- Create profiles.yaml in the dbt folder using profiles.example.yml as a template.
-
Set up the Conda environment and install dependencies:
conda create -n prefect-dbt python=3.9 -y
conda activate prefect-dbt
pip install -r prefect/requirements.txt
-
Run the project locally:
make start-app-local
-
Access the Prefect UI:
- Open your browser and go to http://localhost:4200.
-
Deploy pre-built flows to Prefect:
make deploy-local
-
Verify deployed flows:
- Go to http://localhost:4200/deployments and confirm that your flow is deployed. You can run it manually or schedule it.
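If the local start or deploy step fails, a missing credentials or environment file is a likely cause. The sketch below checks for the files that the setup steps above ask you to create. It is not part of the template: the names `REQUIRED_FILES` and `missing_setup_files` are hypothetical, and the script assumes it is run from the project root.

```python
from pathlib import Path

# Files the setup steps ask you to create, relative to the project root.
REQUIRED_FILES = [
    "bq-credentials.json",
    "dbt/bq-credentials.json",
    ".env",
    "dev-worker.env",
    "prod.env",
    "dbt/profiles.yaml",
]


def missing_setup_files(project_root="."):
    """Return the required files that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_setup_files()
    if missing:
        print("Missing files:")
        for name in missing:
            print(f"  {name}")
    else:
        print("All setup files are in place.")
```

An empty result only means the files exist; it does not validate their contents.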
-
Ensure Prefect server and agent are running locally:
make start-app-local
-
Activate the Conda environment:
conda activate prefect-dbt
-
Set the Prefect API URL environment variable:
export PREFECT_API_URL=http://0.0.0.0:4200/api/
-
Run the flow:
python prefect/flows/src/pokemon/pokemon_elt.py
-
Activate the Conda environment:
conda activate prefect-dbt
-
Navigate to the dbt folder:
cd dbt
-
Check DBT configuration and connection (optional):
dbt debug
-
If everything is configured correctly, you should see:
Connection test: [OK connection ok]
All checks passed!
-
-
Run DBT models:
dbt run -s pokemons.pokemons
DBT uses the profiles.yaml file for database connection configuration. Create this file in the dbt folder, using profiles.example.yml as a template. The template includes both dev and prod configurations. The default target is dev, but you can switch to prod by changing the target parameter in profiles.yaml or directly in the DBT command.
To run the production configuration:
dbt run -s pokemons.pokemons --target prod
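If you trigger DBT from Python (for example, from a Prefect task), the same target switch can be passed on the command line. The following is a minimal sketch, not the template's own code: `dbt_run_command`, `run_dbt`, and the `project_dir` default are hypothetical names, and only the `-s` and `--target` flags shown in the commands above are used.

```python
import subprocess


def dbt_run_command(select, target=None):
    """Build the argument list for `dbt run`, optionally overriding the target."""
    cmd = ["dbt", "run", "-s", select]
    if target is not None:
        # Without --target, DBT falls back to the default target in the profile.
        cmd += ["--target", target]
    return cmd


def run_dbt(select, target=None, project_dir="dbt"):
    """Run DBT inside the project folder; raises CalledProcessError on failure."""
    return subprocess.run(dbt_run_command(select, target), cwd=project_dir, check=True)
```

Calling `run_dbt("pokemons.pokemons", target="prod")` would then be equivalent to the production command above, assuming `dbt` is on the PATH.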
Development vs. Production Configurations:
- Development (dev target): DBT uses the dev_ prefix for schema/dataset names.
- Production (prod target): No prefix is used; the dataset name remains as defined.
For example, if you run the dev target, the table will be created as dev_pokemons.pokemons. For prod, it will be pokemons.pokemons.
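The naming rule can be written out in plain Python to make the two cases explicit. This is only an illustration of the rule as described here, not the macro the project ships; `dataset_for_target` and `table_path` are hypothetical helper names.

```python
def dataset_for_target(base_name, target_name):
    """No prefix for the prod target; every other target gets the dev_ prefix."""
    if target_name == "prod":
        return base_name
    return f"dev_{base_name}"


def table_path(base_name, table, target_name):
    """Return the dataset-qualified table name for a given target."""
    return f"{dataset_for_target(base_name, target_name)}.{table}"


if __name__ == "__main__":
    print(table_path("pokemons", "pokemons", "dev"))
    print(table_path("pokemons", "pokemons", "prod"))
```

Note that under this rule any target other than prod is treated as development.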
Schema naming is configured in the dbt_project.yml file:
models:
  app:
    pokemons:
      dataset: "{{ 'pokemons' if target.name == 'prod' else 'dev_pokemons' }}"
Additionally, the generate_schema_name.sql macro in the macros folder automatically generates the schema name based on the target.
Recommended Setup:
- Local development: Use the dev target in profiles.yaml.
- Production: Use the prod target in profiles.yaml on the server.
DBT includes a built-in documentation UI. To run it locally:
cd dbt
dbt docs generate
dbt docs serve --port 8081
Then open http://localhost:8081 in your browser.
-
Ensure Prefect server and agent are running locally:
make start-app-local
-
Create a new flow script:
- Add the script in the prefect/flows/src/pokemon folder (e.g., new_flow.py).
- Define the main flow function (e.g., run_new_flow).
-
Register the new flow:
- Modify the deployment_dev.py file:
from src.pokemon.new_flow import run_new_flow

new_flow_dep = Deployment.build_from_flow(
    name="New Pokemon Flow",
    flow=run_new_flow,
    storage=minio_block.load("minio"),
    infrastructure=worker_infrastructure,
    work_queue_name="default",
    tags=["pokemon", "new_flow"],
    apply=True,
)
-
Deploy the flow:
make deploy-local
-
Verify deployment:
- Check the Prefect UI at http://localhost:4200/deployments to confirm the new flow is deployed.
-
Set up SSH access to your VPS server.
-
(Optional but recommended) Configure Nginx as a reverse proxy with authentication for the Prefect UI.
-
Clone the repository to the VPS server.
-
Prepare environment files:
- Ensure .env and prod.env are correctly configured on your local machine.
- Set SSH_USER, SSH_HOST, and SSH_PATH_TO_PROJECT environment variables.
-
Copy environment file to the server:
make copy-env-to-server
- This command lets you select the environment file to copy to the server and renames it to .env. Select prod.env for production.
-
Prepare necessary files on the server:
- Copy bq-credentials.json and profiles.yaml to the server in the appropriate locations.
-
Start Prefect on the server:
make start-app-prod
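The copy step above depends on SSH_USER, SSH_HOST, and SSH_PATH_TO_PROJECT being set, so it can help to confirm them before running the make targets. A small sketch, assuming the variables are read from the shell environment; `REQUIRED_SSH_VARS` and `missing_ssh_vars` are hypothetical names, not part of the template.

```python
import os

# Variables the production steps ask you to set.
REQUIRED_SSH_VARS = ("SSH_USER", "SSH_HOST", "SSH_PATH_TO_PROJECT")


def missing_ssh_vars(env=None):
    """Return the SSH variables that are unset or empty (defaults to os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SSH_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_ssh_vars()
    if missing:
        print("Set these variables first: " + ", ".join(missing))
    else:
        print("SSH variables are set.")
```

If the Makefile loads these values from an environment file rather than the shell, pass the parsed values as the `env` argument instead.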
-
Ensure Prefect server and agent are running on the server:
make start-app-prod
-
Create and register the new flow:
- Similar to local deployment, add the new flow to the deployment_prod.py file.
-
Commit and push changes to the GitHub repository.
-
Deploy to production:
-
Option 1: From your local machine:
make deploy-prod-from-local
-
Option 2: SSH into the server, navigate to the project folder, and run:
make git
make deploy-prod
-
-
Verify deployment:
- Check the Prefect UI at http://your-domain.com/deployments to confirm the new flow is deployed.
-
Local Development:
- Start with pure Python (without Docker or Prefect), develop and test functions, then create a main function for the script.
-
Integrate with Prefect:
- Add @task decorators to functions, and @flow to the main function (renamed as run_<filename>, e.g. run_pokemon_elt).
- Add Prefect logging (optional).
-
Run Prefect Locally:
- Start Prefect server locally and test the script from the terminal.
-
Deploy and Test Locally:
- Register the script in deployment_dev.py and deploy locally.
- Check the Prefect UI and run the flow manually from the UI.
-
Production Deployment:
- Register the script in deployment_prod.py, push changes to the repository, and deploy to production.
-
Deploy from Local to Production:
make deploy-prod-from-local
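The first workflow stage (pure Python, before Prefect is involved) can be sketched as below. The function names and the sample records are made up for illustration; a real script would fetch and load actual data. The comments mark where the `@task` and `@flow` decorators from the second stage would go.

```python
# Stage 1: plain Python, no Prefect or Docker required.
# The sample records and function names are illustrative placeholders.


def extract():
    """Return raw records; a real script would call an API or read a file here."""
    return [
        {"name": "Bulbasaur", "weight": 69},
        {"name": "Charmander", "weight": 85},
    ]


def transform(records):
    """Lowercase the names and keep only the fields that are needed."""
    return [{"name": r["name"].lower(), "weight": r["weight"]} for r in records]


def load(records):
    """Stand-in for the load step; returns the number of rows handled."""
    return len(records)


def run_new_flow():
    """Main function. In stage 2 it gets @flow and the helpers above get @task."""
    return load(transform(extract()))


if __name__ == "__main__":
    print(run_new_flow())
```

Keeping each step as a separate function makes the later move to Prefect mostly a matter of adding decorators.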