In this repository, I have built a simple Airflow DAG to learn the fundamentals of working with Airflow. I installed Airflow on an EC2 Ubuntu instance and configured it to use a MySQL backend.
The DAG has the following features:
- File sensor on S3 bucket
- XCOM variable
- Variables passed from Admin Screen
- Branching
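As a rough illustration of how the XCom and branching features fit together, here is a minimal sketch of a branch callable. It is not the exact code from my DAG: the task ids `count_records`, `process_file`, and `skip_processing` are hypothetical names chosen for the example. The function is meant to be passed as `python_callable` to a `BranchPythonOperator` (on Airflow 1.x the operator also needs `provide_context=True` so that the context reaches the callable).

```python
def choose_branch(**context):
    """Return the task_id of the branch that should run next."""
    # "ti" is the TaskInstance that Airflow places in the task context.
    ti = context["ti"]
    # Pull the return value that an upstream task pushed to XCom.
    # "count_records" is a hypothetical upstream task id.
    record_count = ti.xcom_pull(task_ids="count_records")
    if record_count:
        return "process_file"    # hypothetical task id: there is work to do
    return "skip_processing"     # hypothetical task id: nothing to process
```

Airflow then runs only the task whose id is returned and skips the other downstream branch.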
- Create a .aws folder in your home directory
- Create a file called credentials inside it
- Add the following information to the file:
[default]
aws_access_key_id=[AWS_ACCESS_KEY]
aws_secret_access_key=[AWS_SECRET_KEY]
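To double check that the credentials file is laid out correctly, a small script like the one below can help. This is only a sketch using the standard library `configparser`; the helper name `read_default_profile` is my own invention and not part of any AWS tooling.

```python
import configparser
from pathlib import Path


def read_default_profile(path=None):
    """Parse an AWS credentials file and return the [default] profile as a dict."""
    if path is None:
        path = Path.home() / ".aws" / "credentials"
    # Disable interpolation so characters such as "%" in a key are read verbatim.
    parser = configparser.ConfigParser(interpolation=None)
    # parser.read() silently ignores missing files, so check the section explicitly.
    parser.read(path)
    if not parser.has_section("default"):
        raise ValueError(f"no [default] profile found in {path}")
    profile = dict(parser["default"])
    required = ("aws_access_key_id", "aws_secret_access_key")
    missing = [key for key in required if key not in profile]
    if missing:
        raise ValueError(f"missing keys in [default]: {missing}")
    return profile
```

Calling `read_default_profile()` with no arguments checks the file in the home directory and raises a `ValueError` if the profile or either key is absent.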
In the Airflow web console:
- Click on Admin
- Click on Variables
- Add your variables
- In my demo, I have added variables called s3_bucket, s3_file, and s3_file_trigger
- Pull them into the program using:
from airflow.models import DAG, Variable
s3_bucket = Variable.get("s3_bucket")
s3_file = Variable.get("s3_file")
s3_file_trigger = Variable.get("s3_file_trigger")
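Because `Variable.get` raises a `KeyError` when a name does not match what was entered on the Admin screen, it can be handy to keep the names in one place. The helper below is a sketch under that assumption; `load_s3_settings` and its `getter` parameter are hypothetical and exist only so the lookup can be tried without a running Airflow instance.

```python
# The three variable names created on the Admin > Variables screen.
S3_VARIABLE_NAMES = ("s3_bucket", "s3_file", "s3_file_trigger")


def load_s3_settings(getter=None):
    """Return a dict of the S3 variables, looked up one by one with getter."""
    if getter is None:
        # Imported lazily so this module can be loaded without Airflow installed.
        from airflow.models import Variable
        getter = Variable.get
    # A misspelled or missing name surfaces as a KeyError from the getter.
    return {name: getter(name) for name in S3_VARIABLE_NAMES}
```

Inside a DAG file, `load_s3_settings()` would read from Airflow; in a quick local check, any callable such as `some_dict.__getitem__` can stand in for the getter.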
By default, Airflow uses a SQLite database as its backend. SQLite works well for small tests and proof-of-concept projects, but production needs a more robust database. SQLite also does not support parallel task execution, so we need to switch to a different database to unlock parallel processing.
sudo apt-get update
sudo apt install mysql-server
Perform the security setup as needed.
I created a new user called airflow-user, which I will use to access the database. I also created a database called airflow.
If the airflow initdb command fails with the error
Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
then we need to run the following on our MySQL instance:
set global explicit_defaults_for_timestamp = 1;
refer here for more details about the error
To configure Airflow to use our MySQL database, we need to install the driver modules so that Python can connect to the database.
NOTE: If you are using Python 3, install mysqlclient instead of MySQLdb, as MySQLdb is not supported on Python 3.x.
click here for reference
# we will need to install the prerequisites first
sudo apt-get install libmysqlclient-dev
# install mysql client
sudo pip3 install mysqlclient
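A quick way to confirm the driver installed correctly is to check that Python can find it. Note that the mysqlclient package is imported under the module name `MySQLdb`. The helper name `driver_available` below is just an illustrative choice.

```python
import importlib.util


def driver_available(module_name="MySQLdb"):
    """Return True if the named module can be found by the current interpreter."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("MySQL driver found" if driver_available() else "MySQL driver missing")
```

Run it with the same python3 interpreter that Airflow uses, since a driver installed for a different interpreter will not be visible.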
We will need to change the following settings in the airflow.cfg file:
sql_alchemy_conn = mysql://airflow-user:your-password@localhost:3306/airflow
executor = LocalExecutor
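If the database password contains characters such as @ or /, they need to be percent-encoded before they go into the connection string. The snippet below is a small sketch for building the value of sql_alchemy_conn; the function name `build_sql_alchemy_conn` is hypothetical, and the defaults simply mirror the settings shown above.

```python
from urllib.parse import quote_plus


def build_sql_alchemy_conn(user, password, host="localhost", port=3306, database="airflow"):
    """Build a mysql:// connection URL with the user and password percent-encoded."""
    # quote_plus escapes reserved characters (for example "@" and "/") so they
    # are not mistaken for URL delimiters when the string is parsed.
    return f"mysql://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Prints: mysql://airflow-user:your-password@localhost:3306/airflow
    print(build_sql_alchemy_conn("airflow-user", "your-password"))
```

For a plain password the result is identical to the hand-written string; the encoding only matters when special characters are present.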
references: