An open-sourced set of Airflow operators. Install them by placing them in your Airflow installation's plugins directory and restarting the Airflow processes. Tested with Airflow 1.10.2, Python 3.6, and DC/OS 1.13.7.
Featured in our blog post: Operationalization of Machine Learning Pipelines on Apache Mesos and Hadoop using Airflow
For deploying Metronome jobs on your D2iQ DC/OS cluster (formerly known as Mesosphere DC/OS).
The operator takes a Metronome job definition as a JSON parameter and works as depicted in the figure below:
1. It obtains an authorization token from the IAM REST API so that it is allowed to deploy containers on the DC/OS cluster.
2. It checks via the Metronome REST API whether the job already exists.
3. Depending on the response, it upserts (creates or updates) the job definition.
4. It starts the job.
5. It polls for the job status and, depending on the response, sets the Airflow task state to success or failed.
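For reference, a minimal Metronome job definition that could be passed as metronome_job_json might look like this (the id, Docker image, command, and resource values below are placeholders, not values from this repository):

{
  "id": "nightly-training",
  "run": {
    "cpus": 1,
    "mem": 1024,
    "disk": 0,
    "docker": {
      "image": "python:3.6"
    },
    "cmd": "python train_model.py"
  }
}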
MetronomeOperator(
    task_id="metronome_operator",
    metronome_job_json=YOUR_METRONOME_JOB_DEFINITION,
    dcos_http_conn_id="dcos_master",
    dcos_robot_user_name=Variable.get("robot_user_name_dcos"),
    dcos_robot_user_pwd=Variable.get("robot_user_pwd_dcos"),
    dag=dag,
    pool="markdown_metronome_job_pool",
    retries=3)
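For illustration only, the flow described above can be sketched with plain requests calls against the DC/OS IAM and Metronome REST APIs; the endpoint paths, status handling, and function name below are simplifying assumptions, not the operator's actual implementation:

import time

import requests


def run_metronome_job(dcos_url, user, password, job_json):
    # 1. Obtain an authorization token from the IAM REST API.
    login = requests.post(f"{dcos_url}/acs/api/v1/auth/login",
                          json={"uid": user, "password": password})
    login.raise_for_status()
    headers = {"Authorization": "token=" + login.json()["token"]}

    job_id = job_json["id"]
    jobs_url = f"{dcos_url}/service/metronome/v1/jobs"

    # 2. Check whether the job already exists and upsert its definition.
    if requests.get(f"{jobs_url}/{job_id}", headers=headers).ok:
        requests.put(f"{jobs_url}/{job_id}", json=job_json, headers=headers).raise_for_status()
    else:
        requests.post(jobs_url, json=job_json, headers=headers).raise_for_status()

    # 3. Start the job and remember the run id.
    run = requests.post(f"{jobs_url}/{job_id}/runs", headers=headers)
    run.raise_for_status()
    run_id = run.json()["id"]

    # 4. Poll the run status; raising makes the Airflow task fail,
    #    returning normally lets it succeed.
    while True:
        status = requests.get(f"{jobs_url}/{job_id}/runs/{run_id}",
                              headers=headers).json().get("status")
        if status == "SUCCESS":
            return
        if status == "FAILED":
            raise RuntimeError(f"Metronome job {job_id} failed")
        time.sleep(10)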
A modification of the existing Airflow HdfsSensor that replaces the HDFSHook with a new one based on a Python 3-compatible HDFS client, hdfscli. Requires hdfscli to be installed in Airflow's virtual environment.
NYHDFSSensor(
    task_id="training_data_sensor",
    filepaths=["/data/production/2020-05-13/_SUCCESS"],
    hdfs_conn_id="hdfs_conn_default",
    retries=1440,
    retry_delay=timedelta(minutes=1),
    timeout=0,
    dag=dag)
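As an illustration of the approach, a poke built on hdfscli (the hdfs PyPI package) can be as small as the sketch below; the function name and connection handling are simplified assumptions rather than the sensor's actual code:

from hdfs import InsecureClient


def poke_hdfs_paths(namenode_url, filepaths):
    # The NameNode WebHDFS URL would normally come from the Airflow
    # connection referenced by hdfs_conn_id, e.g. "http://namenode:50070".
    client = InsecureClient(namenode_url)
    # status(..., strict=False) returns None for missing paths, so the
    # sensor only succeeds once every expected file is present.
    return all(client.status(path, strict=False) is not None for path in filepaths)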