DAGs can be mounted by using a `ConfigMap` or `git-sync`. This is best illustrated with an example of each, shown in the sections below.
link:example$example-configmap.yaml[role=include]
link:example$example-airflow-dags-configmap.yaml[role=include]
- The name of the configuration map.
- The name of the DAG (this is a renamed copy of the `example_bash_operator.py` from the Airflow examples).
- The volume backed by the configuration map.
- The name of the configuration map referenced by the Airflow cluster.
- The name of the mounted volume.
- The path of the mounted resource. Note that this should map to a single DAG.
- The resource has to be defined using `subPath`: this prevents the versioning of configuration map elements, which may conflict with how Airflow propagates DAGs between its components.
- If the mount path described above is anything other than the standard location (the default is `$AIRFLOW_HOME/dags`), then the location should be defined using the relevant environment variable (in Airflow this is `AIRFLOW__CORE__DAGS_FOLDER`).
Warning: If a DAG mounted via ConfigMap consists of modularized files, then using the standard location is mandatory, as Python will use this as a "root" folder when looking for referenced files.
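
The ConfigMap approach described by the callouts above can be pictured with a minimal sketch along the following lines. The resource names (`cm-dag`, `test_airflow_dag.py`, `airflow`), the `/dags` mount path, and the exact placement of `volumes`, `volumeMounts` and `envOverrides` within the AirflowCluster resource are assumptions made for illustration. Other required fields (image, credentials, remaining roles) are omitted, so this is a fragment rather than a deployable manifest; the included example files and the CRD reference for the operator version in use remain authoritative.

```yaml
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: cm-dag                         # name of the configuration map (illustrative)
data:
  test_airflow_dag.py: |               # name of the DAG file (illustrative)
    # DAG source goes here, e.g. a renamed copy of example_bash_operator.py
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  clusterConfig:
    volumes:
      - name: cm-dag                   # volume backed by the configuration map
        configMap:
          name: cm-dag                 # configuration map referenced by the cluster
    volumeMounts:
      - name: cm-dag                   # name of the mounted volume
        mountPath: /dags/test_airflow_dag.py   # maps to a single DAG
        subPath: test_airflow_dag.py   # avoids versioned ConfigMap elements
  webservers:
    roleGroups:
      default:
        envOverrides:
          # Only needed because /dags is not the standard location (assumed path)
          AIRFLOW__CORE__DAGS_FOLDER: "/dags"
```

In this sketch the override is shown on one role only; presumably every role that reads DAGs would need the same setting.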
The advantage of this approach is that a DAG can be provided "in-line", as it were. This becomes cumbersome when multiple DAGs are to be made available in this way, as each one has to be mapped individually. For multiple DAGs it is probably easier to expose them all via a mounted volume, which is shown below.
git-sync is a command that pulls a git repository into a local directory and is supplied as a sidecar container for use within Kubernetes. The Stackable implementation is a wrapper around this, such that the binary and image requirements are included in the Stackable Airflow product images and do not need to be specified or handled in the `AirflowCluster` custom resource. Internal details such as image names and volume mounts are handled by the operator, so that only the repository and synchronization details are required. An example of this usage is given in the next section.
link:example$example-airflow-gitsync.yaml[role=include]
- A `Secret` used for accessing database and admin user details (included here to illustrate where different credential secrets are defined).
- The git-sync configuration block that contains a list of git-sync elements.
- The repository that will be cloned (required).
- The branch name (defaults to `main`).
- The location of the DAG folder, relative to the synced repository root (required).
- The depth of syncing, i.e. the number of commits to clone (defaults to 1).
- The synchronisation interval in seconds (defaults to 20 seconds).
- The name of the `Secret` used to access the repository if it is not public. This should include two fields: `user` and `password` (which can be either a password, which is not recommended, or a GitHub token, as described here).
- A map of optional configuration settings that are listed in this configuration section (and the ones that follow on that link).
- An example showing how to specify a target revision (the default is HEAD). The revision can also be a tag or a commit, though this assumes that the target hash is contained within the number of commits specified by `depth`. If a tag or commit hash is specified, then git-sync will recognise that and not perform further cloning.
- Git-sync settings can be provided inline, although some of these (`--dest`, `--root`) are specified internally in the operator and will be ignored if provided by the user. Git-config settings can also be specified, although a warning will be logged if `safe.directory` is specified, as this is defined internally and should not be defined by the user.
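
As a rough illustration of the settings listed above, a git-sync block and its repository `Secret` might look like the sketch below. The field names (`dagsGitSync`, `repo`, `branch`, `gitFolder`, `depth`, `wait`, `credentialsSecret`, `gitSyncConf`) are assumed to match the operator version this page describes, and the repository URL, secret names and placeholder values are hypothetical. The included example file and the CRD reference should be treated as authoritative.

```yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: git-credentials                # hypothetical name
type: Opaque
stringData:
  user: "<git-user>"                   # placeholder
  password: "<token>"                  # placeholder; a token is preferable to a password
---
apiVersion: airflow.stackable.tech/v1alpha1
kind: AirflowCluster
metadata:
  name: airflow
spec:
  clusterConfig:
    credentialsSecret: airflow-credentials   # database and admin user details (hypothetical name)
    dagsGitSync:                       # a list, of which only one element is processed
      - repo: https://example.com/org/dags-repo.git   # hypothetical repository (required)
        branch: main                   # the default
        gitFolder: dags                # relative to the repository root (required)
        depth: 10                      # number of commits to clone (default 1)
        wait: 20                       # synchronisation interval in seconds (the default)
        credentialsSecret: git-credentials   # only needed for non-public repositories
        gitSyncConf:
          --rev: HEAD                  # a tag or commit must lie within "depth"
```

As above, this is a fragment: fields unrelated to git-sync (image, roles) are left out.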
Important: The example above shows a *list* of git-sync definitions, with a single element. This is to avoid breaking changes in future releases. Currently, only one such git-sync definition is considered and processed.
Note: git-sync can be used with DAGs that make use of Python modules, as Python will be configured to use the git-sync target folder as the "root" location when looking for referenced files. See the usage-guide/applying-custom-resources.adoc example for more details.