Skip to content

Latest commit

 

History

History
51 lines (39 loc) · 3.71 KB

mounting-dags.adoc

File metadata and controls

51 lines (39 loc) · 3.71 KB

Mounting DAGs

DAGs can be mounted by using a ConfigMap or a PersistentVolumeClaim. This is best illustrated with an example of each, shown in the next section.

via ConfigMap

link:example$example-configmap.yaml[role=include]
link:example$example-airflow-dags-configmap.yaml[role=include]
  1. The name of the configuration map

  2. The name of the DAG (this is a renamed copy of the example_bash_operator.py from the Airflow examples)

  3. The volume backed by the configuration map

  4. The name of the configuration map referenced by the Airflow cluster

  5. The name of the mounted volume

  6. The path of the mounted resource. Note that should map to a single DAG.

  7. The resource has to be defined using subPath: this is to prevent the versioning of configuration map elements which may cause a conflict with how Airflow propagates DAGs between its components.

  8. If the mount path described above is anything other than the standard location (the default is $AIRFLOW_HOME/dags), then the location should be defined using the relevant environment variable.

The advantage of this approach is that a DAG can be provided "in-line", as it were. This becomes cumbersome when multiple DAGs are to be made available in this way, as each one has to be mapped individually. For multiple DAGs it is probably easier to expose them all via a mounted volume, which is shown below.

via git-sync

Overview

git-sync is a command that pulls a git repository into a local directory and is supplied as a sidecar container for use within Kubernetes. The Stackable implementation is a wrapper around this such that the binary and image requirements are included in the Stackable Airflow product images and do not need to be specified or handled in the AirflowCluster custom resource. Internal details such as image names and volume mounts are handled by the operator, so that only the repository and synchronisation details are required. An example of this usage is given in the next section.

Example

link:example$example-airflow-gitsync.yaml[role=include]
  1. A Secret used for accessing database and admin user details (included here to illustrate where different credential secrets are defined)

  2. The git-gync configuration block that contains list of git-sync elements

  3. The repository that will be cloned (required)

  4. The branch name (defaults to main)

  5. The location of the DAG folder, relative to the synced repository root (required)

  6. The depth of syncing i.e. the number of commits to clone (defaults to 1)

  7. The synchronisation interval in seconds (defaults to 20 seconds)

  8. The name of the Secret used to access the repository if it is not public. This should include two fields: user and password (which can be either a password - which is not recommended - or a github token, as described here)

  9. A map of optional configuration settings that are listed in this configuration section (and the ones that follow on that link)

  10. An example showing how to specify a target revision (the default is HEAD). The revision can also a be tag or a commit, though this assumes that the target hash is contained within the number of commits specified by depth. If a tag or commit hash is specified, then git-sync will recognise that and not perform further cloning.

Important
The example above shows a *list* of git-sync definitions, with a single element. This is to avoid breaking-changes in future releases. Currently, only one such git-sync definition is considered and processed.