Merge pull request #1 from InformaticsMatters/m2ms-1486
Maintenance release
alanbchristie authored Sep 21, 2024
2 parents a285db1 + c77313a commit eca489e
Showing 9 changed files with 98 additions and 47 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -1,5 +1,5 @@
.idea/
venv/

parameters.yaml
parameters*.yaml
vault-pass.txt
67 changes: 38 additions & 29 deletions README.md
@@ -7,6 +7,13 @@
Ansible playbooks for the Kubernetes-based execution of [fragmentor]
**Playbooks**.

This repository's `site-player` play launches a _player_ Pod in your
Kubernetes cluster. The player Pod can run each stage of our fragmentation
process: it understands how to run the `standardise`, `fragment`, `inchi`, and
`extract` playbooks (in our [fragmentor] repository). The play _injects_ your
parameter, kubeconfig and nextflow files into the player, which then runs the
fragmentor playbook you name.
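
As a rough sketch (a full worked example, including the required environment
variables, appears below), a typical invocation names the play to run and the
files to inject; the file paths here are purely illustrative: -

ansible-playbook site-player.yaml \
    -e fp_play=standardise \
    -e fp_kubeconfig_file=$HOME/.kube/config \
    -e fp_parameter_file=parameters.yaml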

Before you attempt to execute any fragmentation plays...

1. You will need a Kubernetes cluster with a ReadWriteMany storage class
@@ -38,6 +45,9 @@ Before you attempt to execute any fragmentation plays...
fragmentation/graph data.
11. You will need your Kubernetes config file.
12. You will need AWS credentials (that allow for bucket access).
13. You will need to be able to run `kubectl` from the command line,
as the `site-player` play uses it to obtain the cluster host and its IP address.
So ensure that `KUBECONFIG` is set appropriately (see the example below).
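
For example, a quick check that the `kubectl` prerequisite is satisfied, using
the same query the play itself runs: -

export KUBECONFIG=~/.kube/config
kubectl config view --minify --output 'jsonpath={.clusters[0].cluster.server}'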

## Kubernetes namespace setup
You can conveniently create the required namespace and database using our
@@ -49,11 +59,11 @@ You can conveniently create the required namespace and database using our

Start from the project root of a clone of the repository: -

$ python -m venv venv
python -m venv venv

$ source venv/bin/activate
$ pip install --upgrade pip
$ pip install -r requirements.txt
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

...and create the database and corresponding namespace using an Ansible
YAML-based parameter file. Here's an example that should work for 'small'
@@ -84,13 +94,13 @@ pg_mem_limit: 4Gi

You will need to set a few Kubernetes variables...

$ export K8S_AUTH_HOST=https://example.com
$ export K8S_AUTH_API_KEY=1234
$ export K8S_AUTH_VERIFY_SSL=no
export K8S_AUTH_HOST=https://example.com
export K8S_AUTH_API_KEY=1234
export K8S_AUTH_VERIFY_SSL=no

Then run the playbook...

$ ansible-playbook site.yaml -e @parameters.yaml
ansible-playbook site.yaml -e @parameters.yaml
[...]

## Running a fragmentor play
@@ -115,30 +125,30 @@ To run a play you must define a set of play-specific parameters in the local file

Start from a virtual environment: -

$ python -m venv venv
python -m venv venv

$ source venv/bin/activate
$ pip install --upgrade pip
$ pip install -r requirements.txt
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

As always, set a few key environment parameters: -

$ export K8S_AUTH_HOST=https://example.com
$ export K8S_AUTH_API_KEY=?????
$ export K8S_AUTH_VERIFY_SSL=no
export K8S_AUTH_HOST=https://example.com
export K8S_AUTH_API_KEY=?????
export K8S_AUTH_VERIFY_SSL=no

$ export KUBECONFIG=~/.kube/config
export KUBECONFIG=~/.kube/config

For access to AWS S3: -

$ export AWS_ACCESS_KEY_ID=?????
$ export AWS_SECRET_ACCESS_KEY=?????
export AWS_ACCESS_KEY_ID=?????
export AWS_SECRET_ACCESS_KEY=?????

You _name_ the play to run using our playbook's `fp_play` variable.
In this example we're running the *database reset* play and setting
the storage class to `nfs`: -

$ ansible-playbook site-player.yaml \
ansible-playbook site-player.yaml \
-e fp_play=db-server-configure_create-database \
-e fp_work_volume_storageclass=nfs

@@ -179,8 +189,7 @@ extracts:
regenerate_index: yes
hardware:
production:
parallel_jobs: 8
cluster_cores: 8
parallel_jobs: 360
sort_memory: 4GB
postgres_jobs: 8
```
@@ -192,32 +201,32 @@ hardware:
with key records.

```
$ ansible-playbook site-player.yaml \
ansible-playbook site-player.yaml \
-e fp_play=db-server-configure_create-database
```
- **Standardise**
```
$ ansible-playbook site-player.yaml -e fp_play=standardise
ansible-playbook site-player.yaml -e fp_play=standardise
```
- **Fragment**
```
$ ansible-playbook site-player.yaml -e fp_play=fragment
ansible-playbook site-player.yaml -e fp_play=fragment
```
- **InChi**
```
$ ansible-playbook site-player.yaml -e fp_play=inchi
ansible-playbook site-player.yaml -e fp_play=inchi
```
- **Extract** (a dataset to graph CSV files)
```
$ ansible-playbook site-player.yaml -e fp_play=extract
ansible-playbook site-player.yaml -e fp_play=extract
```
- **Combine** (multiple datasets into graph CSV files)
@@ -258,15 +267,15 @@ hardware:
```

```
$ ansible-playbook site-player.yaml -e fp_play=combine
ansible-playbook site-player.yaml -e fp_play=combine
```

## A convenient player query playbook
If you don't have visual access to the cluster you can run
the following playbook, which summarises the phase of the currently executing
play. It will tell you if the current play is still running.

$ ansible-playbook site-player_query.yaml
ansible-playbook site-player_query.yaml

It finishes with a summary message like this: -

@@ -282,7 +291,7 @@ ok: [localhost] => {
If the player is failing and you want to kill it (along with the Job that
launched it) you can run the kill-player playbook: -

$ ansible-playbook site-player_kill-player.yaml
ansible-playbook site-player_kill-player.yaml

---

1 change: 1 addition & 0 deletions requirements.txt
@@ -1,4 +1,5 @@
ansible == 8.7.0
dnspython == 2.6.1
jmespath == 1.0.1
kubernetes == 23.6.0
openshift == 0.13.2
12 changes: 7 additions & 5 deletions roles/player/defaults/main.yaml
@@ -8,8 +8,8 @@
fp_play: SetMe

# The user's kubernetes configuration file.
# The user must set KUBECONFIG - we do not assume ~/.kube/config
fp_kubeconfig_file: "{{ lookup('env', 'KUBECONFIG') }}"
# The user must define this variable - we no longer rely on KUBECONFIG.
fp_kubeconfig_file: SetMe

# The namespace that is expected to exist.
fp_namespace: fragmentor
@@ -31,7 +31,7 @@ fp_parameter_file: parameters.yaml
# Details of the fragmentation player container image
fp_image_registry: ''
fp_image_name: informaticsmatters/fragmentor-player
fp_image_tag: '1.1.0'
fp_image_tag: '1.2.0'

# The nextflow version to run.
# The player image generally contains the 'latest' nextflow version.
@@ -46,8 +46,10 @@ fp_image_tag: '1.1.0'
#
# See https://github.com/nextflow-io/nextflow/issues/1902
fp_nextflow_version: '21.02.0-edge'
# And the Nextflow queue size
fp_nextflow_queue_size: 100
# And the Nextflow executor queue size
fp_nextflow_executor_queue_size: 100
# And Pod pull policy
fp_nextflow_pod_image_pull_policy: 'IfNotPresent'

# A pull-secret for public images pulled from DockerHub.
# If set this is the base-64 string that can be used as the value
4 changes: 2 additions & 2 deletions roles/player/tasks/deploy.yaml
@@ -11,8 +11,8 @@

- name: Assert queue size
assert:
that: fp_nextflow_queue_size|int > 0
fail_msg: You must set a sensible 'fp_nextflow_queue_size'
that: fp_nextflow_executor_queue_size|int > 0
fail_msg: You must set a sensible 'fp_nextflow_executor_queue_size'

# Assert the Kubernetes config has been named and exists

32 changes: 32 additions & 0 deletions roles/player/tasks/main.yaml
@@ -3,6 +3,10 @@
- name: Include prep
include_tasks: prep.yaml

- name: Load parameters from {{ fp_parameter_file }}
include_vars:
file: "{{ fp_parameter_file }}"

# A kubernetes host and an API key must be set.
# Either environment variables will have been set by the user
# or AWX 'kubernetes' credentials will have injected them.
@@ -14,6 +18,34 @@
- k8s_auth_host|length > 0
- k8s_auth_api_key|length > 0

- name: Assert kubeconfig file is named
assert:
that:
- fp_kubeconfig_file|length > 0
- fp_kubeconfig_file!='SetMe'

# Discover the hostname (and an IP address) of the kubernetes cluster
# control plane. We do this to set a host alias in the Player Pod
# to avoid the need for a DNS lookup (something that may be unreliable on
# the chosen cluster).

- name: Run kubectl (to get the host)
command: kubectl config view --minify --output 'jsonpath={.clusters[0].cluster.server}'
register: k8s_host
changed_when: false

- name: Extract k8s hostname
set_fact:
k8s_hostname: "{{ k8s_host.stdout_lines[0] | urlsplit('hostname') }}"

- name: Use Python's 'dig' to get the IP address
set_fact:
k8s_ip: "{{ lookup('dig', k8s_hostname) }}"

- name: Display k8s hostname and address
debug:
msg: k8s_hostname={{ k8s_hostname }} k8s_ip={{ k8s_ip }}

# Go...

# There is no 'undeploy' - fragmentation is a 'Job'
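
The host and address discovery above amounts to two small lookups. A rough
command-line equivalent (assuming `kubectl` and `dig` are available locally;
the play itself uses Ansible's `dig` lookup, which relies on `dnspython`): -

# The API server URL (the hostname is taken from this)...
kubectl config view --minify --output 'jsonpath={.clusters[0].cluster.server}'
# ...and the IP address for that hostname
dig +short <hostname>
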
10 changes: 6 additions & 4 deletions roles/player/templates/configmap-nextflow-config.yaml.j2
@@ -11,15 +11,17 @@ metadata:
data:
config: |
process {
{% if all_image_preset_pullsecret_name|string|length > 0 %}
  pod = [nodeSelector: 'informaticsmatters.com/purpose-fragmentor=yes', imagePullSecret: '{{ all_image_preset_pullsecret_name }}']
{% else %}
  pod = [nodeSelector: 'informaticsmatters.com/purpose-fragmentor=yes']
{% endif %}
  pod = [
    nodeSelector: 'informaticsmatters.com/purpose-fragmentor=yes',
{% if all_image_preset_pullsecret_name|string|length > 0 %}
    imagePullSecret: '{{ all_image_preset_pullsecret_name }}',
{% endif %}
    imagePullPolicy: '{{ fp_nextflow_pod_image_pull_policy }}'
  ]
}
executor {
name = 'k8s'
queueSize = {{ fp_nextflow_queue_size }}
queueSize = {{ fp_nextflow_executor_queue_size }}
}
k8s {
serviceAccount = 'fragmentor'
11 changes: 7 additions & 4 deletions roles/player/templates/job.yaml.j2
@@ -23,6 +23,13 @@ spec:
matchExpressions:
- key: informaticsmatters.com/purpose-fragmentor
operator: Exists
# A host alias for the Kubernetes API.
# This ensures the host (and the IP address we provide)
# go into the Pod's /etc/hosts file and permit bypassing of DNS.
hostAliases:
- ip: "{{ k8s_ip }}"
hostnames:
- "{{ k8s_hostname }}"

{% if all_image_preset_pullsecret_name|string|length > 0 %}
imagePullSecrets:
@@ -36,11 +43,7 @@ spec:
{% else %}
image: {{ fp_image_name }}:{{ fp_image_tag }}
{% endif %}
{% if fp_image_tag in ['latest', 'stable'] %}
imagePullPolicy: Always
{% else %}
imagePullPolicy: IfNotPresent
{% endif %}
# The default termination log (here for clarity)
# But also fallback to stdout logs on error
# if there is no termination log.
6 changes: 4 additions & 2 deletions roles/player/vars/main.yaml
@@ -18,8 +18,8 @@ fp_mem_limit: 4Gi
# The home directory in the fragmentor 'player' pod
fp_player_home: /root

# How long to hold-on to the player Pod
# (keep alive for post-run debug)
# How long to hold on to the player Pod.
# If set, the player Pod remains running for the defined period.
# This gives you an opportunity to shell into the Pod and inspect
# the execution, essentially an ability to look around when the play has finished.
fp_keep_alive_seconds: 0

# General variables
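
The `fp_keep_alive_seconds` variable described above can be useful for
debugging. A rough sketch (the play name and keep-alive period are only
examples, the default `fragmentor` namespace is assumed, and you should use
`sh` if `bash` is not present in the player image): -

ansible-playbook site-player.yaml \
    -e fp_play=standardise \
    -e fp_keep_alive_seconds=3600
kubectl get pods -n fragmentor
kubectl exec -it -n fragmentor <player-pod-name> -- bash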
