Support moving state to persistent storage #173

Merged: 70 commits into main, Sep 13, 2022

Conversation

@sjpb sjpb commented Apr 21, 2022

  • Allows defining a hostvar appliances_state_dir to put control node state onto persistent storage.
  • Changes the default TF + group_vars to put /home on persistent storage (for all except control node)

Ticket: https://stackhpc.atlassian.net/browse/DEV-834

In the skeleton TF (as used for arcus):

  • volumes home and state are created and attached to the control node
  • cloud-init userdata is used to partition, format and mount these at /var/lib/state and /exports/home on the control node
  • the inventory template defines appliances_state_dir for the control group as /var/lib/state. NB this must be defined on the group, not the host, so that the Packer builds for control images also get this set. (A sketch of this follows the list.)
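
For illustration, here is a minimal sketch of what this amounts to. The device name (/dev/vdb), label ("state") and the group_vars file path are assumptions for the example, not the actual TF or appliance files:

#cloud-config
# Sketch only: device name and label are assumptions
disk_setup:
  /dev/vdb:
    table_type: gpt
    layout: true
fs_setup:
  - device: /dev/vdb
    partition: auto
    filesystem: ext4
    label: state
mounts:
  # mount the state volume at the path appliances_state_dir points to
  - [LABEL=state, /var/lib/state, ext4, defaults]

# Sketch only: illustrative group_vars placement for the control group,
# e.g. environments/<env>/inventory/group_vars/control/overrides.yml
appliances_state_dir: /var/lib/state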

Currently the persistent state on the slurm control node covers:

  • slurmctld state
  • MySQL database for slurmdbd
  • Prometheus database
  • Opendistro data
  • Grafana data

It also adds documentation for this feature, and modifies the block_devices docs to explain why this isn't the right way to use volumes.

CI (arcus environment)

This now:

  • defines volumes on the control node, as described above
  • runs hpctests before reimaging the cluster
  • runs the slurm.yml playbook after the reimage to regenerate partition information (partition information is excluded from the control image build)
  • checks that the job info for the hpctests runs is still present after the reimage (an illustrative sketch of the sacct check follows this list):
    a) from sacct, which checks the MySQL state has persisted and slurmdbd has restarted
    b) from opendistro, proxied through grafana, which checks the opendistro state has persisted and the datasource works
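
A minimal sketch of the kind of sacct check meant in (a). This is illustrative only, not the actual ansible/ci/check_sacct_hpctests.yml, and the target host group is an assumption:

- hosts: login
  gather_facts: false
  tasks:
    - name: Get Slurm accounting records via sacct
      command: sacct --allusers --noheader --parsable2 --format=JobID,JobName,State
      register: sacct_out
      changed_when: false

    - name: Assert accounting history survived the control node reimage
      assert:
        that: sacct_out.stdout_lines | length > 0
        fail_msg: sacct returned no jobs, so slurmdbd/MySQL state did not persist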

Manual checks

I have (manually) checked that after a reimage as above:

  • hpctests works
  • FAILED - OOD shell - TODO: recheck now that /home is persistent
  • Monitoring works

Caveats

  • ansible/slurm.yml must be rerun after reimaging the control node to redefine partition information
  • The default nfs config now assumes /exports/home exists, which assumes the default TF (or similar) is used.
  • The new mysql role can't change the mysql root password after initialisation.

Requires/TODOs:

@sjpb sjpb requested a review from jovial April 21, 2022 12:36

sjpb commented Apr 21, 2022

AlaSKA disk usage is as follows:

  • 15M /var/lib/podman/.local/share/containers/storage/volumes
  • 1.5M /var/lib/grafana/
  • 235M /var/lib/mysql/
  • 4.5G /mnt/slurmctld # normally somewhere under /var/spool/ - this is nearly all old copies of $HOME though, from when volumes got switched.
  • 4.1G /var/lib/prometheus/

Total (after accounting for $HOME problem): ~5GB

Last jobid is 500.

sjpb added 2 commits July 6, 2022 14:49
Note: before this commit the podman user was created on ALL nodes; now it is only created on nodes in the "podman" group
@sjpb sjpb closed this Jul 6, 2022
@sjpb sjpb reopened this Jul 6, 2022
@sjpb sjpb requested a review from m-bull July 7, 2022 18:13

sjpb commented Jul 14, 2022

Using this on a deployment, recreating a cluster fails with:

TASK [geerlingguy.mysql : Disallow root login remotely] *****************************************************************************
failed: [dev-control] (item=DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1')) => {
    "ansible_loop_var": "item",
    "changed": false,
    "cmd": [
        "mysql",
        "-NBe",
        "DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1')"
    ],
    "delta": "0:00:00.004478",
    "end": "2022-07-14 11:24:54.025773",
    "item": "DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1')",
    "rc": 1,
    "start": "2022-07-14 11:24:54.021295"
}

STDERR:

ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)


sjpb commented Jul 14, 2022

Despite CI passing, there also appears to be a problem with 9893c35, where the firewalld install task fails with a directory error on /home/rocky, which should be /var/lib/rocky.


sjpb commented Jul 14, 2022

9893c35 also seems problematic in a deployment:

TASK [Add users] 
<snip ok users>
failed: [dev-control] (item={'group': {'name': 'prometheus', 'gid': 976}, 'user': {'name': 'prometheus', 'uid': 981, 'home': '/var/lib/prometheus', 'shell': '/usr/sbin/nologin'}, 'enable': True}) => {
    "ansible_loop_var": "item",
    "changed": false,
    "item": {
        "enable": true,
        "group": {
            "gid": 976,
            "name": "prometheus"
        },
        "user": {
            "home": "/var/lib/prometheus",
            "name": "prometheus",
            "shell": "/usr/sbin/nologin",
            "uid": 981
        }
    },
    "name": "prometheus",
    "rc": 8
}

MSG:

usermod: user prometheus is currently used by process 1359
[rocky@dev-control ~]$ id prometheus
uid=981(prometheus) gid=976(prometheus) groups=976(prometheus)
[rocky@dev-control ~]$ ps -p 1359
    PID TTY          TIME CMD
   1359 ?        00:00:00 prometheus

[rocky@dev-control ~]$ grep prometheus /etc/passwd 
prometheus:x:981:976::/var/lib/state/prometheus:/usr/sbin/nologin

Problem is:

prometheus_db_dir: "{{ appliances_state_dir | default('/var/lib') }}/prometheus"

with

ansible/roles/cloudalchemy.prometheus/tasks/install.yml:
- name: create prometheus system user
  user:
    name: prometheus
    system: true
    shell: "/usr/sbin/nologin"
    group: prometheus
    createhome: false
    home: "{{ prometheus_db_dir }}"

conflicts with

environments/common/inventory/group_vars/all/defaults.yml
    - group:
        name: prometheus
        gid: 976
      user:
        name: prometheus
        uid: 981
        home: /var/lib/prometheus   # <-- doesn't include /state/
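
One way to reconcile these (a sketch of a possible fix, not necessarily what the PR actually does) is to derive the user's home from the same variable, so both paths move together when appliances_state_dir is set:

environments/common/inventory/group_vars/all/defaults.yml (sketch):
    - group:
        name: prometheus
        gid: 976
      user:
        name: prometheus
        uid: 981
        home: "{{ prometheus_db_dir }}"   # follows appliances_state_dir instead of hard-coding /var/lib/prometheus
        shell: /usr/sbin/nologin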

Resolved review comment threads on: ansible/ci/check_sacct_hpctests.yml, ansible/roles/mysql/README.md, ansible/roles/mysql/tasks/main.yml, ansible/roles/opendistro/templates/opendistro.service.j2, ansible/roles/block_devices/README.md
Review comment on the following snippet:

/dev/vdb:
  table_type: gpt
  layout: true
/dev/vdc:

How can we control for device names here? /dev/vda vs /dev/sda etc - these are properties of the SCSI device I think...

sjpb replied:

You can't really. It's supposed to depend on the flags set on the image - setting --property hw_scsi_model=virtio-scsi should, I think, give you sd* devices, although in my test it didn't. For the use of this TF I think it's OK: in CI we control both the device name here and the image tags. If you're using this TF in your own deployment you may need to change it, but that goes for anything.

@sjpb sjpb requested a review from m-bull September 6, 2022 16:19

sjpb commented Sep 6, 2022

TODO: do I need to delay nfs server start till after /home is mounted, given it's exported in /etc/exports?


m-bull commented Sep 6, 2022

TODO: do I need to delay nfs server start till after /home is mounted, given it's exported in /etc/exports?

This might be useful: https://www.freedesktop.org/software/systemd/man/systemd.mount.html#x-systemd.wanted-by=
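
A sketch of how that suggestion might be applied (label, mount point and service name are assumptions): add the systemd dependency options to the export's mount entry in the cloud-init userdata, so the volume is mounted before nfs-server starts:

mounts:
  # order the /exports/home mount before nfs-server and pull it in when nfs-server starts
  - [LABEL=home, /exports/home, ext4, "defaults,x-systemd.wanted-by=nfs-server.service,x-systemd.before=nfs-server.service"]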


@m-bull m-bull left a comment


I think this looks okay once that UUID/LABEL docs change is complete

@sjpb sjpb merged commit be6eafc into main Sep 13, 2022