Support moving state to persistent storage #173

Merged: 70 commits into main, Sep 13, 2022

Conversation

@sjpb sjpb commented Apr 21, 2022

  • Allows defining a hostvar appliances_state_dir to put control node state onto persistent storage.
  • Changes the default TF + group_vars to put /home on persistent storage (for all except control node)

Ticket: https://stackhpc.atlassian.net/browse/DEV-834

In the skeleton TF (as used for arcus):

  • volumes home and state are created and attached to the control node
  • cloud-init userdata is used to partition, format and mount these at /var/lib/state and /exports/home on the control node
  • the inventory template defines appliances_state_dir for the control group as /var/lib/state. NB this must be defined on the group, not the host, so that the Packer builds for control images also get this set. (A sketch of this follows the list.)
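
For illustration, here is a minimal sketch of what this amounts to. The device name (/dev/vdb), label ("state") and the group_vars file path are assumptions for the example, not the actual TF or appliance files:

#cloud-config
# Sketch only: device name and label are assumptions
disk_setup:
  /dev/vdb:
    table_type: gpt
    layout: true
fs_setup:
  - device: /dev/vdb
    partition: auto
    filesystem: ext4
    label: state
mounts:
  # mount the state volume at the path appliances_state_dir points to
  - [LABEL=state, /var/lib/state, ext4, defaults]

# Sketch only: illustrative group_vars placement for the control group,
# e.g. environments/<env>/inventory/group_vars/control/overrides.yml
appliances_state_dir: /var/lib/state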

Currently the persistent state on the slurm control node covers:

  • slurmctld state
  • MySQL database for slurmdbd
  • Prometheus database
  • Opendistro data
  • Grafana data

It also adds documentation for this feature, and modifies the block_devices docs to explain why this isn't the right way to use volumes.

CI (arcus environment)

This now:

  • defines volumes on the control node, as described above
  • runs hpctests before reimaging the cluster
  • runs the slurm.yml playbook after the reimage to regenerate partition information (partition information is excluded from the control image build)
  • checks that the job info for the hpctests runs is still present after the reimage (an illustrative sketch of the sacct check follows this list):
    a) from sacct, which checks the MySQL state has persisted and slurmdbd has restarted
    b) from opendistro, proxied through grafana, which checks the opendistro state has persisted and the datasource works
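
A minimal sketch of the kind of sacct check meant in (a). This is illustrative only, not the actual ansible/ci/check_sacct_hpctests.yml, and the target host group is an assumption:

- hosts: login
  gather_facts: false
  tasks:
    - name: Get Slurm accounting records via sacct
      command: sacct --allusers --noheader --parsable2 --format=JobID,JobName,State
      register: sacct_out
      changed_when: false

    - name: Assert accounting history survived the control node reimage
      assert:
        that: sacct_out.stdout_lines | length > 0
        fail_msg: sacct returned no jobs, so slurmdbd/MySQL state did not persist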

Manual checks

I have (manually) checked that after a reimage as above:

  • hpctests works
  • FAILED - OOD shell - TODO: recheck now that /home is persistent
  • Monitoring works

Caveats

  • ansible/slurm.yml must be rerun after reimaging the control node to redefine partition information
  • The default nfs config now assumes /exports/home exists, which assumes the default TF (or similar) is used.
  • The new mysql role can't change the mysql root password after initialisation.

Requires/TODOs:

@sjpb sjpb requested a review from jovial April 21, 2022 12:36

sjpb commented Apr 21, 2022

AlaSKA disk usage is as follows:

  • 15M /var/lib/podman/.local/share/containers/storage/volumes
  • 1.5M /var/lib/grafana/
  • 235M /var/lib/mysql/
  • 4.5G /mnt/slurmctld # normally somewhere under /var/spool/ - this is nearly all old copies of $HOME though, from when volumes got switched.
  • 4.1G /var/lib/prometheus/

Total (after accounting for $HOME problem): ~5GB

Last jobid is 500.

sjpb added 2 commits July 6, 2022 14:49
Note: before this commit the podman user was created on ALL nodes; now it is only created on nodes in the "podman" group
@sjpb sjpb closed this Jul 6, 2022
@sjpb sjpb reopened this Jul 6, 2022
@sjpb sjpb requested a review from m-bull July 7, 2022 18:13

sjpb commented Jul 14, 2022

Using this on a deployment, recreating a cluster fails with:

TASK [geerlingguy.mysql : Disallow root login remotely] *****************************************************************************
failed: [dev-control] (item=DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1')) => {
    "ansible_loop_var": "item",
    "changed": false,
    "cmd": [
        "mysql",
        "-NBe",
        "DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1')"
    ],
    "delta": "0:00:00.004478",
    "end": "2022-07-14 11:24:54.025773",
    "item": "DELETE FROM mysql.user WHERE User='root' AND Host NOT IN ('localhost', '127.0.0.1', '::1')",
    "rc": 1,
    "start": "2022-07-14 11:24:54.021295"
}

STDERR:

ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)


sjpb commented Jul 14, 2022

Despite CI passing, there also appears to be a problem with 9893c35, where the firewalld install task fails with a directory error on /home/rocky, which should be /var/lib/rocky.


sjpb commented Jul 14, 2022

9893c35 also seems problematic in a deployment:

TASK [Add users] 
<snip ok users>
failed: [dev-control] (item={'group': {'name': 'prometheus', 'gid': 976}, 'user': {'name': 'prometheus', 'uid': 981, 'home': '/var/lib/prometheus', 'shell': '/usr/sbin/nologin'}, 'enable': True}) => {
    "ansible_loop_var": "item",
    "changed": false,
    "item": {
        "enable": true,
        "group": {
            "gid": 976,
            "name": "prometheus"
        },
        "user": {
            "home": "/var/lib/prometheus",
            "name": "prometheus",
            "shell": "/usr/sbin/nologin",
            "uid": 981
        }
    },
    "name": "prometheus",
    "rc": 8
}

MSG:

usermod: user prometheus is currently used by process 1359
[rocky@dev-control ~]$ id prometheus
uid=981(prometheus) gid=976(prometheus) groups=976(prometheus)
[rocky@dev-control ~]$ ps -p 1359
    PID TTY          TIME CMD
   1359 ?        00:00:00 prometheus

[rocky@dev-control ~]$ grep prometheus /etc/passwd 
prometheus:x:981:976::/var/lib/state/prometheus:/usr/sbin/nologin

Problem is:

prometheus_db_dir: "{{ appliances_state_dir | default('/var/lib') }}/prometheus"

with

ansible/roles/cloudalchemy.prometheus/tasks/install.yml:
- name: create prometheus system user
  user:
    name: prometheus
    system: true
    shell: "/usr/sbin/nologin"
    group: prometheus
    createhome: false
    home: "{{ prometheus_db_dir }}"

conflicts with

environments/common/inventory/group_vars/all/defaults.yml
    - group:
        name: prometheus
        gid: 976
      user:
        name: prometheus
        uid: 981
        home: /var/lib/prometheus   # <-- doesn't include /state/
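
One way to reconcile these (a sketch of a possible fix, not necessarily what the PR actually does) is to derive the user's home from the same variable, so both paths move together when appliances_state_dir is set:

environments/common/inventory/group_vars/all/defaults.yml (sketch):
    - group:
        name: prometheus
        gid: 976
      user:
        name: prometheus
        uid: 981
        home: "{{ prometheus_db_dir }}"   # follows appliances_state_dir instead of hard-coding /var/lib/prometheus
        shell: /usr/sbin/nologin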

Resolved review comment threads on: ansible/ci/check_sacct_hpctests.yml, ansible/roles/mysql/README.md, ansible/roles/mysql/tasks/main.yml, ansible/roles/opendistro/templates/opendistro.service.j2, ansible/roles/block_devices/README.md
Review comment on the following snippet:

/dev/vdb:
  table_type: gpt
  layout: true
/dev/vdc:

How can we control for device names here? /dev/vda vs /dev/sda etc - these are properties of the SCSI device I think...

sjpb replied:

You can't really. It's supposed to depend on the flags set on the image - setting --property hw_scsi_model=virtio-scsi should, I think, give you sd* devices, although in my test it didn't. For the use of this TF I think it's OK: in CI we control both the device name here and the image tags. If you're using this TF in your own deployment you may need to change it, but that goes for anything.

@sjpb sjpb requested a review from m-bull September 6, 2022 16:19

sjpb commented Sep 6, 2022

TODO: do I need to delay nfs server start till after /home is mounted, given it's exported in /etc/exports?


m-bull commented Sep 6, 2022

TODO: do I need to delay nfs server start till after /home is mounted, given it's exported in /etc/exports?

This might be useful: https://www.freedesktop.org/software/systemd/man/systemd.mount.html#x-systemd.wanted-by=
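
A sketch of how that suggestion might be applied (label, mount point and service name are assumptions): add the systemd dependency options to the export's mount entry in the cloud-init userdata, so the volume is mounted before nfs-server starts:

mounts:
  # order the /exports/home mount before nfs-server and pull it in when nfs-server starts
  - [LABEL=home, /exports/home, ext4, "defaults,x-systemd.wanted-by=nfs-server.service,x-systemd.before=nfs-server.service"]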


@m-bull m-bull left a comment


I think this looks okay once that UUID/LABEL docs change is complete

@sjpb sjpb merged commit be6eafc into main Sep 13, 2022