Basic Linstor deployment #20

Draft · luissimas wants to merge 5 commits into main

Conversation

luissimas

Adds a Linstor playbook with the bare minimum for a development setup. This is needed to support the development of the Linstor integration in Incus, tracked in lxc/incus#564. The idea of this PR is both to make this automation available for developers working on the feature and to create a common place for discussing the integration of Linstor into incus-deploy. I wouldn't consider this production-ready, and I left some notes on things that I think could be improved.

For a production-ready setup we'd probably also want to set up SSL to encrypt both controller<->satellite and incus<->controller traffic. It's also worth noting that I'm using linstor physical-storage create-device-pool to create the underlying storage setup (VGs or zPools). While this makes it easy to support both LVM and ZFS in the playbook with no extra logic, it does not expose many options for configuring the underlying storage. Ideally we'd create the VGs and zPools manually, which would give the user the ability to configure them as needed through extra variables in the playbook.
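For illustration, a manual setup could look roughly like the sketch below, using the community.general LVM modules plus a plain linstor storage-pool create call. This is only a sketch of the idea, not code from this PR: the vg_linstor/thinpool names, the 100%FREE sizing and the controller-side registration loop are assumptions.

# Sketch only: create the VG and thin pool ourselves, then just register the
# result with Linstor instead of delegating to physical-storage create-device-pool.
- name: Create volume group on the Linstor disks
  community.general.lvg:
    vg: vg_linstor
    pvs: "{{ linstor_disks | map('regex_replace', '^', '/dev/disk/by-id/') | list }}"

- name: Create a thin pool spanning the whole VG
  community.general.lvol:
    vg: vg_linstor
    thinpool: thinpool
    size: 100%FREE

# Runs on the controller, mirroring how the playbook already issues linstor
# commands for every node.
- name: Register each node's thin pool as a Linstor storage pool
  command: >-
    linstor storage-pool create lvmthin
    {{ item }} {{ linstor_pool_name }} vg_linstor/thinpool
  loop: "{{ ansible_play_hosts }}"
  when: "'controller' in linstor_roles"

This would let extra variables (VG name, thin pool sizing, extent size, and so on) flow straight into the LVM/ZFS layer instead of being limited to what create-device-pool exposes.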

The resulting Linstor deployment is the following:

root@server01:~# linstor node list
╭─────────────────────────────────────────────────────────────╮
┊ Node     ┊ NodeType  ┊ Addresses                   ┊ State  ┊
╞═════════════════════════════════════════════════════════════╡
┊ server01 ┊ SATELLITE ┊ 10.172.117.141:3366 (PLAIN) ┊ Online ┊
┊ server02 ┊ SATELLITE ┊ 10.172.117.171:3366 (PLAIN) ┊ Online ┊
┊ server03 ┊ SATELLITE ┊ 10.172.117.123:3366 (PLAIN) ┊ Online ┊
╰─────────────────────────────────────────────────────────────╯
root@server01:~# linstor sp list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ StoragePool          ┊ Node     ┊ Driver   ┊ PoolName                            ┊ FreeCapacity ┊ TotalCapacity ┊ CanSnapshots ┊ State ┊ SharedName                    ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ DfltDisklessStorPool ┊ server01 ┊ DISKLESS ┊                                     ┊              ┊               ┊ False        ┊ Ok    ┊ server01;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server02 ┊ DISKLESS ┊                                     ┊              ┊               ┊ False        ┊ Ok    ┊ server02;DfltDisklessStorPool ┊
┊ DfltDisklessStorPool ┊ server03 ┊ DISKLESS ┊                                     ┊              ┊               ┊ False        ┊ Ok    ┊ server03;DfltDisklessStorPool ┊
┊ incus                ┊ server01 ┊ LVM_THIN ┊ linstor_linstor-incus/linstor-incus ┊    39.91 GiB ┊     39.91 GiB ┊ True         ┊ Ok    ┊ server01;incus                ┊
┊ incus                ┊ server02 ┊ LVM_THIN ┊ linstor_linstor-incus/linstor-incus ┊    39.91 GiB ┊     39.91 GiB ┊ True         ┊ Ok    ┊ server02;incus                ┊
┊ incus                ┊ server03 ┊ LVM_THIN ┊ linstor_linstor-incus/linstor-incus ┊    39.91 GiB ┊     39.91 GiB ┊ True         ┊ Ok    ┊ server03;incus                ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

We can then create a Resource Group that will eventually be consumed by Incus and spawn volumes from it. In this example I'm specifying --place-count 2, which means that Linstor will create two physical replicas and one diskless replica to reach quorum (a TieBreaker).

root@server01:~# linstor rg create incus-volumes --storage-pool incus --place-count 2
...
root@server01:~# linstor rg spawn incus-volumes vol1 10G
root@server01:~# linstor resource list
╭────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName ┊ Node     ┊ Layers       ┊ Usage  ┊ Conns ┊      State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════╡
┊ vol1         ┊ server01 ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2025-01-18 17:59:31 ┊
┊ vol1         ┊ server02 ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊   UpToDate ┊ 2025-01-18 17:59:31 ┊
┊ vol1         ┊ server03 ┊ DRBD,STORAGE ┊ Unused ┊ Ok    ┊ TieBreaker ┊ 2025-01-18 17:59:29 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────╯
root@server01:~# linstor volume list
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Resource ┊ Node     ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊      State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ vol1     ┊ server01 ┊ incus                ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊  2.05 MiB ┊ Unused ┊   UpToDate ┊
┊ vol1     ┊ server02 ┊ incus                ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊  2.05 MiB ┊ Unused ┊   UpToDate ┊
┊ vol1     ┊ server03 ┊ DfltDisklessStorPool ┊     0 ┊    1000 ┊ /dev/drbd1000 ┊           ┊ Unused ┊ TieBreaker ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Testing the setup

I've tested the deployment with the following setup. For testing purposes, I reduced the total number of servers to 3, removed Ceph from the deployment and assigned the two Ceph OSD disks on each machine to Linstor instead.

terraform/terraform.tfvars

# Incus variables
incus_remote       = "local"    # Name of the Incus remote to deploy on (see `incus remote list`)
incus_storage_pool = "default"  # Name of the storage pool to use for the VMs and volumes
incus_network      = "incusbr0" # Name of the network to use for the VMs

# OVN uplink configuration
ovn_uplink_ipv4_address = "172.31.254.1/24"
ovn_uplink_ipv6_address = "fd00:1e4d:637d:1234::1/64"

instance_names = ["server01", "server02", "server03"]

ansible/hosts.yaml

all:
  vars:
    ceph_fsid: "e2850e1f-7aab-472e-b6b1-824e19a75071"
    ceph_rbd_cache: "2048Mi"
    ceph_rbd_cache_max: "1792Mi"
    ceph_rbd_cache_target: "1536Mi"

    incus_name: "baremetal"
    incus_release: "stable"

    lvmcluster_name: "baremetal"

    ovn_name: "baremetal"
    ovn_az_name: "zone1"
    ovn_release: "ppa"

    linstor_pool_name: "incus"
    linstor_pool_driver: "lvmthin"
  children:
    baremetal:
      vars:
        ansible_connection: incus
        ansible_incus_remote: local
        ansible_user: root
        ansible_become: no
        ansible_incus_project: dev-incus-deploy

        incus_init:
          network:
            LOCAL:
              type: macvlan
              local_config:
                parent: enp5s0
              description: Directly attach to host networking
            UPLINK:
              type: physical
              config:
                ipv4.gateway: "172.31.254.1/24"
                ipv6.gateway: "fd00:1e4d:637d:1234::1/64"
                ipv4.ovn.ranges: "172.31.254.10-172.31.254.254"
                dns.nameservers: "1.1.1.1,1.0.0.1"
              local_config:
                parent: enp6s0
              description: Physical network for OVN routers
            default:
              type: ovn
              config:
                network: UPLINK
              default: true
              description: Initial OVN network
          storage:
            local:
              driver: zfs
              local_config:
                source: "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk3"
              description: Local storage pool
            # remote:
            #   driver: ceph
            #   local_config:
            #     source: "incus_{{ incus_name }}"
            #   description: Distributed storage pool (cluster-wide)
            shared:
              driver: lvmcluster
              local_config:
                lvm.vg_name: "vg0"
                source: "vg0"
              default: true
              description: Shared storage pool (cluster-wide)

        incus_roles:
          - cluster
          - ui

        lvmcluster_metadata_size: 100m
        lvmcluster_vgs:
          vg0: "/dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk4"

        ovn_roles:
          - host
      hosts:
        server01:
          linstor_disks:
            - nvme-QEMU_NVMe_Ctrl_incus_disk1
            - nvme-QEMU_NVMe_Ctrl_incus_disk2
          linstor_roles:
            - controller
            - satellite

          ovn_roles:
            - central
            - host
        server02:
          linstor_disks:
            - nvme-QEMU_NVMe_Ctrl_incus_disk1
            - nvme-QEMU_NVMe_Ctrl_incus_disk2
          linstor_roles:
            - satellite

          ovn_roles:
            - central
            - host
        server03:
          linstor_disks:
            - nvme-QEMU_NVMe_Ctrl_incus_disk1
            - nvme-QEMU_NVMe_Ctrl_incus_disk2
          linstor_roles:
            - satellite

          ovn_roles:
            - central
            - host

Exposes an instance_names variable to allow users to more easily change
the number of instances for the deployment.

Signed-off-by: Luís Simas <luissimas@protonmail.com>
Adds a new playbook for deploying linstor. The playbook installs the
needed packages for Linstor and the underlying storage utilities. A
storage pool is created on each node using the storage driver and disks
specified in the Ansible inventory.

Signed-off-by: Luís Simas <luissimas@protonmail.com>
Disables secureboot on instances to allow loading the DRBD kernel
modules when deploying Linstor.

Signed-off-by: Luís Simas <luissimas@protonmail.com>
Adds a new disk for instances to be consumed by Linstor.

Signed-off-by: Luís Simas <luissimas@protonmail.com>
Adds values to the Linstor configuration variables in the sample
inventory.

Signed-off-by: Luís Simas <luissimas@protonmail.com>
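
As a side note on the secure-boot commit: the out-of-tree DRBD modules are unsigned, so the VMs need secure boot disabled before they can load them. The effective Incus setting is the instance config key below, shown here as plain YAML for illustration; the actual change in this PR is made through the Terraform instance definitions.

# Equivalent Incus VM configuration (illustrative, as shown by incus config show);
# the PR sets this through Terraform.
config:
  security.secureboot: "false"
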
@winiciusallan left a comment

Nice job! Just some questions

@@ -0,0 +1,147 @@
---
- name: Linstor - Add package repository
  hosts: all

We could use a group declaration in hosts.yaml; that way we avoid checking whether the host is in linstor_roles, which makes the tasks much simpler. The docs have examples here[1]. Above the children group in hosts.yaml there would be a Linstor hosts declaration; a rough sketch of the idea follows below.

[1] https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html#hosts-in-multiple-groups
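
For illustration, such a group-based layout could look roughly like this; the group names are made up for the example, and plays would then target them directly (for example hosts: linstor_satellite) instead of filtering on linstor_roles:

# Hypothetical inventory sketch using groups instead of per-host linstor_roles.
all:
  children:
    linstor_controller:
      hosts:
        server01:
    linstor_satellite:
      hosts:
        server01:
        server02:
        server03: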

Member

We're not currently using groups, so it's probably best to just stick with the linstor_roles behavior for consistency for now.

There is some planned work to re-shuffle things quite a bit and use Ansible roles and other constructs, so that will come in later.


- name: Parse storage pools
  set_fact:
    satellites_without_storage_pools: >-

Is it necessary to check whether a satellite has no storage pool? Should nothing happen if the host already has a storage pool?

Author

The idea is to make the creation of the storage pool idempotent and ensure that we don't try to create it again when running the playbook multiple times. To be honest, I don't remember whether linstor physical-storage create-device-pool already takes care of that for us; I'll check tomorrow and come back with the results.

With that said, I think the current approach is quite naive and doesn't take factors like pre-existing storage pools into account. I basically replicated the logic used for adding the satellite nodes, but that is a simpler problem to solve.
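
For reference, the shape of that check is roughly the following. This is a simplified sketch rather than the exact tasks from the playbook: the substring match against the storage-pool listing, the loop over ansible_play_hosts and taking only the first disk are all simplifications.

# Sketch of the idempotency check: list the existing storage pools on the
# controller, then only create device pools for nodes that don't have one yet.
- name: List existing storage pools
  command: linstor storage-pool list
  register: linstor_sp_list
  changed_when: false
  when: "'controller' in linstor_roles"

- name: Create device pool on satellites that are missing it
  command: >-
    linstor physical-storage create-device-pool
    --storage-pool {{ linstor_pool_name }}
    --pool-name linstor-{{ linstor_pool_name }}
    {{ linstor_pool_driver }} {{ item }}
    /dev/disk/by-id/{{ hostvars[item].linstor_disks | first }}
  loop: "{{ ansible_play_hosts }}"
  when: >-
    'controller' in linstor_roles and
    item ~ ';' ~ linstor_pool_name not in linstor_sp_list.stdout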

Author

I tried removing the check and running the playbook twice, and we indeed got an error. So the check is needed to make the storage pool creation idempotent.

TASK [Create storage pool] ******************************************************************************************************************************************************************************************
skipping: [server03] => (item=server01)
skipping: [server03] => (item=server02)
skipping: [server03] => (item=server03)
skipping: [server03]
skipping: [server02] => (item=server01)
skipping: [server02] => (item=server02)
skipping: [server02] => (item=server03)
skipping: [server02]
failed: [server01] (item=server01) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server01 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.276529", "end": "2025-01-22 14:25:53.877272", "item": "server01", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:53.600743", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m    (Node: 'server01') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m    (Node: 'server01') Zpool name already used."]}
failed: [server01] (item=server02) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server02 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.273696", "end": "2025-01-22 14:25:54.892492", "item": "server02", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:54.618796", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m    (Node: 'server02') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m    (Node: 'server02') Zpool name already used."]}
failed: [server01] (item=server03) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server03 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.298110", "end": "2025-01-22 14:25:55.994142", "item": "server03", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:55.696032", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m    (Node: 'server03') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m    (Node: 'server03') Zpool name already used."]}

NO MORE HOSTS LEFT **************************************************************************************************************************************************************************************************

PLAY RECAP **********************************************************************************************************************************************************************************************************
server01                   : ok=18   changed=0    unreachable=0    failed=1    skipped=6    rescued=0    ignored=0
server02                   : ok=12   changed=0    unreachable=0    failed=0    skipped=13   rescued=0    ignored=0
server03                   : ok=12   changed=0    unreachable=0    failed=0    skipped=13   rescued=0    ignored=0
