Basic Linstor deployment #20
base: main
Conversation
Exposes an instance_names variable to allow users to more easily change the number of instances for the deployment. Signed-off-by: Luís Simas <luissimas@protonmail.com>
Adds a new playbook for deploying linstor. The playbook installs the needed packages for Linstor and the underlying storage utilities. A storage pool is created on each node using the storage driver and disks specified in the Ansible inventory. Signed-off-by: Luís Simas <luissimas@protonmail.com>
Disables secureboot on instances to allow loading the DRBD kernel modules when deploying Linstor. Signed-off-by: Luís Simas <luissimas@protonmail.com>
Adds a new disk for instances to be consumed by Linstor. Signed-off-by: Luís Simas <luissimas@protonmail.com>
Adds values to the Linstor configuration variables in the sample inventory. Signed-off-by: Luís Simas <luissimas@protonmail.com>
Nice job! Just some questions
@@ -0,0 +1,147 @@
---
- name: Linstor - Add package repository
  hosts: all
We could use a group declaration in hosts.yaml; that way we avoid checking whether the host is in `linstor_roles`, which makes the tasks much simpler. There are examples in the docs [1]. Above the `children` group in hosts.yaml we should have a Linstor hosts declaration, for example as sketched below.
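A minimal sketch of what that could look like in hosts.yaml (the group name `linstor` and the reuse of server01-03 are just illustrative):

```yaml
# Sketch: dedicated group for the Linstor hosts in hosts.yaml, so the plays can
# target "hosts: linstor" instead of filtering on "linstor_roles" in each task.
all:
  children:
    linstor:
      hosts:
        server01:
        server02:
        server03:
```

The Linstor plays could then use `hosts: linstor` directly.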
We're not currently using groups, so it's probably best to just use the `linstor_roles` behavior for consistency for now.
There is some planned work to re-shuffle things quite a bit and use Ansible roles and other constructs, so that will come later.
- name: Parse storage pools
  set_fact:
    satellites_without_storage_pools: >-
Is it necessary to check if a satellite has no storage pool? Should nothing occur if the host already has a storage pool?
The idea is to make the creation of the storage pool idempotent and ensure that we don't try to create it again when running the playbook multiple times. To be honest, I don't remember whether `linstor physical-storage create-device-pool` already takes care of that for us; I'll check tomorrow and come back with the results.
With that said, I think the current approach is quite naive and doesn't take factors like pre-existing storage pools into account. I basically replicated the logic for adding the satellite nodes, but that is a simpler problem to solve.
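As an aside, one alternative that avoids parsing the controller's view would be a per-node check on the backing storage itself. A rough sketch, assuming the ZFS backend and the pool names used in this playbook (`linstor_disk` is a made-up variable):

```yaml
# Sketch of a per-node idempotency check: only create the device pool if the
# backing zpool does not exist yet. "linstor_disk" is an illustrative variable.
- name: Check whether the backing zpool already exists
  command: zpool list -H -o name linstor-incus
  register: existing_zpool
  failed_when: false
  changed_when: false

- name: Create storage pool
  command: >-
    linstor physical-storage create-device-pool
    --storage-pool incus --pool-name linstor-incus
    zfsthin {{ inventory_hostname }} {{ linstor_disk }}
  when: existing_zpool.rc != 0
```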
I tried removing the check and running the playbook twice and we indeed get an error. So the check is needed to make the storage pool creation idempotent.
TASK [Create storage pool] ******************************************************************************************************************************************************************************************
skipping: [server03] => (item=server01)
skipping: [server03] => (item=server02)
skipping: [server03] => (item=server03)
skipping: [server03]
skipping: [server02] => (item=server01)
skipping: [server02] => (item=server02)
skipping: [server02] => (item=server03)
skipping: [server02]
failed: [server01] (item=server01) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server01 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.276529", "end": "2025-01-22 14:25:53.877272", "item": "server01", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:53.600743", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m (Node: 'server01') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m (Node: 'server01') Zpool name already used."]}
failed: [server01] (item=server02) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server02 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.273696", "end": "2025-01-22 14:25:54.892492", "item": "server02", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:54.618796", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m (Node: 'server02') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m (Node: 'server02') Zpool name already used."]}
failed: [server01] (item=server03) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server03 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.298110", "end": "2025-01-22 14:25:55.994142", "item": "server03", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:55.696032", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m (Node: 'server03') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m (Node: 'server03') Zpool name already used."]}
NO MORE HOSTS LEFT **************************************************************************************************************************************************************************************************
PLAY RECAP **********************************************************************************************************************************************************************************************************
server01 : ok=18 changed=0 unreachable=0 failed=1 skipped=6 rescued=0 ignored=0
server02 : ok=12 changed=0 unreachable=0 failed=0 skipped=13 rescued=0 ignored=0
server03 : ok=12 changed=0 unreachable=0 failed=0 skipped=13 rescued=0 ignored=0
Adds a Linstor playbook with the bare minimum for a development setup. This is needed to support the development of the Linstor integration in Incus, tracked in lxc/incus#564. The idea of this PR is both to make this automation available to developers working on the feature and to create a common place for discussing the integration of Linstor into incus-deploy. I wouldn't consider this production-ready, and I left some notes on things that I think could be improved.
For a production-ready setup we'd probably also want to set up SSL to encrypt both controller<->satellite and incus<->controller traffic. It's also worth noting that I'm using `linstor physical-storage create-device-pool` to create the underlying storage setup (VGs or zPools). While this makes it easy to support both LVM and ZFS in the playbook with no extra logic, it does not expose many options for configuring the underlying storage. Ideally we'd want to create the VGs and zPools manually, which would give the user the ability to configure them as needed through extra variables in the playbook.
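For illustration, manual creation of the backing storage could look roughly like the sketch below. This is not part of the PR; variable names such as `linstor_disk` and `linstor_zpool_options` are made up for the example.

```yaml
# Sketch only: create the backing zpool ourselves so that extra options can be
# passed through playbook variables, then register it with Linstor.
# "linstor_disk" and "linstor_zpool_options" are illustrative variables.
- name: Create backing zpool
  command: >-
    zpool create {{ linstor_zpool_options | default('') }}
    linstor-incus {{ linstor_disk }}
  args:
    creates: /linstor-incus   # default zpool mountpoint, used as a crude idempotency guard

- name: Register the pool with Linstor
  command: >-
    linstor storage-pool create zfsthin
    {{ inventory_hostname }} incus linstor-incus
  # would still need an existence check similar to the one discussed above
```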
The resulting Linstor deployment is the following:

We can then create a Resource Group that will eventually be consumed by Incus and spawn volumes from it. In this example I'm specifying `--place-count 2`, which means that Linstor will create two physical replicas and one diskless replica to reach quorum (a TieBreaker).
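If we eventually want the playbook to create that resource group too, a rough sketch as tasks could be the following (the resource-group name `incus` is illustrative, and the tasks still lack idempotency checks):

```yaml
# Sketch only: resource group with two physical replicas, to be consumed by Incus.
# The resource-group name "incus" is illustrative.
- name: Create resource group
  command: linstor resource-group create incus --storage-pool incus --place-count 2
  run_once: true

- name: Create volume group
  command: linstor volume-group create incus
  run_once: true
```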
Testing the setup

I've tested the deployment with the following setup. For testing purposes, I reduced the total number of servers to 3, removed Ceph from the deployment and assigned the two Ceph OSD disks on each machine to Linstor instead.
- terraform/terraform.tfvars
- ansible/hosts.yaml