Basic Linstor deployment #20

Draft
wants to merge 5 commits into base: main
147 changes: 147 additions & 0 deletions ansible/books/linstor.yaml
@@ -0,0 +1,147 @@
---
- name: Linstor - Add package repository
hosts: all


We could use a group declaration in hosts.yaml; that way we avoid checking whether the host is in "linstor_roles", making the tasks much simpler. The docs have examples [1]. Above the children group in hosts.yaml, there should be a Linstor hosts declaration.

[1] https://docs.ansible.com/ansible/latest/inventory_guide/intro_inventory.html#hosts-in-multiple-groups
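
For illustration, a group declaration along those lines could look like the minimal sketch below; the group names are placeholders and the host-to-group mapping just mirrors the roles in hosts.yaml.example, so this is not part of the actual change:

all:
  children:
    # Plays could then target `hosts: linstor_controller` / `hosts: linstor_satellite`
    # directly, instead of every task checking linstor_roles.
    linstor_controller:
      hosts:
        server01:
    linstor_satellite:
      hosts:
        server01:
        server02:
        server03:
        server04:
        server05:

Plays targeting these groups would no longer need the `when: '"satellite" in task_roles'` style conditions.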

Member

We're not currently using groups so probably best to just use the linstor_roles behavior for consistency now.

There is some planned work to re-shuffle things quite a bit and use Ansible roles and other constructs, so that will come in later.

order: shuffle
vars:
task_roles: "{{ linstor_roles | default([]) }}"
any_errors_fatal: true
tasks:
# NOTE: this is a workaround for adding the Linbit PPA. Maybe we could add
# the source file directly like we do with OVN to bypass the need for this
# extra package.
- name: Install gnupg
apt:
name:
- gnupg
state: present
when: 'task_roles|length > 0'

- name: Add repository
apt_repository:
repo: ppa:linbit/linbit-drbd9-stack
state: present
notify: Update apt
when: 'task_roles|length > 0'

handlers:
- name: Update apt
apt:
force_apt_get: yes
update_cache: yes
cache_valid_time: 0

- name: Linstor - Install packages
hosts: all
order: shuffle
vars:
task_roles: "{{ linstor_roles | default([]) }}"
any_errors_fatal: true
tasks:
- name: Install linstor-satellite
apt:
name:
- lvm2
- zfsutils-linux
- drbd-dkms
- drbd-utils
- linstor-satellite
state: present
when: '"satellite" in task_roles'

- name: Install linstor-controller
apt:
name:
- linstor-controller
- linstor-client
state: present
when: '"controller" in task_roles'

- name: Linstor - Enable services
hosts: all
order: shuffle
vars:
task_roles: "{{ linstor_roles | default([]) }}"
any_errors_fatal: true
tasks:
- name: Enable linstor-satellite
systemd:
service: linstor-satellite
state: started
enabled: true
when: '"satellite" in task_roles'

- name: Enable linstor-controller
systemd:
service: linstor-controller
state: started
enabled: true
when: '"controller" in task_roles'

- name: Linstor - Add satellite nodes
hosts: all
order: shuffle
vars:
task_roles: "{{ linstor_roles | default([]) }}"
any_errors_fatal: true
tasks:
- name: List satellite nodes
shell: linstor --machine-readable node list
register: satellite_nodes_output
changed_when: false
when: '"controller" in task_roles'

- name: Parse satellite node names
set_fact:
existing_satellite_nodes: "{{ satellite_nodes_output.stdout | from_json | json_query('[].name') }}"
when: '"controller" in task_roles'

- name: Add satellite nodes
shell: linstor node create {{ item }} {{ hostvars[item].ansible_facts.default_ipv4.address }} --node-type satellite
register: create_node_output
loop: "{{ groups['all'] }}"
when: '("controller" in task_roles) and ("satellite" in hostvars[item]["linstor_roles"]) and (item not in existing_satellite_nodes)'
changed_when: "create_node_output.rc == 0"

- name: Linstor - Create storage pools
hosts: all
order: shuffle
vars:
task_roles: "{{ linstor_roles | default([]) }}"
pool_name: "{{ linstor_pool_name | default('incus') }}"
provider_kind: "{{ linstor_pool_driver | default('lvmthin') }}"
task_disks: "{{ linstor_disks | default([]) | map('regex_replace', '^((?!/dev/disk/by-id/).*)$', '/dev/disk/by-id/\\1') | list }}"
any_errors_fatal: true
tasks:
- name: Gather all satellite hosts
set_fact:
satellite_hosts: >-
{{ groups['all']
| map('extract', hostvars)
| selectattr('linstor_roles', 'defined')
| selectattr('linstor_roles', 'contains', 'satellite')
| map(attribute='inventory_hostname')
| list }}

- name: List storage pools
shell: linstor --machine-readable storage-pool list
register: storage_pool_output
changed_when: false
when: '"controller" in task_roles'

- name: Parse storage pools
set_fact:
satellites_without_storage_pools: >-
{{
satellite_hosts | difference(
storage_pool_output.stdout | from_json | json_query('[0][?provider_kind!=`DISKLESS`].node_name') | unique
)
}}
changed_when: false
when: '"controller" in task_roles'
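
For context on the json_query above: the node_name and provider_kind fields come from the expression itself, while the surrounding shape is an assumption about what `linstor --machine-readable storage-pool list` returns, shown only to illustrate what the difference() filter computes (real objects carry more fields than shown):

[
  [
    { "node_name": "server01", "provider_kind": "LVM_THIN" },
    { "node_name": "server02", "provider_kind": "DISKLESS" }
  ]
]

With output shaped like this, satellites_without_storage_pools ends up holding every satellite host whose name never appears with a non-DISKLESS provider_kind.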

Is it necessary to check if a satellite has no storage pool? Should nothing occur if the host already has a storage pool?

Author

The idea is to make the creation of the storage pool idempotent and ensure that we don't try to create it again when running the playbook multiple times. To be honest, I don't remember whether linstor physical-storage create-device-pool already takes care of that for us; I'll check tomorrow and come back with the results.

With that said, I think the current approach is quite naive and doesn't take factors like pre-existing storage pools into account. I basically replicated the logic for adding the satellite nodes, but that is a simpler problem to solve.

Author

I tried removing the check and running the playbook twice, and we indeed get an error. So the check is needed to make the storage pool creation idempotent.

TASK [Create storage pool] ******************************************************************************************************************************************************************************************
skipping: [server03] => (item=server01)
skipping: [server03] => (item=server02)
skipping: [server03] => (item=server03)
skipping: [server03]
skipping: [server02] => (item=server01)
skipping: [server02] => (item=server02)
skipping: [server02] => (item=server03)
skipping: [server02]
failed: [server01] (item=server01) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server01 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.276529", "end": "2025-01-22 14:25:53.877272", "item": "server01", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:53.600743", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m    (Node: 'server01') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m    (Node: 'server01') Zpool name already used."]}
failed: [server01] (item=server02) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server02 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.273696", "end": "2025-01-22 14:25:54.892492", "item": "server02", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:54.618796", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m    (Node: 'server02') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m    (Node: 'server02') Zpool name already used."]}
failed: [server01] (item=server03) => {"ansible_loop_var": "item", "changed": false, "cmd": "linstor physical-storage create-device-pool --storage-pool incus --pool-name linstor-incus zfsthin server03 /dev/disk/by-id/nvme-QEMU_NVMe_Ctrl_incus_disk5", "delta": "0:00:00.298110", "end": "2025-01-22 14:25:55.994142", "item": "server03", "msg": "non-zero return code", "rc": 10, "start": "2025-01-22 14:25:55.696032", "stderr": "", "stderr_lines": [], "stdout": "\u001b[1;31mERROR:\n\u001b[0m    (Node: 'server03') Zpool name already used.", "stdout_lines": ["\u001b[1;31mERROR:", "\u001b[0m    (Node: 'server03') Zpool name already used."]}

NO MORE HOSTS LEFT **************************************************************************************************************************************************************************************************

PLAY RECAP **********************************************************************************************************************************************************************************************************
server01                   : ok=18   changed=0    unreachable=0    failed=1    skipped=6    rescued=0    ignored=0
server02                   : ok=12   changed=0    unreachable=0    failed=0    skipped=13   rescued=0    ignored=0
server03                   : ok=12   changed=0    unreachable=0    failed=0    skipped=13   rescued=0    ignored=0
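
As an aside on the trade-off being discussed: a hypothetical error-tolerant variant (not what this playbook does) could drop the pre-check and instead treat the "already used" failure as success. The error-string match below is an assumption based on the zfsthin output above and would likely differ for other provider kinds:

- name: Create storage pool (hypothetical error-tolerant variant)
  shell: linstor physical-storage create-device-pool --storage-pool {{ pool_name }} --pool-name linstor-{{ pool_name }} {{ provider_kind }} {{ item }} {{ task_disks | join(' ') }}
  register: create_storage_pool_output
  loop: "{{ groups['all'] }}"
  when: '("controller" in task_roles) and ("satellite" in hostvars[item]["linstor_roles"])'
  changed_when: create_storage_pool_output.rc == 0
  failed_when: create_storage_pool_output.rc != 0 and "already used" not in create_storage_pool_output.stdout

The pre-check kept in this playbook reports an already-provisioned satellite as skipped, which is cleaner than matching on an error string.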


- name: Create storage pool
shell: linstor physical-storage create-device-pool --storage-pool {{ pool_name }} --pool-name linstor-{{ pool_name }} {{ provider_kind }} {{ item }} {{ task_disks | join(' ') }}
register: create_storage_pool_output
loop: "{{ groups['all'] }}"
when: '("controller" in task_roles) and ("satellite" in hostvars[item]["linstor_roles"]) and (item in satellites_without_storage_pools)'
changed_when: "create_storage_pool_output.rc == 0"
1 change: 1 addition & 0 deletions ansible/deploy.yaml
@@ -3,6 +3,7 @@
- import_playbook: books/environment.yaml
- import_playbook: books/nvme.yaml
- import_playbook: books/ceph.yaml
- import_playbook: books/linstor.yaml
- import_playbook: books/lvmcluster.yaml
- import_playbook: books/ovn.yaml
- import_playbook: books/incus.yaml
34 changes: 34 additions & 0 deletions ansible/hosts.yaml.example
@@ -13,6 +13,9 @@ all:
ovn_name: "baremetal"
ovn_az_name: "zone1"
ovn_release: "ppa"

linstor_pool_name: "incus"
linstor_pool_driver: "lvmthin"
children:
baremetal:
vars:
@@ -60,6 +63,11 @@
local_config:
source: "incus_{{ incus_name }}"
description: Distributed storage pool (cluster-wide)
linstor:
driver: linstor
local_config:
source: "{{ linstor_pool_name }}"
description: Linstor storage pool (cluster-wide)
shared:
driver: lvmcluster
local_config:
@@ -93,6 +101,12 @@
ovn_roles:
- central
- host

linstor_roles:
- controller
- satellite
linstor_disks:
- nvme-QEMU_NVMe_Ctrl_incus_disk5
server02:
ceph_disks:
- data: nvme-QEMU_NVMe_Ctrl_incus_disk1
@@ -107,6 +121,11 @@
ovn_roles:
- central
- host

linstor_roles:
- satellite
linstor_disks:
- nvme-QEMU_NVMe_Ctrl_incus_disk5
server03:
ceph_disks:
- data: nvme-QEMU_NVMe_Ctrl_incus_disk1
@@ -121,6 +140,11 @@
ovn_roles:
- central
- host

linstor_roles:
- satellite
linstor_disks:
- nvme-QEMU_NVMe_Ctrl_incus_disk5
server04:
ceph_disks:
- data: nvme-QEMU_NVMe_Ctrl_incus_disk1
@@ -129,6 +153,11 @@
- client
- osd
- rgw

linstor_roles:
- satellite
linstor_disks:
- nvme-QEMU_NVMe_Ctrl_incus_disk3
server05:
ceph_disks:
- data: nvme-QEMU_NVMe_Ctrl_incus_disk1
@@ -137,3 +166,8 @@
- client
- osd
- rgw

linstor_roles:
- satellite
linstor_disks:
- nvme-QEMU_NVMe_Ctrl_incus_disk5
27 changes: 27 additions & 0 deletions terraform/baremetal-incus/main.tf
@@ -32,6 +32,9 @@ resource "incus_profile" "this" {
config = {
"limits.cpu" = "4"
"limits.memory" = var.memory
# NOTE: this is needed to allow loading the DRBD kernel module
# on instances for Linstor storage
"security.secureboot" = false
}

device {
@@ -127,6 +130,19 @@
}
}

resource "incus_storage_volume" "disk5" {
for_each = var.instance_names

project = incus_project.this.name
name = "${each.value}-disk5"
description = "Linstor drive"
pool = var.storage_pool
content_type = "block"
config = {
"size" = "50GiB"
}
}

resource "incus_instance" "instances" {
for_each = var.instance_names

@@ -169,6 +185,17 @@
}
}

device {
type = "disk"
name = "disk5"

properties = {
"pool" = var.storage_pool
"io.bus" = "nvme"
"source" = incus_storage_volume.disk5[each.key].name
}
}

lifecycle {
ignore_changes = [running]
}
2 changes: 1 addition & 1 deletion terraform/main.tf
@@ -2,7 +2,7 @@ module "baremetal" {
source = "./baremetal-incus"

project_name = "dev-incus-deploy"
instance_names = ["server01", "server02", "server03", "server04", "server05"]
instance_names = var.instance_names
image = "images:ubuntu/22.04"
memory = "4GiB"

5 changes: 5 additions & 0 deletions terraform/variables.tf
@@ -22,3 +22,8 @@ variable "ovn_uplink_ipv6_address" {
type = string
default = "fd00:1e4d:637d:1234::1/64"
}

variable "instance_names" {
type = list(string)
default = ["server01", "server02", "server03", "server04", "server05"]
}