Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ceph: Add support for full disk encryption of disks deployed as part of the distributed storage #308

Merged
merged 5 commits into from
Jul 11, 2024

Conversation

gabrielmougard
Copy link
Contributor

@gabrielmougard gabrielmougard commented May 14, 2024

JIRA ticket: https://warthogs.atlassian.net/browse/LXD-910

We would like to be able to configure Ceph full disk encryption (FDE) as part of a MicroCloud deployment. The existing MicroCeph API already allows to pass an Encrypt boolean field to encrypt data at-rest at the disk level through the use of LUKS. Each generated keyring for the associated disk OSD is stored in the internel Ceph configuration folder.

We propose to introduce a new --encrypt flag for the microcloud init | add commands to encrypt all the disks selected to be part of the Ceph deployment. We also introduce a way to select the disk to encrypt just like how it is currently done with the disk wipe operation.

One important thing to note is that, Ceph will use dm_crypt under the hood to create a virtual encrypted device for each selected disk device. If choosing to encrypt a disk on a certain node, the user must ensure that the dm_crypt kernel module is enabled (it is by default enabled for LXD ubuntu:24.04 VM images : grep -i dm_crypt config-6.8.0-31-generic should output CONFIG_DM_CRYPT=m)

TODO

  • Update MicroCloud integration test suite

@gabrielmougard gabrielmougard marked this pull request as ready for review May 16, 2024 07:37
Copy link
Contributor

@ru-fu ru-fu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but we'll need the link to MicroCeph for enabling this.

doc/how-to/add_machine.rst Outdated Show resolved Hide resolved
@gabrielmougard
Copy link
Contributor Author

@ru-fu updated

@gabrielmougard gabrielmougard force-pushed the feat/microcloud-fde branch 2 times, most recently from 142d263 to 5b0c5c1 Compare May 22, 2024 12:46
@gabrielmougard
Copy link
Contributor Author

@simondeziel the issue shown here looks similar to the one we saw in Madrid (same failure in the test case) while we discovered a potential race condition in the MicroCloud test suite. But this time, this is part of the preseed test so I don't see why it could be related to a race. As per the interactive test failing, this is an other GH runner timeout in the interactive combination so I don't think we can do much here...

@simondeziel
Copy link
Member

But this time, this is part of the preseed test so I don't see why it could be related to a race.

@masnax, does the preseed case actually side-step the disk selection/table creation race?

@masnax
Copy link
Contributor

masnax commented May 22, 2024

But this time, this is part of the preseed test so I don't see why it could be related to a race.

@masnax, does the preseed case actually side-step the disk selection/table creation race?

Yes, it doesn't use the tables at all

@gabrielmougard
Copy link
Contributor Author

@simondeziel @masnax tests are ok.

Copy link
Contributor

@masnax masnax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just a few changes & questions

@@ -17,6 +17,7 @@ type cmdAdd struct {

flagAutoSetup bool
flagWipe bool
flagEncrypt bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this flag is necessary. It would only be useful for --auto but the whole point of --auto is that we are deciding for you. So I think it would suffice to either enforce encryption or enforce no encryption in this case.

In the interactive setup we will ask you questions, and the preseed setup has its own key for encryption, so I think it would be safe to drop this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@masnax Then, for the --auto mode, I would suggest to NOT enforce encryption as this is a quite strong choice that has deep implication on the storage. But I think it also really depends on the philosophy of MicroCloud: what should be part of a default MicroCloud cluster.. are we putting the accent on simplicity or security ? (not to say that both aspects are always mutually exclusive, but in this case I think the question is relevant)

There is maybe also a third solution that is more complex: encryption by default, when possible. Which mean that we'll need some system validation logic. For example, is dm_crypt loaded in each cluster member containing the to be encrypted disks? If yes, proceed with encryption, else skip with an information message. What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mionaalex what do you think we should do here?

test/includes/microcloud.sh Outdated Show resolved Hide resolved
@@ -904,7 +937,8 @@ create_system() {
exec > /dev/null
fi

lxc init ubuntu-minimal:22.04 "${name}" --vm -c limits.cpu=2 -c limits.memory=4GiB
# ubuntu-minimal:24.04 does not execute cloud-init to enable the READY status. Still working on finding why this happens...
lxc init ubuntu:24.04 "${name}" --vm -c limits.cpu=2 -c limits.memory=4GiB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this new? How are other PR tests working then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the -minimal vs non-minimal part but 24.04 was picked as it doesn't use the -kvm kernel which is lacking the dm-crypt module. With 24.04's kernel, all the necessary modules are easily loadable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I missed the change to 24.04.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm facing an issue with ubuntu-minimal:24.04 in regards to cloud-init: strangely, with the minimal flavor, the ready.sh script is not executed:

cat << EOF | lxc profile edit default
config:
  cloud-init.user-data: |
    #cloud-config
    write_files:
      - content: |
          #!/bin/sh
          exec curl --unix-socket /dev/lxd/sock lxd/1.0 -X PATCH -d '{"state": "Ready"}'
        path: /var/lib/cloud/scripts/per-boot/ready.sh
        permissions: "0755"
EOF

whereas it works fine with ubuntu:24.04

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabrielmougard I cannot reproduce with ubuntu-minimal-daily:24.04 here:

lxc init ubuntu-minimal-daily:24.04 foo --vm -c limits.cpu=2 -c limits.memory=4GiB
cat << EOF | lxc config set foo cloud-init.user-data -
#cloud-config
write_files:
  - content: |
     #!/bin/sh
     exec curl --unix-socket /dev/lxd/sock lxd/1.0 -X PATCH -d '{"state": "Ready"}'
    path: /var/lib/cloud/scripts/per-boot/ready.sh
    permissions: "0755"
EOF
lxc start foo
$ lxc ls foo
+------+-------+-----------------------+------------------------------------------------+-----------------+-----------+
| NAME | STATE |         IPV4          |                      IPV6                      |      TYPE       | SNAPSHOTS |
+------+-------+-----------------------+------------------------------------------------+-----------------+-----------+
| foo  | READY | 172.24.26.31 (enp5s0) | 2001:470:b1c3:7946:216:3eff:fe0d:bae8 (enp5s0) | VIRTUAL-MACHINE | 0         |
+------+-------+-----------------------+------------------------------------------------+-----------------+-----------+

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ubuntu-minimal:24.04 worked for me too but since it's a much older image, I think we should switch to ubuntu-minimal-daily:24.04 if that also works for you.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you give it another try with ubuntu-minimal-daily:24.04, please? It was marked as fixed but maybe some rebase lost the change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gabrielmougard how did it go with ubuntu-minimal-daily:24.04?

Copy link
Contributor Author

@gabrielmougard gabrielmougard Jun 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simondeziel I can reproduce:

lxc init ubuntu-minimal-daily:24.04 foo --vm -c limits.cpu=2 -c limits.memory=4GiB
cat << EOF | lxc config set foo cloud-init.user-data -
#cloud-config
write_files:
  - content: |
     #!/bin/sh
     exec curl --unix-socket /dev/lxd/sock lxd/1.0 -X PATCH -d '{"state": "Ready"}'
    path: /var/lib/cloud/scripts/per-boot/ready.sh
    permissions: "0755"
EOF
lxc start foo

and I have the READY state. But the CI still seems to complain even with ubuntu-minimal-daily:24.04 ... I'll try to increase the timeout duration to see if it is related..

test/suites/basic.sh Outdated Show resolved Hide resolved
@masnax
Copy link
Contributor

masnax commented May 23, 2024

Just a general design question, not sure if it makes sense at all:

Is it reasonable to expect per-disk encryption? Since the data is replicated should the encryption be all-or-nothing and not per-disk?

@gabrielmougard
Copy link
Contributor Author

Just a general design question, not sure if it makes sense at all:

Is it reasonable to expect per-disk encryption? Since the data is replicated should the encryption be all-or-nothing and not per-disk?

The MicroCeph API seems to only works with a per-disk encryption basis (see here). But if I'm following you, do you mean that, among the chosen disks for distributed storage, and if we chose to have encryption, we enforce the fact that either all the disks are encrypted or not at all?

test/includes/microcloud.sh Fixed Show fixed Hide fixed
test/includes/microcloud.sh Fixed Show fixed Hide fixed
@gabrielmougard gabrielmougard force-pushed the feat/microcloud-fde branch 4 times, most recently from a7a870f to 3c3c211 Compare May 24, 2024 10:28
@masnax
Copy link
Contributor

masnax commented May 24, 2024

Just a general design question, not sure if it makes sense at all:
Is it reasonable to expect per-disk encryption? Since the data is replicated should the encryption be all-or-nothing and not per-disk?

The MicroCeph API seems to only works with a per-disk encryption basis (see here). But if I'm following you, do you mean that, among the chosen disks for distributed storage, and if we chose to have encryption, we enforce the fact that either all the disks are encrypted or not at all?

Yes I meant just having a yes/no question for disk encryption that sets the field for every disk payload that we send to the MicroCeph API.

@gabrielmougard gabrielmougard force-pushed the feat/microcloud-fde branch 6 times, most recently from 7f77fab to a3d65b0 Compare May 27, 2024 16:27
@masnax
Copy link
Contributor

masnax commented Jul 2, 2024

@gabrielmougard looks like the test package needs a rebase now.

@masnax
Copy link
Contributor

masnax commented Jul 3, 2024

@gabrielmougard Another rebase, sorry :)

@gabrielmougard gabrielmougard force-pushed the feat/microcloud-fde branch 2 times, most recently from 119af4b to bf0a0ef Compare July 4, 2024 08:18
@masnax
Copy link
Contributor

masnax commented Jul 4, 2024

@gabrielmougard You need to remove the systemd-journald related masks from microcloud.sh as that seems to be what's breaking cloud-init.

@simondeziel
Copy link
Member

simondeziel commented Jul 4, 2024

@gabrielmougard You need to remove the systemd-journald related masks from microcloud.sh as that seems to be what's breaking cloud-init.

Indeed, looks like it was lost in the merge conflict resolution. There should only be one masked systemd-journald and that's the systemd-journal-flush.service one. The other 2 should be gone.

@masnax
Copy link
Contributor

masnax commented Jul 10, 2024

@gabrielmougard This may need a rebase now, to capture the new microcluster changes.

@masnax
Copy link
Contributor

masnax commented Jul 11, 2024

@gabrielmougard Yet another rebase please

gabrielmougard and others added 5 commits July 11, 2024 17:11
Signed-off-by: Gabriel Mougard <gabriel.mougard@canonical.com>
Signed-off-by: Gabriel Mougard <gabriel.mougard@canonical.com>
Signed-off-by: Gabriel Mougard <gabriel.mougard@canonical.com>
Signed-off-by: Gabriel Mougard <gabriel.mougard@canonical.com>
Co-authored-by: Ruth Fuchss <ruth.fuchss@canonical.com>
Signed-off-by: Gabriel Mougard <gabriel.mougard@canonical.com>
Copy link
Contributor

@masnax masnax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@masnax masnax merged commit e61d3a1 into canonical:main Jul 11, 2024
13 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants