
Provisioning VMs with certain root disk sizes on fresh Noble system results in 'volsize' must be a multiple of volume block size (16K) #13420

Closed
webdock-io opened this issue May 2, 2024 · 18 comments

@webdock-io commented May 2, 2024

Ubuntu Noble
LXD v5.21.1 LTS
zfs-2.2.2-0ubuntu9
zfs-kmod-2.2.2-0ubuntu9

Running:
# /snap/bin/lxc init ubuntu:noble mytest --profile myprofile --vm -c security.secureboot=false

Where the profile's root disk device is:

root:
    path: /
    pool: lxd
    size: 200GB
    type: disk

results in:

Creating mytest
Error: Failed instance creation: Failed creating instance from image: Failed to run: zfs set volsize=200000004096 lxd/virtual-machines/mytest.block: exit status 255 (cannot set property for 'lxd/virtual-machines/mytest.block': 'volsize' must be a multiple of volume block size (16K))

We never saw this in the past, so this seems like something new in zfs or lxd - I believe our existing systems have a volume block size of 8K

Many other disk sizes in GB shorthand fail as well; try 13GB, for example, if you don't have 200GB free in your pool to test with :D

Are we supposed to jump through hoops trying to calculate "close enough to xGB we want but in bytes that will work with zfs" by hand before provisioning VMs with a specific amount of root storage available?

If this is a zfs limitation, which it seems it is, it feels like there should be logic in LXD to catch this and fix up the volsize command under the hood so it hits the number of bytes that zfs will eat.

That is, if I understand this issue correctly. It feels like some black magic is going on :D

Edit: some more information which may be helpful, in case this is specific to our system setup:

  • we create the lxd pool by hand and tell lxd to use it
  • it's a mirrored pool, and the pool settings we touch are:

    zfs set xattr=sa lxd
    zfs set sync=disabled lxd
    zfs set compression=off lxd
    zfs set atime=off lxd
    zpool set autotrim=on lxd

  • we also tell lxd to use refquota:

    lxc storage set lxd volume.zfs.use_refquota true

@tomponline tomponline added the Bug Confirmed to be a bug label May 2, 2024
@tomponline tomponline added this to the lxd-6.1 milestone May 2, 2024
@tomponline (Member)

LXD should be rounding to the nearest required block size for the storage pool, so we should check why this is happening now.

@webdock-io (Author)

Thanks for the quick response as usual @tomponline

Based on this info, I tried reverting all the way back to 5.0/stable; the issue is the same there.

Which sucks, as I would not be comfortable going further back than that in order to get our system working properly

Can you share what the timeframe is on the 6.1 milestone?

If it's far in the future, I'll have to look into fixing up profiles in our scripting on these systems and doing the rounding ourselves (see the sketch at the end of this comment); wiring that into our provisioning would be complicated for sure.

Anyway, thank you in advance for any info you can share. In the meantime I'll try 4.0 and 3.0 just for kicks and see what happens
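
For what it's worth, the interim rounding is a one-liner once the block size is known. A minimal Go sketch, assuming ZFS's default 16KiB volblocksize and rounding down so the volume stays within the budgeted space (the helper name is illustrative, not anything in LXD):

```go
package main

import "fmt"

// alignDown rounds a byte count down to the nearest multiple of blockSize,
// producing a value that `zfs set volsize=...` will accept.
func alignDown(bytes, blockSize int64) int64 {
	return bytes - bytes%blockSize
}

func main() {
	const volBlockSize = 16 * 1024 // assumed 16KiB default volblocksize

	// "200GB" from the profile above, in base-10 bytes.
	fmt.Println(alignDown(200_000_000_000, volBlockSize)) // 199999995904
}
```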

@webdock-io (Author)

FYI, I couldn't get 4.0 or 3.0 to work as they complained about the "zfs tool" not being present. Some old bug that has since been fixed, I assume. I didn't want to spend time getting ancient versions of lxd to work just to check whether the issue is present there.

But in any case, it's present all the way back to 5.0 that's for certain

@tomponline (Member)

@simondeziel do you have time to see if you can reproduce/identify the issue here, as I know you're pretty familiar with ZFS.

I would have thought we would have seen this before, including in our tests that use various sizes, so it sounds like the pool is set up in a specific way that's triggering this, perhaps.

@simondeziel (Member)

A similar issue was reported on Launchpad recently (https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/2063105). The gist of the issue is when using base-10 units (GB, MB), you don't always line up with 16KiB as required by ZFS. Using base-2 units (GiB, MiB) doesn't have this problem.
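
To make the mismatch concrete, here is a small self-contained Go check (illustrative only, not LXD code): rounding 13GB up to an 8KiB boundary, as the generic driver logic discussed below does, yields exactly the volsize from the errors in this thread, and that value is not a multiple of 16KiB, while 13GiB needs no rounding at all:

```go
package main

import "fmt"

func main() {
	const blk8K, blk16K = int64(8192), int64(16384)

	// "13GB" parses as a base-10 size.
	size := int64(13_000_000_000)

	// Rounding up to the next 8KiB boundary reproduces the volsize
	// seen in the error messages in this thread.
	rounded := (size + blk8K - 1) / blk8K * blk8K
	fmt.Println(rounded)             // 13000007680
	fmt.Println(rounded%blk16K == 0) // false: not 16KiB-aligned

	// A base-2 size is a multiple of 16KiB from the start.
	gib13 := int64(13) * 1024 * 1024 * 1024
	fmt.Println(gib13%blk16K == 0) // true
}
```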

@tomponline (Member)

Sounds like something has changed in LXD or ZFS as we have been rounding them for some time.

@simondeziel (Member)

@tomponline the reporter on LP had 5.0.3-babaaf8 (27948).

@tomponline (Member)

> @tomponline the reporter on LP had 5.0.3-babaaf8 (27948).

Yeah indeed, but it's interesting that both that report and this one have only come up recently.

@webdock-io (Author)

We are kind of married to GB/MB units at this point, so identifying the cause and finding a fix or workaround would be ideal.

@tomponline (Member)

I tried to get a basic reproducer (unsuccessfully):

lxc storage create zfs zfs size=15GiB
lxc init ubuntu:noble vtest --vm -d root,size=13GiB

This is on 5.21/stable and Noble.

So it appears to require a specific set of circumstances.

@simondeziel (Member)

@tomponline s/13GiB/13GB/ worked for me

@tomponline (Member) commented May 2, 2024

Thanks, I used GiB :(

The rounding logic is called here for ZFS: https://github.com/canonical/lxd/blob/main/lxd/storage/drivers/driver_zfs_volumes.go#L87

It uses https://github.com/canonical/lxd/blob/main/lxd/storage/drivers/driver_common.go#L554-L566 which rounds to 8192 bytes (https://github.com/canonical/lxd/blob/main/lxd/storage/drivers/utils.go#L25).

LVM has its own rounding logic here: https://github.com/canonical/lxd/blob/main/lxd/storage/drivers/driver_lvm.go#L787-L802

So maybe we just need to implement an override for ZFS.
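
For illustration, such an override might look like the minimal sketch below: round the requested size up to the zvol block size instead of the generic 8KiB. The package name, type stub, and hard-coded 16KiB (the OpenZFS 2.2 default volblocksize) are assumptions modelled on the links above, not the actual patch; a fuller implementation would query the pool's configured value:

```go
package drivers

// zfs stands in for LXD's ZFS storage driver type in this sketch.
type zfs struct{}

// roundVolumeBlockSizeBytes rounds a requested size up to the zvol block
// size, overriding the generic 8KiB rounding in driver_common.go.
func (d *zfs) roundVolumeBlockSizeBytes(sizeBytes int64) int64 {
	const volBlockSize = 16 * 1024 // assumed OpenZFS 2.2 default volblocksize

	if remainder := sizeBytes % volBlockSize; remainder != 0 {
		// Round up so the volume is never smaller than requested.
		sizeBytes += volBlockSize - remainder
	}

	return sizeBytes
}
```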

@simondeziel (Member)

Yeah, doing the rounding in LXD sounds like a good idea; the question is why now? I couldn't find anything relevant on the OpenZFS side, unfortunately.

@tomponline (Member)

It'd be good to get a reproducer so we can check it's actually fixed.

@webdock-io (Author)

@tomponline I have a hard time following the back and forth here. Are you confirming you see the issue and can reproduce or not?

Seems to me your test was using GiB when you should be trying with GB.

I can reliably reproduce here, but as stated we create the pool beforehand and then tell lxd to use it when doing lxd init

If there is some patch or version I can grab to test a fix, let me know and I'm all yours :)

@webdock-io (Author)

PS: I can get you access to the system if you want to inspect something. Just send me a public key at arni@webdock.io.

@simondeziel (Member) commented May 2, 2024

@webdock-io no, we are good, we have that easy reproducer:

$ lxc init ubuntu-daily:22.04 u1 --vm -d root,size=13GB
Creating u1
Error: Failed instance creation: Failed creating instance from image: Failed to run: zfs set volsize=13000007680 default/virtual-machines/u1.block: exit status 255 (cannot set property for 'default/virtual-machines/u1.block': 'volsize' must be a multiple of volume block size (16K))

@capriciousduck

I too hit the same error. Any update on this, guys?

Thanks.

@MggMuggins MggMuggins self-assigned this May 9, 2024
MggMuggins added a commit to MggMuggins/lxd that referenced this issue May 10, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
MggMuggins added a commit to MggMuggins/lxd that referenced this issue May 10, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
MggMuggins added a commit to MggMuggins/lxd that referenced this issue May 13, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
hamistao pushed a commit to hamistao/lxd that referenced this issue May 29, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
tomponline pushed a commit to tomponline/lxd that referenced this issue Jun 6, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
tomponline pushed a commit to tomponline/lxd that referenced this issue Jun 6, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
tomponline pushed a commit to tomponline/lxd that referenced this issue Jun 25, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
(cherry picked from commit 39e5776)
tomponline pushed a commit to tomponline/lxd that referenced this issue Jun 25, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>
tomponline pushed a commit to tomponline/lxd that referenced this issue Jun 25, 2024
Fixes canonical#13420

Signed-off-by: Wesley Hershberger <wesley.hershberger@canonical.com>