add support for unified image layout #1870
The repart service is a oneshot, so we can't use a drop-in to replace the `ExecStart` command that runs by default, which attempts to add or grow defined partitions on the same device as the root filesystem. This is OK for "unified" images, where the data partition is at the end of the device, but not OK for "split" images, where it lives on a different device.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Switch from our `growpart` tool to `systemd-repart` to resize the data partition for a unified root+data image.

`growpart` uses the `gptman` crate, which calls the BLKRRPART ioctl to tell the kernel to re-read the partition table. This call fails if the device contains mounted partitions.

`systemd-repart` uses the newer BLKPG ioctl, which manipulates the kernel's view of individual partitions. This works even if the root filesystem is present on the same device and already mounted. It also avoids the need to handle the partition symlink going away and coming back, since udev does not get the change event that triggers this.

The two tools differ in how much free space is left on the device after the last partition is resized. `growpart` ends the partition one sector before the last 1 MiB boundary, while `systemd-repart` ends it just before the GPT label. Both tools run on every boot. To avoid problems on downgrade after a newer release resizes the data filesystem beyond where the older release will end the partition, we constrain `systemd-repart` to leave the older number of free sectors.

Since `/local` can be mounted during the resize operation, we can use a real mount unit for it, which greatly simplifies the dependencies, and allows us to decouple the "prepare" logic from "resize" logic.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
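The free-space difference described in this commit message can be illustrated with a little sector arithmetic. This is a rough sketch, not code from either tool: the 512-byte sector size, the 33-sector backup GPT footprint, the rounding rule, and all function names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: the sector size, GPT footprint, and rounding rule below are
# assumptions for illustration, not values taken from either tool's source.

SECTOR_BYTES=512
MIB_SECTORS=$((1024 * 1024 / SECTOR_BYTES))   # 2048 sectors per MiB
GPT_BACKUP_SECTORS=33                         # 32 entry sectors + 1 header

# Last sector usable by a partition: everything before the backup GPT.
repart_style_end() {
    total_sectors="$1"
    echo $((total_sectors - GPT_BACKUP_SECTORS - 1))
}

# One sector before the last 1 MiB boundary that still precedes the backup GPT.
growpart_style_end() {
    total_sectors="$1"
    first_gpt_sector=$((total_sectors - GPT_BACKUP_SECTORS))
    echo $((first_gpt_sector / MIB_SECTORS * MIB_SECTORS - 1))
}

# Extra sectors that must stay free so both styles end at the same place.
padding_sectors() {
    total_sectors="$1"
    echo $(( $(repart_style_end "$total_sectors") - $(growpart_style_end "$total_sectors") ))
}
```

Under these assumptions, `padding_sectors 41943040` (a 20 GiB disk) reports the number of sectors by which the newer tool would have to stop short to match the older end sector.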
For some targets such as bare metal systems, the requirement for a separate block device to hold the data partition is unworkable. Implement a "unified" image layout, which places the data partition after the final OS partition, and is suitable for targets which may only have one disk. The old "split" layout remains the default.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
b706d30 to 9e66877
%{S:1001} %{S:1002} %{S:1003} %{S:1004} %{S:1005} \
%{S:1006} %{S:1007} %{S:1008} %{S:1009} %{S:1010} %{S:1011} %{S:1012} \
%{S:1015} %{S:1040} %{S:1041} %{S:1060} %{S:1061} %{S:1062} %{S:1080} \
%{S:1001} %{S:1002} %{S:1003} %{S:1004} %{S:1005} %{S:1006} %{S:1007} \
%{S:1008} %{S:1009} %{S:1010} %{S:1011} %{S:1012} %{S:1013} %{S:1015} \
%{S:1040} %{S:1041} %{S:1042} %{S:1060} %{S:1061} %{S:1062} %{S:1080} \
This hurts to review 😰
echo "No OVF template or VMDK images, skipping OVA build"
exit 0
else
# Warn the user if a VMDK exists but an OVF template does not.  Assume we do not
Nit: I don't know if this is intended, but there is an extra space here
# Warn the user if a VMDK exists but an OVF template does not.  Assume we do not
# Warn the user if a VMDK exists but an OVF template does not. Assume we do not
I tried to preserve the existing "two spaces after a period" convention in this section, even though I belong to the "one space after a period" tribe.
Makefile.toml
fi

root_image_size_bytes="$(measure_image "${root_vmdk_path}")"
root_image_size_gib="$((root_image_size_bytes / 1024 / 1024 / 1024))"
Is it possible for `1024 * 1024 * 1024` and `1024 / 1024 / 1024` to be constants?
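One way the suggestion could look in a shell scriptlet is sketched below. The `GIB` constant and both helper names are hypothetical; only the repeated multiply/divide by 1024 comes from the code under review.

```shell
#!/bin/sh
# Hypothetical names; only the "three factors of 1024" arithmetic comes from
# the snippet under review.

GIB=$((1024 * 1024 * 1024))

# Convert a byte count to whole GiB, rounding down like the original expression.
bytes_to_gib() {
    echo $(($1 / GIB))
}

# Convert a GiB count back to bytes.
gib_to_bytes() {
    echo $(($1 * GIB))
}
```

With helpers like these, the size calculation would read as `root_image_size_gib="$(bytes_to_gib "${root_image_size_bytes}")"`, at the cost of a subshell per call.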
@@ -101,6 +104,9 @@ install -p -m 0644 %{S:11} %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}
install -d %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/wicked/ifconfig
install -p -m 0644 %{S:1000} %{buildroot}%{_cross_factorydir}%{_cross_sysconfdir}/wicked/ifconfig

install -d %{buildroot}%{_cross_libdir}/repart.d
install -p -m 0644 %{S:96} %{buildroot}%{_cross_libdir}/repart.d/80-local.conf
Why is this a drop-in, and not part of the unit itself?
`systemd-repart` looks for its config files in `/usr/lib/repart.d`, like how `systemd-tmpfiles` looks in `/usr/lib/tmpfiles.d`.
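To make the shape of such a definition file concrete, here is a minimal sketch that writes one into a scratch directory. The partition type, label, and padding values are illustrative assumptions and are not the contents of the `80-local.conf` added in this PR; the function name is hypothetical as well.

```shell
#!/bin/sh
# Sketch only: the partition type, label, and padding value are illustrative
# assumptions, not the contents of the 80-local.conf file in this PR.

write_repart_definition() {
    conf_dir="$1"
    mkdir -p "${conf_dir}"
    cat > "${conf_dir}/80-local.conf" <<'EOF'
[Partition]
Type=linux-generic
Label=EXAMPLE-DATA
# Keep a fixed amount of free space after the partition.
PaddingMinBytes=1M
PaddingMaxBytes=1M
EOF
}
```

A directory written this way could be previewed without touching the disk by pointing `systemd-repart` at it with `--definitions=` together with `--dry-run=yes`.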
Although the existing AWS and VMware variants use the "split" image layout, custom variants for these platforms might use the "unified" layout instead. Adapt the AMI registration and OVA creation logic to account for the possibility that we only build a single disk image.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
Using `state_t` as the label makes the directories read-only for all unprivileged containers, even if they have access via a host mount.

Signed-off-by: Ben Cressey <bcressey@amazon.com>
9e66877 to 29de89f
Added
⛵
fi

root_image_size_gib="$(($(stat -c %s "${root_image}") / 1024 / 1024 / 1024))"
if [ "${is_split}" == "yes" ] ; then
It feels like we are at the limit of what should be done with scriptlets inside of a Makefile.toml!
Consider the gauntlet thrown down!
(But yes - we definitely need a different approach.)
🥇
Issue number:
N/A
Description of changes:
Add support for a "unified" image layout, where the OS and data partitions reside on the same disk.
In order to resize the data partition when it's on the same disk as the root filesystem, switch from our homegrown `growpart` tool to `systemd-repart`, which knows how to tell the kernel about changes to individual partition entries.
Refactor the unit dependencies for setting up `/local` to take advantage of the new repart functionality, which lets us use an ordinary mount unit for the filesystem.
Testing done:
`/local` is correctly resized to fill the disk for both "unified" and "split"
`systemd-repart` and `growpart` resize the disk to the same end sector
Terms of contribution:
By submitting this pull request, I agree that this contribution is dual-licensed under the terms of both the Apache License, version 2.0, and the MIT license.