Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

commit: fix ostree deployment on 64-bit inode fs #2874

Merged
merged 1 commit into from
Jun 6, 2023

Conversation

aospan
Copy link
Contributor

@aospan aospan commented Jun 6, 2023

This commit addresses a bug that was causing ostree deployment to become corrupted on the large fs, when any package was installed using 'rpm-ostree install'.

In such instances, multiple files were assigned the same inode. For example, the '/home' directory and a regular file 'pkg-get' were assigned the same inode (2147484070), making the deployment unusable.

A root cause analysis was performed, running the process under gdb, which revealed a lossy conversion from guint64 to guint32, for example 6442451366 converted to 2147484070:

(gdb) p name
$10 = 0x7fe9224d2d70 "home"

(gdb) p inode
$73 = 6442451366

(gdb) s
device=66311, modifier=0x7fe914791840) at
src/libostree/ostree-repo-commit.c:1590

The conversion resulted in entirely independent files potentially receiving the same inode.

The issue was discovered on PoC machine equipped with a large NVME (3.4TB), but the bug can be easily reproduced using cosa run -m 4000 --qemu-size +3TB', followed by installation of any package using rpm-ostree install`. The resulting deployment will be unusable due to many files being "corrupted" by the aforementioned issue.

This commit addresses a bug that was causing ostree deployment
to become corrupted on the large fs, when any package was installed using
'rpm-ostree install'.

In such instances, multiple files were assigned the same inode. For
example, the '/home' directory and a regular file 'pkg-get' were
assigned the same inode (2147484070), making the deployment unusable.

A root cause analysis was performed, running the process under gdb,
which revealed a lossy conversion from guint64 to guint32, for example
6442451366 converted to 2147484070:

(gdb) p name
$10 = 0x7fe9224d2d70 "home"

(gdb) p inode
$73 = 6442451366

(gdb) s
    device=66311, modifier=0x7fe914791840) at
src/libostree/ostree-repo-commit.c:1590

The conversion resulted in entirely independent files potentially
receiving the same inode.

The issue was discovered on PoC machine equipped with a large NVME
(3.4TB), but the bug can be easily reproduced using `cosa run -m 4000
--qemu-size +3TB', followed by installation of any package using
`rpm-ostree install`. The resulting deployment will be unusable due to
many files being "corrupted" by the aforementioned issue.
@openshift-ci
Copy link

openshift-ci bot commented Jun 6, 2023

Hi @aospan. Thanks for your PR.

I'm waiting for a ostreedev member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, ouch. Nice find. This bug has been lurking for a long time...I think we've mostly been saved by how most Linux filesystems keep inodes generally below 32 bits unless there's pressure to go higher.

@cgwalters
Copy link
Member

/ok-to-test

@cgwalters
Copy link
Member

Side note: Rust doesn't allow silent integer truncations like this.

Hmm...we could try building with -Werror=conversion...there's definitely stuff to fix for that in a quick check it looks like it may be tractable in an hour of work. Well, maybe...with this I also see warnings from glib.h like

src/ostree/ot-builtin-reset.c: In function ‘ostree_builtin_reset’:
/usr/include/glib-2.0/gio/gioerror.h:42:20: error: conversion to ‘gint’ {aka ‘int’} from ‘GQuark’ {aka ‘unsigned int’} may change the sign of the result [-Werror=sign-conversion]
   42 | #define G_IO_ERROR g_io_error_quark()
      |                    ^~~~~~~~~~~~~~~~~~
src/ostree/ot-builtin-reset.c:71:39: note: in expansion of macro ‘G_IO_ERROR’
   71 |       g_set_error (error, G_IO_ERROR, G_IO_ERROR, "Invalid ref '%s'", ref);
      |                                  

But maybe we can juggle that with a pragma that only enables the warning for our C code, not headers (particularly not external ones).

Well, a project for another time.

Thanks for the fix!

@aospan
Copy link
Contributor Author

aospan commented Jun 6, 2023

Absolutely, happy to contribute! Agreed, it's certainly essential to have some safeguards in place to defend against such bugs.

@cgwalters cgwalters merged commit 8762062 into ostreedev:main Jun 6, 2023
19 checks passed
@cgwalters
Copy link
Member

cgwalters commented Jun 20, 2023

I want to do a bit of a "risk analysis" on this. It's important to emphasize that I believe right now that this bug does not affect "plain ostree" usage, where the ostree client is fetching objects.

This issue affects "builds" - that's where we do the device/inode caching. rpm-ostree's local installs here count as a "build".

But, if one was doing an ostree based build system and using --link-checkout-speedup e.g. on a filesystem that uses >32 bit inode numbers, this might result in corrupted (i.e. incorrect files) in a resulting commit. For example, I think...uh oh, flatpak-builder has a copy of this code (edit: fixed here)

cgwalters added a commit to cgwalters/flatpak-builder that referenced this pull request Jun 20, 2023
See ostreedev/ostree#2874
"commit: fix ostree deployment on 64-bit inode fs"
TingPing pushed a commit to flatpak/flatpak-builder that referenced this pull request Jun 20, 2023
See ostreedev/ostree#2874
"commit: fix ostree deployment on 64-bit inode fs"
@cgwalters
Copy link
Member

cgwalters commented Jul 19, 2023

Thanks so much for finding this. It turns out that I was wrong...in the ostree-container flow today, we do use a devino cache to "flatten" container image layers, and so anything using the ostree-container flow is subject to this.

To xref I'm going to be backporting this with high urgency to RHEL; see https://bugzilla.redhat.com/show_bug.cgi?id=2224081

@cgwalters
Copy link
Member

I wrote a lot of new code to detect this situation (to prove it's happening): ostreedev/ostree-rs-ext#504

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants