Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix marking static delta commits as partial #2549

Merged
merged 2 commits into from
Feb 21, 2022

Conversation

mwleeds
Copy link
Member

@mwleeds mwleeds commented Feb 19, 2022

This patch makes it so that we mark the .commit file from a static delta
as partial before writing the commit to the staging directory. This
exactly mirrors what we do in meta_fetch_on_complete() when writing the
commit on that codepath, which should lend some credibility to the
correctness of this patch.

I have checked that this fixes an issue Flatpak users have been
encountering (flatpak/flatpak#3479) which
results in error messages like "error: Failed to install
org.freedesktop.Sdk.Extension.texlive: Failed to read commit
c7958d966cfa8b80a42877d1d6124831d7807f93c89461a2a586956aa28d438a: No
such metadata object
8bdaa943b957f3cf14d19301c59c7eec076e57389e0fbb3ef5d30082e47a178f.dirtree"

Here's the sequence of events that lead to the error:

  1. An install operation is started that fetches static deltas.
  2. The fetch is interrupted for some reason such as network connectivity
    dropping.
  3. The .commit and .commitmeta files for the commit being pulled are
    left in the staging dir, e.g.
    "~/.local/share/flatpak/repo/tmp/staging-dfe862b2-13fc-49a2-ac92-5a59cc0d8e18-RURckd"
  4. There is no .commitpartial file for the commit in
    "~/.local/share/flatpak/repo/state/"
  5. The next time the user attempts the install, libostree reuses the
    existing staging dir, pulls the commit and commitmeta objects into
    the repo from the staging dir on the assumption that it's a complete
    commit.
  6. Flatpak then tries to deploy the commit but fails in
    ostree_repo_read_commit() in flatpak_dir_deploy(), leading to the
    error message "Failed to read commit ..."
  7. This happens again any subsequent time the user attempts the install,
    until the incomplete commit is removed with "flatpak repair --user".

I will try to also add a workaround in Flatpak so this is fixed even
when Flatpak links against affected versions of libostree.

This patch makes it so that we mark the .commit file from a static delta
as partial before writing the commit to the staging directory. This
exactly mirrors what we do in meta_fetch_on_complete() when writing the
commit on that codepath, which should lend some credibility to the
correctness of this patch.

I have checked that this fixes an issue Flatpak users have been
encountering (flatpak/flatpak#3479) which
results in error messages like "error: Failed to install
org.freedesktop.Sdk.Extension.texlive: Failed to read commit
c7958d966cfa8b80a42877d1d6124831d7807f93c89461a2a586956aa28d438a: No
such metadata object
8bdaa943b957f3cf14d19301c59c7eec076e57389e0fbb3ef5d30082e47a178f.dirtree"

Here's the sequence of events that lead to the error:
1. An install operation is started that fetches static deltas.
2. The fetch is interrupted for some reason such as network connectivity
   dropping.
3. The .commit and .commitmeta files for the commit being pulled are
   left in the staging dir, e.g.
   "~/.local/share/flatpak/repo/tmp/staging-dfe862b2-13fc-49a2-ac92-5a59cc0d8e18-RURckd"
4. There is no `.commitpartial` file for the commit in
   "~/.local/share/flatpak/repo/state/"
5. The next time the user attempts the install, libostree reuses the
   existing staging dir, pulls the commit and commitmeta objects into
   the repo from the staging dir on the assumption that it's a complete
   commit.
6. Flatpak then tries to deploy the commit but fails in
   ostree_repo_read_commit() in flatpak_dir_deploy(), leading to the
   error message "Failed to read commit ..."
7. This happens again any subsequent time the user attempts the install,
   until the incomplete commit is removed with "flatpak repair --user".

I will try to also add a workaround in Flatpak so this is fixed even
when Flatpak links against affected versions of libostree.
@mwleeds
Copy link
Member Author

mwleeds commented Feb 19, 2022

Looks like this commit is already on a WIP PR by @jlebon #2054: 7a26b7e

Copy link
Member

@dbnicholson dbnicholson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that makes sense. I'm surprised this hasn't come up more before. It would be good to test that the partial is actually written, but I can't think of a good way to do it. Can you add something to a test that makes sure there's no partial after the pull completes?

mwleeds added a commit to flatpak/flatpak that referenced this pull request Feb 19, 2022
All the details of the bug are in:
ostreedev/ostree#2549
#3479

This patch works around it by marking the commit we're about to pull
partial, so that if the .commit object exists in a staging directory
from a previous failed pull, it will not be erroneously considered a
complete commit, even by affected versions of libostree that don't have
the above patch. If for some reason the commit in the staging dir is
complete, libostree should harmlessly verify that and pull it in.

Usually the commit we are pulling does not already exist in the local
repo, but add a check anyway so we don't risk marking a complete commit
as partial, and so this works on the code path from
"flatpak install --reinstall ..."

Fixes #3479
@mwleeds
Copy link
Member Author

mwleeds commented Feb 20, 2022

Oh, that makes sense. I'm surprised this hasn't come up more before. It would be good to test that the partial is actually written, but I can't think of a good way to do it. Can you add something to a test that makes sure there's no partial after the pull completes?

Strictly speaking that would be testing existing code not this patch. I'm not sure if I'll have time soon to look at writing a unit test for this, but I think it would make sense to get it merged and get a release out since it's pretty high impact.

I don't think these two lines of code necessarily need a unit test since they are pretty obviously correct, but it would be good to test the robustness of interrupted pulls in general, if we can think of a way to do that.

@AdrianVovk
Copy link

This bug looks like the cause of a repo corruption I've run into more than once. The symptoms seem exactly the same (thought it seems I was also experiencing the bug fixed in #2544). I never checked if a .commitpartial file exists, nor did I touch anything in repo/tmp. I fixed my repo by manually finding the dead commit(s) and removing them

Thanks for the fix!

mwleeds added a commit to flatpak/flatpak that referenced this pull request Feb 20, 2022
All the details of the bug are in:
ostreedev/ostree#2549
#3479

This patch works around it by marking the commit we're about to pull
partial, so that if the .commit object exists in a staging directory
from a previous failed pull, it will not be erroneously considered a
complete commit, even by affected versions of libostree that don't have
the above patch. If for some reason the commit in the staging dir is
complete, libostree should harmlessly verify that and pull it in.

Usually the commit we are pulling does not already exist in the local
repo, but add a check anyway so we don't risk marking a complete commit
as partial, and so this works on the code path from
"flatpak install --reinstall ..."

Fixes #3479
mwleeds added a commit to flatpak/flatpak that referenced this pull request Feb 21, 2022
All the details of the bug are in:
ostreedev/ostree#2549
#3479

This patch works around it by marking the commit we're about to pull
partial, so that if the .commit object exists in a staging directory
from a previous failed pull, it will not be erroneously considered a
complete commit, even by affected versions of libostree that don't have
the above patch. If for some reason the commit in the staging dir is
complete, libostree should harmlessly verify that and pull it in.

Usually the commit we are pulling does not already exist in the local
repo, but add a check anyway so we don't risk marking a complete commit
as partial, and so this works on the code path from
"flatpak install --reinstall ..."

Fixes #3479
Copy link
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@cgwalters
Copy link
Member

/override continuous-integration/jenkins/pr-merge

@openshift-ci
Copy link

openshift-ci bot commented Feb 21, 2022

@cgwalters: Overrode contexts on behalf of cgwalters: continuous-integration/jenkins/pr-merge

In response to this:

/override continuous-integration/jenkins/pr-merge

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@cgwalters cgwalters merged commit 4304505 into main Feb 21, 2022
@dbnicholson
Copy link
Member

This bug looks like the cause of a repo corruption I've run into more than once. The symptoms seem exactly the same (thought it seems I was also experiencing the bug fixed in #2544). I never checked if a .commitpartial file exists, nor did I touch anything in repo/tmp. I fixed my repo by manually finding the dead commit(s) and removing them

Thanks for the fix!

Yeah, it's pretty bad. I'm definitely going to backport it to our Endless stable branch and maybe even our old stable branch. I might submit a PR for Debian too.

@dbnicholson dbnicholson deleted the mwleeds/fix-partial-delta-fetches branch February 21, 2022 20:16
mwleeds added a commit to flatpak/flatpak that referenced this pull request Feb 21, 2022
All the details of the bug are in:
ostreedev/ostree#2549
#3479

This patch works around it by marking the commit we're about to pull
partial, so that if the .commit object exists in a staging directory
from a previous failed pull, it will not be erroneously considered a
complete commit, even by affected versions of libostree that don't have
the above patch. If for some reason the commit in the staging dir is
complete, libostree should harmlessly verify that and pull it in.

Usually the commit we are pulling does not already exist in the local
repo, but add a check anyway so we don't risk marking a complete commit
as partial, and so this works on the code path from
"flatpak install --reinstall ..."

Fixes #3479

(cherry picked from commit 11158c2)
mwleeds added a commit to flatpak/flatpak that referenced this pull request Feb 21, 2022
All the details of the bug are in:
ostreedev/ostree#2549
#3479

This patch works around it by marking the commit we're about to pull
partial, so that if the .commit object exists in a staging directory
from a previous failed pull, it will not be erroneously considered a
complete commit, even by affected versions of libostree that don't have
the above patch. If for some reason the commit in the staging dir is
complete, libostree should harmlessly verify that and pull it in.

Usually the commit we are pulling does not already exist in the local
repo, but add a check anyway so we don't risk marking a complete commit
as partial, and so this works on the code path from
"flatpak install --reinstall ..."

Fixes #3479

(cherry picked from commit 11158c2)
@cgwalters cgwalters mentioned this pull request Mar 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants