Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed concurrent-ks-tree-syncs by making disttree ID repo-specific. #2851

Merged
merged 1 commit into from
Nov 7, 2022

Conversation

ggainey
Copy link
Contributor

@ggainey ggainey commented Oct 31, 2022

DistributionTree digest and subrepo-names now both end with the pulp-id of the "owning" Repository, making them unique to that repo and therefore protected from concurrent-updates against anything that is changing that Repository.

Addon/Variant/Image are transitively made unique by virtue of having their DistributionTree be part of their unique-together.

Sub-repo content (e.g. Packages et al) are de-duplicated via their existing uniqueness constraints.

The end result is a minor increase in Content objects (i.e., DistTrees/Addons/Images/Variants that used to have only one instance are now one-per-containing-repo), and a small impact on subrepo-syncing (since previously-unique subrepos will now have a first-sync that would have been skipped). Content will continue to only be sync'd once.

fixes #2278.
fixes #2775.
closes #2304.
[nocoverage]

DistributionTree digest and subrepo-names now both end with the
pulp-id of the "owning" Repository, making them unique to that
repo and therefore protected from concurrent-updates against
anything that is changing that Repository.

Addon/Variant/Image are transitively made unique by virtue of
having their DistributionTree be part of their unique-together.

Sub-repo **content** (e.g. Packages et al) are de-duplicated via
their existing uniqueness constraints.

The end result is a minor increase in Content objects (i.e.,
DistTrees/Addons/Images/Variants that used to have only one
instance are now one-per-containing-repo), and a small impact
on subrepo-syncing (since previously-unique subrepos will now
have a first-sync that would have been skipped). Content will
continue to only be sync'd once.

fixes pulp#2278.
[nocoverage]
@ggainey
Copy link
Contributor Author

ggainey commented Oct 31, 2022

See overview/commentary/braindump of the process that got me here, in this doc: https://hackmd.io/@ggainey/subrepo_problem

@ggainey ggainey marked this pull request as ready for review October 31, 2022 18:37
Copy link
Member

@fao89 fao89 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@dralley
Copy link
Contributor

dralley commented Nov 4, 2022

@ggainey What happens with everything that has already been saved under the old naming scheme, presumably all the cleanup and uniqueness constraints will still work fine?

  • Unless the distribution tree digest changes, no new DistriubtionTree content will be created, and new Addon and Variant probably won't be created?
    • Is there a chance these could be saved to the old distribution tree even if the digest doesn't change, causing duplicates?
  • When it does change, we should get a new DistributionTree content, and new Addon and Variant under the new name scheme
    • And presumably the old stuff is all correctly removed from the repo, even in additive mode?

@ggainey
Copy link
Contributor Author

ggainey commented Nov 4, 2022

@ggainey What happens with everything that has already been saved under the old naming scheme, presumably all the cleanup and uniqueness constraints will still work fine?

  • Unless the distribution tree digest changes, no new DistriubtionTree content will be created, and new Addon and Variant probably won't be created?

New content will not be created, addon/variant/image WILL be created, because their uniqueness-constrain includes the dist-tree they are pointing to.

  • Is there a chance these could be saved to the old distribution tree even if the digest doesn't change, causing duplicates?

No, because they are created when the new disttree shows up and are specific to that disstree.

  • When it does change, we should get a new DistributionTree content, and new Addon and Variant under the new name scheme

Whether the digest changes or not, we are going to get new disttrees the first time this goes live - because the "digest" field will be "digest-repoid", which does not currently exist, and therefore this is a 'new' disttree.

  • And presumably the old stuff is all correctly removed from the repo, even in additive mode?

Yes - because the new-repo-version processing says "there can be only one disttree", and takes the "new" one, removing the old one. See https://github.com/pulp/pulp_rpm/blob/main/pulp_rpm/app/models/repository.py#L393

Great questions, keep 'em coming!

Copy link
Contributor

@dralley dralley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. @ipanova (and @goosemania if you want to), does this all (and the explanation document https://hackmd.io/@ggainey/subrepo_problem) look good to you as well?

@goosemania
Copy link
Member

Makes sense. @ipanova (and @goosemania if you want to), does this all (and the explanation document https://hackmd.io/@ggainey/subrepo_problem) look good to you as well?

I see that apart from subrepo names, @ggainey made the disttree digest repo specific. That should do the trick. LGTM

@ggainey ggainey merged commit 52a9acc into pulp:main Nov 7, 2022
@patchback
Copy link

patchback bot commented Nov 7, 2022

Backport to 3.17: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.17/52a9accc79e49991a8941622b60485e1845d23a2/pr-2851

Backported as #2860

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Nov 7, 2022

Backport to 3.14: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 52a9acc on top of patchback/backports/3.14/52a9accc79e49991a8941622b60485e1845d23a2/pr-2851

Backporting merged PR #2851 into main

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/pulp/pulp_rpm.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.14/52a9accc79e49991a8941622b60485e1845d23a2/pr-2851 upstream/3.14
  4. Now, cherry-pick PR Fixed concurrent-ks-tree-syncs by making disttree ID repo-specific. #2851 contents into that branch:
    $ git cherry-pick -x 52a9accc79e49991a8941622b60485e1845d23a2
    If it'll yell at you with something like fatal: Commit 52a9accc79e49991a8941622b60485e1845d23a2 is a merge but no -m option was given., add -m 1 as follows intead:
    $ git cherry-pick -m1 -x 52a9accc79e49991a8941622b60485e1845d23a2
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Fixed concurrent-ks-tree-syncs by making disttree ID repo-specific. #2851 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.14/52a9accc79e49991a8941622b60485e1845d23a2/pr-2851
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Nov 7, 2022

Backport to 3.18: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.18/52a9accc79e49991a8941622b60485e1845d23a2/pr-2851

Backported as #2861

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@ipanova
Copy link
Member

ipanova commented Nov 8, 2022

@ggainey I reviewed your doc and the PR and all looks good to me expect for the copy usecase.
If i sync same dist tree into A and B as a result I will have 2 dist trees and 2x sub-repos in DB.
Then if i copy dist tree from A to B, distr-tree-repoA will be present in repo B and distreeB will be removed in version2. Except for the weirdness that repo will have in it dist-tree-digest id of repoA is there anything else?

@ggainey
Copy link
Contributor Author

ggainey commented Nov 8, 2022

@ggainey I reviewed your doc and the PR and all looks good to me expect for the copy usecase. If i sync same dist tree into A and B as a result I will have 2 dist trees and 2x sub-repos in DB. Then if i copy dist tree from A to B, distr-tree-repoA will be present in repo B and distreeB will be removed in version2. Expect for the weirdness that it will have in its digest id of repoA is there anything else?

Yes, you'll have the "dist-tree and related subrepos named-for repo-A", contained in repo-B. This is weird, and distressing, and exactly what we do today. It's also "harmless"; the most unexpected result happens if we sync repo-A, and only the subrepo-content has changed, it will end up changing the content of repo-B's subrepos as well. Fixing this edge case is going to need its own issue, and some pretty ugly heuristics in copy.

@ipanova
Copy link
Member

ipanova commented Nov 8, 2022

@ggainey I reviewed your doc and the PR and all looks good to me expect for the copy usecase. If i sync same dist tree into A and B as a result I will have 2 dist trees and 2x sub-repos in DB. Then if i copy dist tree from A to B, distr-tree-repoA will be present in repo B and distreeB will be removed in version2. Expect for the weirdness that it will have in its digest id of repoA is there anything else?

Yes, you'll have the "dist-tree and related subrepos named-for repo-A", contained in repo-B. This is weird, and distressing, and exactly what we do today. It's also "harmless"; the most unexpected result happens if we sync repo-A, and only the subrepo-content has changed, it will end up changing the content of repo-B's subrepos as well. Fixing this edge case is going to need its own issue, and some pretty ugly heuristics in copy.

urgh but i think i can live with that, especially if we end up having pretty ugly heuristics in copy then leaving this as is maybe is a lesser evil, plus as discussed with @ggainey over matrtix "what this PR does, is make the current approach WAY more solid, while leaving one usecase unfixed. Fortunately, the problem introduced by that usecase is "hypothetical" - kickstart-tree-subrepo-content can, theoretically, change, but in "the real world" they are static forever"

@dralley
Copy link
Contributor

dralley commented Nov 8, 2022

@ggainey Could you file that issue so that we have it documented?

@ggainey
Copy link
Contributor Author

ggainey commented Nov 8, 2022

@ggainey Could you file that issue so that we have it documented?

Done - see #2864

@ggainey ggainey deleted the 2278_deunique_disttree branch March 31, 2023 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants