Create better primary keys for subtrees #2180

Closed
wants to merge 1 commit into from
9 changes: 9 additions & 0 deletions CHANGES/9566.bugfix
@@ -0,0 +1,9 @@
Scenario:
Contributor

Hey! The changelog entries are usually short descriptions. Could you replace all of this with something more like:

"Fixed a bug where sub-repos (distribution tree repos) could conflict with each other in common workflows."

Member

@HolgerHees, great job finding issues that are not very straightforward, and solutions for them! Thanks a lot.


My primary Pulp instance hosts 2 distributions of the same repository (like staging and production), which reference different versions of that repository. During my initial run, both distributions point to version 1. So far so good.

Now I have a secondary Pulp instance which mirrors the 2 primary distributions by creating separate remotes and repositories.

The repository on the primary node now contains a subtree which is identical in version 1 for staging and production, meaning it has the same hash.

Now, during the sync process, the metadata for this subtree is stored by creating a primary key like "{repodata}-{treeinfo['hash']}". This collides between staging and production because the subtree's content, and therefore its hash, is the same for both. The key should be something like "{repodata}-{treeinfo['hash']}-{repository_pk}".
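
The collision described in this scenario can be reproduced outside of Pulp. The following is a minimal sketch, not the plugin's code: the function names, the subtree path, the hash, and the two primary keys are all hypothetical placeholders, and only the two f-string formats are taken from the patch.

```python
def legacy_sub_repo_name(repodata: str, treeinfo_hash: str) -> str:
    # Naming scheme before the patch: subtree path plus treeinfo hash only.
    return f"{repodata}-{treeinfo_hash}"


def scoped_sub_repo_name(repodata: str, treeinfo_hash: str, repository_pk: str) -> str:
    # Naming scheme proposed in the patch: also include a repository PK.
    return f"{repodata}-{treeinfo_hash}-{repository_pk}"


# Hypothetical values: the same subtree, with the same hash, is reachable
# through two repositories on the mirroring instance.
REPODATA = "AppStream"
TREEINFO_HASH = "0f3a9c"
STAGING_PK = "pk-staging"
PRODUCTION_PK = "pk-production"

legacy_staging = legacy_sub_repo_name(REPODATA, TREEINFO_HASH)
legacy_production = legacy_sub_repo_name(REPODATA, TREEINFO_HASH)
scoped_staging = scoped_sub_repo_name(REPODATA, TREEINFO_HASH, STAGING_PK)
scoped_production = scoped_sub_repo_name(REPODATA, TREEINFO_HASH, PRODUCTION_PK)

print(legacy_staging == legacy_production)  # both syncs ask for the same name
print(scoped_staging == scoped_production)  # names are now distinct
```

With the legacy format both syncs resolve to one shared hidden repository; with the repository PK appended, each gets its own.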
2 changes: 1 addition & 1 deletion pulp_rpm/app/tasks/synchronizing.py
@@ -478,7 +478,7 @@ def is_subrepo(directory):
 if repodata == DIST_TREE_MAIN_REPO_PATH:
     treeinfo["repositories"].update({repodata: None})
     continue
-name = f"{repodata}-{treeinfo['hash']}"
+name = f"{repodata}-{treeinfo['hash']}-{repository_pk}"
Contributor

@goosemania Question: We probably need to clean up the repos using the old naming scheme somehow, and I can think of two different ways to do that. Either we could look for them under the old name in the sync code, which is the only way to backport this patch properly, or we could do a migration, or both.

I'm thinking we probably need to do both? Do we have enough information available to perform a rename in a migration in the first place?
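
To make the first option concrete (looking for sub-repos under the old name during sync), here is a rough sketch. It is not Pulp code: `resolve_sub_repo` and the `registry` dict are hypothetical stand-ins for the database lookup, and only the two name formats come from the patch.

```python
def resolve_sub_repo(registry: dict, repodata: str, treeinfo_hash: str, repository_pk: str):
    """Return (name, created) for the sub-repo, adopting a legacy entry if present.

    `registry` is a hypothetical in-memory stand-in mapping repo name -> repo data.
    """
    old_name = f"{repodata}-{treeinfo_hash}"
    new_name = f"{old_name}-{repository_pk}"

    if new_name in registry:
        # Already stored under the new scheme.
        return new_name, False

    if old_name in registry:
        # Legacy entry found: rename it in place. Simplification: whichever
        # repository syncs first claims the shared legacy entry.
        registry[new_name] = registry.pop(old_name)
        return new_name, False

    # Nothing found under either scheme: create a fresh hidden sub-repo.
    registry[new_name] = {"user_hidden": True}
    return new_name, True


registry = {"AppStream-0f3a9c": {"user_hidden": True}}
first = resolve_sub_repo(registry, "AppStream", "0f3a9c", "pk-staging")
second = resolve_sub_repo(registry, "AppStream", "0f3a9c", "pk-production")
```

This keeps two naming schemes alive in the sync path, which is the extra complexity the thread weighs against a one-off migration.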

Member

I'm not sure we need to backport it. I wonder whether it's possible at all to get into this situation using Katello, and upstream Pulp users should be able to upgrade.

I'm for migrations and cleaner code (not to introduce new bugs because of handling 2 different naming schemes). There is one potential problem but the probability is extremely low - if someone named some other repo manually like suggested in the patch, we'll run into conflict during a migration. (The probability is getting higher if someone applied this patch as is and later decided to upgrade).

As for info, I think we have enough. Which part are you concerned about? We need to look for repos which are user_hidden=True and whose names do not end with repository_pk, and add -<repository_pk> to those. Do I miss anything?
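
The renaming rule proposed in this comment could look roughly like the sketch below. It is plain Python rather than a Django migration, and everything in it is an assumption for illustration: the `Repo` dataclass, the `rename_hidden_subrepos` helper, and in particular the choice to append each hidden repo's own PK (which PK belongs in the suffix is still being discussed in this thread).

```python
from dataclasses import dataclass


@dataclass
class Repo:
    # Hypothetical stand-in for a repository row.
    pk: str
    name: str
    user_hidden: bool


def rename_hidden_subrepos(repos: list) -> list:
    """Append "-<pk>" to hidden repos that lack it; return (old, new) name pairs."""
    taken = {repo.name for repo in repos}
    renamed = []
    for repo in repos:
        suffix = f"-{repo.pk}"
        # Skip user-visible repos and repos already using the new scheme.
        if not repo.user_hidden or repo.name.endswith(suffix):
            continue
        candidate = repo.name + suffix
        if candidate in taken:
            # The low-probability case from the comment: the target name
            # already exists, e.g. because someone created it manually.
            raise ValueError(f"cannot rename {repo.name!r}: {candidate!r} already exists")
        taken.discard(repo.name)
        taken.add(candidate)
        renamed.append((repo.name, candidate))
        repo.name = candidate
    return renamed


repos = [
    Repo(pk="pk1", name="AppStream-0f3a9c", user_hidden=True),
    Repo(pk="pk2", name="addons-77aa-pk2", user_hidden=True),
    Repo(pk="pk3", name="my-repo", user_hidden=False),
]
changes = rename_hidden_subrepos(repos)
```

Raising on a name conflict, rather than skipping silently, surfaces the manual-naming case instead of hiding it.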

Contributor

@dralley Nov 29, 2021

There is one potential problem but the probability is extremely low - if someone named some other repo manually like suggested in the patch, we'll run into conflict during a migration. (The probability is getting higher if someone applied this patch as is and later decided to upgrade).

We should probably consider adding user_hidden to the uniqueness constraint.

Do I miss anything?

No, I suppose the question is just whether we know what the base repository PK is for any arbitrary sub-repo. Or is it the sub-repo's repository PK?
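
As an illustration of the uniqueness-constraint suggestion in this comment, the sketch below uses the standard-library `sqlite3` module to show what a composite constraint over `(name, user_hidden)` would allow and reject. It is an analogy only: the table layout and helper names are hypothetical, and Pulp's real models and database are not represented here.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    # Hypothetical table: uniqueness is enforced on the (name, user_hidden)
    # pair rather than on name alone.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE repository ("
        " name TEXT NOT NULL,"
        " user_hidden INTEGER NOT NULL,"
        " UNIQUE (name, user_hidden))"
    )
    return conn


def try_insert(conn: sqlite3.Connection, name: str, user_hidden: bool) -> bool:
    """Insert a row; return False if the uniqueness constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO repository (name, user_hidden) VALUES (?, ?)",
            (name, int(user_hidden)),
        )
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_demo_db()
visible_ok = try_insert(conn, "AppStream-0f3a9c-pk1", False)  # user-created repo
hidden_ok = try_insert(conn, "AppStream-0f3a9c-pk1", True)    # hidden sub-repo, same name
hidden_dup = try_insert(conn, "AppStream-0f3a9c-pk1", True)   # second hidden repo, same name
```

Under such a constraint a manually created repo could no longer block a hidden sub-repo with the same name, while two hidden repos with the same name would still conflict.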

Member

@goosemania Nov 29, 2021

Heh, I had exactly the same thoughts, in that order :). I thought that it's the base repo PK (I figured out how to get it, but it's painful: you go through every dist tree and check which repos its addons and variants refer to), but then, looking at the code, it seems to be the subrepo PK.

Contributor

@dralley Nov 30, 2021

I think I'm ok with merging this as-is, as long as we find out before the release whether this might need to be backported. I don't want to end up in a situation where we do need to backport and then have to go back and change the released migration to accommodate that. Unless we can write the migration in an agnostic way to begin with.

Member

Are you suggesting that we merge as-is now but then make some code changes before the release (depending on the need for backports, either a migration approach or support for 2 schemes)? If that's the case, maybe we need an issue that blocks the upcoming release, so we don't forget about it.

Contributor

The main consideration is simply to not make the author write a migration for us too :) I agree about filing an issue though.

Or I can just close this and re-open it to write the migration myself. I guess I'll do that.

 sub_repo, created = RpmRepository.objects.get_or_create(name=name, user_hidden=True)
 if created:
     sub_repo.save()