Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use retain_package_versions to filter out older pkgs at sync time #2547

Merged
merged 1 commit into from
Jul 18, 2022

Conversation

dralley
Copy link
Contributor

@dralley dralley commented May 26, 2022

closes #2479

@dralley dralley force-pushed the filter-old-pkgs branch 10 times, most recently from 0679f60 to 0bbf62f Compare May 31, 2022 03:51
@dralley
Copy link
Contributor Author

dralley commented May 31, 2022

Successful proof of concept, tests using RHEL 7 repo:

immediate, no package limit

Sync time: I ran out of disk space and the whole thing failed (at least I think that was what happened, the DB started refusing to execute heartbeat writes), but it took about an hour to get ~80% finished

Package data downloaded: 60 Gb (measured separately)

immediate, package limit = 1

Sync time: 570 seconds (9m30s)

Package data downloaded: 4.9 Gb

on_demand, package limit = 1

Sync time reduced from 460 -> 203 seconds (7m40s -> 3m3s)

One thing I (re) discovered is that the EVRs need to be grouped by arch, which makes my calculation in the initial post wrong. Instead of ~4200 packages vs. 32,000 it's ~5500 packages vs 32,000. Didn't change the situation too much though.

@dralley dralley force-pushed the filter-old-pkgs branch 6 times, most recently from bd5843f to 7ae000a Compare June 3, 2022 17:17
@dralley dralley requested review from ggainey and ipanova June 3, 2022 17:17
@dralley dralley marked this pull request as ready for review June 3, 2022 17:18
@@ -0,0 +1,243 @@
# Sourced from https://github.com/nexB/univers
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vendor, or pull the whole huge package? It supports version comparisons for a dozen different package types (arch, debian, python, ruby, go, etc.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take would be to just take the code in-line, rather than dragging a huge surface into pulp_rpm most of which we do not need.

@dralley
Copy link
Contributor Author

dralley commented Jun 3, 2022

@ipanova @ggainey This is ready for a first-pass lookthrough, I left some open questions inline.

@dralley dralley force-pushed the filter-old-pkgs branch 2 times, most recently from 17b2bab to 47fa6c2 Compare June 3, 2022 20:42
@dralley
Copy link
Contributor Author

dralley commented Jun 4, 2022

So the new test I wrote actually passes despite failing on CI, but the unfortunate reason is that the packages used in our fixtures are the same packages, so if the non-modular fixture gets synced first Pulp sets is_modular: false on the content unit and if the modular fixture gets synced first then Pulp sets is_modular: true. I think this is going to potentially cause problems for a lot of tests and it might be a general problem as well.

@dralley dralley force-pushed the filter-old-pkgs branch from aaf699c to 657880a Compare June 4, 2022 01:04
@dralley
Copy link
Contributor Author

dralley commented Jun 4, 2022

The first commit and second commit use two different strategies if you want to compare. The second is simpler and probably safer.

@dralley dralley force-pushed the filter-old-pkgs branch 3 times, most recently from 066c25b to a65e277 Compare June 16, 2022 14:28
@dralley dralley force-pushed the filter-old-pkgs branch 2 times, most recently from 1e73e66 to eeb49ae Compare June 24, 2022 15:59

# TODO: should this reflect the actual # of packages or just the # synced?
# Do we need a new progress bar?
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need feedback on this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think just synced otherwise it can bring questions "why the counter is at 95/100 packages"

@dralley dralley force-pushed the filter-old-pkgs branch 2 times, most recently from 503a915 to c3e6bf8 Compare June 26, 2022 03:28
@dralley
Copy link
Contributor Author

dralley commented Jun 26, 2022

I deleted the previous coment about TODOs, because one was addressed and the other I realized that if it is a problem, it's a problem we would already have, not a new one. I will file it as a new task to investigate that.

@dralley
Copy link
Contributor Author

dralley commented Jun 26, 2022

@ipanova This should be ready for review now. Tests were failing because of #2617

Copy link
Member

@ipanova ipanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably want to rebase once you merge the #2611
I gave it a look and changes seem to be fine.

@dralley dralley force-pushed the filter-old-pkgs branch from 4dd667e to 601dfb5 Compare July 3, 2022 04:10
Copy link
Contributor

@ggainey ggainey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Def a significant improvement. Between the experimentation you did initially, the test modifications, and going thru the code-logic in my head a few times, it looks like it gives us what we (and our users) are looking for. lg2m!

@dralley dralley merged commit 788687f into pulp:main Jul 18, 2022
@dralley dralley deleted the filter-old-pkgs branch July 18, 2022 13:58
@patchback
Copy link

patchback bot commented Jul 18, 2022

Backport to 3.17: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 788687f on top of patchback/backports/3.17/788687f1fe1f9d443f3edd855070fde5c9caaceb/pr-2547

Backporting merged PR #2547 into main

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/pulp/pulp_rpm.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.17/788687f1fe1f9d443f3edd855070fde5c9caaceb/pr-2547 upstream/3.17
  4. Now, cherry-pick PR Use retain_package_versions to filter out older pkgs at sync time #2547 contents into that branch:
    $ git cherry-pick -x 788687f1fe1f9d443f3edd855070fde5c9caaceb
    If it'll yell at you with something like fatal: Commit 788687f1fe1f9d443f3edd855070fde5c9caaceb is a merge but no -m option was given., add -m 1 as follows intead:
    $ git cherry-pick -m1 -x 788687f1fe1f9d443f3edd855070fde5c9caaceb
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Use retain_package_versions to filter out older pkgs at sync time #2547 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.17/788687f1fe1f9d443f3edd855070fde5c9caaceb/pr-2547
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@patchback
Copy link

patchback bot commented Jul 26, 2022

Backport to 3.14: 💔 cherry-picking failed — conflicts found

❌ Failed to cleanly apply 788687f on top of patchback/backports/3.14/788687f1fe1f9d443f3edd855070fde5c9caaceb/pr-2547

Backporting merged PR #2547 into main

  1. Ensure you have a local repo clone of your fork. Unless you cloned it
    from the upstream, this would be your origin remote.
  2. Make sure you have an upstream repo added as a remote too. In these
    instructions you'll refer to it by the name upstream. If you don't
    have it, here's how you can add it:
    $ git remote add upstream https://github.com/pulp/pulp_rpm.git
  3. Ensure you have the latest copy of upstream and prepare a branch
    that will hold the backported code:
    $ git fetch upstream
    $ git checkout -b patchback/backports/3.14/788687f1fe1f9d443f3edd855070fde5c9caaceb/pr-2547 upstream/3.14
  4. Now, cherry-pick PR Use retain_package_versions to filter out older pkgs at sync time #2547 contents into that branch:
    $ git cherry-pick -x 788687f1fe1f9d443f3edd855070fde5c9caaceb
    If it'll yell at you with something like fatal: Commit 788687f1fe1f9d443f3edd855070fde5c9caaceb is a merge but no -m option was given., add -m 1 as follows intead:
    $ git cherry-pick -m1 -x 788687f1fe1f9d443f3edd855070fde5c9caaceb
  5. At this point, you'll probably encounter some merge conflicts. You must
    resolve them in to preserve the patch from PR Use retain_package_versions to filter out older pkgs at sync time #2547 as close to the
    original as possible.
  6. Push this branch to your fork on GitHub:
    $ git push origin patchback/backports/3.14/788687f1fe1f9d443f3edd855070fde5c9caaceb/pr-2547
  7. Create a PR, ensure that the CI is green. If it's not — update it so that
    the tests and any other checks pass. This is it!
    Now relax and wait for the maintainers to process your pull request
    when they have some cycles to do reviews. Don't worry — they'll tell you if
    any improvements are necessary when the time comes!

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make retain_package_versions more useful at sync time
3 participants