Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serve previously published artifacts for up to 3 days #925

Merged
merged 1 commit into from
Feb 1, 2024

Conversation

daviddavis
Copy link
Contributor

@daviddavis daviddavis commented Nov 1, 2023

This code requires pulp/pulpcore#4636.

fixes #911

@daviddavis daviddavis changed the title Serve previously published artifacts for 3 days Serve previously published artifacts for up to 3 days Nov 1, 2023
@daviddavis daviddavis marked this pull request as ready for review November 6, 2023 21:35
@daviddavis
Copy link
Contributor Author

I got the code to use a single query.

I'm still thinking we'd only like to serve the by-hash files (but not other old artifacts) for 3 days. Any objection to me adding a PUBLICATION_CACHE_PATH_REGEX setting?

@quba42 quba42 added the .feature CHANGES/<issue_number>.feature label Nov 13, 2023
@hstct
Copy link
Contributor

hstct commented Nov 20, 2023

Due to recent CI changes this PR needs to be rebased on top of the current main branch

@daviddavis
Copy link
Contributor Author

@hstct done.

@quba42
Copy link
Collaborator

quba42 commented Dec 13, 2023

Maybe I am missing something obvious, but I am having trouble understanding this features usage. My test workflow was to set APT_BY_HASH = True, and then perform a full sync publish workflow on a test repo, remove some content from the test repo, publish and distribute the new version, and then look for files from the original publication on the content app, but I did not end up finding them. Can you confirm that this is what I should be doing? (It is always possible I somehow did not apply the change to my development environment correctly and therefor wasn't testing the right code state...)

I think if I could find some verifiable difference with and without this change to some standard workflow, it would really help me move the review along. Please help out my morning brain. 😉

@daviddavis
Copy link
Contributor Author

daviddavis commented Dec 13, 2023

@quba42 when you say you looked for the content, I assume you were looking at the directory listing pages? The old cached content does not show up there. Rather you have to try to request it directly. One thing you could test is noting the path to a particular by-hash file, change the content, republish, and then request the by-hash file. This would also work for a package that you removed too.

@quba42
Copy link
Collaborator

quba42 commented Jan 2, 2024

@daviddavis I just keep getting 404's for everything I try (regardless if I am using curl, wget, or browser). Tried for both metadata and packages in a clean oci-env. I must be missing something trivial!

Example I tried:

curl http://localhost:5001/pulp/content/apt_by_hash_test/dists/pulp/upload/binary-amd64/by-hash/SHA256/74ae5140c9e318f851c9168f13dcbb2a3a66db06e65da40943203e44e4f1caf4
404: Not Found

Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found the problem why my workflow has not been working, see the comment below:

@@ -47,6 +61,47 @@ class AptDistribution(Distribution):

TYPE = "apt-distribution"
SERVE_FROM_PUBLICATION = True
PUBLICATION_CACHE_DURATION = timedelta(days=3)

@hook(AFTER_SAVE, when="publication", has_changed=True)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the condition for this hook is insufficient to ensure that a DistributedPublication is created for all possible workflows.

In my case, when I run the quickstart example from the docs, no DistributedPublication is created. If I modify the workflow to create the distribution before the publication, then everything works as expected (the DistributedPublication is created). The problem is that I create the publication first (at which time the publication is not yet distributed, so the hook on the publication does not create a DistributedPublication), but the hook on the distribution only fires when publication has changed, which is not the case, because I am relying on the repository link, rather than a direct publication link. See:

$ pulp deb distribution list
[
  {
    "pulp_href": "/pulp/api/v3/distributions/deb/apt/018ccf09-46a2-7589-b318-9094a2b6bbe8/",
    "pulp_created": "2024-01-03T11:15:40.835141Z",
    "base_path": "quickstart-nginx-bookworm-amd642",
    "base_url": "http://localhost:5001/pulp/content/quickstart-nginx-bookworm-amd642/",
    "content_guard": null,
    "hidden": false,
    "pulp_labels": {},
    "name": "quickstart-nginx-bookworm-amd642",
    "repository": "/pulp/api/v3/repositories/deb/apt/018ccf09-03be-7e77-ad7b-00e23902aeed/",
    "publication": null
  },
]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah interesting find. I think I need to add another hook then to handle this case.

Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just someone more minor questions I have (in addition to my previous review):

pulp_deb/app/models/publication.py Show resolved Hide resolved
pulp_deb/app/models/publication.py Show resolved Hide resolved
pulp_deb/app/models/publication.py Outdated Show resolved Hide resolved
@daviddavis daviddavis force-pushed the fix911 branch 4 times, most recently from 0290a3f to e5b5179 Compare January 3, 2024 16:30
@daviddavis
Copy link
Contributor Author

While updating the hooks, I realized there was a problem with the design: the life of a DistributedPublication is based on when its created and not when it was last used to serve content. Which is a problem because it'll get cleaned up early.

So I've updated the code to create the DistributedPublication so it gets created with the previous publication when a new publication is used to serve content. But this creates a new problem: the list of DistributedPublications doesn't include current publication so it serves the old content over the new/current publication content. I need to look more into this.

I also tried to look at simplifying the code since having it create DistributedPublication with the previous publication makes it quite unwieldy. My first thought was to continue creating DistributedPublication with the current publication and add an expires_at field but this has the same problem: the life of the DistributedPublication is based on when its created and it'll get deleted too early. So I think I need to stick with creating DistributedPublication with the previous publication.

@daviddavis daviddavis marked this pull request as draft January 3, 2024 16:56
@daviddavis daviddavis force-pushed the fix911 branch 8 times, most recently from 15971c6 to 9a6d6b8 Compare January 3, 2024 19:58
@daviddavis
Copy link
Contributor Author

Ok, I think I've resolved the issue in the most practical/simplest way. Instead of creating a DistributedPublication for the previous publication, I create a DistributedPublication for the current publication being served by the Distribution. Then when a new publication is created (or set on the Distribution), older DistributedPublications will have their expires_at field set.

@daviddavis
Copy link
Contributor Author

Lint failure is unrelated to this PR. Fix is here: #998.

Copy link
Collaborator

@quba42 quba42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks pretty clean and reduced, and for the first time everything worked as expected using my chosen test workflow.

Am I forgetting anything or are we finally ready to merge? 😉

@daviddavis
Copy link
Contributor Author

I think this is ready to merge unless anyone else has any other last minute feedback.

@quba42 quba42 merged commit 999ee50 into pulp:main Feb 1, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
.feature CHANGES/<issue_number>.feature
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[DESIGN] A phasing effect for publications that were recently served by a distribution.
5 participants