Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only update the page title for non-error versions #1061

Conversation

Mr0grog
Copy link
Member

@Mr0grog Mr0grog commented Jan 17, 2023

If a new version is an error (e.g. a 404 status code because the page was removed), don't update the page record's title with the title from the error response. This also uses the filename component of the URL as a fallback title in case no versions with a parsed title can be found.

Fixes #751.
Fixes #468.

To-Do:

  • Check for error statuses before updating
  • Fall back to URL-based title
  • Add data migration to reset all page titles

@Mr0grog
Copy link
Member Author

Mr0grog commented Jan 19, 2023

Taking a slight pause here; I discovered that we wound up with some placeholder titles (e.g. "None", ""No title available") from Versionista in the distant past. Going to see about a data migration to remove those.

If a new version is an error (e.g. a 404 status code because the page was removed), don't update the page record's title with the title from the error response. Fixes #751.
We have a lot of PDFs with no title, so fall back to the last path component of the URL (i.e. the filename) for a title if nothing else can be found.
@Mr0grog Mr0grog force-pushed the 751-if-every-error-page-has-the-same-title-its-hard-to-know-which-page-you-are-talking-about branch from f1692b4 to 6ec8131 Compare January 19, 2023 18:23
@Mr0grog Mr0grog merged commit 387f8a8 into main Jan 19, 2023
@Mr0grog Mr0grog deleted the 751-if-every-error-page-has-the-same-title-its-hard-to-know-which-page-you-are-talking-about branch January 19, 2023 18:38
Mr0grog added a commit that referenced this pull request Jan 19, 2023
Mr0grog added a commit that referenced this pull request Jan 19, 2023
I messed up a bunch of page titles in #1061 because I overlooked the fact that `Version#sync_page_title` only functioned if called on the latest version. This fixes the issue by moving all the meaningful logic about where and when to grab a title to `Page#update_page_title` and changes `Version#sync_page_title` to just call that, but with an argument that tells it to only look forward from the version's capture time. That's not *exactly* the same behavior for the Version method, but gets us effectively the same result. This should also make the migration run faster.
Mr0grog added a commit that referenced this pull request Jan 19, 2023
I messed up a bunch of page titles in #1061 because I overlooked the fact that `Version#sync_page_title` only functioned if called on the latest version. This fixes the issue by moving all the meaningful logic about where and when to grab a title to `Page#update_page_title` and changes `Version#sync_page_title` to just call that, but with an argument that tells it to only look forward from the version's capture time. That's not *exactly* the same behavior for the Version method, but gets us effectively the same result. This should also make the migration run faster.
Mr0grog added a commit to edgi-govdata-archiving/web-monitoring-ops that referenced this pull request Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
1 participant