-
-
Notifications
You must be signed in to change notification settings - Fork 26
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Page titles should not be updated by versions that are errors #751
Labels
Comments
def is_error?(strict: false)
return true if status && status >= 400
return false if strict
# Use some basic heuristics to determine if a page with an OK
# status code is actually an error. There's lots of room for
# improvement here.
case_title = title.downcase
(
case_title.include?('error 404')
|| (case_title.match?(/\b404\b/) && title.include?('not found'))
|| case_title.match?(/(page|file)( was)? not found/)
|| case_title.split(/\s*(?:-+|\|)\s*/).any? {|p| p == 'restricted access'}
)
end |
17 tasks
Mr0grog
added a commit
that referenced
this issue
Jan 17, 2023
If a new version is an error (e.g. a 404 status code because the page was removed), don't update the page record's title with the title from the error response. Fixes #751.
This was referenced Jan 17, 2023
Mr0grog
added a commit
that referenced
this issue
Jan 19, 2023
If a new version is an error (e.g. a 404 status code because the page was removed), don't update the page record's title with the title from the error response. Fixes #751.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The
Version
model has anafter_create
callback that updates its page’stitle
field:web-monitoring-db/app/models/version.rb
Lines 107 to 112 in 213dc89
However, it shouldn’t update the page title if the version in question represents an error — the page model represents the conceptual page (not the URL) that we are tracking, and it’s unhelpful to rename the page to represent a temporary error, or worse, a “not found” message when the page is permanently removed. It makes it hard to tell what page we’re talking about.
For example, this page: https://monitoring.envirodatagov.org/page/9c85fc4f-c0ef-4327-ac4e-df2b6314917f/ba9b8e97-668d-4c92-bf95-d2ee5ffa46f2..0f4b06b2-6473-4ff6-a5e6-aada11f65cdd
…shows the title “Page Not Found | USDA” when it should show “USDA | Office of the Chief Economist | Climate Change | Effects | Climate Change and Agriculture Report.”
To do this, we should probably add a method on
Version
namedis_error?
or similar rather than directly checking the status code so that we can account for things like #468 in the future.The text was updated successfully, but these errors were encountered: