Add scripts/cron_watcher.py #6366
I believe that we should use the ia library to fetch the newest dump, eliminating the need for HTML scraping.
I suggest we rewrite find_last_months_dumps_on_ia as:
from internetarchive import search_items
from datetime import date, timedelta

last_day_of_last_month = date.today().replace(day=1) - timedelta(days=1)
yyyy_mm = f"{last_day_of_last_month:%Y-%m}"

def find_last_months_dumps_on_ia(yyyy_mm: str = yyyy_mm) -> bool:
    """
    Return True if both ol_dump_yyyy and ol_cdump_yyyy files have been saved on the
    Internet Archive.
    """
    prefixes = (f"ol_dump_{yyyy_mm}", f"ol_cdump_{yyyy_mm}")
    found = 0
    for item in search_items("collection:ol_exports"):
        if item["identifier"].startswith(prefixes):
            found += 1
            if found >= 2:
                break
    return found >= 2
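One detail of the suggestion above is that it counts any two matching items, so two ol_dump items with no ol_cdump item would also return True. A minimal sketch of a stricter variant follows, with the matching logic separated from the network call so it can be exercised offline. The helper names last_month_yyyy_mm and both_dumps_present are hypothetical, and identifiers of the form ol_dump_2022-02-28 are an assumption made for illustration.

```python
from datetime import date, timedelta
from typing import Iterable


def last_month_yyyy_mm(today: date) -> str:
    """Return the previous calendar month as 'YYYY-MM'."""
    last_day_of_last_month = today.replace(day=1) - timedelta(days=1)
    return f"{last_day_of_last_month:%Y-%m}"


def both_dumps_present(identifiers: Iterable[str], yyyy_mm: str) -> bool:
    """True only if one ol_dump_ item AND one ol_cdump_ item match yyyy_mm."""
    # Hypothetical helper: tracks which prefixes were seen, not how many items.
    wanted = {f"ol_dump_{yyyy_mm}", f"ol_cdump_{yyyy_mm}"}
    seen = set()
    for identifier in identifiers:
        for prefix in wanted:
            if identifier.startswith(prefix):
                seen.add(prefix)
        if seen == wanted:
            return True  # stop early once both kinds have been found
    return False
```

The identifiers could be supplied as (item["identifier"] for item in search_items("collection:ol_exports")), which keeps the early exit of the original loop.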
Unfortunately, the current
OK, maybe we should talk to the API directly then (with httpx)?
I think these calls should be negligible in cost, and using the IA tool seems right.
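For comparison, a rough sketch of what talking to the API directly with httpx might look like. It assumes the archive.org advanced search endpoint with JSON output and assumes that 1000 rows is enough to cover the collection; build_search_params, extract_identifiers and fetch_identifiers are hypothetical names, not part of this PR.

```python
from typing import Any, Dict, List

SEARCH_URL = "https://archive.org/advancedsearch.php"


def build_search_params(rows: int = 1000) -> Dict[str, Any]:
    """Query parameters asking only for the identifiers in ol_exports."""
    return {
        "q": "collection:ol_exports",
        "fl[]": "identifier",
        "rows": rows,  # assumption: large enough to cover the whole collection
        "output": "json",
    }


def extract_identifiers(payload: Dict[str, Any]) -> List[str]:
    """Pull identifiers out of the JSON body, tolerating missing keys."""
    docs = payload.get("response", {}).get("docs", [])
    return [doc["identifier"] for doc in docs if "identifier" in doc]


def fetch_identifiers(rows: int = 1000) -> List[str]:
    """Perform the HTTP request; needs network access and httpx installed."""
    import httpx  # imported lazily so the pure helpers above work without it

    response = httpx.get(SEARCH_URL, params=build_search_params(rows), timeout=30.0)
    response.raise_for_status()
    return extract_identifiers(response.json())
```

The parameter building and response parsing are kept as pure functions so they can be unit tested without touching the network.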
&& || ; in cron?
A script that checks to see that our cron jobs have completed their respective tasks.
Currently it checks to see if the ol-dump and ol-cdumps for the previous month are stored at
Long term, the script should cover more cron jobs:
Daily Cron-audit task (Python) sentry (who watches the watchers)
If dump and cdump for last YYYY-MM were not uploaded to archive.org
Or if sitemaps updated for this YYYY-MM were not transferred to ol-www
Or if partner dumps for this YYYY-MM were not uploaded to archive.org
Or if there have been no imports in the last 48 hours (i.e. 2 days)
Or if DD>17 for YYYY-MM and the bwb batchname doesn’t exist in the import psql table
Then send a daily email with failures only, or post the failures to Slack
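The audit flow listed above could be sketched roughly as follows: each condition becomes a named check, failed checks are collected, and a report is produced only when something failed. Everything here (Check, imports_are_recent, bwb_batch_is_due, collect_failures, format_report) is a hypothetical name, and the actual lookups against archive.org, ol-www and the import table are left as placeholder callables.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Callable, List, Optional


@dataclass
class Check:
    name: str
    run: Callable[[], bool]  # returns True when the cron job did its work


def imports_are_recent(last_import: Optional[datetime], now: datetime) -> bool:
    """True if the most recent import happened within the last 48 hours."""
    return last_import is not None and now - last_import <= timedelta(hours=48)


def bwb_batch_is_due(today: date) -> bool:
    """After the 17th, this month's bwb batch is expected to exist."""
    return today.day > 17


def collect_failures(checks: List[Check]) -> List[str]:
    """Run every check and return the names of those that failed."""
    failures = []
    for check in checks:
        try:
            ok = check.run()
        except Exception as exc:  # a crashing check is reported as a failure
            failures.append(f"{check.name}: raised {exc!r}")
            continue
        if not ok:
            failures.append(check.name)
    return failures


def format_report(failures: List[str]) -> Optional[str]:
    """Return a message body, or None when there is nothing to report."""
    if not failures:
        return None
    return "Cron audit failures:\n" + "\n".join(f"- {name}" for name in failures)
```

Returning None for an empty failure list matches the "failures only" requirement: the caller can skip sending the email or Slack message entirely on a clean day.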
Sentry cron-jobs:
https://sentry.archive.org/organizations/ia-ux/projects/ol-cron-jobs
Technical
Testing
Screenshot
Stakeholders