Docker + Ferm → Open Library Increased Errors + 503s #4706
Labels
Affects: Admin/Maintenance
Issues relating to support scripts, bots, cron jobs and admin web pages. [managed]
Priority: 0
Fix now: Issue prevents users from using the site or active data corruption. [managed]
Type: Post-Mortem
Log for when having to resolve a P0 issue
Summary
openlibrary.org became sluggish, with increased 503s. We noticed infobase was down on ol-home0. We restarted nginx and haproxy on ol-www1, since these sometimes become saturated, and checked memory usage on the ol-mem* hosts in case they were swapping.
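The swap check mentioned above can be scripted. This is a minimal sketch that assumes Linux hosts, where /proc/meminfo reports values in kB. It computes swap in use as SwapTotal minus SwapFree. The function names are illustrative and are not part of any Open Library tooling.

```python
import os

# Assumption: Linux /proc/meminfo format, e.g. "SwapFree:   2000000 kB".


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field: numeric value (kB for most fields)}."""
    fields: dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def swap_used_kb(fields: dict[str, int]) -> int:
    """Swap in use = SwapTotal - SwapFree (both in kB)."""
    return fields.get("SwapTotal", 0) - fields.get("SwapFree", 0)


def is_swapping(fields: dict[str, int], threshold_kb: int = 0) -> bool:
    """True if more than threshold_kb of swap is currently occupied."""
    return swap_used_kb(fields) > threshold_kb


if __name__ == "__main__":
    # Only meaningful on Linux; skip quietly elsewhere.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
        print(f"swap used: {swap_used_kb(info)} kB")
```

A nonzero result only means some pages were swapped out at some point. Active swapping shows up in the si/so columns of vmstat.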
Postmortem: @cdrini noticed infobase was down on ol-home0. Further investigation showed that Docker was having iptables issues (i.e. every Docker container on that host appeared to be in a strange state). The hypothesis is that, during cron job testing about an hour earlier, ferm (with the rsync rules) plus a checkout of olsystem may have disrupted Docker's state; Docker mounts olsystem and may rely on ferm.
The error presented itself as:
Restarting the Docker daemon itself (sudo systemctl restart docker) resolved the issue.
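Below is a hedged sketch of how that broken state could be detected before users notice it. Docker's default iptables integration creates its own chains (such as DOCKER in the nat and filter tables) when the daemon starts. A firewall tool that reloads the whole ruleset can drop those chains, and restarting the daemon recreates them. That is one plausible reading of the hypothesis above, not a confirmed cause. The script parses iptables-save output and lists expected Docker chains that are missing. The expected chain set is an assumption, since it varies by Docker version and network mode, and reading the live ruleset normally requires root.

```python
import shutil
import subprocess

# Assumption: chains created by Docker's default iptables integration with
# bridge networking. The exact set varies by Docker version and config.
EXPECTED_DOCKER_CHAINS = {("nat", "DOCKER"), ("filter", "DOCKER")}


def chains_by_table(dump: str) -> set[tuple[str, str]]:
    """Collect (table, chain) pairs from iptables-save output.

    Table headers look like '*nat'; chain declarations look like
    ':DOCKER - [0:0]'. Comments ('#') and rules ('-A ...') are ignored.
    """
    found: set[tuple[str, str]] = set()
    table = None
    for raw in dump.splitlines():
        line = raw.strip()
        if line.startswith("*"):
            table = line[1:]
        elif line.startswith(":") and table is not None:
            parts = line[1:].split()
            if parts:
                found.add((table, parts[0]))
    return found


def missing_docker_chains(dump: str, expected=EXPECTED_DOCKER_CHAINS) -> list[tuple[str, str]]:
    """Return the expected Docker chains absent from the dump, sorted."""
    return sorted(expected - chains_by_table(dump))


if __name__ == "__main__":
    # Reading the live ruleset normally requires root.
    if shutil.which("iptables-save") is None:
        print("iptables-save not found")
    else:
        result = subprocess.run(["iptables-save"], capture_output=True, text=True)
        if result.returncode != 0:
            print(f"iptables-save failed: {result.stderr.strip()}")
        else:
            missing = missing_docker_chains(result.stdout)
            if missing:
                print(f"Docker chains missing (daemon restart may be needed): {missing}")
            else:
                print("Docker iptables chains present")
```

A nonempty result matches the symptom seen here, where restarting the daemon restored things. On its own, it does not prove which tool removed the chains.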
This is an opportunity to add Nagios alerts for certain hosts that post to Slack. The only reason we noticed this issue was that we were using the site ourselves.
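Here is a sketch of such an alert. It assumes a Nagios setup that runs Python check plugins and a Slack incoming webhook. The check probes an HTTP endpoint and follows the Nagios plugin convention of one line of output plus an exit code (0 for OK, 2 for CRITICAL). A separate helper posts a message to Slack as a JSON body with a text field. The infobase URL and port, the webhook URL and all function names are hypothetical.

```python
import json
import urllib.error
import urllib.request

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}

# Hypothetical endpoint: the infobase host and port are assumptions.
INFOBASE_URL = "http://ol-home0:7000/"


def check_http(url: str, timeout: float = 5.0) -> tuple[int, str]:
    """Probe url and map the outcome to a (Nagios status, message) pair."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return OK, f"HTTP {resp.status} from {url}"
    except urllib.error.HTTPError as e:
        # The server answered; only treat 5xx (e.g. 503) as a failure.
        status = CRITICAL if e.code >= 500 else OK
        return status, f"HTTP {e.code} from {url}"
    except OSError as e:
        # URLError, connection refused, DNS failures and timeouts are all OSErrors.
        return CRITICAL, f"{url} unreachable: {e}"


def slack_payload(host: str, service: str, status: int, message: str) -> dict:
    """Build a Slack incoming-webhook body (a JSON object with a 'text' field)."""
    return {"text": f"[{STATUS_NAMES[status]}] {service} on {host}: {message}"}


def post_to_slack(webhook_url: str, payload: dict, timeout: float = 5.0) -> int:
    """POST the payload as JSON to a Slack incoming webhook; return the HTTP status."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),  # supplying data makes this a POST
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


def main() -> int:
    """Nagios entry point: print one status line and return the exit code."""
    status, message = check_http(INFOBASE_URL)
    print(f"{STATUS_NAMES[status]} - {message}")
    return status
```

As a Nagios check command, the script would end with raise SystemExit(main()). A Nagios notification command, configured separately, could then call post_to_slack with the team's webhook URL.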
Steps to close
Affects: label applied?