503s Solr Performance Issues #10287
Labels
Lead: @mekarpeles
Issues overseen by Mek (Staff: Program Lead) [managed]
Module: Solr
Issues related to the configuration or use of the Solr subsystem. [managed]
Priority: 0
Fix now: Issue prevents users from using the site or active data corruption. [managed]
Type: Post-Mortem
Log for when having to resolve a P0 issue
Summary
Since ~Jan 1, 2025, the website has been seeing lots of 503s and slow performance
(Annotations needed on each)
Sentry performance shows increase in request load time
Digging into book page, solr seems to be the bottleneck
Solr strain evident on CPU charts
Increase in errors (related to solr 503s in sentry):
Load times correlated with decrease in page views and error response codes:
Part of our solr re-indexing flow is to run an index optimize. If we don't rebuild the index frequently enough, the current index may become fragmented. It is possible that the lack of an optimize step on our index is partially responsible for the problem. There was also low free disk space.
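The two suspected contributors above (a fragmented index and low free disk space) could be checked with something like the following. This is a minimal sketch, not what was run during the incident. It assumes the caller has already fetched the "index" section of a Solr Luke handler response (`/admin/luke`), which reports `maxDoc`, `deletedDocs` and `segmentCount`. The function names and the 20% thresholds are hypothetical and would need tuning.

```python
import shutil


def index_health(luke_index: dict) -> dict:
    """Summarize fragmentation signals from the "index" section of a Luke response."""
    max_doc = luke_index.get("maxDoc", 0)
    deleted = luke_index.get("deletedDocs", 0)
    return {
        "segments": luke_index.get("segmentCount", 0),
        # Share of documents in the index that are deleted but not yet merged away.
        "deleted_ratio": deleted / max_doc if max_doc else 0.0,
    }


def free_disk_fraction(path: str = ".") -> float:
    """Fraction of the filesystem holding `path` that is still free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def needs_attention(health: dict, free_fraction: float,
                    max_deleted_ratio: float = 0.2,   # assumed threshold
                    min_free_fraction: float = 0.2):  # assumed threshold
    """True if either the index looks fragmented or the disk is nearly full."""
    return (health["deleted_ratio"] > max_deleted_ratio
            or free_fraction < min_free_fraction)
```

A check like this could feed whatever alerting is already in place; it only reads values and does not change the index.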
What fixed it?
What was the impact?
Degraded website performance over multiple days as solr traffic increased
Followup / What could have gone better
Monitoring of:
More routine:
docker prune (to be handled by Add ol-solr0 to deploy flow #10239)
running solr index optimize (and possibly automated monthly re-indexing)
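A routine optimize could be triggered through Solr's update handler, which accepts `optimize=true` and an optional `maxSegments` parameter. The sketch below only illustrates the request shape: the base URL, the core name and the function names are assumptions, not the project's actual deploy tooling.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_optimize_url(base_url: str, core: str, max_segments=None) -> str:
    """Build the update-handler URL that asks Solr to optimize one core."""
    # waitSearcher=false lets the request return without waiting for a new searcher.
    params = {"optimize": "true", "waitSearcher": "false"}
    if max_segments is not None:
        params["maxSegments"] = str(max_segments)
    return f"{base_url.rstrip('/')}/{core}/update?{urlencode(params)}"


def run_optimize(base_url: str, core: str, max_segments=None, timeout: int = 3600) -> int:
    """Send the optimize request and return the HTTP status code."""
    url = build_optimize_url(base_url, core, max_segments)
    with urlopen(url, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    # Hypothetical host and core name; prints the URL instead of calling Solr.
    print(build_optimize_url("http://localhost:8983/solr", "openlibrary", max_segments=4))
```

An optimize rewrites index segments and temporarily needs extra disk space, so a job like this would be safer to run only after confirming there is enough free disk.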
Followup actions:
Steps to close
Affects:
label applied?