Improvements to solr performance #7037

Merged

cdrini
Collaborator

@cdrini cdrini commented Sep 30, 2022

Work towards #7017

Technical

  • Switch to auto soft commits instead of explicit hard commits
    • Commits are what make user edits visible to search
    • Soft commits make user edits visible to search but do not persist them to disk. The cost is that the updates stick around in Solr's updateLog until a hard commit is performed.
    • Soft commits also invalidate active searchers, meaning new searchers need to be warmed
    • But hard commits, which persist updates to disk, are slow
    • We now auto soft commit every minute, and hard commit every 2 minutes
    • Previously, we would hard commit by sending an explicit commit from solr-updater every minute
    • In a future PR, I will remove some of the code around explicit commits sent by solr-updater, if everything looks good
    • In a future PR, we should also remove commitWithin wherever it appears, but commitWithin performs a soft commit, so it doesn't really matter.
  • Add newSearcher cache warming
    • I noticed that after a soft commit, solr would serve 503s for ~3s. I.e. every minute, there would be 3s of 503s from solr. This is because after a soft commit, the old searchers and their caches are invalidated, and solr needs to warm up again
    • Added newSearcher cache warming, so when a new searcher is created (i.e. after a soft commit), it'll run queries to warm up the cache
    • I chose queries we actually use, and which happen frequently (according to Google Analytics). These will need to be updated for edition-aware solr once that's the default, so we get better caching
    • It would be good to autogenerate this section using the functions in worksearch/code.py, so we're guaranteed to be caching the correct queries
  • Switched to CaffeineCache
    • CaffeineCache is the default in Solr 9, and FastLRUCache, the default in Solr 8, is removed in Solr 9, so switching now seemed like a good idea.
  • Set autowarmCount on all applicable caches
    • I noticed that the cache warming queries would show "autowarm took 0ms" in solr's docker container logs. That didn't look right. Increasing this number caused the log to show something like "autowarm took 18000ms", so it was actually running and making use of the cache!
    • This is what finally fixed the 503 spike after a soft commit!
  • Increase max warming searchers
    • This was recommended to be slightly greater than 2 for Leader nodes. Since we're not in a cluster, our one and only solr node is a leader! So I bumped it up
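
For reviewers who want a picture of how the changes above fit together, here is a rough sketch of the relevant solrconfig.xml sections. It is not copied from this PR's diff. Only the commit intervals (1 minute soft, 2 minutes hard) and the CaffeineCache class come from the description above; the cache sizes, the autowarmCount values, openSearcher=false, the maxWarmingSearchers value, and the warming query are illustrative assumptions.

```xml
<!-- Illustrative fragment only; values marked "assumed" are not from this PR. -->
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>

    <!-- Hard commit every 2 minutes: persists updates to disk.
         openSearcher=false is assumed here, so that visibility is
         handled only by the soft commits below. -->
    <autoCommit>
      <maxTime>120000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

    <!-- Soft commit every minute: makes edits visible to search
         without persisting them to disk. -->
    <autoSoftCommit>
      <maxTime>60000</maxTime>
    </autoSoftCommit>
  </updateHandler>

  <query>
    <!-- CaffeineCache with a non-zero autowarmCount, so entries from the
         old searcher's cache are replayed into the new one.
         Sizes and counts are assumed. -->
    <filterCache class="solr.CaffeineCache"
                 size="512" initialSize="512" autowarmCount="128"/>
    <queryResultCache class="solr.CaffeineCache"
                      size="512" initialSize="512" autowarmCount="128"/>
    <!-- documentCache cannot be autowarmed, so no autowarmCount here. -->
    <documentCache class="solr.CaffeineCache"
                   size="512" initialSize="512"/>

    <!-- Run warming queries whenever a new searcher is opened,
         e.g. after a soft commit. The query below is a hypothetical
         placeholder, not one of the queries chosen in this PR. -->
    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">harry potter</str>
          <str name="rows">20</str>
        </lst>
      </arr>
    </listener>

    <!-- Slightly greater than 2; the exact value is assumed. -->
    <maxWarmingSearchers>4</maxWarmingSearchers>
  </query>
</config>
```

The two warming mechanisms are complementary: the newSearcher listener runs a fixed list of queries, while autowarmCount re-populates each cache from the most recently used entries of the previous searcher.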

Testing

Patch deployed onto prod and monitored perf metrics. Performance was unchanged, but 503 errors drastically dropped.

Patch deploying a solrconfig.xml change looks like:

# On staging solr
docker cp conf/solr/conf/solrconfig.xml solr_builder_solr_1:/var/solr/data/openlibrary/conf/solrconfig.xml
docker restart solr_builder_solr_1

# On prod solr
docker cp conf/solr/conf/solrconfig.xml openlibrary_solr_1:/var/solr/data/openlibrary/conf/solrconfig.xml
docker restart openlibrary_solr_1

The process is similar for a local environment, which is where I tested first.

Screenshot

Stakeholders

@cdrini cdrini changed the title Disable solr_updater hard commits Improvements to solr performance Oct 4, 2022
@cdrini cdrini marked this pull request as ready for review October 4, 2022 18:40
@cdrini cdrini marked this pull request as draft October 4, 2022 18:44
@cdrini cdrini marked this pull request as ready for review October 4, 2022 18:50
@cdrini cdrini added the Patch Deployed This PR has been deployed to production independently, outside of the regular deploy cycle. label Oct 4, 2022
@mekarpeles mekarpeles added the Priority: 1 Do this week, receiving emails, time sensitive, . [managed] label Oct 4, 2022
@mekarpeles mekarpeles self-assigned this Oct 4, 2022
Member

@mekarpeles mekarpeles left a comment

lgtm, pending research into hard commit timeouts

@mekarpeles mekarpeles merged commit f1e1aa3 into internetarchive:master Oct 6, 2022