Bedrock performance tooling and initial optimisations #15513
Conversation
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main   #15513      +/-   ##
==========================================
+ Coverage   79.06%   79.13%    +0.06%
==========================================
  Files         160      160
  Lines        8355     8411       +56
==========================================
+ Hits         6606     6656       +50
- Misses       1749     1755        +6

☔ View full report in Codecov by Sentry.
Talking with @bkochendorfer, we're both OK with the idea of using Redis for the Web deployments too (the same Redis we use for the CMS task queues, because there's little real-world risk of task eviction if Redis gets low on memory). As such, I will follow this PR with another one focused on using Redis as an FE cache backend. @pmac @robhudson Are you aware of any issues that are likely when switching from LocMem to Redis for a cache - is there anywhere where we specifically exploit the local nature of it?
I don't know of any issues specific to locmem other than this will be a lot slower. I am a little concerned about using the same Redis for cache and queue. We are doing this with basket, but that's a much lower volume. My understanding is that you'd use very different settings in Redis for a cache vs. a queue.
I hear you. I'm happy to try these cache uses based on LocMemCache for now and see what the uplift is like. My leaning towards a shared cache was to reduce cache misses from pods with cold local-memory caches, but doing things one step at a time is good - we may not need a shared cache once the pods are warm.
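For reference, a minimal sketch of the two backend options being weighed above, as they might appear in Django settings. The Redis URL and key names are illustrative, not Bedrock's actual configuration:

```python
# Current approach: per-process cache. Each pod (and each worker process)
# keeps its own store, so with 45 pods there are up to 45 cold caches
# that each need their own warming request.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.locmem.LocMemCache",
    }
}

# Shared-cache alternative: all pods hit the same Redis, so one warm-up
# benefits every pod. RedisCache is a built-in backend since Django 4.0.
# The LOCATION below is a placeholder, not a real deployment URL.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://redis.example.internal:6379/1",
    }
}
```

The eviction concern raised above is real: a Redis used as a cache is typically configured with an eviction policy such as `allkeys-lru`, which is exactly what you do not want for a task queue sharing the same instance.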
if qs is None:
    qs = Position.objects.exclude(job_locations="Remote")
    cache.set(_key, qs, settings.CACHE_TIME_SHORT)
return qs
Should we cache the queryset or cache the list of objects?
Ideally yeah, we'd cache the objects, not the qs, but it's a bit more fiddly than I expected.

Because these are CBVs, caching the queryset in get_queryset is easier to do (and to discover) than caching the Position objects themselves, because there's no clean way I can see to slot in caching when the view sets the object_list attribute.

Also, even though we don't use pagination on these pages, the list view runs the data through paginate_queryset(), which currently (Django 4.2.x) takes a queryset arg but treats it as a general iterable - all of which means we could make get_queryset return a list of objects (from the cache) rather than a QuerySet, but it kind of muddies things.
What do you think?
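A minimal, framework-free sketch of the list-caching variant described above. The ToyCache class, the cache key, and fetch_non_remote_positions are stand-ins for Django's cache API, the real key, and the ORM query, so this runs without Django:

```python
import time


class ToyCache:
    """Tiny stand-in for Django's cache API (get/set with a timeout)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]
            return None
        return value

    def set(self, key, value, timeout):
        self._store[key] = (value, time.monotonic() + timeout)


cache = ToyCache()
CACHE_TIME_SHORT = 60  # seconds; stand-in for settings.CACHE_TIME_SHORT


def fetch_non_remote_positions():
    """Stand-in for Position.objects.exclude(job_locations="Remote")."""
    return [{"title": "Staff Engineer"}, {"title": "Designer"}]


def get_queryset():
    key = "careers:positions:non-remote"
    positions = cache.get(key)
    if positions is None:
        # Evaluate to a plain list before caching, so concrete objects
        # (not a lazy queryset) are what would get pickled.
        positions = list(fetch_non_remote_positions())
        cache.set(key, positions, CACHE_TIME_SHORT)
    return positions


print(len(get_queryset()))  # 2
```

Since paginate_queryset() treats its argument as a general iterable, returning a list like this from get_queryset would work; the cost is the naming mismatch flagged in the comment above.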
Thanks for sharing that. It's a bit more complex than it appears at first glance. I'm actually a bit surprised we can cache the queryset objects, which seems more complex. I suppose they get pickled? But this is fine with me, I was mostly curious about your thoughts on the consideration. Thanks!
from bedrock.careers.models import Position
from bedrock.careers.tests import PositionFactory
from bedrock.mozorg.tests import TestCase


class TestPositionModel(TestCase):
    def setUp(self):
        cache.clear()
For bonus points, you could test that the DB call doesn't get triggered on a second call to the wrapped methods.
Yeah - good call. Time to break out assertNumQueries!
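As a rough, framework-free illustration of the suggested test: a context manager that counts "queries" issued inside its block and asserts the expected number. In the real suite this would be Django's self.assertNumQueries on a TestCase; every name below is a stand-in:

```python
from contextlib import contextmanager

query_log = []  # record of every "database" call made


def run_query(sql):
    """Stand-in for a real database call; logs instead of hitting a DB."""
    query_log.append(sql)
    return []


@contextmanager
def assert_num_queries(expected):
    """Toy analogue of Django's assertNumQueries."""
    before = len(query_log)
    yield
    actual = len(query_log) - before
    assert actual == expected, f"expected {expected} queries, got {actual}"


cached = {}


def get_positions():
    """Query on first call, then serve from the cache."""
    if "positions" not in cached:
        cached["positions"] = run_query("SELECT * FROM careers_position")
    return cached["positions"]


with assert_num_queries(1):
    get_positions()  # cold cache: one query

with assert_num_queries(0):
    get_positions()  # warm cache: no query
```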
r+wc
Also add a script to hit some common/popular URLs to give django-silk some traffic to capture
Saves 11 SQL queries on the release notes page by caching the country code lookup for an hour. Tested on /en-US/firefox/132.0.1/releasenotes/ - Cold cache: 14 queries / 2066ms; Warm cache: 3 queries / 222ms
Caches derived values (location options, etc) for longer than the actual page of positions. Takes the careers listing page down from 9 queries to 2 for the CMS deployment and 0 on the Web deployment
It's used on the newsletter form which shows up in a bunch of places on the site.
This changeset adds support for profiling Bedrock using django-silk locally (or anywhere the bedrock_test image is used - but not in production).
It also contains some optimisations - via caching - to reduce the DB queries executed on the busiest pages.
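For context, django-silk's documented wiring is roughly the following sketch. Bedrock's actual integration lives in the changeset; the ENABLE_DJANGO_SILK flag here is illustrative, standing in for whatever gate keeps it out of production:

```python
# settings.py - enable silk only when explicitly requested, e.g. locally.
import os

ENABLE_DJANGO_SILK = os.environ.get("ENABLE_DJANGO_SILK") == "1"

if ENABLE_DJANGO_SILK:
    INSTALLED_APPS += ["silk"]
    MIDDLEWARE += ["silk.middleware.SilkyMiddleware"]

# urls.py - mount the silk dashboard behind the same gate:
#
# if settings.ENABLE_DJANGO_SILK:
#     urlpatterns += [path("silk/", include("silk.urls", namespace="silk"))]
```

Once enabled, every request's SQL queries and timings are recorded and browsable at /silk/, which is where the query counts quoted in the commit messages come from.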
Significant changes and points to review
Please review this PR commit by commit, paying sceptical attention to the usage of caching etc. The commit messages contain details of the number of queries saved.
I know that at the moment Bedrock is using a version of Django's LocMemCache backend - this means that for 45 pods in production, we'll still get plenty of cache misses until the pods have all had a call that warms the cache. It might be that, given the TTL of the cached items, we never really get to that point in prod, but we will in Dev and Stage where there are far fewer pods.
We'd certainly get more caching uplift if we had a shared cache backend, such as Redis. Given we now have Redis in play for the rq backend, we could switch it at this point (expanding this PR), or as a separate change - opinions welcome!
Issue / Bugzilla link
#15505
Testing
Unit tests passing should be enough here, but feel free to follow the notes in
profiling/hit_popular_pages.py
to test drive things yourself.
Questions
Is the addition of django-silk worth mentioning in formal documentation?