From e370930f8719079e8202b474b9de806218aad405 Mon Sep 17 00:00:00 2001
From: Nathan Levesque
Date: Mon, 5 Aug 2024 16:42:49 -0400
Subject: [PATCH] Doc on how to optimize db query performance

---
 how-to/performance.md | 103 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 103 insertions(+)
 create mode 100644 how-to/performance.md

diff --git a/how-to/performance.md b/how-to/performance.md
new file mode 100644
index 0000000..31e9544
--- /dev/null
+++ b/how-to/performance.md
@@ -0,0 +1,103 @@
---
parent: How To
---
# Optimize DB Queries

### Areas of Focus

- REST APIs
- Bulk operations (e.g. Celery tasks)

### Causes of bad performance

- Missing DB indexes on columns used for filtering and sorting
- N+1 queries (these compound when nested: every parent row triggers its own query, and each of those can fan out again)
  - DB queries in serializer methods (excluding `create()`, `update()`, and validators)

### Patterns

The patterns below assume the following models as a starting point:

```python
from django.db import models

class Company(models.Model):
    name = models.CharField(max_length=255)

class Employee(models.Model):
    name = models.CharField(max_length=255)
    retired = models.BooleanField(default=False)
    employer = models.ForeignKey(Company, on_delete=models.CASCADE, related_name="employees")
```

#### Additional selections

This strategy corresponds to the `QuerySet.select_related()` method. It adds a SQL `JOIN` that pulls the related object's columns into the same query, so overusing it (joining wide tables or relations you never read) can also hurt performance. It only works for relations that resolve to a single object, i.e. `ForeignKey` and one-to-one relations.

Example:

```python
# 1 DB query: employers are fetched in the same query via a JOIN
for employee in Employee.objects.select_related("employer"):
    print(employee.name, employee.employer.name)  # no extra query per employee
```

#### Prefetching

Prefetching avoids N+1 issues by fetching the related rows for all parent objects in one additional query per relation (using `IN (...)`) and matching them to their parents in Python. It is applied with the `QuerySet.prefetch_related()` method and is the tool for multi-valued relations such as reverse foreign keys (`company.employees`), which `select_related()` cannot follow.
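
To make the query counts concrete, the sketch below reproduces the SQL shapes involved using Python's standard `sqlite3` module instead of the Django ORM. The tables, sample rows, and the `count_queries`, `join_style`, `n_plus_one`, and `prefetch_style` helpers are all hypothetical; they only approximate what Django emits for a plain loop, `select_related()`, and `prefetch_related()`.

```python
import sqlite3
from collections import defaultdict

# Hypothetical tables mirroring the Company/Employee models above
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT,
        retired INTEGER DEFAULT 0,
        employer_id INTEGER REFERENCES company (id)
    );
    INSERT INTO company (id, name) VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO employee (name, retired, employer_id)
    VALUES ('Ann', 0, 1), ('Bob', 1, 1), ('Cid', 0, 2), ('Dee', 0, 3);
    """
)

# Record every SQL statement SQLite runs on this connection from here on
executed = []
conn.set_trace_callback(executed.append)


def count_queries(fn):
    """Run fn and return (its result, number of statements it executed)."""
    executed.clear()
    result = fn()
    return result, len(executed)


def join_style():
    """select_related-style: employer columns come back in the same query via a JOIN."""
    return conn.execute(
        "SELECT e.name, c.name FROM employee e JOIN company c ON c.id = e.employer_id"
    ).fetchall()


def n_plus_one():
    """1 query for the companies + 1 query per company for its employees."""
    result = {}
    companies = conn.execute("SELECT id, name FROM company ORDER BY id").fetchall()
    for company_id, name in companies:
        rows = conn.execute(
            "SELECT name FROM employee WHERE employer_id = ? ORDER BY id", (company_id,)
        ).fetchall()
        result[name] = [row[0] for row in rows]
    return result


def prefetch_style():
    """prefetch_related-style: 1 query for companies + 1 IN (...) query, grouped in Python."""
    companies = conn.execute("SELECT id, name FROM company ORDER BY id").fetchall()
    ids = [company_id for company_id, _ in companies]
    placeholders = ", ".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT employer_id, name FROM employee WHERE employer_id IN ({placeholders}) ORDER BY id",
        ids,
    ).fetchall()
    by_company = defaultdict(list)
    for employer_id, employee_name in rows:
        by_company[employer_id].append(employee_name)
    return {name: by_company[company_id] for company_id, name in companies}


print(count_queries(join_style)[1])      # 1
print(count_queries(n_plus_one)[1])      # 4: 1 + one per company
print(count_queries(prefetch_style)[1])  # 2, however many companies there are
```

The N+1 count grows with the number of companies while the prefetch-style count stays at two, which is the trade-off `prefetch_related()` makes: one extra round trip in exchange for removing the per-row queries.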

Basic example:

```python
# 2 DB queries: one for the companies, one for all of their employees
Company.objects.prefetch_related("employees")
```

A more complex example would be if we want to limit the list of employees to those that aren't retired, by passing a custom queryset through `Prefetch`:

```python
from django.db.models import Prefetch

companies = Company.objects.prefetch_related(
    Prefetch("employees", queryset=Employee.objects.filter(retired=False))
)
for company in companies:
    # `employees` now only contains the non-retired employees
    for employee in company.employees.all():
        print(employee.name)
```

An even better solution is to store the filtered results on a different attribute with `to_attr`, so we leave `employees` unfiltered:

```python
from django.db.models import Prefetch

companies = Company.objects.prefetch_related(
    Prefetch(
        "employees",
        queryset=Employee.objects.filter(retired=False),
        to_attr="active_employees",
    )
)

for company in companies:
    # `to_attr` stores a plain list, so there is no `.all()` here
    for employee in company.active_employees:
        print(employee.name)
```

Rounding out these examples: since we're now assigning to a separate attribute, we can add a property to the `Company` model that uses the prefetched list when it's available:

```python
from django.db import models
from django.db.models import Prefetch
from django.utils.functional import cached_property


class Company(models.Model):
    ...  # existing fields as defined above

    @cached_property
    def active_employees(self):
        # If we've prefetched into `_active_employees`, use that list
        if hasattr(self, "_active_employees"):
            return self._active_employees

        # Otherwise fall back to one query per company: slower (N+1 when
        # looping over many companies) but still accurate
        return list(self.employees.filter(retired=False))


# Both querysets below give each company a correctly filtered `active_employees`,
# so the data is always right; prefetching only changes how cheaply it's fetched.

# N+1 queries once `active_employees` is accessed on every company
Company.objects.all()

# 2 queries in total
Company.objects.prefetch_related(
    Prefetch("employees", queryset=Employee.objects.filter(retired=False), to_attr="_active_employees")
)
```
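
For the first cause listed under "Causes of bad performance", missing indexes, SQLite's `EXPLAIN QUERY PLAN` offers a quick way to check whether a filter can use an index. The sketch below uses Python's standard `sqlite3` module with a hypothetical `employee` table, sample data, and index name `employee_name_idx`; it is not how Django manages indexes (there you would declare them with `db_index=True` or `Meta.indexes` and inspect plans with your database's own `EXPLAIN`). Plan wording differs between SQLite versions, so the sketch only looks for `SCAN` versus the index name.

```python
import sqlite3

# Hypothetical table with enough rows to make the plan meaningful
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, retired INTEGER, employer_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employee (name, retired, employer_id) VALUES (?, ?, ?)",
    [(f"emp{i}", i % 2, i % 10) for i in range(1000)],
)

# Filter on a column that has no index yet
QUERY = "SELECT id, retired FROM employee WHERE name = ?"


def plan(sql, params=()):
    """Return the human-readable detail column (last column) of each plan row."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    return [row[-1] for row in rows]


before = plan(QUERY, ("emp42",))
conn.execute("CREATE INDEX employee_name_idx ON employee (name)")
after = plan(QUERY, ("emp42",))

print(before)  # a full table scan (exact text varies by SQLite version)
print(after)   # a search that goes through employee_name_idx
```

Running the same check before and after adding an index is a cheap way to confirm that a slow filter or sort is actually being served by the index rather than a full scan.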