Skip to content

Commit

Permalink
Version 0.3.0: support for separate where_postgresql='' and where_sql…
Browse files Browse the repository at this point in the history
…ite='' expressions.
  • Loading branch information
mattiaslinnap committed Aug 7, 2017
1 parent 69fe048 commit 1482863
Show file tree
Hide file tree
Showing 8 changed files with 240 additions and 48 deletions.
70 changes: 60 additions & 10 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,8 +25,30 @@ Requirements:

## Usage

Set up a PartialIndex and insert it into your model's class-based Meta.indexes list:

```python
from partial_index import PartialIndex

class MyModel(models.Model):
class Meta:
indexes = [
PartialIndex(fields=['user', 'room'], unique=True, where='deleted_at IS NULL'),
PartialIndex(fields=['created_at'], unique=False, where_postgresql='is_complete = false', where_sqlite='is_complete = 0'),
]
```

Of course, these (unique) indexes could be created by a handwritten [RunSQL migration](https://docs.djangoproject.com/en/1.11/ref/migration-operations/#runsql).
But the constraints are part of the business logic, and best kept close to the model definitions.

### Partial unique constraints

With `unique=True`, this can be used to create unique constraints for a subset of the rows.
For example, to enforce that each user can only have one non-deleted room booking at a time:

For example, you might have a model that has a deleted_at field to mark rows as archived instead of deleting them forever.
You wish to add unique constraints on "alive" rows, but allow multiple copies in the archive.
[Django's unique_together](https://docs.djangoproject.com/en/1.11/ref/models/options/#unique-together) is not sufficient here, as that cannot
distinguish between the archived and alive rows.

```python
from partial_index import PartialIndex
Expand All @@ -43,34 +65,62 @@ class RoomBooking(models.Model):
]
```

### Partial non-unique indexes

With `unique=False`, partial indexes can be used to optimise lookups that return only a small subset of the rows.
For example, on a job queue table with millions of completed, and very few pending jobs, it can be used to
speed up a "find next pending job" query:

For example, you might have a job queue table which keeps an archive of millions of completed rows. Among these are a few pending jobs,
which you want to find with a `.filter(is_complete=0)` query.

```python
from partial_index import PartialIndex

class Job(models.Model):
created_at = models.DateTimeField(auto_now_add=True)
is_complete = models.BooleanField(default=False)
is_complete = models.IntegerField(default=0)

class Meta:
indexes = [
PartialIndex(fields=['created_at'], unique=False, where='is_complete = false')
PartialIndex(fields=['created_at'], unique=False, where='is_complete = 0')
]
```

Compared to an usual full index on the `is_complete` field, this can be significantly smaller on disk and memory, and faster to update.

### Different where-expressions for PostgreSQL and SQLite

Note that the where-expression is directly inserted into the `CREATE INDEX` sql statement, and must be valid for your database backend.
This means that you would have to use `where='is_complete = false'` on PostgreSQL and `where='is_complete = 0'` on SQLite for the Job model.
Using [Django's query expressions](https://docs.djangoproject.com/en/1.11/ref/models/expressions/) that check the syntax and generate valid SQL

In rare cases, PostgreSQL and SQLite differ in the syntax that they expect. One such case is boolean literals:
SQLite only accepts numbers 0/1, and PostgreSQL only accepts unquoted false/true and a few quoted strings (but not numbers). You can provide
a separate where expression if you wish to support both backends in your project:

```python
from partial_index import PartialIndex

class Job(models.Model):
created_at = models.DateTimeField(auto_now_add=True)
is_complete = models.BooleanField(default=0)

class Meta:
indexes = [
PartialIndex(fields=['created_at'], unique=False, where_postgresql='is_complete = false', where_sqlite='is_complete = 0')
]
```

If the expressions for both backends are the same, you must use the single `where=''` argument for consistency.

It is up to you to ensure that the expressions are otherwise valid SQL and have the same behaviour. Using [Django's query expressions](https://docs.djangoproject.com/en/1.11/ref/models/expressions/) that check the syntax and generate valid SQL
for either database is planned for a future version.

Of course, these (unique) indexes could be created by a handwritten [RunSQL migration](https://docs.djangoproject.com/en/1.11/ref/migration-operations/#runsql).
But the constraints are part of the business logic, and best kept close to the model definitions.

## Version History

## 0.2.1 (latest)
### 0.3.0 (latest)
* Add support for separate `where_postgresql=''` and `where_sqlite=''` predicates, when the expression has different syntax on the two
database backends and you wish to support both.

### 0.2.1
* Ensure that automatically generated index names depend on the "unique" and "where" parameters. Otherwise two indexes with the same fields would be considered identical by Django.

### 0.2.0
Expand Down
51 changes: 38 additions & 13 deletions partial_index/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Provide a nicer error message than failing to import models.Index.

VERSION = (0, 2, 1)
VERSION = (0, 3, 0)
__version__ = '.'.join(str(v) for v in VERSION)


Expand Down Expand Up @@ -31,49 +31,74 @@ class PartialIndex(Index):
}

# Mutable default fields=[] looks wrong, but it's copied from super class.
def __init__(self, fields=[], name=None, unique=None, where=''):
def __init__(self, fields=[], name=None, unique=None, where='', where_postgresql='', where_sqlite=''):
if unique not in [True, False]:
raise ValueError('unique must be True or False')
if not where:
raise ValueError('where predicate must be provided')
raise ValueError('Unique must be True or False')
if where:
if where_postgresql or where_sqlite:
raise ValueError('If providing a single where predicate, must not provide where_postgresql or where_sqlite')
else:
if not where_postgresql and not where_sqlite:
raise ValueError('At least one where predicate must be provided')
if where_postgresql == where_sqlite:
raise ValueError('If providing a separate where_postgresql and where_sqlite, then they must be different.' +
'If the same expression works for both, just use single where.')
self.unique = unique
self.where = where
self.where_postgresql = where_postgresql
self.where_sqlite = where_sqlite
super(PartialIndex, self).__init__(fields, name)

def __repr__(self):
return "<%(name)s: fields=%(fields)s, unique=%(unique)s, where='%(where)s'>" % {
if self.where:
anywhere = "where='%s'" % self.where
else:
anywhere = "where_postgresql='%s', where_sqlite='%s'" % (self.where_postgresql, self.where_sqlite)

return "<%(name)s: fields=%(fields)s, unique=%(unique)s, %(anywhere)s>" % {
'name': self.__class__.__name__,
'fields': "'{}'".format(', '.join(self.fields)),
'unique': self.unique,
'where': self.where,
'anywhere': anywhere
}

def deconstruct(self):
path, args, kwargs = super(PartialIndex, self).deconstruct()
kwargs['unique'] = self.unique
kwargs['where'] = self.where
kwargs['where_postgresql'] = self.where_postgresql
kwargs['where_sqlite'] = self.where_sqlite
return path, args, kwargs

def get_valid_vendor(self, schema_editor):
vendor = schema_editor.connection.vendor
if vendor not in self.sql_create_index:
raise ValueError('Database vendor %s is not supported for django-partial-index.' % vendor)
return vendor

def get_sql_create_template_values(self, model, schema_editor, using):
parameters = super(PartialIndex, self).get_sql_create_template_values(model, schema_editor, using)
parameters['unique'] = ' UNIQUE' if self.unique else ''
# Note: the WHERE predicate is not yet checked for syntax or field names, and is inserted into the CREATE INDEX query unescaped.
# This is bad for usability, but is not a security risk, as the string cannot come from user input.
parameters['where'] = self.where
vendor = self.get_valid_vendor(schema_editor)
if vendor == 'postgresql':
parameters['where'] = self.where_postgresql or self.where
elif vendor == 'sqlite':
parameters['where'] = self.where_sqlite or self.where
else:
raise ValueError('Should never happen')
return parameters

def create_sql(self, model, schema_editor, using=''):
vendor = schema_editor.connection.vendor
if vendor not in self.sql_create_index:
raise ValueError('Database vendor %s is not supported for django-partial-index.' % vendor)

vendor = self.get_valid_vendor(schema_editor)
# Only change from super function - override query template to insert optional UNIQUE at start, and WHERE at the end.
sql_template = self.sql_create_index[vendor]
sql_parameters = self.get_sql_create_template_values(model, schema_editor, using)
return sql_template % sql_parameters

def name_hash_extra_data(self):
return [str(self.unique), self.where]
return [str(self.unique), self.where, self.where_postgresql, self.where_sqlite]

def set_name_with_model(self, model):
"""Sets an unique generated name for the index.
Expand Down
4 changes: 2 additions & 2 deletions setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,13 +6,13 @@
setup(
name='django-partial-index',
packages=['partial_index'],
version='0.2.1',
version='0.3.0',
description='PostgreSQL and SQLite partial indexes for Django models',
long_description=open('README.md').read(),
author='Mattias Linnap',
author_email='mattias@linnap.com',
url='https://github.com/mattiaslinnap/django-partial-index',
download_url='https://github.com/mattiaslinnap/django-partial-index/archive/0.2.1.tar.gz',
download_url='https://github.com/mattiaslinnap/django-partial-index/archive/0.3.0.tar.gz',
license='BSD',
install_requires=[],
classifiers=[
Expand Down
2 changes: 1 addition & 1 deletion tests/runner.py
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ def main(args):
django.setup()

from django.test.runner import DiscoverRunner
test_runner = DiscoverRunner(top_level=TESTS_DIR, interactive=False, keepdb=True)
test_runner = DiscoverRunner(top_level=TESTS_DIR, interactive=False, keepdb=False)
failures = test_runner.run_tests(['tests'])
if failures:
sys.exit(1)
Expand Down
32 changes: 26 additions & 6 deletions tests/test_models.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@
from testapp.models import User, Room, RoomBooking, Job


class PartialIndexModelTest(TestCase):
class PartialIndexRoomBookingTest(TestCase):
"""Test that partial unique constraints work as expected when inserting data to the db.
Models and indexes are created when django creates the test db, they do not need to be set up.
Expand Down Expand Up @@ -40,8 +40,28 @@ def test_roombooking_same_conflict(self):
with self.assertRaises(IntegrityError):
RoomBooking.objects.create(user=self.user1, room=self.room1)

def test_job_same(self):
now = timezone.now()
job1 = Job.objects.create(created_at=now)
job2 = Job.objects.create(created_at=now)
self.assertEqual(job1.created_at, job2.created_at)

class PartialIndexJobTest(TestCase):
"""Test that partial unique constraints work as expected when inserting data to the db.
Models and indexes are created when django creates the test db, they do not need to be set up.
"""
def test_job_same_id(self):
job1 = Job.objects.create(order=1, group=1)
job2 = Job.objects.create(order=1, group=2)
self.assertEqual(job1.order, job2.order)

def test_job_same_group(self):
Job.objects.create(order=1, group=1)
with self.assertRaises(IntegrityError):
Job.objects.create(order=2, group=1)

def test_job_complete_same_group(self):
job1 = Job.objects.create(order=1, group=1, is_complete=True)
job2 = Job.objects.create(order=1, group=1)
self.assertEqual(job1.order, job2.order)

def test_job_complete_later_same_group(self):
job1 = Job.objects.create(order=1, group=1)
job2 = Job.objects.create(order=1, group=1, is_complete=True)
self.assertEqual(job1.order, job2.order)
101 changes: 95 additions & 6 deletions tests/test_partial_index.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,23 +8,57 @@
from testapp.models import AB


class PartialIndexTest(SimpleTestCase):
class PartialIndexWhereRulesTest(SimpleTestCase):
"""Test the rules for providing where arguments."""

def test_no_where(self):
with self.assertRaisesMessage(ValueError, 'At least one where predicate must be provided'):
PartialIndex(fields=['a', 'b'], unique=True)

def test_single_and_pg_where_same(self):
with self.assertRaisesRegexp(ValueError, '^If providing a single'):
PartialIndex(fields=['a', 'b'], unique=True, where='a IS NULL', where_postgresql='a IS NULL')

def test_single_and_pg_where_different(self):
with self.assertRaisesRegexp(ValueError, '^If providing a single'):
PartialIndex(fields=['a', 'b'], unique=True, where='a IS NULL', where_postgresql='a IS NOT NULL')

def test_single_and_sqlite_where_same(self):
with self.assertRaisesRegexp(ValueError, '^If providing a single'):
PartialIndex(fields=['a', 'b'], unique=True, where='a IS NULL', where_sqlite='a IS NULL')

def test_single_and_sqlite_where_different(self):
with self.assertRaisesRegexp(ValueError, '^If providing a single'):
PartialIndex(fields=['a', 'b'], unique=True, where='a IS NULL', where_sqlite='a IS NOT NULL')

def test_all_where_same(self):
with self.assertRaisesRegexp(ValueError, '^If providing a single'):
PartialIndex(fields=['a', 'b'], unique=True, where='a IS NULL', where_postgresql='a IS NULL', where_sqlite='a IS NULL')

def test_all_where_different(self):
with self.assertRaisesRegexp(ValueError, '^If providing a single'):
PartialIndex(fields=['a', 'b'], unique=True, where='a IS NULL', where_postgresql='a IS NOT NULL', where_sqlite='a = 3')

def test_pg_and_sqlite_where_same(self):
with self.assertRaisesRegexp(ValueError, '^If providing a separate'):
PartialIndex(fields=['a', 'b'], unique=True, where_postgresql='a IS NULL', where_sqlite='a IS NULL')


class PartialIndexSingleWhereTest(SimpleTestCase):
"""Test simple fields and methods on the PartialIndex class."""

def setUp(self):
self.idx = PartialIndex(fields=['a', 'b'], unique=True, where='a IS NULL')

def test_no_unique(self):
with self.assertRaisesMessage(ValueError, 'unique must be True or False'):
with self.assertRaisesMessage(ValueError, 'Unique must be True or False'):
PartialIndex(fields=['a', 'b'], where='a is null')

def test_no_where(self):
with self.assertRaisesMessage(ValueError, 'where predicate must be provided'):
PartialIndex(fields=['a', 'b'], unique=True)

def test_fields(self):
self.assertEqual(self.idx.unique, True)
self.assertEqual(self.idx.where, 'a IS NULL')
self.assertEqual(self.idx.where_postgresql, '')
self.assertEqual(self.idx.where_sqlite, '')

def test_repr(self):
self.assertEqual(repr(self.idx), "<PartialIndex: fields='a, b', unique=True, where='a IS NULL'>")
Expand All @@ -36,6 +70,8 @@ def test_deconstruct(self):
self.assertEqual(kwargs['fields'], ['a', 'b'])
self.assertEqual(kwargs['unique'], True)
self.assertEqual(kwargs['where'], 'a IS NULL')
self.assertEqual(kwargs['where_postgresql'], '')
self.assertEqual(kwargs['where_sqlite'], '')
self.assertIn('name', kwargs) # Exact value of name is not tested.

def test_suffix(self):
Expand Down Expand Up @@ -73,3 +109,56 @@ def test_where_changes_generated_name(self):
idx2 = PartialIndex(fields=['a', 'b'], unique=False, where='a IS NOT NULL')
idx2.set_name_with_model(AB)
self.assertNotEqual(idx1.name, idx2.name)


class PartialIndexMultiWhereTest(SimpleTestCase):
"""Test simple fields and methods on the PartialIndex class with separate where_vendor='' arguments."""

def setUp(self):
self.idx = PartialIndex(fields=['a', 'b'], unique=True, where_postgresql='a = false', where_sqlite='a = 0')

def test_no_unique(self):
with self.assertRaisesMessage(ValueError, 'Unique must be True or False'):
PartialIndex(fields=['a', 'b'], where_postgresql='a = false', where_sqlite='a = 0')

def test_fields(self):
self.assertEqual(self.idx.unique, True)
self.assertEqual(self.idx.where, '')
self.assertEqual(self.idx.where_postgresql, 'a = false')
self.assertEqual(self.idx.where_sqlite, 'a = 0')

def test_repr(self):
self.assertEqual(repr(self.idx), "<PartialIndex: fields='a, b', unique=True, where_postgresql='a = false', where_sqlite='a = 0'>")

def test_deconstruct(self):
path, args, kwargs = self.idx.deconstruct()
self.assertEqual(path, 'partial_index.PartialIndex')
self.assertEqual((), args)
self.assertEqual(kwargs['fields'], ['a', 'b'])
self.assertEqual(kwargs['unique'], True)
self.assertEqual(kwargs['where'], '')
self.assertEqual(kwargs['where_postgresql'], 'a = false')
self.assertEqual(kwargs['where_sqlite'], 'a = 0')
self.assertIn('name', kwargs) # Exact value of name is not tested.

def test_suffix(self):
self.assertEqual(self.idx.suffix, 'partial')

def test_generated_name_ends_with_partial(self):
idx = PartialIndex(fields=['a', 'b'], unique=False, where_postgresql='a = false', where_sqlite='a = 0')
idx.set_name_with_model(AB)
self.assertEqual(idx.name[-8:], '_partial')

def test_where_postgresql_changes_generated_name(self):
idx1 = PartialIndex(fields=['a', 'b'], unique=False, where_postgresql='a = false', where_sqlite='a = 0')
idx1.set_name_with_model(AB)
idx2 = PartialIndex(fields=['a', 'b'], unique=False, where_postgresql='a = true', where_sqlite='a = 0')
idx2.set_name_with_model(AB)
self.assertNotEqual(idx1.name, idx2.name)

def test_where_sqlite_changes_generated_name(self):
idx1 = PartialIndex(fields=['a', 'b'], unique=False, where_postgresql='a = false', where_sqlite='a = 0')
idx1.set_name_with_model(AB)
idx2 = PartialIndex(fields=['a', 'b'], unique=False, where_postgresql='a = false', where_sqlite='a = 1')
idx2.set_name_with_model(AB)
self.assertNotEqual(idx1.name, idx2.name)
Loading

0 comments on commit 1482863

Please sign in to comment.