Django has native support for Postgres' jsonb
field since version 1.9.
It comes with support for the most common queries.
But when looking at the documentation of postgres'
“JSON Functions and Operators”
we still find lots of stuff that cannot be accessed using Django's __
notation.
This post addresses this by showing how to use any of postgres' functions and operators from Django. Note that this is about functionality, not performance.
The playground
For this example we are storing authors and their books in the database, similar to an example from Django's documentation. But instead of many-to-many relationships we are using a jsonb field to store the books' names:
from django.contrib.postgres.fields import JSONField
from django.db import models
class Author(models.Model):
name = models.TextField
data = JSONField(null=True, blank=True)
Let's fill the database with something to play with:
Author.objects.bulk_create([
Author(name='First', data=None),
Author(name='Second', data=list()),
Author(name='Third', data={'books': []}),
Author(name='Fourth', data={'books': ['One', 'Two', 'Three']}),
Author(name='Fifth', data={'books': ['Two', 'Four']}),
])
Our task is to get the (unique) set of books that are present in the database. Condition is to do as much work as possible in the database and not in python.
A first attempt could be to get all the json fields from the database and do the rest in python:
# the order_by is for the assert; we don't really need it
result = Author.objects \
.order_by('pk') \
.values_list('data')
assert list(result) == [
None,
[],
{'books': []},
{'books': ['One', 'Two', 'Three']},
{'books': ['Two', 'Four']},
]
result = Author.objects \
.filter(data__isnull=False) \
.order_by('pk') \
.values_list('data', flat=True)
assert list(result) == [
[],
{'books': []},
{'books': ['One', 'Two', 'Three']},
{'books': ['Two', 'Four']},
]
For this example this looks good enough, but keep in mind that data
might
contain a massive load of content. Therefore the benefit of values_list
can be minimal.
ObjectAtPath
Looking at postgres' documentation
we find the operator #>
which exists since 9.3 to “get JSON object at
specified path”. In the given example
'{"a": {"b":{"c": "foo"}}}'::json#>'{a,b}'
returns {"c": "foo"}
.
In our example we do not need a path like a,b
, a key would be enough. But
since books
is a valid path, there's no point in restricting ourselves to
keys instead of paths.
Subclassing Func
we can easily create a class ObjectAtPath
which uses postgres' operator #>
and returns the object at a given path:
from django.db.models.aggregates import Func
class ObjectAtPath(Func):
function = '#>'
template = "%(expressions)s%(function)s'{%(path)s}'"
arity = 1
def __init__(self, expression, path, **extra):
# if path is a list, convert it to a comma separated string
if isinstance(path, (list, tuple)):
path = ','.join(path)
super().__init__(expression, path=path, **extra)
Now let's try it with a simple query:
result = Author.objects \
.filter(data__isnull=False) \
.order_by('pk') \
.annotate(books=ObjectAtPath('data', 'books')) \
.values_list('books', flat=True)
assert list(result) == [
None,
[],
['One', 'Two', 'Three'],
['Two', 'Four'],
]
We successfully got rid of all of data's elements we're not interested in. This is closer to the result we wanted in the first place, but we still have to merge the lists and take care of uniqueness in python.
JsonbArrayElements
We cannot use distinct, because instead of one list with all the books we have
a list of lists (and None
s). Postgres gives us jsonb_array_elements
(since
9.4) to solve this:
class JsonbArrayElements(Func):
function = 'jsonb_array_elements'
template = '%(function)s(%(expressions)s)'
arity = 1
Now we have a list of books and can use distinct:
result = Author.objects \
.filter(data__isnull=False) \
.order_by('pk') \
.annotate(
books=JsonbArrayElements(ObjectAtPath('data', 'books')),
) \
.values_list('books', flat=True)
assert list(result) == ['One', 'Two', 'Three', 'Two', 'Four']
result = Author.objects \
.filter(data__isnull=False) \
.annotate(
books=JsonbArrayElements(ObjectAtPath('data', 'books')),
) \
.order_by('books') \
.distinct('books') \
.values_list('books', flat=True)
assert list(result) == ['Four', 'One', 'Three', 'Two']
In the end we delegated all the work to the database. Keep in mind that this is not always what performs best, but now there's a choice.
Tell us what you think about this. Is something unclear? Do you have questions or ideas? Leave your comments below.