AX Semantics Developer Blog
  • Home
  • Categories
  • Tags
  • Archives

Django and Postgres' jsonb field beyond the basics

Django has native support for Postgres' jsonb field since version 1.9. It comes with support for the most common queries. But when looking at the documentation of postgres' “JSON Functions and Operators” we still find lots of stuff that cannot be accessed using Django's __ notation.

This post addresses this by showing how to use any of postgres' functions and operators from Django. Note that this is about functionality, not performance.

The playground

For this example we are storing authors and their books in the database, similar to an example from Django's documentation. But instead of many-to-many relationships we are using a jsonb field to store the books' names:

from django.contrib.postgres.fields import JSONField
from django.db import models


class Author(models.Model):
    name = models.TextField
    data = JSONField(null=True, blank=True)

Let's fill the database with something to play with:

Author.objects.bulk_create([
    Author(name='First', data=None),
    Author(name='Second', data=list()),
    Author(name='Third', data={'books': []}),
    Author(name='Fourth', data={'books': ['One', 'Two', 'Three']}),
    Author(name='Fifth', data={'books': ['Two', 'Four']}),
])

Our task is to get the (unique) set of books that are present in the database. Condition is to do as much work as possible in the database and not in python.

A first attempt could be to get all the json fields from the database and do the rest in python:

# the order_by is for the assert; we don't really need it
result = Author.objects \
    .order_by('pk') \
    .values_list('data')

assert list(result) == [
    None,
    [],
    {'books': []},
    {'books': ['One', 'Two', 'Three']},
    {'books': ['Two', 'Four']},
]

result = Author.objects \
    .filter(data__isnull=False) \
    .order_by('pk') \
    .values_list('data', flat=True)

assert list(result) == [
    [],
    {'books': []},
    {'books': ['One', 'Two', 'Three']},
    {'books': ['Two', 'Four']},
]

For this example this looks good enough, but keep in mind that data might contain a massive load of content. Therefore the benefit of values_list can be minimal.

ObjectAtPath

Looking at postgres' documentation we find the operator #> which exists since 9.3 to “get JSON object at specified path”. In the given example '{"a": {"b":{"c": "foo"}}}'::json#>'{a,b}' returns {"c": "foo"}.

In our example we do not need a path like a,b, a key would be enough. But since books is a valid path, there's no point in restricting ourselves to keys instead of paths.

Subclassing Func we can easily create a class ObjectAtPath which uses postgres' operator #> and returns the object at a given path:

from django.db.models.aggregates import Func


class ObjectAtPath(Func):

    function = '#>'
    template = "%(expressions)s%(function)s'{%(path)s}'"
    arity = 1

    def __init__(self, expression, path, **extra):
        # if path is a list, convert it to a comma separated string
        if isinstance(path, (list, tuple)):
            path = ','.join(path)
        super().__init__(expression, path=path, **extra)

Now let's try it with a simple query:

result = Author.objects \
    .filter(data__isnull=False) \
    .order_by('pk') \
    .annotate(books=ObjectAtPath('data', 'books')) \
    .values_list('books', flat=True)

assert list(result) == [
     None,
     [],
     ['One', 'Two', 'Three'],
     ['Two', 'Four'],
]

We successfully got rid of all of data's elements we're not interested in. This is closer to the result we wanted in the first place, but we still have to merge the lists and take care of uniqueness in python.

JsonbArrayElements

We cannot use distinct, because instead of one list with all the books we have a list of lists (and Nones). Postgres gives us jsonb_array_elements (since 9.4) to solve this:

class JsonbArrayElements(Func):

    function = 'jsonb_array_elements'
    template = '%(function)s(%(expressions)s)'
    arity = 1

Now we have a list of books and can use distinct:

result = Author.objects \
    .filter(data__isnull=False) \
    .order_by('pk') \
    .annotate(
        books=JsonbArrayElements(ObjectAtPath('data', 'books')),
    ) \
    .values_list('books', flat=True)

assert list(result) == ['One', 'Two', 'Three', 'Two', 'Four']

result = Author.objects \
    .filter(data__isnull=False) \
    .annotate(
        books=JsonbArrayElements(ObjectAtPath('data', 'books')),
    ) \
    .order_by('books') \
    .distinct('books') \
    .values_list('books', flat=True)

assert list(result) == ['Four', 'One', 'Three', 'Two']

In the end we delegated all the work to the database. Keep in mind that this is not always what performs best, but now there's a choice.

Tell us what you think about this. Is something unclear? Do you have questions or ideas? Leave your comments below.

Comments
comments powered by Disqus

  • « Unescaped UTF-8 in Django's admin with JSONField
  • git - Prepare commit msg »

Published

Nov 14, 2016

Category

Techstack

Tags

  • django 3
  • postgres 1
  • python 8
  • AX Semantics Developer Blog - Technical stuff and learnings we had while developing the AX environment
  • Powered by Pelican. Theme: Elegant by Talha Mansoor