Git Product home page Git Product logo

django-ltree-demo's Introduction

How to store trees with Django & PostgreSQL

Rationale

If you ever had the need to store hierarchical data (trees) with Django, you probably had to use a library like django-mptt or django-treebeard. Those libraries work fine at a small scale, but here at PeopleDoc we have encountered a lot of issues when using them at a bigger scale (tables with hundreds of thousands of rows and quite a lot of writings).

It turns out that storing trees in a database has been a solved problem since a long time, at least with PostgreSQL. The ltree extension provides a convenient data structure which is very fast on reads, and with almost no impact on writes. The algorithm used is very close to django-treebeard's materialized paths, but with all the power of PostgreSQL.

The main downside of using ltree is that you have to maintain the materialized path yourself. It doesn't come with any tool to do it automatically. But fortunately, it's actually quite simple to maintain this path using PostgreSQL triggers!

Integration with Django

In demo/categories/ltree.py you will find a very simple Django field for the ltree data type. This field can be used in any Django model, and adds two lookups: descendant and ancestor. Those lookups allow you to query the descendants or the ancestors of any object with a very simple SQL query.

For example, let's say you have the following model:

from django.db import models

from project.ltree import LtreeField


class Category(models.Model):
  parent = models.ForeignKey('self', null=True)
  code = models.CharField(maxlength=32, unique=True)
  path = LtreeField()

The path field represents the path from the root to the node, where each node is represented by its code (it could also be its id, but using the code is more readable when debugging). For example, if you have a genetic category, under a science category, under a top category, its path would be top.science.category.

Thanks to the descendant and ancestor lookups, the get_descendants method in django-mptt can be rewritten as:

def get_descendants(self):
    return Category.objects.filter(path__descendant=self.path)

This would generate a SQL query close to:

SELECT * FROM category WHERE path <@ 'science.biology'

The magic part: PostgreSQL triggers

If you add a ltree field to your model, you will have to keep the field up-to-date when inserting or updating instances. We could do that with Django signals, but it turns out that PostgreSQL is far better for maintaining integrity & writing efficient code.

Every time we insert or update a row, we can reconstruct its path by appending its code to the path of its parent. If the path has changed, we'll also need to update the path of the children, which can be written as a simple UPDATE query.

All that can be done easily with PostgreSQL triggers. You can find an implementation of those triggers in the file demo/categories/sql/triggers.sql.

The demo

In the demo, the following files are the most important:

How to install the demo

  • Create & activate a virtualenv
  • Install the dependencies with pip install -r requirements.txt
  • Install PostgreSQL with your favorite way
  • Export the PGHOST and PGUSER variables accordingly
  • Create the django_ltree_demo table
  • Run python manage.py migrate
  • Launch the test with pytest -v

Conclusion

With a few lines a declarative, idiomatic Django code and ~50 lines of SQL we have implemented a fast and consistent solution for storing and querying trees.

Sometimes it's good to delegate complicated data manipulation to the database instead of doing everything in Python :) .

django-ltree-demo's People

Contributors

k4nar avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.