
django-mysql's Introduction

Django-MySQL

The dolphin-pony - proof that cute + cute = double cute.

Django-MySQL extends Django's built-in MySQL and MariaDB support with their specific features not available on other databases.

What kind of features?

Includes:

  • QuerySet extensions:
    • 'Smart' iteration - chunked pagination across a large queryset
    • approx_count for quick estimates of count()
    • Query hints
    • Quick pt-visual-explain of the underlying query
  • Model fields:
    • MariaDB Dynamic Columns for storing dictionaries
    • Comma-separated fields for storing lists and sets
    • 'Missing' fields: differently sized BinaryField/TextField classes, BooleanFields represented by BIT(1)
  • ORM expressions for over 20 MySQL-specific functions
  • A new cache backend that makes use of MySQL's upsert statement and does compression
  • Status variable inspection and utility methods
  • Named locks for easy locking of e.g. external resources
  • Table lock manager for hard-to-pull-off data migrations

To see them all, check out the exposition at https://django-mysql.readthedocs.io/en/latest/exposition.html .

Requirements and Installation

Please see https://django-mysql.readthedocs.io/en/latest/installation.html .

Documentation

Every detail is documented on Read the Docs.

django-mysql's People

Contributors

adamchainz, angusholder, arnau126, avelis, conormcgee, dependabot[bot], flanaman, glogiotatidis, graingert, hjaarnio, hyzyla, jheld, kjam, le717, lwiecek, maikroeder, marco-santamaria, michaelaquilina, naitreey, pre-commit-ci[bot], searchdream, shanx, sourcery-ai[bot], timgates42, tuky, yunera, zhaoqi99


django-mysql's Issues

MariaDB Dynamic Columns

A rival to HStore! MariaDB Dynamic Columns seem great. Full feature support in Django would require:

  1. Doing the blob [de]serializing Python-side (see the raw-cursor sketch after this list). There seems to be no way to easily make the Django ORM always apply a function to a field when selecting it (i.e. forcing SELECT COLUMN_JSON(field_name) on queries), and that approach is anyway a bit less scalable, since it makes the DB unpack the columns every time. There is some C code on SourceForge for doing this, but making it accessible in Python would be some work (a separate library, I think!) - http://karlssonondatabases.blogspot.co.uk/2013/10/mariadb-dynamic-columns-client-api.html
  2. Special F objects for updating, e.g. Model.objects.filter(pk=1).update(attrs=DynF('attrs', color='blue'))
  3. Lookups, e.g. Model.objects.filter(attrs__dyn_color='blue') (n.b. how does django.contrib.postgres do dynamic-column-name lookups?)
  4. A special F or function object for updating absolutely (could be a dupe of 2?), e.g. Model.objects.filter(pk=1).update(attrs=DynC(j=1))
  5. Maybe specify base_field types for deserializing particular declared columns? Or is this just adding a schema to schemaless data?
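
For point 1, a minimal raw-cursor sketch of the client-side approach (the myapp_model table and attrs column are hypothetical; COLUMN_JSON is the real MariaDB function):

import json
from django.db import connection

# Ask MariaDB to render the dynamic-columns blob as JSON, then
# deserialize it Python-side.
with connection.cursor() as cursor:
    cursor.execute('SELECT COLUMN_JSON(attrs) FROM myapp_model WHERE id = %s', [1])
    attrs = json.loads(cursor.fetchone()[0])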

"Vertical partitioning" of tables

When a core model gets big, it may have large fields on it such as TextFields, which aren't needed the majority of the time. It would be good to support vertical partitioning easily. This isn't really a MySQL-specific task, so maybe make a separate package.

Migrations via pt-online-schema-change

It may be possible, at least in some cases, to convert the SQL output of Django migrations into a series of pt-online-schema-change commands, plus a mysql command to INSERT into the migration history, in a script.

CAST

We could have CAST expression(s) in two forms, to help solve problems for people with data modelling issues:

  1. As a transform, e.g. Model.objects.filter(x__cast_integer__gte=1).order_by('y__cast_integer').
  2. As a Func, e.g. Model.objects.filter(x=Cast('y', 'bigint'))

Example tutorial and real-world use case: http://www.electrictoolbox.com/mysql-order-string-as-int/
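
As a sketch of form 2 on a recent Django, a Func subclass wrapping CAST is only a few lines (CastToInteger is a hypothetical name; SIGNED is MySQL's spelling of an integer type in CAST expressions):

from django.db.models import Func, IntegerField

class CastToInteger(Func):
    # Renders CAST(x AS SIGNED).
    function = 'CAST'
    template = '%(function)s(%(expressions)s AS SIGNED)'
    output_field = IntegerField()

# Order a string column numerically, as in the linked tutorial.
Model.objects.order_by(CastToInteger('y'))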

GROUP_CONCAT

MySQL's GROUP_CONCAT would be great to support for aggregation. By passing, or even autodetecting, the base field object, the concatenation could be automatically cast back into a list of values of the base type.

GroupConcat deserialize to lists or sets

It's kinda annoying that GroupConcat doesn't automatically split back into lists (or maybe sets when distinct=True) on the python side - 90% of the use cases require the user to do this manually with str.split on the result before being able to use this code.

I've looked into using the list and set fields from #10 and #11 but it's a bit complicated - they don't support custom separators, the datatypes aren't so neat to convert, and it requires a bit of trickery. It might still be possible; otherwise we might have to just write custom code or some dynamic field type to achieve this, so that it ties in with the rest of the ORM.
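
In the meantime, the manual splitting looks something like this (GroupConcat is the existing django-mysql aggregate; the Author model and books relation are hypothetical):

from django_mysql.models import GroupConcat

qs = Author.objects.annotate(book_names=GroupConcat('books__name'))
for author in qs:
    # GroupConcat returns a comma-separated string, so split it by hand.
    names = author.book_names.split(',')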

Migration Operations - Change Storage Engine

MySQL's storage engines are underrepresented in Django. A migration operation to change the storage engine from X to Y would be helpful for e.g. changing a table's transaction support (MyISAM <-> InnoDB), or moving to MariaDB (InnoDB -> XtraDB).

Maybe storage engines can be grafted onto Model.Meta even?
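
Until a dedicated operation exists, a RunSQL-based sketch does the job (the table name myapp_mymodel is hypothetical):

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('myapp', '0001_initial')]
    operations = [
        # Forwards to InnoDB, backwards to MyISAM.
        migrations.RunSQL(
            'ALTER TABLE myapp_mymodel ENGINE=InnoDB',
            reverse_sql='ALTER TABLE myapp_mymodel ENGINE=MyISAM',
        ),
    ]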

DELETE RETURNING

MariaDB has it, and as mentioned in #42 it can be emulated with user variables and a double query in vanilla MySQL - so why not?
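
On MariaDB it is already usable with a raw cursor - a sketch, assuming a hypothetical myapp_task table:

from django.db import connection

# MariaDB (10.0.5+) supports DELETE ... RETURNING directly.
with connection.cursor() as cursor:
    cursor.execute('DELETE FROM myapp_task WHERE done = 1 RETURNING id, name')
    deleted_rows = cursor.fetchall()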

Support HandlerSocket

Follow-up to #9 - we can make it faster by supporting HandlerSocket as well, which is available for MySQL and distributed with MariaDB. The same ORM-based API may even be possible.

There are a few Python HandlerSocket libraries out there, but integrating them with Django could be made easier.

ORDER BY FIELD(name, val, val, val...)

As suggested by @danielroseman.

As of Django 1.8, order_by can take expressions - https://docs.djangoproject.com/en/1.8/ref/models/querysets/#order-by , thus the FIELD function ( https://mariadb.com/kb/en/mariadb/field/ ) should be implemented, which can be used for getting results ordered in a specific fashion. This tutorial covers it: http://www.electrictoolbox.com/mysql-order-specific-field-values/ .

It may also be useful to supply a shortcut that puts the unspecified values at the end, rather than the beginning, as noted in the second half of the tutorial. It could maybe be done without the double ordering forwards and backwards by using an IF, i.e. something like IF(@tmp:=FIELD(x, val1, val2) > 0, @tmp, 99999999999). Might be faster that way.
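
A minimal sketch of the expression form (FieldFunc and the Shirt model are hypothetical names; FIELD returns 0 for unmatched values, which is why they sort first by default):

from django.db.models import Func, Value

class FieldFunc(Func):
    function = 'FIELD'

# Small shirts first, then medium, then large; sizes not listed sort first.
Shirt.objects.order_by(FieldFunc('size', Value('small'), Value('medium'), Value('large')))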

Singleton global/session status objects

It's a bit un-Django-like to have to instantiate a class to do a simple status lookup. It would be more natural to have a dict-like interface, akin to the connection wrapper, for getting the status, with a lazy singleton wrapper of each type for your main DB connection.

from django_mysql.status import global_status
global_status.dict()

Support handler API

In my benchmarks, HANDLER can perform about twice as fast for reading tons of rows (10k random pk lookups). Direct support on Django models could be useful in some situations. Imagined API:

with Author.objects.handler('pk') as handler:
    handler.read(5)

XMLField

If feeling particularly sadistic, an XMLField could be implemented. MySQL has two functions - ExtractValue and UpdateXML - that could be used for ORM lookups/updates.

approx_count can go further

The approx_count function could be used to count QuerySets beyond the basic full-table ones it accepts at present. For example, you can get a reliable(-ish) approximate count on an index range, such as id BETWEEN 1 AND 500 (where id is the primary key). This would make it much more useful.
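
For reference, current usage per the docs, with the proposed extension sketched as a comment:

# Works today: an unfiltered queryset only.
count = Author.objects.all().approx_count(fall_back=True, min_size=1000)

# The proposal: allow filtered, index-range querysets too, e.g.
# Author.objects.filter(id__range=(1, 500)).approx_count()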

MariaDB virtual columns

It may be useful to be able to define Django fields which are backed by virtual columns. They could be read-only but would allow you to control their schema in Django (add db_index, filter by them). I imagine you'd have to specify their functional SQL fragment.

Query Killer

It would be useful to have a function for killing queries that take longer than a specified threshold. I've written it a couple of times already in non-open codebases... It would also be great to expose it as a celery task that one can just pop into CELERY_TASK_SCHEDULE and feel (a bit) safer.

Would use #20
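
A minimal sketch of the killer itself, using the information_schema view (the function name and threshold handling are hypothetical):

from django.db import connection

def kill_long_running_selects(max_seconds):
    # Find SELECTs running longer than the threshold, then KILL each one.
    with connection.cursor() as cursor:
        cursor.execute(
            """SELECT id FROM information_schema.processlist
               WHERE command = 'Query' AND time > %s AND info LIKE 'SELECT%%'""",
            [max_seconds],
        )
        for (process_id,) in cursor.fetchall():
            cursor.execute('KILL %s', [process_id])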

Django memcached backend works with MySQL memcached plugin

It should be checked that the Django memcached backend works with the MySQL memcached plugin, with any missing tools provided. For example, the Django database cache backend has a createcachetable command; it may be useful to provide such a command specialized for memcached, plus migration operations to make sure the plugin works with it.

Audit command

There are a number of best practices / common mistakes I've seen with Django and MySQL. A check command could be useful for inspecting the codebase and schema, or even automatically running tools such as pt-duplicate-key-checker for you, which would make running Django on MySQL more pleasant.

Examples:

  • tables where index length > data length (see the sketch after this list)
  • duplicate indexes (pt-duplicate-key-checker)
  • warnings about too many indexes
  • ... rack brains
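
For instance, the first check boils down to a single information_schema query - a sketch (the function name is hypothetical):

from django.db import connection

def tables_with_more_index_than_data(schema):
    # Tables whose indexes take more space than the row data itself.
    with connection.cursor() as cursor:
        cursor.execute(
            """SELECT table_name FROM information_schema.tables
               WHERE table_schema = %s AND index_length > data_length""",
            [schema],
        )
        return [name for (name,) in cursor.fetchall()]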

Handler API could be nicer

The HANDLER API as introduced in #9 is good for reading and iterating, but basic primary-key lookups could be made nicer with something like:

handler.get(pk=5)

Also, maybe read could use any passed kwargs to construct its WHERE clause, so you could simply write:

handler.iter(age__gte=18)

Rather than the awkward pass-in where argument we have now (which may still be needed for more complex queries).

Query re-writing API for hints

Somewhat depends on whether https://code.djangoproject.com/ticket/24638 is accepted, but I should still consider open-sourcing the query-rewriting API I have for adding hints to queries via special WHERE 1/* comment */ clauses. Should support straight join, caching, force index, ... for SELECTs.

N.B. because this requires a monkey-patch on the MySQL CursorWrapper, it should be opt-in via settings (there's a slight overhead to check queries for the rewrite-clauses).

Fix name consistency

In all of Django's multiple-database code, the user says which database to use via an argument named using, whilst BaseStatus and Lock in django-mysql currently use the lengthy connection_name. To reduce cognitive load, renaming them to using would be great.

Also, they don't have explicit multiple-database tests, but they should.

Set Field F objects

Extension to #10.

Set fields could have atomic updates to add or remove members at the database level via an expression similar to the F object.

Example SQL (imagine 'a,b' is a column):

SELECT IF(
    FIND_IN_SET('c', 'a,b'),
    'a,b',
    CONCAT_WS(',', 'a,b', 'c')
);

Make a faster DatabaseCache

Django's DatabaseCache is a bit slow:

  1. On every set, it does a cull check via SELECT COUNT(*), which means a full table scan
  2. It uses longtext with base64-encoded data rather than blobs
  3. It uses a couple queries per set/add, rather than the single one that MySQL can do with INSERT... ON DUPLICATE KEY UPDATE ...

... so make a faster one!
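
Point 3's single-query upsert would look like this (table and column names are hypothetical):

from django.db import connection

def cache_set(key, value, expires):
    # One round trip: insert, or update in place if the key already exists.
    with connection.cursor() as cursor:
        cursor.execute(
            """INSERT INTO my_cache_table (cache_key, value, expires)
               VALUES (%s, %s, %s)
               ON DUPLICATE KEY UPDATE
                   value = VALUES(value), expires = VALUES(expires)""",
            [key, value, expires],
        )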

ProcessList API

Just as GlobalStatus and Lock implement easy Pythonic access to the data from their corresponding SQL commands, a ProcessList object could be useful for viewing all running queries, and killing some.

Could be useful in implementing #19.

List fields

Similar to #10, MySQL's FIND_IN_SET could be used to implement model fields that serialize to a list, as a poor man's parallel of the postgres array fields that django.contrib.postgres has.

Set fields

MySQL has FIND_IN_SET. This could be used to implement model fields that serialize to a set, as a poor man's parallel of the postgres array fields that django.contrib.postgres has.
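
Until a proper field exists, the query side can be sketched with extra() (the Post model is hypothetical):

from django.db import models

class Post(models.Model):
    # A 'set' stored as a comma-separated string, e.g. 'django,mysql'.
    tags = models.CharField(max_length=255)

# FIND_IN_SET returns the 1-based position of the member, or 0 if absent.
posts = Post.objects.extra(where=['FIND_IN_SET(%s, tags)'], params=['django'])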

Statsd integration

Everyone needs monitoring. A best-practice function/manage command/celery task for monitoring MySQL variables with statsd could be provided.

Django 1.7.7 Test

Whenever I try to run the Django tests (on Windows) I see this error:

ImportError: No module named aggregation_regress.tests

It occurs from this code snippet in the sqlserver_ado module:

    def create_test_db(self, *args, **kwargs):
        self.mark_tests_as_expected_failure(self.connection.features.failing_tests)
        super(DatabaseCreation, self).create_test_db(*args, **kwargs)

Why does the sqlserver_ado module import something that is not within Django?

https://github.com/django/django/tree/master/tests/aggregation_regress

Full stack:

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_manage.py", line 129, in <module>
    utility.execute()
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_manage.py", line 104, in execute
    PycharmTestCommand().run_from_argv(self.argv)
  File "project-path\core_packages\django\core\management\commands\test.py", line 50, in run_from_argv
    super(Command, self).run_from_argv(argv)
  File "project-path\core_packages\django\core\management\base.py", line 288, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "project-path\core_packages\django\core\management\commands\test.py", line 71, in execute
    super(Command, self).execute(*args, **options)
  File "project-path\core_packages\django\core\management\base.py", line 338, in execute
    output = self.handle(*args, **options)
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_manage.py", line 91, in handle
    failures = TestRunner(test_labels, verbosity=verbosity, interactive=interactive, failfast=failfast, keepdb='--keepdb' in sys.argv)
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_runner.py", line 232, in run_tests
    extra_tests=extra_tests, **options)
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_runner.py", line 132, in run_tests
    return super(DjangoTeamcityTestRunner, self).run_tests(test_labels, extra_tests, **kwargs)
  File "project-path\core_packages\django\test\runner.py", line 147, in run_tests
    old_config = self.setup_databases()
  File "project-path\core_packages\django\test\runner.py", line 109, in setup_databases
    return setup_databases(self.verbosity, self.interactive, **kwargs)
  File "project-path\core_packages\django\test\runner.py", line 299, in setup_databases
    serialize=connection.settings_dict.get("TEST", {}).get("SERIALIZE", True),
  File "project-path\core_packages\sqlserver_ado\creation.py", line 126, in create_test_db
    self.mark_tests_as_expected_failure(self.connection.features.failing_tests)
  File "project-path\core_packages\sqlserver_ado\creation.py", line 120, in mark_tests_as_expected_failure
    test_case = import_string(test_case_name)
  File "project-path\core_packages\django\utils\module_loading.py", line 26, in import_string
    module = import_module(module_path)
  File "C:\Python27\Lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named aggregation_regress.tests

List Field F objects

Parallel of #37, extension to #11.

List Fields too could have atomic database updates via similar SQL constructs.

Smart Iteration Algorithms

The algorithm used by SmartChunkedIterator is currently restricted to the one from pt-online-schema-change, with filtering added. This works badly for really sparse distributions of results.

There are other algorithms we could use, in fact listed in the pt-table-sync documentation. For example, the current algorithm is (approximately) there as 'chunk'. The 'nibble' strategy would work quite well for sparse queries - it's the same as pt-archiver, using LIMIT to determine how many rows get fetched.

An algorithm argument could be added to SmartChunkedIterator - maybe even defaulting to 'auto', with some heuristic to pick between the algorithms.

Fix the terrible functions examples

The functions examples in the docs are all terribly non-specific and just use annotate. Django's manual has much clearer examples, which also have some kind of rationale behind them. Let's do that.

Migration Operations - FK underscore prefixing

When running a migration with pt-online-schema-change, due to MySQL uniqueness limitations, every FK constraint ends up with an extra _ prefixed. When writing migrations to apply through Django migrations on dev/staging machines and pt-online-schema-change in production, it would be useful to have such an operation so that the migrations not run online remain in parity.

Small / medium / large fields

Django's fields all prescribe one size for things, but maybe sometimes you want a TINYTEXT or a TINYBLOB to save some bytes. Also maybe you have a legacy application that uses these that you want to accurately represent in djangoland.

These are easy subclasses of the built-in fields with just the db_type changed, but there should be some place to import them from (django_mysql.fields !) rather than be forced to do the subclassing yourself in your app. Making a field class isn't as trivial as it could be.
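
The subclassing really is tiny - a sketch for one of them:

from django.db import models

class TinyTextField(models.TextField):
    # Identical to TextField except for the column type it creates.
    def db_type(self, connection):
        return 'tinytext'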

iter_smart and finding the min id on large tables

Sometimes, when trying to find the first ID to start at on a large table, the query ends up taking terribly long and blocking the database, especially when nothing matches the given queryset. This could be done more intelligently, e.g. using the same slicing strategy to scan up the table. Unfortunately there is no example code in pt-online-schema-change to copy, since it does not have this problem - it always copies all the rows and has no min/max IDs to worry about.
