
django-mysql's Introduction

Django-MySQL

The dolphin-pony - proof that cute + cute = double cute.

Django-MySQL extends Django's built-in MySQL and MariaDB support with their specific features not available on other databases.

What kind of features?

Includes:

  • QuerySet extensions:
    • 'Smart' iteration - chunked pagination across a large queryset
    • approx_count for quick estimates of count()
    • Query hints
    • Quick pt-visual-explain of the underlying query
  • Model fields:
    • MariaDB Dynamic Columns for storing dictionaries
    • Comma-separated fields for storing lists and sets
    • 'Missing' fields: differently sized BinaryField/TextField classes, BooleanFields represented by BIT(1)
  • ORM expressions for over 20 MySQL-specific functions
  • A new cache backend that makes use of MySQL's upsert statement and does compression
  • Status variable inspection and utility methods
  • Named locks for easy locking of e.g. external resources
  • Table lock manager for hard-to-pull-off data migrations

To see them all, check out the exposition at https://django-mysql.readthedocs.io/en/latest/exposition.html .

Requirements and Installation

Please see https://django-mysql.readthedocs.io/en/latest/installation.html .

Documentation

Every detail is documented on Read the Docs.

django-mysql's People

Contributors

adamchainz, angusholder, arnau126, avelis, conormcgee, dependabot[bot], flanaman, glogiotatidis, graingert, hjaarnio, hyzyla, jheld, kjam, le717, lwiecek, maikroeder, marco-santamaria, michaelaquilina, naitreey, pre-commit-ci[bot], searchdream, shanx, sourcery-ai[bot], timgates42, tuky, yunera, zhaoqi99


django-mysql's Issues

MariaDB Dynamic Columns

A rival to HStore! MariaDB Dynamic Columns seem great. Full feature support in Django would require:

  1. Doing the blob [de]serializing Python-side (see the raw-cursor sketch after this list). There seems to be no way to easily make the Django ORM always apply a function to a field when selecting it (i.e. forcing SELECT COLUMN_JSON(field_name) on queries), and that approach is anyway a bit less scalable, since it makes the DB unpack the columns every time. There is some C code on SourceForge for doing this, but making it accessible in Python would be some work (a separate library, I think!) - http://karlssonondatabases.blogspot.co.uk/2013/10/mariadb-dynamic-columns-client-api.html
  2. Special F objects for updating, e.g. Model.objects.filter(pk=1).update(attrs=DynF('attrs', color='blue'))
  3. Lookups, e.g. Model.objects.filter(attrs__dyn_color='blue') (n.b. how does django.contrib.postgres do dynamic-column-name lookups?)
  4. A special F or function object for updating absolutely (could be a dupe of 2?), e.g. Model.objects.filter(pk=1).update(attrs=DynC(j=1))
  5. Maybe specify base_field types for deserializing particular declared columns? Or is this just adding a schema to schemaless data?
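
For point 1, a minimal raw-cursor sketch of the client-side approach (the myapp_model table and attrs column are hypothetical; COLUMN_JSON is the real MariaDB function):

import json
from django.db import connection

# Ask MariaDB to render the dynamic-columns blob as JSON, then
# deserialize it Python-side.
with connection.cursor() as cursor:
    cursor.execute('SELECT COLUMN_JSON(attrs) FROM myapp_model WHERE id = %s', [1])
    attrs = json.loads(cursor.fetchone()[0])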

"Vertical partitioning" of tables

When a core model gets big, it may have large fields on it such as TextFields, which aren't needed the majority of the time. It would be good to support vertical partitioning easily. This isn't really a MySQL-specific task, so maybe make a separate package.

Migrations via pt-online-schema-change

It may be possible, at least in some cases, to convert the SQL output of Django migrations into a series of pt-online-schema-change commands, plus a mysql command to INSERT into the migration history, in a script.

CAST

We could have CAST expression(s) in two forms, to help solve problems for people with data modelling issues:

  1. As a transform, e.g. Model.objects.filter(x__cast_integer__gte=1).order_by('y__cast_integer').
  2. As a Func, e.g. Model.objects.filter(x=Cast('y', 'bigint'))

Example tutorial and real-world use case: http://www.electrictoolbox.com/mysql-order-string-as-int/
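
As a sketch of form 2 on a recent Django, a Func subclass wrapping CAST is only a few lines (CastToInteger is a hypothetical name; SIGNED is MySQL's spelling of an integer type in CAST expressions):

from django.db.models import Func, IntegerField

class CastToInteger(Func):
    # Renders CAST(x AS SIGNED).
    function = 'CAST'
    template = '%(function)s(%(expressions)s AS SIGNED)'
    output_field = IntegerField()

# Order a string column numerically, as in the linked tutorial.
Model.objects.order_by(CastToInteger('y'))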

GROUP_CONCAT

MySQL's GROUP_CONCAT would be great to support for aggregation. By passing, or even autodetecting, the base field object, the concatenation could be automatically cast back into a list of values of the base type.

GroupConcat deserialize to lists or sets

It's kinda annoying that GroupConcat doesn't automatically split back into lists (or maybe sets when distinct=True) on the python side - 90% of the use cases require the user to do this manually with str.split on the result before being able to use this code.

I've looked into using the list and set fields from #10 and #11 but it's a bit complicated - they don't support custom separators, the datatypes aren't so neat to convert, and it requires a bit of trickery. It might still be possible; otherwise we might have to just write custom code or some dynamic field type to achieve this, so that it ties in with the rest of the ORM.
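
In the meantime, the manual splitting looks something like this (GroupConcat is the existing django-mysql aggregate; the Author model and books relation are hypothetical):

from django_mysql.models import GroupConcat

qs = Author.objects.annotate(book_names=GroupConcat('books__name'))
for author in qs:
    # GroupConcat returns a comma-separated string, so split it by hand.
    names = author.book_names.split(',')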

Migration Operations - Change Storage Engine

MySQL's storage engines are underrepresented in Django. A migration operation to change the storage engine from X to Y would be helpful for e.g. changing a table's transaction support (MyISAM <-> InnoDB), or moving to MariaDB (InnoDB -> XtraDB).

Maybe storage engines can be grafted onto Model.Meta even?
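
Until a dedicated operation exists, a RunSQL-based sketch does the job (the table name myapp_mymodel is hypothetical):

from django.db import migrations

class Migration(migrations.Migration):
    dependencies = [('myapp', '0001_initial')]
    operations = [
        # Forwards to InnoDB, backwards to MyISAM.
        migrations.RunSQL(
            'ALTER TABLE myapp_mymodel ENGINE=InnoDB',
            reverse_sql='ALTER TABLE myapp_mymodel ENGINE=MyISAM',
        ),
    ]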

DELETE RETURNING

MariaDB has it, and as mentioned in #42 it can be emulated with user variables and a double query in vanilla MySQL - so why not?
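
On MariaDB it is already usable with a raw cursor - a sketch, assuming a hypothetical myapp_task table:

from django.db import connection

# MariaDB (10.0.5+) supports DELETE ... RETURNING directly.
with connection.cursor() as cursor:
    cursor.execute('DELETE FROM myapp_task WHERE done = 1 RETURNING id, name')
    deleted_rows = cursor.fetchall()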

Support HandlerSocket

Follow-up to #9 - we can make it faster by supporting HandlerSocket as well, which is available for MySQL and distributed with MariaDB. The same ORM-based API may even be possible.

There are a few Python HandlerSocket libraries out there, but integrating them with Django could be made easier.

ORDER BY FIELD(name, val, val, val...)

As suggested by @danielroseman.

As of Django 1.8, order_by can take expressions - https://docs.djangoproject.com/en/1.8/ref/models/querysets/#order-by , thus the FIELD function ( https://mariadb.com/kb/en/mariadb/field/ ) should be implemented, which can be used for getting results ordered in a specific fashion. This tutorial covers it: http://www.electrictoolbox.com/mysql-order-specific-field-values/ .

It may also be useful to supply a shortcut that puts the unspecified values at the end, rather than the beginning, as noted in the second half of the tutorial. It could maybe be done without the double ordering forwards and backwards by using an IF, i.e. something like IF(@tmp:=FIELD(x, val1, val2) > 0, @tmp, 99999999999). Might be faster that way.
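
A minimal sketch of the expression form (FieldFunc and the Shirt model are hypothetical names; FIELD returns 0 for unmatched values, which is why they sort first by default):

from django.db.models import Func, Value

class FieldFunc(Func):
    function = 'FIELD'

# Small shirts first, then medium, then large; sizes not listed sort first.
Shirt.objects.order_by(FieldFunc('size', Value('small'), Value('medium'), Value('large')))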

Singleton global/session status objects

It's a bit un-Django-like to have to instantiate a class to do a simple status lookup. It would be more natural to have a dict-like interface, akin to the connection wrapper, for getting the status, with a lazy singleton wrapper of each type for your main DB connection.

from django_mysql.status import global_status
global_status.dict()

Support handler API

In my benchmarks, HANDLER can perform about twice as fast for reading tons of rows (10k random pk lookups). Direct support on Django models could be useful in some situations. Imagined API:

with Author.objects.handler('pk') as handler:
    handler.read(5)

XMLField

If feeling particularly sadistic, an XMLField could be implemented. MySQL has two functions - ExtractValue and UpdateXML - that could be used for ORM lookups/updates.

approx_count can go further

The approx_count function could be used to count QuerySets beyond the basic full-table ones it accepts at present. For example, you can get a reliable(-ish) approximate count on an index range, such as id BETWEEN 1 AND 500 (where id is the primary key). This would make it much more useful.
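
For reference, current usage per the docs, with the proposed extension sketched as a comment:

# Works today: an unfiltered queryset only.
count = Author.objects.all().approx_count(fall_back=True, min_size=1000)

# The proposal: allow filtered, index-range querysets too, e.g.
# Author.objects.filter(id__range=(1, 500)).approx_count()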

MariaDB virtual columns

It may be useful to be able to define Django fields which are backed by virtual columns. They could be read-only but would allow you to control their schema in Django (add db_index, filter by them). I imagine you'd have to specify their functional SQL fragment.

Query Killer

It would be useful to have a function for killing queries that take longer than a specified threshold. I've written it a couple of times already in non-open codebases... It would also be great to expose it as a celery task that one can just pop into CELERY_TASK_SCHEDULE and feel (a bit) safer.

Would use #20
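
A minimal sketch of the killer itself, using the information_schema view (the function name and threshold handling are hypothetical):

from django.db import connection

def kill_long_running_selects(max_seconds):
    # Find SELECTs running longer than the threshold, then KILL each one.
    with connection.cursor() as cursor:
        cursor.execute(
            """SELECT id FROM information_schema.processlist
               WHERE command = 'Query' AND time > %s AND info LIKE 'SELECT%%'""",
            [max_seconds],
        )
        for (process_id,) in cursor.fetchall():
            cursor.execute('KILL %s', [process_id])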

Django memcached backend works with MySQL memcached plugin

It should be checked that the Django memcached backend works with the MySQL memcached plugin, with any missing tools provided. For example, the Django database cache backend has a createcachetable command; it may be useful to provide such a command specialized for memcached, plus migration operations to make sure the plugin works with it.

Audit command

There are a number of best practices / common mistakes I've seen with Django and MySQL. A check command could be useful for inspecting the codebase and schema, or even automatically running tools such as pt-duplicate-key-checker for you, which would make running Django on MySQL more pleasant.

Examples:

  • tables where index length > data length (see the sketch after this list)
  • duplicate indexes (pt-duplicate-key-checker)
  • warnings about too many indexes
  • ... rack brains
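
For instance, the first check boils down to a single information_schema query - a sketch (the function name is hypothetical):

from django.db import connection

def tables_with_more_index_than_data(schema):
    # Tables whose indexes take more space than the row data itself.
    with connection.cursor() as cursor:
        cursor.execute(
            """SELECT table_name FROM information_schema.tables
               WHERE table_schema = %s AND index_length > data_length""",
            [schema],
        )
        return [name for (name,) in cursor.fetchall()]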

Handler API could be nicer

The HANDLER API as introduced in #9 is good for reading and iterating, but basic primary-key lookups could be made nicer with something like:

handler.get(pk=5)

Also, maybe read could use any passed kwargs to construct its WHERE clause, so you could simply write:

handler.iter(age__gte=18)

Rather than the awkward pass-in where argument we have now (which may still be needed for more complex queries).

Query re-writing API for hints

Somewhat depends on whether https://code.djangoproject.com/ticket/24638 is accepted, but I should still consider open-sourcing the query-rewriting API I have for adding hints to queries via special WHERE 1/* comment */ clauses. Should support straight join, caching, force index, ... for SELECTs.

N.B. because this requires a monkey-patch on the MySQL CursorWrapper, it should be opt-in via settings (there's a slight overhead to check queries for the rewrite-clauses).

Fix name consistency

In all of Django's multiple-database code, the user says which database to use via an argument named using, whilst BaseStatus and Lock in django-mysql currently use the lengthy connection_name. To reduce cognitive load, renaming them to using would be great.

Also, they don't have explicit multiple-database tests, but they should.

Set Field F objects

Extension to #10.

Set fields could have atomic updates to add or remove members at the database level via an expression similar to the F object.

Example SQL (imagine 'a,b' is a column):

SELECT IF(
    FIND_IN_SET('c', 'a,b'),
    'a,b',
    CONCAT_WS(',', 'a,b', 'c')
);

Make a faster DatabaseCache

Django's DatabaseCache is a bit slow:

  1. On every set, it does a cull check via SELECT COUNT(*), which means a full table scan
  2. It uses longtext with base64-encoded data rather than blobs
  3. It uses a couple queries per set/add, rather than the single one that MySQL can do with INSERT... ON DUPLICATE KEY UPDATE ...

... so make a faster one!
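
Point 3's single-query upsert would look like this (table and column names are hypothetical):

from django.db import connection

def cache_set(key, value, expires):
    # One round trip: insert, or update in place if the key already exists.
    with connection.cursor() as cursor:
        cursor.execute(
            """INSERT INTO my_cache_table (cache_key, value, expires)
               VALUES (%s, %s, %s)
               ON DUPLICATE KEY UPDATE
                   value = VALUES(value), expires = VALUES(expires)""",
            [key, value, expires],
        )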

ProcessList API

Just as GlobalStatus and Lock implement easy Pythonic access to the data from their corresponding SQL commands, a ProcessList object could be useful for viewing all running queries, and killing some.

Could be useful in implementing #19.

List fields

Similar to #10, MySQL's FIND_IN_SET could be used to implement model fields that serialize to a list, as a poor man's parallel of the postgres array fields that django.contrib.postgres has.

Set fields

MySQL has FIND_IN_SET. This could be used to implement model fields that serialize to a set, as a poor man's parallel of the postgres array fields that django.contrib.postgres has.
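
Until a proper field exists, the query side can be sketched with extra() (the Post model is hypothetical):

from django.db import models

class Post(models.Model):
    # A 'set' stored as a comma-separated string, e.g. 'django,mysql'.
    tags = models.CharField(max_length=255)

# FIND_IN_SET returns the 1-based position of the member, or 0 if absent.
posts = Post.objects.extra(where=['FIND_IN_SET(%s, tags)'], params=['django'])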

Statsd integration

Everyone needs monitoring. A best-practice function/manage command/celery task for monitoring MySQL variables with statsd could be provided.

Django 1.7.7 Test

Whenever I try to run the Django tests (on Windows) I see this error:

ImportError: No module named aggregation_regress.tests

It occurs from this code snippet in the sqlserver_ado module:

    def create_test_db(self, *args, **kwargs):
        self.mark_tests_as_expected_failure(self.connection.features.failing_tests)
        super(DatabaseCreation, self).create_test_db(*args, **kwargs)

Why does the sqlserver_ado module import something that is not within Django?

https://github.com/django/django/tree/master/tests/aggregation_regress

Full stack:

Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_manage.py", line 129, in <module>
    utility.execute()
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_manage.py", line 104, in execute
    PycharmTestCommand().run_from_argv(self.argv)
  File "project-path\core_packages\django\core\management\commands\test.py", line 50, in run_from_argv
    super(Command, self).run_from_argv(argv)
  File "project-path\core_packages\django\core\management\base.py", line 288, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "project-path\core_packages\django\core\management\commands\test.py", line 71, in execute
    super(Command, self).execute(*args, **options)
  File "project-path\core_packages\django\core\management\base.py", line 338, in execute
    output = self.handle(*args, **options)
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_manage.py", line 91, in handle
    failures = TestRunner(test_labels, verbosity=verbosity, interactive=interactive, failfast=failfast, keepdb='--keepdb' in sys.argv)
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_runner.py", line 232, in run_tests
    extra_tests=extra_tests, **options)
  File "C:\Program Files (x86)\JetBrains\PyCharm 4.0.6\helpers\pycharm\django_test_runner.py", line 132, in run_tests
    return super(DjangoTeamcityTestRunner, self).run_tests(test_labels, extra_tests, **kwargs)
  File "project-path\core_packages\django\test\runner.py", line 147, in run_tests
    old_config = self.setup_databases()
  File "project-path\core_packages\django\test\runner.py", line 109, in setup_databases
    return setup_databases(self.verbosity, self.interactive, **kwargs)
  File "project-path\core_packages\django\test\runner.py", line 299, in setup_databases
    serialize=connection.settings_dict.get("TEST", {}).get("SERIALIZE", True),
  File "project-path\core_packages\sqlserver_ado\creation.py", line 126, in create_test_db
    self.mark_tests_as_expected_failure(self.connection.features.failing_tests)
  File "project-path\core_packages\sqlserver_ado\creation.py", line 120, in mark_tests_as_expected_failure
    test_case = import_string(test_case_name)
  File "project-path\core_packages\django\utils\module_loading.py", line 26, in import_string
    module = import_module(module_path)
  File "C:\Python27\Lib\importlib\__init__.py", line 37, in import_module
    __import__(name)
ImportError: No module named aggregation_regress.tests

List Field F objects

Parallel of #37, extension to #11.

List Fields too could have atomic database updates via similar SQL constructs.

Smart Iteration Algorithms

The algorithm used by SmartChunkedIterator is currently restricted to the one from pt-online-schema-change, with filtering added. This works badly for really sparse distributions of results.

There are other algorithms we could use, in fact listed in the pt-table-sync documentation. For example, the current algorithm is (approximately) there as 'chunk'. The 'nibble' strategy would work quite well for sparse queries - it's the same as pt-archiver, using LIMIT to determine how many rows get fetched.

An algorithm argument could be added to SmartChunkedIterator - maybe even defaulting to 'auto', with some heuristic to pick between the algorithms.

Fix the terrible functions examples

The functions examples in the docs are all terribly non-specific and just use annotate. Django's manual has much clearer examples, which also have some kind of rationale behind them. Let's do that.

Migration Operations - FK underscore prefixing

When running a migration with pt-online-schema-change, due to MySQL uniqueness limitations, every FK constraint ends up with an extra _ prefixed. When writing migrations to apply through Django migrations on dev/staging machines and pt-online-schema-change in production, it would be useful to have such an operation so that the migrations not run online remain in parity.

Small / medium / large fields

Django's fields all prescribe one size for things, but maybe sometimes you want a TINYTEXT or a TINYBLOB to save some bytes. Also maybe you have a legacy application that uses these that you want to accurately represent in djangoland.

These are easy subclasses of the built-in fields with just the db_type changed, but there should be some place to import them from (django_mysql.fields !) rather than be forced to do the subclassing yourself in your app. Making a field class isn't as trivial as it could be.
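
The subclassing really is tiny - a sketch for one of them:

from django.db import models

class TinyTextField(models.TextField):
    # Identical to TextField except for the column type it creates.
    def db_type(self, connection):
        return 'tinytext'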

iter_smart and finding the min id on large tables

Sometimes, when trying to find the first ID to start at on a large table, the query ends up taking terribly long and blocking the database, especially when nothing matches the given queryset. This could be done more intelligently, e.g. using the same slicing strategy to scan up the table. Unfortunately there is no example code in pt-online-schema-change to copy, since it does not have this problem - it always copies all the rows and has no min/max IDs to worry about.
