
schemaless

Disclaimer: this is alpha-quality code, and the API is not yet stable.

Introduction

Schemaless is a Python module that implements the pattern described by Bret Taylor in his post "How FriendFeed uses MySQL to store schema-less data". There are a couple of other Python modules out there that do this already. Here's how schemaless is different:

  • Only MySQL is supported. That said, I'd love to add SQLite support in the future.
  • Sharding isn't supported yet, though it should be pretty straightforward to implement.
  • There's an optional "ORM" (which isn't really relational) implemented as schemaless.orm. The "ORM" really is optional: the interface described by FriendFeed is fully usable on its own and is decoupled from the session/object layer.
  • The ORM is designed to be mostly declarative and easy to use. That means that you can say, "I have a document type User, and please can I have an index on (user_id), and I'd also like an index on (first_name, last_name) please." The ORM will then create the necessary index tables and automatically update them when you add new users; given an arbitrary query, it will also know how to pick the most specific index.
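The last bullet says the ORM picks the most specific index for an arbitrary query. Here's a minimal sketch of one way that choice could work for equality-only queries. It assumes an index is usable when the query constrains a leading prefix of its columns (as with a composite MySQL key) and that "most specific" means the longest such prefix. pick_index is a hypothetical helper, not part of schemaless, and the real ORM may use different rules.

```python
# Hypothetical sketch of "pick the most specific index" for equality queries.
# This is NOT schemaless's actual algorithm; pick_index and its scoring rule
# are assumptions made for illustration.


def pick_index(indexes, query_columns):
    """Return the name of the index whose leading columns cover the most
    query columns, or None if no index can be used.

    indexes: dict mapping index name -> list of column names, in key order
    query_columns: set of column names constrained by equality
    """
    best_name, best_score = None, 0
    for name, columns in indexes.items():
        # A composite key can only be searched on a prefix of its columns,
        # so count how many leading columns the query constrains.
        score = 0
        for col in columns:
            if col not in query_columns:
                break
            score += 1
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# The same indexes as in the MySQL schema used by the Basic Usage example.
indexes = {
    'index_user_id': ['user_id'],
    'index_user_name': ['first_name', 'last_name'],
    'index_foo': ['bar'],
}

print(pick_index(indexes, {'first_name', 'last_name'}))  # index_user_name
print(pick_index(indexes, {'age'}))                      # None
```

When several indexes apply, the longer matched prefix wins, so a query on user_id, first_name and last_name would pick index_user_name here.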

Basic Usage

The code exported under the schemaless module exactly mimics the behavior and interface described by FriendFeed.

Example

Consider the following MySQL database schema:

CREATE TABLE entities (
    added_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    id BINARY(16) NOT NULL,
    updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    body MEDIUMBLOB,
    UNIQUE KEY (id),
    KEY (updated)
) ENGINE=InnoDB;

CREATE TABLE index_user_id (
    entity_id BINARY(16) NOT NULL UNIQUE,
    user_id CHAR(32) NOT NULL,
    PRIMARY KEY (user_id, entity_id)
) ENGINE=InnoDB;

CREATE TABLE index_user_name (
    entity_id BINARY(16) NOT NULL UNIQUE,
    first_name VARCHAR(255) NOT NULL,
    last_name VARCHAR(255) NOT NULL,
    PRIMARY KEY (first_name, last_name, entity_id)
) ENGINE=InnoDB;

CREATE TABLE index_foo (
    entity_id BINARY(16) NOT NULL UNIQUE,
    bar INTEGER NOT NULL,
    PRIMARY KEY (bar, entity_id)
) ENGINE=InnoDB;

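The meaning of all of these tables should be clear if you've read Bret's blog post: entities stores each document as an opaque body blob keyed by a 16-byte id, and each index_* table maps the values of one or more properties to the ids of the entities that have them, so a query reads an index table first and then fetches the matching rows from entities. The following code is a simple example of the interface that Schemaless provides: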
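To make the write and read paths concrete, here's a minimal sketch of the same layout using Python's built-in sqlite3 module instead of MySQL, so it runs anywhere without a server. put() and query_by_user_id() are hypothetical helpers, not schemaless's API, and serializing the body with pickle and zlib is a choice made for this sketch.

```python
# Hypothetical sketch of the entities + index table pattern, using sqlite3
# in place of MySQL. put() and query_by_user_id() are illustrative helpers,
# not part of schemaless.
import pickle
import sqlite3
import uuid
import zlib

conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE entities (
    added_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id BLOB NOT NULL UNIQUE,
    updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    body BLOB
);
CREATE TABLE index_user_id (
    entity_id BLOB NOT NULL UNIQUE,
    user_id CHAR(32) NOT NULL,
    PRIMARY KEY (user_id, entity_id)
);
''')


def put(entity):
    """Store the whole dict as an opaque blob, then add an index row
    only if the entity has the indexed property."""
    entity_id = uuid.uuid4().bytes  # 16 bytes, like BINARY(16)
    body = zlib.compress(pickle.dumps(entity))
    with conn:  # one transaction for the entity and its index row
        conn.execute('INSERT INTO entities (id, body) VALUES (?, ?)',
                     (entity_id, body))
        if 'user_id' in entity:
            conn.execute('INSERT INTO index_user_id (entity_id, user_id) '
                         'VALUES (?, ?)', (entity_id, entity['user_id']))
    return entity_id


def query_by_user_id(user_id):
    """Find matching entity ids in the index table, then load and
    deserialize the bodies from the entities table."""
    rows = conn.execute(
        'SELECT e.body FROM index_user_id i '
        'JOIN entities e ON e.id = i.entity_id WHERE i.user_id = ?',
        (user_id,))
    return [pickle.loads(zlib.decompress(body)) for (body,) in rows]


uid = uuid.uuid4().hex  # 32 hex characters, fits CHAR(32)
put({'first_name': 'evan', 'last_name': 'klitzke', 'user_id': uid})
put({'bar': 42})  # no user_id key, so no index_user_id row is written
print(query_by_user_id(uid))
```

Because the body is opaque, adding a new property to documents never requires an ALTER TABLE; only the index tables have fixed columns.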

import schemaless
from schemaless import c

ds = schemaless.DataStore(mysql_shards=['localhost:3306'], user='foo', password='foo', database='foo')

# declare which indexes are available
user = ds.define_index('index_user_id', ['user_id'])
user_name = ds.define_index('index_user_name', ['first_name', 'last_name'])
foo = ds.define_index('index_foo', ['bar'])

# put() automatically knows that index entries should be created in
# index_user_id and index_user_name, based on the keys in the row given
# (no index_foo entry is written, since the row has no 'bar' key)
row = ds.put({'first_name': 'evan', 'last_name': 'klitzke', 'user_id': schemaless.guid()})

# query based on user_id, using the index defined by 'index_user_id'
print(user.query(c.user_id == row.user_id))

# query based on first/last name, using the index defined by 'index_user_name'
print(user_name.query(c.first_name == 'evan', c.last_name == 'klitzke'))
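The queries above build their restrictions with expressions like c.user_id == row.user_id. Here's a minimal sketch of how an object like c could turn such an expression into a condition object instead of a plain bool, by overloading Python's comparison operators. Condition, Column and ColumnFactory are hypothetical names, and this is not schemaless's actual implementation.

```python
# Hypothetical sketch of a `c`-style condition builder. Condition, Column
# and ColumnFactory are illustrative names, not schemaless's classes.
import operator


class Condition(object):
    """A single restriction: column <op> value."""

    def __init__(self, column, op, value):
        self.column, self.op, self.value = column, op, value

    def matches(self, row):
        """Evaluate against a plain dict; a missing key never matches."""
        if self.column not in row:
            return False
        return self.op(row[self.column], self.value)

    def __repr__(self):
        return 'Condition(%r, %s, %r)' % (self.column, self.op.__name__, self.value)


class Column(object):
    """Overloads comparison operators so that they build Conditions."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        return Condition(self.name, operator.eq, value)

    def __ne__(self, value):
        return Condition(self.name, operator.ne, value)

    def __lt__(self, value):
        return Condition(self.name, operator.lt, value)

    def __gt__(self, value):
        return Condition(self.name, operator.gt, value)


class ColumnFactory(object):
    """Any attribute access, e.g. c.first_name, returns a Column of that name."""

    def __getattr__(self, name):
        return Column(name)


c = ColumnFactory()

row = {'first_name': 'evan', 'last_name': 'klitzke'}
conditions = [c.first_name == 'evan', c.last_name == 'klitzke']
print(conditions[0])                                  # Condition('first_name', eq, 'evan')
print(all(cond.matches(row) for cond in conditions))  # True
```

Each Condition keeps the column name, which is what an index-picking layer would need in order to decide which index table to search.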

ORM Layer

There's an optional ORM layer, exported via the module schemaless.orm. When you use the ORM layer, you declare your indexes up front, and Schemaless automatically picks the correct index based on your query. The ORM layer also knows how to run queries when no index fully covers them (e.g. if you add a query restriction on a column that isn't part of the chosen index).
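Here's a minimal sketch of one way a query that an index only partly covers could be answered: use the index for the columns it covers, then check the remaining restrictions in Python against the loaded documents. Whether schemaless.orm works exactly like this is an assumption; query_with_partial_index and the city property are hypothetical, and plain lists and dicts stand in for the index and entities tables.

```python
# Hypothetical sketch: narrow candidates with an index, then filter the
# leftover restrictions in Python. Not schemaless.orm's actual code.


def query_with_partial_index(index_rows, entities, index_columns, restrictions):
    """
    index_rows:    list of dicts, each holding the index columns plus 'entity_id'
    entities:      dict mapping entity_id -> full entity dict
    index_columns: columns covered by the chosen index
    restrictions:  dict of column -> required value (equality only)
    """
    covered = {k: v for k, v in restrictions.items() if k in index_columns}
    leftover = {k: v for k, v in restrictions.items() if k not in index_columns}

    # Step 1: use only the indexed columns to find candidate entity ids.
    candidate_ids = [row['entity_id'] for row in index_rows
                     if all(row[k] == v for k, v in covered.items())]

    # Step 2: load each candidate and check what the index couldn't answer.
    results = []
    for entity_id in candidate_ids:
        entity = entities[entity_id]
        if all(entity.get(k) == v for k, v in leftover.items()):
            results.append(entity)
    return results


entities = {
    1: {'first_name': 'evan', 'last_name': 'klitzke', 'city': 'SF'},
    2: {'first_name': 'evan', 'last_name': 'klitzke', 'city': 'NYC'},
    3: {'first_name': 'bret', 'last_name': 'taylor', 'city': 'SF'},
}
# Stand-in for the rows of index_user_name: (first_name, last_name) -> entity_id
index_rows = [dict(first_name=e['first_name'], last_name=e['last_name'], entity_id=i)
              for i, e in entities.items()]

print(query_with_partial_index(
    index_rows, entities, ['first_name', 'last_name'],
    {'first_name': 'evan', 'last_name': 'klitzke', 'city': 'SF'}))
# [{'first_name': 'evan', 'last_name': 'klitzke', 'city': 'SF'}]
```

The trade-off is that step 2 loads every candidate document, so this stays cheap only when the indexed columns are selective.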

Example

The best way to get a feel for the ORM is to look at the example in examples/blog/main.py. This is the implementation of a trivial "blog" application that uses Schemaless and Tornado. It's only about a hundred lines of code, and shows how the different parts work together.

Adding Indexes

There's a class called IndexUpdater, exported by the schemaless module, that provides a basic template for batches that add, update, or prune indexes. The easiest way to understand how it works is to read its source in schemaless/batch.py; the module documentation there includes an example of a batch that adds a new index.
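As a rough illustration of what such a batch has to do, here's a minimal sketch that backfills a new index over existing entities, walking the entities table in added_id order so the job can resume from the last id it processed. It uses sqlite3 to stay self-contained; backfill_index, store and BATCH_SIZE are hypothetical, and this is not the IndexUpdater API.

```python
# Hypothetical sketch of an index backfill batch, using sqlite3 in place of
# MySQL. backfill_index, store and BATCH_SIZE are illustrative names; this
# is not schemaless's IndexUpdater.
import pickle
import sqlite3
import uuid
import zlib

BATCH_SIZE = 2  # tiny on purpose, so the demo runs several batches

conn = sqlite3.connect(':memory:')
conn.executescript('''
CREATE TABLE entities (
    added_id INTEGER PRIMARY KEY AUTOINCREMENT,
    id BLOB NOT NULL UNIQUE,
    body BLOB
);
CREATE TABLE index_foo (
    entity_id BLOB NOT NULL UNIQUE,
    bar INTEGER NOT NULL,
    PRIMARY KEY (bar, entity_id)
);
''')


def store(entity):
    conn.execute('INSERT INTO entities (id, body) VALUES (?, ?)',
                 (uuid.uuid4().bytes, zlib.compress(pickle.dumps(entity))))


def backfill_index(table, columns):
    """Add index rows for every existing entity that has all of `columns`.
    Table and column names are trusted here; never format untrusted input
    into SQL."""
    last_id = 0
    sql = 'INSERT OR IGNORE INTO %s (entity_id, %s) VALUES (?, %s)' % (
        table, ', '.join(columns), ', '.join('?' * len(columns)))
    while True:
        rows = conn.execute(
            'SELECT added_id, id, body FROM entities '
            'WHERE added_id > ? ORDER BY added_id LIMIT ?',
            (last_id, BATCH_SIZE)).fetchall()
        if not rows:
            break
        with conn:  # commit once per batch
            for added_id, entity_id, body in rows:
                entity = pickle.loads(zlib.decompress(body))
                if all(col in entity for col in columns):
                    # OR IGNORE makes re-running a batch harmless.
                    conn.execute(sql, [entity_id] + [entity[col] for col in columns])
        last_id = rows[-1][0]


for entity in [{'bar': 1}, {'name': 'no bar'}, {'bar': 2}, {'bar': 3}]:
    store(entity)
conn.commit()

backfill_index('index_foo', ['bar'])
print(conn.execute('SELECT bar FROM index_foo ORDER BY bar').fetchall())  # [(1,), (2,), (3,)]
```

Keeping each batch small and idempotent means the backfill can run against a live table and simply be restarted if it is interrupted.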

Contributors

eklitzke

Issues

Add optional columns

Add optional columns, so that columns can be added to schemas without causing ValueErrors to be raised when reading old rows.

An error I get

When I run this: python mysqlbench.py bench.yaml

I get this error:

Traceback (most recent call last):
  File "mysqlbench.py", line 180, in <module>
    main(opts, args)
  File "mysqlbench.py", line 113, in main
    conn = MySQLdb.connect(**cfg)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/__init__.py", line 81, in Connect
    return Connection(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 187, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
TypeError: connect() argument 3 must be string, not None
serdem@ubuntu:~/ff/examples/mysqlbench$

What am I doing wrong?

Thanks,
serdem

Add late binding to MySQL

I think this is the biggest feature I'm missing -- late binding to MySQL. This means the ability to declare a base document class and tables using that class, and then later on tell schemaless the parameters for the MySQL instance it should connect to.

Missing this feature is really problematic, because it means that the MySQL connection parameters need to be hardcoded into the source for applications (or you need to do gross hacks to declare models after config files have been loaded).

Releasing/tagging version 0.3 of schemaless is blocked on this feature.
