
python-spanner-orm's Introduction


Google Cloud Spanner ORM

This is a lightweight ORM written in Python and built on top of Cloud Spanner. This is not an officially supported Google product.

Getting started

How to install

Make sure that Python 3.8 or higher is the default version of Python in your environment, then run:

pip install git+https://github.com/google/python-spanner-orm#egg=spanner_orm

Connecting

To connect the Spanner ORM to an existing Spanner database:

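import spanner_orm
# Placeholder names: replace with your own instance and database.
instance_name, database_name = 'my-instance', 'my-database'
spanner_orm.from_connection(
    spanner_orm.SpannerConnection(instance_name, database_name))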

project and credentials are optional parameters; if they are not specified, the standard Spanner client library will attempt to infer them automatically. A session pool may also be specified with the pool parameter if necessary. Session pools are explained in the Spanner client library documentation; however, the TransactionPingingPool implementation in the standard Spanner client libraries does not seem to work, and the thread code associated with using the PingingPool also does not seem to do what is intended (ping the pool periodically).

Creating a model

In order to write to and read from a table on Spanner, you need to tell the ORM about the table by writing a model class, which looks something like this:

import spanner_orm

class TestModel(spanner_orm.Model):
  __table__ = 'TestTable'  # Name of table in Spanner
  __interleaved__ = None  # Name of table that the current table is interleaved
                          # into. None or omitted if the table is not interleaved

  # Every column in the table has a corresponding Field, where the first parameter
  # is the type of the field. The primary key is constructed from the fields
  # labeled with primary_key=True, in the order they appear in the class.
  # The name of the column is the same as the name of the class attribute.
  id = spanner_orm.Field(spanner_orm.String(), primary_key=True)
  value = spanner_orm.Field(spanner_orm.Integer(), nullable=True)
  number = spanner_orm.Field(spanner_orm.Float(), nullable=True)

  # Secondary indexes are specified in a similar manner to fields:
  value_index = spanner_orm.Index(['value'])

  # To indicate that there is a foreign key relationship from this table to
  # another one, use a ForeignKeyRelationship.
  foreign_key = spanner_orm.ForeignKeyRelationship(
    'OtherModel',
    {'referencing_key': 'referenced_key'})
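The rule that the primary key is made of the primary_key=True fields, in declaration order, works because a class body keeps its attributes in the order they were written. Here is a toy, self-contained sketch of that rule; ToyField, ToyOrderModel and primary_key_columns are hypothetical names for illustration, not spanner_orm's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToyField:
  """Hypothetical stand-in for spanner_orm.Field (illustration only)."""
  type_name: str
  primary_key: bool = False
  nullable: bool = False


def primary_key_columns(model_cls: type) -> List[str]:
  """Returns primary key column names in class-definition order.

  Class namespaces preserve attribute definition order (Python 3.6+), so
  iterating over vars() yields the fields in the order they were written.
  """
  return [
      name for name, attr in vars(model_cls).items()
      if isinstance(attr, ToyField) and attr.primary_key
  ]


class ToyOrderModel:
  # A composite key: customer_id comes first because it is declared first.
  customer_id = ToyField('STRING', primary_key=True)
  note = ToyField('STRING', nullable=True)
  order_id = ToyField('INT64', primary_key=True)


print(primary_key_columns(ToyOrderModel))  # ['customer_id', 'order_id']
```

Reordering the two key attributes in the class body would reorder the key, which is worth keeping in mind when editing a model that maps to an existing table.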

If the model does not refer to an existing table on Spanner, we can create the corresponding table in the database through the ORM in one of two ways. If the database has not been created yet, we can create both the database and the table at the same time:

admin_api = spanner_orm.connect_admin(
  'instance_name',
  'database_name',
  create_ddl=spanner_orm.model_creation_ddl(TestModel))
admin_api.create_database()
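To give a rough idea of the kind of statement model_creation_ddl() produces for create_database(), here is a toy CREATE TABLE generator. The type mapping (String to STRING(MAX), Integer to INT64, Float to FLOAT64) and the exact formatting are assumptions for illustration; the DDL spanner_orm actually emits may differ, and indexes and foreign keys are left out. create_table_ddl is a hypothetical helper.

```python
from typing import List, Tuple

# Assumed mapping from ORM field types to Spanner column types.
SPANNER_TYPES = {'String': 'STRING(MAX)', 'Integer': 'INT64', 'Float': 'FLOAT64'}


def create_table_ddl(table: str,
                     columns: List[Tuple[str, str, bool]],
                     primary_key: List[str]) -> str:
  """Builds a CREATE TABLE statement.

  Args:
    table: Spanner table name.
    columns: (name, ORM type name, nullable) tuples in declaration order.
    primary_key: Primary key column names, in key order.
  """
  column_defs = []
  for name, type_name, nullable in columns:
    column_def = '{} {}'.format(name, SPANNER_TYPES[type_name])
    if not nullable:
      column_def += ' NOT NULL'
    column_defs.append(column_def)
  return 'CREATE TABLE {} ({}) PRIMARY KEY ({})'.format(
      table, ', '.join(column_defs), ', '.join(primary_key))


ddl = create_table_ddl(
    'TestTable',
    [('id', 'String', False), ('value', 'Integer', True),
     ('number', 'Float', True)],
    ['id'])
print(ddl)
# CREATE TABLE TestTable (id STRING(MAX) NOT NULL, value INT64, number FLOAT64) PRIMARY KEY (id)
```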

If the database already exists, we can instead execute a migration whose upgrade function returns a CreateTable for the model you have just defined (see the Migrations section below).

Retrieve data from Spanner

All queries through Spanner take place in a transaction. The ORM usually expects a transaction to be present and provided, but if None is specified, a new transaction will be created for that request. The two main ways of retrieving data through the ORM are where() and find()/find_multi():

# where() is invoked on a model class to retrieve models of that type. It takes
# a sequence of conditions. Most conditions that specify a Field, Index,
# Relationship, or Model can take either the name of the object or the object
# itself.
test_objects = TestModel.where(spanner_orm.greater_than('value', 50))

# To also retrieve related objects, use the includes() condition. (This assumes
# TestModel defines a Relationship named fake_relationship.)
test_and_other_objects = TestModel.where(
    spanner_orm.greater_than(TestModel.value, 50),
    spanner_orm.includes(TestModel.fake_relationship),
)

# To create a transaction, use run_read_only() or run_write() with the method
# to be run inside the transaction and any arguments to pass to the method.
# The method is invoked with the transaction as the first argument, followed
# by the rest of the provided arguments:
def callback(transaction, argument):
  return TestModel.find(id=argument, transaction=transaction)

specific_object = spanner_orm.spanner_api().run_read_only(callback, 'key')

# Alternatively, the transactional_read decorator can be used to simplify the
# call:
from spanner_orm.decorator import transactional_read

@transactional_read
def finder(argument, transaction=None):
  return TestModel.find(id=argument, transaction=transaction)
specific_object = finder('key')
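The calling convention described above (the callback receives the transaction first, followed by the caller's arguments, and a decorated function gets a transaction filled in when none is passed) can be modelled without a database. FakeTransaction, run_read_only_sketch and transactional_read_sketch are hypothetical, simplified stand-ins written for this sketch; the real functions open a Spanner snapshot instead, and reusing a caller-supplied transaction is this sketch's own choice.

```python
import functools
from typing import Any, Callable


class FakeTransaction:
  """Hypothetical stand-in for a Spanner snapshot or transaction."""

  def __init__(self, read_only: bool) -> None:
    self.read_only = read_only


def run_read_only_sketch(method: Callable[..., Any], *args: Any) -> Any:
  """Calls method(transaction, *args) with a fresh read-only fake transaction."""
  transaction = FakeTransaction(read_only=True)
  return method(transaction, *args)


def transactional_read_sketch(func: Callable[..., Any]) -> Callable[..., Any]:
  """Supplies the transaction keyword argument when the caller omits it."""

  @functools.wraps(func)
  def wrapper(*args: Any, transaction: Any = None, **kwargs: Any) -> Any:
    if transaction is not None:
      # Reuse the caller's transaction so several reads can share it.
      return func(*args, transaction=transaction, **kwargs)
    return run_read_only_sketch(
        lambda txn: func(*args, transaction=txn, **kwargs))

  return wrapper


@transactional_read_sketch
def finder(argument, transaction=None):
  return (argument, transaction.read_only)


print(run_read_only_sketch(lambda txn, arg: arg * 2, 21))  # 42
print(finder(1))  # (1, True)
```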

Write data to Spanner

The simplest way to write data is to create a Model (or retrieve one and modify it) and then call save() on it:

test_model = TestModel({'id': 'key', 'value': 1})
test_model.save()

Note that saving a newly created model as above will fail if a row with the same primary key already exists in the database: because the object was not retrieved from Spanner, the ORM treats it as new and issues a Spanner INSERT rather than an UPDATE.
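The behaviour described above comes down to which mutation save() chooses. The toy sketch below assumes the model tracks a persisted flag that is true only for objects loaded from the database; ToyModel, its save() and DuplicateKeyError are hypothetical, and a dict stands in for the Spanner table.

```python
from typing import Any, Dict


class DuplicateKeyError(Exception):
  """Raised when an INSERT targets an existing primary key."""


class ToyModel:
  """Hypothetical model whose save() picks INSERT or UPDATE from a persisted flag."""

  def __init__(self, values: Dict[str, Any], persisted: bool = False) -> None:
    self.values = dict(values)
    self.persisted = persisted

  def save(self, table: Dict[Any, Dict[str, Any]]) -> str:
    key = self.values['id']
    if self.persisted:
      table[key].update(self.values)  # UPDATE: the row is known to exist.
      return 'UPDATE'
    if key in table:
      raise DuplicateKeyError('row {!r} already exists'.format(key))
    table[key] = dict(self.values)  # INSERT: a brand-new row.
    self.persisted = True
    return 'INSERT'


table = {}
print(ToyModel({'id': 'key', 'value': 1}).save(table))  # INSERT
try:
  ToyModel({'id': 'key', 'value': 2}).save(table)  # Never loaded, so INSERT again.
except DuplicateKeyError as e:
  print('failed:', e)  # failed: row 'key' already exists
loaded = ToyModel(table['key'], persisted=True)  # As if retrieved from Spanner.
loaded.values['value'] = 3
print(loaded.save(table), table['key'])  # UPDATE {'id': 'key', 'value': 3}
```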

To save multiple objects at once, use the Model save_batch() method:

models = []
for i in range(10):
  key = 'test_{}'.format(i)
  models.append(TestModel({'id': key, 'value': i}))
TestModel.save_batch(models)

spanner_orm.spanner_api().run_write() can be used to execute read-write transactions, or the transactional_write decorator can be used in the same way as the read decorator above. Note that if a transaction fails because data was modified after the read happened and before the transaction finished executing, the called method is re-run until it succeeds or a maximum number of failures is reached. Make sure the method has no side effects that could cause problems if it is called multiple times. Any exception raised by the called method aborts the transaction.
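The retry behaviour is why the called method must be safe to run more than once. The sketch below models it with a hypothetical Aborted exception and a hypothetical run_with_retries helper using a fixed attempt cap (the real retry loop lives in the Spanner client library and its limit may be time-based instead). Only Aborted triggers a retry; any other exception propagates immediately, mirroring how an exception aborts the transaction. The emails_sent list shows a non-idempotent side effect repeating on every attempt.

```python
from typing import Any, Callable


class Aborted(Exception):
  """Hypothetical stand-in for Spanner aborting a transaction on a conflict."""


def run_with_retries(method: Callable[..., Any], *args: Any,
                     max_attempts: int = 5) -> Any:
  """Re-runs method until it succeeds or max_attempts aborts have occurred."""
  for attempt in range(1, max_attempts + 1):
    try:
      return method(*args)
    except Aborted:
      if attempt == max_attempts:
        raise  # Give up and surface the last abort.
  return None  # Only reached if max_attempts < 1.


emails_sent = []  # A side effect outside the transaction.
conflicts = [True, True, False]  # Abort twice, then succeed.


def write_callback(user: str) -> str:
  emails_sent.append(user)  # Runs again on every retry!
  if conflicts.pop(0):
    raise Aborted()
  return 'committed'


print(run_with_retries(write_callback, 'alice'))  # committed
print(emails_sent)  # ['alice', 'alice', 'alice']
```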

Other helper methods exist for more complex use cases (create, update, upsert, and others), but you will have to do more work in order to use those correctly. See the documentation on those methods for more information.

Migrations

Creating migrations

Running spanner-orm generate <migration name> will generate a new migration file to be filled out in the directory specified (or 'migrations' by default). The upgrade function is executed when migrating, and the downgrade function is executed when rolling back the migration. Each of these should return a single MigrationUpdate object (e.g., CreateTable, AddColumn, etc.), as Spanner cannot execute multiple schema updates atomically.
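A migration is essentially a pair of functions that each return one schema update, with downgrade undoing upgrade. The sketch below mimics that shape against an in-memory schema. CreateTable and DropTable here are minimal hypothetical stand-ins for the spanner_orm update classes of the same names, and migration_id is an illustrative value, not what spanner-orm generate writes.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class CreateTable:
  """Stand-in for a create-table schema update."""
  table: str
  columns: List[str]

  def apply(self, schema: Dict[str, List[str]]) -> None:
    if self.table in schema:
      raise ValueError('table {} already exists'.format(self.table))
    schema[self.table] = list(self.columns)


@dataclass
class DropTable:
  """Stand-in for a drop-table schema update."""
  table: str

  def apply(self, schema: Dict[str, List[str]]) -> None:
    del schema[self.table]


MigrationUpdate = Union[CreateTable, DropTable]

# Contents of a hypothetical migration file: one update per direction.
migration_id = 'example_0001'


def upgrade() -> MigrationUpdate:
  return CreateTable('TestTable', ['id', 'value', 'number'])


def downgrade() -> MigrationUpdate:
  return DropTable('TestTable')


schema: Dict[str, List[str]] = {}
upgrade().apply(schema)
print(schema)  # {'TestTable': ['id', 'value', 'number']}
downgrade().apply(schema)
print(schema)  # {}
```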

Executing migrations

Running spanner-orm migrate <Spanner instance> <Spanner database> will execute all the unmigrated migrations for that database in the correct order, using the application default credentials. If that won't work for your use case, MigrationExecutor can be used instead:

connection = spanner_orm.SpannerConnection(
  instance_name,
  database_name,
  credentials=credentials)
executor = spanner_orm.MigrationExecutor(connection)
executor.migrate()
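Running "all the unmigrated migrations in the correct order" needs some notion of order. The sketch below assumes each migration records the ID of the migration it follows (modelled as a hypothetical prev_ids mapping), so the history forms a single chain; it walks the chain from its first migration and skips those already applied. pending_migrations is an illustration, not MigrationExecutor's code.

```python
from typing import Dict, List, Optional, Set


def pending_migrations(prev_ids: Dict[str, Optional[str]],
                       applied: Set[str]) -> List[str]:
  """Returns unapplied migration IDs, oldest first.

  Args:
    prev_ids: Maps each migration ID to the ID of the migration it follows
      (None for the first migration).
    applied: IDs of migrations already recorded as applied in the database.
  """
  next_ids = {}
  roots = []
  for migration_id, prev_id in prev_ids.items():
    if prev_id is None:
      roots.append(migration_id)
    elif prev_id in next_ids:
      raise ValueError('{} has more than one successor'.format(prev_id))
    else:
      next_ids[prev_id] = migration_id
  if len(roots) != 1:
    raise ValueError('expected exactly one first migration, found {}'.format(roots))

  ordered = []
  current: Optional[str] = roots[0]
  while current is not None:
    ordered.append(current)
    current = next_ids.get(current)
  if len(ordered) != len(prev_ids):
    raise ValueError('some migrations are not connected to the chain')
  return [m for m in ordered if m not in applied]


migrations = {'c3': 'b2', 'a1': None, 'b2': 'a1', 'd4': 'c3'}
print(pending_migrations(migrations, applied={'a1', 'b2'}))  # ['c3', 'd4']
```

In this model, a rollback would walk the same chain in the opposite direction, calling each migration's downgrade function.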

Note that there is no protection against executing migrations concurrently, so avoid running more than one migration process against the same database at a time.

If a migration needs to be rolled back, spanner-orm rollback <migration_name> <Spanner instance> <Spanner database> or the corresponding MigrationExecutor method should be used.

Tests

Note: we suggest using a Python 3.8 virtualenv for running tests and type checking.

Before running any tests, you'll need to download the Cloud Spanner Emulator. See https://github.com/GoogleCloudPlatform/cloud-spanner-emulator for several options. If you're on Linux, we recommend:

VERSION=1.2.0
wget https://storage.googleapis.com/cloud-spanner-emulator/releases/${VERSION}/cloud-spanner-emulator_linux_amd64-${VERSION}.tar.gz
tar zxvf cloud-spanner-emulator_linux_amd64-${VERSION}.tar.gz
chmod u+x gateway_main emulator_main
git clone git@github.com:GoogleCloudPlatform/cloud-spanner-emulator.git

To check type annotations, run:

pip install pytype
pytype spanner_orm

To check formatting, run (change --diff to --in-place to fix formatting):

pip install yapf
yapf --diff --recursive --parallel .

Then run tests with:

SPANNER_EMULATOR_BINARY_PATH=$(pwd)/emulator_main pytest

python-spanner-orm's People

Contributors

akefeli, dcbrandao, dgorelik, dseomn, gavinduggan, gleeper, maroux, qadro87, sabrina-gisselle, supersam654


python-spanner-orm's Issues

Fix the lost session issue in Spanner ORM

Spanner deletes sessions older than a certain threshold (28 days). When that happens, Spanner ORM does not recover, and its clients receive errors similar to: google.api_core.exceptions.NotFound: 404 Session not found: projects/<project name>/instances/<instance name>/databases/<database name>/sessions/<session id>.

Fix type annotations for Transaction in model.py.

Currently model.py uses google.cloud.spanner_v1.transaction.Transaction for all the transaction parameters, but I think the read-only methods are designed to work with google.cloud.spanner_v1.snapshot.Snapshot too. Snapshot is available through the Spanner ORM API in spanner_orm.api.SpannerReadApi.run_read_only() and spanner_orm.decorator.transactional_read(), so we really should support using correct type annotations for Snapshot. Another thing to consider is whether there should be any support for google.cloud.spanner_v1.batch.Batch in write-only Model methods.

Add a Model.delete_where class method.

Currently, I think the only way to delete based on a condition (not including primary keys) is to retrieve the model objects matching the condition, then delete them. It would be nice to have a delete_where() method with the same signature as where(), but that deletes the matching rows.

Missing Float64

Hi,

Just found this library and noticed that it's missing Float64. Is there a plan to add it?

Simplify code in condition.py.

I think the addition of ArbitraryCondition to condition.py should enable some other code in that file to be simplified significantly. E.g., ComparisonCondition could probably be simplified a lot by using ArbitraryCondition.

Fix metaclass-related pytype issues

PR #92 adds # type: ignore to some metaclass related issues. This issue is to track the removal of those ignores.

For the issue in spanner_orm/tests/metadata_test.py, the problem seems to be that while cls.meta is set in the metaclass' __new__, pytype interprets the return type annotation of __getattr__ as the type that cls.meta must have.

Re-write Model to use dataclasses.

Model currently uses a metaclass (which is disallowed by the style guide for good reasons) and Fields which make type checking tricky. (The class attribute is of type Field, but the instance attribute with the same name is of type str, int, or whatever that specific field stores.) I think most of this could be solved by switching to something based on dataclasses. E.g., in Python 3.9+:

@dataclasses.dataclass
class Foo(spanner_orm.Model):
  foo: Annotated[int, spanner_orm.primary_key]

Or in older versions of Python:

@dataclasses.dataclass
class Foo(spanner_orm.Model):
  foo: int = dataclasses.field(metadata={spanner_orm.PRIMARY_KEY: True})
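Both proposed spellings keep the primary-key marker in metadata that standard-library introspection can read back. Here is a runnable sketch of the older-Python variant using only the dataclasses module; PRIMARY_KEY is a hypothetical local key standing in for the proposal's spanner_orm.PRIMARY_KEY, primary_key_fields is a hypothetical helper, and Foo drops the spanner_orm.Model base so the example runs on its own.

```python
import dataclasses
from typing import List

PRIMARY_KEY = 'primary_key'  # Hypothetical metadata key.


@dataclasses.dataclass
class Foo:
  foo: int = dataclasses.field(metadata={PRIMARY_KEY: True})
  bar: str = ''


def primary_key_fields(cls: type) -> List[str]:
  """Names of dataclass fields marked as primary keys, in declaration order."""
  return [f.name for f in dataclasses.fields(cls) if f.metadata.get(PRIMARY_KEY)]


print(primary_key_fields(Foo))  # ['foo']
print(Foo(foo=1))  # Foo(foo=1, bar='')
```

Because the class attribute is an ordinary annotated field, a type checker sees Foo(foo=1).foo as an int, which is the type-checking benefit the issue is after.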

Is this library maintained at all?

Hi

I've opened a few issues and a few PRs but haven't heard anything on them. Just checking to see if this library is maintained at all and accepting changes?

Thanks!

Make migrations a bit less tedious to write.

Ideas:

  1. It seems like many SchemaUpdate subclasses have inherent inverses, e.g., it should be easy to derive a DropTable from a CreateTable. Could we simplify upgrade() and downgrade() into a single function that returns objects that can process both directions of changes?
  2. Maybe add a CompoundUpdate or similar that applies multiple updates in a single migration. This might be more complex than it seems though if the migration executor and/or migration status information assume that migrations are atomic.

Pypi Package

Hey Guys,

Can we get this up on PyPI? I'm sitting behind an organization that has GitHub blocked, so it would be helpful if you could publish it there. For now, my workaround will be to push your package under my account.

[Feature Request] Autogenerate migrations based on current `spanner_orm.Models`

The current model of migration creation, where each migration is a static file that requires developers to set up exactly the changes they want, is less than ideal.

A more user-friendly model would be for the package to introspect the models (there are several ways to do this) and generate the delta required to bring the database up to the current state from the previous schema revision.

Inspiration: https://github.com/miguelgrinberg/Flask-Migrate & the whole alembic system

[BUG] Tables with string columns defined manually not using `STRING(MAX)`, ie `STRING(1024)` will fail

If you create a table directly via a SQL command and then try to interact with the same database using spanner_orm, it will fail. For example, given this table:

CREATE TABLE manuallyCreatedTable(
  id INT64 NOT NULL,
  ProblematicColumn STRING(1024),
) PRIMARY KEY(id);

If you then run a migration (for example, one that creates any table), you will get an error like:

  /usr/local/bin/spanner-orm(8)<module>()
-> sys.exit(main())
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/scripts.py(78)main()
-> args.execute(args)
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/scripts.py(33)migrate()
-> executor.migrate(args.name)
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/migration_executor.py(63)migrate()
-> self._validate_migrations()
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/migration_executor.py(184)_validate_migrations()
-> if (self.migrated(migration_.migration_id) and
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/migration_executor.py(46)migrated()
-> return self._migration_status().get(migration_id, False)
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/migration_executor.py(150)_migration_status()
-> model_from_db = metadata.SpannerMetadata.model(
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/metadata.py(68)model()
-> return cls.models().get(table_name)
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/metadata.py(43)models()
-> tables = cls.tables()
  /usr/local/lib/python3.9/site-packages/spanner_orm/admin/metadata.py(80)tables()
-> column_row.field_type(), nullable=column_row.nullable())
> /usr/local/lib/python3.9/site-packages/spanner_orm/admin/column.py(45)field_type()
-> raise error.SpannerError('No corresponding Type for {}'.format(
(Pdb) up
> /usr/local/lib/python3.9/site-packages/spanner_orm/admin/metadata.py(80)tables()
-> column_row.field_type(), nullable=column_row.nullable())
(Pdb) column_row.field_type()
*** spanner_orm.error.SpannerError: No corresponding Type for STRING(1024)

WORKAROUND: You will need to either set ProblematicColumn to STRING(MAX) or delete that table.

Leaving this here for future intrepid souls.
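The failure comes from mapping the column type string reported by the database to an ORM type: STRING(MAX) is recognized but STRING(1024) is not. One way a fix could look is to split off the declared length before looking up the type. The sketch below is hypothetical and self-contained (parse_spanner_type is not a spanner_orm function) and only handles scalar types; array types such as ARRAY<STRING(MAX)> are rejected.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "STRING(MAX)", "STRING(1024)", "BYTES(16)", or a bare "INT64".
_TYPE_RE = re.compile(r'^(?P<base>[A-Z0-9]+)(?:\((?P<length>MAX|\d+)\))?$')


def parse_spanner_type(spanner_type: str) -> Tuple[str, Optional[int]]:
  """Splits a scalar Spanner column type into (base type, max length).

  A length of None means MAX or a type that has no length.
  """
  match = _TYPE_RE.match(spanner_type.strip().upper())
  if match is None:
    raise ValueError('unsupported column type: {}'.format(spanner_type))
  length = match.group('length')
  return match.group('base'), (None if length in (None, 'MAX') else int(length))


print(parse_spanner_type('STRING(1024)'))  # ('STRING', 1024)
print(parse_spanner_type('STRING(MAX)'))   # ('STRING', None)
print(parse_spanner_type('INT64'))         # ('INT64', None)
```

With the base type separated out, STRING(1024) and STRING(MAX) could both map to the ORM's String type, optionally keeping the length for validation.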

Support for COMMIT_TIMESTAMP

spanner.COMMIT_TIMESTAMP is a special value that may be used for automatically setting timestamp fields using commit time. However, the ORM doesn't support this:

/usr/local/lib/python3.7/site-packages/spanner_orm/model.py in __init__(self, values, persisted)
    462 
    463       for column in self._columns:
--> 464         self._metaclass.validate_value(column, values.get(column), ValueError)
    465 
    466     for column in self._columns:

/usr/local/lib/python3.7/site-packages/spanner_orm/model.py in validate_value(cls, field_name, value, error_type)
    125       cls.fields[field_name].validate(value)
    126     except error.ValidationError as ex:
--> 127       raise error_type(*ex.args)
    128 
    129 

ValueError: spanner.commit_timestamp() is not of type datetime
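In the google-cloud-spanner client, COMMIT_TIMESTAMP is the sentinel string 'spanner.commit_timestamp()', which is why a datetime-only check rejects it, as the error message shows. A validator that accepts the sentinel might look like the sketch below; validate_timestamp is hypothetical, and the constant is copied locally so the example runs without the client library installed.

```python
import datetime
from typing import Any

# Same value as google.cloud.spanner_v1.COMMIT_TIMESTAMP.
COMMIT_TIMESTAMP = 'spanner.commit_timestamp()'


def validate_timestamp(value: Any, nullable: bool = True) -> None:
  """Accepts a datetime, the commit-timestamp sentinel, or None if nullable."""
  if value is None:
    if not nullable:
      raise ValueError('None is not allowed for a non-nullable field')
    return
  if value == COMMIT_TIMESTAMP or isinstance(value, datetime.datetime):
    return
  raise ValueError('{} is not of type datetime'.format(value))


validate_timestamp(COMMIT_TIMESTAMP)  # Accepted: Spanner fills in the commit time.
validate_timestamp(datetime.datetime.now(datetime.timezone.utc))  # Accepted.
```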
