
eriksen's Introduction

Eriksen


Eriksen is a model factory that makes it easy to write model code that retrieves or saves data to multiple, configurable places. Its main job is to marshal reads and writes to a configurable, swappable backend, preserving a single interface even when the backend data storage system changes.

Installation

npm i eriksen

Why?

Eriksen is useful when you're writing code that may change database backends, or when you're in the middle of moving from one backend to another. Since it concerns itself only with the database interactions in your code, it keeps your data-access code modular within a codebase. And since Eriksen acts as the coding interface, you should not need to change any other code unless expectations change as database backends are switched in or out.

Because Eriksen sits between your code and the backend storage systems, it can log failures from a non-primary backend in the background without affecting the caller.

Usage

This example creates two models for different databases and configures Eriksen to use Cassandra as the primary and DynamoDB as the secondary. Calls to getAllOfTheThings are proxied to both backends: if the call fails in DynamoDB (the secondary), Eriksen only logs the error and does not fail the call; if it fails in Cassandra (the primary), the error is thrown to the caller.

  // cassandra model
  const cassandraMapper = {
    getAllOfTheThings: (name) => {
      return cassandra.query("SELECT ...");
    }
  }

  // dynamo model
  const dynamoMapper = {
    getAllOfTheThings: (name) => {
      // assumes `dynamo` is an AWS.DynamoDB.DocumentClient instance
      return dynamo.query({ /* ... */ }).promise();
    }
  }

  const Eriksen = require('eriksen');
  const model = new Eriksen('allThings');
  model.addModel('cassandra', cassandraMapper);
  model.addModel('dynamodb', dynamoMapper);
  model.configure({
    primary: 'cassandra',
    secondary: 'dynamodb'
  });

  function retrieveAllOfTheThings(thingName) {
    return model.proxy.getAllOfTheThings(thingName);
  }

  // calling code that goes through the eriksen proxy, which marshals the call to both backends
  retrieveAllOfTheThings('allMyThings')
    .then((things) => {
      console.log('list of my things', things);
    })
    .catch((err) => {
      console.log(`it failed ${err.message}`);
    });
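
When you're ready to cut over from Cassandra to DynamoDB, the calling code stays the same; you only flip the configuration (a sketch using the same API as above):

  model.configure({
    primary: 'dynamodb',
    secondary: 'cassandra'
  });

From then on, DynamoDB failures are thrown to the caller and Cassandra failures are only logged in the background.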


eriksen's Issues

How to keep the last_customer_id values in sync across stores

I've been thinking and I'm not convinced it's good enough to just trust that the 2 last_customer_id values will stay in sync all on their own, and if they don't that is going to be pretty bad for our trust in the dual-write strategy. Here's a possible solution...

In our model proxy's proxify function, we'd check the first argument to see if it had been "decorated" with a special argument containing a SPECIAL_FLAG. (The decoration would be done in our marshal setup, in the "supermodel" file.) If there, we store off the value and remove the special argument from args before calling the actual proxied methods. And if that SPECIAL_FLAG was set, we attach the results of the primary action as a final argument passed to the secondary action. (We could even have the value of the SPECIAL_FLAG be a function to map what value from the primary's result we actually want to pass to the secondary.)

Example:

function proxify(method, options) {
  return function() {
    const args = (arguments.length === 1) ? [arguments[0]] : Array.apply(null, arguments);
    let attachPrimary = false;

    if (args[0] && args[0].ATTACH_PRIMARY_RESULT) {
      attachPrimary = args.shift().ATTACH_PRIMARY_RESULT; // keep the mapping fn, drop the decoration from args
    }

    return callMethod(options.primaryModel, method, args).then((result) => {
      if (!options.secondary || !_.isFunction(options.secondaryModel[method])) {
        return result;
      }

      if (attachPrimary) {
        args.push(attachPrimary(result));
      }

      // don't return here and swallow errors so that secondary call is non-blocking
      callMethod(options.secondaryModel, method, args).catch((err) => {
        options.logger.error(`[Eriksen] Captured error on secondary model: ${options.secondary}#${method}`, err);
      });

      return result;
    });
  };
}

So in our "supermodel" file where we instantiate a new Eriksen marshaler, we might have:

const marshal = new Eriksen('accounts');

marshal.addModel('cassandra', require('lib/models/cassandra/accounts'));
marshal.addModel('aws', require('lib/models/aws/accounts'));

marshal.configure({ primary: 'cassandra', secondary: 'aws' });

module.exports = marshal.proxy;

And between configuring the proxy and exporting it, we would decorate the create function:

marshal.configure({ primary: 'cassandra', secondary: 'aws' });

const proxyCreateAccount = marshal.proxy.createAccount;
marshal.proxy.createAccount = function(account) {
  const args = (arguments.length === 1) ? [arguments[0]] : Array.apply(null, arguments);
  
  // add a new argument to the front of the array with a FLAG and a fn to
  // define what data should be passed along from the primary results
  args.unshift({
    ATTACH_PRIMARY_RESULT: (result) => result.results.customer_id
  });

  return proxyCreateAccount.apply(marshal.proxy, args);
};

module.exports = marshal.proxy;

Finally, the actual cassandra and aws versions of createAccount would have to change slightly:

// from this:
model.createAccount = function(account) {
  // get next customer id
  // create account etc.
}

// to this:
model.createAccount = function(account, customerId) {
  if (typeof customerId === 'undefined') {
    customerId = getNextCustomerIdOrWhateverIDK();
  }
  // create account etc.
};
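
Putting it together, callers wouldn't change at all: they keep requiring the exported proxy and calling createAccount as before, and the customer_id from the primary's result gets forwarded to the secondary in the background. A rough sketch of a call site (the require path and account fields below are hypothetical):

// hypothetical caller
const accounts = require('lib/models/accounts'); // the supermodel that exports marshal.proxy

accounts.createAccount({ email: 'user@example.com' })
  .then((result) => {
    // result comes from the primary (cassandra) model; the secondary (aws) model
    // received (account, result.results.customer_id) in the background
    console.log('created customer', result.results.customer_id);
  })
  .catch((err) => console.error('primary createAccount failed', err));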

Miscellaneous issues

Had a few things bouncing in my head that were stressing me out, just need to get them down to discuss next week:

  • I think we have to change the login and account-creation throttling we're doing with Redis, slightly. It seems like COPS has some issues around setting up AWS db stuff in the other regions, but even more interesting than that, we can't have both the Cassandra and AWS models using a single store for throttling, or else we'll double-count each event: once from the primary and once from the secondary. Probably just need to use Eriksen there too (see the sketch after this list) and continue using C* for SPE, which will also separate those streams and prevent double-counting.
  • Related: how do we handle the account history log? We don't want both models to write to it, but we don't want to leave things out of the AWS models and then scramble to implement things when we switch over...
  • Related also: how do we handle actual logging from within the models? I know we set up a separate logger for Eriksen to use directly, but I think we need a way to tell the secondary model in particular to write to a different file from within its code, which will be interesting to figure out without breaking down and having the models know about Eriksen or know whether they are primary/secondary or something. Not sure but something to figure out.
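
For the first point, one possible shape (a rough sketch only; the throttle model names and methods below are hypothetical, not existing code): give throttling its own Eriksen marshal and record each event once at the call site, so each backing store sees a single increment per event instead of one from the primary data model and one from the secondary.

// hypothetical throttle models, each incrementing its own backing store
const redisThrottle = {
  recordLoginAttempt: (userId) => redisStore.increment(`login-attempts:${userId}`)
};
const awsThrottle = {
  recordLoginAttempt: (userId) => dynamoStore.increment(`login-attempts:${userId}`)
};

const throttleMarshal = new Eriksen('throttling');
throttleMarshal.addModel('redis', redisThrottle);
throttleMarshal.addModel('aws', awsThrottle);
throttleMarshal.configure({ primary: 'redis', secondary: 'aws' });

// the application records the event here, once, instead of inside each data model
module.exports = throttleMarshal.proxy;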
