
automerge-classic's Introduction

Automerge

Automerge logo


Automerge is a library which provides fast implementations of several different CRDTs, a compact compression format for these CRDTs, and a sync protocol for efficiently transmitting those changes over the network. The objective of the project is to support local-first applications in the same way that relational databases support server applications - by providing mechanisms for persistence which allow application developers to avoid thinking about hard distributed computing problems. Automerge aims to be PostgreSQL for your local-first app.

If you're looking for documentation on the JavaScript implementation, take a look at https://automerge.org/docs/hello/. There are also implementations in Rust and C, but they are at an earlier stage and don't have documentation yet. You can find them in rust/automerge and rust/automerge-c if you are comfortable reading the code and tests to figure out how to use them.

If you're familiar with CRDTs and interested in the design of Automerge in particular take a look at https://automerge.org/automerge-binary-format-spec.

Finally, if you want to talk to us about this project please join our Discord server!

Status

This project consists of a core Rust implementation which is exposed via FFI to JavaScript (through WASM), C, and soon other languages. Alex (@alexjg) is working full time on maintaining Automerge; other members of Ink & Switch also contribute time, and there are several other maintainers. The focus is currently on shipping the new JS package. We expect to iterate on the API and add new features over the next six months, so there will likely be several major version bumps in all packages during that time.

In general we try to respect semver.

JavaScript

A stable release of the JavaScript package is currently available as @automerge/[email protected]. Pre-release versions of 2.0.1 are available as 2.0.1-alpha.n. 2.0.1* packages are also available for Deno at https://deno.land/x/automerge

Rust

The Rust codebase is currently oriented around producing a performant backend for the JavaScript wrapper, and as such the API for Rust code is low-level and not well documented. We will be returning to this over the next few months, but for now you will need to be comfortable reading the tests and asking questions to figure out how to use it. If you are looking to build Rust applications which use Automerge, you may want to look into autosurgeon.

Repository Organisation

  • ./rust - the Rust implementation, plus the Rust components of the platform-specific wrappers (e.g. automerge-wasm for the WASM API or automerge-c for the C FFI bindings)
  • ./javascript - the JavaScript library, which uses automerge-wasm internally but presents a more idiomatic JavaScript interface
  • ./scripts - scripts which are useful for maintenance of the repository, including the scripts which are run in CI
  • ./img - static assets for use in .md files

Building

To build this codebase you will need:

  • rust
  • node
  • yarn
  • cmake
  • cmocka

You will also need to install the following with cargo install:

  • wasm-bindgen-cli
  • wasm-opt
  • cargo-deny

And ensure you have added the wasm32-unknown-unknown target for rust cross-compilation.

The various subprojects (the rust code, the wrapper projects) have their own build instructions, but to run the tests that will be run in CI you can run ./scripts/ci/run.

For macOS

These instructions worked to build locally on macOS 13.1 (arm64) as of Nov 29th 2022.

# clone the repo
git clone https://github.com/automerge/automerge
cd automerge

# install rustup
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# install homebrew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

# install cmake, node, cmocka
brew install cmake node cmocka

# install yarn
npm install --global yarn

# install javascript dependencies
yarn --cwd ./javascript

# install rust dependencies
cargo install wasm-bindgen-cli wasm-opt cargo-deny

# get nightly rust to produce optimized automerge-c builds
rustup toolchain install nightly
rustup component add rust-src --toolchain nightly

# add wasm target in addition to current architecture
rustup target add wasm32-unknown-unknown

# Run ci script
./scripts/ci/run

If your build fails to find cmocka.h you may need to teach it about homebrew's installation location:

export CPATH=/opt/homebrew/include
export LIBRARY_PATH=/opt/homebrew/lib
./scripts/ci/run

Contributing

Please try to split your changes up into relatively independent commits which change one subsystem at a time, and add good commit messages which describe what the change is and why you're making it (err on the side of longer commit messages). git blame should give future maintainers a good idea of why something is the way it is.

Releasing

There are four artefacts in this repository which need releasing:

  • The @automerge/automerge NPM package
  • The @automerge/automerge-wasm NPM package
  • The automerge Deno package
  • The automerge rust crate

JS Packages

The NPM and Deno packages are all released automatically by CI tooling whenever the version number in the respective package.json changes. This means that the process for releasing a new JS version is:

  1. Bump the version in the rust/automerge-wasm/package.json (skip this if there are no new changes to the WASM)
  2. Bump the version of @automerge/automerge-wasm we depend on in javascript/package.json
  3. Bump the version in @automerge/automerge also in javascript/package.json

Put all of these bumps in a PR and wait for a clean CI run, then merge the PR. The CI tooling will pick up a push to main with a new version and publish it to NPM. This depends on an access token, available as NPM_TOKEN in the actions environment; the token is generated with a 30-day expiry, so it needs refreshing manually every so often.

Rust Package

This is much easier, but less automatic. The steps to release are:

  1. Bump the version in automerge/Cargo.toml
  2. Push a PR and merge once clean
  3. Tag the release as rust/automerge@<version>
  4. Push the tag to the repository
  5. Publish the release with cargo publish

automerge-classic's People

Contributors

adelsz, alexjg, aslakhellesoy, begor, dan-weaver, dependabot[bot], edahlseng, ept, ethanrbrown, gozala, herbcaudill, jeffa5, jeffpeterson, kpruden, mattkrick, mmcgrana, nathanfaucett, nettybun, orionz, paulmorris, pgte, philschatz, pierreprinetti, pvh, rikkertkoppes, samio, scotttrinh, vedantroy, verma-rajatk, wincent


automerge-classic's Issues

Request to update Readme to match current API

README.md seems to be out of date, could I get current docs? Specifically using the changeset() API, and accessing getHistory().

Also there's a bunch of junk at the bottom of the readme that looks like it could be cleaned up or removed.

assignment merges

Adam found a bug where assignments were getting clobbered on merges, I figured out that it was happening because on assignment updates I was resetting the entire assigned object instead of the specific prop:

  updateAssignments(state, action) {
    let cardIndex = this._findIndex(state.cards, (c) => c.id === action.cardId)
    return Tesseract.set(state.cards[cardIndex], "assigned", action.assigned)
  }

So I tried:

  updateAssignments(state, action) {
    let cardIndex = this._findIndex(state.cards, (c) => c.id === action.cardId)

    // cards don't have an "assigned" prop by default,
    // we have to set it if it doesn't already exist
    if(!state.cards[cardIndex].assigned)
      state = Tesseract.set(state.cards[cardIndex], "assigned", {})

    // make the update more granular so we don't clobber the entire "assigned" object
    return Tesseract.set(state.cards[cardIndex].assigned, action.person, action.isAssigned)
  }

However, this doesn't work either because now we're clobbering the "assigned" object when we initialize it in the if expression. I finally got it to work by making sure every card has the "assigned" prop when it's created (in the seed data or when a user creates a card). You can see the full diff here.

This works in our use case because no one's using the app so it's easy enough to change how we initialize the data, but if we were using Trellis in production I'd have to write a migration to add an empty "assigned" property to every card and make sure all future cards have it by default. I'm not sure if that would even work because the migration itself could end up clobbering some users' assignment props. Is there a better way to handle this that I haven't thought of?

Branching, Forks, and "Dangling Tips"

The default mode of operation for Automerge is to render all operations in the op_set into the output document.

For Trellis, we added a history view to crawl back through earlier states of the document and see both the change at that time and the state of the document.

Our new application adds a notion of asynchronous collaboration. That is to say, we don't always want to apply operations as soon as they're received from other users. This means we not only want to be able to see how the document appears for another user, but also to anticipate how it might appear if we merged their changes and yours.

As a result, our current solution involves creating a great many extra automerge documents -- one for each known peer involved in editing a document to represent their last-known-state and additional automerge documents to represent speculative merges to provide previews of what things might look like if I merged your changes into my work.

This leads to large amounts of wasted work and memory. Most of these documents will be identical modulo a few operations at any point in time, and the cost of creating copies is fairly high.

It would be nice if automerge had better support for structural sharing and an API for querying these kinds of document versions.

performance characteristics of automerge

Hi,

preface
I've wanted to introduce immutable.js to my app, but I disliked having to use the immutable.js data structure's API (e.g. using .get() instead of just accessing a property on an object). Now, I'm thinking that automerge might be able to give me the best of both worlds by providing immutable data that uses just plain javascript objects and arrays.

question
I was curious about the performance implications of using automerge in my redux app. I know that Immutable.js has some optimizations for operating within it's data structures (e.g. adding to an immutable.js List), but I've read that there's a performance hit for converting an immutable.js data structure into a vanilla javascript object or array (e.g. using the .toJS()). How does automerge keep operations on large data structures (e.g. a giant redux state tree) performant? Also, are there any other performance characteristics I'd need to keep in mind when using automerge?

Thanks.

`tesseract.root` should not be modifiable

We already discussed this, but just for record keeping...

To load the fake Trellis data, I was doing:

this.tesseract.root = require("./initial_state.json")

This silently breaks tesseract, which wasn't too hard to figure out but it would be nice if this either worked as expected or raised an error if it's not allowed.

binding tesseract store to react components

Note: I'm just logging feedback in this issue, nothing to take action on yet.

In Trellis, we're passing a Tesseract.Store object around to every component so that they can access the data. Each component also adds a listener to store.subscribe so it knows when to re-render itself. Since Tesseract.load returns a new object, I have to manually update the reference and listeners in each child component. One way to resolve this would be to create a store.load function that reloads the store internally but doesn't require updating references and re-subscribing listeners, but if we use Redux again this might not be an issue at all since Redux takes care of triggering re-renders for us.

I'll also highlight that our Trellis components re-render on any global state change, even if they don't care about that part of the state (e.g. if a card in one list changes, every card re-renders). Redux has a good solution using its connect helper where you map certain parts of the global state to your local component state, and it will only re-render if there are changes to the local state.

Performance isn't an issue at our current scale so we're able to punt on this altogether for now. I'm only flagging these as data points that support Redux compatibility in Tesseract.

batch remove

In this function, I'm deleting a list as well as its associated cards (since the data is relational):

deleteList(state, action) {
  ...

  cardIndexes.forEach((index) => {
    state = Tesseract.remove(state.cards, index)
  })

  return Tesseract.remove(state.lists, listIndex)
}

Maybe Tesseract.remove could take an array of indices as well as a single index?

Tesseract.remove(state.cards, cardIndexes)

We were also talking about batch updates the other day; this would be a very good use case for them. I expect that I want my removal of the list and its associated cards to all happen in one transaction, as opposed to individual actions:

state = Tesseract.transaction(() => {
  cardIndexes.forEach((index) => {
    state = Tesseract.remove(state.cards, index)
  })

  return Tesseract.remove(state.lists, listIndex)
})

return state
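As a plain-JS sketch of the array form proposed above (the `removeAll` name is hypothetical, not the real Tesseract/Automerge API): removing several indices has to happen in descending order, so that earlier removals don't shift the positions of indices yet to be removed.

```javascript
// Hypothetical sketch (plain JS, not the real Tesseract API): remove
// several indices from an array in one call.
function removeAll(array, indexes) {
  // Sort a copy in descending order; ascending order would be wrong,
  // because removing index 1 first turns the old index 3 into index 2.
  const sorted = [...indexes].sort((a, b) => b - a)
  const result = [...array]
  for (const i of sorted) {
    result.splice(i, 1)
  }
  return result
}

// Removing indices 1 and 3 from ['a','b','c','d','e'] leaves ['a','c','e']
console.log(removeAll(['a', 'b', 'c', 'd', 'e'], [1, 3]))
```

A real `Tesseract.remove(state.cards, cardIndexes)` would presumably do the same reordering internally, so callers could pass indices in any order.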

Relational API

We currently support a JSON-like data structure of nested lists and maps, but the Redux website makes a compelling argument that you should use a relational-like structure rather than deeply nested structures. I'm opening this issue as a place to think about how to best support a relational data model besides the JSON-like data model.

In the abstract, a relational model can of course be implemented on top of JSON. However, there are some open questions around these areas:

  • How to create IDs: app developers may be tempted to use incrementing integers, which would not be safe, as multiple actors may concurrently generate the same integer. UUIDs are safe, but bulky. We already generate IDs internally, so maybe we can expose them in a way that is useful to the application.
  • Foreign key integrity: for example, if a list is deleted, also delete all the cards on that list (#22). This would happen automatically in a model where the cards are nested inside the list object, but if they are separate top-level objects in a relational style, you could end up with violations of the foreign key constraint (a new card is added to a list while concurrently the entire list is deleted).

If I do lots of change by a single assignment, automerge won't perform a diff.

See ianstormtaylor/slate#259 (comment)
If I do something like this:

const doc2c = Automerge.change(doc2b, 'Adding some text', doc => {
  // from https://docs.slatejs.org/walkthroughs/saving-to-a-database
  doc.note = {
    document: {
      nodes: [
        {
          object: 'block',
          type: 'paragraph',
          nodes: [
            {
              object: 'text',
              leaves: [
                {
                  text: 'A line of text in a paragraph.',
                },
              ],
            },
          ],
        },
      ],
    },
  };
});
// sent change to doc1's client
changes = Automerge.getChanges(doc2b, doc2c);
const doc1c = Automerge.applyChanges(doc1b, changes);

// change some text deep inside document
const doc1d = Automerge.change(doc1c, 'Rewrite line', doc => {
  doc.note = {
    document: {
      nodes: [
        {
          object: 'block',
          type: 'paragraph',
          nodes: [
            {
              object: 'text',
              leaves: [
                {
                  text: 'Rewrite this line.',
                },
              ],
            },
          ],
        },
      ],
    },
  };
});

changes = Automerge.getChanges(doc1c, doc1d);

Those changes looks like this:

[ { actor: 'fca6a267-8317-4471-a8ed-71267e7d5779',
    seq: 2,
    deps: { '46ede060-927c-4ddb-9224-a9c654c3d42c': 1 },
    message: 'Rewrite line',
    ops:
     [ [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object],
       [Object] ] } ]

There are lots of ops, though I'm only changing a single line.

Will this result in a huge history, just like somebody commits a movie into the git repo?
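A rough plain-JS model of why this happens (this is an illustration, not Automerge's exact op accounting): assigning a whole subtree gives the library no way to know that only one leaf changed, so it re-creates every nested object and re-assigns every field, roughly one "create" op per object/array plus one "set"/"insert" op per field or element.

```javascript
// Rough illustrative model (not Automerge's exact accounting) of how many
// ops a whole-subtree assignment generates.
function countOps(value) {
  if (Array.isArray(value)) {
    // one makeList op, plus one insert (and the value's own ops) per element
    return 1 + value.reduce((sum, v) => sum + countOps(v), value.length)
  }
  if (value !== null && typeof value === 'object') {
    const entries = Object.values(value)
    // one makeMap op, plus one set (and the value's own ops) per key
    return 1 + entries.reduce((sum, v) => sum + countOps(v), entries.length)
  }
  return 0  // a primitive costs only the enclosing set/insert op
}

const note = {
  document: {
    nodes: [{
      object: 'block',
      type: 'paragraph',
      nodes: [{ object: 'text', leaves: [{ text: 'Rewrite this line.' }] }],
    }],
  },
}
console.log(countOps(note))  // → 19
```

Under this model even the small Slate document above costs around twenty ops per assignment, which is why the granular approach (setting only `doc.note.document...leaves[0].text`) produces a far smaller history.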

Elixir port

Hey!

I'd like to write a port of this library in Elixir to use when syncing between server <-> server and server <-> client, where the servers are running Elixir or some other language, and the client is running automerge.

I think there's a huge need for a generalized document synchronization protocol that isn't language-specific and doesn't require paying for a service (a la Firebase or Realm).

Any thoughts on documenting or formalizing the delta spec or core merge logic? It's different from the JSON CRDT paper, yes?

I'd be happy to help and write tests that verify cross-implementation correctness.

Ideally, this implementation, implementations in other languages, and the spec would be maintained together.

Immutable API style discussion

6eda1e4e924f351cbe50a8166fb6dfe45c9928d5 changed the API of immutable tesseract as proposed by @orionz, namely eliminating the distinction between a store and the root object, and applying edits by calling tesseract.something() functions rather than methods. @adamwiggins provided a DX critique of the API style. In this issue I'd like to discuss the options further.

Multi-field assignment problems

One rationale for the change was that in the following example (from the previous API) it's not clear what's happening:

store = store.assign({config: {background: 'blue'}})

// That might be equivalent to setting a single field in the config object:
store.config.background = 'blue'
// or it might mean replacing the entire config object:
store.config = {background: 'blue'}

The previous API merged objects on a field-by-field basis if the object already existed (i.e. equivalent to store.config.background = 'blue'), which seemed intuitive to me, but then Orion pointed out that it is inconsistent with JavaScript's Object.assign(). Also, that API left no way of replacing an entire object, except perhaps by deleting the entire object and re-adding it.

So, in the new API, I didn't include any method for assigning to several fields at a time (by passing an object), and only allowed a single field to be updated at a time. However, the new API still allows you to pass an object as a value:

t.insert(s1.cards, 1, {title: 'Reticulate splines', done: false})

// The line above is shorthand for:
s1 = t.insert(s1.cards, 1, {})
s1 = t.set(s1.cards[1], 'title', 'Reticulate splines')
s1 = t.set(s1.cards[1], 'done', false)

Adam felt this API was inconsistent, because t.set(s1.cards[1], 'done', true) just takes a primitive value as third parameter, whereas t.insert(s1.cards, 1, {title: 'Reticulate splines', done: false}) takes an object as parameter. I agree that they look visually different, but I think they actually have the same structure: both insert and set have arguments (targetObject, keyOrIndex, value), and the change always affects the single field targetObject[keyOrIndex], regardless of whether value is a primitive or an object.

I think it's important for the API to encourage the developer to only assign a single field at a time, which is required in order to get sensible CRDT merging behaviour (I think @choxi called this principle something along the lines of "make changes as granular as possible"). My feeling is that the single-field set operation does that, but perhaps there's room for improvement.

It would be possible to define a two-argument version of set, as Adam proposes:

t.set(s1.cards[1], { done: true })
// is equivalent to:
t.set(s1.cards[1], 'done', true)

which would allow several fields to be assigned in one go, but I'm not sure it would be much clearer. Any thoughts?
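One way to keep the granularity argument while offering the two-argument form is to define it as sugar that expands into one single-field set per key. A plain-JS sketch (the real CRDT `set` is stood in for by an object spread, so this is only an illustration of the API shape):

```javascript
// Hypothetical sketch of the proposed two-argument set(), expressed as
// sugar over the single-field form: each key becomes its own granular
// set operation, preserving fine-grained merge behaviour.
// (Caveat: passing undefined as a value would confuse this overload check.)
function set(target, keyOrFields, maybeValue) {
  if (maybeValue === undefined && typeof keyOrFields === 'object') {
    // two-argument form: expand into one single-field set per key
    for (const [key, value] of Object.entries(keyOrFields)) {
      target = set(target, key, value)
    }
    return target
  }
  // single-field form (plain-JS stand-in for the real CRDT operation)
  return { ...target, [keyOrFields]: maybeValue }
}

let card = { title: 'Reticulate splines', done: false }
card = set(card, { done: true })  // same result as:
card = set(card, 'done', true)
console.log(card)  // → { title: 'Reticulate splines', done: true }
```

Because the expansion bottoms out in single-field sets, the merge semantics would stay exactly as granular as the existing three-argument API.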

Functions versus methods

Compare:

// Current API uses function insert on the global tesseract object:
s1 = t.insert(s1.cards, 0, {title: 'Rewrite everything', done: false})

// Adam suggested instead putting a method on the store object:
s1 = s1.insert('cards', 0, {title: 'Rewrite everything', done: false })

My reason for using the first of the two options was to keep the namespaces of tesseract methods and application data separate. For example, say the application sets a field called insert:

t.set(s1, 'insert', 3)

…then it is not clear whether s1.insert refers to the insertion method or the value 3. The previous API relegated application data to the root object, so s1.root.insert == 3, but having to constantly type .root gets tedious quickly.

If we want to use methods, one option would be to prefix them with an underscore, since the application data is not allowed to have fields starting with an underscore:

s1 = s1._set('set', 3)
// now s1._set is a function, and s1.set == 3

s1._set('_set', 3) // not allowed, throws an error

…but that seems a bit ugly to me.

Structure-sharing

One attractive property of immutable data structures is that when some data changes, the object references for unaffected parts of the tree remain unchanged, and only changed parts of the tree get new object references. This means that you can check whether something has changed by doing a simple pointer equality comparison ===, and AFAIK this property is used for rendering efficiency by React and Redux.

However, the current API does not have that property: since it creates proxy objects on the fly, the references to objects are never the same. For example:

s1.cards[1] === s1.cards[1]
// surprisingly, returns false!

I can't think of a way of avoiding this problem with the current API. The issue is as follows: for example, if you do t.set(s1.cards[1], 'done', true), the function needs to return a new root object that is the same as s1 except that it has s1.cards[1].done updated. The t.set() function is only passed s1.cards[1] as an argument (i.e. a single card), and it needs to find the root object from there, which requires including a reference to the root object on the nested card object. Now, whenever anything changes, we get a new root object, and so all nested objects that reference the root object also need to be replaced with new objects that reference the new root. Thus, this API forces us to sacrifice either structure-sharing or immutability. (Currently, it sacrifices structure-sharing.)

To fix this, we would need to change the API so that we pass the root object, and not some nested object, into the function that performs the change. For example, one of the following:

// All of these are equivalent to s1.cards[1].done = true
s1 = t.set(s1, ['cards', 1, 'done'], true)
s1 = t.set(s1, 'cards', 1, 'done', true)
s1 = s1.set(['cards', 1, 'done'], true)
s1 = t.set(s1, s1.cards[1], 'done', true)

In other words, the code would probably have to spell out the path from the root object to the updated object as a list of strings/numbers, rather than the more natural JS expression s1.cards[1]. Alternatively, as in the fourth example above, you would need to pass both the root object (s1) and the object being updated (s1.cards[1]) as two separate arguments.
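The path-based variant does recover structure-sharing. A minimal plain-JS sketch (the `setIn` helper is hypothetical, not an existing API): because the root and the full path are given explicitly, only the objects along the path are copied, and every untouched subtree keeps its original reference.

```javascript
// Plain-JS sketch of a path-based immutable update: copy only the
// objects on the path from the root to the changed field.
function setIn(root, path, value) {
  if (path.length === 0) return value
  const [head, ...rest] = path
  const copy = Array.isArray(root) ? [...root] : { ...root }
  copy[head] = setIn(root[head], rest, value)
  return copy
}

const s1 = { cards: [{ done: false }, { done: false }], lists: [{}] }
const s2 = setIn(s1, ['cards', 1, 'done'], true)

console.log(s2.cards[1].done)            // → true
console.log(s1.cards[1].done)            // → false (s1 is untouched)
console.log(s2.lists === s1.lists)       // → true  (structure shared)
console.log(s2.cards[0] === s1.cards[0]) // → true  (structure shared)
```

With this shape, a React/Redux app gets both immutability and cheap `===` change detection, which the proxy-based API cannot provide.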

Opinions welcome!

Performance/scalability?

Hey there,

First off, thank you for sharing your paper (A Conflict-Free Replicated JSON Datatype) and this library with the world.

I started playing with automerge this afternoon and wanted to run some basic tests to see if I could use it for the federated personal web site system I'm working on, as I want it to work offline and I'd rather not privilege servers over clients (the goal is, ideally, for this to be a stepping stone towards a p2p world, which I believe is what we're all working for) :)

So to cut to the chase, I wanted to share my very unsophisticated findings and to also inquire about whether, in the words of Steve Jobs, I'm "holding it wrong" :)

Multiple inserts

Code: https://source.ind.ie/snippets/80

I first tried pushing incremental numbers to an array in a document via the change() method. The timings (on my MacBook) ranged from 6ms for 1 push to 43.75 seconds for 10,000. 100 pushes took 175ms and 1,000 took 1.345 seconds.


The storage requirements (tested very roughly using util.inspect to flatten the object, including all hidden fields) also seem to follow a similar curve. A single insert took up 3,557 bytes (size of original object: 18 bytes); 10: ~17KB; 100: ~160KB; 1,000: ~1.6MB; and the array with 10,000 numbers took up ~16MB (size of original object: ~80 bytes). Again, my testing methodology doesn't necessarily reflect the actual amount of space these objects would take up either in memory or on the file system but, unless I've done something really daft (which is always a possibility), it does look like the system would result in a sluggish interface on operations on an array with ~100 items or so.


Single insert

Then I thought maybe I am holding this wrong and it is meant to be used with batch inserts.

Code: https://source.ind.ie/snippets/81

So instead of testing, say, 10,000 pushes to an array, I wanted to test a single change to the object where an array with 10,000 items is added.

The results I got mirror those of the first scenario.

The timings seem to follow a similar curve at first but the results I got for the array with 10,000 items was ~3x slower at ~115 seconds vs ~43 seconds using the first technique.


As for the document sizes, they came out to be slightly less than with the first technique from the 100 item mark onwards. It was still ~1.3MB for 1,000 items and ~13MB for 10,000.


I'd love to hear your thoughts on this as well as any criticism of the above (including what I was benchmarking and how). My use case is a personal web site that allows both posts and messaging between sites. I was using the 10,000 item test as an upper limit for initial use (e.g., 10,000 posts in a category, 10,000 messages in a conversation). A more realistic amount might be 20-30 to a few hundred. But even at those levels, the latency would require operations to take place outside of the main thread to avoid a sluggish UI.

Also, since Automerge is already being used in two real-world applications, I'd love to hear of your actual experiences with performance, scalability, and storage requirements.

Thanks again for making this and sharing it. It's a very hard problem domain indeed.


Update

Based on the feedback of @pvh and @j-f1 (thanks, folks!), I just updated my tests to include:

A key-setting test (as opposed to array.push()): the results are generally comparable to the previous ones.


And I used Automerge.save(doc) to test storage size; the saved sizes are 3x-5x smaller.


I've added the test files to their own repository at https://source.ind.ie/indienet/spikes/crdt/automerge

Channel concept, granularity of documents and dependency tracking

We have previously discussed a few things we'd like to be able to do with Automerge documents:

  • Forking documents (like forking in github/branching in git), with the choice of whether or not to remain subscribed to upstream documents
  • Allowing a forked document to be merged back into its upstream document
  • Tentatively applying a bunch of changes to a local copy of a document, discarding them again if the result is not as desired, or publishing the merged result if the result is good
  • Template documents that set up some basic scaffolding required for an application, and concrete documents that "inherit" from the template
  • A mechanism for consuming feeds and merging them into a single document – think RSS feeds, or feeds of currency exchange rates, or suchlike

I had a call with @pvh to discuss how best to implement these concepts in Automerge/MPL. The following are some notes on what we discussed.

As a first step, hash-chaining to encode the dependency graph (#28, like parent commit hashes in git) seems like a good idea. To find all changes that have gone into a document, start with one or more HEAD commit hashes, and traverse the dependency graph. When two communicating nodes have made concurrent changes, they won't know about each others' heads, so they'll need to run a multi-round protocol to figure out their latest common ancestor hash, much like git (#27).

This leaves the question of how you find out about changes to a document. Our proposal is to separate it into two concepts:

  1. A channel is a network abstraction for pub/sub. A change is published to a channel, and a node can subscribe to any number of channels. A channel has a unique identifier, e.g. a UUID. Channels are probably not visible to the end user, but only an internal abstraction.
  2. A document is a set of channels, and it incorporates all changes that appear in any of its channels. A document may exist only on one node, and is not necessarily shared with other nodes. To share a document, its set of channels should perhaps be written to a filesystem CRDT?

The features outlined above can all be implemented using those two concepts:

  • Forking a document means publishing any future changes to that document to a new channel. The document can remain subscribed to the upstream channels (continuing to incorporate upstream changes), or unsubscribe from the upstream channels.
  • Merging a forked document back upstream would mean adding a change with a dependency on the fork to the upstream channel. (I guess that implies that a change's dependencies do not have to be in the same channel, but could be in any channel?)
  • Tentatively applying changes could mean adding the experimental channel to the set of channels for a document; as long as this dependency is not published back to the channel, it can be reverted by simply removing the channel again.
  • Template documents: if document B inherits from template A, then B includes A's channels as well as its own.
  • Likewise, merging feeds into a single document is easily achieved by having each feed in a separate channel, and having the document refer to all of them (so the document is the union of the changes in all of the channels).
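The channel/document split and the forking behaviour built on it can be sketched in a few lines. This is a toy in-memory model only; `Channel` and `Document` are illustrative names, not an Automerge API:

```javascript
// Toy model: a channel is a pub/sub stream of changes; a document is the
// union of the changes on all channels it subscribes to.
class Channel {
  constructor(id) {
    this.id = id
    this.changes = []
    this.subscribers = []
  }
  publish(change) {
    this.changes.push(change)
    for (const notify of this.subscribers) notify(change)
  }
  subscribe(notify) {
    this.subscribers.push(notify)
    for (const change of this.changes) notify(change)  // replay history
  }
}

class Document {
  constructor() {
    this.changes = []
  }
  addChannel(channel) {
    channel.subscribe(change => this.changes.push(change))
  }
}

// Forking: stay subscribed to the upstream channel (still incorporating
// upstream changes) while publishing new work to a fork channel.
const upstream = new Channel('upstream')
const fork = new Channel('fork')
const doc = new Document()
doc.addChannel(upstream)
doc.addChannel(fork)
upstream.publish('upstream change')
fork.publish('forked change')  // doc now incorporates both
```

Unsubscribing from the upstream channel would turn the fork into a fully independent document.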

Support for programmatic change merges.

First of all, great library :)

I'm running in to use cases where I want to display changes and selectively apply them. In order to do so, I need to be able to:

  • Get a list of changes for a given diff. Basically a more exposed version of getMissingChanges.
  • An exposed version of applyChanges.

Unless I'm missing another way to accomplish this.

Slow after 100 clients?

Could you please expand on this:

Caveats: Small number of collaborators: Automerge is designed for small-group collaborations. While there is no hard limit on the number of devices that can update a document, performance will degrade if you go beyond, say, 100 devices or so.

What causes this? Some kind of change history? I'm looking to switch to CRDT but this limitation could be a bit problematic.

Hash-based integrity checking of operation sets

At the moment, tesseract has no integrity checks. For example, a buggy node could send two different operations with the same sequence number to different peers, and they may never notice that they have diverged.

A simple approach for fixing this would be to take a leaf out of git's model: each operation has a cryptographic hash that covers the contents of the operation, the author, and the hashes of the prior operations on which it depends. The recipient must check that the hashes match up. Such hash-chaining alone is not sufficient to prevent disruption by actively malicious peers, but it would catch a lot of accidental bugs, and it would pave the way for a protocol that is robust against malicious peers (e.g. using digital signatures).

`handler.set()` requires a return value of true or false in strict mode

I ran into this error trying to set a value on store.root:

import { Store } from "tesseract"
let store = new Store()

store.root.foo = "bar"
// => Uncaught TypeError: 'set' on proxy: trap returned falsish for property 'foo'

I traced the error back to the MapHandler in Tesseract. It looks like Node requires this Proxy/handler object to return true or false depending on if the assignment was successful, but only in strict mode.

Unfortunately, there's no way to disable strict mode because all modules are loaded under strict mode in ES6, so this might be why we can't reproduce the error in the Node repl. I monkeypatched my local copy of the module and just added a return true to confirm that it fixes the issue, I'd recommend updating Tesseract to follow the strict mode spec for handler.set() since mostly everyone will be using ES6.
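A minimal reproduction of the fix: per the spec, the `set` trap just needs to return `true` on success (a sketch, not the actual MapHandler code):

```javascript
'use strict'

// In strict mode, an assignment through a Proxy throws a TypeError if the
// `set` trap returns a falsish value, even when the assignment succeeded.
function makeRoot() {
  return new Proxy({}, {
    set(target, key, value) {
      target[key] = value
      return true  // omit this and `root.foo = "bar"` throws in strict mode
    }
  })
}

const root = makeRoot()
root.foo = 'bar'  // no TypeError
```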

Reduce size of vector clocks

At present, we create a new actor ID every time a node is started (either as a completely fresh node, or loading its state from file). That approach ensures that even if you run a process twice on the same machine, they will have different actor IDs, and so they won't step on each others' toes. However, the downside is that the rate of churn of actor IDs is high, and so the vector clocks (which have an entry for each actor) get rather large.

I did a first step in this direction in 9d8f136: instead of including the full vector clock in each changeset, we now only include the actor ID of the originator, the sequence number (which starts at 1 for a new actor ID and is incremented on every changeset), and a minimal set of dependencies (as actorId-seqNum pairs). The set of dependencies does not include any dependencies that can be reached transitively through other dependencies, and it implicitly includes seqNum-1 for the same actorId. In other words, it only references changesets from other actors that were received by the originator between seqNum-1 and seqNum, making the dependencies much like a merge commit in git.
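The transitive-reduction step can be sketched like this. The shapes are assumptions, not the real internals: `deps` maps actorId to seqNum, and `getDeps(actor, seq)` returns that changeset's explicit dependencies:

```javascript
// Drop any dependency that is already reachable through another dependency
// (including the implicit dependency of each changeset on seqNum-1).
function minimizeDeps(deps, getDeps) {
  const reachable = new Set()
  const visit = (actor, seq) => {
    if (seq < 1 || reachable.has(`${actor}:${seq}`)) return
    reachable.add(`${actor}:${seq}`)
    visit(actor, seq - 1)  // implicit dependency on the previous changeset
    for (const [a, s] of Object.entries(getDeps(actor, seq) || {})) visit(a, s)
  }
  // Mark everything reachable *through* each dependency (its ancestors),
  // but not the dependency itself.
  for (const [actor, seq] of Object.entries(deps)) {
    visit(actor, seq - 1)
    for (const [a, s] of Object.entries(getDeps(actor, seq) || {})) visit(a, s)
  }
  const minimal = {}
  for (const [actor, seq] of Object.entries(deps)) {
    if (!reachable.has(`${actor}:${seq}`)) minimal[actor] = seq
  }
  return minimal
}
```

For example, if changeset A:2 already depends on B:1, then a clock of {A: 2, B: 1} reduces to just {A: 2}.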

However, the API for figuring out which changesets to send between peers (getVClock(), getDeltasAfter()) still uses full vector clocks without any truncation. Here, reducing the size is a little trickier. It can be done, but it will require more than one round-trip between peers to figure out what deltas need to be sent.

The reason is as follows: for example, say the latest changeset known by peer A (the "HEAD" in git terminology) has sequence number 42. All other changesets known by A can be reached transitively by following the dependency chains from A:42. So it is technically sufficient for getVClock() to return {A:42}, since that contains all the necessary information. However, any peer that does not have recent changesets from A cannot interpret the clock {A:42}. All it can tell is that it is missing a bunch of changesets from A, but it may also be missing changesets from other actors, and it won't find out what those are until it has received the changesets from A.

To solve this, we can probably take a look at the git transfer protocol as implemented by git-send-pack and git-upload-pack. Git has a similar problem, since it identifies the state of a repository by a commit hash, and if you only know a commit hash, you don't know what history you're missing in order to get there.

Documentation / Spec

The README.md states the following:

Automatic merging. Automerge is a so-called Conflict-Free Replicated Data Type (CRDT),
which allows concurrent changes on different devices to be merged automatically without
requiring any central server. It is based on academic research on JSON CRDTs, but the
details of the algorithm in Automerge are different from the JSON CRDT paper, and we are
planning to publish more detail about it in the future.

Is there a timeline for when this information will be available? It would be nice to
have it for people trying to implement this in other languages.

Thanks.

Expose conflicts on list objects

At the moment there's a _conflicts object on maps that contains any concurrent assignments to the same key, but there's no equivalent _conflicts object on lists that contains concurrent assignments to the same list element. There should be.

Clearing History

Any thoughts on clearing the history and changes of a document, leaving only the data? As I understand it, history must persist because it is needed to resolve possible conflicts. In a collaborative editing scenario, a client edits a document and eventually hits save/close to end the editing session. Perhaps if there is a node/server responsible for authorization that keeps track of open sessions, it could clear the metadata once the number of sessions drops below one. But how would that account for a client session that starts and is never closed due to a failure?

Error when performing Automerge.change() while Automerge.getMissingDeps() is non-empty

@mmcgrana reports the following issue:

const Automerge = require('automerge')

const docRoot = Automerge.init()

const doc1 = Automerge.change(docRoot, (doc) => {
  doc.foo = []
})
const change1 = Automerge.getChanges(docRoot, doc1)

const doc2 = Automerge.change(doc1, (doc) => {
  doc.foo.push('a')
})
const change2 = Automerge.getChanges(doc1, doc2)

console.log(change1, change1[0].ops)
console.log(change2, change2[0].ops)
console.log()

const docA = Automerge.applyChanges(docRoot, change2)
console.log(docA)
console.log(Automerge.getMissingDeps(docA))
// Line below throws e.g. `Modification of unknown object 860bacb9-eae2-4035-a257-4a2604c229f4`.
console.log(Automerge.change(docA, (doc) => { doc.biz = 'c' }))
console.log()

const docB = Automerge.applyChanges(docA, change1)
console.log(docB)
console.log(Automerge.getMissingDeps(docB))
console.log(Automerge.change(docB, (doc) => {doc.biz = 'd' }))

// Everything else above works as expected (if the offending line is commented out).

// The error only appears for certain types of blocks given to create doc1 and
// doc2. E.g. if doc1 is created with `(doc) => { doc.foo = 'bar' }` and doc2
// with `(doc) => {doc.foo = 'bat' }`, the error doesn't occur.

pretty print store objects

Is there a way to have console.log print Tesseract objects in a developer-friendlier way? Currently it looks like this:

[screenshot of the console output]

You can expand the object to get to the actual values but it's a bit tedious:

[screenshot of the expanded object]

Immutable.js API

The current Automerge API, which uses the change callback (with proxy objects) for modifications, and Object.freeze()-frozen plain JS objects for the read-only view of a document, is good for many apps. However, some apps are written against the Immutable.js API (e.g. using .set(key, val) for writes, and .get(key) for reads). It would be good if Automerge could support that API too.
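A sketch of what that surface could look like: reads via `.get(key)`, writes via `.set(key, value)` returning a new frozen document. `Doc` is a made-up wrapper for illustration, not a proposed Automerge class:

```javascript
class Doc {
  constructor(data = {}) {
    this.data = Object.freeze({ ...data })
    Object.freeze(this)
  }
  get(key) {
    return this.data[key]
  }
  // Returns a new Doc; the receiver is never mutated.
  set(key, value) {
    return new Doc({ ...this.data, [key]: value })
  }
}

const d1 = new Doc()
const d2 = d1.set('title', 'hello')
```

Because every write returns a fresh immutable value, this style composes naturally with the frozen read-only views Automerge already produces.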

Unable to set object en masse to mutable doc within change callback

I've found that the doc mutable object within a Automerge.change callback can't be set using another object directly, otherwise the changes are lost.

See the following case in which I try to set doc with an object's values directly during the "Initialize" change starting on line 5...

const Automerge = require('automerge');

let doc1 = Automerge.init();

doc1 = Automerge.change(doc1, 'Initialize', doc => {
  doc = {
    bar: 'zoop',
    foos: {
      doop: 1
    }
  };
});

console.log('doc1', doc1);

doc2 = Automerge.init();
doc2 = Automerge.merge(doc2, doc1);

doc1 = Automerge.change(doc1, 'Set foos doop to 5', doc => {
  doc.foos['doop'] = 5;
});

doc2 = Automerge.change(doc2, 'Set bar to zip', doc => {
  doc.bar = 'zip';
});

let finalDoc = Automerge.merge(doc1, doc2);

console.log(Automerge.getHistory(finalDoc).map(state => [state.change.message]));
console.log(finalDoc);

...which results in an empty doc1 variable aside from its _objectId and the following error:

automerge-test $ node index.js 
doc1 { _objectId: '00000000-0000-0000-0000-000000000000' }
/private/var/www/automerge-test/index.js:20
  doc.foos['doop'] = 5;
                    ^

TypeError: Cannot set property 'doop' of undefined
    at Automerge.change.doc (/private/var/www/automerge-test/index.js:20:21)
    at Object.change (/private/var/www/automerge-test/node_modules/automerge/src/automerge.js:145:3)
    at Object.<anonymous> (/private/var/www/automerge-test/index.js:19:18)
    at Module._compile (module.js:635:30)
    at Object.Module._extensions..js (module.js:646:10)
    at Module.load (module.js:554:32)
    at tryModuleLoad (module.js:497:12)
    at Function.Module._load (module.js:489:3)
    at Function.Module.runMain (module.js:676:10)
    at startup (bootstrap_node.js:187:16)

However, if I set the mutable doc object's properties individually...

const Automerge = require('automerge');

let doc1 = Automerge.init();

doc1 = Automerge.change(doc1, 'Initialize', doc => {
  doc.bar = 'zoop';
  doc.foos = {
    doop: 1
  };
});

console.log('doc1', doc1);

doc2 = Automerge.init();
doc2 = Automerge.merge(doc2, doc1);

doc1 = Automerge.change(doc1, 'Set foos doop to 5', doc => {
  doc.foos['doop'] = 5;
});

doc2 = Automerge.change(doc2, 'Set bar to zip', doc => {
  doc.bar = 'zip';
});

let finalDoc = Automerge.merge(doc1, doc2);

console.log(Automerge.getHistory(finalDoc).map(state => [state.change.message]));
console.log(finalDoc);

...then the changes are applied successfully:

automerge-test $ node index.js 
doc1 { _objectId: '00000000-0000-0000-0000-000000000000',
  bar: 'zoop',
  foos: { _objectId: '83c3523b-7c65-4a76-92e0-c476ccef33f0', doop: 1 } }
[ [ 'Initialize' ],
  [ 'Set foos doop to 5' ],
  [ 'Set bar to zip' ] ]
{ _objectId: '00000000-0000-0000-0000-000000000000',
  bar: 'zip',
  foos: { _objectId: '83c3523b-7c65-4a76-92e0-c476ccef33f0', doop: 5 } }

Perhaps this behavior is unintentional; in any case, I expected to be able to set the initial value of doc1 with a multi-level object like this, so the result was surprising.

Make history more memory-efficient

At the moment, the edit history consumes a lot of memory, because it stores a snapshot of the document at every point in the history. I noticed this while running a text-editing performance test. A trial run of 10,000 single-character editing operations in Node.js ran for 7 minutes before crashing with an out-of-memory error. However, removing the following line (op_set.js:261) brought the runtime down to 72 seconds without crashing:

opSet = opSet.update('history', history => history.push(Map({changeset, snapshot})))

To reduce the memory use, we should not store a snapshot but only the operations, and build the snapshot on demand when requested through the API.
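The proposed shape of the fix, sketched over a flat map of set/delete operations (the real op set is richer, of course): store only the log, and rebuild a snapshot by replay when the API asks for one:

```javascript
// Replay the first `index + 1` operations to materialise the document state
// at that point in history, instead of caching a snapshot per history entry.
function snapshotAt(ops, index) {
  const doc = {}
  for (const op of ops.slice(0, index + 1)) {
    if (op.action === 'set') doc[op.key] = op.value
    else if (op.action === 'del') delete doc[op.key]
  }
  return doc
}
```

This trades O(1) snapshot reads for O(n) on-demand reconstruction, which is the right trade-off when history is long and snapshots are rarely inspected.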

`console.log(store)` prints undefined values

Snippet to reproduce bug:

const Tesseract = require("tesseract")

let store = Tesseract.init()
store = Tesseract.changeset(store, "set foo", (doc) => {
  doc.foo = "bar"
})

console.log(store)     //=> { foo: undefined }
console.log(store.foo) //=> bar

The console log says foo is undefined, but it is actually defined; the logged output is incorrect.

conflicts in single-player mode

I'm somehow creating conflicts even when Trellis and DocInspector are disconnected. To reproduce:

  1. Open Trellis (from conflicts branch)
  2. Disconnect Trellis and DocInspector
  3. Move any card from one column to another on Trellis

The card that was moved now has a _conflicts property that looks like this:

// card._conflicts
{
  "listId": {}
}

So it flags listId as being in conflict, but then the conflicts object is empty. The peer_action created looks like this:

{
  "by": "trellis",
  "clock": {
    "trellis": 19
  },
  "action": "set",
  "target": "cd5649a6-3130-4872-9533-72c0c3bc9a6c",
  "key": "listId",
  "value": 1
}

For some reason, the DocInspector does not create the same conflict even though I'm using the same setter pattern:

// Trellis
updateCard(id, attributes) {
  this.tesseract.root.cards[id].listId = attributes.listId
}

// DocInspector
updateListId(event) {
  let index = parseInt(event.target.name.replace(/[^0-9]/g, ''))
  let newListId = parseInt(event.target.value)

  if (newListId >= 0 && newListId <= 2) {
    this.tesseract.root.cards[index].listId = newListId
    this.setState(this.tesseract.getState())
  }
}

Is it possible that Tesseract is still flagging a conflict with DocInspector even though they're both "paused"?

golang port

Automerge looks highly useful in the non-JS world too.

I would like to hear whether people would be supportive of a port to golang.

You can compile golang to WASM and JS for use in the browser and Node.js.
JavaScript:
https://github.com/gopherjs/gopherjs

WASM:
https://go-review.googlesource.com/c/go/+/102835
It's officially supported, and WASM is supported in all major browsers now.

You can also use it for compiled GUIs.
For example, you can use it with Flutter:
flutter/flutter#15925
You can compile the golang code to AMD64 or ARMv8 (64-bit) for use in emulators or real mobile devices.

You can also use it on Raspberry Pis and so on, or call it from C++, C#, etc.


I like the fact that Automerge only does the logic and leaves it up to the developers how they want to integrate it with storage etc.
The same goes for how nodes communicate and what network transports they use.

I also think there is huge potential for using Automerge / CRDTs server-side. There are a few high-performance computing projects using CRDTs to handle parallel computing workloads, for example.

I can imagine that if a golang port happens, wrappers for Protocol Buffers and gRPC will follow.


I fully expect to be heavily criticised for discussing this, as I can imagine that if you are mostly programming in JavaScript you will view any port to golang as a negative thing. In some ways that's true. But that is why I also highlighted that golang supports WASM now.

We could have two source versions of Automerge. But the logic of how Automerge and CRDTs work is rare knowledge, so it's not easy to maintain two source versions. Looking at the code, though, I can see it's pretty clean and there isn't a huge amount of it.
Personally, I would favour a single source base in golang, plus a compiled-to-WASM version.


Other ways to skin the same cat?

There is at least one JS-to-golang compiler, and maybe a few more:
https://github.com/jingweno/godzilla

There are also three JavaScript interpreters written in golang, but I don't think that route is optimal because they are slow.

Please be nice... I expect many will not look at this issue favourably.

Higher-level changesets API

Had a good discussion with @adamwiggins about what kind of information Tesseract should expose in order to support Omniview, inspection and debugging. Digging around in the internal action list isn't great, since those actions are extremely low-level and granular, which is reasonable from Tesseract's internal perspective, but from an app developer's point of view feels a bit like poking around with a hex editor.

My takeaway from the discussion was that it would be useful for Tesseract to have an introspection API that provides two features in particular:

  1. Combining internal Tesseract actions into logical groups that are meaningful from an application point of view. Those groups could be called "transactions" (like in databases), or "commits" (like in git), or "changesets" (like in some other version control systems). Each group could have an attached "commit message" or description that describes the change in a human-readable way (for example, "moved card 'Fix WebRTC support' from 'In progress' to 'Done'" rather than "set field 'listId' on record '5c3a9e75-f118-41c2-8666-b38a5b994f18' to 5"). A group of changes may well correspond to an action in the Redux model.
  2. Querying the order of those changesets, as it would be displayed in a "commit log": ordering first by causal dependencies, ordering any concurrent changes arbitrarily (e.g. by wall-clock timestamp), and providing enough metadata so that the UI can join up the changes with lines to visualise branches and merges. A higher-level API should allow Omniview to do this without having to interpret vector clocks.


At the moment, the Tesseract API doesn't provide any mechanism for grouping together several related changes, nor giving them a human-readable description or commit message. So maybe we should have a facility for describing such grouped changes. The API might look something like this:

moveCard(state, action) {
  const cardIndex = state.cards.findIndex(card => card.id === action.cardId)
  const card = state.cards[cardIndex]
  const oldList = getListById(state, card.listId)
  const newList = getListById(state, action.listId)

  return Tesseract.changeset(state, 'Move card "' + card.title + '" from "' +
                  oldList.title + '" to "' + newList.title + '"', change => {
    return Tesseract.set(change.cards[cardIndex], "listId", action.listId)
  })
}

Besides facilitating introspection, such a commit message could also be useful for user-facing UI elements, such as Undo (displaying a description of what will be undone in the undo menu) and Revert (displaying a log of changes that others made, and allowing the user to revert them selectively).

Any thoughts?

Tesseract.push

I do this pattern a lot:

Tesseract.insert(state.cards, state.cards.length, { id: uuid(), title: "New Card"})

In fact, I have yet to insert an object in the middle of an array (though the app is still relatively simple). It'd be nice to have a convenience function like:

Tesseract.push(state.cards, { id: uuid(), title: "New Card" })
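The convenience function would just delegate to insert at the end of the list. Sketched here over a plain immutable array, with stand-ins for the Tesseract primitives rather than their actual signatures:

```javascript
// insert: returns a new list with `value` spliced in at `index`
// (stand-in for Tesseract.insert, which operates on document state).
const insert = (list, index, value) =>
  [...list.slice(0, index), value, ...list.slice(index)]

// push: insert at the end of the list.
const push = (list, value) => insert(list, list.length, value)
```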

Calling applyChanges twice with unresolved dependencies throws an exception

I've attached a PR #64

Here is an example:

const Automerge = require('automerge')

const alice1 = Automerge.init('alice')

const alice2 = Automerge.change(alice1, doc => {
  doc.test = 'a'
})

const aliceChanges1to2 = Automerge.getChanges(alice1, alice2)
console.log('Alice 1 to 2 changes %o', aliceChanges1to2)

const alice3 = Automerge.change(alice2, doc => {
  doc.test = 'b'
})

const aliceChanges2to3 = Automerge.getChanges(alice2, alice3)
console.log('Alice 2 to 3 changes %o', aliceChanges2to3)

const alice4 = Automerge.change(alice3, doc => {
  doc.test = 'c'
})

const aliceChanges3to4 = Automerge.getChanges(alice3, alice4)
console.log('Alice 3 to 4 changes %o', aliceChanges3to4)

const bob1 = Automerge.init('bob')

const bob2 = Automerge.applyChanges(bob1, aliceChanges3to4)

console.log('bob2', bob2)

const bob3 = Automerge.applyChanges(bob2, aliceChanges2to3)

// Boom!
$ node simpler.js 
Alice 1 to 2 changes [ { actor: 'alice',
    seq: 1,
    deps: {},
    message: undefined,
    ops: 
     [ { action: 'set',
         obj: '00000000-0000-0000-0000-000000000000',
         key: 'test',
         value: 'a' },
       [length]: 1 ] },
  [length]: 1 ]
Alice 2 to 3 changes [ { actor: 'alice',
    seq: 2,
    deps: {},
    message: undefined,
    ops: 
     [ { action: 'set',
         obj: '00000000-0000-0000-0000-000000000000',
         key: 'test',
         value: 'b' },
       [length]: 1 ] },
  [length]: 1 ]
Alice 3 to 4 changes [ { actor: 'alice',
    seq: 3,
    deps: {},
    message: undefined,
    ops: 
     [ { action: 'set',
         obj: '00000000-0000-0000-0000-000000000000',
         key: 'test',
         value: 'c' },
       [length]: 1 ] },
  [length]: 1 ]
bob2 { _objectId: '00000000-0000-0000-0000-000000000000' }
/Users/jim/tmp/break-automerge/node_modules/automerge/src/freeze_api.js:221
  Object.assign(Object.getPrototypeOf(rootObj), {_state: state, _actorId: state.get('actorId')})
         ^

TypeError: Cannot assign to read only property '_state' of object '#<Object>'
    at Function.assign (<anonymous>)
    at rootObject (/Users/jim/tmp/break-automerge/node_modules/automerge/src/freeze_api.js:221:10)
    at Object.applyChanges (/Users/jim/tmp/break-automerge/node_modules/automerge/src/freeze_api.js:249:10)
    at Object.applyChanges (/Users/jim/tmp/break-automerge/node_modules/automerge/src/auto_api.js:47:22)
    at Object.<anonymous> (/Users/jim/tmp/break-automerge/simpler.js:32:24)
    at Module._compile (module.js:660:30)
    at Object.Module._extensions..js (module.js:671:10)
    at Module.load (module.js:573:32)
    at tryModuleLoad (module.js:513:12)
    at Function.Module._load (module.js:505:3)

proposal: allow underscores

Currently underscores are not allowed in documents. This is unfortunate as it makes automerge less universal.

I understand this is to prevent accidental overwrites of tracking props, such as _actorId, _objectId, _conflicts and _state. When adding automerge to an existing application which heavily used underscores as part of another library, I ran into a conflict.

I think the best way to fix this is to use ES6 symbols, or, when symbols are not available, a configurable, namespaced property name like @@automerge/state
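A quick illustration of why symbols sidestep the collision (the symbol name below is made up): string keys and symbol keys live in separate namespaces, and symbols don't show up in `Object.keys` or `JSON.stringify`:

```javascript
const OBJECT_ID = Symbol('automerge.objectId')

// User data may freely contain underscore-prefixed string keys...
const doc = { _id: 'user data', _conflicts: 'also user data' }
// ...while internal metadata lives under a symbol and never collides.
doc[OBJECT_ID] = 'some-uuid'

const visibleKeys = Object.keys(doc)      // only the user's string keys
const serialised = JSON.stringify(doc)    // symbol-keyed entries are omitted
```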

I'd be happy to propose a pull request, but I think it's best to discuss it a bit first

  • are there any objections to this proposal?
  • are there any other solutions that may come to mind?
  • is the issue encountered by others? What were their solutions?

load/save round trip fails

let doc = automerge.init();
doc = automerge.change(doc, 'init', (doc) => {
  doc.foo = [];
});
doc = automerge.load(automerge.save(doc));
doc = automerge.change(doc, 'add', (doc) => {
  doc.foo.push(1);
});
doc = automerge.load(automerge.save(doc));

Fails with Error: Missing index entry for list element undefined:1

The docs are kinda unclear on how save() is supposed to work -- it says that save returns a string, but then the first example wraps it in a JSON.stringify. In my experiments it looks like save maybe just returns the thing you handed it?

friendlier error messages

I accidentally did Tesseract.set when I meant Tesseract.insert:

this.tesseract = Tesseract.set(this.tesseract.cards, this.tesseract.cards.length, card)

and got this error message:

[screenshot of the error message]

I'm guessing this will be a common developer error, is there a way to anticipate what the developer meant and give them a friendlier error message? For example, if the second argument should never be an integer for .set, can we raise an error like:

Tesseract.set must pass a prop name as the second argument, but you passed '1'. 
Did you mean to use Tesseract.insert?
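The check itself would be cheap: if the second argument looks like an array index, raise the friendlier message. This is a hypothetical helper to show the shape of the check, not the actual Tesseract code:

```javascript
function set(target, key, value) {
  // A property name should never be a number or a numeric string.
  if (typeof key === 'number' || /^\d+$/.test(key)) {
    throw new TypeError(
      `Tesseract.set expects a property name as the second argument, but ` +
      `got '${key}'. Did you mean to use Tesseract.insert?`)
  }
  return { ...target, [key]: value }
}
```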

random winner strategy of Automerge.merge

let doc1 = Automerge.change(Automerge.init(), doc => { doc.x = 1 })
let doc2 = Automerge.change(Automerge.init(), doc => { doc.x = 2 })
doc1 = Automerge.merge(doc1, doc2)
doc2 = Automerge.merge(doc2, doc1)
// Now, doc1 might be either {x: 1} or {x: 2} -- the choice is random.
// However, doc2 will be the same, whichever value is chosen as winner.

Hi

I am confused about why Automerge.merge uses the random-winner strategy when there are conflicts.

I think it could choose the value of the remote target as the winner directly.

merge does not work for two card additions

I created two forks of a store, added card "A" to one and card "B" to the other, but when I merged them together there was only card "A" in the resulting store. Here's the same scenario reproduced with just Tesseract:

Tesseract = require("tesseract")

s1 = Tesseract.init()
s1 = Tesseract.set(s1, "cards", [])

// Create two forks from s1
s2 = Tesseract.insert(s1.cards, 0, { title: "A" })
s3 = Tesseract.insert(s1.cards, 0, { title: "B" })

// Merge them together -- we expect two cards now
s4 = Tesseract.merge(s2, s3)

console.log(s4.cards[0].title) // => "A"
console.log(s4.cards[1])       // => undefined

merging non-sibling stores

What's the expected behavior when merging two non-sibling stores? e.g.

let s1 = new Tesseract.Store()
let s2 = new Tesseract.Store()

s1.root.cards = [{ title: "Card A"}]
s2.root.cards = [{ title: "Card B"}]

s1.merge(s2) // => [{title: "Card B"}]

It works correctly when the stores are both descended from the same store, but when they're not it seems to just clobber the first store. I think it would be fair if that's just not a supported action, but in that case should we throw a warning or error to the developer?

Get path of modified objects in diff?

When an Automerge document is changed as the result of applying changes from a peer, a UI usually needs to be updated. There are two strategies for updating the UI:

  • Rerender the UI
  • Modify the UI that is affected by the changes without rerendering

In the context of web applications using a virtual DOM (such as React), rerendering often works well. The virtual DOM's diffing algorithm can detect what won't change (most of the virtual dom), and will only rerender a subset of the UI.

There are however situations where this approach doesn't work well.

One situation is a UI with several independent text editors that can be collaboratively edited, and also
moved around. For example a card wall application where cards can be dragged around and also edited concurrently.

Consider the case where user A is editing the text in Card X while user B is moving Card X to a different location of the board.

The rerender with virtual DOM approach would detect that the X card has moved, and rerender it. As it's being re-rendered, the text editor component in the old location is removed from the DOM and a new one is created in the new location.

This causes user A to lose focus in their editor, which is not a good user experience.

A better approach would be to move the DOM node with the editor component to the new location, allowing user A to keep typing without losing focus while card X is being moved. For React applications this can be achieved with the technique described here (thanks for the pointer @flash1293), but if the application doesn't use React this poses a new challenge.

The application developer now needs to know what card was moved in the Automerge doc, so they can move the corresponding DOM node. This can be determined by examining the diff returned by Automerge.diff. The flat nature of this format makes this difficult. If the object representing the moved card is located deep inside the document it can be both difficult and slow to discover the old and new path. One would have to traverse the old Automerge doc and match _objectId properties with those in the diff.

Would it be possible to make Automerge return a diff that has more information about the path of the modified objects in a document? A path could be represented as an array of property names and indexes, as described here.
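For context, this is roughly the traversal one is forced to write today to recover a path from a diff's _objectId (a sketch over plain nested objects, not Automerge internals):

```javascript
// Depth-first search for the object with the given _objectId, returning the
// path to it as an array of property names and (stringified) array indexes.
function findPath(node, objectId, path = []) {
  if (node === null || typeof node !== 'object') return null
  if (node._objectId === objectId) return path
  for (const [key, child] of Object.entries(node)) {
    if (key === '_objectId') continue
    const found = findPath(child, objectId, path.concat(key))
    if (found !== null) return found
  }
  return null
}
```

If the diff already carried such a path array, the application could skip this O(document size) traversal on every change.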

can't get peers from store object

We're logging the Tesseract store's peers in the DocInspector for debugging and introspection; it looks like that doesn't exist anymore in the immutable API:

s1 = Tesseract.init()
s1.peers //=> undefined

immutable API doesn't include common array functions

None of the common array functions (map, forEach) exist on the immutable store objects, for example:

t.set(store, "cards", [])
store.cards.map // => undefined

I'm using map, forEach, find, filter, and findIndex throughout Trellis. @ept showed me a workaround for now, but we should add those to Tesseract.

Loops also don't seem to work though, and @ept mentioned it might not be possible to make them work with the current implementation:

t.set(store, "cards", [{id: 1, title: "Hello World"}])
for(let card in store.cards) { console.log(card) }
// => prints out all of the props on the Proxy object
//    instead of the card object

Behaviour of concurrently created objects under the same key

After some discussion around #1, I've been thinking about what the "minimally confusing" semantics for nested maps/objects should be.

Scenario: Several users have a shared Trellis document. The current version of Trellis does not have any configuration settings, it only stores cards and their positions. However, users have been demanding customisation, and so the developers of Trellis reluctantly release a new version that supports two configuration options for customising the theme of a project: a logo, and a background colour.

User Alice installs the new release, plays around with the configuration, and configures the app to have a custom logo. The code internally does the following:

// On Alice's device
if (!trellis.root.config) trellis.root.config = {};
trellis.root.config.logo_url = "https://example.com/logo.png";

Alice then goes offline. Later, Bob comes online, also installs the app update, and also plays around with the configuration. Since Alice is not online, he doesn't receive the config change made by Alice, so in his view of the data the configuration is still empty. He sets the background colour:

// On Bob's device
if (!trellis.root.config) trellis.root.config = {};
trellis.root.config.background = "blue";

Now Alice comes back online, and the two synchronise. What should happen? It seems like the two most reasonable outcomes are:

  1. The app arbitrarily picks one of the two configurations, i.e. either config: {logo_url: "https://example.com/logo.png"} or config: {background: "blue"}, and stashes away the other configuration under conflicts (where it will probably be ignored).
  2. The app merges together the two sets of configurations, which works fine in this case because Alice and Bob changed two different keys in the config object: config: {background: "blue", logo_url: "https://example.com/logo.png"}.

The current behaviour of Tesseract is (1), except that Tesseract chooses behaviour (2) for the root object. That is, if you used top-level keys rather than a nested config object, Tesseract would retain both:

// On Alice's device
trellis.root.config_logo_url = "https://example.com/logo.png";

// On Bob's device
trellis.root.config_background = "blue";

// Merged outcome:
trellis.root == {config_logo_url: "https://example.com/logo.png", config_background: "blue"}

To me, this merging behaviour seems more intuitive than the choose-one-or-the-other behaviour (1), and it seems strange to treat the root object differently from nested objects, as discussed in #1. Riak also chooses this merging behaviour. However, applying this merging behaviour systematically results in some weird edge-cases, as discussed in our paper:

(arXiv:1608.03960, "A Conflict-Free Replicated JSON Datatype")

I am now trying to figure out what a sensible middle ground might look like. Here is an idea, and I would love to hear what you think.

I suggest that within a map, when a key is assigned an object for the first time (that key did not previously exist), we record the fact that it was a first-time assignment. If several such first-time assignments occur concurrently, we merge the assigned objects recursively:

// On Alice's device
if (!trellis.root.config) trellis.root.config = {logo_url: "https://example.com/logo.png"};

// Concurrently, on Bob's device
if (!trellis.root.config) trellis.root.config = {background: "blue"};

// Merged outcome:
trellis.root == {config: {logo_url: "https://example.com/logo.png", background: "blue"}, ...}

However, deletions of a higher tree node take precedence over edits within the deleted subtree:

// Initially:
trellis.root == {config: {logo_url: "https://example.com/logo.png", background: "blue"}, ...}

// Alice resets to factory settings
delete trellis.root['config'];

// Concurrently, Bob changes a setting:
trellis.root.config.background = "red";

// Merged outcome, Bob's change having been lost:
trellis.root.config === undefined

Also, insertions of new objects into arrays are always independent from each other:

// Alice:
trellis.root.cards.push({title: "Collect underpants"})

// Bob:
trellis.root.cards.push({title: "Profit!"})

// Merged outcome: either
trellis.root.cards == [{title: "Collect underpants"}, {title: "Profit!"}]
// or
trellis.root.cards == [{title: "Profit!"}, {title: "Collect underpants"}]
// but never merging the two insertions

Unfortunately, merging objects may not always make sense, because the fields of an object may not be independent of each other. For example:

// Alice:
trellis.root.billing_address = {
  org: "Ink & Switch",
  city: "San Francisco",
  state: "CA",
  country: "United States"}

// Bob:
trellis.root.billing_address = {
  org: "Computer Laboratory",
  road: "15 JJ Thomson Avenue",
  city: "Cambridge",
  country: "United Kingdom"}

// Merged outcome, with two addresses nonsensically spliced together:
trellis.root.billing_address == {
  org: "Ink & Switch",
  road: "15 JJ Thomson Avenue",
  city: "San Francisco",
  state: "CA",
  country: "United Kingdom"}

Moreover, it is possible that rather than assigning a single field within an object, a user may update a copy of the object and assign the entire copy. Should this be treated as equivalent to single-field assignment?

// Bob changes a setting by assigning a single field in an object
trellis.root.config.background = "red";

// Is this equivalent to updating a copy of the object?
let config = Object.assign({}, trellis.root.config);
config.background = "red";
trellis.root.config = config;

In a single-user scenario, these two ways of updating the state would be almost equivalent, but in the current Tesseract implementation the second example would overwrite any concurrent configuration updates by other users. Should assigning an object maybe examine the differences between the old and the new object, and translate the change into the assignment of only those fields that have changed?
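As a sketch of that last idea, assigning a whole object could be translated into per-field assignments by diffing the old and new objects first. The `diffFields` and `assignChangedFields` helpers below are hypothetical, not part of Tesseract:

```javascript
// Sketch: translate assignment of a whole object into per-field updates,
// so that concurrent changes to *other* fields are not overwritten.
// These helpers are hypothetical, not part of the Tesseract API.
function diffFields(oldObj, newObj) {
  const changes = {};
  for (const key of Object.keys(newObj)) {
    if (oldObj[key] !== newObj[key]) changes[key] = newObj[key];
  }
  // Fields present in oldObj but absent from newObj become deletions.
  for (const key of Object.keys(oldObj)) {
    if (!(key in newObj)) changes[key] = undefined;
  }
  return changes;
}

function assignChangedFields(target, newObj) {
  const changes = diffFields(target, newObj);
  for (const [key, value] of Object.entries(changes)) {
    if (value === undefined) delete target[key];
    else target[key] = value;
  }
  return target;
}
```

With this approach, `trellis.root.config = config` on a copied-and-modified object would generate only the single-field operation `background = "red"`, making it equivalent to the direct single-field assignment.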

Sorry for the rambling; I don't really know how best to resolve this, and just wanted to start a discussion with a few examples. We don't have to answer it right now, but it would be good to keep this question in the back of our minds while working on Trellis, so I thought it would be useful to have it written up.

Gitter chatroom

It would be nice to have a Gitter channel so that we can all collaborate more directly.

Reuse objects that are not modified (structure-sharing)

The proxy objects currently returned by Tesseract have the property that you get a new one every time you look at the data structure, even if the data hasn't actually changed. It would be more efficient to detect which objects have changed and which haven't, and to return a reference-identical (=== operator returns true) object if nothing has changed. This fact could then also be used by React to determine which parts of the UI need to be refreshed.
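The classic way to get this property is path copying: an update copies only the objects along the path to the changed key, so every unchanged subtree keeps its reference identity. A minimal sketch (illustrative, not the actual Tesseract proxy code):

```javascript
// Sketch of structure sharing via path copying: setting a nested key
// copies only the objects along the path, so unchanged subtrees keep
// reference identity and `===` can be used to skip re-rendering them.
function setIn(obj, path, value) {
  if (path.length === 0) return value;
  const [head, ...rest] = path;
  const copy = {...obj};
  copy[head] = setIn(obj[head] ?? {}, rest, value);
  return copy;
}

const v1 = {config: {background: "blue"}, cards: [{title: "Profit!"}]};
const v2 = setIn(v1, ["config", "background"], "red");
// v2.config is a fresh object (it changed), but v2.cards === v1.cards,
// so React can tell at a glance that the cards list needs no refresh.
```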

API for communicating deltas

Tesseract requires an API through which it will integrate with the networking layer. It should provide a clean separation between the network and CRDT concerns, so that Tesseract doesn't need to know anything about WebRTC, and the network library doesn't need to know anything about CRDTs.

The starting point is a vector clock, which is a mapping from store UUIDs to sequence numbers:

vclock = {'1234...': 15, '4321...': 7, ...}

It simply means that we have seen 15 operations that originated on store '1234...', 7 operations that originated on '4321...', etc. Applying an operation that originated on another store does not count as a new operation. Only making a local edit creates a new operation and increments the sequence number for the store on which the edit was made.
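The "only local edits increment the clock" rule could be sketched like this (the function names and store shape are illustrative, not the proposed API):

```javascript
// Sketch: each store tracks a vector clock mapping store UUIDs to the
// number of operations it has seen from that store. Only a *local* edit
// allocates a new sequence number; applying a remote op merely records
// that the op has been seen.
function makeLocalOp(store, op) {
  const seq = (store.vclock[store.uuid] || 0) + 1;
  store.vclock[store.uuid] = seq;
  return {origin: store.uuid, seq, op};
}

function applyRemoteOp(store, remoteOp) {
  // No new sequence number is allocated for the receiving store.
  store.vclock[remoteOp.origin] =
    Math.max(store.vclock[remoteOp.origin] || 0, remoteOp.seq);
}
```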

The simplest protocol for exchanging deltas would then look something like this:

   Alice:                            Bob:

   alice.                            bob.
   getVClock()                       getVClock()

              \                    /
       Alice's  \                /  Bob's
          vector  \            /  vector
             clock  \        /  clock
       (aliceVClock)  \    /  (bobVClock)
                        \/
                       /  \
                     /      \
                   /          \
                 /              \
alice.         /                  \      bob.
getDeltas( <--'                    `-->  getDeltas(
  bobVClock)                               aliceVClock)

              \                    /
       Alice's  \                /  Bob's
          deltas  \            /  deltas
     (aliceDeltas)  \        /  (bobDeltas)
                      \    /
                        \/
                       /  \
                     /      \
                   /          \
alice.           /              \      bob.
applyDeltas( <--'                `-->  applyDeltas(
  bobDeltas)                             aliceDeltas)

In words, each peer first sends the other peer its vector clock ("this is what I know"); whenever it receives a vector clock from another peer, it generates a response containing any operations that other peer is missing ("this is what I know but you don't know yet"); and when it receives any operations from another peer, it applies them locally ("this is what I have learned from the other node").

This means that we would need three new functions:

  • tesseract.getVClock(store) returns the vector clock corresponding to the current state of store, encoded as a string so that it can be sent over the network to peers.
  • tesseract.getDeltas(store, vclock) takes a vector clock received from another peer, and returns the deltas that should be sent back to that peer, encoded as a string so that it can be sent over the network to peers.
  • tesseract.applyDeltas(store, deltas) takes a string of deltas as generated by getDeltas(), applies them to store, and returns a new store in which the deltas have been applied.

getVClock() and getDeltas() are read-only and don't change the store; only applyDeltas() returns a new copy of the store with changes applied.
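The whole exchange can be sketched over a simple in-memory operation log (the function bodies here are illustrative, not the proposed Tesseract internals):

```javascript
// Sketch of the three-function protocol over an in-memory op log.
// Each store keeps every op it knows, tagged with {origin, seq}.
function getVClock(store) {
  const vclock = {};
  for (const op of store.ops) {
    vclock[op.origin] = Math.max(vclock[op.origin] || 0, op.seq);
  }
  return JSON.stringify(vclock); // a string, so it can go over the network
}

function getDeltas(store, vclockStr) {
  const theirs = JSON.parse(vclockStr);
  // "What I know that you don't": ops beyond the peer's sequence numbers.
  const missing = store.ops.filter(op => op.seq > (theirs[op.origin] || 0));
  return JSON.stringify(missing);
}

function applyDeltas(store, deltasStr) {
  // Returns a new store rather than mutating, as in the proposed API.
  return {ops: store.ops.concat(JSON.parse(deltasStr))};
}

// The symmetric exchange between Alice and Bob:
let alice = {ops: [{origin: "A", seq: 1, op: "logo"}]};
let bob   = {ops: [{origin: "B", seq: 1, op: "background"}]};
alice = applyDeltas(alice, getDeltas(bob, getVClock(alice)));
bob   = applyDeltas(bob, getDeltas(alice, getVClock(bob)));
// After the exchange, both stores hold both operations.
```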

So far, so good. The added challenge arises if something has got screwed up somehow, so that (for example) one node thinks that sequence number 16 for node '1234...' means "assign card 1 to martin", whereas another node thinks that sequence number 16 for node '1234...' means "rename card 3 to foo". That would not merely be a conflict (because conflicting operations should always have different origin nodes), but a violation of the protocol.

This situation could happen, for example, if node '1234...' first generated the "assign card 1 to martin" operation and sent it to another node; then node '1234...' failed and was restored from a backup, and that backup only went up to sequence number 15; then the user performed the "rename card 3 to foo" action, and node '1234...' unwittingly reused the sequence number 16 to mean something different.

The first step towards solving this problem is to detect it. I propose doing that by including a hash of the operations in the encoded vector clock, and then getDeltas() and applyDeltas() can check that the sequence numbers and hashes match. Once a mismatch has been detected, we would need some recovery protocol that figures out exactly where the mismatch has occurred, and how to resolve it. My proposal would be to leave that recovery protocol out of scope for now, so we will merely detect a screwed-up state and raise an error. In future, we can try to automatically resolve it.

To be clear, this kind of inconsistency should not happen in normal operation (including during network partitions), but only happen due to bugs or amnesia (a device doing something and then forgetting that it did it). If we just raise it as an error for now, we can observe how often it actually occurs in practice, and solve it if it's enough of a problem.

Can undo/redo operation be provided in automerge?

Hi. I'm considering using automerge to implement a collaborative text-based editor. Automerge provides a general-purpose format (JSON), which I think is appropriate for representing the editor's document.

And I have two questions.

  • Features like undo/redo are required in the editor. Can we implement undo/redo using automerge?
  • Is there anything else to consider when implementing a text-based editor with automerge?

Thank you for wonderful work.
