heckj / crdt

Conflict-free Replicated Data Types in Swift

Home Page: https://swiftpackageindex.com/heckj/CRDT/main/documentation/crdt

License: MIT License

Swift 98.06% Shell 1.94%

crdt-implementations swift crdt crdts

crdt's Introduction

CRDT

An implementation of ∂-state based Conflict-free Replicated Data Types (CRDT) in the Swift language.

Overview

This library implements well-known state-based CRDTs, sometimes described as convergent replicated data types (CvRDTs), as Swift generics. The implementation includes delta-state replication functions, which allow for more compact representations when syncing between collaboration endpoints. The alternative is to replicate the entire state for every sync.
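To make the difference between full-state and delta-state replication concrete, here is a minimal, self-contained sketch of a grow-only counter. The type `SketchGCounter` and its methods are hypothetical names invented for illustration and are not this library's API; it only assumes that each replica tracks a per-replica count and that merging takes the per-replica maximum.

```swift
// Hypothetical sketch for illustration; not the API of this package.
struct SketchGCounter {
    // Per-replica counts; the counter's value is their sum.
    private(set) var counts: [String: UInt] = [:]
    let replicaId: String

    init(replicaId: String) {
        self.replicaId = replicaId
    }

    var value: UInt { counts.values.reduce(0, +) }

    mutating func increment() {
        counts[replicaId, default: 0] += 1
    }

    // Merge keeps the per-replica maximum, so it is commutative,
    // associative, and idempotent. It accepts a full state or a delta.
    mutating func merge(_ other: [String: UInt]) {
        for (id, count) in other {
            counts[id] = max(counts[id] ?? 0, count)
        }
    }

    // Delta: only the entries where this replica is ahead of the remote state.
    func delta(since remote: [String: UInt]) -> [String: UInt] {
        counts.filter { entry in entry.value > (remote[entry.key] ?? 0) }
    }
}

var a = SketchGCounter(replicaId: "a")
var b = SketchGCounter(replicaId: "b")
a.increment()
a.increment()
b.increment()

// b shares its state; a answers with only the entries b is missing.
let deltaForB = a.delta(since: b.counts)
b.merge(deltaForB)
print(deltaForB.count, b.value)
```

In this sketch the delta carries a single entry instead of the whole dictionary, which is the size saving the delta-state functions aim for.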

The CRDT API documentation is hosted at the Swift Package Index.

G-Counter (grow-only counter)
PN-Counter (A positive-negative counter)
LWW-Register (last write wins register)
G-Set (grow-only set)
OR-Set (observed-remove set, with LWW add bias)
OR-Map (observed-remove map, with LWW add or update bias)
List (causal-tree list)

For more information on CRDTs, the Wikipedia page on CRDTs is quite good. I'd also suggest the website CRDT.tech as a wonderful collection of further resources. The implementations within this library were heavily based on algorithms described in Conflict-free Replicated Data Types by Nuno Preguiça, Carlos Baquero, and Marc Shapiro (2018), and heavily influenced/sourced from the package ReplicatingTypes, created by Drew McCormack, used under license (MIT).

What's Different about this Package

The two most notable changes from Drew's code are:

consistently exposing the type used to identify the collaboration instance (be that person, process, or machine) as a generic type
adding explicit delta-state transfer mechanisms so that you don't need to transfer the entirety of a CRDT instance to another location in order to merge the data.
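As a rough illustration of the first point, the sketch below makes the actor identifier a generic parameter of a last-write-wins register. `SketchLWWRegister` and its members are hypothetical and are not taken from this package; the sketch assumes timestamps are supplied by the caller and that ties are broken by comparing actor IDs.

```swift
// Hypothetical sketch for illustration; not the API of this package.
struct SketchLWWRegister<ActorID: Hashable & Comparable, Value> {
    private(set) var value: Value
    private(set) var timestamp: UInt64
    private(set) var writer: ActorID
    let selfId: ActorID

    init(_ value: Value, actorId: ActorID, timestamp: UInt64 = 0) {
        self.value = value
        self.timestamp = timestamp
        self.writer = actorId
        self.selfId = actorId
    }

    mutating func set(_ newValue: Value, at time: UInt64) {
        value = newValue
        timestamp = time
        writer = selfId
    }

    // The later timestamp wins; equal timestamps are resolved by actor ID
    // so that every replica picks the same winner.
    mutating func merge(_ other: SketchLWWRegister) {
        let otherWins = other.timestamp > timestamp
            || (other.timestamp == timestamp && other.writer > writer)
        if otherWins {
            value = other.value
            timestamp = other.timestamp
            writer = other.writer
        }
    }
}

// The same type works with String actor IDs...
var left = SketchLWWRegister("draft", actorId: "alice")
var right = SketchLWWRegister("draft", actorId: "bob")
left.set("from alice", at: 5)
right.set("from bob", at: 5)
left.merge(right)
right.merge(left)

// ...or with integer actor IDs.
var numbered = SketchLWWRegister(1, actorId: UInt(31))
numbered.set(2, at: 1)
print(left.value, right.value, numbered.value)
```

Keeping the identifier generic lets the caller choose between compact IDs (such as `UInt`) and descriptive ones (such as `String` or `UUID`), which affects the serialized size of every entry.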

Like the ReplicatingTypes package, this package is available under the MIT license for you to use as you like, asking only for recognition that it was sourced.

If your goal is creating local-first software, this implementation is a start, but (in my opinion) incomplete for those needs. In particular, it includes none of the serialization optimizations that would reduce the space needed by instances when they are serialized in their entirety for storage. It also includes none of the optimizations that other libraries (for example Automerge or Yjs) use to reduce the memory overhead needed to support longer-form collaborative text interactions.

These limitations may change in the future, and contributions are welcome.

Alternative Packages and Libraries

Other Swift implementations of CRDTs:

https://github.com/appdecentral/replicatingtypes
- related article: Conflict-Free Replicated Data Types (CRDTs) in Swift
https://github.com/bluk/CRDT
https://github.com/jamztang/CRDT
https://github.com/archagon/crdt-playground
- related article: Data Laced with History: Causal Trees & Operational CRDTs
Objc.io video series: CRDTs – Introduction

Two very well established CRDT libraries used for collaborative text editing:

Automerge
- (video) CRDTs: The Hard Parts by Martin Kleppmann
Y.js (and its multi-language port Y-CRDT)
- Yrs data structure internals: https://bartoszsypytkowski.com/yrs-architecture/

Optimizations

Articles discussing tradeoffs, algorithm details, and performance, specifically for sequence based CRDTs:

Benchmarks

Running the library:

swift run -c release crdt-benchmark library run Benchmarks/results.json --library Benchmarks/Library.json --cycles 5 --mode replace-all
swift run -c release crdt-benchmark library render Benchmarks/results.json --library Benchmarks/Library.json --output Benchmarks

Current Benchmarks

There are also stubbed benchmarks using package-benchmark under the ExternalBenchmarks directory. These additional benchmarks are primarily one-dimensional and DO require an additional library (jemalloc) to be installed in order to run. If you just want to explore, the .devContainer setting in this repository includes that library, so it's easy to try this out from within VSCode and Docker. To explore the one-dimensional external benchmarks:

cd ExternalBenchmarks
swift package benchmark


crdt's Issues

add initializer to ORMap that takes a dictionary

API docs for ORMap act as though they're Set. Copy/pasta errors...

(more) performant memory storage

Look into using one of the newer pieces of Swift Collections, specifically the pending-1.1-release HashTreeCollections, which provides an optimized append-only structure that sounds like it would be lovely for the backing store of operations. (Right now, it's just using Set<> which, while functional, might not provide the best operations for scaled-up sets of data.)

To verify the performance aspects, I think I'd want to bolster the benchmarks applied here, and seriously consider some of the classic CRDT benchmark structures, such as the string-sequence one from Martin Kleppmann (encoded at https://github.com/automerge/automerge-perf).

ORSet and GSet need more replication verification tests

While looking at sizing numbers, I ran across the following deltas and numbers (GSet and ORSet):

let orset_1 = ORSet(actorId: UInt(31), [1, 2, 3, 4])
let orset_2 = ORSet(actorId: UInt(31), [4, 5])

print("ORSet1(4 elements, UInt actorId) = \(orset_1.sizeInBytes())")
print("ORSet2(2 elements, UInt actorId) = \(orset_2.sizeInBytes())")
print("ORSet1's state size: \(orset_1.state.sizeInBytes())")
print("ORSet2's state size: \(orset_2.state.sizeInBytes())")
print("Delta size of Set1 merging into Set2: \(orset_2.delta(orset_1.state).sizeInBytes())")
print("Delta size of Set2 merging into Set1: \(orset_1.delta(orset_2.state).sizeInBytes())")

Delta Numbers for GSet:

GSet1(4 elements, UInt actorId) = 64
GSet2(2 elements, UInt actorId) = 40
GSet1's state size: 48
GSet2's state size: 48
Delta size of Set1 merging into Set2: 24
Delta size of Set2 merging into Set1: 40

Delta Numbers for ORSet:

ORSet1(4 elements, UInt actorId) = 116
ORSet2(2 elements, UInt actorId) = 66
ORSet1's state size: 16
ORSet2's state size: 16
Delta size of Set1 merging into Set2: 0
Delta size of Set2 merging into Set1: 50

The 0 for Set1 merging into Set2 is a notable issue: it means something in the delta/state computation logic is broken.
ORSet has fairly complicated state, delta, and diffing logic, so it's best to work the corners there and verify it's all operating as expected.
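One way to frame such a verification is as a property: merging a delta into a replica should give the same result as merging the full state. The sketch below checks that property against a stand-in grow-only set. `SketchGSet` and `deltaMatchesFullMerge` are hypothetical names, not this library's types; a real test would use the package's own GSet and ORSet inside XCTest.

```swift
// Hypothetical stand-in type for illustration; not the API of this package.
struct SketchGSet<T: Hashable>: Equatable {
    private(set) var elements: Set<T> = []

    init(_ initial: [T] = []) {
        elements = Set(initial)
    }

    mutating func insert(_ element: T) {
        elements.insert(element)
    }

    // Merging a grow-only set is a union, for full states and deltas alike.
    mutating func merge(_ other: Set<T>) {
        elements.formUnion(other)
    }

    // Delta: the elements the remote replica has not seen yet.
    func delta(since remote: Set<T>) -> Set<T> {
        elements.subtracting(remote)
    }
}

// Property under test: merging a's delta into b equals merging a's full state into b.
func deltaMatchesFullMerge<T: Hashable>(_ a: SketchGSet<T>, _ b: SketchGSet<T>) -> Bool {
    var viaFullState = b
    viaFullState.merge(a.elements)
    var viaDelta = b
    viaDelta.merge(a.delta(since: b.elements))
    return viaFullState == viaDelta
}

let set1 = SketchGSet([1, 2, 3, 4])
let set2 = SketchGSet([4, 5])

// Each replica holds elements the other lacks, so neither delta should be empty.
let deltaInto2 = set1.delta(since: set2.elements)
let deltaInto1 = set2.delta(since: set1.elements)
print(deltaInto2.isEmpty, deltaInto1.isEmpty)
print(deltaMatchesFullMerge(set1, set2), deltaMatchesFullMerge(set2, set1))
```

A check of this shape would flag an empty delta between two replicas that differ, since the delta path and the full-state path would then produce different results.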

resolve the interleaving anomaly

Reference paper: https://martin.kleppmann.com/papers/interleaving-papoc19.pdf

It's a (common) corner case in simultaneous merges.

investigate using benchmark package

Package-benchmark (ref: https://forums.swift.org/t/package-benchmark-0-6-0-released/63103) released 0.6 recently, and it didn't exist when I was first starting this. I'm currently using collections-benchmark, but I'd like to investigate using this newer one for the one-dimensional benchmarks that make sense. I'm also insanely curious about what the built-in histogram might make possible, such as visual insights into distributions of calls.

add CustomStringConvertible to types...

Debugging the internals is kind of messy and verbose when using the default string interpolation, so I think there'd be some benefit to adding CustomStringConvertible conformance to a number of the types, especially around state and delta generation for the CRDTs.
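A conformance along these lines might look like the following sketch. `SketchCounterState` is a hypothetical stand-in for one of the state or delta types, and the output format is only one possible choice.

```swift
// Hypothetical stand-in for a CRDT state type; not the API of this package.
struct SketchCounterState: CustomStringConvertible {
    let counts: [String: UInt]

    // Sort by key so the output is stable across runs, since dictionary
    // iteration order is not guaranteed.
    var description: String {
        let entries = counts
            .sorted { $0.key < $1.key }
            .map { "\($0.key): \($0.value)" }
            .joined(separator: ", ")
        return "CounterState(\(entries))"
    }
}

let state = SketchCounterState(counts: ["b": 1, "a": 2])
print("\(state)")  // prints "CounterState(a: 2, b: 1)"
```

Sorting the entries keeps the description deterministic, which also makes it usable in test failure messages.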

create an iCloud file sync example showing the use of CRDT

Partially to show how to use the library, but also to work out more kinks: dig around in iCloud/CloudKit and set up an iCloud-based app that synchronizes a file and loads/stores the CRDT information from it, including reloading and merging on updates. The goal is to see whether that works and whether edits step on each other as two collaborators edit a single, simple document.

support retaining the full history for LWWRegister, ORMap, and ORSet

Currently, the history is optimized out and dropped on delta updates (or more specifically, generating a delta for these types only includes the most recent value - not all the possible changes that have happened since a previous update).

However, if you want to be able to view the state of the CRDT at any arbitrary point in its history, then that additional detail becomes worth maintaining, even though the overall CRDT size can become notably larger.

The current logic would require a few updates to maintain this history, but it should be pretty directly doable as an initialization option.
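As a rough sketch of what such an initialization option could look like, the register below keeps superseded values only when asked to. `SketchHistoryRegister`, `retainHistory`, and `value(at:)` are hypothetical names, and the sketch covers local writes only; it does not model how retained history would be carried in deltas or merged between replicas.

```swift
// Hypothetical sketch for illustration; not the API of this package.
struct SketchHistoryRegister<Value> {
    struct Entry {
        let value: Value
        let timestamp: UInt64
    }

    private(set) var current: Entry
    private(set) var history: [Entry] = []
    let retainHistory: Bool

    init(_ value: Value, timestamp: UInt64 = 0, retainHistory: Bool = false) {
        self.current = Entry(value: value, timestamp: timestamp)
        self.retainHistory = retainHistory
    }

    // Only newer writes are accepted, so history stays ordered by timestamp.
    mutating func set(_ value: Value, at timestamp: UInt64) {
        guard timestamp > current.timestamp else { return }
        if retainHistory {
            history.append(current)
        }
        current = Entry(value: value, timestamp: timestamp)
    }

    // The value that was current at the given time, if it is still known.
    func value(at timestamp: UInt64) -> Value? {
        let all = history + [current]
        return all.last(where: { $0.timestamp <= timestamp })?.value
    }
}

var withHistory = SketchHistoryRegister("v0", retainHistory: true)
withHistory.set("v1", at: 10)
withHistory.set("v2", at: 20)

var withoutHistory = SketchHistoryRegister("v0")
withoutHistory.set("v1", at: 10)
withoutHistory.set("v2", at: 20)

print(withHistory.value(at: 15) ?? "unknown")     // prints "v1"
print(withoutHistory.value(at: 15) ?? "unknown")  // prints "unknown"
```

The trade-off described above shows up directly: the retaining register can answer point-in-time queries, at the cost of storing one entry per accepted write.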

Is the constraint on List.T conforming to Comparable necessary?

Apologies if this isn't the right venue to ask the question, but is it really necessary for the elements of List to be comparable? I don't see anywhere that the elements are being compared against one another, and commenting out the constraint doesn't prevent compilation or cause tests to fail.