
rbmhtechnology / eventuate


Global-scale event sourcing and event collaboration with causal consistency. (This project is in maintenance mode: only critical bugs will be fixed; there is no further feature development.)

Home Page: http://rbmhtechnology.github.io/eventuate/

License: Apache License 2.0

Scala 88.00% Shell 0.35% Python 0.57% HTML 1.30% CSS 0.05% JavaScript 0.37% Java 9.35%

eventuate's Introduction


Eventuate

Please note: This project has been stopped; no bug fixes, new features, etc. will be provided.

Eventuate is a toolkit for building applications composed of event-driven and event-sourced services that communicate via causally ordered event streams. Services can either be co-located on a single node or distributed up to global scale. Services can also be replicated with causal consistency and remain available for writes during network partitions.

Eventuate has a Java and Scala API, is written in Scala and is built on top of Akka, a toolkit for building highly concurrent, distributed, and resilient message-driven applications on the JVM. Eventuate

  • provides abstractions for building stateful event-sourced services, persistent and in-memory query databases and event processing pipelines
  • enables services to communicate over a reliable and partition-tolerant event bus with causal event ordering and distribution up to global scale
  • supports stateful service replication with causal consistency and concurrent state updates with automated and interactive conflict resolution
  • provides implementations of operation-based CRDTs as specified in A comprehensive study of Convergent and Commutative Replicated Data Types
  • supports the development of always-on applications by allowing services to be distributed across multiple availability zones (locations)
  • supports the implementation of reliable business processes from event-driven and command-driven service interactions
  • supports the aggregation of events from distributed services for updating query databases
  • provides adapters to 3rd-party stream processing frameworks for analyzing event streams
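
For illustration, here is a minimal sketch of the Scala API mentioned above: an event-sourced actor backed by a LevelDB event log. The Append/Appended messages and actor names are placeholders and not part of this README; consult the project documentation for the authoritative API.

import akka.actor.{ActorRef, ActorSystem, Props, Status}
import com.rbmhtechnology.eventuate.EventsourcedActor
import com.rbmhtechnology.eventuate.log.leveldb.LeveldbEventLog

import scala.util.{Failure, Success}

// Hypothetical commands and events of a minimal event-sourced service.
case class Append(entry: String)
case class Appended(entry: String)
case object Print

class ExampleActor(override val id: String, override val eventLog: ActorRef)
  extends EventsourcedActor {

  private var entries: Vector[String] = Vector.empty

  // Command handler: validates commands and persists corresponding events.
  override def onCommand: Receive = {
    case Append(entry) => persist(Appended(entry)) {
      case Success(_)   => sender() ! "ok"
      case Failure(err) => sender() ! Status.Failure(err)
    }
    case Print => println(s"[$id] ${entries.mkString(", ")}")
  }

  // Event handler: updates in-memory state from persisted and replicated events.
  override def onEvent: Receive = {
    case Appended(entry) => entries = entries :+ entry
  }
}

object Main extends App {
  implicit val system: ActorSystem = ActorSystem("location-1")
  val eventLog = system.actorOf(LeveldbEventLog.props("example-log"))
  val actor = system.actorOf(Props(new ExampleActor("example-1", eventLog)))
  actor ! Append("a")
  actor ! Print
}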

Documentation

Project

Community

eventuate's People

Contributors

agido-hundt, agido-jrieks, agido-schallaboeck, danbim, dispalt, eventuate-test, fijolekprojects, gabrielgiussi, ianclegg, kongo2002, krasserm, maekl, magro, mslinn, odd, purthaler, raboof, tvaroh, volkerstampa


eventuate's Issues

One event-sourced actor per CRDT instance

At the moment, there is one EventsourcedActor per CRDT type. While this is sufficient for CRDTs that don't require precondition checks atSource, it limits write throughput for those that do require these checks. In the latter case, the EventsourcedActor must have stateSync set to true, which effectively turns off write batching for a single actor.

By distributing CRDT instances of a given type across actors, their concurrent writes can be batched again by BatchingEventLog, increasing write throughput. The cost is more memory overhead per CRDT instance. Ideally, the user can configure whether to use the one-actor-per-CRDT-instance or the (current) one-actor-per-CRDT-type approach.

Another advantage of having one EventsourcedActor per CRDT instance is that instances can be loaded into and discarded from memory on demand. With a reasonably small number of updates per CRDT instance (e.g. < 500), snapshots may not be necessary either.
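
A possible shape of the one-actor-per-CRDT-instance approach, sketched with hypothetical names (not part of this issue): a manager lazily creates one event-sourced child actor per CRDT id, so instances can be loaded on demand and stopped again when idle. Each child is assumed to use its CRDT id as aggregateId so that it only replays the events of its own instance.

import akka.actor.{Actor, ActorRef, Props}

// Commands are addressed to a CRDT instance by id (hypothetical message type).
case class CrdtCommand(crdtId: String, op: Any)

// instanceProps builds the per-instance EventsourcedActor from the CRDT id
// and the shared event log.
class CrdtManager(eventLog: ActorRef, instanceProps: (String, ActorRef) => Props) extends Actor {
  private var instances: Map[String, ActorRef] = Map.empty

  def receive: Receive = {
    case CrdtCommand(crdtId, op) =>
      // Load the instance actor on demand and cache it.
      val instance = instances.getOrElse(crdtId, {
        val ref = context.actorOf(instanceProps(crdtId, eventLog), crdtId)
        instances += crdtId -> ref
        ref
      })
      instance forward op
  }
}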

Scale writes to a single replicated aggregate

This can be achieved by allowing an EventsourcedActor (= aggregate) to consume from multiple logs and write to only one of these logs. Aggregate replicas with the same aggregate id write to different logs but consume from all of them, so that distributed writes to the same aggregate id don't need to be merged into a single log. For example, with three (optionally replicated) logs A, B and C, and three aggregate replicas r1, r2 and r3:

  • r1 writes to A and reads from A, B and C
  • r2 writes to B and reads from A, B and C
  • r3 writes to C and reads from A, B and C

This means that the replicas need to be recovered from three logs (instead of one) and deterministic (= totally ordered) replay is no longer possible. However, by re-ordering concurrently replayed events based on their vector timestamps, causal ordering can still be achieved.
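
The re-ordering step can be sketched independently of Eventuate's replay machinery (names are hypothetical): sorting replayed events by the component sum of their vector timestamps yields a linear extension of the happened-before relation, which is sufficient for causal ordering.

object CausalReplay {
  // An event replayed from one of the logs, carrying its vector timestamp.
  case class ReplayedEvent(payload: Any, vectorTimestamp: Map[String, Long])

  // If e1 causally precedes e2, every entry of e1's timestamp is <= the
  // corresponding entry of e2's and at least one is strictly smaller, so
  // e1's component sum is strictly smaller and e1 is ordered first.
  // Concurrent events are ordered arbitrarily but consistently.
  def causalOrder(events: Seq[ReplayedEvent]): Seq[ReplayedEvent] =
    events.sortBy(_.vectorTimestamp.values.sum)

  def main(args: Array[String]): Unit = {
    val e1 = ReplayedEvent("e1", Map("A" -> 1L))            // written to log A
    val e2 = ReplayedEvent("e2", Map("A" -> 1L, "B" -> 1L)) // causally after e1, written to log B
    val e3 = ReplayedEvent("e3", Map("C" -> 1L))            // concurrent, written to log C
    val ordered = causalOrder(Seq(e2, e3, e1))
    assert(ordered.indexOf(e1) < ordered.indexOf(e2))       // e1 comes before e2
  }
}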

Topic-based event routing

#45 introduces basic routing capabilities. This should be extended to allow

  • producers to publish events to application-defined topics and
  • consumers to consume events from application-defined topics

Topic-based event routing must preserve causality of events and should be possible for event routing between actors that share a single, distributed log. Routing across different logs can be done with processors.

Furthermore, it should be investigated whether routing decisions must be made persistent, so that past routings are repeatable. On the other hand, a changed routing rule is somewhat equivalent to changed event handler logic, which isn't persistent either.
#130 could be used as an implementation basis.
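
One purely hypothetical implementation direction, in line with using the existing routing capabilities as a basis (none of the names below come from this issue): a topic could be encoded as a synthetic routing destination that producers attach to persisted events (compare the custom event routing issue below) and that consumers use as their aggregateId, so that default aggregate-id routing delivers exactly the topic's events.

import akka.actor.ActorRef
import com.rbmhtechnology.eventuate.EventsourcedView

// Encode a topic as a synthetic routing destination id.
object Topics {
  def destination(topic: String): String = s"topic:$topic"
}

case class Published(payload: String)

// A consumer subscribes to a topic by using its destination id as aggregateId.
class TopicConsumer(topic: String, override val id: String, override val eventLog: ActorRef)
  extends EventsourcedView {

  override val aggregateId: Option[String] = Some(Topics.destination(topic))

  override def onCommand: Receive = {
    case _ => () // no commands in this sketch
  }

  override def onEvent: Receive = {
    case Published(payload) => println(s"[$topic] $payload")
  }
}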

Batch update notifications

Update notifications from the event log to replication servers are too fine-grained at the moment. Update notifications should be sent in larger batches, especially when load is high. Under low and moderate load, update notifications should dynamically become more fine-grained again. This should significantly reduce the number of update notifications sent from a replication server to replication clients over the wire.

EventsourcedActorIntegrationSpec failure

[info] Eventsourced actors and views
[info] - must support conditional command processing *** FAILED ***
[info]   java.lang.AssertionError: assertion failed: expected a, found delayed-2
[info]   at scala.Predef$.assert(Predef.scala:165)
[info]   at akka.testkit.TestKitBase$class.expectMsg_internal(TestKit.scala:339)
[info]   at akka.testkit.TestKitBase$class.expectMsg(TestKit.scala:315)
[info]   at akka.testkit.TestKit.expectMsg(TestKit.scala:718)
[info]   at com.rbmhtechnology.eventuate.EventsourcedActorIntegrationSpec$$anonfun$2$$anonfun$apply$mcV$sp$8.apply$mcV$sp(EventsourcedActorIntegrationSpec.scala:237)
[info]   at com.rbmhtechnology.eventuate.EventsourcedActorIntegrationSpec$$anonfun$2$$anonfun$apply$mcV$sp$8.apply(EventsourcedActorIntegrationSpec.scala:214)
[info]   at com.rbmhtechnology.eventuate.EventsourcedActorIntegrationSpec$$anonfun$2$$anonfun$apply$mcV$sp$8.apply(EventsourcedActorIntegrationSpec.scala:214)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer$$anonfun$apply$1.apply(Transformer.scala:22)
[info]   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
[info]   ...

One aggregate per actor pattern

Efficiently support the pattern of one DDD aggregate per EventsourcedActor. The example application still follows the many-aggregates-per-actor pattern but has already been re-written by @remcobeckers to follow the one-aggregate-per-actor pattern (see also this discussion). Eventuate should support efficient implementations of this pattern by providing the following enhancements:

  • the event log must be indexed by aggregate id for efficient recovery of individual aggregates
  • if an aggregate actor writes an event, the event log should only notify those aggregate actor replicas with the same aggregate id (and not all other actors subscribed to the event log). Nevertheless, EventsourcedViews or non-aggregate EventsourcedActors continue to receive all events written to the event log (to support the one view for many actors pattern, for example).
  • implement EventsourcedAggregate as a specialization of EventsourcedActor to support implementing the one-aggregate-per-actor pattern (see the sketch after this list). Update: EventsourcedActor has an optional aggregateId.
  • consider implementing aggregate services similar to CRDT services to free applications from dealing with low-level details.
  • the one aggregate per actor prototype is already implemented in a way that vector clock sizes scale with the number of replicas only instead of the potentially large number of aggregates. Ensure that an implementation of EventsourcedAggregate cannot break this property.
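
A minimal sketch of the one-aggregate-per-actor pattern with the optional aggregateId (domain types are hypothetical): replicas sharing the same aggregateId receive each other's events, and recovery only needs the events routed to that aggregate id.

import akka.actor.ActorRef
import com.rbmhtechnology.eventuate.EventsourcedActor

case class CreateOrder(orderId: String)
case class OrderCreated(orderId: String)

class OrderActor(orderId: String, override val eventLog: ActorRef)
  extends EventsourcedActor {

  override val id: String = s"order-$orderId"
  // Routing key of this aggregate: its replicas use the same aggregateId.
  override val aggregateId: Option[String] = Some(orderId)

  private var created = false

  override def onCommand: Receive = {
    case CreateOrder(oid) if !created =>
      persist(OrderCreated(oid)) { _ => sender() ! s"order $oid created" }
  }

  override def onEvent: Receive = {
    case OrderCreated(_) => created = true
  }
}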

More detailed copyright & license header

At the moment, the copyright and license header at the top of every source file is very sparse. We should add the Red Bull Media House URL to the copyright line and also add the license text proposed by the Apache 2.0 license.

Full header that should be applied:

Copyright (C) 2015 Red Bull Media House GmbH <http://www.redbullmediahouse.com> - all rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); 
you may not use this file except in compliance with the License. 
You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed 
under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES 
OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License.

Replication/Transport encryption

As EventsourcedActors are most likely deployed in different locations and the log is replicated over the internet, it would be very beneficial to have encryption between two replication endpoints.

Multiple logs per replication endpoint

A replication endpoint should support the replication of multiple logs. Here, each log represents a partition and causality is preserved only within a partition.

In the following example, two partitions, named "log1" and "log2", are created for the local replication endpoint and connections are established to remote replication endpoints listening at 127.0.0.1:2553 and 127.0.0.1:2554. The system expects that remote endpoints also have partitions with the same names.

// Imports assume Eventuate's eventual package layout:
import com.rbmhtechnology.eventuate.{ReplicationConnection, ReplicationEndpoint}
import com.rbmhtechnology.eventuate.log.leveldb.LeveldbEventLog

val logNames = Set("log1", "log2")
val connections: Seq[ReplicationConnection] = Seq(
  ReplicationConnection("127.0.0.1", 2553),
  ReplicationConnection("127.0.0.1", 2554))

// One local log (= partition) is created per name via the log factory;
// `system` is the hosting ActorSystem.
new ReplicationEndpoint("my-endpoint-id", logNames, id =>
  LeveldbEventLog.props(id), connections)(system)

Log partitioning is not only useful for partitioning events (for example, to support multiple tenants) but can also be used to scale writes (assuming this is also supported by the underlying storage). However, having multiple logs per replication endpoint is not a prerequisite for horizontal scaling; this can also be achieved by deploying logs (i.e. partitions) on different machines, each having its own replication endpoint, for example.

Batching layer on top of event log

For batching writes of

  • multiple event sourced actors running with sync = true
  • a single or multiple event sourced actors running with sync = false

The batching layer must be optimized for low latency, with batch sizes growing (up to a maximum) under increasing load.

The batching layer has no effect when running a single event sourced actor with sync = true, because such an actor submits its next write only after the previous write has finished.

API cleanup

  • Clean separation of public from private API
  • Drop backwards compatibility to initial prototype API

Custom event routing

After #36 is implemented, events are routed to Eventsourced actors (= destinations) based on the following default rules:

  1. if destination.aggregateId is not defined, the destination will receive all events, regardless of whether event.aggregateId is defined or not.
  2. if destination.aggregateId is defined, the destination will only receive events with matching event.aggregateId.

This shall be generalized to allow custom routing to destinations. When persisting events, EventsourcedActors should be able to specify a set of additional aggregate ids as routing destination. The default routing behavior as described above should be preserved.

An EventsourcedActor should also be able to opt out of routing events to replicas with the same aggregate id (by specifying an empty set). In this case, the only consumer of the event is the emitting EventsourcedActor. Routing of events to destinations that have no aggregate id defined cannot be overridden. These destinations must always receive all events from the event log (but can of course choose to ignore them).
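
A sketch of the proposed custom routing from the caller's perspective. The parameter name customDestinationAggregateIds is an assumption (modeled on the persist signature Eventuate later provided), and the account domain is hypothetical: the event is delivered to this actor's own replicas by default routing and additionally to the aggregates listed explicitly.

import akka.actor.ActorRef
import com.rbmhtechnology.eventuate.EventsourcedActor

case class TransferMoney(from: String, to: String, amount: BigDecimal)
case class MoneyTransferred(from: String, to: String, amount: BigDecimal)

class AccountActor(accountId: String, override val eventLog: ActorRef)
  extends EventsourcedActor {

  override val id: String = s"account-$accountId"
  override val aggregateId: Option[String] = Some(accountId)

  override def onCommand: Receive = {
    case TransferMoney(from, to, amount) =>
      // Route the event to the counterpart account in addition to the defaults.
      persist(MoneyTransferred(from, to, amount), customDestinationAggregateIds = Set(to)) { _ => () }
  }

  override def onEvent: Receive = {
    case MoneyTransferred(_, _, _) => () // balance update omitted in this sketch
  }
}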

Limit vector clock size

Vector clock size should scale with the number of actual event collaborators and not with the total number of EventsourcedActors on the same distributed log. In other words, only those events that are actually handled by an EventsourcedActor contribute to its vector clock.

CRDT write throughput optimizations

Counter and MVRegister can be operated in an EventsourcedActor that is configured with sync = false, as these CmRDTs don't need to check preconditions in update phase 1. This optimization has a significant impact on throughput.
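
A sketch of this optimization (message names are hypothetical): a counter CRDT actor that overrides stateSync so that command processing does not wait for the previous write, allowing the batching layer to group many concurrent update writes.

import akka.actor.ActorRef
import com.rbmhtechnology.eventuate.EventsourcedActor

import scala.util.Success

case class UpdateOp(delta: Long)
case class UpdatedOp(delta: Long)

class CounterCrdtActor(crdtId: String, override val eventLog: ActorRef)
  extends EventsourcedActor {

  override val id: String = s"crdt-counter-$crdtId"
  override val aggregateId: Option[String] = Some(crdtId)

  // No atSource precondition depends on up-to-date state, so there is no
  // need to synchronize command processing with event persistence.
  override def stateSync: Boolean = false

  private var value: Long = 0L

  override def onCommand: Receive = {
    case UpdateOp(d) => persist(UpdatedOp(d)) {
      case Success(_) => sender() ! value
      case _          => ()
    }
  }

  override def onEvent: Receive = {
    case UpdatedOp(d) => value += d
  }
}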
