bbva / qed

The scalable, auditable and high-performance tamper-evident log project

Home Page: https://qed.readthedocs.io/

License: Apache License 2.0

Go 92.75% Shell 2.32% HCL 2.28% Python 0.13% C++ 0.68% Dockerfile 0.20% C 1.65%
merkle-tree sparse-merkle-tree cryptography forensics latin byzantine-failures tamper-evident verifiable-data-structures verify lsm-tree

qed's Introduction

QED - Scalable, auditable and high-performance tamper-evident log

QED is open-source software that allows you to establish trust relationships by leveraging verifiable cryptographic proofs.

It can be used in multiple scenarios:

  • Data transfers.
  • System (or application or business) logging.
  • Distributed business transactions.
  • Etc.

QED guarantees that the system itself, even when deployed into a non-trusted server, cannot be modified without being detected. It also provides verifiable cryptographic proofs in logarithmic relation (time and size) to the number of entries.
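
To illustrate what a proof of logarithmic size looks like, here is a minimal sketch using a generic binary Merkle tree. It is not QED's history or hyper tree; the helper names (hashLeaf, hashNode, buildLevels, auditPath, verify) are assumptions made for this example, and the number of leaves is assumed to be a power of two.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hashLeaf and hashNode use different prefixes so a leaf can never be
// confused with an interior node.
func hashLeaf(data []byte) []byte {
	h := sha256.Sum256(append([]byte{0x00}, data...))
	return h[:]
}

func hashNode(left, right []byte) []byte {
	buf := append([]byte{0x01}, left...)
	buf = append(buf, right...)
	h := sha256.Sum256(buf)
	return h[:]
}

// buildLevels returns every level of the tree, leaves first.
// Assumption: len(leaves) is a power of two.
func buildLevels(leaves [][]byte) [][][]byte {
	level := make([][]byte, len(leaves))
	for i, l := range leaves {
		level[i] = hashLeaf(l)
	}
	levels := [][][]byte{level}
	for len(level) > 1 {
		next := make([][]byte, len(level)/2)
		for i := range next {
			next[i] = hashNode(level[2*i], level[2*i+1])
		}
		levels = append(levels, next)
		level = next
	}
	return levels
}

// auditPath collects the sibling hash at each level below the root.
func auditPath(levels [][][]byte, index int) [][]byte {
	var path [][]byte
	for _, level := range levels[:len(levels)-1] {
		path = append(path, level[index^1])
		index /= 2
	}
	return path
}

// verify recomputes the root from a leaf and its audit path.
func verify(root, leaf []byte, index int, path [][]byte) bool {
	h := hashLeaf(leaf)
	for _, sibling := range path {
		if index%2 == 0 {
			h = hashNode(h, sibling)
		} else {
			h = hashNode(sibling, h)
		}
		index /= 2
	}
	return bytes.Equal(h, root)
}

func main() {
	leaves := make([][]byte, 1024)
	for i := range leaves {
		leaves[i] = []byte(fmt.Sprintf("event-%d", i))
	}
	levels := buildLevels(leaves)
	root := levels[len(levels)-1][0]
	path := auditPath(levels, 42)
	// 1024 entries need only 10 sibling hashes in the proof.
	fmt.Println(len(path), verify(root, leaves[42], 42, path))
}
```

With 1024 entries the proof holds 10 hashes, and doubling the number of entries adds just one more.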

QED is scalable, resilient and ops friendly:

  • Designed to manage billions of events per shard
  • Over 2000 operations per second per shard under sustained load
  • Consistent replication through RAFT
  • Operable and instrumented with dozens of metrics
  • Zero config files, fully documented single binary

Documentation

You can find the complete documentation at https://qed.readthedocs.io/

Project code

You can find the project code on GitHub: https://github.com/BBVA/qed

Authors

QED was made by Hyperscale BBVA-Labs Team.

License

QED is Open Source and available under the Apache 2 license.

Contributions

Contributions are very welcome. See docs/source/contributing/contributing.rst or skim existing tickets to see where you could help out.

qed's People

Contributors

aalda, cr0hn, gdiazlo, iknite, jbpratt, panchoh, suizman, tuxillo

qed's Issues

Feature: bulk inserts

Tasks:

  • Api: new handler.
  • Api: new protocol definitions.
  • Client: AddBulk
  • Wal: new command.
  • Wal: Apply method.
  • Fsm: Apply method.
  • Fsm: Benchmarking.
  • Balloon: Apply method.
  • Balloon: New struct "snapshotBulk".
  • Balloon: Benchmarking.
  • Riot: update.
  • Improve error handling.

Unify cli commands

We should use a single base command for every interaction with the server. Therefore, the server command should be integrated into qed.

The sub-command hierarchy would be:

qed
  |-- start
  |-- stop
  |-- client
            |-- add
            |-- membership
  |-- auditor

The start sub-command will include flags to enable the profiling and tampering endpoints, and a flag to launch the server in the background (-d):

qed start -d --profiling --tampering

QED client library topology discovery timeout

The configuration parameter DiscoveryTimeout is not used inside client.discover(). When creating a client with the default options, changing that timeout and setting an incorrect endpoint, the client creation hangs on the discover operation instead of timing out.

Investigate performance drop

It seems the latest Go version gives us at least a 10% performance drop in unit benchmarks, which is amplified in e2e benchmarks.

We need to read the changes from one version to another. We also need to identify which parts of our code are causing this and analyze if we can change anything to mitigate the effect.

Simplify Position interface

We should reduce the number of exposed methods in the Position abstraction, because it should be designed for use outside the trees. For instance, methods like ShouldBeCached, Key or Height could be internal to each tree implementation.

Pruners should return errors

We are swallowing some errors, or failing with panics, when the Prune() methods are executed to build the pruned trees.

Prune() common.Visitable

We should change the method signature to return some errors that should be handled at upper layers.

Prune() (common.Visitable, error)

Implement a server component

We should bind the start and stop operations of all HTTP endpoints together under a single struct. Also, this component should be able to react to SIGTERM signals and do a graceful shutdown.

Design: Start() and Stop()/Shutdown()

I think any component with Start() and Stop() methods should make those methods non-blocking. We would only need to block in components that have no Stop() method.

Start should create a goroutine and Stop should end it, without any leaks and without the user of the API needing to know the internals used to do that.

Thoughts?

Remove index table

We are using the index table to map from event hashes to history tree versions, but that responsibility should be exclusive to the hyper tree, given that it now stores the raw version in the shortcut leaves.

In this manner, we could eliminate the need for another table to support fast mappings. With this change, every membership operation must first query the hyper tree before generating the audit path from the history tree, and thus incurs a latency penalty. However, given that the hyper tree is the only one that holds a lock for queries, in theory it shouldn't reduce balloon's throughput.

This change helps to reduce space and write amplification in storage.

Implement incremental proofs

Now that we have completed a draft implementation of the membership query and its verification process, we are ready to undertake the generation of incremental proofs in order to verify the temporal consistency of a sequential flow of events.

The history tree is in charge of generating incremental proofs P between commitments Ci and Cj, where i <= j:

P <- H.IncGen(Ci, Cj).

Once the client has received the proof, it should be able to verify that the proof shows that Cj fixes every event fixed by the recomputed C'i (where i <= j):

{accept, reject} <- P.IncVerify(C'i, Cj)

Tasks:

  • E2E tests
  • Extend HTTP API
  • Implement query functionality in Balloon
  • Implement verify functionality in Balloon
  • Extend client to include incremental query
  • Extend client to include verification

Unable to Query Membership concurrently

When we try to Query Membership concurrently, we receive the following error:

2018/10/03 17:42:50 http: panic serving [::1]:59546: d.nx != 0
goroutine 400165 [running]:
net/http.(*conn).serve.func1(0xc022b49720)
/usr/local/go/src/net/http/server.go:1746 +0xd0
panic(0x8edd00, 0xa4e670)
/usr/local/go/src/runtime/panic.go:513 +0x1b9
crypto/sha256.(*digest).checkSum(0xc00fcc37c8, 0x0, 0x0, 0x0, 0x0)
/usr/local/go/src/crypto/sha256/sha256.go:253 +0x1db
crypto/sha256.(*digest).Sum(0xc0000a6200, 0x0, 0x0, 0x0, 0xb, 0x0, 0x0)
/usr/local/go/src/crypto/sha256/sha256.go:229 +0x69
github.com/bbva/qed/hashing.(*Sha256Hasher).Do(0xc000066470, 0xc011b5e6a0, 0x1, 0x1, 0xc00fc52908, 0x0, 0x0)
/home/spark/go/src/github.com/bbva/qed/hashing/hash.go:74 +0xb5
github.com/bbva/qed/balloon.Balloon.QueryMembership(0x186a0, 0x9e4ba0, 0x7f807c8f7360, 0xc005510048, 0xc0000af500, 0xc00551c1c0, 0xa55e20, 0xc000066470, 0xc01158ec50, 0xb, ...)
/home/spark/go/src/github.com/bbva/qed/balloon/balloon.go:222 +0xe1
github.com/bbva/qed/raftwal.BalloonFSM.QueryMembership(0x9e4ba0, 0xa59cc0, 0xc005510048, 0xc0000af740, 0xc01c1a6300, 0x0, 0x0, 0x0, 0xc01158ec50, 0xb, ...)
/home/spark/go/src/github.com/bbva/qed/raftwal/fsm.go:89 +0x87
github.com/bbva/qed/raftwal.RaftBalloon.QueryMembership(0x7ffca83d569c, 0x11, 0x9ba7b3, 0x5, 0xc0000240e2, 0x9, 0xc000116000, 0xc000158400, 0xc00553a1b0, 0xc00000c940, ...)
/home/spark/go/src/github.com/bbva/qed/raftwal/raft.go:408 +0x87
github.com/bbva/qed/api/apihttp.Membership.func1(0xa55de0, 0xc011b5e660, 0xc0170c6500)
/home/spark/go/src/github.com/bbva/qed/api/apihttp/apihttp.go:167 +0x1e1
net/http.HandlerFunc.ServeHTTP(0xc00000c680, 0xa55de0, 0xc011b5e660, 0xc0170c6500)
/usr/local/go/src/net/http/server.go:1964 +0x44
github.com/bbva/qed/api/apihttp.AuthHandlerMiddleware.func1(0xa55de0, 0xc011b5e660, 0xc0170c6500)
/home/spark/go/src/github.com/bbva/qed/api/apihttp/apihttp.go:246 +0xc5
net/http.HandlerFunc.ServeHTTP(0xc0000664d0, 0xa55de0, 0xc011b5e660, 0xc0170c6500)
/usr/local/go/src/net/http/server.go:1964 +0x44
net/http.(*ServeMux).ServeHTTP(0xc005531740, 0xa55de0, 0xc011b5e660, 0xc0170c6500)
/usr/local/go/src/net/http/server.go:2361 +0x127
github.com/bbva/qed/api/apihttp.LogHandler.func1(0xa56b20, 0xc01c6b7960, 0xc0170c6500)
/home/spark/go/src/github.com/bbva/qed/api/apihttp/apihttp.go:290 +0xda
net/http.HandlerFunc.ServeHTTP(0xc00000c6c0, 0xa56b20, 0xc01c6b7960, 0xc0170c6500)
/usr/local/go/src/net/http/server.go:1964 +0x44
net/http.serverHandler.ServeHTTP(0xc000132a90, 0xa56b20, 0xc01c6b7960, 0xc0170c6500)
/usr/local/go/src/net/http/server.go:2741 +0xab
net/http.(*conn).serve(0xc022b49720, 0xa571a0, 0xc025696f00)
/usr/local/go/src/net/http/server.go:1847 +0x646
created by net/http.(*Server).Serve
/usr/local/go/src/net/http/server.go:2851 +0x2f5

It seems to be related to the fact that we are sharing the same hasher across all clients.

Pointer receivers clean up

We should use pointer receivers (with a *) when the method modifies something in the data structure; otherwise, we should use value receivers (without the *).

type A struct {
    value int
}
func (a *A) Add(x int) { ... }
func (a A) Show() {...}

This way we make sure we do not modify our structures when we don't want to.
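
A runnable version of the example above shows the difference. The addByValue method is added here only to illustrate the pitfall and is not part of the proposal.

```go
package main

import "fmt"

type A struct {
	value int
}

// Add mutates the receiver, so it needs a pointer receiver.
func (a *A) Add(x int) { a.value += x }

// Show only reads the receiver, so a value receiver is enough.
func (a A) Show() string { return fmt.Sprintf("A(%d)", a.value) }

// addByValue has a value receiver: it changes a copy and the caller's
// struct stays untouched.
func (a A) addByValue(x int) { a.value += x }

func main() {
	a := A{value: 1}
	a.Add(2)         // modifies a
	a.addByValue(10) // modifies a copy only
	fmt.Println(a.Show()) // prints "A(3)"
}
```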

Centralize bash scripts in `/scripts` directory

All the scripts now live in the /tests directory, and since we use them as the canonical way to launch environments, sometimes outside the scope of tests (QA, performance...), I believe it would be worthwhile to give them a proper directory.

HyperTree: bug adding nodes.

Given the following test in "hyper/tree_test.go":

func TestAdd(t *testing.T) {

	testCases := []struct {
		eventDigest      []byte
		expectedRootHash []byte
	}{
		{[]byte{0x0}, []byte{0x0}},
		{[]byte{0x1}, []byte{0x1}},
		{[]byte{0x2}, []byte{0x3}},
		{[]byte{0x3}, []byte{0x0}},
		{[]byte{0x4}, []byte{0x4}},
		{[]byte{0x5}, []byte{0x1}},
		{[]byte{0x6}, []byte{0x7}},
		{[]byte{0x7}, []byte{0x0}},
		{[]byte{0x8}, []byte{0x8}},
		{[]byte{0x9}, []byte{0x1}},
	}

	hasher := new(hashing.XorHasher)

	leaves, close := openBPlusStorage()
	defer close()
	cache := cache.NewSimpleCache(2)
	tree := NewFakeTree(string(0x0), cache, leaves, hasher)

	for i, c := range testCases {
		index := make([]byte, 8)
		binary.LittleEndian.PutUint64(index, uint64(i))

		rh, err := tree.Add(c.eventDigest, index)
		assert.Nil(t, err, "Error adding to the tree: %v", err)
		assert.Equal(t, c.expectedRootHash, rh, "Incorrect root hash for index %d", i)
	}

}

And printing insertions until test 2:

;;;;
: [0 0 0 0 0 0 0 0 0] [0 0 0 0 0 0 0 0]
;;;;
: [0 0 0 0 0 0 0 0 0] [1 0 0 0 0 0 0 0]
: [1 0 0 0 0 0 0 0 0] [1 0 0 0 0 0 0 0]
;;;;
: [0 0 0 0 0 0 0 0 0] [2 0 0 0 0 0 0 0]
: [1 0 0 0 0 0 0 0 0] [2 0 0 0 0 0 0 0]
: [2 0 0 0 0 0 0 0 0] [2 0 0 0 0 0 0 0]

It turns out that when inserting a new index into the hyper tree, the function "tree.fromStorage" returns the same value for all existing leaves.

Unify randomBytes function in testutils

Currently, we have a randomBytes function in every test file where it's necessary. We should unify them in a single function placed under the testutils/rand package.
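
A minimal sketch of what the shared helper could look like. The function name Bytes is an assumption, and the example uses package main only so that it runs on its own; in the repository it would sit in the testutils/rand package.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// Bytes returns n random bytes. It panics on failure, which is acceptable
// for a helper that is only used from tests.
func Bytes(n int) []byte {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return b
}

func main() {
	fmt.Println(len(Bytes(32)))
}
```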

Benchmark: Raft behavior while testing Membership() throughput

We need to measure the system write/query performance when using raft replication, and querying only 1 follower (instead of both).

We should consider the following scenarios:
Multi node:

  • One Leader & two followers{1..2}
  • Start the cluster
  • Preload N events to the leader
  • Query Membership of the N events to follower1
  • Perform continuous write load on the leader
  • At the same time, perform Query Membership of the previous N events ONLY to follower 1
    (notice that follower 2 is query-load free)

QED input parameter check

When setting up URLs from the command line, we need to make sure they are well-formed URLs instead of passing arbitrary strings up to the libraries' APIs.

This is quite annoying when we are required to use http:// in an endpoint definition instead of hostname:port, and the program does not fail whatever you put in.

We must check all parameters properly.
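
For the endpoint flags, a check along these lines could run before the value reaches any library. This is a sketch under assumptions: validateEndpoint is a hypothetical helper, and it accepts only http and https URLs with a non-empty host.

```go
package main

import (
	"fmt"
	"net/url"
)

// validateEndpoint rejects anything that is not an http(s) URL with a host.
func validateEndpoint(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("invalid endpoint %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("invalid endpoint %q: scheme must be http or https", raw)
	}
	if u.Host == "" {
		return fmt.Errorf("invalid endpoint %q: missing host", raw)
	}
	return nil
}

func main() {
	for _, e := range []string{"http://localhost:8080", "localhost:8080", "http://"} {
		fmt.Printf("%-24s valid=%v\n", e, validateEndpoint(e) == nil)
	}
}
```

Note that a bare hostname:port is rejected here because url.Parse treats the part before the colon as a scheme.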

Separate integration tests from unit tests

Identify and segregate tests by their purpose: unit, integration, etc. For instance, the balloon_test.go file currently contains both unit and integration tests. The latter should be moved to another package under the tests folder.

Improve cli error handling

I've noticed that whenever the cli returns an error, Cobra always prints the usage message, even when we pass the right parameters.

How to reproduce:

$ go run ../main.go -k pepe client membership --key 0 --version 1 -l info     
                                       
QedClient: 2018/10/11 11:00:05.575712 /home/spark/go/src/github.com/bbva/qed/cmd/client_membership.go:53: Querying key [ 0 ] with version [ 1 ]
Error: Unexpected server error
Usage:
  qed client membership [flags]

Flags:
  -h, --help                   help for membership
      --historyDigest string   Digest of the history tree
      --hyperDigest string     Digest of the hyper tree
      --key string             Key to query
      --verify                 Do verify received proof
      --version uint           Version to query

Global Flags:
  -k, --apikey string     Server api key
  -e, --endpoint string   Endpoint for REST requests on (host:port) (default "http://localhost:8080")
  -l, --log string        Choose between log levels: silent, error, info and debug (default "error")

exit status 255

We've seen a possible solution in this Cobra issue.

Move event hashing out of balloon internals

Currently, inserted events get hashed in the Add method before inserting into both trees. This means that the raw event, whose size could be quite large compared with a 32B hash, is first stored in the WAL and then replicated to other nodes of the Raft cluster. We can avoid these unnecessary storage space and network traffic penalties by allowing Balloon to also accept event hashes instead of raw events in the Add method and hashing the events at the HTTP layer, before applying them to the WAL.

Change QueryMembership signature

Currently, QueryMembership method in the hyper tree looks like this:

func (t *HyperTree) QueryMembership(eventDigest hashing.Digest) (proof *QueryProof, err error)

When the event doesn't exist, it returns an ErrKeyNotFound. But this type of error shouldn't propagate up to the Balloon layer. It would be more convenient to change the hyper tree method's signature to hide this kind of error and return a bool that signals non-existence:

func (t *HyperTree) QueryMembership(eventDigest hashing.Digest) (proof *QueryProof, answer bool, err error)

With this change, we wouldn't need to check the length of the proof in Balloon in order to set the Exists flag:

if len(hyperProof.Value) > 0 {
	proof.Exists = true
	proof.ActualVersion = util.BytesAsUint64(hyperProof.Value)
}

Remove balloon/storage package

Remove the balloon/storage package and move its functionality to a balloon/storage.go file inside the balloon package.

We don't need a package there anymore, as the interface implementations don't need to be aware of it.

Improve component API design

We need to state clear boundaries between components and their relations. For example, in server/server.go we need to include almost all components separately, injecting dependencies instead of configuration.

We did this to ease the testing, enable the tampering, etc. But we can design this with the same functionality without exposing the balloon components if we introduce new constructors for specific needs, instead of exposing everything.

I also think this will improve our APIs and interface{} designs, adding only what's really needed and using composition to build complexity.

Please add places where we can fix this situation:

server/server.go --> clean up by simplifying balloon constructor and API

Create benchmark suite to test Membership() throughput

We need to measure the system's Membership throughput because it's a critical parameter for system operation (Membership performance is expected to be greater than write performance, and write throughput should not drop when both operations are performed simultaneously).

We should consider the following scenarios:

Single node:

  • Start a single node
  • Preload N events
  • Query Membership of the N events
  • Perform continuous write load on the leader
  • At the same time, perform Query Membership of the previous N events

Multi node:

  • One Leader & 2 followers{1..2}
  • Start the cluster
  • Preload N events to the leader
  • Query Membership of the N events to follower 1
  • Perform continuous write load on the leader
  • At the same time, perform Query Membership of the previous N events to all the followers
  • One Leader & 4 followers{1..4}
  • (Same as before)

Cloud benchmarking

Prepare and execute a benchmark plan on different clouds to test performance in different providers and VM flavors.

Remove snapshot channel from the FSM

The process of sending newer snapshots to the snapshot channel (now named agentsQueue) after inserting the event into the balloon must be removed from the critical path of the insertion operation.

Given that the process of applying changes to the FSM is executed in a serialized way with one single thread, the queuing could lead to a potential stall if the channel gets full. Snapshots should be sent to the channel after committing the command into the WAL, just after resolving the Apply future in the Raft node. This way, the goroutine that handles the HTTP request is responsible for sending the snapshots, freeing up the Raft applying thread.

Improve re-join cluster after restart

When spinning up QED in cluster mode, once all the nodes have joined via Raft and a leader has been elected, if the current leader goes down and we bring it up again, it is unable to rejoin the cluster.

We should improve our join process for both new clusters and clusters with existing configuration.

Add persistence to hyper tree's cache

In order to improve durability and keep the hyper tree consistent under shutdown or failure scenarios, we need to implement a persistent storage of the cache on disk, so the next time that the server starts up, all of the previously cached data is still available.

Design / explore how to publish signed snapshots

We need to design and implement a way to publish signed snapshots. In order to select a given design, we need to explore multiple options and state which one is the most convenient for our needs.

Increase tests quality

  • Improve our test coverage by increasing the number of scenarios tested in each part of QED.
  • Document each test's objective and remove duplicate tests
  • Increase the quality of our fakes and their documentation
  • Increase the number and quality of acceptance tests
  • Test corner cases

adding gorelease in azure

After creating a GITHUB_TOKEN for the manual release we need to improve this workflow:

  • discover how to store secrets in azure-pipelines
  • generate a GITHUB_TOKEN from a bot, or our ORG to be independent of the users
  • create the task in the pipeline and only run when a tag is uploaded (and ensure commit and tag are uploaded synchronously)
