
Trillian: General Transparency


Overview

Trillian is an implementation of the concepts described in the Verifiable Data Structures white paper, which in turn is an extension and generalisation of the ideas which underpin Certificate Transparency.

Trillian implements a Merkle tree whose contents are served from a data storage layer, to allow scalability to extremely large trees. On top of this Merkle tree, Trillian provides the following:

  • An append-only Log mode, analogous to the original Certificate Transparency logs. In this mode, the Merkle tree is effectively filled up from the left, giving a dense Merkle tree.

Note that Trillian requires particular applications to provide their own personalities on top of the core transparent data store functionality.

Certificate Transparency (CT) is the most well-known and widely deployed transparency application, and an implementation of CT as a Trillian personality is available in the certificate-transparency-go repo.

Other examples of Trillian personalities are available in the trillian-examples repo.

Support

Using the Code

The Trillian codebase is stable and is used in production by multiple organizations, including many large-scale Certificate Transparency log operators.

Given this, we do not plan to add new features to this version of Trillian. We will try to avoid further incompatible code and schema changes, but cannot guarantee that none will ever be necessary.

The current state of feature implementation is recorded in the Feature implementation matrix.

To build and test Trillian you need:

  • Go 1.21 or later (Go 1.21 matches the Cloud Build configuration, and is preferred for developers who will be submitting PRs to this project).

To run many of the tests (and production deployment) you need:

  • An instance of MySQL to provide the data storage layer (see the MySQL Setup section below).

Note that this repository uses Go modules to manage dependencies; Go will fetch and install them automatically upon build/test.

To fetch the code, dependencies, and build Trillian, run the following:

git clone https://github.com/google/trillian.git
cd trillian

go build ./...

To build and run tests, use:

go test ./...

The repository also includes multi-process integration tests, described in the Integration Tests section below.

MySQL Setup

To run Trillian's integration tests you need to have an instance of MySQL running and configured to:

  • listen on the standard MySQL port 3306 (so mysql --host=127.0.0.1 --port=3306 connects OK)
  • not require a password for the root user
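If you don't have a local MySQL install, one convenient way to get an instance meeting both requirements is the official mysql Docker image (this assumes Docker is available; the container name here is arbitrary):

```shell
# Disposable MySQL on the standard port with a passwordless root user.
docker run --rm -d --name trillian-test-mysql \
  -p 3306:3306 \
  -e MYSQL_ALLOW_EMPTY_PASSWORD=yes \
  mysql:8
```

When you're done, `docker stop trillian-test-mysql` removes the container (because of `--rm`).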

You can then set up the expected tables in a test database like so:

./scripts/resetdb.sh
Warning: about to destroy and reset database 'test'
Are you sure? y
> Resetting DB...
> Reset Complete

Integration Tests

Trillian includes an integration test suite to confirm basic end-to-end functionality, which can be run with:

./integration/integration_test.sh

This runs a multi-process test:

  • A test that starts a Trillian server in Log mode, together with a signer, logs many leaves, and checks they are integrated correctly.

Deployment

You can find instructions on how to deploy Trillian in the deployment and examples/deployment directories.

Working on the Code

Developers who want to make changes to the Trillian codebase need some additional dependencies and tools, described in the following sections. The Cloud Build configuration and the scripts it depends on are also a useful reference for the required tools and scripts, as it may be more up-to-date than this document.

Rebuilding Generated Code

Some of the Trillian Go code is autogenerated from other files:

  • gRPC message structures are originally provided as protocol buffer message definitions; regenerating them requires protoc (see https://grpc.io/docs/protoc-installation/).
  • Some unit tests use mock implementations of interfaces; these are created from the real implementations by GoMock.
  • Some enums have string-conversion methods (satisfying the fmt.Stringer interface) created using the stringer tool, installed via the go install commands below.

Re-generating mock or protobuf files is only needed if you're changing the original files; if you do, you'll need to install the prerequisites:

  • a series of tools, using go install to ensure that the versions are compatible and tested:

    cd $(go list -f '{{ .Dir }}' github.com/google/trillian); \
    go install github.com/golang/mock/mockgen; \
    go install google.golang.org/protobuf/proto; \
    go install google.golang.org/protobuf/cmd/protoc-gen-go; \
    go install google.golang.org/grpc/cmd/protoc-gen-go-grpc; \
    go install github.com/pseudomuto/protoc-gen-doc/cmd/protoc-gen-doc; \
    go install golang.org/x/tools/cmd/stringer
    

and run the following:

go generate -x ./...  # hunts for //go:generate comments and runs them

Updating Dependencies

The Trillian codebase uses go.mod to declare fixed versions of its dependencies. With Go modules, updating a dependency simply involves running go get:

go get package/path        # Fetch the latest published version
go get package/path@vX.Y.Z # Fetch a specific published version
go get package/path@HEAD   # Fetch the latest commit

To update ALL dependencies to the latest version, run go get -u. Be warned, however, that this may undo pinned versions that were selected to resolve issues in other non-module repos.

While running go build and go test, Go will add any missing transitive dependencies to go.mod. To clean these up, run:

go mod tidy

Running Codebase Checks

The scripts/presubmit.sh script runs various tools and tests over the codebase.

Install golangci-lint, matching the version expected by scripts/presubmit.sh:

go install github.com/golangci/golangci-lint/cmd/golangci-lint@<version>

Run code generation, build, test and linters:

./scripts/presubmit.sh

Or just run the linters alone:

golangci-lint run

Design

Design Overview

Trillian is primarily implemented as a gRPC service; this service receives get/set requests over gRPC and retrieves the corresponding Merkle tree data from a separate storage layer (currently using MySQL), ensuring that the cryptographic properties of the tree are preserved along the way.

The Trillian service is multi-tenanted – a single Trillian installation can support multiple Merkle trees in parallel, distinguished by their TreeId – and each tree operates as an append-only Log in one of two sub-modes:

  • normal Log mode, where the Trillian service assigns sequence numbers to new tree entries as they arrive
  • 'preordered' Log mode, where the unique sequence number for each entry in the Merkle tree is specified externally

In either case, Trillian's key transparency property is that cryptographic proofs of inclusion/consistency are available for data items added to the service.
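As a sketch of what checking such a proof involves, here is the standard inclusion-proof verification algorithm from RFC 6962 / RFC 9162 (an illustrative reimplementation, not Trillian's own client code):

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// hashLeaf computes the RFC 6962 leaf hash: SHA-256(0x00 || data).
func hashLeaf(data []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(data)
	return h.Sum(nil)
}

// hashChildren computes the interior-node hash: SHA-256(0x01 || l || r).
func hashChildren(left, right []byte) []byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left)
	h.Write(right)
	return h.Sum(nil)
}

// verifyInclusion recomputes the root from a leaf hash and its audit
// path, following the algorithm in RFC 9162 §2.1.3.2.
func verifyInclusion(index, treeSize uint64, leafHash, rootHash []byte, proof [][]byte) bool {
	if index >= treeSize {
		return false
	}
	fn, sn := index, treeSize-1
	r := leafHash
	for _, p := range proof {
		if sn == 0 {
			return false
		}
		if fn&1 == 1 || fn == sn {
			r = hashChildren(p, r) // sibling is on the left
			for fn&1 == 0 && fn != 0 {
				fn >>= 1
				sn >>= 1
			}
		} else {
			r = hashChildren(r, p) // sibling is on the right
		}
		fn >>= 1
		sn >>= 1
	}
	return sn == 0 && bytes.Equal(r, rootHash)
}

func main() {
	// Build a 4-leaf tree by hand and verify the proof for leaf 2 ("c").
	var lh [][]byte
	for _, l := range []string{"a", "b", "c", "d"} {
		lh = append(lh, hashLeaf([]byte(l)))
	}
	n01 := hashChildren(lh[0], lh[1])
	n23 := hashChildren(lh[2], lh[3])
	root := hashChildren(n01, n23)

	proof := [][]byte{lh[3], n01} // audit path for leaf index 2
	fmt.Println(verifyInclusion(2, 4, lh[2], root, proof)) // prints "true"
}
```

A consistency proof is verified by a closely related walk that recomputes both the old and new roots from the supplied path.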

Personalities

To build a complete transparent application, the Trillian core service needs to be paired with additional code, known as a personality, that provides functionality that is specific to the particular application.

In particular, the personality is responsible for:

  • Admission Criteria – ensuring that submissions comply with the overall purpose of the application.
  • Canonicalization – ensuring that equivalent versions of the same data get the same canonical identifier, so they can be de-duplicated by the Trillian core service.
  • External Interface – providing an API for external users, including any practical constraints (ACLs, load-balancing, DoS protection, etc.)
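As a toy illustration of the canonicalization responsibility (the JSON-based scheme below is purely hypothetical; a real personality defines canonicalization for its own data format):

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// canonicalID parses a JSON document and re-encodes it with
// encoding/json, which sorts object keys and normalizes whitespace;
// equivalent documents therefore map to the same identifier and can be
// de-duplicated by the core service.
func canonicalID(doc []byte) ([32]byte, error) {
	var v map[string]interface{}
	if err := json.Unmarshal(doc, &v); err != nil {
		return [32]byte{}, err
	}
	canon, err := json.Marshal(v) // map keys are emitted in sorted order
	if err != nil {
		return [32]byte{}, err
	}
	return sha256.Sum256(canon), nil
}

func main() {
	a, _ := canonicalID([]byte(`{"b": 2, "a": 1}`))
	b, _ := canonicalID([]byte(`{"a": 1,  "b": 2}`))
	fmt.Println(a == b) // prints "true": same canonical identifier
}
```

Admission criteria would be a separate check run before canonicalization, rejecting submissions that don't belong in the log at all.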

This is described in more detail in a separate document. General design considerations for transparent Log applications are also discussed separately.

Log Mode

When running in Log mode, Trillian provides a gRPC API whose operations are similar to those available for Certificate Transparency logs (cf. RFC 6962). These include:

  • GetLatestSignedLogRoot returns information about the current root of the Merkle tree for the log, including the tree size, hash value, timestamp and signature.
  • GetLeavesByRange returns leaf information for particular leaves, specified by their index in the log.
  • QueueLeaf requests inclusion of the specified item into the log.
    • For a pre-ordered log, AddSequencedLeaves requests the inclusion of specified items into the log at specified places in the tree.
  • GetInclusionProof, GetInclusionProofByHash and GetConsistencyProof return inclusion and consistency proof data.

In Log mode (whether normal or pre-ordered), Trillian includes an additional Signer component; this component periodically processes pending items and adds them to the Merkle tree, creating a new signed tree head as a result.

Log components

(Note that each of the Log components can be distributed, for scalability and resilience.)

Use Cases

Certificate Transparency Log

The most obvious application for Trillian in Log mode is to provide a Certificate Transparency (RFC 6962) Log. To do this, the CT Log personality needs to include all of the certificate-specific processing – in particular, checking that an item that has been suggested for inclusion is indeed a valid certificate that chains to an accepted root.


trillian's Issues

"Unexpectedly reading from within GetNodeHash()" warnings

If I add --alsologtostderr to the invocation of ./trillian_map_server in integration/map_integration_test.sh, I see lots of warnings:

W0131 09:37:43.527794   11656 subtree_cache.go:185] Unexpectedly reading from within GetNodeHash()

The comment there says "This should never happen - we should've already read all the data we need above, in Preload()"...

Integration tests for backend

Tests that prove end to end that it's doing the correct crypto and other stuff related to adding and querying entries, signing / sequencing, proof serving.

Map Index / HashKey should be computed by the personality

The current Map Hasher interface contains a HashKey function to turn a string into a sha256 index in the map. This index, however, should be computed by the personality, not the map. Key Transparency, for instance, computes the index as the output of a privately keyed signature function.

If this sounds good, I'll convert the Map interfaces to accept an index []byte rather than key []byte or HashedKey []byte, and remove the HashKey function from the MapHasher.

KeyManager: Perform all key verification during New* functions

The KeyManager interface currently supports returning error for several of the Get* functions.
This dramatically increases code complexity for calling functions. All these error params can be eliminated if the New* functions require a valid key to be loaded before returning a KeyManager object. Is there a strong reason to support starting Trillian without key material?

type KeyManager interface {
	Signer() (crypto.Signer, error)
	SignatureAlgorithm() spb.DigitallySigned_SignatureAlgorithm
	HashAlgorithm() crypto.Hash
	GetPublicKey() (crypto.PublicKey, error)
	GetRawPublicKey() ([]byte, error)
}

Proposed interface:

New(...) (KeyManager, error)
type KeyManager interface {
	Signer() crypto.Signer
	SignatureAlgorithm() spb.DigitallySigned_SignatureAlgorithm
	HashAlgorithm() crypto.Hash
	GetPublicKey() crypto.PublicKey
	GetRawPublicKey() []byte
}

QueueLeaves should return individual status for each leaf

Currently, the storage.QueueLeaves API effectively enforces an all-or-nothing failure mode for the addition of each leaf in the array, which is not ideal. E.g. this would cause a batch of additions to fail if the log didn't accept dupes but one was presented.
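A per-leaf result shape along the lines this issue suggests might look like the following (a hypothetical sketch with an in-memory duplicate check, not the actual storage API):

```go
package main

import (
	"errors"
	"fmt"
)

// LeafStatus reports the outcome for one leaf in a batch, so a single
// duplicate no longer fails the whole submission.
type LeafStatus struct {
	Index int
	Err   error
}

var errDuplicate = errors.New("duplicate leaf")

// queueLeaves attempts every leaf in the batch and records per-leaf
// failures instead of aborting on the first duplicate.
func queueLeaves(leaves [][]byte, seen map[string]bool) []LeafStatus {
	out := make([]LeafStatus, 0, len(leaves))
	for i, l := range leaves {
		if seen[string(l)] {
			out = append(out, LeafStatus{Index: i, Err: errDuplicate})
			continue
		}
		seen[string(l)] = true
		out = append(out, LeafStatus{Index: i})
	}
	return out
}

func main() {
	res := queueLeaves([][]byte{[]byte("a"), []byte("a"), []byte("b")}, map[string]bool{})
	for _, r := range res {
		fmt.Println(r.Index, r.Err) // leaf 1 reports the duplicate; 0 and 2 succeed
	}
}
```

The caller can then retry or drop only the leaves that actually failed.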

Split application STHs from generic ones

  • Rework schema + backend to add support for application tree heads.
  • Update CT example to create / serve application STHs.
  • Clean up anything in schema that this obsoletes (public / private STH?)

Use Inclusion Proof Verifier in tests

Current tests verify inclusion proofs by rebuilding a parallel tree and checking that the inclusion proofs are the same. We should migrate these tests to use a Log Verifier that computes the root hash from the neighbor nodes.

Need API for creating new logIDs

We need some kind of administrative API that supports creating and deleting LogIDs.
There's ongoing work on an admin API. Not sure how to connect this bug to that.

merkle.CompactMerkleTree.Hashes() doesn't return [] for "perfect tree sizes"

Updates made to CompactMerkleTree in d1ee609 (PR #180) to return a copy of the internal Node state have not properly covered the edge case where the tree is perfectly balanced (i.e. has 2^n leaf nodes).

In that situation the set of nodes should be empty: the size & root hash alone describe the tree, as evidenced by the fact that when merkle.NewCompactMerkleTreeWithState() is called it will make 0 calls to the backing store via its getNodeFunc to retrieve hashes.
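The invariant behind this issue is that a compact tree's internal state holds one partial-subtree hash per set bit of the tree size; a perfect tree has exactly one set bit, and that single hash is the root, which the tree head already carries. A small sketch of the counting argument (illustrative, not the CompactMerkleTree API):

```go
package main

import (
	"fmt"
	"math/bits"
)

// compactHashCount returns how many partial-subtree hashes a compact
// Merkle tree representation carries for a given size: one per set bit
// of the size. For a power-of-two size that one hash is the root, so no
// node state beyond the size and root hash is actually needed.
func compactHashCount(size uint64) int {
	return bits.OnesCount64(size)
}

func main() {
	for _, n := range []uint64{5, 7, 8} {
		fmt.Printf("size %d: %d subtree hash(es)\n", n, compactHashCount(n))
	}
}
```

This matches the observation above that restoring a perfectly balanced tree requires zero fetches from the backing store.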

Create log verifier client

Create a pure go implementation that can verify all the responses that are returned from the bits of Trillian that implement an append only log.

Components:

  • SetLeaves
    • SetLeaf
    • GetLeaf
      • Hashing Interface #331
      • GetInclusionProofByHash #320
        • InclusionProofVerifier #334
      • GetSignedLogRoot
        • VerifySignature #351
        • ConsistencyProofVerifier #334

Supports google/keytransparency#384

QueueLeaves should not be part of the LogTX

When adding a batch of leaves some leaves could fail, e.g. due to dupe keys, and the log shouldn't ditch the rest of the batch due to an unrelated dupe entry. Rather, it should submit as many as it can, and report those which failed to the caller.

API: Set single place for Index to be set and returned.

The Map API is intended to expose a key/value interface, yet the "value" (a.k.a. MapLeaf) currently contains the key (a.k.a. Index). Index is also contained in IndexValue, IndexInclusionProof and several other messages, producing confusion about where and when to set Index.

Proposal:
Use a proto3 map in Set / Get if we can find a scalar value for index. Perhaps a hex string?
Or standardize on the IndexValue message.

Blocking google/keytransparency#486

Signature Verifier

The signatures library doesn't have a companion verification function.
Investigate importing the one from Key Transparency.

Hashing Interface

Hasher needs to be an interface to support alternative hashing implementations.
Here's a proposal that supports both logs and maps.
Individual implementations are not required to incorporate all input fields into their hash.

// TreeHasher provides hash functions for tree implementations.
type TreeHasher interface {
	HashLeaf(treeID, index []byte, depth int, dataHash []byte) Hash
	HashEmpty(treeID, index []byte, depth int) Hash
	HashInterior(left, right Hash) Hash
}

Steps:

  • Move HashKey to Personality
  • Create Hasher interface
  • Create test implementations
  • Migrate to new function interfaces.

Remove TreeID from NewStorage* functions

Return a generic storage object rather than new objects for each tree.

  • Eliminates the need for caching within the storage factory.
  • Eliminates the need to create a fake tree in order to test DB connectivity.
  • Simplifies testing.

Partial work started here

Support proofs at arbitrary tree sizes

Currently we can only obtain proofs at tree sizes for which an STH exists, but it should be possible to return proofs at arbitrary sizes, as existing implementations do.

This requires dynamic rehashing of some proof nodes and is complex to implement. Research indicates it's feasible to unroll the rehashing chain so storage can fetch all the involved nodes and the rehashing can be done by post-processing.

Changes involved (plus tests of course):

  • NodeReader interface must be able to optionally request tree revisions for sizes >= specified size
  • Merkle path code must handle rehashing and annotate node fetches that are part of a rehash
  • Proof fetching + rehashing should be broken out of the server to allow separate testing and reuse
  • After the storage fetch the rehashing operations must be carried out using the annotations
  • API should include a flag to allow rehashing so we can test it without impacting current users

Implement backend inclusion proofs

Fetching correct node set and stuff. Note: for this milestone initially only at tree sizes where we have an STH because the intermediate recalculations are complex. Needs to support by hash and by index, which are fairly similar operations.
