As Pact has distinct user tables and a structure that is already very amenable to versioned history, we propose to implement checkpoints directly in SQLite, as opposed to (a) copying SQLite DB files, (b) using BTRFS or some other filesystem-level solution, or (c) using RocksDB.
Why not RocksDB
RocksDB is considered inferior for Pact mainly because of the need to support so many user schemas, which would require extensive keyspace operations. Also, SQLite actually outperforms RocksDB for single-threaded indexed queries: see https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks at ~50us, where Pact has seen db updates in the <20us zone.
As for SQLite size limitations, they are actually not small and will probably suffice for many years; see https://www.sqlite.org/limits.html. However, this design proposes that multiple connections can be used, where each connection owns some number of user tables. This can be introduced later, as long as there is design support for it now.
Note also that RocksDB snapshots are not persisted to disk, so that solution is not usable.
Finally, this design leverages SQLite's performant indexing to use relational SQL as the versioning mechanism, as opposed to previous SQLite Pact usage, which looks more like a key-value store. Indeed, the Pact language just wants a journaled key-value store, but this solution will handle reorgs "relationally".
Overview
The main notion is that of a version corresponding to a fork, which will be used along with block height to determine the latest version of a key, and to label entries as "forked" (or simply delete them) when a reorg occurs.
Example user table use
The following example will be used to illustrate the design. A user table will go through the following history. Row data is represented with an arbitrary number. The "version" concept is detailed below, but corresponds to a reorg history.
| key | block | version | data | notes |
|-----|-------|---------|------|-------|
| a | 10 | 1 | 123 | Version 1 represents current reorg/fork |
| b | 10 | 1 | 234 | |
| c | 11 | 1 | 456 | Reorg below replaces from here |
| a | 11 | 1 | 124 | |
| d | 12 | 1 | 567 | |
| c | 12 | 1 | 457 | |
| c | 11 | 2 | 460 | Fork/reorg to Version 2 |
| b | 12 | 2 | 240 | |
Thus, a table scan at block 12 version 1 just before the reorg should return:
| key | block | version | data |
|-----|-------|---------|------|
| a | 11 | 1 | 124 |
| b | 10 | 1 | 234 |
| c | 12 | 1 | 457 |
| d | 12 | 1 | 567 |
A table scan at the end, post-reorg, should return:
| key | block | version | data |
|-----|-------|---------|------|
| a | 10 | 1 | 123 |
| b | 12 | 2 | 240 |
| c | 11 | 2 | 460 |
Version detection
Block validation supplies a stream of (block height [B], parent hash [P]) pairs. A version indicates the non-forked behavior of this stream: receiving a monotonically increasing, dense stream of block heights indicates a single version.
When a block height arrives that is less than the expected next value, the version (V) will increment and version maintenance operations will occur.
The system will need a central version history table ordering all (B,P) pairs and associating each with a version. Re-orged pairs can be discarded. The "HEAD" version is maintained in memory and can be recovered from the version history table.
The "block version" is the pair of (B,V) as seen in the examples above.
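The stream handling above can be sketched as follows. This is a minimal illustration, not Pact code; `VersionTracker` and its fields are invented names:

```python
# Hedged sketch of version detection over a (block height, parent hash) stream.
# A reorg is signaled by a block height lower than the expected next height;
# the version V then increments and re-orged pairs are discarded.

class VersionTracker:
    """Tracks the current reorg version V from a stream of (B, P) pairs."""

    def __init__(self):
        self.version = 1          # current version V; starts at 1
        self.next_height = None   # expected next block height, or None at start
        self.history = []         # version history: (B, P, V) triples

    def observe(self, height, parent_hash):
        """Process one (B, P) pair; return the resulting block version (B, V)."""
        if self.next_height is not None and height < self.next_height:
            # Height went backwards: a reorg. Bump the version and discard
            # the re-orged pairs from the history.
            self.version += 1
            self.history = [e for e in self.history if e[0] < height]
        self.history.append((height, parent_hash, self.version))
        self.next_height = height + 1
        return (height, self.version)
```

Feeding heights 10, 11, 12 and then 11 (the reorg from the example) yields block versions (10,1), (11,1), (12,1), (11,2).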
Table management
The system will track all versioned system and user tables -- versioned system tables would include the Pact continuation table and refstore; user tables include the coin contract table. Tables should be associated with a database connection/file, with some limit on how many tables can be in a file/connection. This can initially be a single connection, but we want the design to support multiple connections. The central system tables can be in their own connection or share the first connection.
Tables will also be associated with the block version at which they were created. Tables can optionally be dropped when reorged, or marked as old/invalid; the table name in the database can include the block version if desired. See "Deleting vs Marking" below.
Per-table versioning and history
Table operations will support reorgs by tracking versions directly in the relational schema. Queries will leverage indexes and "LIMIT 1" queries to find the latest value for a key (SQLite uses LIMIT rather than TOP).
Pact history requests will no longer need dedicated "transaction tables": 'txid' will be stored as a relational column. Note that Pact will no longer track multiple updates to a key within a given txid, as this is (a) not a good practice and (b) not relevant to larger history.
Deleting vs Marking on version changes
This solution supports either (a) marking deactivated block versions by updating the version column with a FORKED code, or (b) simply deleting the row. Likewise, forked table creates can either mark the table as FORKED or simply drop the table.
Deletion has the advantage of space compaction; marking can help with troubleshooting. While supporting both as a configuration option might be nice it will also slow development.
Decision: use deletion. Descriptions below will nonetheless use FORKED to indicate marking option.
Versioned table schema
All versioned tables will have the following schema:
| Name | Type | Note | Index |
|------|------|------|-------|
| KEY | String | User key | Non-unique |
| BLOCK | Int | Block height | Non-unique |
| VERSION | Int | Reorg version | |
| TXID | Int | Transaction ID for history | Non-unique |
| DATA | JSON | User data | No index |
The unique constraint is (KEY,BLOCK[,VERSION]). SQLite automatically adds ROWID, which is the actual primary key. TXID will have an index for history queries.
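A minimal sketch of this schema in SQLite, using Python's sqlite3 for illustration; the table name `usertable` and the index names are assumptions, not part of the design:

```python
import sqlite3

# Sketch of the versioned-table schema from the table above. DATA is stored
# as JSON text; the (KEY, BLOCK, VERSION) unique constraint gives the
# composite index, and SQLite supplies the implicit ROWID primary key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usertable (
  KEY     TEXT    NOT NULL,  -- user key
  BLOCK   INTEGER NOT NULL,  -- block height
  VERSION INTEGER NOT NULL,  -- reorg version
  TXID    INTEGER NOT NULL,  -- transaction id for history
  DATA    TEXT,              -- user data as JSON text
  UNIQUE (KEY, BLOCK, VERSION)
);
CREATE INDEX usertable_key   ON usertable (KEY);    -- non-unique
CREATE INDEX usertable_block ON usertable (BLOCK);  -- non-unique
CREATE INDEX usertable_txid  ON usertable (TXID);   -- for history queries
""")
```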
Version maintenance
On a reorg, a new block version (B,V) will be introduced.
- Delete/mark all rows where BLOCK >= B.
- Drop/mark all tables that were created at BLOCK >= B.
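Under the deletion decision, these two steps might look like the following sketch. The shape of the table registry (a `created_at` mapping from table name to creation height) is an assumption for illustration:

```python
import sqlite3

# Hedged sketch of reorg maintenance under the "deletion" policy: on a new
# block version (B, V), rows written at or above B are deleted, and tables
# created at or above B are dropped.

def maintain_reorg(conn, fork_height, versioned_tables, created_at):
    """versioned_tables: table names; created_at: {table: creation height}."""
    for table in versioned_tables:
        if created_at.get(table, 0) >= fork_height:
            # Table itself was created in a re-orged block: drop it entirely.
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        else:
            # Otherwise delete only the re-orged rows.
            conn.execute(f'DELETE FROM "{table}" WHERE BLOCK >= ?',
                         (fork_height,))
    conn.commit()
```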
On-demand version maintenance.
Version maintenance will be expensive, requiring operations on all versioned tables in the system. Version maintenance could however be on-demand.
Tables could track when they were last maintained using (B,V). The first in-block operation to occur could test this to see what maintenance needs to be done.
In the example above, consider if the updates/queries happened at block (15,2) instead of right at the reorg. The table would have (10,1) as its block version, and would see that (11,2) was a fork. Maintenance would occur then.
The system needs to maintain the fork history in order to apply all forks that might have occurred since the table's block version.
The advantage of on-demand version maintenance is faster block processing, assuming that not all tables are hit in every block.
The disadvantage is unpredictable work; maintaining all tables is possibly a more "even" workload.
DECISION: Attempt on-demand and fall back to "global" maintenance as time permits.
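The on-demand variant might be sketched as follows, assuming each table records the block version at which it was last maintained and a central fork history is available. All names here are illustrative:

```python
import sqlite3

# Hedged sketch of on-demand maintenance: on the first in-block operation
# touching a table, any forks recorded since the table was last maintained
# are applied (deletion policy), and the table is brought current with HEAD.

def pending_forks(fork_history, table_version):
    """fork_history: ordered list of (fork_height, new_version) pairs.
    Return the forks this table has not yet applied."""
    return [(b, v) for (b, v) in fork_history if v > table_version]

def maintain_on_demand(conn, table, state, fork_history):
    """state maps each table to the (B, V) at which it was last maintained."""
    _, last_version = state[table]
    for fork_height, _ in pending_forks(fork_history, last_version):
        conn.execute(f'DELETE FROM "{table}" WHERE BLOCK >= ?', (fork_height,))
    if fork_history:
        state[table] = fork_history[-1]  # table is now current with HEAD
```

In the (15,2) example above, a table last maintained at (10,1) would see the (11,2) fork in the history and delete its rows with BLOCK >= 11 before serving the query.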
Checkpoint management
Checkpoint begin is called with (B,H) supplied:
- Detect version change; run maintenance on user tables as needed. (Note this could be on-demand; see above.)
- Compute the block version (B,V) to put in the environment for use in queries during this block.
Checkpoint discard
- Might want to use SAVEPOINT in SQLite to discard. This would require more information in the begin phase; alternatively, always use a SAVEPOINT and simply RELEASE (commit) it on save.
Checkpoint save
- Nothing required here, unless a SAVEPOINT is used, in which case it is RELEASEd (committed).
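If the always-use-a-SAVEPOINT option is taken, begin/save/discard might look like this sketch; the savepoint name `block` is arbitrary:

```python
import sqlite3

# Sketch of checkpoint begin/save/discard via SQLite SAVEPOINTs: a savepoint
# is always opened at begin, RELEASEd (committed) on save, and rolled back
# then released on discard.

def checkpoint_begin(conn):
    conn.execute("SAVEPOINT block")

def checkpoint_save(conn):
    conn.execute("RELEASE SAVEPOINT block")   # commits the block's work

def checkpoint_discard(conn):
    conn.execute("ROLLBACK TO SAVEPOINT block")  # undo the block's work
    conn.execute("RELEASE SAVEPOINT block")      # close the savepoint
```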
In-block operations
Single-key query for key K
select KEY,DATA where KEY = K [and VERSION != FORKED] order by BLOCK desc[, VERSION desc] limit 1
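A runnable version of this query, assuming the deletion policy (so no FORKED filter) and the illustrative table name `usertable`:

```python
import json
import sqlite3

# Hedged sketch of the single-key read: descending BLOCK (and VERSION) order
# with LIMIT 1 returns the latest surviving row for the key.

def read_latest(conn, key):
    row = conn.execute(
        "SELECT KEY, DATA FROM usertable "
        "WHERE KEY = ? "
        "ORDER BY BLOCK DESC, VERSION DESC LIMIT 1",
        (key,)).fetchone()
    return None if row is None else json.loads(row[1])
```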
Update key K to data D at block version (B,V) with txid T
- read row data D' for K as above
- write new row KEY = K, BLOCK = B, VERSION = V, TXID = T, DATA = (merge D' D)
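These two steps might be sketched as follows; the dict-union merge semantics for (merge D' D) are an assumption for illustration:

```python
import json
import sqlite3

# Hedged sketch of the in-block update path: read the previous row for K,
# merge the new fields over it, and write a fresh row at the current block
# version (B, V) with txid T.

def write_row(conn, table, key, block, version, txid, new_fields):
    row = conn.execute(
        f'SELECT DATA FROM "{table}" WHERE KEY = ? '
        f'ORDER BY BLOCK DESC, VERSION DESC LIMIT 1', (key,)).fetchone()
    prev = json.loads(row[0]) if row else {}        # D', or empty if new key
    merged = {**prev, **new_fields}                 # merge D' D (assumed)
    conn.execute(f'INSERT INTO "{table}" VALUES (?,?,?,?,?)',
                 (key, block, version, txid, json.dumps(merged)))
```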
Select all valid keys (keys in Pact)
Same query as the single-key case (without DATA), without LIMIT 1.
Row history queries
Simply query ordered by TXID [avoiding FORKED rows].
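For example (illustrative table name; deletion policy, so no FORKED filter):

```python
import json
import sqlite3

# Hedged sketch of a row-history query: all surviving writes for a key,
# ordered by TXID, relying on the TXID index.

def key_history(conn, table, key):
    return [(txid, json.loads(data)) for (txid, data) in conn.execute(
        f'SELECT TXID, DATA FROM "{table}" WHERE KEY = ? ORDER BY TXID',
        (key,))]
```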