As Pact has distinct user tables and a structure that is already very amenable to versioned history, we propose to implement checkpoints directly in SQLite, as opposed to (a) copying SQLite DB files, (b) using BTRFS or some other filesystem-level solution, or (c) using RocksDB.
Why not RocksDB
RocksDB is considered inferior for Pact mainly because of the need to support so many user schemas, which would require extensive keyspace operations. Also, SQLite actually outperforms RocksDB for single-threaded indexed queries: see https://github.com/facebook/rocksdb/wiki/Performance-Benchmarks at ~50us, where Pact has seen db updates in the <20us zone.
As for SQLite size limitations, they are actually not small and will probably suffice for many years; see https://www.sqlite.org/limits.html. However, this design proposes that multiple connections can be used, where each connection owns some number of user tables. This can be introduced later, as long as there is design support for it now.
Note also that RocksDB snapshots are not persisted to disk, so that solution is not usable.
Finally, this design leverages SQLite's performant indexing to use relational SQL as the versioning mechanism, as opposed to previous SQLite Pact usage, which looks more like a key-value store. Indeed, the Pact language just wants a journaled key-value store, but this solution will handle reorgs "relationally".
Overview
The main notion is that of a version corresponding to a fork, which will be used along with block height to determine the latest version of a key, and to label entries as "forked" (or simply delete them) when a reorg occurs.
Example user table use
The following example will be used to illustrate the design. A user table will go through the following history. Row data is represented with an arbitrary number. The "version" concept is detailed below, but corresponds to a reorg history.
| key | block | version | data | notes |
|-----|-------|---------|------|-------|
| a | 10 | 1 | 123 | Version 1 represents current reorg/fork |
| b | 10 | 1 | 234 | |
| c | 11 | 1 | 456 | Reorg below replaces from here |
| a | 11 | 1 | 124 | |
| d | 12 | 1 | 567 | |
| c | 12 | 1 | 457 | |
| c | 11 | 2 | 460 | Fork/reorg to Version 2 |
| b | 12 | 2 | 240 | |
Thus, a table scan at block 12 version 1 just before the reorg should return:
| key | block | version | data |
|-----|-------|---------|------|
| a | 11 | 1 | 124 |
| b | 10 | 1 | 234 |
| c | 12 | 1 | 457 |
| d | 12 | 1 | 567 |
A table scan at the end, post-reorg, should return:
| key | block | version | data |
|-----|-------|---------|------|
| a | 10 | 1 | 123 |
| b | 12 | 2 | 240 |
| c | 11 | 2 | 460 |
Version detection
Block validation supplies a stream of (block height [B], parent hash [P]) pairs. A version indicates the non-forked behavior of this stream: receiving a monotonically increasing, dense stream of block heights indicates a single version.
When a block height arrives that is less than the expected next value, the version (V) will increment and version maintenance operations will occur.
The system will need a central version history table ordering all (B,P) pairs and associating each with a version. Re-orged pairs can be discarded. The "HEAD" version is maintained in memory and can be recovered from the version history table.
The "block version" is the pair of (B,V) as seen in the examples above.
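The stream handling above can be sketched as follows. This is a minimal illustration, not Pact code; `VersionTracker` and its fields are invented names:

```python
# Hedged sketch of version detection over a (block height, parent hash) stream.
# A reorg is signaled by a block height lower than the expected next height;
# the version V then increments and re-orged pairs are discarded.

class VersionTracker:
    """Tracks the current reorg version V from a stream of (B, P) pairs."""

    def __init__(self):
        self.version = 1          # current version V; starts at 1
        self.next_height = None   # expected next block height, or None at start
        self.history = []         # version history: (B, P, V) triples

    def observe(self, height, parent_hash):
        """Process one (B, P) pair; return the resulting block version (B, V)."""
        if self.next_height is not None and height < self.next_height:
            # Height went backwards: a reorg. Bump the version and discard
            # the re-orged pairs from the history.
            self.version += 1
            self.history = [e for e in self.history if e[0] < height]
        self.history.append((height, parent_hash, self.version))
        self.next_height = height + 1
        return (height, self.version)
```

Feeding heights 10, 11, 12 and then 11 (the reorg from the example) yields block versions (10,1), (11,1), (12,1), (11,2).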
Table management
The system will track all versioned system and user tables -- versioned system tables would include the Pact continuation table and refstore; user tables include the coin contract table. Tables should be associated with a database connection/file, with some limit on how many tables can be in a file/connection. This can initially be a single connection, but we want the design to support multiple connections. The central system tables can be in their own connection or share the first connection.
Tables will also be associated with the block version at which they were created. Tables can optionally be dropped when reorged, or marked as old/invalid; the table name in the database can include the block version if desired. See "Deleting vs Marking" below.
Per-table versioning and history
Table operations will support reorgs by tracking versions directly in the relational schema. Queries will leverage indexes and "LIMIT 1" queries to find the latest value for a key (SQLite uses LIMIT rather than TOP).
Pact history requests will no longer need dedicated "transaction tables": 'txid' will be stored as a relational column. Note that Pact will no longer track multiple updates to a key within a given txid, as this is (a) not a good practice and (b) not relevant to larger history.
Deleting vs Marking on version changes
This solution supports either (a) marking deactivated block versions by updating the version column with a FORKED code, or (b) simply deleting the row. Likewise, forked table creates can either mark the table as FORKED or simply drop the table.
Deletion has the advantage of space compaction; marking can help with troubleshooting. While supporting both as a configuration option might be nice it will also slow development.
Decision: use deletion. Descriptions below will nonetheless use FORKED to indicate marking option.
Versioned table schema
All versioned tables will have the following schema:
| Name | Type | Note | Index |
|------|------|------|-------|
| KEY | String | User key | Non-unique |
| BLOCK | Int | Block height | Non-unique |
| VERSION | Int | Reorg version | |
| TXID | Int | Transaction ID for history | Non-unique |
| DATA | JSON | User data | No index |
The unique constraint is (KEY,BLOCK[,VERSION]). SQLite automatically adds ROWID, which is the actual primary key. TXID will have an index for history queries.
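A minimal sketch of this schema in SQLite, using Python's sqlite3 for illustration; the table name `usertable` and the index names are assumptions, not part of the design:

```python
import sqlite3

# Sketch of the versioned-table schema from the table above. DATA is stored
# as JSON text; the (KEY, BLOCK, VERSION) unique constraint gives the
# composite index, and SQLite supplies the implicit ROWID primary key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE usertable (
  KEY     TEXT    NOT NULL,  -- user key
  BLOCK   INTEGER NOT NULL,  -- block height
  VERSION INTEGER NOT NULL,  -- reorg version
  TXID    INTEGER NOT NULL,  -- transaction id for history
  DATA    TEXT,              -- user data as JSON text
  UNIQUE (KEY, BLOCK, VERSION)
);
CREATE INDEX usertable_key   ON usertable (KEY);    -- non-unique
CREATE INDEX usertable_block ON usertable (BLOCK);  -- non-unique
CREATE INDEX usertable_txid  ON usertable (TXID);   -- for history queries
""")
```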
Version maintenance
On a reorg, a new block version (B,V) will be introduced.
- Delete/mark all rows where BLOCK >= B.
- Drop/mark all tables that were created at BLOCK >= B.
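Under the deletion decision, these two steps might look like the following sketch. The shape of the table registry (a `created_at` mapping from table name to creation height) is an assumption for illustration:

```python
import sqlite3

# Hedged sketch of reorg maintenance under the "deletion" policy: on a new
# block version (B, V), rows written at or above B are deleted, and tables
# created at or above B are dropped.

def maintain_reorg(conn, fork_height, versioned_tables, created_at):
    """versioned_tables: table names; created_at: {table: creation height}."""
    for table in versioned_tables:
        if created_at.get(table, 0) >= fork_height:
            # Table itself was created in a re-orged block: drop it entirely.
            conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        else:
            # Otherwise delete only the re-orged rows.
            conn.execute(f'DELETE FROM "{table}" WHERE BLOCK >= ?',
                         (fork_height,))
    conn.commit()
```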
On-demand version maintenance.
Version maintenance will be expensive, requiring operations on all versioned tables in the system. Version maintenance could however be on-demand.
Tables could track when they were last maintained using (B,V). The first in-block operation to occur could test this to see what maintenance needs to be done.
In the example above, consider if the updates/queries happened at block (15,2) instead of right at the reorg. The table would have (10,1) as its block version, and would see that (11,2) was a fork. Maintenance would occur then.
The system needs to maintain the fork history in order to apply all forks that might have occurred since the table's block version.
The advantage of on-demand version maintenance is faster block processing, assuming that not all tables are hit in every block.
The disadvantage is unpredictable work; maintaining all tables is possibly a more "even" workload.
DECISION: Attempt on-demand and fall back to "global" maintenance as time permits.
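The on-demand variant might be sketched as follows, assuming each table records the block version at which it was last maintained and a central fork history is available. All names here are illustrative:

```python
import sqlite3

# Hedged sketch of on-demand maintenance: on the first in-block operation
# touching a table, any forks recorded since the table was last maintained
# are applied (deletion policy), and the table is brought current with HEAD.

def pending_forks(fork_history, table_version):
    """fork_history: ordered list of (fork_height, new_version) pairs.
    Return the forks this table has not yet applied."""
    return [(b, v) for (b, v) in fork_history if v > table_version]

def maintain_on_demand(conn, table, state, fork_history):
    """state maps each table to the (B, V) at which it was last maintained."""
    _, last_version = state[table]
    for fork_height, _ in pending_forks(fork_history, last_version):
        conn.execute(f'DELETE FROM "{table}" WHERE BLOCK >= ?', (fork_height,))
    if fork_history:
        state[table] = fork_history[-1]  # table is now current with HEAD
```

In the (15,2) example above, a table last maintained at (10,1) would see the (11,2) fork in the history and delete its rows with BLOCK >= 11 before serving the query.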
Checkpoint management
Checkpoint begin is called with (B,H) supplied:
- Detect version change; run maintenance on user tables as needed. (Note this could be on-demand; see above.)
- Compute the block version (B,V) to put in the environment for use in queries during this block.
Checkpoint discard
- Might want to use SAVEPOINT in SQLite to discard. This would require more information in the begin phase; alternatively, always use a SAVEPOINT and simply RELEASE (commit) it on save.
Checkpoint save
- Nothing required here, unless a SAVEPOINT is used, in which case it is RELEASEd (committed).
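If the always-use-a-SAVEPOINT option is taken, begin/save/discard might look like this sketch; the savepoint name `block` is arbitrary:

```python
import sqlite3

# Sketch of checkpoint begin/save/discard via SQLite SAVEPOINTs: a savepoint
# is always opened at begin, RELEASEd (committed) on save, and rolled back
# then released on discard.

def checkpoint_begin(conn):
    conn.execute("SAVEPOINT block")

def checkpoint_save(conn):
    conn.execute("RELEASE SAVEPOINT block")   # commits the block's work

def checkpoint_discard(conn):
    conn.execute("ROLLBACK TO SAVEPOINT block")  # undo the block's work
    conn.execute("RELEASE SAVEPOINT block")      # close the savepoint
```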
In-block operations
Single-key query for key K
select KEY,DATA where KEY = K [and VERSION != FORKED] order by BLOCK desc[, VERSION desc] limit 1
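A runnable version of this query, assuming the deletion policy (so no FORKED filter) and the illustrative table name `usertable`:

```python
import json
import sqlite3

# Hedged sketch of the single-key read: descending BLOCK (and VERSION) order
# with LIMIT 1 returns the latest surviving row for the key.

def read_latest(conn, key):
    row = conn.execute(
        "SELECT KEY, DATA FROM usertable "
        "WHERE KEY = ? "
        "ORDER BY BLOCK DESC, VERSION DESC LIMIT 1",
        (key,)).fetchone()
    return None if row is None else json.loads(row[1])
```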
Update key K to data D at block version (B,V) with txid T
- read row data D' for K as above
- write new row KEY = K, BLOCK = B, VERSION = V, TXID = T, DATA = (merge D' D)
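These two steps might be sketched as follows; the dict-union merge semantics for (merge D' D) are an assumption for illustration:

```python
import json
import sqlite3

# Hedged sketch of the in-block update path: read the previous row for K,
# merge the new fields over it, and write a fresh row at the current block
# version (B, V) with txid T.

def write_row(conn, table, key, block, version, txid, new_fields):
    row = conn.execute(
        f'SELECT DATA FROM "{table}" WHERE KEY = ? '
        f'ORDER BY BLOCK DESC, VERSION DESC LIMIT 1', (key,)).fetchone()
    prev = json.loads(row[0]) if row else {}        # D', or empty if new key
    merged = {**prev, **new_fields}                 # merge D' D (assumed)
    conn.execute(f'INSERT INTO "{table}" VALUES (?,?,?,?,?)',
                 (key, block, version, txid, json.dumps(merged)))
```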
Select all valid keys (keys in Pact)
Same query as the single-key case (without DATA), without LIMIT 1.
Row history queries
Simply query ordered by TXID [avoiding FORKED rows].
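For example (illustrative table name; deletion policy, so no FORKED filter):

```python
import json
import sqlite3

# Hedged sketch of a row-history query: all surviving writes for a key,
# ordered by TXID, relying on the TXID index.

def key_history(conn, table, key):
    return [(txid, json.loads(data)) for (txid, data) in conn.execute(
        f'SELECT TXID, DATA FROM "{table}" WHERE KEY = ? ORDER BY TXID',
        (key,))]
```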