One of the current issues with Hypercore is that a fork in the history is a fatal corruption of the data. This means that people's datasets can be destroyed by a botched "key move" between computers.
Another issue is that, because history cannot be rewritten, it's not currently possible to upgrade a data structure on Hypercore (such as Hyperdb or Hyperdrive). If a breaking change has to be made to the data structure, then the old hypercore has to be replaced with an entirely new hypercore.
To counteract these issues, @mafintosh and I have been talking about a meta "pointer structure" which provides a level of indirection between the "public URL" and the "internal identifier" of the hypercore. This would make it possible to replace a Dat dataset's internal data structures without changing the publicly-facing URL/key.
Such a data structure might look something like this:
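```protobuf
syntax = "proto2";

message HypercorePointer {
  required bytes key = 1;  // ID of the hypercore being pointed to
  required uint64 seq = 2; // pointer version (protobuf has no uint16 type)
}
```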
The `key` would provide the ID of a hypercore, while `seq` would be a monotonically increasing value. To update the pointer, the owner of the public-facing URL would publish a new signed `HypercorePointer` with a `seq` equal to the previous pointer's `seq` plus one.
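A minimal sketch of how a peer might decide whether to accept a newly published pointer, assuming the signature is checked by some externally supplied function. The `verify` callable, the `accept_update` name, and the rule that the very first signed pointer is accepted unconditionally are all hypothetical; the proposal only fixes the "previous `seq` plus one" rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class HypercorePointer:
    key: bytes  # ID of the hypercore this pointer currently targets
    seq: int    # pointer version; must increase by exactly one per update


# Hypothetical signature check supplied by the caller: (pointer, signature) -> bool.
Verifier = Callable[[HypercorePointer, bytes], bool]


def accept_update(current: Optional[HypercorePointer],
                  candidate: HypercorePointer,
                  signature: bytes,
                  verify: Verifier) -> bool:
    """Return True if `candidate` may replace `current` as the latest pointer."""
    # Only the owner of the public-facing URL can publish valid pointers.
    if not verify(candidate, signature):
        return False
    # Assumption: with no prior pointer, any correctly signed pointer is accepted.
    if current is None:
        return True
    # The new pointer must be exactly one step ahead of the previous one.
    return candidate.seq == current.seq + 1
```

Requiring exactly `seq + 1` (rather than any larger value) means a peer can tell whether it has missed an intermediate pointer.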
When exchanging a hypercore, peers will share their latest `HypercorePointer` and resolve to sync the pointer with the highest `seq` number. (They could continue to sync previous feeds.) The hypercore pointed to would then be synced within the existing swarm and connection.
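The resolution step can be sketched as picking the maximum `seq` among the pointers peers advertise. This is an illustration only: `resolve_latest` is a hypothetical name, and the sketch assumes each advertised pointer has already had its signature checked.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class HypercorePointer:
    key: bytes  # ID of the hypercore the pointer targets
    seq: int    # pointer version


def resolve_latest(
    pointers: Iterable[Optional[HypercorePointer]],
) -> Optional[HypercorePointer]:
    """Pick the pointer with the highest seq among those advertised by peers."""
    # Peers that have not shared a pointer yet contribute None.
    known = [p for p in pointers if p is not None]
    if not known:
        return None
    # max() returns the first maximal element, so ties go to the earliest peer.
    return max(known, key=lambda p: p.seq)
```

The sketch breaks ties arbitrarily; the proposal above does not say what should happen if two different pointers share the same `seq`.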
Implications for apps & consuming clients
The `HypercorePointer` makes it possible to change the internal dataset without changing the URL.
When this occurs, the hypercore's data would essentially be reset, and all history could be altered. This is not a trivial event; from the perspective of any consuming application, the hypercore's previous state has been completely invalidated.
If the pointer is updated to fix a fork corruption, it's likely that the application doing the fix would then try to recreate the last state on the new log. However, applications will have to treat a pointer update as a total reset, since the state of the new log may differ arbitrarily from the old one.
To manage this, we would most likely need to surface the `HypercorePointer` to the APIs and UIs in some way. @mafintosh explored the idea of calling the `seq` of the pointer a "major version" while the `seq` of an individual log is the "revision" or perhaps "minor version." This would mean that hypercore-based data structures are addressed by a major/minor version, such as `5.3`.
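One way to picture this addressing scheme is as an ordered pair compared lexicographically: any revision under a newer pointer sorts after every revision under an older one. The `Version` class and its `parse` helper below are a hypothetical sketch, not an existing Dat API.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    # order=True compares fields in declaration order, i.e. major first, then minor.
    major: int  # seq of the HypercorePointer
    minor: int  # revision (seq) within the log the pointer targets

    def __str__(self) -> str:
        return f"{self.major}.{self.minor}"

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor = text.split(".")
        return cls(int(major), int(minor))
```

For example, `Version(4, 100) < Version(5, 0)` holds, because the major version is compared first.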
The semantics of a major-version change, under this scheme, would be "this is basically a whole new dat, so clear any current indexes on it and reindex from scratch."
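A consuming application could apply these semantics roughly as follows: a minor-version change indexes only newly appended entries, while a major-version change discards the index and rebuilds from the new log. The `Indexer` class, the `(key, value)` entry format, and the choice to ignore a lower (stale) major version are all hypothetical assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Entry = Tuple[str, int]  # hypothetical (key, value) record format


class Indexer:
    """Hypothetical app-level index over a pointer-addressed hypercore."""

    def __init__(self) -> None:
        self.major: Optional[int] = None
        self.index: Dict[str, int] = {}
        self.applied = 0  # number of log entries already indexed for self.major

    def on_update(self, major: int, log: List[Entry]) -> None:
        """Bring the index up to date with `log`, the full current log."""
        if self.major is not None and major < self.major:
            return  # assumption: an older pointer is stale and is ignored
        if major != self.major:
            # Major-version change: treat this as a whole new dat.
            self.index.clear()
            self.applied = 0
            self.major = major
        # Minor-version change: only index entries appended since last time.
        for key, value in log[self.applied:]:
            self.index[key] = value
        self.applied = len(log)
```

Tracking `applied` per major version is what allows cheap incremental updates within a log while still forcing a full reindex whenever the pointer moves.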
Thoughts and discussion open!