dominictarr / cyphernet

License: MIT License

cyphernet's Introduction

cyphernet

secure replicatable tree database.

a README Project

Currently, this is a readme project. Really, cyphernet will be many projects that fit together. Please check the issues to take part in discussions on aspects of this project.

Synopsis

cyphernet is an abstraction of ideas present within various distributed systems such as git, bittorrent, and bitcoin. There are three key aspects:

objects are just binary blobs identified by their hash. This is known as a content addressable database.
hashes inside an (e.g. JSON) object are pointers or "links" to other objects. Thus, the database is a tree of hash pointers, called "cypherlinks".
arbitrary subsets of any two databases may be exchanged via merkle trees. (merkle trees allow two remote nodes to rapidly compare two sets of objects; they may then replicate by merging their sets)
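As a rough illustration of the first two aspects, here is a minimal sketch of a content addressable store with one cypherlink. The names (hashOf, createStore, the "post" field) and the choice of sha256 are assumptions made for this example, not part of any cyphernet API.

```javascript
const crypto = require('crypto')

// Hash a blob (Buffer or string); the hex digest is the blob's address.
// sha256 is an assumption, the project does not fix a hash function.
function hashOf (blob) {
  return crypto.createHash('sha256').update(blob).digest('hex')
}

// A toy in-memory content addressable store (hypothetical API).
function createStore () {
  const blobs = new Map()
  return {
    put (blob) {
      const key = hashOf(blob)
      blobs.set(key, Buffer.from(blob))
      return key
    },
    get (key) {
      return blobs.get(key)
    }
  }
}

const store = createStore()
const postKey = store.put('# hello world')

// A JSON object whose "post" field is a cypherlink to the first blob.
const commentKey = store.put(
  JSON.stringify({ type: 'comment', post: postKey, text: 'nice' })
)

const comment = JSON.parse(store.get(commentKey).toString())
console.log(store.get(comment.post).toString()) // prints "# hello world"
```

Note that the comment can only be created after the post exists, because it needs the post's hash.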

interesting properties

Database is a tree, not a graph.

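It is not possible for a cycle to form within the graph, unless there is a hash collision (highly unlikely). So the graph must be acyclic, a tree in the loose sense (strictly a directed acyclic graph, since several objects may link to the same object), and it is not necessary to check for cycles.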

Hashes always point backwards in time

You cannot create a hash, and then the object that produces the hash. Instead, you must have the object, and then hash it. Therefore cypherlinks always point backwards in time to things that already existed before the document that they are contained within was created.

Immutability, Security, Distributability.

Since every object is referenced by its hash, it's impossible to change (immutable). If anyone changed a piece of data, then it would change the hash. If you have the hash of a document, then you can always verify it. So, if you ask for a particular document ("follow a cypherlink"), then it doesn't matter who you receive that document from!
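A minimal sketch of that check, assuming sha256 and a made-up verify helper: whoever hands you the blob, you accept it only if it hashes to the link you followed.

```javascript
const crypto = require('crypto')

function hashOf (blob) {
  return crypto.createHash('sha256').update(blob).digest('hex')
}

// Accept a blob from any source only if it hashes to the expected link.
function verify (expectedHash, blob) {
  return hashOf(blob) === expectedHash
}

const original = Buffer.from('some document')
const link = hashOf(original)
const tampered = Buffer.from('some d0cument')

console.log(verify(link, original)) // true
console.log(verify(link, tampered)) // false
```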

This is the reverse of the security model of the www. In most network applications, security is implemented by authorizing the connection. But in a cyphernet, you use cryptographic hashes to authorize the documents, and it is not necessary to authorize the connection.

This is rather like "authenticating" your friends by the sound of their voice, not just because the call has come from the correct phone number. You can still verify your friend's identity if they call from a different phone, or even if you see them in person.

Replication Partners

The cyphernet is still potentially secure when completely distributed. There is no need to replicate via a central, or known server. However, how to track peers and initiate connections is out of scope of this project. cyphernet is focusing on datastructures, security, and replication. Network topology is another problem. There are other projects that deal with creating connections between remote peers, such as peerjs, cjdns and ZeroTierOne.

Defining a subset to replicate

The set of documents that define an "application" or "service" could be defined in any number of ways. In git, a repository contains the data for just one project, but you can replicate (checkout) just the branches you require. With a blog, you'd want to replicate the text and images on each post, plus the comments. On a social network, you'd want to replicate your mutual friends.

When two nodes connect, they will exchange a handshake that describes the set of objects they wish to exchange. Then each node traverses its database for objects in that set. Then the difference between the two sets is found via merkle tree. Finally, only the objects each node is missing are sent over the wire!

How a set is defined is up to the application, but most will involve traversing the tree, or querying the indexed properties.
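The end result of a replication round can be sketched like this. Each node is modelled as a Map from hash to blob, and a plain set difference stands in for the merkle tree comparison, so this shows what gets sent, not how the difference is found efficiently. The keys 'h1', 'h2', 'h3' are placeholders for real hashes, and the function names are made up.

```javascript
// Keys that `theirs` has and `mine` lacks.
function missingFrom (mine, theirs) {
  return [...theirs.keys()].filter(key => !mine.has(key))
}

// Work out what each side is missing, then copy only those objects across.
function replicate (a, b) {
  const aNeeds = missingFrom(a, b)
  const bNeeds = missingFrom(b, a)
  for (const key of aNeeds) a.set(key, b.get(key))
  for (const key of bNeeds) b.set(key, a.get(key))
  return { aNeeds, bNeeds }
}

const alice = new Map([['h1', 'post'], ['h2', 'comment']])
const bob = new Map([['h2', 'comment'], ['h3', 'reply']])

const sent = replicate(alice, bob)
console.log(sent) // alice only receives h3, bob only receives h1
```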

Users & Authors

Using asymmetric key cryptography it is possible to verify the authorship of documents. A user uploads an object, and then creates another object, a signature object, which links to the first object and to the user's public key, and contains a signature (of the object, made with the matching private key).

Signatures and keys can be replicated and stored within the database like any other object.
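Here is one way a signature object might look, as a sketch only. It uses Node's built-in crypto module with an ed25519 key pair; the field names (type, object, key, signature), the key encoding and the choice of sha256 and ed25519 are all assumptions for illustration.

```javascript
const crypto = require('crypto')

function hashOf (blob) {
  return crypto.createHash('sha256').update(blob).digest('hex')
}

const { publicKey, privateKey } = crypto.generateKeyPairSync('ed25519')

const post = Buffer.from('my first post')

// The public key is stored as a blob too, so it can be linked to by hash.
const keyBlob = publicKey.export({ type: 'spki', format: 'der' })

// The signature object links to the post and to the key.
const signatureObject = {
  type: 'signature',
  object: hashOf(post),
  key: hashOf(keyBlob),
  signature: crypto.sign(null, post, privateKey).toString('base64')
}

// Check both links, then check the signature against the linked key.
function verifySignature (sigObj, blob, keyBlob) {
  if (hashOf(blob) !== sigObj.object) return false
  if (hashOf(keyBlob) !== sigObj.key) return false
  const key = crypto.createPublicKey({ key: keyBlob, format: 'der', type: 'spki' })
  return crypto.verify(null, blob, key, Buffer.from(sigObj.signature, 'base64'))
}

console.log(verifySignature(signatureObject, post, keyBlob))
```

Because the signature object is plain JSON, it can be hashed and stored like any other object.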

License

MIT

cyphernet's People

Contributors

Stargazers

Watchers

Forkers

micahredding patricktoca opensourceinternetv2 wasserfuhr

cyphernet's Issues

detecting binary blob types

There is a general problem of detecting the type of an object.
json is easy to detect, but it can't do binary.

For example, bitcoin and git would fit into cyphernet,
if there was a simple way to detect what type of object a blob is,
and parse it correctly.

It may be necessary to interpret the object's type from context.
I.e. we know that a git commit links to a previous commit and a tree.

Also, it may be necessary to allow different hashes.
I doubt that all these systems use sha1.
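To make the two problems concrete, here is a small sketch. detectType only separates JSON from everything else, and linkTo writes the hash algorithm into the link itself. The "algorithm:hex" link format is made up for this example; nothing in cyphernet specifies it.

```javascript
const crypto = require('crypto')

// Guess a blob's type: anything that parses as JSON is 'json', the rest is 'binary'.
// A binary blob that happens to be valid JSON text would be misdetected.
function detectType (blob) {
  try {
    JSON.parse(blob.toString('utf8'))
    return 'json'
  } catch (err) {
    return 'binary'
  }
}

// A link that names its own hash algorithm (hypothetical format).
function linkTo (blob, algorithm) {
  return algorithm + ':' + crypto.createHash(algorithm).update(blob).digest('hex')
}

// Re-hash the blob with whatever algorithm the link names.
function checkLink (link, blob) {
  const algorithm = link.split(':')[0]
  return linkTo(blob, algorithm) === link
}

const blob = Buffer.from('{"type":"post"}')
console.log(detectType(blob), linkTo(blob, 'sha1'), linkTo(blob, 'sha256'))
```

This still says nothing about how to tell a git commit from a bitcoin block; that probably does have to come from context.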

use steganography to put public keys inside (profile) pictures

One interesting way to do this would be to upload your public key inside an image, using steganography. (a new sort of profile pic!)

there seems to be plenty of steganography implemented in js

A quick google revealed these:

http://oakes.github.io/PixelJihad/about.html

http://www.peter-eigenschink.at/projects/hideme/demo/

But they'll have to be converted to npm/browserify

oh, on that note - I have had some luck making pull requests to get people to use browserify! way better to get the maintainer to put something onto npm themselves.
At least, post an issue telling them that you have published their thing,
so that they can take it over when they are ready.

Git vs Cyphernet

What are the key differences between Git and Cyphernet?

example: blog

here is how you'd implement a blog like thing with cyphernet.
Not that a blog is that exciting, but it's a simple well understood example,
with a simple structure. I'll build on this example to explain how you'd implement
a microblog/socialnetwork (which is really just a blog + friend feeds)

to begin, the author would publish a blob which is the post.
(this could be just a markdown file)

(this could be shared via an ordinary http url - as mentioned in the readme,
you can get the objects from anywhere, because you can always check that they are valid, so exposing them at myblog.com/HASH would be completely okay,
you could make it fully distributed later, or people could replicate your blog to their servers etc.)

first the user creates a blog post datablob, then readers can create comment blobs
(probably in json format, or maybe markdown with embedded metadata?)
that cypherlink back to the post. If a comment is a reply to another comment, then it
must cypherlink to that comment (which proves that it came after that comment)

Also, it's a good idea if a comment links to the latest comment (or comments)
which proves that comment came after the previous comments.
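A sketch of that structure, with an in-memory Map standing in for the database. The field names (type, post, replyTo, latest) are invented for this example.

```javascript
const crypto = require('crypto')

const blobs = new Map()

// Store a string or a JSON-serialisable object, keyed by its sha256 hash.
function put (value) {
  const blob = typeof value === 'string' ? value : JSON.stringify(value)
  const key = crypto.createHash('sha256').update(blob).digest('hex')
  blobs.set(key, blob)
  return key
}

// The post is just a markdown blob.
const post = put('# My first post\n\nHello.')

// Comments cypherlink back to the post.
const first = put({ type: 'comment', post, text: 'Nice post!' })

// A reply also links to the comment it answers and to the latest comment seen.
const reply = put({ type: 'comment', post, replyTo: first, latest: [first], text: 'Agreed.' })

// Collect every comment that links back to a given post.
function commentsOn (postKey) {
  const found = []
  for (const [key, blob] of blobs) {
    let obj
    try { obj = JSON.parse(blob) } catch (err) { continue } // not JSON, skip
    if (obj && obj.type === 'comment' && obj.post === postKey) found.push(key)
  }
  return found
}

console.log(commentsOn(post))
```

The reply cannot be created before the first comment, since its hash is needed for the replyTo link.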

Note, you cannot delete a comment. So think before you speak.
Maybe you can ask people not to read what you have written, but the information is out there and there is no way to change that.

If the author wants to moderate the blog, they can accept comments
by creating accept objects that list the recent comments, and then only replicating the accepted comments.

Of course, there is no way to stop someone else from publishing their renegade comments via a different app, that may be integrated with your data.

So, essentially - this is the interesting thing - with webpages, the data and the app and the display are all bundled together, but with this idea, there is a data section that is replicated, and a separate app section. It will always be possible to request/replicate the data without the app. Maybe in some ways this is related to the semantic web.

README: Explain differences between Camlistore and cyphernet

At least on the face of it, these projects sound very similar. Camlistore: http://camlistore.org/

secure authorization via asymmetric cryptography

Hashing each object can only get cyphernet so far.
Things become much more interesting when you add asymmetric cryptography.

For example, if you only had an author field with a username in it,
anyone could publish a blog post, and claim it was me.
You wouldn't have to leave your phone unlocked to get a #poopin in your feed,
anyone could just distribute a post with any user name on it.

The solution is for all critical objects to be signed
(if you have a tree it may be acceptable to only sign the top object)

However, I can sign a post with my private key, and then anyone can verify it with my public key to be by me.
Or rather, be made with the device that I control. Subtle distinction here.
You are your device. probably want to make sure it's password protected,
or implanted under your skin, etc.

There is more work/research to do here.

I have read that PGP is basically "what you want" here.
I have looked into this a bit, but need to do more research.

Here is a list of papers
https://gist.github.com/hij1nx/6107937 @hij1nx recommends

Possibly, the best way to do this might be just to use pgp,
But then pgp has a well earned reputation for being "not easy".

And, the various things you'd require for implementing a PGP,
like, storing a bunch of keys, creating links between them (signing them)
and then evaluating them by traversing them is exactly the sort of thing
that cyphernet is meant to make easy to do. So, probably the best is to
build a PGP with cyphernet.

welcome

Hi,

I know a lot of you are interested in distributed systems, or replication, or are trying to build a peer to peer and/or encrypted/secure social network, or distributed search engine or package manager!

@jez0990 @bigeasy @substack @hij1nx @Raynos @ralphtheninja @rvagg @juliangruber @gwenbell @venportman

@ednapiranha

level-* has taught me to optimize for collaboration. see OPEN open source project.

So, I'm just gonna publish my todo list, as issues, because they work well for discussion.
If you are interested please click "watch" and github will notify you when someone posts a comment or issue, or if you are working on stuff that may be useful - like crypto, search, traversal, network topology - ah, we'll start a wiki or something.

currently, this is still in a research phase, but since I know you guys are researching too,
we need some, uh, centralized place to focus that research!

Dominic

streaming merkle tree

A while back I implemented https://github.com/dominictarr/level-merkle
I had been meaning to implement a merkle tree for a while, and wish I had done so sooner. Unfortunately, my experience with scuttlebutt and level-replicate had blinded me to the strengths of merkle tree based replication.

The strength of scuttlebutt replication is that it's good for real time data, and can do peer-to-peer, but it has security problems...

@hij1nx was essential in getting me interested in the problem, as he was working on a merkle tree thing too. level-merkle was as simple as possible,
and can only replicate an entire database. However, the most fascinating strength of merkle tree replication is that you can replicate subsets.

So, level-merkle needs to be reimplemented, and here's how it needs to work.

take an ordered stream of hashes,
and compute hashes of tumbling groups of hashes recursively.

1a1  ---\
1a2     | hashes that start with 1a --,
1a3     | hash(1a...)--\                                              
1a4  ---/              | hashes that start with 1...
1b1  ---\              |-----------------\
1b2     | hash(1b...)--/                 |
1b3     | hashes that start with 1b      | hash(*)
1b4  ---/                                | The "Top Hash"
                                         | all the hashes hashed together.
other                                    | 
hashes ----------------------------------/

Hashing is pretty fast, so it probably won't be a problem to calculate this on the fly,
as long as the traversal is fast. I can hash a 350 mb file in 1.5 seconds,
if that file was only 40 character hex shasums, that is 350e6/ 40 = 8.75e6 ~ 9 million.
that means the tree for even rather large sets could be calculated very easily.

In the current level-merkle, the hash is stored in the database, and recalculated async,
but I think that is not necessary in most cases. instead, it is much simpler to recalculate every time by default, and possibly support materialising the tree for nodes that must replicate at high frequency.
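The recalculate-every-time approach can be sketched roughly as follows. This is not the level-merkle code: prefixHash, differingPrefixes, the fixed depth of 2 and the use of sha256 are assumptions, and the three-character "hashes" are toy values matching the diagram above.

```javascript
const crypto = require('crypto')

const HEX = '0123456789abcdef'

function hashOf (text) {
  return crypto.createHash('sha256').update(text).digest('hex')
}

// Hash of every hash that starts with `prefix`, built recursively from the
// hashes of the longer prefixes beneath it. prefixHash(sorted, '', depth)
// is the top hash.
function prefixHash (sorted, prefix, depth) {
  const group = sorted.filter(h => h.startsWith(prefix))
  if (prefix.length >= depth) return hashOf(group.join(''))
  const children = [...HEX]
    .filter(d => group.some(h => h.startsWith(prefix + d)))
    .map(d => prefixHash(group, prefix + d, depth))
  return hashOf(children.join(''))
}

// Walk both trees from the top and descend only where the hashes differ.
function differingPrefixes (a, b, prefix = '', depth = 2) {
  if (prefixHash(a, prefix, depth) === prefixHash(b, prefix, depth)) return []
  if (prefix.length >= depth) return [prefix]
  return [...HEX].flatMap(d => differingPrefixes(a, b, prefix + d, depth))
}

const mine = ['1a1', '1a2', '1b1'].sort()
const theirs = ['1a1', '1a2', '1b1', '1b2'].sort()

console.log(differingPrefixes(mine, theirs)) // [ '1b' ]
```

Only the hashes under the differing prefixes then need to be exchanged. This version re-filters the list at every level, so a real implementation would want a single pass over the ordered stream instead.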

I know @substack and @Raynos are also interested in this.

traversals

Traversing a tree is a bit more complicated than just Array.forEach.
In some cases, you may want to traverse links that point in either direction,
"documents that link to X". In that case, you are dealing with a graph.

Ideally, you'd describe a traversal with some sort of canonical expression,
like, "every post object reachable from X" or "all posts by user X and all approved comments"

Then, this traversal can be used to describe a dataset! Two users would exchange this description (or its hash) to indicate what dataset they are interested in,
and then replicate their sets.
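For instance, "every object reachable from X" could be computed like this. The db here is a Map from hash to parsed object, and the convention that each object lists its outgoing cypherlinks in a links array is an assumption made for this sketch. The single-letter keys stand in for real hashes.

```javascript
// Return the set of keys reachable from `root` by following links.
function reachable (db, root) {
  const seen = new Set()
  const stack = [root]
  while (stack.length) {
    const key = stack.pop()
    // Skip objects we do not have, and ones already reached via another link.
    if (seen.has(key) || !db.has(key)) continue
    seen.add(key)
    for (const link of db.get(key).links || []) stack.push(link)
  }
  return seen
}

const db = new Map([
  ['a', { links: ['b', 'c'] }],
  ['b', { links: ['d'] }],
  ['c', { links: ['d'] }],
  ['d', {}],
  ['e', { links: ['a'] }] // links to a, but is not reachable from a
])

console.log([...reachable(db, 'a')].sort())
```

Following links in the other direction ("documents that link to X") needs an index from target to source, which is where the graph queries come in.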

here are a few ideas:

procedural javascript / javascript dsl. Just pass some javascript code which you execute to produce the traversal. this will need to be sandboxed and sanitized.
logical description of relations! see learndatalogtoday.org
I think this has a lot of potential, and is more powerful than SQL.
@mcollina's levelgraph already has some ways to query a graph, not sure if it can do recursive queries, though. @mcollina?

Another simple approach would be to just use indexes, like, https://github.com/dominictarr/level-search or @eugeneware's github.com/eugeneware/level-queryengine

couchdb style versioning

couchdb actually uses a pretty similar way to do revisions

http://guide.couchdb.org/draft/conflicts.html#deterministic

using an md5 hash plus a revision number. you can calculate what the value will be before you insert the document, but couchdb must insert it for you.

It's actually rather like how git commits work, except that all the data is stored within the commit message and there is no tree or blobs.

This is a very simple approach suitable for when you want to track changes within a single "document" independently from other "documents".

I'd do it more like git though, with _prev: [previous_hash] so that it's possible to represent merges if there are two independent updates to a document.
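A sketch of the _prev idea, using an in-memory Map. commit and heads are names invented for this example; heads returns the versions that nothing else points back to, so more than one head means an unresolved conflict.

```javascript
const crypto = require('crypto')

const versions = new Map()

// Store a new version that links back to the versions it was derived from.
function commit (fields, prev) {
  const version = { ...fields, _prev: prev }
  const hash = crypto.createHash('sha256').update(JSON.stringify(version)).digest('hex')
  versions.set(hash, version)
  return hash
}

// Versions that no other version lists in its _prev.
function heads () {
  const referenced = new Set()
  for (const version of versions.values()) {
    for (const hash of version._prev) referenced.add(hash)
  }
  return [...versions.keys()].filter(hash => !referenced.has(hash))
}

const v1 = commit({ title: 'draft' }, [])
const v2a = commit({ title: 'draft, edited by alice' }, [v1])
const v2b = commit({ title: 'draft, edited by bob' }, [v1])
const conflict = heads() // two independent updates: v2a and v2b

// A merge is just a version with two previous hashes.
const merged = commit({ title: 'draft, edited by alice and bob' }, [v2a, v2b])
const resolved = heads() // only the merge remains

console.log(conflict.length, resolved.length)
```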

Also, dynamo has a similar system, except using vector clocks. Also, when you request a document, you get all the un-resolved versions. I like this system because it forces you to resolve conflicts, instead of making it easy to ignore they can happen.

example: bitcoin blockchain

Cypherlinks can be links to any kind of blob and don't necessarily have to point at human readable documents such as blog posts.

A blob might as well be a block in the bitcoin blockchain, which essentially is just a singly linked list where each item in the list contains transaction data and a pointer back (which is the hash of that block) to the previous item in the list.

Each transaction is also identified with a hash, which turns the singly linked list into a tree where one leaf is the latest block in the chain and the rest of the leaves are transactions. So the core blockchain can be looked at as the stem of a tree, with the genesis block as the root. Or perhaps more like a bean stalk growing up into the sky :)
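A toy version of that shape, to show the links only. This is not the real bitcoin block format (no proof of work, no merkle root, a single sha256 over JSON); the field names prev and transactions are assumptions.

```javascript
const crypto = require('crypto')

const blobs = new Map()

function put (obj) {
  const blob = JSON.stringify(obj)
  const key = crypto.createHash('sha256').update(blob).digest('hex')
  blobs.set(key, blob)
  return key
}

// Transactions are blobs of their own, identified by hash.
const tx1 = put({ from: 'alice', to: 'bob', amount: 1 })
const tx2 = put({ from: 'bob', to: 'carol', amount: 1 })

// Each block links back to the previous block and out to its transactions.
const genesis = put({ prev: null, transactions: [] })
const block1 = put({ prev: genesis, transactions: [tx1] })
const block2 = put({ prev: block1, transactions: [tx2] })

// Walk the stem from the latest block back to the genesis block.
function chainFrom (tip) {
  const chain = []
  for (let key = tip; key !== null; key = JSON.parse(blobs.get(key)).prev) {
    chain.push(key)
  }
  return chain
}

console.log(chainFrom(block2))
```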

Naturally this can be used for any type of crypto currency using a similar system and not just bitcoin, for example litecoin or feathercoin.

Similar idea in russian internet

Here is another idea for building a torrent-like network where the addresses are hashes of the data.

http://translate.google.com/translate?hl=en&sl=ru&tl=en&u=http%3A%2F%2Fhabrahabr.ru%2Fpost%2F193886%2F

"A New Way to look at Networking" 2006 talk by Van Jacobson

Worth watching, if you haven't already.

https://www.youtube.com/watch?feature=player_embedded&v=8Z685OF-PS8

This is about named data networking or content centric networking, which seems to be the same thing or very similar to cyphernet. Are there differences between cyphernet and CCNx? If not, it may have been implemented already at Xerox PARC. ...will research whether the implementation is usable.

Replacing e-mail?

I imagine that this protocol would be eligible for creating an e-mail replacement.

E-mail is crying out for new features, which you get for free under this protocol:

Native (without GPG) authentication of the sender, and encryption of the message
Authentication of the whole conversation. If you are replying to an e-mail, there would be a link to the previous message, and there would be no reason not to have one.
No problem accessing messages on untrusted networks
No passwords, just a private key you can keep around

As I understand it, this protocol allows an e-mail to refer to previous messages in a conversation without copying the text, which avoids the hassle of the previous message being stored inside the current one (possibly modified!).

"addresses" could be SHA hashes of pubkeys, so if you are paranoid you can exchange public keys and be sure you are always talking to that person.
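A sketch of such an address, assuming an ed25519 key, DER encoding of the public key and sha256 for the hash. addressOf is a made-up helper, not part of any existing protocol.

```javascript
const crypto = require('crypto')

// The address is the hash of the encoded public key.
function addressOf (publicKey) {
  const encoded = publicKey.export({ type: 'spki', format: 'der' })
  return crypto.createHash('sha256').update(encoded).digest('hex')
}

const alice = crypto.generateKeyPairSync('ed25519')
const mallory = crypto.generateKeyPairSync('ed25519')

// You learn this address once, e.g. by exchanging keys in person.
const knownAddress = addressOf(alice.publicKey)

// Later, any key presented for that address can be checked against it.
console.log(addressOf(alice.publicKey) === knownAddress)
console.log(addressOf(mallory.publicKey) === knownAddress)
```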

Would this protocol be feasible to create a new way to send electronic messages?

freenet project

Btw did you hear of these guys?
https://freenetproject.org/understand.html

idea for creating user identities

All users have identities on many popular, centralized online platforms.
With many of these services, it's possible to log into other services with a primary service.

PGP has a totally different approach, you have to actually meet, and then verify someone's identity
and then you can sign their key. Did I mention that not many people use PGP?

However, really, in the online world it's these centralized services that are the arbiters of identity.

so: here is how you could bootstrap decentralized identity off these centralized services:

Sign a claim that you are a given identity, and then upload it to a place that only that centralized identity can write to,
like https://github.com/USERNAME/pubkey. If you do not have control over the URL (like on twitter, where you don't know the id for a tweet until twitter decides) then you'll have to create another object that points to the URL for the claim and sign it too.

Then other users can automatically check your identity via the URL and sign it if they verified it.