
polykey's Introduction





What is Polykey?

Polykey is an open-source, peer-to-peer system that addresses a critical challenge in cybersecurity: the secure sharing and delegation of authority in the form of secrets like keys, tokens, certificates, and passwords.

It allows users—including developers, organizations, and machines—to store these secrets in encrypted vaults on their own devices and share them directly with trusted parties.

  • All data is end-to-end encrypted, both in transit and at rest, eliminating the risk associated with third-party storage.
  • Polykey provides a command line interface, desktop and mobile GUI, and a web-based control plane for organizational management.
  • By treating secrets as tokenized authority, it offers a fresh approach to managing and delegating authority in zero-trust architectures without adding burdensome policy complexity - a pervasive issue in existing zero-trust systems.
  • Unlike complex self-hosted secrets management systems that require specialized skills and infrastructure, Polykey installs and runs directly on the end-user's device.
  • It is built to automatically navigate network complexities like NAT traversal, connecting securely to other nodes without manual configuration.

Key features:

  • Decentralized Encrypted Storage - No storage of secrets on third parties, secrets are stored on your device and synchronised point-to-point between Polykey nodes.
  • Secure Peer-to-Peer Communication - Polykey bootstraps TLS keys by federating trusted social identities (e.g. GitHub).
  • Secure Computational Workflows - Share static secrets (passwords, keys, tokens and certificates) with people, between teams, and across machine infrastructure. Create dynamic (short-lived) smart-tokens with embedded policy for more sophisticated zero-trust authority verification.
  • With Polykey Enterprise, you can create private networks of Polykey nodes and apply mandatory policy governing node behaviour.


Installation

NPM

npm install --save polykey

Development

Run nix develop, and once you're inside, you can use:

# install (or reinstall packages from package.json)
npm install
# build the dist
npm run build
# run the repl (this allows you to import from ./src)
npm run ts-node
# run the tests
npm run test
# lint the source code
npm run lint
# automatically fix the source
npm run lintfix

Docs Generation

npm run docs

See the docs at: https://matrixai.github.io/Polykey/

Publishing

# npm login
npm version patch # major/minor/patch
npm run build
npm publish --access public
git push
git push --tags

License

Polykey is licensed under the GPLv3; you may read the terms of the license here.

polykey's People

Contributors

addievo, amydevs, brynblack, cmcdragonkai, drfacepalm, emmacasolin, gideonairex, joshuakarp, meanmangosteen, robert-cronin, scottmmorris, tegefaulkes


polykey's Issues

Master Key Encryption or Missing

Upon creation of a Polykey keynode, we always generate a master key. Every keynode should have a unique master key as that represents its digital identity. This is because keynodes are participating in a P2P network, and so keynodes should have unique identities.

The user must be offered the choice to encrypt the master key with a master password, or to be given the master key directly, in which case the keynode state no longer keeps the master key around. This means the user must either supply the master password to decrypt the master key, or supply the master key itself for authentication.

EFS async calls may not be truly async

As of now, the async calls in the EFS have been wrappers around the sync functions. The sync functions implement all the logic to carry out the corresponding functionality, and each async counterpart uses process.nextTick(...) to make it appear async.

However, calling process.nextTick() may only push the top-level sync function onto the event queue. Imagine the sync function calls further sync functions: when the top-level sync function runs, there is nothing to enforce that the nested sync functions run in an async manner.

For example, efs.writeSync() calls fs.writeSync(). If the above hypothesis is true, then a call to efs.write() would schedule efs.writeSync() to run asynchronously, but when it actually runs, it would still be calling fs.writeSync(), thereby blocking the main thread. In particular, for large writes there will be a performance penalty. This is an issue because efs.write() is not actually truly async.

To test this we can use a snippet:

import fs from 'fs';
import crypto from 'crypto';

const fd = fs.openSync('/tmp/nexttick-test', 'w');
const buf = crypto.randomBytes(50 * 1024 * 1024); // ~50 MiB

process.nextTick(() => { // A
  fs.writeSync(fd, buf);
  console.log('finish write');
});

process.nextTick(() => { // B
  console.log('async console log');
});

console.log('sync console log'); // C

---

// reusing fd and buf from the first snippet
fs.write(fd, buf, () => { // A
  console.log('finish write');
});

process.nextTick(() => { // B
  console.log('async console log');
});

console.log('sync console log'); // C

If the above is true, then we would expect the order of completion to be C -> A -> B in the first instance; C -> B -> A in the second.

If it turns out that the second behaviour is the correct one, then that means sync EFS needs to call sync functions, while async EFS needs to call sync functions from VFS but async functions from the native fs.

Exposing different interfaces for use of secrets within polykey

We need to figure out how to expose the secrets contained within Polykey for use in different contexts.

Some potential interfaces are:

  • clipboard
  • stdout
  • a file descriptor
  • output to file (or just pipe to file/redirect to file)
  • "env variable" injection
  • http

Initial Key Generation is slow - progress hooks

The initial RSA key generation for a key-node setup can sometimes take 20+ seconds. We should integrate some kind of progress display into our CLI and UI applications. kbpgp lets you do this like so:

const my_asp = new kbpgp.ASP({
  progress_hook: function(o) {
    updateSomeProgressBar(o)
  }
});

const opts = {
  asp: my_asp,
  userid: "[email protected]",
  primary: {
    nbits: 4096
  },
  subkeys: []
};

kbpgp.KeyManager.generate(opts, some_callback_function);

CLI app architecture

I'm thinking of using the following libraries in building the CLI app:

For building the core of the CLI:
https://www.npmjs.com/package/oclif

For text styling:
https://www.npmjs.com/package/chalk

For text output and formatting:
https://www.npmjs.com/package/winston

There were some other contenders such as commander and yargs, both of which have a much larger user base than oclif, which I mainly put down to their more mature codebases.

The reasons I decided to go with oclif are:

  1. It's modern
  2. Built by Heroku (the Heroku CLI uses oclif)
  3. Coded in TypeScript so it will integrate well with our codebase
  4. Good multi/sub-command support
  5. Clean API and good docs

Secret Management - Subcommands to interact with Secrets

After a vault is created, we should be able to get/put secrets into its filesystem.

If we were able to expose the vault as a real FUSE filesystem on Unix systems, then we would be able to use any Unix CLI tool to create secrets. However, I think this feature can be pushed till later, since the CLI is just a prototype while we focus on releasing a GUI as well (and in the GUI we'll end up translating GUI actions into low-level commands that don't even touch the real filesystem anyway).

This is why we end up with a few basic commands to put secrets in and get secrets out. Almost like FTP in a way.

Note that vaults are always the root path, so /a/file means we are referring to a vault address. However, since many commands need to refer to both the real filesystem and the Polykey filesystem, we need some way to disambiguate. We can do something similar to the AWS CLI and use a special protocol prefix.

Note that we can do all these operations within one keynode. But interactions between keynodes are different, as that means we are using the vault sharing system.

# copying a file from one place to another (within the same keynode)
polykey secrets cp pk://a/file ./file
polykey secrets cp ./file pk://a/file
polykey secrets cp pk://a/file pk://b/file
# copy from stdin
polykey secrets cp - pk://b/file
# copy to stdout (same as cat)
polykey secrets cp pk://a/file -
polykey secrets cat pk://a/file # you can use this to pipe the file contents to other commands
polykey secrets mv pk://a/file ./file
polykey secrets mv ./file pk://a/file
polykey secrets rm ./file pk://a/file
polykey secrets ls pk://a/directory
polykey secrets touch pk://a/file # creates an empty file
# maybe you can launch a subshell like this (then you can run the above commands without retyping `polykey secrets`)...
polykey secrets pk://a/file

Note that I'm not sure I like the usage of pk:// to disambiguate Polykey paths from normal paths. It's possible we might be able to make use of another sigil, similar to how ~ is used to refer to the home directory. Potential sigils for us to use: @, %, +, -. All the other symbols are commonly used for various things in the shell.

The above commands are just subcommands that cover copying, reading, moving, deleting, and listing directories. We can add more later. We don't actually re-implement all the shell commands in Polykey; we just cover the basics for now. For more advanced uses, it's better to present a FUSE interface.

A very important command is the ability to edit a file. Now, we could do something like use $EDITOR. But the problem is that this expects a real file somewhere. Doing this means the secret has to be leaked onto disk for random access unless we present a FUSE interface. Otherwise you have to pipe the contents to the editor, but then the editor cannot pipe back out to Polykey without some other kind of wrapper. See: https://loige.co/aws-command-line-s3-content-from-stdin-or-to-stdout/ This is a difficult problem to solve, and the GUI won't have to deal with it obviously, as it can have its own editor.

Another thing is that we may be able to make use of the multiaddr in libp2p, so that keynodes themselves have a p2p peer ID, but then each vault within should also have a unique id as well. I'm thinking some content-addressed ID, but I'm not sure... that may leak secrets. Otherwise what we end up with instead is some sort of unique id generation that we want to be unique across the network. So you have to ensure that each peer has a unique id (probabilistic via the public key), and then each vault also has a unique id, also probabilistic via a public key (but if we aren't using public keys then...?). Anyway, then you can have a globally unique id for a vault and a globally unique id for a peer. So a peer could have an /a vault, and any peer could have that. But one could also then refer to vaults regardless of the peer itself for "vault discovery".

Redundant events in commit messages

As of now, the logs of commit messages are formed with the granularity of filesystem level changes. That is, Polykey logs any events deemed a change of state according to the filesystem, e.g. creation, modification, removal of a file...

But consider the case where a file: foo.txt contains the text:

bar

You can open this file and make no modifications and write the file again, or delete the entire contents, save it, and write bar again. The mtime of the file will change and the filesystem will detect this as a change; as a result Polykey will also log this change despite the fact that there is no change with respect to the secret's state.

So we need to look into log deltas based on secret/file granularity. Git maintains a checksum for each file's contents in the working directory, staging area, and repository. It compares the checksums to determine if a file has been modified. Perhaps we can do something similar.
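A minimal sketch of one possible approach, assuming a hypothetical in-memory map that remembers the checksum of each secret's last committed content:

import crypto from 'crypto';

// hypothetical map of secret path -> checksum of last committed content
const lastCommitted = new Map<string, string>();

function checksum(content: Buffer): string {
  return crypto.createHash('sha256').update(content).digest('hex');
}

// returns true only when the content actually differs from the last committed state,
// so mtime-only "changes" don't generate redundant commits
function hasSecretChanged(secretPath: string, content: Buffer): boolean {
  const sum = checksum(content);
  if (lastCommitted.get(secretPath) === sum) return false;
  lastCommitted.set(secretPath, sum);
  return true;
}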

Synchronous vs Asynchronous API

The JS that we write can be written in a synchronous or an asynchronous style.

When interacting with virtualfs, the easiest to use is simply the synchronous system, since there's no IO and so there's no need to wait on anything.

However, for the EncryptedFS we will need to perform IO to write the file to disk, which means that if we used the synchronous style, it would block the entire JS runtime. This may be complicated if the UI system is running against the same loop.

This isn't a problem for CLI applications most of the time. And even for things like Electron/React Native/NativeScript, there's a separate UI loop compared to the backend code.

So we may get away with writing it all synchronously. But it's probably a good idea to think about how to use asynchronous code, or support both APIs just like how js-virtualfs exposes both async and sync APIs.

Git Communication over a gRPC connection

Currently we have the git server working over an insecure http connection (using nodejs http lib) but we want to eventually have the git server working over an existing connection that has already been secured between two peers.

I have implemented a ConnectionManager that creates a single secure connection between 2 peers (e.g. server from peer A and socket from peer B) using mutual TLS, but the question now is how do we make the git server use that connection?

The request (client connection) can be done fairly easily using the createConnection option. The server has been more challenging and that is what this issue is meant to address.
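A rough sketch of the client side, assuming the ConnectionManager hands us the already-secured duplex stream for a peer (the function name and path are illustrative, not the actual implementation):

import http from 'http';
import type { Duplex } from 'stream';

// `securedSocket` is assumed to be the already-established mutual-TLS stream
// produced by the ConnectionManager for a given peer
function gitInfoRefs(securedSocket: Duplex, vault: string) {
  const req = http.request(
    {
      path: `/${vault}/info/refs?service=git-upload-pack`,
      method: 'GET',
      // reuse the secured stream instead of letting http open a new TCP connection
      createConnection: () => securedSocket as any,
    },
    (res) => res.pipe(process.stdout),
  );
  req.end();
}

The harder part, as noted, is the server side: making the git server accept requests arriving over that same pre-established stream rather than its own listening socket.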

grpc-js: No connection established

This error is a little hard to decipher, but it occurs when you try to run the server without also running the CA server that the certificates were issued from. The best way to fix it is to re-issue the SSL certs with ./scripts/generate_ssl_certs.sh.

I shall put something in the README along these lines in the upcoming agent PR.

Multithreading

After seeing that key generation can take 30 seconds, and that kbpgp has an async-based API, I have realised that we will probably need multithreading to make this fast. Node now seems to have 2 solutions to the multithreading problem:

It seems that the library is matching the webworker API, maintaining some compatibility with browser-based web workers (although this may not be a big deal). The problem, though, is that if we want to make this work on iOS and Android, we have to consider how this will be impacted if we are reliant on these Node-specific features. See: https://www.nativescript.org/faq/how-does-nativescript-run-my-javascript-code
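As a rough illustration of the worker-threads direction (not tied to kbpgp specifically), a single-file sketch where the key generation is assumed to happen in the worker branch:

import { Worker, isMainThread, parentPort, workerData } from 'worker_threads';

if (isMainThread) {
  // main thread: spawn a worker so the 20-30 second key generation doesn't block the event loop
  const worker = new Worker(__filename, { workerData: { nbits: 4096 } });
  worker.on('message', (msg) => console.log('key generation done', msg));
  worker.on('error', (err) => console.error(err));
} else {
  // worker thread: placeholder for the actual CPU-heavy key generation call
  const keyPair = { nbits: workerData.nbits, publicKey: '...', privateKey: '...' };
  parentPort?.postMessage(keyPair);
}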

Vault sharing via Git

We want to harness git's abilities to push and pull repositories (i.e. secret bundles). We only want to implement a pull-based model for now, but we want to ensure that the keynode pulling the repository is authenticated to access that secret bundle. This can be implemented using PGP. The node that owns the vault can encrypt the bundle with the requesting node's pub key, and the node pulling the bundle will then get access with the corresponding private key. Bundles should also be tagged with keys; for example, a key tag should list all the bundles currently shared with that node.

Concurrency read/write for vault operations

We need to integrate concurrency into our git operations on vaults. Write operations must be FIFO but can be asynchronous to the user. Read operations can be parallel.
With the use of a callback pattern, we can also take advantage of transactional commits.
Some examples of concurrency libs in node.js are async-mutex and live-mutex. The former is pure TypeScript. Both can be used with either callbacks or promises. Also, this article provides some background on the latter.
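For example, a minimal sketch with async-mutex; the per-vault lock and the commit step inside it are assumptions about how the vault write path might look:

import { Mutex } from 'async-mutex';

// one mutex per vault serialises writes (queued in FIFO order), while reads can bypass it
const vaultWriteLock = new Mutex();

// hypothetical write operation against a vault's git repository
async function writeSecret(vaultName: string, secretName: string, content: Buffer) {
  await vaultWriteLock.runExclusive(async () => {
    // ... write the secret into the vault filesystem and commit it as one transaction
  });
}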

Related to #51

Peer to Peer Architecture

Inspirations:

The peer-to-peer architecture is the architecture in which Polykey keynodes can discover and connect to each other. Note that communication between Polykey keynodes happens over this p2p network, but social discovery occurs via social providers like Keybase, which facilitate linking human profiles to automated keynodes.

Defining 'random access' in polykey

Traditionally, random access implies that granularity at the block level is available when accessing files. However, in Polykey files will be encrypted using PGP, which does not offer this kind of random access. This means accessing any chunk of the file will warrant the entire file being deciphered. This is okay since, for the moment, Polykey will mainly be dealing with small files.

From a different perspective, it does offer random access, but at the granularity of files. Compared to representing a secret bundle as a tar archive, which offers no random access, the current implementation, where a secret bundle is a directory of encrypted secrets, still offers access to individual secrets without the need to decrypt everything in the bundle.

In future, there is scope to allow random access at the block level. One potential implementation is for the filesystem (i.e. the secret bundle) to be represented as a single file. The contents of the bundle can then be block mapped, similar to traditional filesystems, which are also stored in a one-dimensional space (the hard disk).

Direct keybase integration - signup/login

Obviously a user of PolyKey can just initialise a new key-node and have it generate a new digital identity (pub/priv keypair) that is not linked to a social identity. But social proof for PolyKey is a huge opportunity.

We can come up with a way to directly integrate keybase's API for signup and login in order to link a user's keybase account with a particular keynode.

I tried to follow the proof-integration-guide but I realised that this is more for a centralised service like example.com and wouldn't work for our purposes. One could potentially have a centralised public relay like polykey.com like zerotier does, but I think we can get away with just using keybase's HTTP API for now.

This should have its own PR after the next few (atm it's looking like peer discovery, git vault control and git network vault sharing PRs first).

Experiments with node-tar to use js-virtualfs and random access

node-tar is the best tar library in the Node ecosystem.

However, it was designed around these things:

  1. Asynchronous IO
  2. Streaming interface
  3. Interacting with a real fs
  4. Compatibility with loads of tar options.

Since we are manipulating the tar archives in-memory, we do not need async IO; we don't need streaming, since all creation and extraction will be synchronous (even though real tar is actually streaming); and we want to interact with our js-virtualfs. We also don't care about all the tar options and the pax spec. We just want a simple ustar implementation of creation, extraction and random access.

We also need to integrate random access into our tar archives. So to be exact, we are compatible with the tar spec, but we include a tar index at the end of the file, based on the go-rat repository (https://github.com/mcuadros/go-rat).

I have forked node-tar and investigated its source code and also the ustar spec.

The creation function is heavily abstracted on top of a custom stream implementation.

The real meat of it is in lib/write-entry.js, which utilises header.js. These are the things that really write the bytestrings into the resulting tar archive for any given file entry.

Once we have ripped out the appropriate things, our own "tar" library will be much simpler and geared towards in-memory creation of archives. In fact, the entire archive can just be a virtual tar that can be serialised to disk, representing a "memory mapping" of tar archives. Or in our case, a virtual tar is the in-memory lazily unpacked representation of the packed tar archive (which can be on disk).

Note that we also don't care about compression, so we won't even have that. This is pure tar. Compression is only relevant for disk purposes, like a real disk, but our situation is all in-memory. Remember, though, that we are taking tar archives and extracting them into memory with random access.

Tar format: https://github.com/libarchive/libarchive/wiki/ManPageTar5

Shared access to vault key violates secrecy

Imagine two keynodes A and B have synced a particular vault. Now A wants to update/add secrets and does not want those secrets to be shared. Since the vault key for this vault is still the same, if B manages to somehow get the encrypted, updated version of the vault, it will be able to decrypt it. We need to avoid this issue.

This can be done by never sharing the vault key. There is no need. A vault key should only be used for encryption of secrets at rest for a particular keynode and only that keynode. This means each keynode will maintain its own private vault key (still symmetric) for each vault.

But how will a keynode be able to decrypt the vault on synchronisation? It won't. We are using git for transmission out of the upper dir, which is sandboxed and secure. The transmission channel itself is secured using TLS. The vault then sits in the recipient's upper dir, once again secure. So at no point during the transmission of the decrypted vault from upper dir to upper dir is the security violated. Once it is in the upper dir, the recipient can encrypt it with its own vault key and persist it.

Data Integrity and Authentication

It would be desirable if we could offer guarantees, to some level, that the encrypted data has not been tampered with.

Some possibilities would be using HMAC or cipher modes such as GCM with built-in data integrity mechanisms.
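A small sketch of the GCM option using Node's built-in crypto module; decryption throws if the ciphertext or auth tag has been tampered with (key/IV handling here is illustrative only):

import crypto from 'crypto';

function encryptGcm(key: Buffer, plaintext: Buffer) {
  const iv = crypto.randomBytes(12);
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  const authTag = cipher.getAuthTag(); // integrity/authentication tag
  return { iv, ciphertext, authTag };
}

function decryptGcm(key: Buffer, iv: Buffer, ciphertext: Buffer, authTag: Buffer) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(authTag); // final() throws if the data was modified
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}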

IPC communication between agent and client (CLI/GUI)

The next problem to figure out is how the client and agent communicate; it looks like there are a couple of ways to achieve this:

Discovery between the agent and the client also needs to be figured out, here are some ideas:

  • Rely on a shared path (this shared path must be user-specific) like /run/user/1000/polykey/socket (this only exists on Linux, find equivalent on Mac/Windows).
  • Rely on a unix domain socket which is also on a shared path that is user-specific. This relies on a shared path existing.

Using UDS is really about getting access to the OS's security capabilities for securing the comms. The Node.js TLS module can also do UDS, as it's an abstraction over the net module; you just have to specify the path in tls.connect. I'm just not sure about the server side and TLS.
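A bare-bones sketch of the unix domain socket idea (plain net here; layering TLS on top of the same path is the open question above). The socket path is the Linux example from the list and is an assumption:

import net from 'net';
import path from 'path';

// user-specific socket path; only exists on Linux, Mac/Windows equivalents TBD
const socketPath = path.join('/run/user/1000/polykey', 'socket');

// agent side: listen on the unix domain socket
const server = net.createServer((conn) => {
  conn.on('data', (msg) => conn.write(`ack: ${msg}`));
});
server.listen(socketPath);

// client side (CLI/GUI): connect to the same path
const client = net.connect({ path: socketPath }, () => client.write('hello agent'));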

Automatic Commits for Vault Changes

Changes to any vault should involve an automatic commit to our Git version control system.

We should try to make this system write sensible messages. And we need to carefully manage the Git commit history and consider issues like diverging histories and merge strategies dealing with incompatible histories. We need to make this as automatic as possible so that end-users do not have to do a manual merge like they would for source control.

Issue regarding redundant commit messages: #5. We may need to be able to transactionally modify things, so edits to multiple files should be one commit if they represent the same change in secrets.
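A minimal sketch of what an automatic commit could look like with isomorphic-git, assuming a vaultDir and that the real implementation would pass in the EncryptedFS rather than the native fs:

import fs from 'fs'; // stand-in; the real code would use the encrypted filesystem
import git from 'isomorphic-git';

async function autoCommit(vaultDir: string, changedPaths: string[]) {
  for (const filepath of changedPaths) {
    await git.add({ fs, dir: vaultDir, filepath });
  }
  // one commit per logical change, even when it spans multiple files
  await git.commit({
    fs,
    dir: vaultDir,
    message: `Updated secrets: ${changedPaths.join(', ')}`,
    author: { name: 'polykey' },
  });
}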

Initial experiment here: aa8d4ae

Password Generator


A password suggestion/generator is useful for different things. One is for Polykey secrets themselves. Another is generating secrets within the constraints of external systems.

For Polykey secrets, we should use bip39: https://github.com/bitcoinjs/bip39
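For example, a minimal sketch using bip39 to generate a passphrase-style secret:

import * as bip39 from 'bip39';

// 256 bits of entropy produces a 24-word mnemonic that can serve as a memorable secret
const mnemonic = bip39.generateMnemonic(256);
console.log(mnemonic);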

Passphrase Strength in Key Generation

I think we should throw an error for a weak passphrase passed into the kbpgp key generation. We can validate password strength with zxcvbn. They define anything under a score of 2 to be somewhat guessable (guesses < 10^8), so I think it's reasonable to error if the user-provided passphrase is below this.
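A minimal sketch of the check, assuming it runs just before handing the passphrase to kbpgp:

import zxcvbn from 'zxcvbn';

function validatePassphrase(passphrase: string): void {
  const { score } = zxcvbn(passphrase);
  // zxcvbn scores range from 0 to 4; below 2 means "somewhat guessable" or worse
  if (score < 2) {
    throw new Error('Passphrase is too weak, please choose a stronger one');
  }
}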

Note: this issue is solved in the upcoming PR #35, but I thought it best to document our decision here.

Vault Integrity

A user is only allowed to access the content of a secret bundle if and only if they have the ability to decrypt everything inside it. This is because a secret bundle is the unit of sharing.

Merkle trees have the property that if a key can decrypt a node in the tree, it can also decrypt all subsequent children of that node. Thus, if a user can decrypt the root node of a merkle tree which represents a secret bundle, then they are able to decrypt the entire secret bundle and can be granted access to that bundle.

After roughly commit 58, isomorphic git fails to commit again

I am getting a strange error with isomorphic git, namely:

Invalid checksum in GitIndex buffer: 
expected 0000000000000000000000000000000000000000 but saw b858b0f102cb2dcb25823c1ac7a10796b9a283d6

This only occurs after roughly 58 commits and is reliably failing on the 58th commit. I've tested isomorphic git with normal fs and it doesn't seem to be an issue:

import os from 'os'
import fs from 'fs'
import path from 'path'
import git from 'isomorphic-git'
import crypto from 'crypto'

/**
 * Returns a 5 character long random string of lower case letters
 */
function randomString(): string {
  return Math.random().toString(36).replace(/[^a-z]+/g, '').substr(0, 5)
}

async function main() {
  // Initialize new repo
  const tempDir = fs.mkdtempSync(path.join(os.tmpdir(), 'checksum-test'))
  await git.init({
    fs,
    dir: tempDir
  })

  for (const n of Array(1000).keys()) {
    // Write random number of random bytes to file
    const randomNumber = 1 + Math.round(Math.random() * 5000)
    const randomBuffer = crypto.randomBytes(randomNumber)
    const randomFilename = randomString()
    fs.writeFileSync(path.join(tempDir, randomFilename), randomBuffer)

    await git.add({
      fs,
      dir: tempDir,
      filepath: randomFilename
    })
    // commit file
    await git.commit({
      fs,
      dir: tempDir,
      message: randomFilename,
      author: {
        name: 'Test'
      }
    })
  }

}

main()

So it seems more likely to be an issue with our implementation and use of isomorphic git and not with isomorphic git itself.

npm release in gitlab-ci doesn't work

I am getting the following error when trying to deploy automatically with npm publish:

npm ERR! code E404
npm ERR! 404 Not Found - PUT https://registry.npmjs.org/js-polykey - Not found
npm ERR! 404 
npm ERR! 404  '[email protected]' is not in the npm registry.
npm ERR! 404 You should bug the author to publish it (or use the name yourself!)
npm ERR! 404 
npm ERR! 404 Note that you can also install from a
npm ERR! 404 tarball, folder, http url, or git url.

Colliding paths

There are colliding paths in the dist directory because Windows filesystems are case-insensitive.

warning: the following paths have collided (e.g. case-sensitive paths
on a case-insensitive filesystem) and only one from the same
colliding group is in the working tree:

  'dist/Polykey.js'
  'dist/polykey.js'
  'dist/Polykey.js.map'
  'dist/polykey.js.map'

This should not be done. It should be either Polykey.js or polykey.js. I think the lowercase makes more sense here if you're producing a library. Also, within dist there should be a dist/bin for the CLI executables.

Encryption of vault keys

Each vault will have its own individual symmetric key for AES encryption. This needs to be kept secret. We also need to determine where to store the encrypted vault key so it can be retrieved easily when encrypting and decrypting secrets.

To tackle the first problem, the vault key can be encrypted with the keynode's public key, so it can only be decrypted with the passphrase-protected private key. Another option is to simply have a password-derived symmetric key which encrypts the vault key. This key need not be stored anywhere; it can be recreated each time from the user's password and a salt, which can be persisted upon creation of the keynode.
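A rough sketch of the second option (a password-derived key wrapping the vault key) using Node's crypto primitives; the cipher choice and parameters are assumptions, not the final design:

import crypto from 'crypto';

function wrapVaultKey(vaultKey: Buffer, password: string) {
  const salt = crypto.randomBytes(16); // persisted alongside the keynode
  const kek = crypto.scryptSync(password, salt, 32); // key-encryption-key derived from the password
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv('aes-256-ctr', kek, iv);
  const wrapped = Buffer.concat([cipher.update(vaultKey), cipher.final()]);
  // only salt, iv and wrapped need to be stored; the kek is recreated on demand
  return { salt, iv, wrapped };
}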

As to where the vault keys will be stored: each vault can have a hidden '.vault' folder at the root which contains the encrypted vault key as well as other metadata.

NAT Busting for keynodes behind NAT layers

We need to be able to connect two keynodes that might both be behind NAT layers and not exposed via public IPs.

The way I see it, there are two options. We could either incorporate NAT traversal options directly into Polykey (UDP hole punching, peer circuit relays, git sharing over ssh port forwarding?).

The other way is to assume all Polykey nodes are discoverable on the same virtual network and provide mesh capability like ZeroTierOne does. We could even set up our own ZeroTierOne controller (instructions) and provide it as a public service, or incorporate it somehow into MatrixOS as a system service. I think the latter will end up being useful for other parts of MatrixOS.

Spec out the API for js-polykey

We need to spec out the function types for js-polykey and the data structures that we are managing.

I've written notes on this in the Polykey repository, but they are all over the place and involve some older ideas. Hopefully @ll-aashwin-ll can clean them up and produce a better spec.

https://github.com/MatrixAI/PolyKey

See README.md, README2.md... etc.

We can discuss the spec here and eventually write them into the code. This js-polykey can be the first reference implementation.

Secret Vault Schema

Secret bundles (or whatever we shall call them) are basically just a filesystem. But to ensure we can have structured access, a schema must be applied to them.

Why would we want structured access? Well, the fact that it is a filesystem and not just some binary blob already imposes some structure: the structure of files, directories, symlinks, hardlinks, permissions... etc. Actually, we want to make this more specific. Properties like metadata and permissions of the files must be stripped when they are put into a secret bundle; they just don't make sense in the context of what we need. Even hardlinks might not make sense. However, files, directories and symlinks should be fine (except on Windows). We can assume symlinks simply because we always work on our secret bundle in-memory via an in-memory filesystem (as provided by js-virtualfs). Other implementations will need their own filesystem implementation.

However, beyond just being a filesystem, one has to consider what kind of files can be put into the filepaths, what file paths can exist, and what their names are. If we put additional structure into it, we can offer more constraints over the "shape" of the data, which gives any external consumers using Polykey more confidence/guarantees/type-safety over the key-values/info/secrets that they are putting into and taking out of the secret bundles. This means there is less of a need for discovery intelligence in the consumers (as in, the consumers don't have to be too smart).

Here's an example. Say that within a secret bundle there is a single file:

/cards.json

The schema is saying that it must be a JSON file. Then we would not allow any consumer using the API to stick in a PNG file and call it a JSON file. But even more than this, we can code up a plugin for these schemas, that is, a JSON encoder/decoder, which means the API allows structured access to the JSON file. It can pass a JSON dictionary, or even access a single attribute via an "attribute path" into the JSON file. Imagine using jq to directly acquire something within that file. Ideally we would not need to have plugins, since this means we are using magic strings like "JSON" to mean that there is something that knows about that format. Ideally the encoders/decoders should be derived from the schema itself. For example see: https://github.com/mozilla-services/react-jsonschema-form, which uses the JSON schema and autogenerates HTML forms for React. There's also research on this from dependent types, OMeta, http://blog.rafaelferreira.net/2008/04/couple-of-interesting-dsls.html, https://categoricaldata.net/aql.html, boomerang, https://en.wikipedia.org/wiki/Bidirectional_transformation... etc. Consider this a low priority however.

Here are some examples of filesystem schemas:

There is a relationship between the schema language (and its associated tooling) and the ability to check, model, verify and generate marshalling interfaces (https://en.wikipedia.org/wiki/Marshalling_(computer_science)), also see how people write protobuf interfaces and autogenerate data marshalling libraries for given languages.

Consider all of this a low priority!

Crypto - Keypair Creation

It should be possible to create new key pairs. This can be used by other programs for whatever reason, or Polykey may change the master key of the keynode, but this may also involve re-encryption of any symmetric vault keys being used.

This is a low priority feature, since many other programs usually allow creation of new keypairs to mean new digital identities. But in Polykey we don't really need to do that, since keynodes do not represent a new digital identity representing a person, but a unique agent that participates in a P2P network.

But it can look like:

polykey crypto gen-keypair ./key ./key.pub
polykey crypto gen-keypair ./key

Git alternatives - propagating changes

We are currently going to use git to propagate changes and version control our secret bundles. The main reasons for this are that it more or less suits our needs and that it is a well understood system. It will also aid in building a prototype quicker. Below is a discussion of possible version control/propagation styles and some reasoning why git appears to be the most appropriate solution for Polykey. In the future it is possible that a custom/hybrid solution will be used.

Some background on the qualities/properties of secrets:

Changes to secrets are atomic; there is no partial change to a secret like there is in source code. You would never change part of a private key, you would completely replace it, nor would you make rapid incremental changes to something like a password or passport information. That is to say, a secret's state has low coupling with its previous state. In this case the usefulness of deltas diminishes, as the benefit they provide is saving storage overhead by sequentially applying the deltas in the order that they happened until the desired state is reached. But since, for Polykey, a change to a secret will most likely mean completely new content, the delta will almost be a snapshot.

Snapshots make sense when you have small data size, and if there is low coupling with previous state. Polykey hits both.

Also consider that previous states of a secret are less likely to be of value to the end user. Unlike source code, where you might want to revert to a revision that provided a stable state, a previous state of a secret generally means that it is stale.

There is also no use for multiple branches when dealing with secret bundles, as there is no apparent benefit.

Git uses a tree data structure for version control. It stores snapshots, which are complete states of an entity every time it is committed. Once pushed to a repository, keynodes can then pull the changes when they desire and reach eventual consistency.

As we can see, we are not really using the version-control features of git; it is more a way to bring some consistency to the distributed system.

Another possible solution is event sourcing. Event sourcing is a pattern that allows us to persist an entity from a series of events. That is, it uses neither deltas nor snapshots; instead, events result in state changes. Traditionally, to persist an object you would save its state, but with ES you store the sequence of state-changing events. It guarantees that any change to an entity's state is initiated by an event. This allows us to do a complete rebuild of the state by rerunning the events from an empty application. It also provides an audit trail of how the current state came to be. This isn't really beneficial wrt Polykey, since we generally only care about the current state of the secret; previous states are not that useful to us.

If the events store temporal information then we can do temporal queries, i.e. determine the state of the application at any point in time. Again not that beneficial for Polykey.

The event store can then expose an API for retrieving an entity's events. The store can also allow services to subscribe to events. When an event is persisted in the event store, it can deliver the event to all interested services.

This can be useful in Polykey, as keynodes can subscribe to events that relate to the secret bundles they contain. We would also then have to think about what exactly defines an event in Polykey. We are currently exploring a pull-based model; perhaps it can be extended to a pull-push model in the future.

Since it appears we are utilising the propagation functionality of git more than the version control, it is perhaps worth exploring avenues such as implementing the Raft algorithm or a variation of Dat (which is currently used for dynamic large binary blobs), as git currently seems best as only an interim solution.

Codegen for proto -> js

So I am trying to figure out the best way to compile proto files so they are resolved properly by webpack, as it seems they are not handled properly at the moment.

The best way to me looks like first compiling the proto files into javascript using protoc or grpc_tools_node_protoc (from grpc-tools).

Lockfiles to constrain keynodes to a single agent

It is intended to have a single agent per user that can manage multiple keynodes, just like a single ssh-agent per user can manage multiple keys. Any particular keynode can be registered against a particular agent that represents the authentication context for that keynode.

To prevent multiple agents (i.e. multiple users on the same system) from managing the same keynode, we could create a lockfile that constrains the keynode to a particular running agent.
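A simple sketch of the lockfile idea; the lock path and its placement inside the keynode directory are assumptions:

import fs from 'fs';
import os from 'os';
import path from 'path';

// hypothetical lockfile location inside the keynode directory
const lockPath = path.join(os.homedir(), '.polykey', 'agent.lock');

function acquireLock(): boolean {
  try {
    // 'wx' fails if the file already exists, so only one agent can hold the lock
    fs.writeFileSync(lockPath, String(process.pid), { flag: 'wx' });
    return true;
  } catch (e) {
    return false; // another agent already manages this keynode
  }
}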

Git backed server for sharing vaults

TBD.

Should be lower level pull/push style.

However we can also have a higher level sync. But that can also be automated via the polykey agent.

Relevant to #31

Authentication vs. Authorisation of keynodes

We need to separate the authentication and authorisation of the keynodes. Authentication is verifying that an agent is known and trustworthy, or in other words that agent A knows agent B and trusts that person B controls keynode B. Authorisation is treated as the separate issue of what to allow the already authenticated agent B to do.

One can also think of half opened vs fully opened connections. Here a half opened connection would occur when agent A has 'trusted' agent B but agent B has yet to manually vet agent A. Furthermore, agent B might not even know that agent A has trusted it so it only discovers this when it manually vets (i.e. approves the pub key of) agent A.

We probably don't want authentication to occur through pure peer discovery. What if agent B is malicious, but agent A thinks it's their friend from work? We could solve this issue by giving the agent A user some social proof (e.g. this pubkey that wants to connect with you also owns the twitter/github/facebook handle john-doe). But I think the best thing to do for now is assume that both agents have exchanged pub keys. In this way it's purely a mutual authentication and no sharing can happen unless both agents know the other's pub key already. The handshake is then the way we confirm that the target keynode actually controls the corresponding pubkey (i.e. confirm that they can use the private key to decrypt something)

This doesn't stop agent A from 'pre-sharing' one of its vaults with agent B. In fact, it's probably more efficient from a user perspective to do it this way. Agent A knows about agent B and shares to B first; then when B verifies A, it doesn't have to wait for A to approve a vault since it's already authorised to pull from that vault.

Atomic Writes/Consistent Writes in the Lower FS

The lower fs provides the persistence of writes to our "files" in Polykey. We allow incremental mutation of "files" in Polykey. So that means given a 1 GiB file, editing the middle of the file doesn't mean rewriting the entire file. It's supposed to just edit and mutate that small partition in the middle.

However let's say that the buffer we are writing into the middle is 10 MiB. Then the problem is that when passing 10 MiB into the lower FS which is backed by the Node FS, that results in potentially multiple calls to the write syscall. And multiple write syscalls are not atomic. So that means it is possible to corrupt the encrypted ciphertext backing the plaintext file.

We want to make sure that we have "consistent" updates to our ciphertext files so we don't leave it in an inconsistent state. That would be quite bad.

There are a couple ways of doing this:

  1. Write to temporary and perform atomic rename. (This defeats the purpose of incremental mutation)
  2. Some sort of COW system
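A minimal sketch of option 1 above (write to a temporary file, then atomically rename), noting again that it gives up incremental mutation:

import fs from 'fs';
import path from 'path';

function atomicWrite(targetPath: string, data: Buffer) {
  const tmpPath = path.join(path.dirname(targetPath), `.${path.basename(targetPath)}.tmp`);
  fs.writeFileSync(tmpPath, data);
  fs.renameSync(tmpPath, targetPath); // rename is atomic on POSIX filesystems
}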

We should look into databases and ACID implementations for some ideas here.


It appears that using git should mean this is not a problem for us, because we don't care about atomicity on a block basis or even on a single-ciphertext basis. We only care about atomicity from one commit transaction to another. So if something fails in the middle of a commit transaction, an automatic rollback should be made.

Crypto - Encrypt/Decrypt Commands

We should have the ability to encrypt symmetrically or asymmetrically.

To do asymmetric, we need to have a "trust database". Technically we could point to any known public key. But by default we would have to be able to choose among any possible known public keys. To make this easier, a short truncated code that locates the public key is possible too. So there may be multiple ways to locate a public key to use for asymmetric encryption.

# encrypts a file and asks for password (this is symmetric)
polykey crypto encrypt ./file
polykey crypto encrypt --password=./pass ./file
# encrypts file with id locating a public key
polykey crypto encrypt ./file public-key-id
# maybe something like this for multi-key encryption, this would be the case where multiple readers can read the same file 
polykey crypto encrypt ./file public-key-id public-key-id

Note that we prompt the user for secrets, as secrets should not be specified as plaintext arguments. However, they can be passed in via options that reference a file (as with --password=./pass above).

Polykey Agent Design

I've always disliked it when programs create their own subshell/repl; it makes them difficult to integrate. This is OK if the subshell presents a totally different world with different semantics, but the Polykey CLI should just present an FS interface that standard Unix tooling can work on easily.

So if we are not creating a sub-shell within the normal shell, we need to instead integrate our "session" into the normal shell.

So that means that during our installation we also end up installing an agent. This is only needed during our CLI phase; however, later I want to re-use that agent for the GUI as well. It would then act like a taskbar agent on Windows and Mac, while for Linux it would be a user-level service. I would expect that the client and agent both integrate the js-polykey library.

So what does that mean? Well, it is possible for a single desktop to be running multiple keynodes. You can have a keynode in ~/.polykey1 and another keynode in ~/.polykey2. No problem with this.

Then the agent, for now, can keep the "login" session for each keynode that it knows about. So a user-level agent keeps track of keynodes for that user on a Unix system (or Windows), while a root-level agent would keep track of keynodes on the entire machine. One could also have an agent within a nix-shell keeping track of just the keynodes within its domain. So basically multiple agents may exist at the same time (this will make it easier to debug the agent and use it in infrastructure).

So from this point onwards, every polykey operation requires the client which is the polykey app to be authenticated. This means either asking for the password to unlock the stored private key, or asking for the private key if that is not stored.

However like the sudo command, as soon as we have authenticated once for any operation, we can then save that authentication to the polykey agent such that subsequent operations do not need to be reauthenticated.

We can do the above automatically or on the basis of some switch. But for prototyping purposes, let's do this automatically.

The session can have a time-expiry, and this can be refreshed on every operation. We can make this configurable, but for now let's just fix it to 1 hr.

The way in which polykey clients connect to the agent can be done similar to how ssh-agent or gpg agent is utilised. A standard environment variable pointing to some sort of socket that exists somewhere. See SSH_AUTH_SOCK.

Note that developing this agent also leads to interesting ideas about agent forwarding, similar to how SSH works. However, I want to leave this out for now, although it does pertain to another integration point, in this case with SSH.

Revocation of asymmetric master key and symmetric vault keys

We need to start thinking about revocation. Particularly revocation of a vault key and how we would propagate it through a PolyKey network.

It will be less common to revoke the master key of a particular keynode, after which that keynode would become untrustworthy. This might happen if a user has many sub-keynodes and one of them becomes compromised.

Crypto - Sign/Verify Commands

Signing and verification means competing in the digital signature market. I think this may become a bigger issue in the future, especially for non-repudiation.

But we need to make this easy for people to use. So I propose these commands:

# sign with the keynode's private key
polykey crypto sign ./file

# sign with some other private key that exists
polykey crypto sign ./file private-key-id

The private-key-id can be anything that uniquely identifies the private key. So we may potentially store private keys in Polykey as secrets and use these as well, but maybe we also have some totally different private key.

The signature produced should be a "detached signature". See gpg detached signature functionality: https://www.gnupg.org/gph/en/manual/x135.html

I think this default is the most obvious and flexible, we end up with a separate file as the signature. And verification would refer to both the original file and the signature. If the file was tampered with then the verification would fail.

# the signature should be able to identify what public-key we need to use
polykey crypto verify ./file ./file.sig
# alternatively we use this in order to specify the public key we need
polykey crypto verify ./file ./file.sig public-key-id

Also note that without further specification, the output is then the outputted signature or other notes. STDOUT should be used for the obvious output. STDERR for any reporting. STDIN for taking in passwords or other prompts.

Persistence of secret vaults

Secret bundles will be the unit of transfer in Polykey. One property of secret bundles is that they are encrypted at rest. This is due to the fact that they are containers of secrets and are responsible for providing security for those secrets.

As of now secret bundle persistence will be implemented by naively serialising the secret bundle (using tar) and then encrypting it using the PGP standard.

One pitfall of this approach is that even a small change in the secret bundle means the entire structure will be reserialised. This may not be a big issue when dealing with small files in small quantities, but when dealing with thousands of secrets or large binary files, it will pose a greater issue.

Refer to issue as a possible solution
