dat-core's Introduction

dat-core

The core implementation of dat

npm install dat-core

Usage

var dat = require('dat-core')

var db = dat('./test', {createIfMissing: true}) // create ./test/.dat if it doesn't exist yet

db.put('hello', 'world', function (err) { // insert value
  if (err) return handle(err) // something went wrong
  db.get('hello', function (err, result) {
    if (err) return handle(err) // something went wrong
    console.log(result)   // prints result
    console.log(db.head) // the 'head' of the database graph (a hash)
  })
})

API

db = dat(pathOrLevelDb, [options])

Create a new dat instance.

Options
  • checkout - database version to access. default is latest
  • valueEncoding - 'json' | 'binary' | 'utf-8' or a custom encoder instance
  • createIfMissing - true or false, default false. Creates the dat folder if it doesn't exist
  • backend - a leveldown compatible constructor to use (default is require('leveldown'))
  • blobs - an abstract-blob-store compatible instance to use (default is content-addressable-blob-store)

By default the path passed to the backend is {path}/.dat/db. If your custom backend requires a special URL, simply wrap it in a function:

var sqldown = require('sqldown')
var db = dat('/some/regular/path', {
  backend: function () {
    return sqldown('pg://localhost/database')
  }
})

db.head

String property containing the current head revision of the dat. Every time you mutate the dat this head changes.

db.init([cb])

Initializes the dat by adding a root node to the graph if one hasn't been added already. This is called implicitly when you perform a mutating operation.

cb (if specified) will be called with one argument, (error)

db.put(key, value, [opts], [cb])

Insert a value into the dat

cb (if specified) will be called with one argument, (error)

Options
  • dataset - the dataset to use
  • valueEncoding - an encoder instance to use to encode the value
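
For illustration, a minimal sketch combining these options (the 'users' dataset name is just an example):

db.put('alice', {age: 32}, {dataset: 'users', valueEncoding: 'json'}, function (err) {
  if (err) return handle(err) // something went wrong
  console.log(db.head) // the head changes after every mutation
})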

db.get(key, [options], cb)

Get a value node from the dat

cb will be called with two arguments, (error, value). If successful, value will have these keys:

{
  content:  // 'file' or 'row'
  type:     // 'put' or 'del'
  version:  // version hash
  change:   // internal change number
  key:      // row key
  value:    // row value
}
Options
  • dataset - the dataset to use
  • valueEncoding - an encoder instance to use to decode the value
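
Continuing the sketch above, reading the value node back:

db.get('alice', {dataset: 'users', valueEncoding: 'json'}, function (err, node) {
  if (err) return handle(err)
  console.log(node.type)    // 'put'
  console.log(node.version) // version hash of this node
  console.log(node.value)   // {age: 32}
})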

db.del(key, [cb])

Delete a node from the dat by key

cb (if specified) will be called with one argument, (error)

db.listDatasets(cb)

Returns a list of the datasets currently in use in this checkout

cb will be called with two arguments, (error, datasets) where datasets is an array of strings (dataset names)
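
For example:

db.listDatasets(function (err, datasets) {
  if (err) return handle(err)
  console.log(datasets) // e.g. ['users']
})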

set = dat.dataset(name)

Returns a namespaced dataset (similar to a sublevel in LevelDB). If you just use dat.put and dat.get it will use the default dataset (equivalent of doing dat.dataset()).
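
A short sketch, assuming the returned dataset exposes the same put/get methods as the dat instance:

var users = db.dataset('users')

users.put('bob', 'builder', function (err) {
  if (err) return handle(err)
  users.get('bob', function (err, node) {
    if (err) return handle(err)
    console.log(node.value) // 'builder'
  })
})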

stream = db.createReadStream([options])

Stream out values of the dat. Returns a readable stream.
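
For example, logging everything in the dat (assuming the stream emits value nodes like the ones db.get returns):

db.createReadStream()
  .on('data', function (node) {
    console.log(node.key, node.value)
  })
  .on('end', function () {
    console.log('no more data')
  })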

stream = db.createWriteStream([options])

Stream in values to the dat. Returns a writable stream.

Options
  • dataset - the dataset to store the data in
  • message - a human readable message string to store with the metadata for the changes made by the write stream
  • transaction - boolean, default false. if true everything written to the write stream will be stored as 1 transaction in the history
  • batchSize - default 128, the group size used to write to the underlying leveldown batch write. this also determines how many nodes end up in the graph (higher batch size = fewer nodes)
  • valueEncoding - override the value encoding set on the dat-core instance

Data format

When you write data to the write stream, it must look like this:

{
  type:     // 'put' or 'del'
  key:      // key
  value:    // value
}
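
A minimal sketch writing a couple of rows as a single transaction:

var ws = db.createWriteStream({message: 'initial import', transaction: true})

ws.write({type: 'put', key: 'hello', value: 'world'})
ws.write({type: 'put', key: 'hej', value: 'verden'})

ws.on('finish', function () {
  console.log('done writing, new head:', db.head)
})

ws.end()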

stream = db.createFileReadStream(key, [options])

Read a file stored under the key specified. Returns a binary read stream.

stream = db.createFileWriteStream(key, [options])

Write a file to be stored under the key specified. Returns a binary write stream.
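
For example, storing a local file and streaming it back out (the file name is just an illustration):

var fs = require('fs')

fs.createReadStream('photo.png')
  .pipe(db.createFileWriteStream('photo.png'))
  .on('finish', function () {
    db.createFileReadStream('photo.png').pipe(process.stdout)
  })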

stream = db.createPushStream([options])

Create a replication stream that pushes changes to another dat

stream = db.createPullStream([options])

Create a replication stream that pulls changes from another dat

stream = db.createReplicationStream([options])

Create a replication stream that both pulls and pushes
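
A sketch of replicating between two local dat instances, assuming the replication streams are duplex:

var local = dat('./local', {createIfMissing: true})
var remote = dat('./remote', {createIfMissing: true})

var a = local.createReplicationStream()
var b = remote.createReplicationStream()

a.pipe(b).pipe(a) // pipe the two duplex streams into each other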

stream = db.createChangesStream([options])

Get a stream of changes happening to the dat. These changes are ONLY guaranteed to be ordered locally.
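
For example, logging changes as they happen (the exact shape of each change object is not documented here):

db.createChangesStream()
  .on('data', function (change) {
    console.log('change:', change)
  })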

stream = db.heads()

Get a stream of heads in the underlying dat graph.

stream = db.layers()

Get a stream of layers in the dat.

A layer will be added if both you and a remote make changes to the dat and you then pull the remote's changes.

They can also happen if you check out a previous revision and make changes.

stream = db.diff(branch1, branch2)

Compare two or more branches with each other. The stream will emit key,value pairs that conflict across the branches

stream = db.merge(branch1, branch2)

Returns a merge stream. You should write the key,value pairs that conflict across the branches to this stream (see the diff method above).

Once you end this stream the branches will be merged, assuming they don't contain conflicting keys anymore.
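
A rough sketch of resolving conflicts between two heads (the exact shape of the emitted conflict objects is an assumption here):

var merge = db.merge(branch1, branch2)

db.diff(branch1, branch2)
  .on('data', function (conflict) {
    // pick a winning value for each conflicting key and write it to the merge stream
    merge.write({key: conflict.key, value: conflict.value})
  })
  .on('end', function () {
    merge.end() // once ended, the branches are merged
  })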

anotherDat = db.checkout(ref)

Checkout an older revision of the dat. This is useful if you want to pin your data to a point in time.

db.put('hello', 'world', function () {
  var oldHash = db.head
  db.put('hello', 'verden', function () {
    var oldDb = db.checkout(oldHash)

    oldDb.get('hello', function (err, result) {
      console.log(result) // contains 'hello' -> 'world'
    })

    db.get('hello', function (err, result) {
      console.log(result) // contains 'hello' -> 'verden'
    })
  })
})

If you want to make this checkout persistent, i.e. your default head, set the {persistent: true} option

var anotherDat = db.checkout(someHead, {persistent: true})

anotherDat.on('ready', function () {
  // someHead is your default head now if you create a new dat instance
})

To reset your persistent head to the previous one, use db.checkout(false, {persistent: true})

Custom Encoders

Wherever you can specify valueEncoding, in addition to the built-in string types you can also pass in an object with encode and decode methods.

For example, here is the implementation of the built-in JSON encoder:

var json = {
  encode: function (obj) {
    return new Buffer(JSON.stringify(obj))
  },
  decode: function (buf) {
    return JSON.parse(buf.toString())
  }
}

License

MIT

dat-core's People

Contributors

feross, karissa, linusu, mafintosh, max-mapper, okdistribute, sethvincent

dat-core's Issues

dat-core v5 API

we wanna strip down the dat-core API to make it simpler/reduce scope.

here's the proposed new dat-core features:

  • replication
  • change processor hooks
  • low level graph data manipulation api (similar to hyperlog api)
  • graph external data pointers (with hashes pointing to commits and blobs etc)
  • encoding support (abstract-encoder)
  • core should be pure JS (works in browser and everywhere)

the goal is to be able to use dat-core in a situation where we might currently use hyperlog, e.g. https://github.com/moose-team/thanks/blob/fcfac4f7f4581f0fa0fecfe90f2122445da22ff4/index.js. dat-core is currently too heavy for these kinds of custom use cases. being able to use it for hyperlog-style things will let us build custom things while still being compatible with the entire dat high level ecosystem (like replication etc)

this means we would remove all other existing functionality currently in this repo and make new modules that the dat cli can use directly (tentative names):

  • dat-fs (for blobs)
  • dat-datasets (for tables and key/value storage)
  • dat-containers (for VMs)
  • dat-blob-resolver
  • dat-commit-resolver

the dat cli won't necessarily change because of any of this; this is just a technical refactoring

'merge nodes'

for e.g. merge --left, it can be a special graph node that just says 'point left' instead of duplicating data

Executing "Usage Example" => Error: No dat here

Do I need to pass in a leveldb?
The readme says something about

(default is require('leveldown'))

I do

var dat = require('dat-core')

var db = dat('./test')

db.put('hello', 'world', function (err) { // insert value 
  if (err) return handle(err) // something went wrong 
  db.get('hello', function (err, result) {
    if (err) return handle(err) // something went wrong 
    console.log(result)   // prints result 
    console.log(db.head) // the 'head' of the database graph (a hash) 
  })
})

// function handle (err) { console.error(err); }

and get

events.js:141
      throw er; // Unhandled 'error' event
            ^
Error: No dat here
    at /home/serapath/EXPERIMENTS/DAT_CORE/node_modules/dat-core/index.js:213:44
    at FSReqWrap.cb [as oncomplete] (fs.js:212:19)

way to list datasets

either as a callback w/ an array or a stream, but right now there is no way to do it.

multiprocess live bug

admin@dathub:~/src/sleep-irc/data$ taco-nginx --name datircserver dat serve --readonly
Listening on port 54786 (readonly)
/home/admin/src/dat-core/lib/multiprocess.js:162
        res.cb(err)
            ^
TypeError: undefined is not a function
    at Array.<anonymous> (/home/admin/src/dat-core/lib/multiprocess.js:162:13)
    at eval [as _decode] (eval at <anonymous> (/home/admin/src/dat-core/node_modules/pbs/node_modules/generate-function/index.js:55:21), <anonymous>:38:22)
    at Decoder._pushMessage (/home/admin/src/dat-core/node_modules/pbs/decoder.js:130:10)
    at Decoder._parseMissing (/home/admin/src/dat-core/node_modules/pbs/decoder.js:102:19)
    at Decoder._parse (/home/admin/src/dat-core/node_modules/pbs/decoder.js:143:23)
    at Decoder._write (/home/admin/src/dat-core/node_modules/pbs/decoder.js:193:10)
    at doWrite (/home/admin/src/dat-core/node_modules/pbs/node_modules/readable-stream/lib/_stream_writable.js:279:12)
    at writeOrBuffer (/home/admin/src/dat-core/node_modules/pbs/node_modules/readable-stream/lib/_stream_writable.js:266:5)
    at Writable.write (/home/admin/src/dat-core/node_modules/pbs/node_modules/readable-stream/lib/_stream_writable.js:211:11)
    at Duplexify._write (/home/admin/src/dat-core/node_modules/duplexify/index.js:200:22)

visual differ

I'd like to start working on the visual differ, and I'm going to use this function which isn't implemented yet:

db.compare

At the R open sci unconference last weekend, there was a discussion/workshop on visual diffing from the user's perspective (thanks @karthik!). I took notes here: okdistribute/knead#1

I've been looking at multiple options and we could use daff for the html display side of a particular diff if we want. It's pretty bare bones, though, and requires that the user passes the data through like so:

starts with:

var data1 = [
    ['Country','Capital'],
    ['Ireland','Dublin'],
    ['France','Paris'],
    ['Spain','Barcelona']
];
var data2 = [
    ['Country','Code','Capital'],
    ['Ireland','ie','Dublin'],
    ['France','fr','Paris'],
    ['Spain','es','Madrid'],
    ['Germany','de','Berlin']
];

which you must turn into their 'highlighter format':

[ [ '!', '', '+++', '' ],
  [ '@@', 'Country', 'Code', 'Capital' ],
  [ '+', 'Ireland', 'ie', 'Dublin' ],
  [ '+', 'France', 'fr', 'Paris' ],
  [ '->', 'Spain', 'es', 'Barcelona->Madrid' ],
  [ '+++', 'Germany', 'de', 'Berlin' ] ]

So what should db.compare output? Stick with daff-style formatting? allow it as an export option? What about very large dats? daff was made with little ones in mind.

Use cases that we can do that daff does not do:

  • filter by diff type checking the first N rows to guess at the diff type.
    • column-wise: if all the rows in the first N of a particular column have changed, the whole column has probably changed. people might want to filter out these columns, or 'approve changes' for an entire column so they can visually focus on the row-based changes.
    • spot-checking: what little changes have been made across the whole table?

"transactions"

from the IRC discussion yesterday, we need a 'start' point and an 'end' point for operations, so that when a concurrent operation happens it does a checkout to the start point, etc

createDiffStream on forked db

https://github.com/karissa/dat-visualdiff/blob/master/test/test.js

> dat-visualdiff@1.0.0 test /Users/karissa/dev/node_modules/dat-visualdiff
> node test/test.js

[ 'b55933248b064927483188f220ccda637d240d24fe83ab280142dabbe0cbf80f',
  '914179703ea46fe419248d5b8ff6bd418f143006dfec2e351c7b5042509f3725' ]

/Users/karissa/dev/node_modules/dat-core/index.js:394
          return Math.min(a._layers[i][0], b._layers[i][0])
                                                       ^
TypeError: Cannot read property '0' of undefined
    at findFork (/Users/karissa/dev/node_modules/dat-core/index.js:394:56)
    at DestroyableTransform.filter [as _transform] (/Users/karissa/dev/node_modules/dat-core/index.js:403:29)
    at DestroyableTransform.Transform._read (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:184:10)
    at DestroyableTransform.Transform._write (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:172:12)
    at doWrite (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:237:10)
    at writeOrBuffer (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:227:5)
    at DestroyableTransform.Writable.write (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:194:11)
    at Class.ondata (/Users/karissa/dev/node_modules/dat-core/node_modules/sorted-diff-stream/node_modules/from2/node_modules/readable-stream/lib/_stream_readable.js:572:20)
    at Class.emit (events.js:95:17)
    at readableAddChunk (/Users/karissa/dev/node_modules/dat-core/node_modules/sorted-diff-stream/node_modules/from2/node_modules/readable-stream/lib/_stream_readable.js:195:16)
npm ERR! Test failed.  See above for more details.

can't diff root node

~/Desktop/dat-test 🐈  dat init --no-prompt
Initialized a new dat at /Users/maxogden/Desktop/dat-test
~/Desktop/dat-test 🐈  dat log --json
{"root":true,"change":1,"date":"2015-07-23T02:47:16.050Z","version":"b45458f06ef6fbd1331417e8edd949604298589ca699478cf11a19a63c2f94b7","message":"","links":[],"puts":0,"deletes":0,"files":0}
{"root":false,"change":2,"date":"2015-07-23T02:47:16.061Z","version":"e1badc19d32efb3b0ece9b218578fcd78780590924d45049908a4ba082a9be88","message":"","links":["b45458f06ef6fbd1331417e8edd949604298589ca699478cf11a19a63c2f94b7"],"puts":1,"deletes":0,"files":1}
~/Desktop/dat-test 🐈  dat diff b45458f06ef6fbd1331417e8edd949604298589ca699478cf11a19a63c2f94b7
undefined:1
[object Object]
 ^
SyntaxError: Unexpected token o
    at Object.parse (native)
    at Object.json.decode (/usr/local/lib/node_modules/dat/node_modules/dat-core/lib/encoding.js:6:17)
    at decode (/usr/local/lib/node_modules/dat/node_modules/dat-core/index.js:827:31)
    at DestroyableTransform.filter [as _transform] (/usr/local/lib/node_modules/dat/node_modules/dat-core/index.js:840:12)
    at DestroyableTransform.Transform._read (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:184:10)
    at DestroyableTransform.Transform._write (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:172:12)
    at doWrite (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:237:10)
    at writeOrBuffer (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:227:5)
    at DestroyableTransform.Writable.write (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:194:11)
    at Class.ondata (/usr/local/lib/node_modules/dat/node_modules/dat-core/node_modules/sorted-diff-stream/node_modules/from2/node_modules/readable-stream/lib/_stream_readable.js:572:20)
~/Desktop/dat-test 🐈

I don't have a use case for diffing the root node, but this shouldn't happen

opt in for 'get on put'

default should be that puts override data, but you should be able to say 'only put if the key doesn't exist'

try and ensure 100% success rate even on crappy networks

today while testing duplex http at Tivoli Gardens in Copenhagen we had the following successes/failures when doing a dat pull from an ubuntu server to a client tethered to either a slow t-mobile international roaming connection or a fast danish roaming connection:

  • http w/ nginx on tmobile: fail
  • https w/ nginx on tmobile: success
  • http w/ nginx on 3: success
  • https w/ nginx on 3: success
  • http w/o nginx on tmobile: success
  • https w/o nginx on tmobile: success
  • http w/o nginx on 3: success
  • https w/o nginx on 3: success
  • ssh on tmobile: success
  • http w/ nginx on tmobile plus calling http request.end - success (because it wasn't duplex http)

so something involving tmobile and nginx causes duplex http to break (major WTF). this wasn't a dat bug, we made a very small test case. the fix is to use https or ssh, and not http

we are thinking if you try and pull over http we should show a warning. we could also do a self signed certificate automatically when you do dat serve and default to https when cloning

we also want to write a simple duplex-http-tester CLI that anyone can run to test connections

don't hash timestamps

if you e.g. do the same merge twice on two separate machines, the hashes should match

basically, don't put timestamps in the graph

clone a hash

only download data from repo creation -> checkout hash. right now you have to clone all the things and then access backwards to the checkout hash
