dat-core's Introduction

dat-core

The core implementation of dat

npm install dat-core

Usage

var dat = require('dat-core')

var db = dat('./test', {createIfMissing: true}) // create ./test/.dat if it doesn't exist yet

db.put('hello', 'world', function (err) { // insert value
  if (err) return handle(err) // something went wrong
  db.get('hello', function (err, result) {
    if (err) return handle(err) // something went wrong
    console.log(result)   // prints result
    console.log(db.head) // the 'head' of the database graph (a hash)
  })
})

API

db = dat(pathOrLevelDb, [options])

Create a new dat instance.

Options
  • checkout - database version to access. default is latest
  • valueEncoding - 'json' | 'binary' | 'utf-8' or a custom encoder instance
  • createIfMissing - true or false, default false. Creates the dat folder if it doesn't exist
  • backend - a leveldown compatible constructor to use (default is require('leveldown'))
  • blobs - an abstract-blob-store compatible instance to use (default is content-addressable-blob-store)

By default the path passed to the backend is {path}/.dat/db. If your custom backend requires a special URL, simply wrap it in a function:

var sqldown = require('sqldown')
var db = dat('/some/regular/path', {
  backend: function () {
    return sqldown('pg://localhost/database')
  }
})

db.head

String property containing the current head revision of the dat. Every time you mutate the dat this head changes.

db.init([cb])

Initializes the dat by adding a root node to the graph if one hasn't been added already. This is called implicitly when you perform a mutating operation.

cb (if specified) will be called with one argument, (error)

db.put(key, value, [opts], [cb])

Insert a value into the dat

cb (if specified) will be called with one argument, (error)

Options
  • dataset - the dataset to use
  • valueEncoding - an encoder instance to use to encode the value
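
For illustration, a minimal sketch combining these options (the 'users' dataset name is just an example):

db.put('alice', {age: 32}, {dataset: 'users', valueEncoding: 'json'}, function (err) {
  if (err) return handle(err) // something went wrong
  console.log(db.head) // the head changes after every mutation
})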

db.get(key, [options], cb)

Get a value node from the dat

cb will be called with two arguments, (error, value). If successful, value will have these keys:

{
  content:  // 'file' or 'row'
  type:     // 'put' or 'del'
  version:  // version hash
  change:   // internal change number
  key:      // row key
  value:    // row value
}
Options
  • dataset - the dataset to use
  • valueEncoding - an encoder instance to use to decode the value
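
Continuing the sketch above, reading the value node back:

db.get('alice', {dataset: 'users', valueEncoding: 'json'}, function (err, node) {
  if (err) return handle(err)
  console.log(node.type)    // 'put'
  console.log(node.version) // version hash of this node
  console.log(node.value)   // {age: 32}
})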

db.del(key, [cb])

Delete a node from the dat by key

cb (if specified) will be called with one argument, (error)

db.listDatasets(cb)

Returns a list of the datasets currently in use in this checkout

cb will be called with two arguments, (error, datasets) where datasets is an array of strings (dataset names)
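
For example:

db.listDatasets(function (err, datasets) {
  if (err) return handle(err)
  console.log(datasets) // e.g. ['users']
})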

set = dat.dataset(name)

Returns a namespaced dataset (similar to a sublevel in LevelDB). If you just use dat.put and dat.get it will use the default dataset (equivalent of doing dat.dataset()).
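
A short sketch, assuming the returned dataset exposes the same put/get methods as the dat instance:

var users = db.dataset('users')

users.put('bob', 'builder', function (err) {
  if (err) return handle(err)
  users.get('bob', function (err, node) {
    if (err) return handle(err)
    console.log(node.value) // 'builder'
  })
})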

stream = db.createReadStream([options])

Stream out values of the dat. Returns a readable stream.
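
For example, logging everything in the dat (assuming the stream emits value nodes like the ones db.get returns):

db.createReadStream()
  .on('data', function (node) {
    console.log(node.key, node.value)
  })
  .on('end', function () {
    console.log('no more data')
  })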

stream = db.createWriteStream([options])

Stream in values to the dat. Returns a writable stream.

Options
  • dataset - the dataset to store the data in
  • message - a human readable message string to store with the metadata for the changes made by the write stream
  • transaction - boolean, default false. if true everything written to the write stream will be stored as 1 transaction in the history
  • batchSize - default 128, the group size used to write to the underlying leveldown batch write. this also determines how many nodes end up in the graph (higher batch size = fewer nodes)
  • valueEncoding - override the value encoding set on the dat-core instance

Data format

When you write data to the write stream, it must look like this:

{
  type:     // 'put' or 'del'
  key:      // key
  value:    // value
}
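
A minimal sketch writing a couple of rows as a single transaction:

var ws = db.createWriteStream({message: 'initial import', transaction: true})

ws.write({type: 'put', key: 'hello', value: 'world'})
ws.write({type: 'put', key: 'hej', value: 'verden'})

ws.on('finish', function () {
  console.log('done writing, new head:', db.head)
})

ws.end()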

stream = db.createFileReadStream(key, [options])

Read a file stored under the key specified. Returns a binary read stream.

stream = db.createFileWriteStream(key, [options])

Write a file to be stored under the key specified. Returns a binary write stream.
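
For example, storing a local file and streaming it back out (the file name is just an illustration):

var fs = require('fs')

fs.createReadStream('photo.png')
  .pipe(db.createFileWriteStream('photo.png'))
  .on('finish', function () {
    db.createFileReadStream('photo.png').pipe(process.stdout)
  })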

stream = db.createPushStream([options])

Create a replication stream that pushes changes to another dat

stream = db.createPullStream([options])

Create a replication stream that pulls changes from another dat

stream = db.createReplicationStream([options])

Create a replication stream that both pulls and pushes
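
A sketch of replicating between two local dat instances, assuming the replication streams are duplex:

var local = dat('./local', {createIfMissing: true})
var remote = dat('./remote', {createIfMissing: true})

var a = local.createReplicationStream()
var b = remote.createReplicationStream()

a.pipe(b).pipe(a) // pipe the two duplex streams into each other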

stream = db.createChangesStream([options])

Get a stream of changes happening to the dat. These changes are ONLY guaranteed to be ordered locally.
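
For example, logging changes as they happen (the exact shape of each change object is not documented here):

db.createChangesStream()
  .on('data', function (change) {
    console.log('change:', change)
  })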

stream = db.heads()

Get a stream of heads in the underlying dat graph.

stream = db.layers()

Get a stream of layers in the dat.

A layer will be added if both you and a remote make changes to the dat and you then pull the remote's changes.

They can also happen if you check out a previous revision and make changes.

stream = db.diff(branch1, branch2)

Compare two or more branches with each other. The stream will emit key,value pairs that conflict across the branches

stream = db.merge(branch1, branch2)

Returns a merge stream. You should write the key,value pairs that conflict across the branches to this stream (see the diff method above).

Once you end this stream the branches will be merged, assuming they don't contain conflicting keys anymore.
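
A rough sketch of resolving conflicts between two heads (the exact shape of the emitted conflict objects is an assumption here):

var merge = db.merge(branch1, branch2)

db.diff(branch1, branch2)
  .on('data', function (conflict) {
    // pick a winning value for each conflicting key and write it to the merge stream
    merge.write({key: conflict.key, value: conflict.value})
  })
  .on('end', function () {
    merge.end() // once ended, the branches are merged
  })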

anotherDat = db.checkout(ref)

Checkout an older revision of the dat. This is useful if you want to pin your data to a point in time.

db.put('hello', 'world', function () {
  var oldHash = db.head
  db.put('hello', 'verden', function () {
    var oldDb = db.checkout(oldHash)

    oldDb.get('hello', function (err, result) {
      console.log(result) // contains 'hello' -> 'world'
    })

    db.get('hello', function (err, result) {
      console.log(result) // contains 'hello' -> 'verden'
    })
  })
})

If you want to make this checkout persistent, i.e. your default head, set the {persistent: true} option

var anotherDat = db.checkout(someHead, {persistent: true})

anotherDat.on('ready', function () {
  // someHead is your default head now if you create a new dat instance
})

To reset your persistent head to the previous one, use db.checkout(false, {persistent: true})

Custom Encoders

Wherever you can specify valueEncoding, in addition to the built-in string types you can also pass in an object with encode and decode methods.

For example, here is the implementation of the built-in JSON encoder:

var json = {
  encode: function (obj) {
    return new Buffer(JSON.stringify(obj))
  },
  decode: function (buf) {
    return JSON.parse(buf.toString())
  }
}

License

MIT

dat-core's People

Contributors

feross, karissa, linusu, mafintosh, max-mapper, okdistribute, sethvincent

dat-core's Issues

dat-core v5 API

we wanna strip down the dat-core API to make it simpler/reduce scope.

here's the proposed new dat-core features:

  • replication
  • change processor hooks
  • low level graph data manipulation api (similar to hyperlog api)
  • graph external data pointers (with hashes pointing to commits and blobs etc)
  • encoding support (abstract-encoder)
  • core should be pure JS (works in browser and everywhere)

the goal is to be able to use dat-core in a situation where we might currently use hyperlog, e.g. https://github.com/moose-team/thanks/blob/fcfac4f7f4581f0fa0fecfe90f2122445da22ff4/index.js. dat-core is currently too heavy for these kinds of custom use cases. being able to use it for hyperlog-style things will let us build custom things while still being compatible with the entire dat high level ecosystem (like replication etc)

this means we would remove all other existing functionality currently in this repo and make new modules that the dat cli can use directly (tentative names):

  • dat-fs (for blobs)
  • dat-datasets (for tables and key/value storage)
  • dat-containers (for VMs)
  • dat-blob-resolver
  • dat-commit-resolver

the dat cli won't necessarily change because of any of this; this is just a technical refactoring

'merge nodes'

for e.g. merge --left, it can be a special graph node that just says 'point left' instead of duplicating data

Executing "Usage Example" => Error: No dat here

Do I need to pass in a leveldb?
The readme says something about

(default is require('leveldown'))

I do

var dat = require('dat-core')

var db = dat('./test')

db.put('hello', 'world', function (err) { // insert value 
  if (err) return handle(err) // something went wrong 
  db.get('hello', function (err, result) {
    if (err) return handle(err) // something went wrong 
    console.log(result)   // prints result 
    console.log(db.head) // the 'head' of the database graph (a hash) 
  })
})

// function handle (err) { console.error(err); }

and get

events.js:141
      throw er; // Unhandled 'error' event
            ^
Error: No dat here
    at /home/serapath/EXPERIMENTS/DAT_CORE/node_modules/dat-core/index.js:213:44
    at FSReqWrap.cb [as oncomplete] (fs.js:212:19)

way to list datasets

either as a callback w/ an array or a stream, but right now there is no way to do it.

multiprocess live bug

admin@dathub:~/src/sleep-irc/data$ taco-nginx --name datircserver dat serve --readonly
Listening on port 54786 (readonly)
/home/admin/src/dat-core/lib/multiprocess.js:162
        res.cb(err)
            ^
TypeError: undefined is not a function
    at Array.<anonymous> (/home/admin/src/dat-core/lib/multiprocess.js:162:13)
    at eval [as _decode] (eval at <anonymous> (/home/admin/src/dat-core/node_modules/pbs/node_modules/generate-function/index.js:55:21), <anonymous>:38:22)
    at Decoder._pushMessage (/home/admin/src/dat-core/node_modules/pbs/decoder.js:130:10)
    at Decoder._parseMissing (/home/admin/src/dat-core/node_modules/pbs/decoder.js:102:19)
    at Decoder._parse (/home/admin/src/dat-core/node_modules/pbs/decoder.js:143:23)
    at Decoder._write (/home/admin/src/dat-core/node_modules/pbs/decoder.js:193:10)
    at doWrite (/home/admin/src/dat-core/node_modules/pbs/node_modules/readable-stream/lib/_stream_writable.js:279:12)
    at writeOrBuffer (/home/admin/src/dat-core/node_modules/pbs/node_modules/readable-stream/lib/_stream_writable.js:266:5)
    at Writable.write (/home/admin/src/dat-core/node_modules/pbs/node_modules/readable-stream/lib/_stream_writable.js:211:11)
    at Duplexify._write (/home/admin/src/dat-core/node_modules/duplexify/index.js:200:22)

visual differ

I'd like to start working on the visual differ, and I'm going to use this function which isn't implemented yet:

db.compare

At the R open sci unconference last weekend, there was a discussion/workshop on visual diffing from the user's perspective (thanks @karthik!). I took notes here: okdistribute/knead#1

I've been looking at multiple options and we could use daff for the html display side of a particular diff if we want. It's pretty bare bones, though, and requires that the user passes the data through like so:

starts with:

var data1 = [
    ['Country','Capital'],
    ['Ireland','Dublin'],
    ['France','Paris'],
    ['Spain','Barcelona']
];
var data2 = [
    ['Country','Code','Capital'],
    ['Ireland','ie','Dublin'],
    ['France','fr','Paris'],
    ['Spain','es','Madrid'],
    ['Germany','de','Berlin']
];

which you must turn into their 'highlighter format':

[ [ '!', '', '+++', '' ],
  [ '@@', 'Country', 'Code', 'Capital' ],
  [ '+', 'Ireland', 'ie', 'Dublin' ],
  [ '+', 'France', 'fr', 'Paris' ],
  [ '->', 'Spain', 'es', 'Barcelona->Madrid' ],
  [ '+++', 'Germany', 'de', 'Berlin' ] ]

So what should db.compare output? Stick with daff-style formatting? allow it as an export option? What about very large dats? daff was made with little ones in mind.

Use cases that we can do that daff does not do:

  • filter by diff type checking the first N rows to guess at the diff type.
    • column-wise: if all the rows in the first N of a particular column have changed, the whole column has probably changed. people might want to filter out these columns, or 'approve changes' for an entire column so they can visually focus on the row-based changes.
    • spot-checking: what little changes have been made across the whole table?

"transactions"

from the IRC discussion yesterday, we need a 'start' point and an 'end' point for operations, so that when a concurrent operation happens it does a checkout to the start point, etc

createDiffStream on forked db

https://github.com/karissa/dat-visualdiff/blob/master/test/test.js

> dat-visualdiff@1.0.0 test /Users/karissa/dev/node_modules/dat-visualdiff
> node test/test.js

[ 'b55933248b064927483188f220ccda637d240d24fe83ab280142dabbe0cbf80f',
  '914179703ea46fe419248d5b8ff6bd418f143006dfec2e351c7b5042509f3725' ]

/Users/karissa/dev/node_modules/dat-core/index.js:394
          return Math.min(a._layers[i][0], b._layers[i][0])
                                                       ^
TypeError: Cannot read property '0' of undefined
    at findFork (/Users/karissa/dev/node_modules/dat-core/index.js:394:56)
    at DestroyableTransform.filter [as _transform] (/Users/karissa/dev/node_modules/dat-core/index.js:403:29)
    at DestroyableTransform.Transform._read (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:184:10)
    at DestroyableTransform.Transform._write (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:172:12)
    at doWrite (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:237:10)
    at writeOrBuffer (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:227:5)
    at DestroyableTransform.Writable.write (/Users/karissa/dev/node_modules/dat-core/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:194:11)
    at Class.ondata (/Users/karissa/dev/node_modules/dat-core/node_modules/sorted-diff-stream/node_modules/from2/node_modules/readable-stream/lib/_stream_readable.js:572:20)
    at Class.emit (events.js:95:17)
    at readableAddChunk (/Users/karissa/dev/node_modules/dat-core/node_modules/sorted-diff-stream/node_modules/from2/node_modules/readable-stream/lib/_stream_readable.js:195:16)
npm ERR! Test failed.  See above for more details.

can't diff root node

~/Desktop/dat-test 🐈  dat init --no-prompt
Initialized a new dat at /Users/maxogden/Desktop/dat-test
~/Desktop/dat-test 🐈  dat log --json
{"root":true,"change":1,"date":"2015-07-23T02:47:16.050Z","version":"b45458f06ef6fbd1331417e8edd949604298589ca699478cf11a19a63c2f94b7","message":"","links":[],"puts":0,"deletes":0,"files":0}
{"root":false,"change":2,"date":"2015-07-23T02:47:16.061Z","version":"e1badc19d32efb3b0ece9b218578fcd78780590924d45049908a4ba082a9be88","message":"","links":["b45458f06ef6fbd1331417e8edd949604298589ca699478cf11a19a63c2f94b7"],"puts":1,"deletes":0,"files":1}
~/Desktop/dat-test 🐈  dat diff b45458f06ef6fbd1331417e8edd949604298589ca699478cf11a19a63c2f94b7
undefined:1
[object Object]
 ^
SyntaxError: Unexpected token o
    at Object.parse (native)
    at Object.json.decode (/usr/local/lib/node_modules/dat/node_modules/dat-core/lib/encoding.js:6:17)
    at decode (/usr/local/lib/node_modules/dat/node_modules/dat-core/index.js:827:31)
    at DestroyableTransform.filter [as _transform] (/usr/local/lib/node_modules/dat/node_modules/dat-core/index.js:840:12)
    at DestroyableTransform.Transform._read (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:184:10)
    at DestroyableTransform.Transform._write (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_transform.js:172:12)
    at doWrite (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:237:10)
    at writeOrBuffer (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:227:5)
    at DestroyableTransform.Writable.write (/usr/local/lib/node_modules/dat/node_modules/through2/node_modules/readable-stream/lib/_stream_writable.js:194:11)
    at Class.ondata (/usr/local/lib/node_modules/dat/node_modules/dat-core/node_modules/sorted-diff-stream/node_modules/from2/node_modules/readable-stream/lib/_stream_readable.js:572:20)
~/Desktop/dat-test 🐈

I don't have a use case for diffing the root node, but this shouldn't happen

opt in for 'get on put'

default should be that puts override data, but you should be able to say 'only put if the key doesn't exist'

try and ensure 100% success rate even on crappy networks

today while testing duplex http at Tivoli Gardens in Copenhagen we had the following successes/failures when doing a dat pull from an ubuntu server to a client tethered to either a slow t-mobile international roaming connection or a fast danish roaming connection:

  • http w/ nginx on tmobile: fail
  • https w/ nginx on tmobile: success
  • http w/ nginx on 3: success
  • https w/ nginx on 3: success
  • http w/o nginx on tmobile: success
  • https w/o nginx on tmobile: success
  • http w/o nginx on 3: success
  • https w/o nginx on 3: success
  • ssh on tmobile: success
  • http w/ nginx on tmobile plus calling http request.end - success (because it wasn't duplex http)

so something involving tmobile and nginx causes duplex http to break (major WTF). this wasn't a dat bug, we made a very small test case. the fix is to use https or ssh, and not http

we are thinking if you try and pull over http we should show a warning. we could also do a self signed certificate automatically when you do dat serve and default to https when cloning

we also want to write a simple duplex-http-tester CLI that anyone can run to test connections

don't hash timestamps

if you e.g. do the same merge twice on two separate machines, the hashes should match

basically, don't put timestamps in the graph

clone a hash

only download data from repo creation -> checkout hash. right now you have to clone all the things and then access backwards to the checkout hash
