
databox-store-blob

A Databox store for JSON data blobs; it handles time-series and key-value data.

The datastore exposes an HTTP-based API on port 8080 and a WebSocket-based API for live data. All requests must carry arbiter tokens, as per section 7.1 of the Hypercat 3.0 specification.

Read API

Time series data

URL: /<datasourceid>/ts/latest
Method: GET
Parameters: <datasourceid> the datasourceid to get data for.
Notes: will return the latest data for the given datasourceid.

URL: /<datasourceid>/ts/since
Method: GET
URL Parameters: <datasourceid> the datasourceid to get data for.
Body Parameters: <startTimestamp> the timestamp in ms to return records after.
Notes: will return all data since the provided timestamp for the provided datasourceid.

URL: /<datasourceid>/range
Method: GET
URL Parameters: <datasourceid> the datasourceid to get data for
Body Parameters: <startTimestamp> and <endTimestamp> for the range.
Notes: will return all data between the provided start and end timestamps for the provided datasourceid.
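As a sketch, building the time-series read requests might look like this. The helper names are hypothetical; the paths and body parameters are taken from the descriptions above.

```javascript
// Hypothetical helpers that build the time-series read paths described above.
function tsLatestPath(datasourceid) {
  return `/${datasourceid}/ts/latest`;
}

function tsSincePath(datasourceid) {
  return `/${datasourceid}/ts/since`;
}

function tsRangePath(datasourceid) {
  return `/${datasourceid}/range`;
}

// /ts/since and /range take their timestamps (in ms) as JSON body parameters:
const sinceBody = JSON.stringify({ startTimestamp: Date.now() - 60000 });
const rangeBody = JSON.stringify({ startTimestamp: 0, endTimestamp: Date.now() });

console.log(tsLatestPath("MyLongId")); // prints "/MyLongId/ts/latest"
```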

Key value pairs

URL: /<key>/kv/
Method: GET
Parameters: replace <key> with document key
Notes: will return the data stored with that key. Returns a 404 with {status:404,error:"Document not found."} if no data is stored.

Websockets

Connect a WebSocket client to /ws, then subscribe to data using:

For time series:

URL: /sub/<datasourceid>/ts
Method: GET
Parameters: replace <datasourceid> with datasourceid
Notes: will broadcast the data stored under that datasourceid over the WebSocket when data is added.


For key value:

URL: /sub/<key>/kv
Method: GET
Parameters: replace <key> with document key
Notes: will broadcast the data stored with that key over the WebSocket when it is added or updated.
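A minimal sketch of the subscribe flow. The helper name is hypothetical, and the host/scheme in the comments are assumptions based on this README; actual client code will depend on your WebSocket library.

```javascript
// Hypothetical helper: build the subscription paths described above.
// kind is "ts" for time series or "kv" for key-value.
function subPath(id, kind) {
  return `/sub/${id}/${kind}`;
}

// 1. Connect a WebSocket client to /ws (arbiter token required, as elsewhere),
//    e.g. const ws = new WebSocket("ws://databox-store-blob:8080/ws");
// 2. Subscribe by issuing a GET to the subscription endpoint:
console.log(subPath("MyLongId", "ts")); // prints "/sub/MyLongId/ts"
console.log(subPath("someKey", "kv"));  // prints "/sub/someKey/kv"
// 3. Data for that datasource/key then arrives over the open WebSocket.
```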

Write API

Managing the data source catalog

URL: /cat
Method: POST
Parameters: Raw JSON body containing a Hypercat item (as per PAS212 (https://shop.bsigroup.com/upload/276605/PAS212-corr.pdf) Table 2).
For example:
{
    "item-metadata": [{
            // NOTE: Required
            "rel": "urn:X-hypercat:rels:hasDescription:en",
            "val": "Test item"
        }, {
            // NOTE: Required
            "rel": "urn:X-hypercat:rels:isContentType",
            "val": "text/plain"
        }, {
            "rel": "urn:X-databox:rels:hasVendor",
            "val": "Databox Inc."
        }, {
            "rel": "urn:X-databox:rels:hasType",
            "val": "Test"
        }, {
            "rel": "urn:X-databox:rels:hasDatasourceid",
            "val": "MyLongId"
        }, {
            "rel": "urn:X-databox:rels:isActuator",
            "val": false
        }, {
            "rel": "urn:X-databox:rels:hasStoreType",
            "val": "databox-store-blob"
        }
    ],
    "href": "https://databox-store-blob:8080"
}
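The catalogue item above can also be built programmatically. A sketch, with a hypothetical builder function; only the description and content-type relations are marked required in the example above.

```javascript
// Hypothetical builder for a minimal Hypercat catalogue item.
function buildCatItem(description, contentType, extraRels) {
  return {
    "item-metadata": [
      // Required relations, per the example above.
      { rel: "urn:X-hypercat:rels:hasDescription:en", val: description },
      { rel: "urn:X-hypercat:rels:isContentType", val: contentType },
      ...(extraRels || []),
    ],
    href: "https://databox-store-blob:8080",
  };
}

const item = buildCatItem("Test item", "text/plain", [
  { rel: "urn:X-databox:rels:hasDatasourceid", val: "MyLongId" },
]);
// POST JSON.stringify(item) to /cat to register the data source.
```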

Time series data

URL: /<datasourceid>/ts/
Method: POST
Parameters: Raw JSON body containing elements as follows {data: <json blob to store>}
Notes: stores a value; a timestamp is added on insertion.

Key value pairs

URL: /<key>/kv/
Method: POST
Parameters: Raw JSON body containing elements as follows {<data to be stored in JSON format>}
Notes: will insert if the <key> is not in the database and update the document if it is.
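The two write bodies above differ: the time-series endpoint wraps the blob in a data field, while the key-value endpoint takes the document itself. A sketch (the payload values are made up):

```javascript
// Time-series write: POST to /<datasourceid>/ts/ with the blob under "data".
// The store adds the timestamp on insertion.
const tsBody = JSON.stringify({ data: { temperature: 21.5 } });

// Key-value write: POST to /<key>/kv/ with the raw JSON document itself;
// the store inserts or updates the document at that key.
const kvBody = JSON.stringify({ temperature: 21.5, unit: "C" });

console.log(tsBody); // prints {"data":{"temperature":21.5}}
```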

Websockets

Not available for writing

Arbiter Facing

The data source catalog

URL: /cat
Method: GET
Parameters: none
Notes: will return the latest data source catalog in Hypercat format.

Status

This is beta. Expect bugs but the API should be reasonably stable.

Building and running

npm install && npm start

Developing

Start the container manager in developer mode:

DATABOX_DEV=1 npm start

Clone the repo and make your changes. To build a new Databox image and push it to your local registry:

npm run build && npm run deploy

Then restart the container manager to use your updated version.

Testing

npm install --development
NO_SECURITY=1 NO_LOGGING=1 npm test 


databox-store-blob's Issues

Audit log

Audit log, to recover information about accesses made to/from the store.

Log all interaction into an audit DB.

Add the /audit endpoints

Suggestions:

1. Make it a time-series-like endpoint with /latest and /since endpoints
2. Filter by sensor ID
3. Filter by sensor type
4. Filter by reads and writes

Binary data

Do we envisage that this store eventually will store binary data or will that be a job of another store?

Https support

The APIs need moving to HTTPS, with certificates signed by the CM.

Time-series sensor ID should be a route parameter

For the time-series API (if that's even here and not its own repo?) we should have sensor_id be a route parameter so that path permissions cover individual sensors for any given driver-store pair. E.g. /api/data/:sensorID/latest; that way some apps could have access only to [ "/api/data/accelerometer/latest", "/api/data/light/*" ] while another has "/api/data/*", for example, to replicate the original behaviour.
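To illustrate why the route parameter helps, here is a toy wildcard matcher (this is not the arbiter's actual permission code, just a sketch of how per-sensor path permissions compose):

```javascript
// Escape regex metacharacters in a literal path segment.
function escapeRe(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Check a request path against a list of permissions, where a trailing "*"
// in a permission matches anything from that point on.
function pathAllowed(permissions, path) {
  return permissions.some((perm) => {
    const re = new RegExp("^" + perm.split("*").map(escapeRe).join(".*") + "$");
    return re.test(path);
  });
}

const perms = ["/api/data/accelerometer/latest", "/api/data/light/*"];
console.log(pathAllowed(perms, "/api/data/light/since"));       // prints true
console.log(pathAllowed(perms, "/api/data/microphone/latest")); // prints false
```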

Solve the data on-demand problem

I had a chat with @mmalekzadeh the other day when this came up. I think it's fundamental enough that it doesn't count as an early optimisation, so maybe should be tackled as soon as all the MVP things are done.

I'm not sure if this even is a problem right now, so any input is appreciated.

Problem

Certain kinds of drivers may expose a whole range of data sources. Certain apps may only need some of these. For the driver to be streaming all of these into a store at all times is inefficient, and uses up more resources than is necessary (source battery, bandwidth, disk space, etc).

For example, an app may only need a mobile phone's accelerometer data, so there's no reason it should be streaming microphone input to a store 24/7 too. (Or is there? Is making historical time series data — that existed before the birth of an app — available to that app allowed?)

Potential solutions

Some kinds of workarounds can already be done. For example, in a store where you want time series to only stream on-demand, you can have a small key-value store alongside it with a bunch of flags for each data source. Apps can set or unset these flags, a driver can read them, and start or stop requesting data from a source based on the value of these flags.

This becomes hard to manage though when you have more than one app that needs the same data sources since you would need to prevent one app from turning off the tap for another.

Another way may be to give drivers some way of finding out which apps have access to their stores' data sources (e.g. via an arbiter endpoint?). That way, when an app is launched with permissions to read microphone data, the driver can start collecting that data, and when the last app with microphone permissions dies, the driver can stop.
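That last idea amounts to a per-source reference count. A hypothetical sketch; where the counts would actually live (a key-value store alongside, or an arbiter endpoint) is exactly the open question above:

```javascript
// Track how many running apps need each data source.
class DemandTracker {
  constructor() {
    this.counts = new Map(); // data source id -> number of apps that need it
  }
  appStarted(source) {
    this.counts.set(source, (this.counts.get(source) || 0) + 1);
  }
  appStopped(source) {
    const n = (this.counts.get(source) || 0) - 1;
    if (n > 0) this.counts.set(source, n);
    else this.counts.delete(source);
  }
  // The driver collects from a source only while at least one app needs it,
  // so one app stopping cannot turn off the tap for another.
  shouldCollect(source) {
    return this.counts.has(source);
  }
}

const demand = new DemandTracker();
demand.appStarted("microphone");
demand.appStarted("microphone");
demand.appStopped("microphone");
console.log(demand.shouldCollect("microphone")); // one app still needs it: prints true
```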

Expand WebSocket subscription system

Precursor to #13 and second part of #16.

I asked during the call whether data pulled through polling and data pushed live through WS should be controlled by different permissions, and the consensus was that yes they should.

In order to be consistent with how tokens are passed in other places (and also to give app devs less to learn) I think we should extend how WS streaming works.

Firstly, streaming data (in or out via WS) should be separated from subscribing to data, with the authorization that subscribing will entail.

Secondly, that subscription should be by endpoint, which the store will understand to mean sensor or key. Since we already have authorization for paths, this would kill two birds with one stone (similar to #15).

I propose that for every item pushed to a store-hosted Hypercat catalogue, there should be an equivalent WS subscription endpoint. How this works exactly is not that important — we could have both as items in the catalogue automatically, or we could say that the driver has to push them to the catalogue (and thus control what data can be subscribed to at all), or we could have a metadata flag for things that are subscribable via WS.

For example, for the endpoint:

/api/key/:key

we would have a subscription endpoint along the lines of:

/subscribe/api/key/:key
/subscribe/key/:key
/sub/key/:key

We could decide on anything for the namespacing as long as it's [something]/[original rest endpoint]. There could potentially also be a way to unsubscribe, though that might be weird to have as a route, since it would mean you could allow an app to subscribe but forbid it from unsubscribing (maybe it could be a GET/POST param on the same endpoint instead).
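The [something]/[original rest endpoint] convention is trivial to derive mechanically; for example (the prefix name here is just a placeholder for whatever gets decided):

```javascript
// Derive a subscription endpoint from a REST endpoint by prefixing it.
function subEndpoint(restPath, prefix) {
  return (prefix || "/sub") + restPath;
}

console.log(subEndpoint("/api/key/:key"));              // prints "/sub/api/key/:key"
console.log(subEndpoint("/api/key/:key", "/subscribe")); // prints "/subscribe/api/key/:key"
```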

So this way, just by controlling the path caveat, an app can have access to REST but not WS, or WS but not REST, or both ("/(sub|api)/key/*" as a permission), or neither. Since subscription to WS is done via a REST endpoint and not WS, the same macaroon/token passing system applies.

On subscribing, an event listener is registered somewhere to notify that container of new inputs. Containers still need to go through the WS handshake with proper auth (see #16) and this also identifies them to the store (kind of like a session cookie), so that the store knows which WS connections a subscribe request applies to.

So the advantages are:

  • Permissions to access WS at all can be controlled separately (if it's at root, then with the path "/", else "/ws" or "/notif" or something). We could even apply routes to WS connections for whatever purpose and have even more fine-grained control with the path caveat.
  • All an app needs to do is open a WS connection as a client, then separately subscribe to stuff with normal requests. It would never actually have to send anything over WS to the store.
  • All a driver needs to do is perform the WS handshake with their token on connection, then they can write normally without having to send over a token every time.
  • Since subscription is by endpoint, path caveats apply, so no need to add additional complexity in checking permissions when a subscription request is made (though that is certainly possible now) since you can just do it with paths.
  • REST and WS for the same data have separate permissions, as they should.

If you guys think that's the way to go, I'd be happy to take a stab at it. I hope I explained that clearly enough, but let me know if something is unclear (I may have forgotten parts I was thinking of).

Consider accepting DELETE request on /cat endpoint

We've talked about this before in #21, but there was never any need to implement it. I don't think it's important to have this for v0.1, but it's something I came across again and wanted to hear thoughts on. Just thought I'd leave it in a GitHub issue so we could look at this again later perhaps.

I was adding a little extra logic to the mobile driver to make it handle the case where a user might turn off a sensor at the source mid-stream, and was thinking about how the driver should react, regarding the store catalogue.

There are two possibilities:

  • The data stream is quenched, but the source entry stays in the catalogue. Apps can still request access to this data in the store, only they shouldn't expect any new data to come, just what was already stored.
  • The driver (having gotten path=/cat;method=DELETE permissions too) DELETEs the source from its store catalogue (PAS212 5.6). All apps that can currently access that source just get no new data, and any new apps can't be launched with those sources because they no longer exist.

The way I have this driver working now is that catalogue items are only added once a user turns on a source in the config UI, but they're not removed when the user turns them off (https://github.com/me-box/databox-driver-mobile/blob/master/main.js#L38). This might be OK. After all, the driver may have already stored some data. The question here is whether turning sources on and off should be an act of controlling discoverability in addition to toggling data streams.

My inclination is no, and that once a datasource is added to a catalogue, it's there forever, until the actual data is also removed from its store.

The problem with that is that apps won't know if the reason that there's no new data is because a sensor has been turned off, or if it really is just because there's no new data. It would be like a smartphone app having permission to access GPS, but when it does, it just gets nothing back. Possible workaround: a key-value store alongside that indicates what sensors are active or not.

Schema discovery

To obtain details of stored data types

Add the /listDataSources endpoint which will respond with an array of JSON objects describing the available data sources

Add the /listActuators endpoint which will respond with an array of JSON objects describing the available actuators

Integrate macaroons

Macaroons need to be supported and checked with the arbiter for all endpoints.

Account for expired macaroons with WS subscriptions

I'm in the middle of #15/#17 and just realised something: if an app's permissions are revoked, it can still continue to get data through a WS subscription, which is obviously not ideal. The reason is that permissions are only checked on subscription (as opposed to on every request, like the REST API). Similarly, WS access permissions are only checked on the initial handshake.

For now, with macaroons being long-lived, this is not as big a problem, but still worth having an issue for. One option is that on initial verification, we schedule a macaroon's expiry by checking timestamp caveat (when we actually use that). Once that time has passed, and the app is still connected and receiving data, then the store severs that connection and unsubscribes all that app's subscriptions.
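That scheduling option could look like the following sketch. The caveat string format ("time < <ms-timestamp>") is an assumption for illustration, as is severConnection:

```javascript
// Parse a hypothetical time caveat and compute how long until it expires.
function msUntilExpiry(caveat, now) {
  const m = /^time < (\d+)$/.exec(caveat);
  if (!m) return null; // not a time caveat
  return Math.max(0, Number(m[1]) - (now === undefined ? Date.now() : now));
}

// On initial verification the store could schedule the cut-off:
//   const ms = msUntilExpiry(timeCaveat);
//   if (ms !== null) setTimeout(() => severConnection(app), ms);
// where severConnection closes the WS and drops that app's subscriptions.

console.log(msUntilExpiry("time < 1000", 400)); // prints 600
```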

The main issue is that WS connections are also bound to macaroons, so when one expires, the app would need to re-initiate the connection and resubscribe to all the endpoints it needs. Either way, this means its data stream could be interrupted while it regains access.

Alternatively, an app could preemptively initiate a new connection with a new macaroon while the old one is still active, and hot-swap the two streams. But that means extra complexity app-side, and is just plain messy. The same issues arise if, instead of expiry, we used anything else like "uses" or a number of samples.

Realistically, maybe we could have some sort of buffer that queues data between when an app's macaroon has expired and when it has re-authorised itself with a new macaroon, checking that its permissions are still the same and revoking subscriptions if not.

Thoughts?

Use WSS instead of WS

Title says it all: HTTPS is currently used everywhere except the notification system. We need secure WebSockets (WSS) in place of plain WebSockets. Seems doable but a bit tricky based on what I've read so far.

Clarify values accepted/returned by store

for example:

  • is it correct that POST /kv will accept any JSON object (but not, e.g. an array or primitive) as seems to be implied by the README "{}"?
  • exactly what data structures are returned by GET /ts/latest, /ts/since and /range?
  • exactly what data structure(s) are emitted over the websocket in response to /sub for ts and for kv?

Fix actuation

Actuation is currently not working post-Hypercat update

Hypercat compliance

To enable Hypercat compliance this datastore needs to implement:

A /cat endpoint that returns a valid Hypercat catalog. This must be app facing

A /register/datasource endpoint that is driver facing to enable a driver to add their sources
