janelia-flyem / dvid
Distributed, Versioned, Image-oriented Dataservice
Home Page: http://dvid.io
License: Other
The following should give 512x512x512 regions, but the first substack returned is larger in Z:
curl emdata2:8000/api/node/628/mbroi/partition?batchsize=16
We should start specifying the schema for the messages sent to/from dvid, and dvid should validate the schemas. Most likely, the schemas will be stored in a separate repo (for example, dvidschemas), and pulled into dvid as a submodule as part of the dvid build.
When an incorrectly formatted API call is given, e.g., a 2d size is given for a 3d request, the server has an error and recovers incompletely, not fulfilling later requests even though the system mostly stays up.
Example:
GET "/api/node/bf1/bodies/raw/0_1_2/749_617/2714_3292_2440"
The size is 2d and causes panic on conversion to 3d point.
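A minimal sketch of the kind of up-front check that would turn this into a clean error instead of a panic. It is written in Python for illustration only (DVID itself is Go); the function name parse_raw_path and the rule that the size needs one component per requested dimension are assumptions inferred from the example above, not DVID's actual code.

```python
def parse_raw_path(path):
    """Parse api/node/<UUID>/<data name>/raw/<dims>/<size>/<offset>[/<format>].

    Raises ValueError for malformed requests so the caller can answer with
    an error response instead of failing later during point conversion.
    """
    parts = path.strip("/").split("/")
    if len(parts) < 8 or parts[4] != "raw":
        raise ValueError("not a raw request: %r" % path)
    # Each component is an underscore-separated list of integers.
    dims, size, offset = ([int(v) for v in p.split("_")] for p in parts[5:8])
    # Assumption: the size must have one component per requested dimension.
    if len(size) != len(dims):
        raise ValueError(
            "size has %d components but %d dimensions were requested"
            % (len(size), len(dims))
        )
    return dims, size, offset
```

With this check, the request in the example above raises ValueError because "749_617" has two components while "0_1_2" names three dimensions.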
build dvid
gives the following error on Fedora 20:
--- FAIL: TestParseInSydney (0.00 seconds)
format_test.go:201: ParseInLocation(Feb 01 2013 EST, Sydney) = 2013-02-01 00:00:00 +0000 EST, want 2013-02-01 00:00:00 +1100 AEDT
FAIL
FAIL time 2.508s
ok unicode 0.013s
ok unicode/utf16 0.002s
ok unicode/utf8 0.003s
? unsafe [no test files]
make[3]: *** [/home/jah/BUILDEM/src/golang-1.3.1-stamp/golang-1.3.1-stupid_step] Error 1
make[2]: *** [CMakeFiles/golang-1.3.1.dir/all] Error 2
make[1]: *** [CMakeFiles/dvid.dir/rule] Error 2
make: *** [dvid] Error 2
In the DVID REST API, the following call returns a json file describing the axes, resolution, pixel type, etc.
/api/node/<UUID>/<data name>/schema
The word "schema" here is misleading, because that word is traditionally used to describe the structure of e.g. a json or xml tree. This data is not a schema in that sense -- I think the better term is "metadata".
Also: at some point, we will start publishing the json schemas for the messages produced by certain DVID API calls. To avoid confusion, we should not overload the term "schema" for API calls that return anything other than a true json schema.
Rather than have DVID process internal voxel blocks into requested subvolumes and planes, allow a lower latency API call: given a block index + # of blocks along x, returns optionally default compressed block data. This minimizes processing on DVID side. This also fits into how Ting requests grayscale data from arbitrarily shaped bodies.
Because Scality sproxyd driver is not an ordered key-value store, I'll have to store keys in a separate, fast store (SmallData store) or do brute force check on every key within a range.
If we do have to store keys, it makes sense to implement content addressable hashing for versioning since we are already paying the extra round-trip to get keys.
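A rough sketch of what content-addressable storage with a separate key index could look like. Everything here is hypothetical: the class name, the use of SHA-256, and the two in-memory dicts standing in for the unordered blob store and the small, fast key store.

```python
import hashlib


class ContentAddressedStore:
    """Toy model: values live under their content hash, keys live in an index."""

    def __init__(self):
        self.blobs = {}  # content hash -> value (stand-in for the unordered store)
        self.index = {}  # (version, key) -> content hash (stand-in for the key store)

    def put(self, version, key, value):
        digest = hashlib.sha256(value).hexdigest()
        # Identical content is stored once, no matter how many versions use it.
        self.blobs.setdefault(digest, value)
        self.index[(version, key)] = digest
        return digest

    def get(self, version, key):
        # Two lookups: key index first, then the blob itself.
        return self.blobs[self.index[(version, key)]]

    def keys_in_range(self, version, first, last):
        # Range queries are answered from the index, never from the blob store.
        return sorted(k for (v, k) in self.index if v == version and first <= k <= last)
```

Unchanged values written under a new version add only an index entry, which is where the versioning benefit would come from.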
Allow types to control whether some of their API endpoints are logged. This would allow simple status API to not clutter the log. It's already done for /api/load.
Currently, there is a mix of "go get" in the CMakeLists.txt and go package dependencies that are locked via a git repo "github.com/janelia-flyem/go". The latter is preferable because multiple version control systems under the "go get" umbrella, e.g., hg and bazaar, do not have to be installed in target computers. Also, we can lock down particular versions of the go packages.
Issues with current system:
GOPATH=myrepos:$GOPATH
to prioritize our locked versions of packages. This is how go-deps works. Currently, we must modify import paths in source code.
References to various Go dependency and build approaches:
Please provide an API that allows me to retrieve a representative point for a given label. The client will specify a label and his/her current coordinates, DVID should return a point where that label can be found. The returned point should ideally be close to the provided point.
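To make the request concrete, here is a brute-force sketch of the intended behavior. The function name and the dict-of-voxels input are assumptions for illustration; a real implementation would presumably use DVID's label indices rather than scanning voxels.

```python
def nearest_label_point(labeled_voxels, label, current):
    """Return the voxel carrying `label` that is closest to `current`, or None.

    labeled_voxels maps (x, y, z) tuples to label ids.
    """
    best, best_d2 = None, None
    for point, value in labeled_voxels.items():
        if value != label:
            continue
        # Squared Euclidean distance is enough for comparison.
        d2 = sum((p - c) ** 2 for p, c in zip(point, current))
        if best_d2 is None or d2 < best_d2:
            best, best_d2 = point, d2
    return best
```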
When generating labels via segmentation, labels64 is the natural datatype to use. When revising said segmentation, it needs to be in labelmap form. Please provide a mechanism to create a label map instance from a labels64 instance.
If Raveler has a bad label mapping, e.g., superpixel X is present in the raster but is not present in the labelmap, don't abort processing as soon as it's hit. Instead, give the bad superpixel a body 0 label and continue processing other voxels in the block.
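The requested fallback can be sketched as follows. The function name and the flat list of superpixel ids per block are assumptions; the point is only that an unmapped superpixel becomes body 0 and is recorded rather than aborting the block.

```python
def map_block(superpixels, sp_to_body):
    """Map each superpixel id in a block to its body id.

    Unmapped superpixels get body 0 and are collected so they can be
    reported after the block has been processed.
    """
    bodies = []
    bad = set()
    for sp in superpixels:
        body = sp_to_body.get(sp)
        if body is None:
            bad.add(sp)
            body = 0
        bodies.append(body)
    return bodies, bad
```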
Non-interactive requests might have to be flagged by the client because it depends on how the clients use DVID. For example, Steve has a cluster job status system that polls the keyvalue type.
To accommodate these cases, I've added a query string "interactive=0" or "interactive=false" that allows a client to mark a call as non-interactive. Fixed in ef5af7d
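For illustration, a client-side view of how the flag could be interpreted. This is a sketch, not the code in ef5af7d; the function name and the assumption that calls are interactive unless flagged otherwise are mine.

```python
from urllib.parse import parse_qs, urlsplit


def is_interactive(url):
    """Return False only if the URL carries interactive=0 or interactive=false."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("interactive", [])
    # Assumption: requests count as interactive unless explicitly flagged.
    return not any(v.lower() in ("0", "false") for v in values)
```

A polling job would then append "?interactive=0" to its keyvalue requests.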
Requested by Stephan Gerhard for CATMAID
It should be possible for a GET/POST request of an ND volume to result in a 'busy' status if DVID is in fact busy with another GET/POST request. Perhaps, adding a new URI for such a call would be sufficient (otherwise you might need to check the size of a request to see if it is a small ND volume or not). While a sophisticated, global log system could better indicate "load" on a DVID server, a datatype-specific queue is probably more than sufficient.
When a key/value pair is added to DVID through the interface, the value must be a file. It would be good to have the ability to pipe stdin as the value as well.
In the ND-data API, it would be nice if the volume metadata also included information about the native block shape. This would allow clients to (optionally) choose efficient request block boundaries when requesting lots of ND data.
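As a sketch of what a client could do with that information: given a block shape, expand a request so it starts and ends on block boundaries. The function name is hypothetical, and the 32x32x32 block shape in the usage note is only an example value.

```python
def block_aligned_bounds(offset, size, block_shape):
    """Expand (offset, size) to the smallest enclosing block-aligned region.

    Returns (aligned_offset, aligned_size) as lists.
    """
    # Round the start down to a block boundary.
    start = [(o // b) * b for o, b in zip(offset, block_shape)]
    # Round the end up to a block boundary (ceiling division via negation).
    stop = [-(-(o + s) // b) * b for o, s, b in zip(offset, size, block_shape)]
    return start, [e - s for s, e in zip(start, stop)]
```

For example, a request at offset (10, 40, 0) with size (100, 100, 32) and an assumed block shape of (32, 32, 32) expands to offset (0, 32, 0) with size (128, 128, 32).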
I try creating a new keyvalue datatype with:
curl -X POST http://emdata1/api/dataset/339/new/keyvalue/classifiers -d '{}'
and get:
ERROR using REST API: Config data structure has not been initialized (/api/dataset/339/new/keyvalue/classifiers). Use 'dvid help' to get proper API request format.
I cannot delete datatype instances anymore. I could before.
DVID should have the ability to automatically create tiles based on labels pushed to the server.
I'm wondering if perhaps the 'voxels' datatype specification is a little more flexible than we need. I think clients would benefit from a modest simplification. For purposes of discussion, here's an example metadata request and the corresponding json response:
GET /api/node/abc123/my_rgb_volume/metadata
...
{
"Axes": [
{
"Label": "X",
"Resolution": 3.1,
"Units": "nanometers",
"Size": 100
},{
"Label": "Y",
"Resolution": 3.1,
"Units": "nanometers",
"Size": 200
},{
"Label": "Z",
"Resolution": 40,
"Units": "nanometers",
"Size": 400
}
],
"Values": [
{
"DataType": "uint8",
"Label": "intensity-R"
},
{
"DataType": "uint8",
"Label": "intensity-G"
},
{
"DataType": "uint8",
"Label": "intensity-B"
}
]
}
In the example above, all three channels ("Values") are uint8. However, the current API seems to allow each pixel to be composed of channels with multiple datatypes. That is, "R" could be uint8 while "G" could be float32. This means that clients can't treat the resulting data as a simple ND array. While that isn't impossible to deal with, it complicates the clients' job.
In numpy, for example, one could use structured arrays, but I'm not quite sure if the data can be copied directly to/from the raw buffer returned by DVID (I'd have to do some experiments). In C++, even more manual work has to be done. I don't think VIGRA (for example) has a way of dealing with such data directly. It would likely need to be copied into separate arrays anyway.
Are there any known use cases for the ability to mix pixel types within a single image? If not, I propose disallowing it. If we do, the DVID metadata response will look something like the following. (As a side note, I think the term "channels" is more descriptive than "values" in this context -- but that's a minor detail.)
{
"Axes": [
{
"Label": "X",
"Resolution": 3.1,
"Units": "nanometers",
"Size": 100
},{
"Label": "Y",
"Resolution": 3.1,
"Units": "nanometers",
"Size": 200
},{
"Label": "Z",
"Resolution": 40,
"Units": "nanometers",
"Size": 400
}
],
"DataType": "uint8",
"Channels": ["intensity-R", "intensity-G", "intensity-B"]
}
Or if you want to get a little more fancy, we can still leave room for additional per-channel metadata, such as the range of possible values in each channel (if it happens to be known):
{
"Axes": [
{
"Label": "X",
"Resolution": 3.1,
"Units": "nanometers",
"Size": 100
},{
"Label": "Y",
"Resolution": 3.1,
"Units": "nanometers",
"Size": 200
},{
"Label": "Z",
"Resolution": 40,
"Units": "nanometers",
"Size": 400
}
],
"DataType": "float32",
"Channels": [
{
"Label": "indicator-red",
"Range": [0.0, 100.0]
},
{
"Label": "indicator-green",
"Range": [0.0, 750.0]
}
]
}
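One practical benefit of a single "DataType" is that a client can compute the expected buffer size directly from the metadata. A minimal sketch, assuming the proposed layout above (not DVID's current response); the function name and the item-size table are mine.

```python
# Bytes per element for the pixel types a client might encounter.
ITEM_SIZES = {
    "uint8": 1, "uint16": 2, "uint32": 4, "uint64": 8,
    "float32": 4, "float64": 8,
}


def expected_buffer_bytes(metadata):
    """Size in bytes of the raw volume described by the proposed metadata."""
    n = ITEM_SIZES[metadata["DataType"]] * len(metadata["Channels"])
    for axis in metadata["Axes"]:
        n *= axis["Size"]
    return n
```

For the 100 x 200 x 400 uint8 RGB example this gives 24,000,000 bytes. "Channels" may hold plain labels or per-channel objects, since only its length is used.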
Please ensure returned tiles are always TileSize x TileSize where tiles outside of the main extents are just all 0.
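The requested behavior amounts to zero-padding partial tiles. A sketch using nested lists as a stand-in for pixel data (the function name and data representation are assumptions):

```python
def pad_tile(rows, tile_size):
    """Return a tile_size x tile_size tile, filling missing pixels with 0."""
    padded = []
    for row in rows[:tile_size]:
        row = list(row[:tile_size])
        # Pad short rows on the right.
        padded.append(row + [0] * (tile_size - len(row)))
    # Pad missing rows at the bottom.
    padded += [[0] * tile_size for _ in range(tile_size - len(padded))]
    return padded
```

A tile entirely outside the extents is the degenerate case: pad_tile([], n) is an all-zero n x n tile.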
The data returned calling raw from the labelmap, sp2body, is different from the data returned by the labels64, bodies, in the FIB25 stack. I seem to only get a single label id back when calling the labelmap.
When requesting the metadata for a grayscale uint8 volume, I received the following json. Note that the "Values" section lists the pixel type as "T" instead of "uint8". That's a bug, right?
{
"Axes": [
{
"Label": "X",
"Resolution": 10,
"Units": "nanometers",
"Size": 900,
"Offset": 0
},
{
"Label": "Y",
"Resolution": 10,
"Units": "nanometers",
"Size": 1000,
"Offset": 0
},
{
"Label": "Z",
"Resolution": 10,
"Units": "nanometers",
"Size": 800,
"Offset": 0
}
],
"Values": [
{
"T": 0,
"Label": "grayscale"
}
]
}
Note that in the "Values"
Right now the parameters used to create a new volume in dvid are not well documented. But beyond that, it would be nice if the client could specify exactly what the datatype of the pixels is. For example, all of the information in the metadata json should be provided when creating a new volume.
This would require at least the following enhancements:
voxels data with an arbitrary number of channels (currently the client is limited to the predefined datatypes, e.g. grayscale8, rgba8).
float32 as a pixel type
CI test fails on foreground roi test occasionally. See https://drone.io/github.com/janelia-flyem/dvid/360; the following run succeeds. Likely some kind of race condition involving the status of the foreground ROI.
I would like to add information to a DVID datatype on how that datatype was created. In particular, I want to associate segmentation settings used to generate a given labels64. This should probably be generalized for all DVID datatypes.
When requesting a data volume "schema" (a.k.a. metadata -- see Issue #11), the response comes back with MIME type application/vnd.dvid-nd-data+json. But this call does not include any binary ND data -- it is pure json.
The MIME type of the response should therefore be application/json, and the application/vnd.dvid-nd-data+json MIME type should be reserved for actual volume data as requested in this GET request:
GET /api/node/<UUID>/<data name>/raw/<dims>/<size>/<offset>[/<format>]
There should be a low-resolution version of the sparse volume viewer that will render typical cell shapes in a fraction of a second. The denormalizations supporting this viewer should be updated efficiently when label merges are performed.
Two sub requirements:
a) The client should be able to specify a bounding box (often just a plane) that will be displayed in the viewer.
b) The user should be able to retrieve rough x,y,z coordinates by picking locations on the body.
It would be good to have this ready within the next few weeks, but the slower viewer is probably tolerable for now.
For simplicity and rapid prototyping, there should be an endpoint for accessing ROI datasets via the usual voxels nd-data API. Specifically:
Scanning dependencies of target test
runtime.main: call to external function main.main
runtime.main: undefined: main.main
Unlike leveldb variants, computation of spatial index & label indices gets slower over time:
2014/04/02 00:32:01 Adding spatial information from label volume superpixels for mapping sp2body...
2014/04/02 00:32:46 Processed all superpixels blocks for layer 1/205: 44.93275195s
2014/04/02 00:33:43 Processed all superpixels blocks for layer 2/205: 56.618448829s
…
2014/04/02 01:00:27 Processed all superpixels blocks for layer 21/205: 2m17.899856337s
…
2014/04/02 04:28:03 Processed all superpixels blocks for layer 56/205: 10m13.924529914s
…
2014/04/02 08:46:46 Processed all superpixels blocks for layer 76/205: 14m52.106129852s
2014/04/02 09:01:51 Processed all superpixels blocks for layer 77/205: 15m5.189581523s
Track down this issue and see if it's inefficient processing independent of storage engine or something that lmdb handles poorly compared to leveldb.
Return a sparse volume with grayscale values
Addressing issue #47, DVID should optionally email admins if any kind of panic (even those from which it recovered) occurs.
DVID currently allows choice of Snappy or LZ4 for compressing data into key/value store. Allow selection of compression per data instance, storing the selected compression into that instance record. This is also first step to returning compressed data w/o processing on DVID-side for a request, e.g., storing gzipped tiles that are simply returned on request.
Also add gzip and possibly bzip2 compression as options, including ability to select level of compression from 1 (fastest) to 9 (most compression).
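To illustrate the per-instance choice of codec and level, here is a sketch using Python's standard gzip and bz2 modules. DVID would use Go libraries; the function names and the idea of passing codec and level as plain arguments are assumptions.

```python
import bz2
import gzip


def compress_value(data, codec="gzip", level=6):
    """Compress bytes with the chosen codec at level 1 (fastest) to 9 (smallest)."""
    if not 1 <= level <= 9:
        raise ValueError("level must be between 1 and 9, got %r" % (level,))
    if codec == "gzip":
        return gzip.compress(data, compresslevel=level)
    if codec == "bzip2":
        return bz2.compress(data, compresslevel=level)
    raise ValueError("unknown codec: %r" % (codec,))


def decompress_value(blob, codec="gzip"):
    """Inverse of compress_value for the same codec."""
    if codec == "gzip":
        return gzip.decompress(blob)
    if codec == "bzip2":
        return bz2.decompress(blob)
    raise ValueError("unknown codec: %r" % (codec,))
```

Storing the codec name and level in the instance record would let the server return the stored bytes untouched when the client asks for that same encoding.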
When writing splits of a body back into DVID, the old body, which should be completely gone, left some fragments in sparsevol. The body labels are fine.
Right now DVID returns 400 (Bad Request) when the client requests an item that does not exist in the server. In such cases, a 404 error (Not Found) might make more sense. For example, what status code should DVID return in response to the following query?
GET /api/node/my_dset/doesnt_exist/metadata
On a related note, there's also a question regarding what DVID should return when the user has posted data, and DVID sends back an empty response. For example:
POST /api/node/abc123/mydata/keyA
If successful, DVID will return an empty response body. Should the status code be 200 (OK) or 204 (No Content)? I can see arguments for both cases.
I have a 1020x1020x1020 dataset named "gigacube" which I've initialized as follows:
dvid node a7e6 gigacube load 0,0,0 "/magnetic/gigacube_pngs/*.png"
Requesting a .png works just fine:
http://localhost:8000/api/node/a7/gigacube/raw/0_1/512_256/0_0_100
But when I attempt to request raw nd-data, I get back all zeros. For example, using dvidclient, I can check the returned data:
In [9]: from dvidclient.volume_client import VolumeClient
In [10]: vol_client = VolumeClient( "localhost:8000", "a7", "gigacube" )
In [11]: cutout_array = vol_client.retrieve_subvolume( (0,0,0,0), (1,100,100,100) )
In [12]: print cutout_array.sum()
0
The REST API call used in the above example is something like this:
http://localhost:8000/api/node/a7/gigacube/raw/0_1_2/100_100_100/0_0_0
DVID is sending back a properly formatted message, with the correct buffer size. In the dvid server log, I see nothing unusual:
2014/03/18 17:40:21 HTTP GET: 3d volume (100,100,100) at offset (0,0,0) (/api/node/a7/gigacube/raw/0_1_2/100_100_100/0_0_0): 10.88681ms
Request from Stephan Gerhard. If voxel data is anisotropic, XZ, YZ tile generation can optionally produce isotropic tiles, which will require interpolation from original voxel data. There are bandwidth and CPU utilization arguments for not doing interpolation on the server side, and just transmitting anisotropic tiles that get scaled appropriately on the client side.
After more thought, I think the default behavior should be non-isotropic tile generation, but if an "isotropic=true" parameter is supplied to the tile generation command, DVID will produce isotropic data.
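For a sense of the work involved, here is a sketch of stretching one column of an XZ tile along Z by linear interpolation. The function name is hypothetical, and the stretch factor would be the ratio of Z to X resolution (e.g. 40 / 3.1 for the metadata example earlier on this page); the actual interpolation method DVID would use is not specified here.

```python
def resample_linear(values, factor):
    """Stretch a 1-D sequence by `factor` using linear interpolation."""
    n_out = max(1, int(round(len(values) * factor)))
    out = []
    for i in range(n_out):
        pos = i / factor  # position in source coordinates
        lo = min(int(pos), len(values) - 1)
        hi = min(lo + 1, len(values) - 1)  # clamp at the last sample
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out
```

Doing this per column for every XZ and YZ tile is the server-side CPU cost mentioned above; sending anisotropic tiles leaves the same scaling to the client.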
Perhaps allow paging via a query string if the number of keys is very large.
If I do
dvid node c7 raw load local xy 0,0,0 raw/.png
dvid node c7 membranes load local xy 0,0,0 membranes/.png
The dataset of raw is overwritten by membranes, e.g. as seen when doing an API call to fetch the images.
Should be able to specify email to be notified, maximum log file sizes, etc.
Allow multi-scale sparse volumes and surfaces. This will also be useful for Ting's split tools.
Add query string "noblanks=on" to return 404 if tile request is outside stored range. Otherwise, use default of blank tile return.
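The two behaviors side by side, as a sketch. The function name, the dict of stored tiles, and the single-byte-per-pixel blank tile are assumptions for illustration.

```python
def tile_response(tiles, coord, tile_size, noblanks=False):
    """Return (status, body) for a tile request.

    tiles maps tile coordinates to encoded tile bytes.
    """
    data = tiles.get(coord)
    if data is not None:
        return 200, data
    if noblanks:
        # Client asked not to receive blank tiles outside the stored range.
        return 404, b""
    # Default: a blank (all-zero) tile of the full tile size.
    return 200, bytes(tile_size * tile_size)
```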