janelia-flyem / dvid

Distributed, Versioned, Image-oriented Dataservice

Home Page: http://dvid.io

License: Other

Go 99.39% Python 0.37% Makefile 0.12% Shell 0.09% Dockerfile 0.02%

go dataservice image-storage http-service connectomics big-data neuroscience key-value versioning

dvid's Issues

ROI not giving a regular substack size

The following should give 512x512x512 regions, but the first substack returned is larger in Z.

curl emdata2:8000/api/node/628/mbroi/partition?batchsize=16
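
As a sanity check, a client could verify that every returned substack has the expected extent. This is only a sketch: it assumes a 32-voxel block edge (so that batchsize=16 gives 512), and the function name and the list of (x, y, z) substack sizes are hypothetical, since the format of the partition response is not shown here.

```python
def irregular_substacks(sizes, batchsize=16, block_edge=32):
    """Return indices of substacks whose size differs from the expected cube.

    block_edge=32 is an assumption; batchsize mirrors the query string.
    """
    expected = batchsize * block_edge
    return [i for i, size in enumerate(sizes)
            if any(extent != expected for extent in size)]
```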

Support schema validation for all messages.

We should start specifying the schema for the messages sent to/from dvid, and dvid should validate the schemas. Most likely, the schemas will be stored in a separate repo (for example, dvidschemas), and pulled into dvid as a submodule as part of the dvid build.

Make API more robust to errors

When an incorrectly formatted API call is given, e.g., a 2d size is given for a 3d request, the server has an error and recovers incompletely, not fulfilling later requests even though the system mostly stays up.

Example:

GET "/api/node/bf1/bodies/raw/0_1_2/749_617/2714_3292_2440"

The size is 2d and causes panic on conversion to 3d point.
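
A minimal sketch of the kind of up-front validation that would turn this case into a client error instead of a panic. It is written in Python for illustration only (DVID itself is in Go), and the function name is hypothetical. It checks only that the size has one value per requested dimension; the offset is not length-checked because 2d requests elsewhere in the API carry a 3d offset.

```python
def parse_raw_request(dims, size, offset):
    """Parse the <dims>/<size>/<offset> URL segments of a raw request.

    Raises ValueError on malformed input so the caller can answer with
    an error response rather than crash.
    """
    def ints(segment, name):
        try:
            return tuple(int(part) for part in segment.split("_"))
        except ValueError:
            raise ValueError(
                f"{name} must be underscore-separated integers: {segment!r}"
            ) from None

    dims_t = ints(dims, "dims")
    size_t = ints(size, "size")
    offset_t = ints(offset, "offset")
    if len(size_t) != len(dims_t):
        raise ValueError(
            f"size has {len(size_t)} values but {len(dims_t)} dims were requested")
    return dims_t, size_t, offset_t
```

With the request from the example, parse_raw_request("0_1_2", "749_617", "2714_3292_2440") raises ValueError because a 2d size accompanies three dims.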

Problems with Atlantic time?

Building dvid gives the following error on Fedora 20:
--- FAIL: TestParseInSydney (0.00 seconds)
format_test.go:201: ParseInLocation(Feb 01 2013 EST, Sydney) = 2013-02-01 00:00:00 +0000 EST, want 2013-02-01 00:00:00 +1100 AEDT
FAIL
FAIL time 2.508s
ok unicode 0.013s
ok unicode/utf16 0.002s
ok unicode/utf8 0.003s
? unsafe [no test files]
make[3]: *** [/home/jah/BUILDEM/src/golang-1.3.1-stamp/golang-1.3.1-stupid_step] Error 1
make[2]: *** [CMakeFiles/golang-1.3.1.dir/all] Error 2
make[1]: *** [CMakeFiles/dvid.dir/rule] Error 2
make: *** [dvid] Error 2

Request for API name change: "schema" should be "metadata"

In the DVID REST API, the following call returns a json file describing the axes, resolution, pixel type, etc.

/api/node/<UUID>/<data name>/schema

The word "schema" here is misleading, because that word is traditionally used to describe the structure of e.g. a json or xml tree. This data is not a schema in that sense -- I think the better term is "metadata".

Also: at some point, we will start publishing the json schemas for the messages produced by certain DVID API calls. To avoid confusion, we should not overload the term "schema" for API calls that return anything other than a true json schema.

Support google datastore as storage engine

Permit fast voxel block retrieval API

Rather than have DVID process internal voxel blocks into requested subvolumes and planes, allow a lower latency API call: given a block index + # of blocks along x, returns optionally default compressed block data. This minimizes processing on DVID side. This also fits into how Ting requests grayscale data from arbitrarily shaped bodies.

Support Scality as BigData key-value store

Because Scality sproxyd driver is not an ordered key-value store, I'll have to store keys in a separate, fast store (SmallData store) or do brute force check on every key within a range.

If we do have to store keys, it makes sense to implement content addressable hashing for versioning since we are already paying the extra round-trip to get keys.
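
A toy sketch of that idea, assuming SHA-256 as the content hash and plain dictionaries standing in for both stores. The class and method names are hypothetical and nothing here reflects the actual Scality sproxyd interface.

```python
import hashlib


class ContentAddressedStore:
    """Toy model: an unordered blob store plus a separate key index."""

    def __init__(self):
        self.blobs = {}  # content hash -> value (stands in for the unordered store)
        self.index = {}  # versioned key -> content hash (stands in for the fast key store)

    def put(self, key, value):
        digest = hashlib.sha256(value).hexdigest()
        self.blobs.setdefault(digest, value)  # identical content is stored once
        self.index[key] = digest

    def get(self, key):
        # Two lookups: the key index first, then the blob store.
        return self.blobs[self.index[key]]

    def keys_in_range(self, first, last):
        """Range query answered entirely from the key index."""
        return sorted(k for k in self.index if first <= k <= last)
```

Two versions of a block with unchanged content then share a single stored blob, which is the versioning benefit mentioned above.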

Turn off logging for specified API endpoints

Allow types to control whether some of their API endpoints are logged. This would allow simple status API to not clutter the log. It's already done for /api/load.

Refactor build process

Currently, there is a mix of "go get" in the CMakeLists.txt and go package dependencies that are locked via a git repo "github.com/janelia-flyem/go". The latter is preferable because multiple version control systems under the "go get" umbrella, e.g., hg and bazaar, do not have to be installed in target computers. Also, we can lock down particular versions of the go packages.

Issues with current system:

Mix of go package inclusions across CMake and via the janelia-flyem/go repo. Should only be one, preferably a "dvid-deps" repo with all versions of all dependencies.
Dependencies of included go packages will reference packages outside janelia-flyem/go repo. Better to reference the standard import path and use GOPATH=myrepos:$GOPATH to prioritize our locked versions of packages. This is how go-deps works. Currently, we must modify import paths in source code.

References to various Go dependency and build approaches:

labelmap find closest representative point for a label

Please provide an API that allows me to retrieve a representative point for a given label. The client will specify a label and his/her current coordinates, DVID should return a point where that label can be found. The returned point should ideally be close to the provided point.
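
A brute-force sketch of the requested behavior, assuming the server already has some list of candidate points known to carry the label. The function name and the candidate list are hypothetical, and a real implementation would presumably use a spatial index rather than a linear scan.

```python
def closest_label_point(label_points, current):
    """Return the candidate point nearest to `current`, or None if there are none.

    Distance is squared Euclidean, which preserves the ordering of true distances.
    """
    if not label_points:
        return None
    return min(label_points,
               key=lambda p: sum((a - b) ** 2 for a, b in zip(p, current)))
```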

Create labelmap identity from labels64

When generating labels via segmentation, labels64 is the natural datatype to use. When revising said segmentation, it needs to be in labelmap form. Please provide a mechanism to create a label map instance from a labels64 instance.

Don't amplify bad labelmap

If Raveler has a bad label mapping, e.g., superpixel X is present in the raster but not in the labelmap, don't abort processing as soon as it's hit. Instead, give the bad superpixel a body 0 label and continue processing the other voxels in the block.
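
The requested fallback could look roughly like this. The sketch assumes the mapping is available as a dictionary from superpixel to body; the function name and the returned set of unmapped superpixels (useful for logging) are illustrative additions, not part of the request.

```python
def map_superpixels(superpixels, sp_to_body):
    """Map each superpixel to its body, using body 0 for unmapped superpixels.

    Returns the body labels and the set of superpixels missing from the mapping.
    """
    bodies = []
    missing = set()
    for sp in superpixels:
        body = sp_to_body.get(sp)
        if body is None:
            missing.add(sp)
            body = 0  # bad superpixel: label as body 0 and keep going
        bodies.append(body)
    return bodies, missing
```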

Allow background processes for batch jobs like tile generation, etc.

Non-interactive requests might have to be flagged by the client, because whether a request is interactive depends on how clients use DVID. For example, Steve has a cluster job status system that polls the keyvalue type.

To accommodate these cases, I've added a query string "interactive=0" or "interactive=false" that allows the client to mark a call as non-interactive. Fixed in ef5af7d
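
For illustration, this is how such a flag could be read from a request URL. It is a Python sketch, not the code from the commit; the function name is hypothetical, and treating a missing parameter as interactive is an assumption.

```python
from urllib.parse import urlsplit, parse_qs


def is_interactive(url):
    """Return False only when the query string carries interactive=0 or interactive=false."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("interactive", [])
    return not any(value in ("0", "false") for value in values)
```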

Allow standard tiff image import

Requested by Stephan Gerhard for CATMAID

Expand API grayscale / label64 ND-volume GET/POST to indicate whether DVID is busy

It should be possible for a GET/POST request of an ND volume to result in a 'busy' status if DVID is in fact busy with another GET/POST request. Perhaps, adding a new URI for such a call would be sufficient (otherwise you might need to check the size of a request to see if it is a small ND volume or not). While a sophisticated, global log system could better indicate "load" on a DVID server, a datatype-specific queue is probably more than sufficient.

Allow putting stdin as a value in the key/value pair

When a key/value pair is added to DVID through the interface, the value must be a file. It would be good to have the ability to pipe stdin as the value as well.

API enhancement: blockshape in nd-data volume metadata

In the ND-data API, it would be nice if the volume metadata also included information about the native block shape. This would allow clients to (optionally) choose efficient request block boundaries when requesting lots of ND data.
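
As an example of what a client could do with that information, the sketch below expands a requested box outward to the nearest block boundaries. The function name is hypothetical, and the block shape is whatever the metadata would report.

```python
def block_aligned_box(offset, size, block_shape):
    """Expand the box (offset, size) so it starts and ends on block boundaries.

    Returns the aligned offset and the aligned size.
    """
    # Round the start down and the end up to multiples of the block shape.
    start = tuple((o // b) * b for o, b in zip(offset, block_shape))
    stop = tuple(-(-(o + s) // b) * b
                 for o, s, b in zip(offset, size, block_shape))
    aligned_size = tuple(e - s for s, e in zip(start, stop))
    return start, aligned_size
```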

Error creating a new keyvalue datatype

I try creating a new keyvalue datatype with:

curl -X POST http://emdata1/api/dataset/339/new/keyvalue/classifiers -d '{}'

and get:

ERROR using REST API: Config data structure has not been initialized (/api/dataset/339/new/keyvalue/classifiers). Use 'dvid help' to get proper API request format.

Delete doesn't work

I cannot delete datatype instances anymore. I could before.

Automatic tiling computation

DVID should have the ability to automatically create tiles based on labels pushed to the server.

Discussion: constraints on 'voxels' datatype

I'm wondering if perhaps the 'voxels' datatype specification is a little more flexible than we need. I think clients would benefit from a modest simplification. For purposes of discussion, here's an example metadata request and the corresponding json response:

GET  /api/node/abc123/my_rgb_volume/metadata

...

{
    "Axes": [
        {
            "Label": "X",
            "Resolution": 3.1,
            "Units": "nanometers",
            "Size": 100
        },{
            "Label": "Y",
            "Resolution": 3.1,
            "Units": "nanometers",
            "Size": 200
        },{
            "Label": "Z",
            "Resolution": 40,
            "Units": "nanometers",
            "Size": 400
        }
    ],
    "Values": [
        {
            "DataType": "uint8",
            "Label": "intensity-R"
        },
        {
            "DataType": "uint8",
            "Label": "intensity-G"
        },
        {
            "DataType": "uint8",
            "Label": "intensity-B"
        }
    ]
}

In the example above, all three channels ("Values") are uint8. However, the current API seems to allow each pixel to be composed of channels with different datatypes. That is, "R" could be uint8 while "G" could be float32. This means that clients can't treat the resulting data as a simple ND array. While that isn't impossible to deal with, it complicates the clients' job.

In numpy, for example, one could use structured arrays, but I'm not quite sure if the data can be copied directly to/from the raw buffer returned by DVID (I'd have to do some experiments). In C++, even more manual work has to be done. I don't think VIGRA (for example) has a way of dealing with such data directly. It would likely need to be copied into separate arrays anyway.

Are there any known use cases for the ability to mix pixel types within a single image? If not, I propose disallowing it. If we do, the DVID metadata response will look something like the following. (As a side note, I think the term "channels" is more descriptive than "values" in this context -- but that's a minor detail.)

{
    "Axes": [
        {
            "Label": "X",
            "Resolution": 3.1,
            "Units": "nanometers",
            "Size": 100
        },{
            "Label": "Y",
            "Resolution": 3.1,
            "Units": "nanometers",
            "Size": 200
        },{
            "Label": "Z",
            "Resolution": 40,
            "Units": "nanometers",
            "Size": 400
        }
    ],
    "DataType": "uint8",
    "Channels": ["intensity-R", "intensity-G", "intensity-B"]
}

Or if you want to get a little more fancy, we can still leave room for additional per-channel metadata, such as the range of possible values in each channel (if it happens to be known):

{
    "Axes": [
        {
            "Label": "X",
            "Resolution": 3.1,
            "Units": "nanometers",
            "Size": 100
        },{
            "Label": "Y",
            "Resolution": 3.1,
            "Units": "nanometers",
            "Size": 200
        },{
            "Label": "Z",
            "Resolution": 40,
            "Units": "nanometers",
            "Size": 400
        }
    ],
    "DataType": "float32",
    "Channels": [
        {
            "Label": "indicator-red",
            "Range": [0.0, 100.0]
        },
        {
            "Label": "indicator-green",
            "Range": [0.0, 750.0]
        }
    ]
}
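
To make the proposal concrete, here is a minimal sketch that converts the current "Values" form into the simpler single-datatype form and rejects mixed pixel types. The function name is hypothetical; the field names come from the examples above.

```python
def simplify_metadata(metadata):
    """Convert per-channel "Values" metadata into the proposed "Channels" form.

    Raises ValueError if the channels do not all share one DataType.
    """
    dtypes = {value["DataType"] for value in metadata["Values"]}
    if len(dtypes) != 1:
        raise ValueError(f"mixed channel datatypes: {sorted(dtypes)}")
    return {
        "Axes": metadata["Axes"],
        "DataType": dtypes.pop(),
        "Channels": [value["Label"] for value in metadata["Values"]],
    }
```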

default multi-rez tile behavior on request outside of main extents

Please ensure returned tiles are always TileSize x TileSize where tiles outside of the main extents are just all 0.
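
The expected behavior amounts to zero-padding whatever pixel data exists for a tile. A sketch, assuming a tile is represented as a list of pixel rows (a hypothetical representation chosen for illustration):

```python
def pad_tile(rows, tile_size):
    """Return a tile_size x tile_size tile, filling missing pixels with 0.

    `rows` may be smaller than the tile, or empty for a tile that lies
    entirely outside the main extents.
    """
    padded = []
    for row in rows[:tile_size]:
        kept = list(row[:tile_size])
        padded.append(kept + [0] * (tile_size - len(kept)))
    # Add all-zero rows until the tile is square.
    padded += [[0] * tile_size for _ in range(tile_size - len(padded))]
    return padded
```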

labelmap raw does not equal labels64 raw

The data returned calling raw from the labelmap, sp2body, is different from the data returned by the labels64, bodies, in the FIB25 stack. I seem to only get a single label id back when calling the labelmap.

Bug in "schema" (a.k.a. metadata) json

When requesting the metadata for a grayscale uint8 volume, I received the following json. Note that the "Values" section lists the pixel type as "T" instead of "uint8". That's a bug, right?

{
  "Axes": [
    {
      "Label": "X",
      "Resolution": 10,
      "Units": "nanometers",
      "Size": 900,
      "Offset": 0
    },
    {
      "Label": "Y",
      "Resolution": 10,
      "Units": "nanometers",
      "Size": 1000,
      "Offset": 0
    },
    {
      "Label": "Z",
      "Resolution": 10,
      "Units": "nanometers",
      "Size": 800,
      "Offset": 0
    }
  ],
  "Values": [
    {
      "T": 0,
      "Label": "grayscale"
    }
  ]
}


Revise volume creation parameters for REST API

Right now the parameters used to create a new volume in dvid are not well documented. But beyond that, it would be nice if the client could specify exactly what the datatype of the pixels is. For example, all of the information in the metadata json should be provided when creating a new volume.

This would require at least the following enhancements:

DVID needs to support voxels data with an arbitrary number of channels (currently the client is limited to the predefined datatypes, e.g. grayscale8, rgba8).
DVID needs to support float32 as a pixel type

foreground roi test fails occasionally

The CI run occasionally fails on the foreground ROI test. See https://drone.io/github.com/janelia-flyem/dvid/360; the following run succeeds. This is likely some kind of race condition involving the status of the foreground ROI.

Ability to associate meta data for labels64 datatype (all datatypes)

I would like to add information to a DVID datatype on how that datatype was created. In particular, I want to associate segmentation settings used to generate a given labels64. This should probably be generalized for all DVID datatypes.

MIME type of "schema" json response should just be "application/json"

When requesting a data volume "schema" (a.k.a. metadata -- see Issue #11), the response comes back with MIME type application/vnd.dvid-nd-data+json. But this call does not include any binary ND data -- it is pure json.

The MIME type of the response should therefore be application/json, and the application/vnd.dvid-nd-data+json MIME type should be reserved for actual volume data as requested in this GET request:

GET  /api/node/<UUID>/<data name>/raw/<dims>/<size>/<offset>[/<format>]

Add head command to check if a key exists

CORS is not enabled for non-datatype API calls

Low-res 3D body viewer

There should be a low-resolution version of the sparse volume viewer that will render typical cell shapes in a fraction of a second. The denormalizations supporting this viewer should be updated efficiently when label merges are performed.

Two sub requirements:

a) The client should be able to specify a bounding box (often just a plane) that will be displayed in the viewer.

b) The user should be able to retrieve rough x,y,z coordinates by picking locations on the body.

It would be good to have this ready within the next few weeks, but the slower viewer is probably tolerable for now.

Don't save purely black (0 intensity) blocks for voxels data types.

New endpoint for ROI as voxels nd-data

For simplicity and rapid prototyping, there should be an endpoint for accessing ROI datasets via the usual voxels nd-data API. Specifically:

Available as plain ND-data, not RLE.
For simplicity, the mask data should be provided at full resolution, just like the grayscale data. Yes, this wastes space on the wire, because the ROI is defined block-wise. But it is dirt simple and will let us move quickly to start using ROI masks right away.
Data should be of type uint8, where 1 means "inside the roi" and 0 means "outside the roi"
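
A sketch of how a block-wise ROI could be expanded into such a full-resolution mask. The block edge of 32 voxels, the set-of-block-coordinates representation, and the x-fastest byte ordering are all assumptions made for illustration, and the function name is hypothetical.

```python
def roi_mask(roi_blocks, offset, size, block_edge=32):
    """Return a uint8 mask as bytes: 1 inside the ROI, 0 outside.

    roi_blocks is a set of (bx, by, bz) block coordinates. Voxels are
    written with x varying fastest, then y, then z.
    """
    ox, oy, oz = offset
    sx, sy, sz = size
    mask = bytearray(sx * sy * sz)  # zero-initialized, i.e. "outside the roi"
    i = 0
    for z in range(oz, oz + sz):
        for y in range(oy, oy + sy):
            for x in range(ox, ox + sx):
                block = (x // block_edge, y // block_edge, z // block_edge)
                if block in roi_blocks:
                    mask[i] = 1
                i += 1
    return bytes(mask)
```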

make test failed on Mac OS X 10.7.5

Scanning dependencies of target test

github.com/janelia-flyem/dvid

runtime.main: call to external function main.main
runtime.main: undefined: main.main

Lightning MDB storage engine gets slower as more labelmap indices are added.

Unlike with the leveldb variants, computation of the spatial index and label indices gets slower over time:

2014/04/02 00:32:01 Adding spatial information from label volume superpixels for mapping sp2body...
2014/04/02 00:32:46 Processed all superpixels blocks for layer 1/205: 44.93275195s
2014/04/02 00:33:43 Processed all superpixels blocks for layer 2/205: 56.618448829s
…
2014/04/02 01:00:27 Processed all superpixels blocks for layer 21/205: 2m17.899856337s
…
2014/04/02 04:28:03 Processed all superpixels blocks for layer 56/205: 10m13.924529914s
…
2014/04/02 08:46:46 Processed all superpixels blocks for layer 76/205: 14m52.106129852s
2014/04/02 09:01:51 Processed all superpixels blocks for layer 77/205: 15m5.189581523s

Track down this issue and see if it's inefficient processing independent of storage engine or something that lmdb handles poorly compared to leveldb.

body with grayscale value

Return a sparse volume with grayscale values

Add ability to email contacts in case of DVID panic

To address issue #47, DVID should optionally email admins if any kind of panic occurs, even one from which it recovered.

Add ability to choose compression per data instance.

DVID currently allows choice of Snappy or LZ4 for compressing data into key/value store. Allow selection of compression per data instance, storing the selected compression into that instance record. This is also first step to returning compressed data w/o processing on DVID-side for a request, e.g., storing gzipped tiles that are simply returned on request.

Also add gzip and possibly bzip2 compression as options, including ability to select level of compression from 1 (fastest) to 9 (most compression).
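
For reference, the 1 to 9 level range maps directly onto standard gzip implementations. The Python sketch below only illustrates the level parameter using the standard library; the function name and the default level of 6 are assumptions, and DVID's own implementation would be in Go.

```python
import gzip


def compress_value(data, level=6):
    """Gzip-compress `data` at a level from 1 (fastest) to 9 (most compression)."""
    if not 1 <= level <= 9:
        raise ValueError("compression level must be between 1 and 9")
    return gzip.compress(data, compresslevel=level)
```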

sparsevol denormalization in splitting

When writing splits of a body back into DVID, the old body, which should be completely gone, left some fragments in sparsevol. The body labels are fine.

Error when using labelmap GET on 3d volume.

http status code refinements

Right now DVID returns 400 (Bad Request) when the client requests an item that does not exist in the server. In such cases, a 404 error (Not Found) might make more sense. For example, what status code should DVID return in response to the following query?

GET /api/node/my_dset/doesnt_exist/metadata

On a related note, there's also a question regarding what DVID should return when the user has posted data, and DVID sends back an empty response. For example:

POST /api/node/abc123/mydata/keyA

If successful, DVID will return an empty response body. Should the status code be 200 (OK) or 204 (No Content)? I can see arguments for both cases.
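
One possible mapping, sketched for discussion only. It picks 204 for empty successful responses, which is just one of the two options raised above, and the function and its parameters are hypothetical.

```python
from http import HTTPStatus


def status_for(found, has_body):
    """Pick a status code: 404 for a missing item, 204 for an empty success, else 200."""
    if not found:
        return HTTPStatus.NOT_FOUND
    return HTTPStatus.OK if has_body else HTTPStatus.NO_CONTENT
```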

nd-data API returns all zeros

I have a 1020x1020x1020 dataset named "gigacube" which I've initialized as follows:

dvid node a7e6 gigacube load 0,0,0 "/magnetic/gigacube_pngs/*.png"

Requesting a .png works just fine:

http://localhost:8000/api/node/a7/gigacube/raw/0_1/512_256/0_0_100

But when I attempt to request raw nd-data, I get back all zeros. For example, using dvidclient, I can check the returned data:

In [9]: from dvidclient.volume_client import VolumeClient
In [10]: vol_client = VolumeClient( "localhost:8000", "a7", "gigacube" )
In [11]: cutout_array = vol_client.retrieve_subvolume( (0,0,0,0), (1,100,100,100) )
In [12]: print cutout_array.sum()
0

The REST API call used in the above example is something like this:

http://localhost:8000/api/node/a7/gigacube/raw/0_1_2/100_100_100/0_0_0

DVID is sending back a properly formatted message, with the correct buffer size. In the dvid server log, I see nothing unusual:

2014/03/18 17:40:21 HTTP GET: 3d volume (100,100,100) at offset (0,0,0) (/api/node/a7/gigacube/raw/0_1_2/100_100_100/0_0_0): 10.88681ms

Support isotropic tile generation from anisotropic data.

Request from Stephan Gerhard. If voxel data is anisotropic, XZ and YZ tile generation can optionally produce isotropic tiles, which will require interpolation from the original voxel data. There are bandwidth and CPU utilization arguments for not doing interpolation on the server side, and just transmitting anisotropic tiles that get scaled appropriately on the client side.

After more thought, I think the default behavior should be non-isotropic tile generation, but if an "isotropic=true" parameter is supplied to the tile generation command, DVID will produce isotropic data.

Add API to allow listing of all keys within a keyvalue type

Perhaps allow paging via a query string if the number of keys is very large.
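
A sketch of what offset/limit paging over the key list could look like. The parameter names and the default page size are assumptions, not an existing DVID query string.

```python
def list_keys(keys, offset=0, limit=100):
    """Return one page of keys in sorted order plus the offset of the next page.

    The next offset is None when the current page is the last one.
    """
    ordered = sorted(keys)
    page = ordered[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(ordered) else None
    return page, next_offset
```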

Add rotating logs.

Pushing images to dataset overwrites image content of other datasets

If I do

dvid node c7 raw load local xy 0,0,0 raw/*.png
dvid node c7 membranes load local xy 0,0,0 membranes/*.png

The dataset of raw is overwritten by membranes, e.g. as seen when doing an API call to fetch the images.
