
tensorstore's Introduction

TensorStore


TensorStore is an open-source C++ and Python software library designed for storage and manipulation of large multi-dimensional arrays that:

  • Provides advanced, fully composable indexing operations and virtual views.

  • Provides a uniform API for reading and writing multiple array formats, including zarr and N5.

  • Natively supports multiple storage systems, such as local and network filesystems, Google Cloud Storage, Amazon S3-compatible object stores, HTTP servers, and in-memory storage.

  • Offers an asynchronous API to enable high-throughput access even to high-latency remote storage.

  • Supports read caching and transactions, with strong atomicity, isolation, consistency, and durability (ACID) guarantees.

  • Supports safe, efficient access from multiple processes and machines via optimistic concurrency.

Documentation and installation instructions are at https://google.github.io/tensorstore.

Getting Started

To get started using the TensorStore Python API, you can install the tensorstore PyPI package using:

pip install tensorstore

Refer to the tutorials and API documentation, or the announcement on the Google Research Blog for more details.

This is not an officially supported Google product.

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this software except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

tensorstore's People

Contributors

chromehearts, ckennelly, cpgaffney1, cushon, hectorgon, jbms, jongwooo, junyer, katre, laramiel, ldeakin, lionofjewdah, mkruskal-google, qrczakmk, rickeylev, shoyer, sjperkins, tjgq, vrabaud, yilei, yunni


tensorstore's Issues

Adding a metadata accessor for Tensorstore object

I have a tensorstore object that reads an OME-TIFF file. After executing this piece of code:

dataset_future = ts.open({
    'driver': 'ometiff',
    'kvstore': {
        'driver': 'tiff',
        'path': '/home/samee/axle/data/r001_c001_z000.ome.tif',
    },
    'context': {
        'cache_pool': {
            'total_bytes_limit': 100_000_000
        }
    },
})
dataset = dataset_future.result()
print(dataset)

I want to access the metadata section of this tensorstore using Python API.

TensorStore({
  'context': {
    'cache_pool': {'total_bytes_limit': 1000000000},
    'data_copy_concurrency': {},
    'file_io_concurrency': {},
  },
  'driver': 'ometiff',
  'dtype': 'uint16',
  'kvstore': {
    'driver': 'tiff',
    'path': '/home/samee/axle/data/r001_c001_z000.ome.tif/',
  },
  'metadata': {
    'BigEndian': 'false',
    'DimensionOrder': 'XYZCT',
    'ID': 'Pixels:0',
    'Interleaved': 'false',
    'PhysicalSizeX': '0.325',
    'PhysicalSizeXUnit': '\xc2\xb5m',
    'PhysicalSizeY': '0.325',
    'PhysicalSizeYUnit': '\xc2\xb5m',
    'SignificantBits': '16',
    'SizeC': '1',
    'SizeT': '1',
    'SizeX': '42906',
    'SizeY': '29286',
    'SizeZ': '1',
    'Type': 'uint16',
    'blockSize': [1, 1, 1, 1024, 1024],
    'dataType': 'uint16',
    'dimOrder': 2,
    'dimensions': [1, 1, 1, 29286, 42906],
    'tiffData': {'0': [0, 0, 0]},
  },
  'transform': {
    'input_exclusive_max': [[1], [1], [1], [29286], [42906]],
    'input_inclusive_min': [0, 0, 0, 0, 0],
  },
})

Is it possible to have an accessor that returns a Python dictionary from the metadata section?

Delete existing GCS precomputed volume

It does not seem possible to open an existing volume for overwrite. Did I get this incantation incorrect, or is this by design?

bucket = "..."

import tensorstore as ts
ds = ts.open({
    "driver": "neuroglancer_precomputed",
    "kvstore": {
        "driver": "gcs",
        "bucket": bucket,
    },
    "path": "tmp",
    "multiscale_metadata": {
        "type": "image",
        "data_type": "uint32",
        "num_channels": 1,
    },
    "scale_metadata": {
        "size": (64,64,64),
        "chunk_size": (64,64,64),
        "resolution": (1,1,1),
        "encoding": "raw",
    },
    "create": True,
    "delete_existing": True
}).result()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-603642a94535> in <module>
      1 import tensorstore as ts
----> 2 ds = ts.open({
      3     "driver": "neuroglancer_precomputed",
      4     "kvstore": {
      5         "driver": "gcs",

ValueError: Error opening "neuroglancer_precomputed" driver: KeyValueStore does not support deleting by range

Unhelpful error when trying to overwrite existing

When tensorstore.open is called with delete_existing: True using the n5 driver and a gcs kvstore, the error reported is

ValueError: Error opening "n5" driver

The error still occurs when the path is deleted.

It works when delete_existing is False.

creating a sharded store

Running this:

spec = {'driver': 'neuroglancer_precomputed',
     'kvstore': {'driver': 'file', 'path': kvstore_path},
     'path': pth,
     'scale_metadata': {'size': d.shape,
      'resolution': resolutions[idx],
      'encoding': 'jpeg',
      'jpeg_quality': 100,
      'sharding': sharding,
      'chunk_size': chunk_size,
      'key': keys[idx],
      'voxel_offset': (40, 40, 40)},
     'multiscale_metadata': {'data_type': d.dtype.name,
      'num_channels': 1,
      'type': 'image'}}

arr = ts.open(spec=spec, create=True,delete_existing=True).result()
arr.spec()

produces this output:

Spec({
  'driver': 'neuroglancer_precomputed',
  'dtype': 'uint8',
  'kvstore': {'driver': 'file', 'path': '/nrs/cosem/davis'},
  'multiscale_metadata': {'num_channels': 1, 'type': 'image'},
  'path': 'jrc_hela-2.precomputed_0',
  'scale_index': 0,
  'scale_metadata': {
    'chunk_size': [96, 96, 96],
    'encoding': 'jpeg',
    'key': 's1',
    'resolution': [2.0, 2.0, 2.0],
    'sharding': {
      '@type': 'neuroglancer_uint64_sharded_v1',
      'data_encoding': 'raw',
      'hash': 'identity',
      'minishard_bits': 6,
      'minishard_index_encoding': 'gzip',
      'preshift_bits': 9,
      'shard_bits': 15,
    },
    'size': [512, 512, 512],
    'voxel_offset': [40, 40, 40],
  },
  'transform': {
    'input_exclusive_max': [552, 552, 552, 1],
    'input_inclusive_min': [40, 40, 40, 0],
    'input_labels': ['x', 'y', 'z', 'channel'],
  },
})

but the underlying info file contains this:

{"@type":"neuroglancer_multiscale_volume",
"data_type":"uint8",
"num_channels":1,
"scales":[
{"chunk_sizes":[[96,96,96]],
"encoding":"jpeg",
"jpeg_quality":100,
"key":"s1",
"resolution":[2.0,2.0,2.0],
"size":[512,512,512],
"voxel_offset":[40,40,40]}],
"type":"image"}

with no references to the sharding specified in the spec. Additionally, later attempts to open this store for writing will fail, because the sharding config was not propagated to the info file.

Is there something I should be doing to get sharding working?

jpeg quality

How can I adjust the quality setting in jpeg compression? My cursory search through the codebase suggests it might be hard-coded to 75?
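A later issue on this page shows `jpeg_quality` being accepted in `scale_metadata`, and current releases of the `neuroglancer_precomputed` driver document a `jpeg_quality` member (0–100, defaulting to 75). A spec fragment, with an illustrative path:

```json
{
  "driver": "neuroglancer_precomputed",
  "kvstore": {"driver": "file", "path": "/tmp/volume"},
  "scale_metadata": {"encoding": "jpeg", "jpeg_quality": 90}
}
```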

python API: tensorstore.array.shape type is `np.ndarray` whereas np.ndarray.shape is `tuple`

I noticed that .shape on a tensorstore array returns its shape as an np.ndarray.
In contrast, .shape on a numpy ndarray returns a simple tuple.

This causes some issues when trying to pass a tensorstore array to functions that expect a numpy array-like argument.

E.g. when passing a tensorstore array arr as the output argument to scipy.ndimage.affine_transform:

affine_transform(np.zeros((100,100), dtype=np.uint16), matrix=np.array(np.eye(3)), output=arr)

I get the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-7-b8875698d701> in <module>
----> 1 affine_transform(np.zeros((100,100), dtype=np.uint16), matrix=np.array(np.eye(3)), output=arr)

~/miniconda3/envs/napari/lib/python3.7/site-packages/scipy/ndimage/interpolation.py in affine_transform(input, matrix, offset, output_shape, output, order, mode, cval, prefilter)
    553     complex_output = numpy.iscomplexobj(input)
    554     output = _ni_support._get_output(output, input, shape=output_shape,
--> 555                                      complex_output=complex_output)
    556     if complex_output:
    557         kwargs = dict(offset=offset, output_shape=output_shape, order=order,

~/miniconda3/envs/napari/lib/python3.7/site-packages/scipy/ndimage/_ni_support.py in _get_output(output, input, shape, complex_output)
     91             raise RuntimeError("output must have complex dtype")
     92         output = numpy.zeros(shape, dtype=output)
---> 93     elif output.shape != shape:
     94         raise RuntimeError("output shape not correct")
     95     elif complex_output and output.dtype.kind != 'c':

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

This is because the shape comparison in line 93 gives a single truth value if the shapes are simple tuples of int, but not if output.shape is of type array.

I am not sure whether things would break elsewhere if tensorstore.array.shape properties were modified to be simple tuples of integers?
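The ambiguity can be reproduced with NumPy alone, independent of tensorstore:

```python
import numpy as np

shape_tuple = (100, 100)
shape_array = np.array([100, 100])

# Comparing tuples yields a single bool, usable in `if`/`elif`:
print(shape_tuple != (100, 100))  # False

# Comparing an ndarray yields an elementwise bool array, which is
# ambiguous in a boolean context, exactly scipy's failure mode:
cmp = shape_array != (100, 100)
try:
    bool(cmp)
except ValueError as err:
    print('ValueError:', err)
```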

s3 support?

It seems that tensorstore supports GCS but not AWS S3. Is there any plan to support it?
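Recent tensorstore releases have since added an s3 kvstore driver; a spec fragment, with placeholder bucket and path:

```json
{
  "driver": "zarr",
  "kvstore": {
    "driver": "s3",
    "bucket": "my-bucket",
    "path": "path/to/array/"
  }
}
```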

sharding advice

I'm trying to save a large uint8 array with dimensions == (7500, 3400, 10391) as a multiresolution pyramid in the precomputed sharded format with the following spec:

{
    "@type": "neuroglancer_multiscale_volume",
    "data_type": "uint8",
    "num_channels": 1,
    "scales": [
        {
            "chunk_sizes": [
                [
                    96,
                    96,
                    96
                ]
            ],
            "encoding": "jpeg",
            "jpeg_quality": 90,
            "key": "s0",
            "resolution": [
                4.0,
                4.0,
                4.28
            ],
            "sharding": {
                "@type": "neuroglancer_uint64_sharded_v1",
                "data_encoding": "raw",
                "hash": "identity",
                "minishard_bits": 6,
                "minishard_index_encoding": "gzip",
                "preshift_bits": 9,
                "shard_bits": 15
            },
            "size": [
                7500,
                3400,
                8158
            ],
            "voxel_offset": [
                0,
                0,
                0
            ]
        },
        {
            "chunk_sizes": [
                [
                    96,
                    96,
                    96
                ]
            ],
            "encoding": "jpeg",
            "jpeg_quality": 90,
            "key": "s1",
            "resolution": [
                8.0,
                8.0,
                8.56
            ],
            "sharding": {
                "@type": "neuroglancer_uint64_sharded_v1",
                "data_encoding": "raw",
                "hash": "identity",
                "minishard_bits": 6,
                "minishard_index_encoding": "gzip",
                "preshift_bits": 9,
                "shard_bits": 15
            },
            "size": [
                3750,
                1700,
                4079
            ],
            "voxel_offset": [
                0,
                0,
                0
            ]
        },
        {
            "chunk_sizes": [
                [
                    96,
                    96,
                    96
                ]
            ],
            "encoding": "jpeg",
            "jpeg_quality": 90,
            "key": "s2",
            "resolution": [
                16.0,
                16.0,
                17.119999999999997
            ],
            "sharding": {
                "@type": "neuroglancer_uint64_sharded_v1",
                "data_encoding": "raw",
                "hash": "identity",
                "minishard_bits": 6,
                "minishard_index_encoding": "gzip",
                "preshift_bits": 9,
                "shard_bits": 15
            },
            "size": [
                1875,
                850,
                2040
            ],
            "voxel_offset": [
                0,
                0,
                0
            ]
        },
        {
            "chunk_sizes": [
                [
                    96,
                    96,
                    96
                ]
            ],
            "encoding": "jpeg",
            "jpeg_quality": 90,
            "key": "s3",
            "resolution": [
                32.0,
                32.0,
                34.239999999999995
            ],
            "sharding": {
                "@type": "neuroglancer_uint64_sharded_v1",
                "data_encoding": "raw",
                "hash": "identity",
                "minishard_bits": 6,
                "minishard_index_encoding": "gzip",
                "preshift_bits": 9,
                "shard_bits": 15
            },
            "size": [
                938,
                425,
                1020
            ],
            "voxel_offset": [
                0,
                0,
                0
            ]
        },
        {
            "chunk_sizes": [
                [
                    96,
                    96,
                    96
                ]
            ],
            "encoding": "jpeg",
            "jpeg_quality": 90,
            "key": "s4",
            "resolution": [
                64.0,
                64.0,
                68.48000000000002
            ],
            "sharding": {
                "@type": "neuroglancer_uint64_sharded_v1",
                "data_encoding": "raw",
                "hash": "identity",
                "minishard_bits": 6,
                "minishard_index_encoding": "gzip",
                "preshift_bits": 9,
                "shard_bits": 15
            },
            "size": [
                469,
                213,
                510
            ],
            "voxel_offset": [
                0,
                0,
                0
            ]
        },
        {
            "chunk_sizes": [
                [
                    96,
                    96,
                    96
                ]
            ],
            "encoding": "jpeg",
            "jpeg_quality": 90,
            "key": "s5",
            "resolution": [
                128.0,
                128.0,
                136.96
            ],
            "sharding": {
                "@type": "neuroglancer_uint64_sharded_v1",
                "data_encoding": "raw",
                "hash": "identity",
                "minishard_bits": 6,
                "minishard_index_encoding": "gzip",
                "preshift_bits": 9,
                "shard_bits": 15
            },
            "size": [
                235,
                107,
                255
            ],
            "voxel_offset": [
                0,
                0,
                0
            ]
        }
    ],
    "type": "image"
}

Although the save completes without error, the resulting highest-resolution array seems incomplete when I view it with neuroglancer -- there are a lot of Error retrieving chunk errors emitted by the chunk worker, and the total file size of the highest-resolution array is clearly too small.

I'm saving the data to a network store over NFS, which might be causing problems if worker state is being synchronized via the file system. I will try saving the same dataset on a different file system and see whether the problems persist.

But beyond trying a different filesystem, do you have any advice for debugging this?
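As a rough sanity check (a heuristic, not tensorstore's own logic): in the precomputed sharded format, chunks are addressed by compressed Morton codes, so `preshift_bits + minishard_bits + shard_bits` should at least cover the bits needed to address every chunk at the highest resolution:

```python
import math

def chunk_address_bits(size, chunk_size):
    # Compressed Morton codes use ceil(log2(grid extent)) bits per
    # dimension; the sum is the total chunk-address width needed.
    grid = [math.ceil(s / c) for s, c in zip(size, chunk_size)]
    return sum(math.ceil(math.log2(g)) for g in grid)

bits = chunk_address_bits((7500, 3400, 8158), (96, 96, 96))
print(bits)  # 20 <= 9 + 6 + 15, so the bit budget in this spec is ample
```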

How to consume this library via CMake?

Building the library via CMake works. This is good - we don't need to know bazel. But the build process does not even produce an install target, much less a tensorstore-config.cmake file.

The main point of CMake-ifying the library is to enable it to be used via CMake, preferably with find_package(tensorstore) and target_link_libraries(myLibrary tensorstore::all_drivers) (or something similar). A usage example is Qt; somebody wrote a relatively short article about how to accomplish this.

If we have to discover the include directories, library files, and transitive dependencies, it does not really matter whether the library was compiled using CMake, bazel, autotools, or a totally custom bash script.
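For concreteness, this is what a downstream project would ideally be able to write once an install/export target and a tensorstore-config.cmake exist. The target names below are illustrative (mirroring the bazel target names), not something the current build provides:

```cmake
# Hypothetical downstream CMakeLists.txt; nothing below is
# provided by the current tensorstore CMake build.
cmake_minimum_required(VERSION 3.24)
project(myApp CXX)

find_package(tensorstore REQUIRED)

add_executable(myApp main.cpp)
target_link_libraries(myApp PRIVATE tensorstore::all_drivers)
```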

Request to add .shape and .ndim attributes to TensorStore objects

Thank you @jbms for answering my StackOverflow question!

For a bit of context, the reason @AbigailMcGovern and I need this is to use napari to paint segmentations into (parts of) .zarr files. Now we are able to use tensorstore (🎉🙏), but only after patching in .shape and .ndim attributes, which napari expects of its inputs at various points. These are quite standard in the Python array world, with NumPy, dask, zarr, PyTorch, and TensorFlow all supporting them, among others.

I understand API creep is real but I hope that these are standard enough to be considered for inclusion in tensorstore!

Thank you!
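Until such attributes land upstream, a thin adapter can paper over the gap. This is a minimal illustrative sketch (the class name is hypothetical), usable with any array-like that exposes extents:

```python
import numpy as np

class NdimShapeAdapter:
    """Illustrative wrapper giving an array-like object a
    numpy-style tuple .shape and int .ndim."""

    def __init__(self, store):
        self._store = store

    @property
    def shape(self):
        # Coerce whatever the backend returns into a tuple of ints.
        return tuple(int(s) for s in self._store.shape)

    @property
    def ndim(self):
        return len(self.shape)

    def __getitem__(self, idx):
        return self._store[idx]

# Works with any object exposing .shape, e.g. a numpy array:
wrapped = NdimShapeAdapter(np.zeros((2, 3, 4)))
print(wrapped.shape, wrapped.ndim)  # (2, 3, 4) 3
```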

Flaky segfaults when performing many writes quickly

Hi!

I'm using jax with its jax.experimental.global_device_array.GlobalDeviceArray objects and the serialisation code they come with, which uses tensorstore.

My use case (checkpointing big neural nets) requires serialising big (many leaves, big array leaves) pytrees of GDAs on large TPU VMs (many workers writing simultaneously).

The setup seems to be stable at a small scale but starts failing nondeterministically when we scale up our models and number of workers. The error we get looks as follows.

tensorstore/internal/oauth2/google_auth_provider.cc:173: Using credentials at xxx/.config/gcloud/application_default_credentials.json
tensorstore/internal/oauth2/google_auth_provider.cc:180: Using OAuth2 AuthProvider
*** SIGSEGV received at time=1644534310 on cpu 158 ***
PC: @     0x7fbefa8effd3  (unknown)  tcmalloc::ThreadCache::ReleaseToCentralCache()
    @     0x7fbefa8b23c0  (unknown)  (unknown)
    @                0x5  (unknown)  (unknown)
    @               0xef  (unknown)  (unknown)
    @               0xef  (unknown)  (unknown)
[2022-02-10 23:05:10,337 E 327783 330515] logging.cc:317: *** SIGSEGV received at time=1644534310 on cpu 158 ***
[2022-02-10 23:05:10,337 E 327783 330515] logging.cc:317: PC: @     0x7fbefa8effd3  (unknown)  tcmalloc::ThreadCache::ReleaseToCentralCache()
[2022-02-10 23:05:10,340 E 327783 330515] logging.cc:317:     @     0x7fbefa8b23c0  (unknown)  (unknown)
[2022-02-10 23:05:10,342 E 327783 330515] logging.cc:317:     @                0x5  (unknown)  (unknown)
[2022-02-10 23:05:10,347 E 327783 330515] logging.cc:317:     @               0xef  (unknown)  (unknown)
[2022-02-10 23:05:10,352 E 327783 330515] logging.cc:317:     @               0xef  (unknown)  (unknown)
Fatal Python error: Segmentation fault

Anecdotally, it seems that if the first checkpoint gets written ok, the following ones in the same training session also work. Perhaps that is an interesting hint. (I can't swear by it being an absolute rule for now.) The first save fails >50% (but <100%) of the time for one of our common use cases.


I'm aware this isn't much info. Don't hesitate to let me know how we could help you understand the problem better. Thank you!

Configure failure with CMake on Windows

I am trying to build current version (0702389) using CMake 3.24.1. I run into an error during configure step, using both VS2019 and VS2022.
configure.log
CMakeError.log
CMakeOutput.log
The relevant files are attached. The most interesting part seems to be:

Loading C:/Misc/tensorstore-vs22/_deps/liblzma-src/BUILD.bazel
Loading C:/Misc/tensorstore/utils.bzl

MSBuild version 17.3.1+2badb37d1 for .NET Framework
  Checking Build System
  Creating directories for 'openssl-populate'
  Building Custom Rule C:/Misc/tensorstore-vs22/_deps/openssl-subbuild/CMakeLists.txt
  Performing download step (download, verify and extract) for 'openssl-populate'
  -- Downloading...
     dst='C:/Misc/tensorstore-vs22/_deps/openssl-subbuild/openssl-populate-prefix/src/098695591f3a2665fccef83a3732ecfc99acdcdd.tar.gz'
     timeout='none'
     inactivity timeout='none'
  -- Using src='https://github.com/google/boringssl/archive/098695591f3a2665fccef83a3732ecfc99acdcdd.tar.gz'
  -- verifying file...
         file='C:/Misc/tensorstore-vs22/_deps/openssl-subbuild/openssl-populate-prefix/src/098695591f3a2665fccef83a3732ecfc99acdcdd.tar.gz'
  -- Downloading... done
  -- extracting...
       src='C:/Misc/tensorstore-vs22/_deps/openssl-subbuild/openssl-populate-prefix/src/098695591f3a2665fccef83a3732ecfc99acdcdd.tar.gz'
       dst='C:/Misc/tensorstore-vs22/_deps/openssl-src'
  -- extracting... [tar xfz]
  -- extracting... [analysis]
  -- extracting... [rename]
  -- extracting... [clean up]
  -- extracting... done
  No update step for 'openssl-populate'
  Performing patch step for 'openssl-populate'
  patching file BUILD
  Hunk #1 FAILED at 94 (different line endings).
  1 out of 1 hunk FAILED -- saving rejects to file BUILD.rej
C:\Program Files\Microsoft Visual Studio\2022\Community\MSBuild\Microsoft\VC\v170\Microsoft.CppCommon.targets(247,5): error MSB8066: Custom build for 'C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\0ca2dbcd7bc8bdf4d7847cccd3911000\openssl-populate-mkdir.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\0ca2dbcd7bc8bdf4d7847cccd3911000\openssl-populate-download.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\0ca2dbcd7bc8bdf4d7847cccd3911000\openssl-populate-update.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\0ca2dbcd7bc8bdf4d7847cccd3911000\openssl-populate-patch.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\0ca2dbcd7bc8bdf4d7847cccd3911000\openssl-populate-configure.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\0ca2dbcd7bc8bdf4d7847cccd3911000\openssl-populate-build.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\0ca2dbcd7bc8bdf4d7847cccd3911000\openssl-populate-install.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\0ca2dbcd7bc8bdf4d7847cccd3911000\openssl-populate-test.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\85dde88c802439c23c62ca419e06eb4a\openssl-populate-complete.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeFiles\893e5776bf6d24ff7b85c024db008c0d\openssl-populate.rule;C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\CMakeLists.txt' exited with code 1. [C:\Misc\tensorstore-vs22\_deps\openssl-subbuild\openssl-populate.vcxproj]

CMake Error at C:/Program Files/CMake/share/cmake-3.24/Modules/FetchContent.cmake:1601 (message):
  Build step for openssl failed: 1
Call Stack (most recent call first):
  C:/Program Files/CMake/share/cmake-3.24/Modules/FetchContent.cmake:1741:EVAL:2 (__FetchContent_directPopulate)
  C:/Program Files/CMake/share/cmake-3.24/Modules/FetchContent.cmake:1741 (cmake_language)
  C:/Program Files/CMake/share/cmake-3.24/Modules/FetchContent.cmake:1955 (FetchContent_Populate)
  C:/Misc/tensorstore-vs22/third_party/CMakeLists.txt:3 (FetchContent_MakeAvailable)


Configuring incomplete, errors occurred!
See also "C:/Misc/tensorstore-vs22/CMakeFiles/CMakeOutput.log".
See also "C:/Misc/tensorstore-vs22/CMakeFiles/CMakeError.log".

As the configure step succeeds on Ubuntu 22.04, I assume this error is due to a lack of testing on Windows (note the Hunk #1 FAILED at 94 (different line endings) message)?

ValueError when using an array to index

I've been trying to diagnose an issue where I get a ValueError exception when using an index array. This happens very infrequently, but I am now able to reproduce it in the context of a specific input within a larger application.

Unfortunately, my efforts to demonstrate the bug with a snippet have been unsuccessful, but I'll update this issue if I can.

Given a numpy array co of shape=(1044, 3) and dtype=int32:

# This works
for row in range(co.shape[0]):
    x = vol[co[row, 0], co[row, 1], co[row, 2]].read().result()

# This also works
split = co.shape[0] // 2
x1 = vol[co[:split, 0], co[:split, 1], co[:split, 2]].read().result()
x2 = vol[co[split:, 0], co[split:, 1], co[split:, 2]].read().result()

# But this doesn't
co_id = vol[co[:, 0], co[:, 1], co[:, 2]].read().result()

The exception I get is:

  File "./app/process.py", line 39, in _get_ids
    co_id = vol[co[:, 0], co[:, 1], co[:, 2]].read().result()
ValueError: In index array map for output dimension 0: Index 28 is outside valid range [28928, 29056)

The error is confusing for two reasons. First, all of the values are within this range:

print(co[28])
print('0: ', np.min(co[:,0]), np.max(co[:,0]))
print('1: ', np.min(co[:,1]), np.max(co[:,1]))
print('2: ', np.min(co[:,2]), np.max(co[:,2]))

[28931 12703  3603]
0:  28928 29054
1:  12672 12777
2:  3586 3647

Secondly, the actual volume is much larger:

TensorStore({
  'context': {'cache_pool': {'total_bytes_limit': 200000000}},
  'driver': 'zarr',
  'dtype': 'int16',
  'key_encoding': '/',
  'kvstore': {
    'driver': 'file',
    'path': '/data/fields/flywire_v1/field.zarr/s2',
  },
  'metadata': {
    'chunks': [128, 128, 128, 2],
    'compressor': {'clevel': 9, 'cname': 'zstd', 'id': 'blosc', 'shuffle': 1},
    'dtype': '<i2',
    'fill_value': 0,
    'filters': None,
    'order': 'C',
    'shape': [63488, 36864, 7062, 2],
    'zarr_format': 2,
  },
  'recheck_cached_data': 'open',
  'transform': {
    'input_exclusive_max': [[67584], [36864], [7063], [2]],
    'input_inclusive_min': [4096, 0, 1, 0],
    'output': [
      {'input_dimension': 0, 'offset': -4096},
      {'input_dimension': 1},
      {'input_dimension': 2, 'offset': -1},
      {'input_dimension': 3},
    ],
  },
})

Build fails on Linux with GCC 9, 10, and 12

https://google.github.io/tensorstore/installation.html says build dependencies require GCC 9 or later, but it seems that tensorstore cannot be built using GCC 9 or 10. What is the minimum required version of GCC/G++?

Environment: Ubuntu 20.04 Focal LTS
tensorstore: 0.1.28 (08-11-2022)

GCC 9.x: ./tensorstore/box.h:214:44: error: expected template-name before '<' token
  Use --sandbox_debug to see verbose messages from the sandbox
  In file included from ./tensorstore/box.h:30,
                   from ./tensorstore/internal/box_difference.h:20,
                   from tensorstore/internal/box_difference.cc:15:
  ./tensorstore/internal/multi_vector.h: In substitution of 'template<long int Extent, class ... Ts> using MultiVectorStorage = tensorstore::internal::MultiVectorStorageImpl<tenso
rstore::RankConstraint::FromInlineRank(Extent).tensorstore::RankConstraint::operator tensorstore::DimensionIndex(), tensorstore::InlineRankLimit(Extent), Ts ...> [with long int Ex
tent = Rank; Ts = {long int, long int}]':
  ./tensorstore/box.h:137:67:   required from here
  ./tensorstore/internal/multi_vector.h:105:58: error: taking address of rvalue [-fpermissive]
    105 |     MultiVectorStorageImpl<RankConstraint::FromInlineRank(Extent),
        |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~
  ./tensorstore/internal/multi_vector.h:106:58: error: no matching function for call to 'tensorstore::RankConstraint::operator tensorstore::DimensionIndex(tensorstore::RankConstra
int*)'
    106 |                            InlineRankLimit(Extent), Ts...>;
        |                                                          ^
  In file included from ./tensorstore/internal/multi_vector.h:29,
                   from ./tensorstore/box.h:30,
                   from ./tensorstore/internal/box_difference.h:20,
                   from tensorstore/internal/box_difference.cc:15:
  ./tensorstore/rank.h:127:13: note: candidate: 'constexpr tensorstore::RankConstraint::operator tensorstore::DimensionIndex() const'
    127 |   constexpr operator DimensionIndex() const { return rank; }
        |             ^~~~~~~~
  ./tensorstore/rank.h:127:13: note:   candidate expects 0 arguments, 1 provided
  In file included from ./tensorstore/internal/box_difference.h:20,
                   from tensorstore/internal/box_difference.cc:15:
  ./tensorstore/box.h:214:44: error: expected template-name before '<' token
    214 | class Box : public internal_box::BoxStorage<Rank> {
        |                                            ^
  ./tensorstore/box.h:214:44: error: expected '{' before '<' token
  cc1plus: warning: unrecognized command line option '-Wno-unknown-warning-option'
  Target //python/tensorstore:_tensorstore__shared_objects failed to build
GCC 10.x: python_headers/object.h:136:30: error: lvalue required as left operand of assignment
  ERROR: /tmp/pip-install-ltft0rab/tensorstore_0bc800fcb17142a7ba2120da8a360a3f/python/tensorstore/BUILD:381:20: Compiling python/tensorstore/bfloat16.cc failed:
 (Exit 1): gcc-10 failed: error executing command                                                                                                                
    (cd $HOME/.cache/bazel/_bazel_$USER/4694eb6f528e602d9a898e0775c25c1f/sandbox/linux-sandbox/1851/execroot/com_google_tensorstore && \                   
    exec env - \                                                                                                                                                 
      PATH=/bin:/usr/bin:/usr/local/bin \                                                                                                                        
      PWD=/proc/self/cwd \                                                                                                                                       

  (... omitted...)
                                                                                               
  In file included from bazel-out/k8-opt/bin/external/local_config_python/_virtual_includes/python_headers/Python.h:44,                                          
                   from bazel-out/k8-opt/bin/external/com_github_pybind_pybind11/_virtual_includes/pybind11/pybind11/detail/../detail/common.h:208,              
                   from bazel-out/k8-opt/bin/external/com_github_pybind_pybind11/_virtual_includes/pybind11/pybind11/detail/../attr.h:13,                        
                   from bazel-out/k8-opt/bin/external/com_github_pybind_pybind11/_virtual_includes/pybind11/pybind11/detail/class.h:12,                          
                   from bazel-out/k8-opt/bin/external/com_github_pybind_pybind11/_virtual_includes/pybind11/pybind11/pybind11.h:13,                              
                   from ./python/tensorstore/numpy.h:35,                                                                                                         
                   from python/tensorstore/bfloat16.cc:15:                                                                                                       
  python/tensorstore/bfloat16.cc: In function 'bool tensorstore::internal_python::{anonymous}::Initialize()':                                                    
  bazel-out/k8-opt/bin/external/local_config_python/_virtual_includes/python_headers/object.h:136:30: error: lvalue required as left operand of assignment       
    136 | #  define Py_TYPE(ob) Py_TYPE(_PyObject_CAST(ob))                                                                                                      
        |                       ~~~~~~~^~~~~~~~~~~~~~~~~~~~
  python/tensorstore/bfloat16.cc:774:3: note: in expansion of macro 'Py_TYPE'
    774 |   Py_TYPE(&NPyBfloat16_Descr) = &PyArrayDescr_Type;
        |   ^~~~~~~
  At global scope:
  cc1plus: note: unrecognized command-line option '-Wno-unknown-warning-option' may have been intended to silence earlier diagnostics
  Target //python/tensorstore:_tensorstore__shared_objects failed to build
GCC 12.x: com_google_boringssl/src/crypto/refcount_c11.c:29:15: error: expected declaration specifiers or '...'
In file included from external/com_google_boringssl/src/crypto/refcount_c11.c:15:
external/com_google_boringssl/src/crypto/internal.h: In function 'CRYPTO_load_word_be':
external/com_google_boringssl/src/crypto/internal.h:923:3: warning: implicit declaration of function 'static_assert' [-Wimplicit-function-declaration]
  923 |   static_assert(sizeof(v) == 8, "crypto_word_t has unexpected size");
      |   ^~~~~~~~~~~~~
      In file included from external/com_google_boringssl/src/crypto/internal.h:130:
external/com_google_boringssl/src/crypto/refcount_c11.c: At top level:
external/com_google_boringssl/src/crypto/refcount_c11.c:29:15: error: expected declaration specifiers or '...' before '_Alignof'
   29 | static_assert(alignof(CRYPTO_refcount_t) == alignof(_Atomic CRYPTO_refcount_t),
      |               ^~~~~~~
external/com_google_boringssl/src/crypto/refcount_c11.c:30:15: error: expected declaration specifiers or '...' before string constant
   30 |               "_Atomic alters the needed alignment of a reference count");
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/com_google_boringssl/src/crypto/refcount_c11.c:31:15: error: expected declaration specifiers or '...' before 'sizeof'
   31 | static_assert(sizeof(CRYPTO_refcount_t) == sizeof(_Atomic CRYPTO_refcount_t),
      |               ^~~~~~
external/com_google_boringssl/src/crypto/refcount_c11.c:32:15: error: expected declaration specifiers or '...' before string constant
   32 |               "_Atomic alters the size of a reference count");
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
external/com_google_boringssl/src/crypto/refcount_c11.c:34:15: error: expected declaration specifiers or '...' before '(' token
   34 | static_assert((CRYPTO_refcount_t)-1 == CRYPTO_REFCOUNT_MAX,
      |               ^
external/com_google_boringssl/src/crypto/refcount_c11.c:35:15: error: expected declaration specifiers or '...' before string constant
   35 |               "CRYPTO_REFCOUNT_MAX is incorrect");
      |               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: note: unrecognized command-line option '-Wno-unknown-warning-option' may have been intended to silence earlier diagnostics
Target //python/tensorstore:_tensorstore__shared_objects failed to build
INFO: Elapsed time: 41.252s, Critical Path: 27.78s
INFO: 1243 processes: 162 internal, 1081 linux-sandbox.
FAILED: Build did NOT complete successfully

Command I ran in the above: pip install tensorstore

Custom CA cert file name

When I try to use tensorstore from the Janelia cluster, I encounter the following error:

ValueError: Error opening "neuroglancer_precomputed" driver:
Error reading "gs://neuroglancer-janelia-flyem-hemibrain/v1.1/segmentation/info":
CURL error[77] Problem with the SSL CA cert (path? access rights?):
error setting certificate verify locations:
  CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: none

Indeed, the file /etc/ssl/certs/ca-certificates.crt does not exist on our cluster nodes. Instead, I see /etc/ssl/certs/ca-bundle.crt, but unfortunately I don't have the necessary permissions to add a symlink.

Is it possible for the user to provide a custom path for the CA cert file? The curl command-line tool allows this to be configured at runtime, but I gather that configuration isn't built into libcurl itself.
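
One workaround that may be worth trying in the meantime (whether these variables are honored depends on how the bundled libcurl and its SSL backend were built, so treat this as an assumption rather than a guaranteed fix) is to point the standard certificate environment variables at the bundle that does exist on the cluster:

```shell
# Point the SSL layer at the CA bundle that actually exists on the cluster.
# These are the conventional variables read by curl/OpenSSL-based stacks;
# whether this particular build honors them is an assumption.
export CURL_CA_BUNDLE=/etc/ssl/certs/ca-bundle.crt
export SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt
```

Setting these before launching Python costs nothing to try, and avoids needing root to create the symlink.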

Build issues in a clean environment

I tried to install tensorstore in a clean CentOS 8 container, but ran into a compilation error. The container is using gcc-c++-8.3.1 and python36-3.6.8.

Here's the error:
    tensorstore/driver/neuroglancer_precomputed/driver.cc: In member function 'virtual tensorstore::Result<tensorstore::IndexTransform<> > tensorstore::internal_neuroglancer_precomputed::{anonymous}::DataCacheStateBase::GetExternalToInternalTransform(const void*, std::size_t)':
    tensorstore/driver/neuroglancer_precomputed/driver.cc:209:52: error: no matching function for call to 'tensorstore::IndexTransformBuilder<>::input_labels(<brace-enclosed initializer list>)'
         builder.input_labels({"x", "y", "z", "channel"});
                                                        ^
    In file included from tensorstore/driver/neuroglancer_precomputed/driver.cc:26:
    ./tensorstore/index_space/index_transform_builder.h:480:32: note: candidate: 'tensorstore::span<std::__cxx11::basic_string<char>, Count> tensorstore::IndexTransformBuilder<InputRank, OutputRank>::input_labels() [with long int InputRank = -1; long int OutputRank = -1]'
       span<std::string, InputRank> input_labels() {
                                    ^~~~~~~~~~~~
    ./tensorstore/index_space/index_transform_builder.h:480:32: note:   candidate expects 0 arguments, 1 provided
    ./tensorstore/index_space/index_transform_builder.h:496:3: note: candidate: 'template<class Labels> absl::enable_if_t<tensorstore::internal_index_space::IsStaticExtentCompatibleWithRange<InputRank, Indices>::value, tensorstore::IndexTransformBuilder<InputRank, OutputRank>&> tensorstore::IndexTransformBuilder<InputRank, OutputRank>::input_labels(const Labels&) [with Labels = Labels; long int InputRank = -1; long int OutputRank = -1]'
       input_labels(const Labels& labels) {
       ^~~~~~~~~~~~
    ./tensorstore/index_space/index_transform_builder.h:496:3: note:   template argument deduction/substitution failed:
    tensorstore/driver/neuroglancer_precomputed/driver.cc:209:52: note:   couldn't deduce template parameter 'Labels'
         builder.input_labels({"x", "y", "z", "channel"});
                                                        ^
    In file included from tensorstore/driver/neuroglancer_precomputed/driver.cc:26:
    ./tensorstore/index_space/index_transform_builder.h:507:26: note: candidate: 'template<long unsigned int N> tensorstore::IndexTransformBuilder<InputRank, OutputRank>& tensorstore::IndexTransformBuilder<InputRank, OutputRank>::input_labels(const string_view (&)[N]) [with long unsigned int N = N; long int InputRank = -1; long int OutputRank = -1]'
       IndexTransformBuilder& input_labels(const absl::string_view (&labels)[N]) {
                              ^~~~~~~~~~~~
    ./tensorstore/index_space/index_transform_builder.h:507:26: note:   template argument deduction/substitution failed:
    tensorstore/driver/neuroglancer_precomputed/driver.cc:209:52: note:   mismatched types 'std::basic_string_view<char>' and 'const char*'
         builder.input_labels({"x", "y", "z", "channel"});
                                                        ^

Here's my Dockerfile.

FROM centos:8

RUN yum update -y \
 && yum install -y \
            python3 \
            platform-python-devel \
            gcc-c++ \
 && yum clean all

RUN python3 -m pip install tensorstore -vv

CMD [ "/bin/bash" ]

Any ideas?

t5x checkpoint importer crashes

We've been using t5x to run some experiments. Until today it was successfully loading previous checkpoints, but with 0.1.14 it started crashing:

Traceback (most recent call last):
  File "/home/thomas/code/t5x/t5x/train.py", line 616, in <module>
    gin_utils.run(main)
  File "/home/thomas/code/t5x/t5x/gin_utils.py", line 103, in run
    app.run(
  File "/home/thomas/.local/lib/python3.8/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/thomas/.local/lib/python3.8/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/thomas/code/t5x/t5x/train.py", line 596, in main
    _main(argv)
  File "/home/thomas/code/t5x/t5x/train.py", line 614, in _main
    train_using_gin()
  File "/home/thomas/.local/lib/python3.8/site-packages/gin/config.py", line 1605, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/thomas/.local/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/thomas/.local/lib/python3.8/site-packages/gin/config.py", line 1582, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "/home/thomas/code/t5x/t5x/train.py", line 324, in train
    train_state = train_state_initializer.from_checkpoint_or_scratch(
  File "/home/thomas/code/t5x/t5x/utils.py", line 528, in from_checkpoint_or_scratch
    return (self.from_checkpoint(ckpt_cfgs, ds_iter=ds_iter, init_rng=init_rng)
  File "/home/thomas/code/t5x/t5x/utils.py", line 513, in from_checkpoint
    train_states = list(
  File "/home/thomas/code/t5x/t5x/utils.py", line 499, in from_checkpoints
    yield _restore_path(ckpt_path, restore_cfg)
  File "/home/thomas/code/t5x/t5x/utils.py", line 461, in _restore_path
    return restore_checkpointer.restore(
  File "/home/thomas/code/t5x/t5x/checkpoints.py", line 811, in restore
    state_dict = self._read_state_from_tensorstore(
  File "/home/thomas/code/t5x/t5x/checkpoints.py", line 860, in _read_state_from_tensorstore
    state_dict = _run_future_tree(future_state_dict)
  File "/home/thomas/code/t5x/t5x/checkpoints.py", line 160, in _run_future_tree
    leaves = loop.run_until_complete(asyncio.gather(*future_leaves))
  File "/usr/lib/python3.8/asyncio/base_events.py", line 616, in run_until_complete
    return future.result()
  File "/home/thomas/code/t5x/t5x/checkpoint_importer.py", line 115, in _get_and_cast
    arr = await self._get_fn()  # pytype: disable=bad-return-type
  File "/home/thomas/code/t5x/t5x/checkpoints.py", line 1187, in _read_ts
    t = await ts.open(tmp_ts_spec_dict, open=True)
ValueError: Error opening "zarr" driver: Metadata at "gs://t5x-dummy-bucket/gs://{EXPERIMENT_NAME}/checkpoint_420000/state.param_states.decoder.layers_0.pre_mlp_layer_norm.scale.v/.zarray" does not exist

Reverting back to 0.1.13 worked. I'm guessing there was some breaking change?

pickling tensorstore.TensorStore

It appears that TensorStore objects cannot be pickled, at least by cloudpickle or the Python standard library's pickle. This is a blocker for distributed computing with tensorstore arrays, e.g. via dask. What would need to change in order to get this resolved?
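
A common workaround while the objects themselves aren't picklable is to ship the JSON spec across processes instead of the TensorStore object, and reopen on the other side. A sketch (the path and spec here are hypothetical, and `store.spec().to_json()` is the call I'd expect to produce such a dict):

```python
import pickle

# A TensorStore object itself can't be pickled, but its JSON spec is a plain
# dict (obtainable via store.spec().to_json()) that pickles fine.
spec = {
    'driver': 'zarr',
    'kvstore': {'driver': 'file', 'path': '/tmp/my_array'},  # hypothetical path
}

payload = pickle.dumps(spec)      # send this to the worker
restored = pickle.loads(payload)  # in the worker process
assert restored == spec
# The worker would then reopen the store itself:
#   store = ts.open(restored).result()
```

This is essentially what dask-friendly wrappers do: serialize the recipe for the store, not the store.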

performance overwriting a shard

For my ingestion service, I am doing shard-aligned writes. However, due to network issues, I sometimes get restarts of some of my workers, causing a shard-aligned overwrite. I have provisioned memory 2-3x greater than my shard size. I notice this operation often takes a really long time, longer than I would expect: since the range of my write is the entire shard, in some ways the write should be short-circuited.

Do you have any recommendations for speeding this up, or for wiping out a given shard through your API? I could potentially look for the shard and delete it ahead of time, but that requires me to decode the sharding index format (not too difficult, but potentially error-prone on my part, and it doesn't fully abstract away some of the lower-level details).

mixing sharded and unsharded scales in 'info'

thanks for making this package available!

I am attempting to write a volume in neuroglancer sharded format using the spec at the bottom of this message. When writing is finished, I see directories for the 32x32x32, 16x16x16, and 8x8x8 scales, but not for 64x64x64 and 128x128x128 (the two unsharded scales). Instead, it stores those chunk files flat in the top-level directory alongside info, which will lead to collisions. The neuroglancer web app correctly looks for files in 64x64x64 and 128x128x128.

Here is my info file, thanks for any help!

{
  "@type": "neuroglancer_multiscale_volume",
  "data_type": "uint8",
  "num_channels": 1,
  "scales": [
    {
      "chunk_sizes": [[64, 64, 64]],
      "encoding": "jpeg",
      "key": "8.0x8.0x8.0",
      "resolution": [8, 8, 8],
      "sharding": {"@type": "neuroglancer_uint64_sharded_v1", "hash": "identity", "minishard_bits": 3, "minishard_index_encoding": "gzip", "preshift_bits": 9, "shard_bits": 18},
      "size": [16384, 16384, 1024],
      "realsize": [16384, 16384, 1024],
      "offset": [0, 0, 0],
      "realoffset": [0, 0, 0]
    },
    {
      "chunk_sizes": [[64, 64, 64]],
      "encoding": "jpeg",
      "key": "16.0x16.0x16.0",
      "resolution": [16, 16, 16],
      "sharding": {"@type": "neuroglancer_uint64_sharded_v1", "hash": "identity", "minishard_bits": 0, "minishard_index_encoding": "gzip", "preshift_bits": 9, "shard_bits": 18},
      "size": [8192, 8192, 512],
      "realsize": [8192, 8192, 512],
      "offset": [0, 0, 0],
      "realoffset": [0, 0, 0]
    },
    {
      "chunk_sizes": [[64, 64, 64]],
      "encoding": "jpeg",
      "key": "32.0x32.0x32.0",
      "resolution": [32, 32, 32],
      "sharding": {"@type": "neuroglancer_uint64_sharded_v1", "hash": "identity", "minishard_bits": 0, "minishard_index_encoding": "gzip", "preshift_bits": 6, "shard_bits": 18},
      "size": [4096, 4096, 256],
      "realsize": [4096, 4096, 256],
      "offset": [0, 0, 0],
      "realoffset": [0, 0, 0]
    },
    {
      "chunk_sizes": [[64, 64, 64]],
      "encoding": "jpeg",
      "key": "64.0x64.0x64.0",
      "resolution": [64, 64, 64],
      "size": [2048, 2048, 128],
      "realsize": [2048, 2048, 128],
      "offset": [0, 0, 0],
      "realoffset": [0, 0, 0]
    },
    {
      "chunk_sizes": [[64, 64, 64]],
      "encoding": "jpeg",
      "key": "128.0x128.0x128.0",
      "resolution": [128, 128, 128],
      "size": [1024, 1024, 64],
      "realsize": [1024, 1024, 64],
      "offset": [0, 0, 0],
      "realoffset": [0, 0, 0]
    }
  ],
  "type": "image"
}

No such file or directory: rules_nasm.bzl in user project

I am trying to update to the latest version of tensorstore, going from b8949f2 to af7dbb9. I now get an error at configure time (log tail below). I assume that changes over the last week or two have introduced this. My build specification is here.

Loading C:/Dev/ITKIOOMEZarrNGFF-vs19/_deps/aom-src/BUILD.bazel
Using bzl library: @com_google_tensorstore//:rules_nasm.bzl at C:/Dev/ITKIOOMEZarrNGFF-vs19/_deps/tensorstore-src/rules_nasm.bzl
Loading C:/Dev/ITKIOOMEZarrNGFF-vs19/_deps/tensorstore-src/rules_nasm.bzl

Traceback (most recent call last):
  File "C:\Dev\ITKIOOMEZarrNGFF-vs19\_deps\tensorstore-src\tools\cmake\bazel_to_cmake.py", line 41, in <module>
    sys.exit(bazel_to_cmake.main.main())
  File "C:\Dev\ITKIOOMEZarrNGFF-vs19\_deps\tensorstore-src\tools\cmake\bazel_to_cmake\main.py", line 133, in main
    context.process_build_file(build_file)
  File "C:\Dev\ITKIOOMEZarrNGFF-vs19\_deps\tensorstore-src\tools\cmake\bazel_to_cmake\evaluation.py", line 457, in process_build_file
    self.process_build_content(build_file_path, self._load(build_file_path))
  File "C:\Dev\ITKIOOMEZarrNGFF-vs19\_deps\tensorstore-src\tools\cmake\bazel_to_cmake\evaluation.py", line 471, in process_build_content
    exec(compile(content, build_file_path, "exec"), scope)  # pylint: disable=exec-used
  File "C:/Dev/ITKIOOMEZarrNGFF-vs19/_deps/aom-src/BUILD.bazel", line 6, in <module>
    load("@com_google_tensorstore//:rules_nasm.bzl", "nasm_library")
  File "C:\Dev\ITKIOOMEZarrNGFF-vs19\_deps\tensorstore-src\tools\cmake\bazel_to_cmake\evaluation.py", line 593, in bazel_load
    library = self._context.get_library(library_target)
  File "C:\Dev\ITKIOOMEZarrNGFF-vs19\_deps\tensorstore-src\tools\cmake\bazel_to_cmake\evaluation.py", line 433, in get_library
    content = self._load(library_path)
  File "C:\Dev\ITKIOOMEZarrNGFF-vs19\_deps\tensorstore-src\tools\cmake\bazel_to_cmake\evaluation.py", line 451, in _load
    return pathlib.Path(path).read_text(encoding="utf-8")
  File "C:\Program Files\Python39\lib\pathlib.py", line 1266, in read_text
    with self.open(mode='r', encoding=encoding, errors=errors) as f:
  File "C:\Program Files\Python39\lib\pathlib.py", line 1252, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "C:\Program Files\Python39\lib\pathlib.py", line 1120, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Dev\\ITKIOOMEZarrNGFF-vs19\\_deps\\tensorstore-src\\rules_nasm.bzl'

CMake Error at C:/Dev/ITKIOOMEZarrNGFF-vs19/_deps/aom-src/CMakeLists.txt:26 (execute_process):
  execute_process failed command indexes:

    1: "Child return code: 1"

Configuring incomplete, errors occurred!

The following file exists: C:\Dev\ITKIOOMEZarrNGFF-vs19\_deps\tensorstore-src\bazel\rules_nasm.bzl.

build: Setting path to system libs

The documentation explains how to build with system libraries instead of bazel-provided libraries, by using the TENSORSTORE_SYSTEM_LIBS environment variable:

export TENSORSTORE_SYSTEM_LIBS=se_haxx_curl,jpeg,com_google_boringssl

Is there a way to tell bazel where to look for those libraries, analogous to CMAKE_PREFIX_PATH?


Rationale:

I'm interested in building a tensorstore conda package, based on the conda-forge distribution. As far as I can tell, conda-forge has all the necessary dependencies and compilers, with the exception of boringssl.

I think conda packages might be particularly useful for Mac users, since only Linux wheels are available currently. Unfortunately, pip install tensorstore fails on my Mac due to a Segmentation Fault in clang.
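
I don't know of a direct CMAKE_PREFIX_PATH analogue in Bazel, but one thing that may be worth trying (this assumes the autodetected C++ toolchain lets the compiler's own search-path variables through, which gcc/clang honor via CPATH and LIBRARY_PATH) is to point those variables at the conda environment and forward them into Bazel's action environment:

```shell
# Point the compiler's own search paths at the conda environment.
# $CONDA_PREFIX is the usual conda variable; adjust if your layout differs.
export CPATH="$CONDA_PREFIX/include"
export LIBRARY_PATH="$CONDA_PREFIX/lib"

# Bazel sanitizes the environment for sandboxed actions, so the variables
# also need to be forwarded, e.g. in .bazelrc:
#   build --action_env=CPATH --action_env=LIBRARY_PATH
```

This is a sketch of the general mechanism, not a documented tensorstore feature; whether each TENSORSTORE_SYSTEM_LIBS entry is then found depends on that library's detection logic.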

CMake support

CMake support will be added in the future.

This was written more than 2 years ago. The future is here, where is CMake support? 😄

Seriously, is there some plan or a timeline?

build issues in build_wheels_manylinux2014x86_64.sh script

Running build_wheels_manylinux2014x86_64.sh on Ubuntu 18.04 fails with the following error:

[1,146 / 1,148] Compiling python/tensorstore/index_space.cc; 35s processwrapper-sandbox
Target //python/tensorstore:_tensorstore__shared_objects up-to-date (nothing to build)
INFO: Elapsed time: 50.841s, Critical Path: 37.64s
INFO: 929 processes: 929 processwrapper-sandbox.
INFO: Build completed successfully, 1148 total actions
INFO: Build completed successfully, 1148 total actions
Downloading https://releases.bazel.build/3.4.1/release/bazel-3.4.1-linux-x86_64...
/opt/python/cp35-cp35m/lib/python3.5/site-packages/setuptools/distutils_patch.py:26: UserWarning: Distutils was imported before Setuptools. This usage is discouraged and may exhibit undesirable behaviors or errors. Please use Setuptools' objects directly or at least import Setuptools first.
  "Distutils was imported before Setuptools. This usage is discouraged "
error: [Errno 2] No such file or directory: 'bazel-bin/python/tensorstore/_tensorstore.so'

This error might be presaged by this earlier warning about symlinks:

INFO: Analyzed target //python/tensorstore:_tensorstore__shared_objects (68 packages loaded, 2553 targets configured).
INFO: Found 1 target...                                                                                                                                       
WARNING: failed to create one or more convenience symlinks for prefix 'bazel-':
  cannot create symbolic link bazel-bin -> /tmp/.cache/bazel/_bazel_root/7e563aa8f1493c3ba33faa85b32d7a3c/execroot/com_github_google_tensorstore/bazel-out/k8-opt/bin:  /io/bazel-bin (Permission denied)
  cannot create symbolic link bazel-testlogs -> /tmp/.cache/bazel/_bazel_root/7e563aa8f1493c3ba33faa85b32d7a3c/execroot/com_github_google_tensorstore/bazel-out/k8-opt/testlogs:  /io/bazel-testlogs (Permission denied)
  cannot create symbolic link bazel-out -> /tmp/.cache/bazel/_bazel_root/7e563aa8f1493c3ba33faa85b32d7a3c/execroot/com_github_google_tensorstore/bazel-out:  /io/bazel-out (Permission denied)
[0 / 2] checking cached actions    

New zlib version zlib-1.2.12 breaks build

On March 27 2022, a new version of zlib was released on zlib.net
zlib-1.2.12.tar.gz

This caused the tensorstore build to fail, because the URL https://zlib.net/zlib-1.2.11.tar.gz no longer exists; the download just reports "File Not Found".

Suggested fix:

  1. Update the zlib url in tensorstore/third_party/net_zlib/workspace.bzl:

     "https://zlib.net/zlib-1.2.11.tar.gz" -> "https://zlib.net/zlib-1.2.12.tar.gz"

     and replace the sha256 with 91844808532e5ce316b3c010929493c0244f3d37593afd6de04f71821d5136d9.

     This will build zlib, but it creates a conflict compiling png, because ZLIB_VERNUM won't match the one used to compile libpng, so bump that version too.

  2. In tensorstore/third_party/png/png.BUILD.bazel, replace:

     PNG_IN_SUBSTITUTIONS = {
         "PNG_ZLIB_VERNUM 0": "PNG_ZLIB_VERNUM 0x12b0",
     }

     with:

     PNG_IN_SUBSTITUTIONS = {
         "PNG_ZLIB_VERNUM 0": "PNG_ZLIB_VERNUM 0x12c0",
     }

These changes got the library compiled.

Compilation fails in Release mode on VS2019

Compiling the current version (f6da8b5) works well in Debug mode, but fails in Release and RelWithDebInfo modes. Incremental build log:

Build started...
1>------ Build started: Project: aom_config_h_tmpl, Configuration: RelWithDebInfo x64 ------
2>------ Build started: Project: aom_config_asm_tmpl, Configuration: RelWithDebInfo x64 ------
3>------ Build started: Project: aom_config_version_tmpl, Configuration: RelWithDebInfo x64 ------
4>------ Build started: Project: JPEG_configure_internal, Configuration: RelWithDebInfo x64 ------
5>------ Build started: Project: JPEG_jversion_config, Configuration: RelWithDebInfo x64 ------
6>------ Build started: Project: JPEG_configure, Configuration: RelWithDebInfo x64 ------
7>------ Build started: Project: dav1d_build_vcs_version_h, Configuration: RelWithDebInfo x64 ------
8>------ Build started: Project: dav1d_build_version_h, Configuration: RelWithDebInfo x64 ------
9>------ Build started: Project: tensorstore_proto_index_transform.proto__cc_protoc, Configuration: RelWithDebInfo x64 ------
10>------ Build started: Project: PNG_png_config, Configuration: RelWithDebInfo x64 ------
11>------ Build started: Project: tensorstore_proto_array.proto__cc_protoc, Configuration: RelWithDebInfo x64 ------
12>------ Build started: Project: tensorstore_internal_metrics_metrics.proto__cc_protoc, Configuration: RelWithDebInfo x64 ------
13>------ Build started: Project: tensorstore_proto_schema.proto__cc_protoc, Configuration: RelWithDebInfo x64 ------
14>------ Build started: Project: tensorstore_driver_cast_cast_test, Configuration: RelWithDebInfo x64 ------
15>------ Skipped Build: Project: NightlyMemoryCheck, Configuration: RelWithDebInfo x64 ------
15>Project not selected to build for this solution configuration
16>------ Skipped Build: Project: Continuous, Configuration: RelWithDebInfo x64 ------
16>Project not selected to build for this solution configuration 
17>------ Skipped Build: Project: Nightly, Configuration: RelWithDebInfo x64 ------
17>Project not selected to build for this solution configuration 
18>------ Skipped Build: Project: RUN_TESTS, Configuration: RelWithDebInfo x64 ------
18>Project not selected to build for this solution configuration 
19>------ Skipped Build: Project: Experimental, Configuration: RelWithDebInfo x64 ------
19>Project not selected to build for this solution configuration 
20>------ Build started: Project: aom_aom_dsp_rtcd, Configuration: RelWithDebInfo x64 ------
21>------ Build started: Project: aom_aom_scale_rtcd, Configuration: RelWithDebInfo x64 ------
22>------ Build started: Project: aom_av1_rtcd, Configuration: RelWithDebInfo x64 ------
23>------ Build started: Project: tensorstore_examples_extract_slice, Configuration: RelWithDebInfo x64 ------
14>cast_test.cc
23>extract_slice.obj : error LNK2019: unresolved external symbol "protected: void __cdecl riegeli::CFileWriterBase::InitializePos(struct _iobuf *,class std::optional<unsigned __int64>,bool)" (?InitializePos@CFileWriterBase@riegeli@@IEAAXPEAU_iobuf@@V?$optional@_K@std@@_N@Z) referenced in function "protected: void __cdecl riegeli::CFileWriter<class riegeli::OwnedCFile>::Initialize(class std::basic_string_view<char,struct std::char_traits<char> >,class riegeli::CFileWriterBase::Options &&)" (?Initialize@?$CFileWriter@VOwnedCFile@riegeli@@@riegeli@@IEAAXV?$basic_string_view@DU?$char_traits@D@std@@@std@@$$QEAVOptions@CFileWriterBase@2@@Z)
23>C:\Misc\tensorstore-vs19\bin\RelWithDebInfo\tensorstore_examples_extract_slice.exe : fatal error LNK1120: 1 unresolved externals
14>C:\Misc\tensorstore\tensorstore\driver\cast\cast_test.cc(59,1): error C2607: static assertion failed
14>C:\Misc\tensorstore\tensorstore\driver\cast\cast_test.cc(61,1): error C2607: static assertion failed
14>C:\Misc\tensorstore\tensorstore\driver\cast\cast_test.cc(63,1): error C2607: static assertion failed
14>C:\Misc\tensorstore\tensorstore\driver\cast\cast_test.cc(65,1): error C2607: static assertion failed
14>C:\Misc\tensorstore\tensorstore\driver\cast\cast_test.cc(83,1): error C2607: static assertion failed
14>C:\Misc\tensorstore\tensorstore\driver\cast\cast_test.cc(93,1): error C2607: static assertion failed
23>Done building project "tensorstore_examples_extract_slice.vcxproj" -- FAILED.
14>Done building project "tensorstore_driver_cast_cast_test.vcxproj" -- FAILED.
24>------ Skipped Build: Project: ALL_BUILD, Configuration: RelWithDebInfo x64 ------
24>Project not selected to build for this solution configuration 
========== Build: 16 succeeded, 2 failed, 630 up-to-date, 6 skipped ==========

The full log is attached.
Output-Build.txt

Slicing beyond array bounds

Most Python array libraries, including numpy, zarr, and h5py, allow slicing beyond the range of the underlying array. This is consistent with Python's list slicing, where slice bounds are clamped to the length of the sequence (e.g. [0,1][1:400] == [1]).

import numpy as np
import tensorstore as ts

array = np.array([[1,2],[3,4]])
# This works: output [[4]]
print(array[1:10,1:10])

# This throws an exception
ts.array(array)[1:10,1:10]

Overall, I think this difference in behavior is minor, and it helped expose a bug with slicing in another codebase. 😄
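
For anyone hitting this: since TensorStore requires indices to lie within the domain, one way to emulate NumPy's behavior is to clamp the slice bounds to the array shape before indexing. A small helper sketch (not part of the TensorStore API; the resulting tuple of slices can be applied to a NumPy array or a TensorStore alike):

```python
import numpy as np

def clamp_slices(slices, shape):
    """Clamp each slice's bounds to the corresponding dimension size,
    mimicking NumPy/list slicing semantics for out-of-range stops."""
    out = []
    for s, dim in zip(slices, shape):
        start = min(s.start or 0, dim)
        stop = dim if s.stop is None else min(s.stop, dim)
        out.append(slice(start, stop))
    return tuple(out)

array = np.array([[1, 2], [3, 4]])
idx = clamp_slices((slice(1, 10), slice(1, 10)), array.shape)
print(array[idx])  # [[4]]
```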

setup.py doesn't mandate `numpy` and `wheel` pkgs

I just thought I'd open this as feedback to my first experience in following the install from src instructions.

For the record, I was using python setup.py develop from a fresh venv (on Windows). I've been trying to keep my global site-packages rather clean lately.

On the first try, it failed due to not having wheel installed. On the second try, it failed due to not having numpy installed. On the third try, it seems to be doing fine, although it's going on 45+ minutes:

[2,568 / 2,572] Compiling src/x86/itx_init_tmpl.c; 2878s local

Although it might be obvious (given the nature of the software), I noticed the numpy and wheel packages are not listed as part of the build dependencies.

Cannot write to string `dtype` stores

I'm trying to create a tensorstore for time series data. I have the numerical data in a tensorstore and now I'm trying to define a new tensorstore for the DateTime index. I read that TS doesn't support the zarr datetime dtype (dtype('<M8[ns]')) yet, so I just defined the store as string dtype like this:

index = ts.open({
    'driver': 'zarr',
     'kvstore': {
        'driver': 'file',
        'path': '/home/...',
    },
    'metadata': {
        'dtype': '|S1',
        'shape': [100000],
        'chunks': [128],
    },
    'create': True,
    'delete_existing': True
}).result()

Then I'm just casting a Pandas datetime index to a byte string.

So now I get

index[0, :].read().result()

>> [b'']

That's fine, it's just empty.

But I just can't successfully write values to it. I'm trying e.g. index[0, 0].write(b'a').result() and variations, but I don't get any error and nothing is written; I just get the [b''] as above.

Also, using 'dtype': '|S2' just defines each row as a list of empty byte strings... is this intentional? I.e. should I define |S1 for strings of any length? Probably not... but how do I fill in the values? And is there a silent error somewhere when writing fails?

distributed writes

I'm trying to write a tensorstore array using dask on a single machine. I have a bunch of worker processes that each (a) call tensorstore.open(spec, write=True).result(), and then (b) attempt to assign a numpy array to a contiguous, chunk-aligned range of indices in the resulting tensorstore array. This does not appear to run in parallel.

How can I make this work? Are there any best practices for writing to tensorstore arrays from multiple processes that I should be aware of?
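
The pattern that usually works for multi-process writes is: ship the JSON spec (not the TensorStore object) to each worker, reopen it there, and have every worker write a disjoint, chunk-aligned region so that no two processes ever touch the same chunk. A small pure-Python helper for computing such regions along one axis (a sketch; the sizes below are hypothetical):

```python
def chunk_aligned_slabs(total, chunk, num_workers):
    """Partition [0, total) into per-worker slabs whose boundaries fall on
    chunk boundaries, so concurrent writers never touch the same chunk."""
    num_chunks = -(-total // chunk)             # ceil(total / chunk)
    per_worker = -(-num_chunks // num_workers)  # chunks per worker, rounded up
    slabs = []
    for w in range(num_workers):
        start = w * per_worker * chunk
        stop = min((w + 1) * per_worker * chunk, total)
        if start < stop:
            slabs.append((start, stop))
    return slabs

# Example: a 1000-element axis with chunk size 128 split across 4 workers.
print(chunk_aligned_slabs(1000, 128, 4))
# -> [(0, 256), (256, 512), (512, 768), (768, 1000)]
```

Each worker would then do something like store[start:stop] = data inside its own process after reopening from the spec.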

Regression in (possibly accidental) dtype comparison functionality

The following code works fine in 0.1.10 but fails with __eq__(): incompatible function arguments in 0.1.11:

import tensorstore as ts
import numpy as np
dataset = ts.open({
    'driver': 'n5',
    'kvstore': {
        'driver': 'file',
        'path': '/tmp/dataset',
    },
    'metadata': {
        'compression': {
            'type': 'gzip'
        },
        'dataType': 'uint32',
        'dimensions': [1000, 20000],
        'blockSize': [100, 100],
    },
    'create': True,
    'delete_existing': True,
}).result()

print(f'{(dataset.dtype == np.uint32)=}')
print(f'{(dataset.dtype == bool)=}')

Traceback:

Traceback (most recent call last):
  File "/Users/jni/projects/play/ts-eq.py", line 21, in <module>
    print(f'{(dataset.dtype == np.uint32)=}')
TypeError: __eq__(): incompatible function arguments. The following argument types are supported:
    1. (self: tensorstore.dtype, other: tensorstore.dtype) -> bool

Invoked with: dtype("uint32"), <class 'numpy.uint32'>
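
Until the comparison is relaxed again, one version-independent workaround (assuming the `numpy_dtype` property, which recent tensorstore releases expose on `ts.dtype`) is to compare through NumPy's dtype machinery, which happily accepts np.uint32 or bool on either side:

```python
import numpy as np

# Stand-in for dataset.dtype.numpy_dtype (assumption: tensorstore exposes
# the underlying NumPy dtype via this property).
np_dtype = np.dtype('uint32')

print(np_dtype == np.uint32)  # True
print(np_dtype == bool)       # False
```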

Adding driver for Tiff images

Hello there.
I am trying to see if I can add support for reading (and eventually writing) TIFF images using tensorstore. The motivation is to be able to read/write large OME-TIFF images. Initially I am following the png driver example to create an interface with the libtiff library. But, longer term, I would prefer to do the I/O in a chunked fashion. Since TIFF has the concept of tiles (X and Y axes) and IFDs (Z axis and additional dimensions, at least according to the OME spec), I think chunked I/O will work similarly to the zarr driver.

I am looking for some guidance on getting started. For example, if I have something like the following, how does the information about which file needs to be opened get to the driver?

  tensorstore::Context context = Context::Default();
  TENSORSTORE_CHECK_OK_AND_ASSIGN(
      auto store,
      tensorstore::Open(
          {{"driver", "tiff"},
           {"kvstore",
            {{"driver", "file"},
             {"path", "p01_x01_y01_wx0_wy0_c1.ome.tif"}}}},
          context,
          tensorstore::OpenMode::open,
          tensorstore::RecheckCached{false},
          tensorstore::ReadWriteMode::read)
          .result());

Apple M1 support

Is it possible to install TensorStore on an Apple M1? I wasn't able to install it via pip.

Incorrect values when querying a large number of points in a zarr array

We are seeing bogus results when querying a large number of locations referenced from another array (e.g., vol[query[:, 0], query[:, 1], query[:, 2]]). This only occurs when the length of query is sufficiently large (~1000), and the exact values vary between runs.

I am using tensorstore==0.1.17 with Python 3.9.10 on macOS 12.2.1.

Sample code:

import numpy as np
import tensorstore as ts

def main():
    length = 50

    vol = ts.open({
            'driver' : 'zarr',
            'kvstore': {
                'driver': 'memory'
            },
            'metadata': {
                'chunks': [length, length, length],
                'dtype': '<i2',
                'shape': [length, length, length],
                'fill_value': 1
            },
            'create': True
    }).result()

    print("Full set of values [should be [1]]: ", np.unique(vol))

    for count in [5, 50, 500, 5000, 50000, 500000]:
        rng = np.random.default_rng(12345)
        query = rng.integers(0, length, size=(count,3))
        co_id = vol[query[:, 0], query[:, 1], query[:, 2]].read().result()

        print(f"Return values with {count} queries [should be [1]]", np.unique(co_id))

if __name__ == "__main__":
    main()

The output (varies by run):

Full set of values [should be [1]]:  [1]
Return values with 5 queries [should be [1]] [1]
Return values with 50 queries [should be [1]] [1]
Return values with 500 queries [should be [1]] [1]
Return values with 5000 queries [should be [1]] [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49]
Return values with 50000 queries [should be [1]] [-30462 -30186 -29326 -23438 -22926 -22670 -22414 -21646 -17550 -10894
 -10240  -9728  -9643  -9377  -7168  -6912  -6542  -6030  -5006  -1422
   -910   -512   -254      0      1      2      3      4      5      6
      7      8      9     10     11     12     13     14     15     16
     17     18     19     20     21     22     23     24     25     26
     27     28     29     30     31     32     33     34     35     36
     37     38     39     40     41     42     43     44     45     46
     47     48     49     83     87     89     92     93     97     99
    100    102    113    120    124    132    150    160    161    165
    217    231    236    237    251    253    254    256    258    259
    260    264    265    268    270    297    318    355    370    372
    512    513    515    520    524    569    622    768    774    780
    788    986   1024   1025   1040   1123   1138   1242   1282   1283
   1300   1394   1536   1540   1560   1650   1754   1816   1820   1906
   2010   2048   2051   2052   2080   2125   2418   2560   2561   2930
   3073   3120   3194   3442   3587   3640   3674   3802   3836   3930
   3954   4058   4186   4442   4466   4474   4718   4730   4860   4954
   4978   5466   5490   5978   6002   6185   6441   6656   6697   6746
   6770   7026   7282   7794   7801   8050   8306   8506   8818  10098
  10496  10542  10574  10591  10610  10988  11378  11801  11822  11877
  11884  11887  11890  11891  11892  15368  16150  16961  17152  17683
  17765  17920  17990  17998  18028  19004  19779  19968  19970  20046
  20052  21106  21362  21582  21588  21619  21874  23040  23666  24330
  24366  24415  24420  24421  24428  24434  24435  24932  24933  24941
  24942  24944  24946  24948  25193  25203  25390  25455  25464  25646
  25695  25697  25701  25710  25714  25902  25951  25955  25956  25964
  25968  25970  25974  26207  26212  26222  26473  26478  26739  26926
  26950  26992  26996  27743  27749  27753  27768  28001  28013  28015
  28257  28261  28265  28271  28511  28515  28516  28521  28522  28525
  28526  28530  28531  28532  28685  28692  28693  28695  28697  28783
  29184  29186  29187  29203  29262  29279  29281  29283  29285  29286
  29292  29294  29295  29296  29298  29299  29300  29301  29440  29441
  29537  29550  29554  29704  29761  29797  29801  29806  29808  29810
  29813  29998  30047  30052  30210  30466  30580  30789  30821  31088
  31090  31092  31233  31602  32512  32514  32532]
Return values with 500000 queries [should be [1]] [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
 48 49]
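Until the indexing bug is tracked down, one workaround (a sketch; `vol_np` here is a NumPy stand-in for the result of reading the whole store, i.e. `vol.read().result()`) is to materialize the array once and do the pointwise lookup in NumPy:

```python
import numpy as np

length = 50
# Stand-in for the fully materialized store contents (all ones).
vol_np = np.full((length, length, length), 1, dtype=np.int16)
rng = np.random.default_rng(12345)
query = rng.integers(0, length, size=(50000, 3))
# Pointwise lookup done entirely in NumPy on the materialized array.
values = vol_np[query[:, 0], query[:, 1], query[:, 2]]
print(np.unique(values))  # [1]
```

This trades memory for correctness, so it is only viable when the array fits in RAM.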

Segfault when using PyTorch Dataloader with multiple workers

I wrote a PyTorch Dataset class using TensorStore, whose __getitem__ method looks something like this:

def __getitem__(self, index):
    data_slice = self.data[index:index + 1].read().result()

where self.data = ts.open(config).result() is a TensorStore dataset instance (GCS driver). This works fine.

Then I'm using the PyTorch Dataloader like this:

dataloader = td.DataLoader(dataset,
                           batch_size=16,
                           num_workers=num_workers)

where num_workers determines the number of python subprocesses launched, which will each download and process the data before collating all the data into a single tensor.

Everything works fine when num_workers = 0, but if num_workers > 0, I get a segfault in one of the worker processes.

I suspect it's the async .read().result() in the getitem... is TensorStore using multithreading under the hood which interferes with the Dataloader multiprocessing somehow? Can I turn it off somehow?

Or am I supposed to load all the data "lazily" and only call .result() after I have all the data retrieved and loaded? That would be a bit problematic, since I would still need multiprocessing to handle any data preprocessing before packing it into a tensor and sending it to the GPU...

Is it possible at all to use TensorStore with a PyTorch Dataloader in the multiple workers context?
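One pattern worth trying (a sketch, under the assumption that the crash comes from a store handle and its worker threads being created before the DataLoader forks) is to open the store lazily inside each worker rather than in the parent process:

```python
class LazyStoreDataset:
    """Defer opening the store until first access in each process.

    `open_store` is a placeholder standing in for ts.open(config).result();
    the point is only that it runs after the fork, e.g. on first
    __getitem__ or in a DataLoader worker_init_fn, so each worker owns
    its own handle and thread pool.
    """

    def __init__(self, config):
        self.config = config
        self._data = None  # opened per process on first access

    @property
    def data(self):
        if self._data is None:
            self._data = self.open_store(self.config)
        return self._data

    def open_store(self, config):
        # Placeholder; a real implementation would open the TensorStore here.
        return list(range(100))

    def __getitem__(self, index):
        return self.data[index]
```

Whether this avoids the crash depends on how the library's internal threads interact with fork; a spawn-based multiprocessing start method is another thing to try.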

force write zeros

I am trying to write an array of stimuli for visualization purposes, so the default behavior of skipping chunks that are entirely zero is not appropriate for this use case. I couldn't find an option to override it... am I missing something?
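For reference, recent TensorStore releases document a `store_data_equal_to_fill_value` member of the TensorStore spec that forces chunks equal to the fill value to be written anyway; whether your installed version supports it is an assumption to verify against the spec documentation. A minimal sketch of where it would go:

```python
# Hedged sketch: 'store_data_equal_to_fill_value' is a documented spec
# member in recent TensorStore releases; check your installed version
# before relying on it.
spec = {
    'driver': 'zarr',
    'kvstore': {'driver': 'memory'},
    'store_data_equal_to_fill_value': True,
}
```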

Problem to build the package on Arm 64

Hello, I am trying to build the package on Graviton machines. However, I found a couple of issues when trying to build:

  • First, the function normalized_machine_arch_name returns aarch64 instead of arm64 (a similar issue exists for amd64 vs. x86_64).
  • Second, after changing the platform, I got this error:
root@0f737d02df82:/tensorstore# vim bazelisk.py 
root@0f737d02df82:/tensorstore# python3 setup.py develop
/usr/local/python/lib/python3.10/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
  warnings.warn(
/tensorstore/.eggs/setuptools_scm-7.0.4-py3.10.egg/setuptools_scm/git.py:135: UserWarning: "/tensorstore" is shallow and may cause errors
  warnings.warn(f'"{wd.path}" is shallow and may cause errors')
running develop
/usr/local/python/lib/python3.10/site-packages/setuptools/command/easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/usr/local/python/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running egg_info
writing python/tensorstore.egg-info/PKG-INFO
writing dependency_links to python/tensorstore.egg-info/dependency_links.txt
writing requirements to python/tensorstore.egg-info/requires.txt
writing top-level names to python/tensorstore.egg-info/top_level.txt
adding license file 'LICENSE'
adding license file 'AUTHORS'
writing manifest file 'python/tensorstore.egg-info/SOURCES.txt'
running build_ext
/usr/local/python/bin/python3 -u bazelisk.py build -c opt //python/tensorstore:_tensorstore__shared_objects --verbose_failures
Downloading https://releases.bazel.build/5.0.0/release/bazel-5.0.0-linux-arm64...
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
DEBUG: Rule 'org_aomedia_avif' indicated that a canonical reproducible form can be obtained by modifying arguments sha256 = "6dce70530fa750544ba842e40555825e8894e8f805cf5c458ee0642a60d160e5"
DEBUG: Repository org_aomedia_avif instantiated at:
  /tensorstore/WORKSPACE:7:25: in <toplevel>
  /tensorstore/external.bzl:20:29: in tensorstore_dependencies
  /tensorstore/third_party/third_party.bzl:54:26: in third_party_dependencies
  /tensorstore/third_party/org_aomedia_avif/workspace.bzl:26:10: in repo
  /root/.cache/bazel/_bazel_root/69db9dbb901e67963054399656f8c5ae/external/bazel_tools/tools/build_defs/repo/utils.bzl:233:18: in maybe
Repository rule third_party_http_archive defined at:
  /tensorstore/third_party/repo.bzl:116:43: in <toplevel>
INFO: Repository se_curl instantiated at:
  /tensorstore/WORKSPACE:7:25: in <toplevel>
  /tensorstore/external.bzl:20:29: in tensorstore_dependencies
  /tensorstore/third_party/third_party.bzl:64:17: in third_party_dependencies
  /tensorstore/third_party/se_curl/workspace.bzl:23:10: in repo
  /root/.cache/bazel/_bazel_root/69db9dbb901e67963054399656f8c5ae/external/bazel_tools/tools/build_defs/repo/utils.bzl:233:18: in maybe
Repository rule third_party_http_archive defined at:
  /tensorstore/third_party/repo.bzl:116:43: in <toplevel>
INFO: Repository org_tukaani_xz instantiated at:
  /tensorstore/WORKSPACE:7:25: in <toplevel>
  /tensorstore/external.bzl:20:29: in tensorstore_dependencies
  /tensorstore/third_party/third_party.bzl:59:24: in third_party_dependencies
  /tensorstore/third_party/org_tukaani_xz/workspace.bzl:20:10: in repo
  /root/.cache/bazel/_bazel_root/69db9dbb901e67963054399656f8c5ae/external/bazel_tools/tools/build_defs/repo/utils.bzl:233:18: in maybe
Repository rule third_party_http_archive defined at:
  /tensorstore/third_party/repo.bzl:116:43: in <toplevel>
ERROR: /root/.cache/bazel/_bazel_root/69db9dbb901e67963054399656f8c5ae/external/rules_perl/BUILD:67:23: While resolving toolchains for target @rules_perl//:current_toolchain: no matching toolchains found for types @rules_perl//:toolchain_type
ERROR: Analysis of target '//python/tensorstore:_tensorstore__shared_objects' failed; build aborted: 
INFO: Elapsed time: 20.217s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (117 packages loaded, 3145 targets configured)
    Fetching https://curl.se/download/curl-7.80.0.tar.gz; 21,091B
    Fetching https://tukaani.org/xz/xz-5.2.5.tar.gz
error: command '/usr/local/python/bin/python3' failed with exit code 1

It appears to be a problem with the perl dependency. However, I am not that proficient with the bazel build system.

thanks

possible memory leak

This code generates a sharded image volume on the filesystem, then inserts data into the store in consecutive slabs. Memory usage goes up by ~10% every time the final saving loop completes. Is this a memory leak in the C++ code? The increasing memory consumption is a problem for a scaled-up version of this workflow.

import numpy as np
import tensorstore as ts
import os
import psutil
from tqdm import tqdm
process = psutil.Process(os.getpid())
save_step = 64

kvstore_path = '/nrs/cosem/davis'
pth = 'memory_leak_test.precomputed'
chunk_size = (96,) * 3
shape = (2**12, 2**11, 2**11)

# 16 GB of data
data = np.random.randint(0, 255, shape, dtype='uint8')
key = 's0'
resolution = np.array([1.0, 1.0, 1.0])

# sharding settings based on the google copy of hemibrain
sharding = {'@type': 'neuroglancer_uint64_sharded_v1',
            'preshift_bits': 9,
            'hash': 'identity',
            'minishard_index_encoding': 'gzip',
            'minishard_bits': 2,
            'shard_bits': 15}

spec = {'driver': 'neuroglancer_precomputed',
        'kvstore': {'driver': 'file', 'path': kvstore_path},
        'path': pth,
        'scale_metadata': {'size': data.shape,
                           'resolution': resolution,
                           'encoding': 'jpeg',
                           'jpeg_quality': 100,
                           'sharding': sharding,
                           'chunk_size': chunk_size,
                           'key': key,
                           'voxel_offset': (0, 0, 0)},
        'multiscale_metadata': {'data_type': data.dtype.name,
                                'num_channels': 1,
                                'type': 'image'}}


arr = ts.open(spec=spec, create=True, delete_existing=True).result()
slices = [slice(idx * save_step, save_step * (idx + 1), None) for idx in range(data.shape[0]//save_step)]

print('start bytes: ', process.memory_info().rss)
for sl in tqdm(slices):
    arr[ts.d['channel'][0]][sl] = data[sl]
print('end bytes:   ', process.memory_info().rss)
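One thing worth testing before calling it a leak (a sketch; whether the chunk cache explains the growth is an assumption) is to bound the cache pool via the spec's context, so that an explicit cap either rules the cache in or out:

```python
# Variant of the spec above with an explicit cache bound; the kvstore
# path is a placeholder.
spec_with_cache_limit = {
    'driver': 'neuroglancer_precomputed',
    'kvstore': {'driver': 'file', 'path': '/tmp/memory_leak_test'},
    'context': {'cache_pool': {'total_bytes_limit': 100_000_000}},  # ~100 MB cap
}
```

If RSS still grows with a small cap, the cache is unlikely to be the culprit.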

"<i1" and "<u1" unsupported by zarr driver

When trying to write an array of bytes, an error is reported: INVALID_ARGUMENT: Error parsing object member "metadata": Error parsing object member "dtype": Unsupported zarr dtype: "<i1".

When trying to read, the error is slightly different: FAILED_PRECONDITION: Error opening "zarr" driver: Error reading local file "C:/a/cthead1.zarr/image/.zarray": Error parsing object member "dtype": Unsupported zarr dtype: "<u1" [tensorstore_spec='{\"context\":{\"cache_pool\":{},\"data_copy_concurrency\":{},\"file_io_concurrency\":{}},\"driver\":\"zarr\",\"kvstore\":{\"driver\":\"file\",\"path\":\"C:/a/cthead1.zarr/image/\"},\"recheck_cached_data\":false,\"recheck_cached_metadata\":false}']
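As a point of comparison, NumPy itself treats byte order as irrelevant for one-byte dtypes and normalizes "<i1" to "|i1", so rewriting the .zarray dtype to the "|i1" / "|u1" spelling may be enough (an assumption to verify against your tensorstore version):

```python
import numpy as np

# NumPy normalizes the byte-order character to '|' for 1-byte items.
print(np.dtype('<i1').str)  # '|i1'
print(np.dtype('<u1').str)  # '|u1'
```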

Mixed dtypes or align data with a different dtype index

Hi! TensorStore seems super amazing, can't wait to try this out!

I have lots of time series data with a pandas DatetimeIndex. It would be nice to be able to also store the index, perhaps in a separate TensorStore aligned with the actual data TensorStore, which is just floats.

Is this currently possible?
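Each TensorStore holds a single numeric dtype, so one workable pattern (a sketch, not an official feature) is to keep the DatetimeIndex as int64 nanoseconds in a parallel array of the same length and reconstruct it on read:

```python
import numpy as np
import pandas as pd

idx = pd.date_range('2022-01-01', periods=4, freq='D')
ns = idx.asi8  # int64 nanoseconds since epoch; storable in any int64 array
# Round-trip: the int64 values reconstruct the original index exactly.
restored = pd.DatetimeIndex(ns.astype('datetime64[ns]'))
print(restored.equals(idx))  # True
```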

Documentation: Performance Comparison

Cool job unifying these Python projects. I was wondering whether there would be a way of documenting the performance impact of using this library over, e.g., a vanilla Python zarr implementation, ideally across a few scenarios such as heavily IO-bound workloads with HTTP stores/drivers?
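Absent official numbers, a minimal harness like this (a sketch; it measures wall-clock time only and says nothing about concurrency effects) is enough for a like-for-like read benchmark against zarr-python:

```python
import time

def bench(fn, repeats=3):
    # Best-of-N wall-clock timing; with warm caches, min() is the
    # fairest single-number summary.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

elapsed = bench(lambda: sum(range(100_000)))
print(f'{elapsed:.6f}s')
```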

varying scale order in info

relates to google/neuroglancer#237

Occasionally I observe that the scales attribute in info is in an unexpected order; e.g., the s0 scale is sometimes the last element rather than the first. This has only happened twice so far, and the mis-ordering was different both times, which isn't a lot to go on, I'm afraid :)

I initialize my collection of tensorstore arrays with the function below, which receives a list of xarray.DataArray, a list of keys (e.g., s0, s1, etc.), and other args. It basically calls tensorstore.open(spec, **kwargs) sequentially in a loop, with a varying spec.

The NicerTensorStore object is an outdated workaround for the pickling issue that I don't really need anymore, so ignore that :)

from pathlib import Path
from typing import List, Tuple

import tensorstore as ts
import xarray


def prepare_tensorstore(pyr: List[xarray.DataArray], level_names: List[str],
                        jpeg_quality: int, output_chunks: Tuple[int, ...],
                        root_container_path: Path):
    store_arrays = []
    for p, ln in zip(pyr, level_names):
        res = [float(p.coords[k][1] - p.coords[k][0]) for k in p.dims]
        spec = {'driver': 'neuroglancer_precomputed',
                'kvstore': {'driver': 'file',
                            'path': str(Path(root_container_path).parent)},
                'path': root_container_path.parts[-1],
                'scale_metadata': {'size': p.shape,
                                   'resolution': res,
                                   'encoding': 'jpeg',
                                   'jpeg_quality': jpeg_quality,
                                   'chunk_size': output_chunks,
                                   'key': ln,
                                   'voxel_offset': (0, 0, 0)},
                'multiscale_metadata': {'data_type': p.dtype.name,
                                        'num_channels': 1,
                                        'type': 'image'}}
        try:
            ts.open(spec=spec, open=True).result()
        except ValueError:
            try:
                ts.open(spec=spec, create=True).result()
            except ValueError:
                ts.open(spec=spec, create=True, delete_existing=True).result()

        nicer_array = NicerTensorStore(spec=spec, open_kwargs={'write': True})
        store_arrays.append(nicer_array)
    return store_arrays
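As a defensive post-processing step while the cause of the ordering is unknown, the scales list in the info JSON can be sorted by level key after writing (a sketch; it assumes keys of the form s0, s1, ...):

```python
def sort_scales(info: dict) -> dict:
    # Order 'scales' by the numeric part of the key ('s0' < 's1' < ...).
    fixed = dict(info)
    fixed['scales'] = sorted(info['scales'],
                             key=lambda s: int(s['key'].lstrip('s')))
    return fixed

info = {'scales': [{'key': 's2'}, {'key': 's0'}, {'key': 's1'}]}
print([s['key'] for s in sort_scales(info)['scales']])  # ['s0', 's1', 's2']
```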
