
Comments (8)

itsgifnotjiff avatar itsgifnotjiff commented on August 9, 2024 2

Mr. @jameshawkes and Mrs. @mathleur, this is very promising, and I will try to run some benchmarks on a 4 TB TileDB array of NWP model outputs. Are there any particular queries you would suggest to stress-test Polytope?

from polytope.

jameshawkes avatar jameshawkes commented on August 9, 2024 1

Hi @itsgifnotjiff ,

@mathleur has made a minimal example showing use of TileDB through the xarray engine:
https://gist.github.com/mathleur/7314e497d4fdd0224a3a3a8d92ba58d2

Would be good to understand if there are disadvantages to this xarray engine.


nkules avatar nkules commented on August 9, 2024 1

Hi @itsgifnotjiff and @jameshawkes, happy to help answer any questions regarding TileDB. You may see some decreases in performance compared to accessing TileDB directly since there are a few more steps going through the xarray backend but it should be completely doable workflow wise. The example from @mathleur looks good.

Yordan, if you're using some of the datasets you've already ingested into TileDB, you would only need everything after opening the dataset in xarray (line 44 onward in the example). If you notice that performance falls outside what you'd find acceptable, we're happy to help investigate where the bottleneck might be.


mathleur avatar mathleur commented on August 9, 2024 1

Hi @itsgifnotjiff, the main point for Polytope shapes is to define shapes on all dimensions of the datacube. In your second datacube, this would be forecast, lat and lon. So to define:

  • a flight path: Request(Path(["lat", "lon"], Box(["lat", "lon"], [0, 0], [1, 1]), flight_points), Select("forecast", [0])) (Note that this will be static in time and altitude because of the provided datacube. When time and altitude dimensions are available, a flight path should be a 4D path, which is a sweep of a 4D box along 4D flight points.)

  • Canada: Request(Polygon(["lat", "lon"], canada_points), Select("forecast", [0]))

  • a bounding box: Request(Box(["lat", "lon"], [0, 0], [1, 1]), Select("forecast", [0]))

  • a point: Request(Point(["lat", "lon"], [[0, 0]], method="surrounding"), Select("forecast", [0])) (Note that this will take all the points on the datacube that surround the requested point.)

  • a vertical profile: This is difficult to achieve with the provided datacube, but it would probably be a Point in the lat/lon dimensions “augmented” by a Select in the “forecast” dimension (like in all the other examples above) and a Span in an altitude dimension.
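For the bounding-box case, a small helper can make the corner ordering explicit. This is only a sketch: `bbox_to_corners` is a hypothetical helper, not part of Polytope, and it assumes the four numbers of a bounding box are ordered `[lon_min, lon_max, lat_min, lat_max]`, which this thread does not confirm. It relies only on the Box examples above taking a lower corner and an upper corner listed in the same order as the axis names.

```python
def bbox_to_corners(bbox, axes=("lat", "lon")):
    """Split a bounding box into lower and upper corners ordered like `axes`.

    Assumption (not confirmed in the thread): bbox is
    [lon_min, lon_max, lat_min, lat_max].
    """
    lon_min, lon_max, lat_min, lat_max = bbox
    if lon_min > lon_max or lat_min > lat_max:
        raise ValueError(f"bounding box is not ordered min-before-max: {bbox}")
    bounds = {"lon": (lon_min, lon_max), "lat": (lat_min, lat_max)}
    # Pick the minimum of each axis for the lower corner, the maximum for the upper.
    lower = [bounds[name][0] for name in axes]
    upper = [bounds[name][1] for name in axes]
    return lower, upper


# The Canada bounding-box numbers quoted elsewhere in this thread.
lower, upper = bbox_to_corners([-141, -50, 39, 83], axes=("lat", "lon"))
print(lower, upper)
```

The two lists could then be passed as the corners, for example Box(["lat", "lon"], lower, upper), together with a Select on "forecast" as in the examples above.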

Note also that the datacube needs to be defined as an Xarray DataArray instead of a Dataset currently.

Finally, to solve the ValueError shown, we've updated the develop branch, so it should now work! For any similar ValueError, each type of datacube dimension value needs an associated DatacubeAxis defined in the _type_to_axis_lookup dictionary (at the bottom of the /polytope/datacube/datacube_axis.py file). In this case, for example, the type np.float32 had to be added as a FloatDatacubeAxis().
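The lookup mechanism described here can be illustrated without Polytope installed. The following is a simplified stand-in, not Polytope's code: the class names `IntAxis` and `FloatAxis`, the dictionary `type_to_axis_lookup`, and the function `create_axis` are hypothetical, and plain Python types stand in for NumPy scalar types such as np.float32.

```python
class IntAxis:
    """Hypothetical axis for integer coordinate values."""


class FloatAxis:
    """Hypothetical axis for floating-point coordinate values."""


# Maps the type of an axis's values to the axis class that handles it.
# Only int is registered at first, mimicking a lookup with a missing entry.
type_to_axis_lookup = {int: IntAxis}


def create_axis(name, values):
    """Return an axis object for `values`, or raise if the type is unknown."""
    value_type = type(values[0])
    if value_type not in type_to_axis_lookup:
        raise ValueError(
            f"Could not create a mapper for index type {value_type} for axis {name}"
        )
    return type_to_axis_lookup[value_type]()


# Before float is registered, a float-valued axis is rejected.
try:
    create_axis("pres", [1000.0, 850.0, 500.0])
except ValueError as err:
    print(err)

# Registering the missing type is the whole fix.
type_to_axis_lookup[float] = FloatAxis
axis = create_axis("pres", [1000.0, 850.0, 500.0])
print(type(axis).__name__)
```

In this sketch the error disappears as soon as the missing type is registered, which mirrors the kind of change described for np.float32.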

I hope this helps but I would be happy to answer more questions and will update the documentation to make these steps clearer!


jameshawkes avatar jameshawkes commented on August 9, 2024

Hi @itsgifnotjiff, thanks for your feature request! I'm not familiar with TileDB; @mathleur will check out the docs and see if it's doable.

We already have an xarray backend for Polytope. Is it possible to use xarray as a layer to read from a TileDB array? I've seen https://github.com/TileDB-Inc/TileDB-CF-Py but have no experience with it.


itsgifnotjiff avatar itsgifnotjiff commented on August 9, 2024

Thank you for your work and consideration Mr. @jameshawkes.

Yes, they have integrated with xarray: you specify xr.open_dataset(uri, engine="tiledb"). But I am really curious whether the query could be pushed down to their C++ backend itself.

I think Mrs. Julia Dark @jp-dark , Mr. Isaiah Norton @ihnorton or Mr. Nick Kules can confirm if it is doable.


jameshawkes avatar jameshawkes commented on August 9, 2024

I would start simple with some point-profiles (e.g. time-series, vertical profiles) and some area extractions (e.g. boxes, polygons). We know that the xarray backend is not highly optimised, so I expect we will see some scaling issues at first, but we are actively working on this.


itsgifnotjiff avatar itsgifnotjiff commented on August 9, 2024

@jameshawkes and @mathleur, sorry to bother you, but I have a TileDB array with a couple of TB of data in RotatedPole that looks like the image below. I am trying to construct some queries, but understandably the Polytope documentation does not give many examples of how to build different queries.

If it is not too much to ask, could you help me build queries for a

  • flight path
  • Canada
  • bounding box
  • point
  • vertical profile

and any other query you think would be pertinent to test please?

image

or even better let's use this cube

image

slicer = HullSlicer()
polytope_API = Polytope(datacube=ds.TT, engine=slicer)

canada_bbox = [-141, -50, 39, 83],

box1 = Box(["lon", "lat"], [-141, -50], [39, 83])
request = Request(box1)
result = polytope_API.retrieve(request)
print([leaf.result for leaf in result.leaves])

but I get

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[8], line 2
      1 slicer = HullSlicer()
----> 2 polytope_API = Polytope(datacube=ds.TT, engine=slicer)
      4 canada_bbox = [-141, -50, 39, 83],
      6 box1 = Box(["lon", "lat"], [-141, -50], [39, 83])

File ~/.conda/envs/polytope_exp/lib/python3.11/site-packages/polytope/polytope.py:49, in Polytope.__init__(self, datacube, engine, axis_options, datacube_options)
     46 if datacube_options is None:
     47     datacube_options = {}
---> 49 self.datacube = Datacube.create(datacube, axis_options)
     50 self.engine = engine if engine is not None else Engine.default()

File ~/.conda/envs/polytope_exp/lib/python3.11/site-packages/polytope/datacube/backends/datacube.py:156, in Datacube.create(datacube, axis_options, datacube_options)
    153 if isinstance(datacube, (xr.core.dataarray.DataArray, xr.core.dataset.Dataset)):
    154     from .xarray import XArrayDatacube
--> 156     xadatacube = XArrayDatacube(datacube, axis_options, datacube_options)
    157     return xadatacube
    158 else:

File ~/.conda/envs/polytope_exp/lib/python3.11/site-packages/polytope/datacube/backends/xarray.py:37, in XArrayDatacube.__init__(self, dataarray, axis_options, datacube_options)
     35         if self.dataarray[name].dims == ():
     36             options = axis_options.get(name, None)
---> 37             self._check_and_add_axes(options, name, values)
     38             treated_axes.append(name)
     39 for name in dataarray.dims:

File ~/.conda/envs/polytope_exp/lib/python3.11/site-packages/polytope/datacube/backends/datacube.py:76, in Datacube._check_and_add_axes(self, options, name, values)
     74     DatacubeAxis.create_standard(name, values, self)
     75 elif name not in self._axes.keys():
---> 76     DatacubeAxis.create_standard(name, values, self)

File ~/.conda/envs/polytope_exp/lib/python3.11/site-packages/polytope/datacube/datacube_axis.py:108, in DatacubeAxis.create_standard(name, values, datacube)
    105 @staticmethod
    106 def create_standard(name, values, datacube):
    107     values = np.array(values)
--> 108     DatacubeAxis.check_axis_type(name, values)
    109     if datacube._axes is None:
    110         datacube._axes = {name: deepcopy(_type_to_axis_lookup[values.dtype.type])}

File ~/.conda/envs/polytope_exp/lib/python3.11/site-packages/polytope/datacube/datacube_axis.py:120, in DatacubeAxis.check_axis_type(name, values)
    116 @staticmethod
    117 def check_axis_type(name, values):
    118     # NOTE: The values here need to be a numpy array which has a dtype attribute
    119     if values.dtype.type not in _type_to_axis_lookup:
--> 120         raise ValueError(f"Could not create a mapper for index type {values.dtype.type} for axis {name}")

ValueError: Could not create a mapper for index type <class 'numpy.float32'> for axis pres

