
polytope's Introduction



Polytope is a library for extracting complex data from datacubes. It provides an API for non-orthogonal access to data, where the stencil used to extract data from the datacube can be any arbitrary n-dimensional polygon (called a polytope). This can be used to efficiently extract complex features from a datacube, such as polygon regions or spatio-temporal paths.

Polytope is designed to extend different datacube backends:

  • XArray dataarrays
  • FDB object stores (through the GribJump software)

Polytope supports datacubes which have branching, non-uniform indexing, and even cyclic axes. If the datacube backend supports byte-addressability and efficient random access (either in-memory or direct from storage), Polytope can be used to dramatically decrease overall I/O load.
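
To make the idea of a cyclic axis more concrete, here is a minimal sketch in plain Python of how a requested longitude range might be remapped onto an axis stored on [0, 360). It does not use Polytope's internals; the helper names `wrap` and `split_cyclic_range` are hypothetical and exist only for this illustration.

```python
def wrap(value, lower=0.0, upper=360.0):
    """Map a value onto the half-open interval [lower, upper)."""
    period = upper - lower
    return (value - lower) % period + lower


def split_cyclic_range(start, end, lower=0.0, upper=360.0):
    """Split a requested range into pieces that lie on the stored axis.

    Assumes start <= end. A range covering a full period or more
    collapses to the whole axis.
    """
    if end - start >= upper - lower:
        return [(lower, upper)]
    s, e = wrap(start, lower, upper), wrap(end, lower, upper)
    if s <= e:
        return [(s, e)]
    # The range crosses the seam of the axis, so it becomes two pieces.
    return [(s, upper), (lower, e)]


pieces = split_cyclic_range(-10.0, 20.0)
print(pieces)
```

A request for longitudes from -10 to 20 is split into one piece just below the seam and one just above it, so each piece can be read with ordinary range indexing.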

Warning

This project is BETA and will be experimental for the foreseeable future. Interfaces and functionality are likely to change, and the project itself may be scrapped. DO NOT use this software in any project/software that is operational.

Concept

Polytope is designed to enable extraction of arbitrarily shaped data from a datacube. Instead of the typical range-based bounding-box approach, Polytope can extract any shape of data from a datacube using a "polytope" (n-dimensional polygon) stencil.
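
The difference between a bounding-box read and a stencil read can be illustrated with a small, self-contained sketch. It is plain Python and not the Polytope algorithm itself: the grid, the triangular stencil and the `point_in_polygon` helper are all assumptions made for this example.

```python
def point_in_polygon(x, y, vertices):
    """Ray-casting test: is (x, y) strictly inside the polygon?"""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# A hypothetical 10 x 10 grid of points and a triangular stencil.
grid = [(lon + 0.5, lat + 0.5) for lon in range(10) for lat in range(10)]
triangle = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]

# Bounding-box approach: every point inside the triangle's box is read.
xs, ys = zip(*triangle)
bbox_points = [
    (x, y) for x, y in grid
    if min(xs) <= x <= max(xs) and min(ys) <= y <= max(ys)
]

# Stencil approach: only points inside the shape itself are read.
stencil_points = [(x, y) for x, y in grid if point_in_polygon(x, y, triangle)]

print(len(bbox_points), len(stencil_points))
```

In this toy case the stencil touches fewer than half of the points that the bounding box would read, which is the kind of I/O saving the stencil approach is aiming for.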

Polytope Concept

The Polytope algorithm can for example be used to extract:

  • 2D cut-outs, such as country cut-outs, from a datacube

    Greece cut-out

  • timeseries from a datacube

    Timeseries

  • more complicated spatio-temporal paths, such as flight paths, from a datacube

    Flight path

  • and many more high-dimensional shapes in arbitrary dimensions...

For more information about the Polytope algorithm, refer to our paper. If this project is useful for your work, please consider citing this paper.

Installation

Install the polytope software with Python 3 (>=3.7) from GitHub directly with the command

python3 -m pip install git+ssh://git@github.com/ecmwf/polytope.git@develop

or from PyPI with the command

python3 -m pip install polytope-python
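
After installing, one way to confirm that the package is visible to the interpreter is a quick check with the standard library. This is a minimal sketch; `is_installed` is a hypothetical helper, and `polytope` is assumed to be the import name because it is the one used in the example in this README.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# "polytope" is the import name used in the example in this README.
print("polytope available:", is_installed("polytope"))
```

The same check could be applied to optional backend dependencies, such as `pygribjump` for the FDB backend, before running an example that needs them.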

Example

Here is a step-by-step example of how to use this software.

  1. In this example, we first specify the data which will be in our Xarray datacube. Note that the data here comes from the GRIB file called "winds.grib", which is 3-dimensional with dimensions: step, latitude and longitude.

        import xarray as xr
    
        array = xr.open_dataset("winds.grib", engine="cfgrib")

    We then construct the Polytope object, passing in some additional metadata describing properties of the longitude axis.

        options = {"longitude": {"cyclic": [0, 360.0]}}
    
        from polytope.polytope import Polytope
    
        p = Polytope(datacube=array, axis_options=options)
  2. Next, we create a request shape to extract from the datacube.
    In this example, we want to extract a simple 2D box in latitude and longitude at step 0. We thus create the two relevant shapes we need to build this 3-dimensional object,

        import numpy as np
        from polytope.shapes import Box, Select
    
        box = Box(["latitude", "longitude"], [0, 0], [1, 1])
        step_point = Select("step", [np.timedelta64(0, "s")])

    which we then incorporate into a Polytope request.

        from polytope.polytope import Request
    
        request = Request(box, step_point)
  3. Finally, extract the request from the datacube.

        result = p.retrieve(request)

    The result is stored as an IndexTree containing the retrieved data organised hierarchically with axis indices for each point.

        result.pprint()
        
    
        Output IndexTree: 
    
            ↳root=None
                ↳step=0 days 00:00:00
                    ↳latitude=0.0
                        ↳longitude=0.0
                        ↳longitude=1.0
                    ↳latitude=1.0
                        ↳longitude=0.0
                        ↳longitude=1.0
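
To show how a hierarchical result of this kind can be flattened into individual points, here is a minimal sketch that models the printed tree as a plain nested dict. It is a stand-in only: it does not use the IndexTree class or any of its methods, and `leaf_paths` is a hypothetical helper.

```python
# Stand-in for the printed tree: each key is (axis, value) and each value is
# the subtree below it. This is NOT the IndexTree class, only a dict model.
tree = {
    ("step", "0 days 00:00:00"): {
        ("latitude", 0.0): {("longitude", 0.0): {}, ("longitude", 1.0): {}},
        ("latitude", 1.0): {("longitude", 0.0): {}, ("longitude", 1.0): {}},
    }
}


def leaf_paths(node, prefix=()):
    """Yield one dict of axis -> value for every leaf of the nested tree."""
    for (axis, value), children in node.items():
        path = prefix + ((axis, value),)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield dict(path)


points = list(leaf_paths(tree))
for point in points:
    print(point)
```

Each leaf corresponds to one extracted point, identified by its full set of axis indices.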

Testing

Additional Dependencies

The Polytope tests and examples require additional Python packages compared to the main Polytope algorithm. The additional dependencies are provided in the requirements_test.txt and requirements_examples.txt files, which can respectively be found in the tests and examples folders. Moreover, Polytope's tests and examples also require the installation of eccodes and GDAL. It is possible to install both of these dependencies using either a package manager or manually.

Contributing

The main repository is hosted on GitHub; testing, bug reports and contributions are highly welcomed and appreciated. Please see the Contributing document for the best way to help.

Main contributors:

See also the contributors for a more complete list.

License

Copyright 2021 European Centre for Medium-Range Weather Forecasts (ECMWF)

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0.

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

In applying this licence, ECMWF does not waive the privileges and immunities granted to it by virtue of its status as an intergovernmental organisation nor does it submit to any jurisdiction.

Citing

If this software is useful in your work, please consider citing our paper as

Leuridan, M., Hawkes, J., Smart, S., Danovaro, E., and Quintino, T., “Polytope: An Algorithm for Efficient Feature Extraction on Hypercubes”, arXiv e-prints, 2023. doi:10.48550/arXiv.2306.11553.

Acknowledgements

Past and current funding and support for Polytope is listed in the adjoining Acknowledgements.

polytope's People

Contributors

figi44, jameshawkes, mathleur


polytope's Issues

New transformation options

Support transformations which look like this:

    transformation:
      reverse:
        axis: value
      mapper:
        in: value
        out: lat,lon
      reverse:
        axis: lat

Handle "unsliceable" axes

For example, handle a "variable" axis and slice on a non-numeric value such as "temperature", i.e. handle axes whose values are classes (categories) rather than continuous values.
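
The distinction can be sketched in plain Python: an ordered axis can be sliced by range, while a categorical axis can only be matched exactly. This is an illustration of the idea, not a proposal for the actual implementation; the helper names and the label values other than "temperature" are hypothetical.

```python
from bisect import bisect_left, bisect_right


def select_range(sorted_values, lower, upper):
    """Ordered axis: return every stored value within [lower, upper]."""
    start = bisect_left(sorted_values, lower)
    stop = bisect_right(sorted_values, upper)
    return sorted_values[start:stop]


def select_labels(labels, wanted):
    """Categorical axis: keep only exact matches, in the requested order."""
    available = set(labels)
    return [label for label in wanted if label in available]


latitudes = [0.0, 0.5, 1.0, 1.5, 2.0]
variables = ["temperature", "pressure", "humidity"]  # hypothetical labels

print(select_range(latitudes, 0.4, 1.6))
print(select_labels(variables, ["temperature", "wind"]))
```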

Renaming objects

  • Rename cyclic_offset and cyclic_remap to just offset and remap
  • Rename discretise to get_discrete_values
  • Rename label to axis where needed
  • Rename Polytope shape to ConvexPolytope
  • Rename PolytopeAPI to just Polytope

No module named 'pygribjump'

What happened?

Hi,

This issue is more of a question than a bug report.

A couple of months ago someone you probably know (Dr. Quintino) gave a video conference where I learned about "polytope", earthkit, etc. A couple of slides showed performance results (timings of feature extraction using polytope and GribJump). It prompted me to install the package with conda and play with a few examples. One of them was 3D_shipping_route_wave_model.py, which makes use of the FDB backend:

    python 3D_shipping_route_wave_model.py

    Traceback (most recent call last):
      File ".../3D_shipping_route_wave_model.py", line 8, in <module>
        from polytope.datacube.backends.fdb import FDBDatacube
      File ".../python3.10/site-packages/polytope/datacube/backends/fdb.py", line 4, in <module>
        import pygribjump as pygj
    ModuleNotFoundError: No module named 'pygribjump'

I searched for pygribjump ... it was nowhere to be found.

I realize that the software is a work in progress. I just want to check with you that the module (GribJump, pygribjump) is indeed not publicly available and that this outcome is as expected?

I look forward to seeing more performance results with earthkit/polytope. Your work is interesting and appears to be really tightly integrated with the software infrastructure at ECMWF. This is new stuff in comparison to what is done in the context of PANGEO, for example.

Anyway, thanks for your time,

Oh, and I did succeed in running some other tests though ... :-)

What are the steps to reproduce the bug?

none

Version

polytope 1.03

Platform (OS and architecture)

cat /etc/os-release --> Ubuntu 22.04.4 LTS
uname -a --> Linux 4.18.0-240.el8.x86_64

Relevant log output

none

Accompanying data

none

Organisation

Environment and Climate Change Canada

Can this be used to optimize the extraction of grid point timeseries from grib files?

What maintenance does this project need?

I just discovered this project today and it seems to address a problem that I have had for a really long time.

For example, when reading a grib file from ecmwf ensemble with xarray

Screenshot 2023-06-26 at 15 35 49

attempting to extract a grid point with

.sel(lon=45, lat=10, method='nearest')

causes excessive memory/CPU usage and eventually takes too long.

My solution for the moment was to transfer this burden to cdo

filename = cdo.remapnn('lon=45/lat=10', input='-merge /ecmwf-ens/*.grib2', options='-P 12')
d = xr.open_dataset(filename, engine='cfgrib').squeeze().copy()

but this is far away from optimal.

Could this be solved by using polytope to subset the grib file read with xarray? As far as I know, grib files do not support random access.
Do you have any example on how to extract a point in lat/lon? The documentation only shows how to extract boxes. I would like to test a little bit and see if there's any advantage over the "hybrid" CDO method :)

Organisation

No response

TileDB Support

Is your feature request related to a problem? Please describe.

I wonder if we can use something else as the data source Polytope queries. Zarrs, Kerchunked files are fine, but TileDB seems to be a pretty strong alternative.

Describe the solution you'd like

Ideally I would be able to query TileDB Arrays using Polytope's API.

Describe alternatives you've considered

I know you can use GDAL, Rasterio, Fiona, PDAL, Geopandas, Xarray and others to query data extracted from a TileDB Array, but I would love to be able to query the Array itself using Polytope.

Additional context

The GribJump base is awesome, but almost every meteorological centre around the world has its own format, which it needs to either cast to GRIB or support by reimplementing the logic you've implemented. Also, from what I understand, Polytope works on GRIB files, whereas TileDB can host raster, tabular, vector, imagery and observational data, both dense and sparse, in the same Array. I imagine being able to ask for the model output rasters, warning zone vectors, station and satellite observations and satellite imagery within a 4D polygon (a flight path) and get all the relevant information.

Organisation

Environment and Climate Change Canada

Add gribjump to CI workflow

Add pygribjump branch to CI workflow for FDB backend, also unmark the FDB backend datacube test so it's not skipped anymore

Use earthkit in examples to load data

