

alxmrs commented on July 27, 2024

Here's an idea that should help speed up row extraction: orient the extraction around coordinates and URIs, not just URIs.

In the BQ pipeline...

def expand(self, paths):
break up the extract_rows step into two steps. The first of the new steps should open the dataset, filter by area, and then produce chunks of (coordinate, URI) pairs. Each chunk should be a range of the output of get_coordinates(data_ds, uri). Maybe ~1k coordinates is a good unit? The right size will likely have to be verified experimentally.

The second extract step should consume these pairs, do any per-row preprocessing (such as filtering out variables), and produce rows:

def to_row(it: t.Dict) -> t.Dict:
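
The two-step split described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline's actual code: chunk_coordinates, rows_from_chunk, the read_values callback, the 'data_uri' key, and the example URI are all hypothetical names, and the coordinates are passed in as an ordinary iterable standing in for the output of get_coordinates(data_ds, uri).

```python
import itertools
import typing as t

Coordinate = t.Dict[str, t.Any]


def chunk_coordinates(
    uri: str,
    coordinates: t.Iterable[Coordinate],
    chunk_size: int = 1000,
) -> t.Iterator[t.Tuple[str, t.List[Coordinate]]]:
    """Step 1 (hypothetical): pair a URI with fixed-size chunks of its coordinates."""
    it = iter(coordinates)
    while True:
        # Take up to chunk_size coordinates; an empty list means we are done.
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            return
        yield uri, chunk


def rows_from_chunk(
    pair: t.Tuple[str, t.List[Coordinate]],
    read_values: t.Callable[[str, Coordinate], t.Dict[str, t.Any]],
) -> t.Iterator[t.Dict[str, t.Any]]:
    """Step 2 (hypothetical): turn one (uri, chunk) pair into rows."""
    uri, chunk = pair
    for coord in chunk:
        row = dict(coord)                    # start from the coordinate values
        row.update(read_values(uri, coord))  # add the data variables
        row['data_uri'] = uri                # assumed provenance column
        yield row


# Usage with made-up coordinates and a stand-in for reading the dataset.
coords = [{'latitude': lat, 'longitude': lon}
          for lat in range(3) for lon in range(4)]
pairs = list(chunk_coordinates('gs://bucket/file.nc', coords, chunk_size=5))
rows = [
    row
    for pair in pairs
    for row in rows_from_chunk(
        pair, lambda uri, c: {'value': c['latitude'] + c['longitude']})
]
```

Because each (uri, chunk) pair is independent, the second step can be distributed across workers at chunk granularity rather than file granularity, which is where the speed-up would presumably come from. In a Beam pipeline, each function could plausibly be wrapped in its own beam.FlatMap step.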

Other tactics to investigate:

  • Move these lines outside of the to_rows loop; instead, perform this operation once on the whole xarray dataset:
    temp_row = row_ds.to_pandas().apply(to_json_serializable_type)
  • Determine whether xarray's native parallelism capabilities are a good fit for producing rows with multiple threads (https://xarray.pydata.org/en/stable/user-guide/dask.html)
  • Investigate whether there's a lightweight way to get coordinate information for the first of the two processing steps (e.g., can we get just the coordinates and not the data? Can we open the dataset without loading it into memory?)

from weather-tools.
