
Comments (6)

kylebarron avatar kylebarron commented on September 6, 2024

I think the issue here is that Arrow schema metadata and Parquet key-value file metadata are technically different concepts. And so I assume that the Rust parquet crate does not automatically write the table metadata onto the Parquet file.

Other libraries copy the Arrow table schema metadata into the Parquet key-value metadata, so maybe we should do the same here when writing.

I am able to open and view the geo metadata key in pyarrow

Is this on the table schema or the Parquet metadata? They're two different things.

The Parquet key-value metadata is accessible with pyarrow.parquet.read_metadata(...).metadata.get(b'geo'), while the Arrow schema is stored separately in the Parquet file and its metadata is accessible with pyarrow.parquet.read_schema(...).metadata.get(b'geo'). I'm guessing only the latter one exists in the Parquet file you're writing.

from parquet-wasm.

bjyberg avatar bjyberg commented on September 6, 2024

Thanks for such a quick response! And nice guess, I just checked and you're absolutely correct - Only pyarrow.parquet.read_schema(...).metadata.get(b'geo') exists in the file I've written. So would the solution be to write the geo metadata to the parquet metadata rather than the arrow schema? Is that possible at the moment? Thanks again!


kylebarron avatar kylebarron commented on September 6, 2024

I think we just need to implement this method:

// /// Sets "key_value_metadata" property.
// #[wasm_bindgen(js_name = setKeyValueMetadata)]
// pub fn set_key_value_metadata(
//     self,
//     value: Option<Vec<parquet::file::metadata::KeyValue>>,
// ) -> Self {
//     Self {
//         0: self.0.set_key_value_metadata(value),
//     }
// }


kylebarron avatar kylebarron commented on September 6, 2024

Can you test from this branch #503? There are developer docs here for building.

Usage should be something like:

import {
  WriterProperties,
  WriterPropertiesBuilder,
} from "./pkg/esm/parquet_wasm.js";

let props = new Map<string, string>();
props.set("geo", "...");
let writerProps = new WriterPropertiesBuilder()
  .setKeyValueMetadata(props)
  .build();
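
The "..." value above is left elided in the original. As a rough sketch only, the string stored under the "geo" key is typically a JSON document, so one way to build the map might look like the following. The helper name buildGeoKeyValueMetadata and the interfaces are hypothetical (they are not part of parquet-wasm), and the field names and example values are assumptions based on the GeoParquet metadata layout, not something confirmed in this thread.

```typescript
// Hypothetical types sketching the shape of GeoParquet "geo" metadata.
// These are assumptions for illustration, not parquet-wasm exports.
interface GeoColumnMetadata {
  encoding: string;
  geometry_types: string[];
}

interface GeoMetadata {
  version: string;
  primary_column: string;
  columns: Record<string, GeoColumnMetadata>;
}

// Hypothetical helper: serialize the metadata object to JSON and store it
// under the "geo" key of a string-to-string map.
function buildGeoKeyValueMetadata(geo: GeoMetadata): Map<string, string> {
  const kv = new Map<string, string>();
  kv.set("geo", JSON.stringify(geo));
  return kv;
}

// Example values are placeholders; adjust them to the actual dataset.
const geoProps = buildGeoKeyValueMetadata({
  version: "1.0.0",
  primary_column: "geometry",
  columns: {
    geometry: { encoding: "WKB", geometry_types: [] },
  },
});
```

A map built this way could then be passed to setKeyValueMetadata in place of the hand-written props in the snippet above.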


bjyberg avatar bjyberg commented on September 6, 2024

Sorry for the delay - took a while to figure out how to build/run, etc. Good learning experience haha! It works perfectly, thanks Kyle! If I can help by contributing any documentation, etc. when it is merged into the main branch, let me know! Happy to close this now if you are :)


kylebarron avatar kylebarron commented on September 6, 2024

Awesome, good to hear! Ideally most people will be writing GeoParquet via the geoarrow-wasm set of tools, like @geoarrow/geoparquet-wasm.

const wkb = geos.geosGeomToWKB(geomPtr) // returns a WKB buffer

Having two sets of Wasm bundles is a lot of code for the user to download, and it means you need to copy memory out of one Wasm memory space and into the other.

But alas, for now, if it works that's good!

A doc example would be welcome! You can include markdown in the /// in the Rust code here:

/// Sets "key_value_metadata" property.

That gets copied into the generated .d.ts doc comments.

