Comments (10)

refset commented on September 25, 2024

What your objective is

Given an already frozen Nippy serialization held in an existing bytebuffer, I'm looking for a capability to parse through and extract some of the inner contents without actually thawing any of the values (to avoid allocations for objects that aren't strictly needed for anything), such that I can work with additional bytebuffers that hold wrapped slices of still-frozen nested values (i.e. not copying the underlying bytes either). For example, given an already frozen map, I would like to be able to locate and (potentially later) thaw only a specific value under a known key, if that key exists in that map, as demonstrated here.

How this relates to the current issue re: support for freezing to a user-supplied bytebuffer

I can't speak for the Datalevin project but I believe the overall goals are somewhat similar: a bytebuffer API would allow for memory to be re-used in tight loops and avoid creating unnecessary garbage. I can imagine that the initial scope of this issue for thawing might only require thawing from an entire buffer at a time, but I need something slightly more specific in addition, which is to be able to parse without thawing, so that I can later decide exactly which inner values I would like to thaw, if any. I am not currently looking for support to freeze to a user-specified bytebuffer.
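To illustrate the kind of re-use I mean, here is a minimal sketch in plain Java NIO. It is not Nippy code: `ReusedBuffer` and `encodeAll` are hypothetical names, and the "encoding" is just a raw long written into a caller-supplied scratch buffer.

```java
import java.nio.ByteBuffer;

final class ReusedBuffer {
    /**
     * Encodes each value into the same caller-supplied buffer and
     * returns the total number of bytes produced across all iterations.
     * Hypothetical helper, for illustration only.
     */
    static long encodeAll(long[] values, ByteBuffer scratch) {
        long total = 0;
        for (long v : values) {
            scratch.clear();      // reset position and limit; allocates nothing
            scratch.putLong(v);   // write 8 bytes into the existing memory
            scratch.flip();       // make the written bytes readable by a consumer
            total += scratch.remaining();
        }
        return total;
    }
}
```

The loop body allocates nothing, since the only buffer involved is the one passed in by the caller.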

What kind of API/functionality would you ideally want Nippy to expose

An API similar to the get-len function in my commit, which, given a buf and offset, could return the type and length. Note that my current implementation doesn't return the type, but on reflection I've since realised that I need it too.
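As a rough sketch of the shape I have in mind (written in Java for brevity, and not taken from Nippy or from my branch): the type ids, the `FrozenHeader` class and the `describe` method below are all hypothetical, and the real id-to-length mapping would have to come from Nippy itself.

```java
import java.nio.ByteBuffer;

final class FrozenHeader {
    // HYPOTHETICAL type ids for illustration; these are NOT Nippy's real ids.
    static final int ID_LONG = 1;      // fixed 8-byte payload
    static final int ID_SM_BYTES = 2;  // 1-byte length prefix, then the payload

    /** Type id plus the bounds of the payload, with nothing thawed. */
    static final class Entry {
        final int typeId;
        final int payloadOffset;
        final int payloadLength;

        Entry(int typeId, int payloadOffset, int payloadLength) {
            this.typeId = typeId;
            this.payloadOffset = payloadOffset;
            this.payloadLength = payloadLength;
        }

        /** Offset of the first byte after this value, for skipping ahead. */
        int end() {
            return payloadOffset + payloadLength;
        }
    }

    /** Reads only the header at the given offset, using absolute gets. */
    static Entry describe(ByteBuffer buf, int offset) {
        int typeId = buf.get(offset) & 0xFF;
        switch (typeId) {
            case ID_LONG:
                return new Entry(typeId, offset + 1, 8);
            case ID_SM_BYTES:
                int len = buf.get(offset + 1) & 0xFF;
                return new Entry(typeId, offset + 2, len);
            default:
                throw new IllegalArgumentException("Unknown type id: " + typeId);
        }
    }
}
```

Because only absolute reads are used, the buffer's position is left untouched and no payload bytes are decoded or copied.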

I'm not familiar with [...] Agrona off-hand

The only reason I brought Agrona up was because I used it in the code I linked to. Specifically, Agrona provides an ergonomic API for working with on-heap and off-heap bytebuffers.

Thank you for the fast response 🙂

from nippy.

ptaoussanis commented on September 25, 2024

Hi Jeremy, thanks for the clarifications - that's helpful 👍

I suppose there's quite a lot of overlap here with #147 - much (all?) of this could happily be done in userspace if the codec definitions in nippy.clj were introspectable/exposed somehow.

To be clear, I'd make a distinction between:

  1. Nippy's internal schema: mostly just the set of [byte-id type length] tuples.
  2. The encoding of base types as per java.io.DataOutput and optional compression/encryption.

Exposing a public view of the internal schema (1) should in principle be relatively straight-forward.
As I understood it, #147 also concerns itself with (2) - which isn't Nippy specific, and potentially more of an undertaking depending on what the target platform offers.
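To make the distinction concrete, here's a small sketch of how the two layers combine for a single value. The type id below is hypothetical (it is not one of Nippy's real ids), and `DataOutputLayout` is an illustrative name only; the part that is standard is `java.io.DataOutput`, which writes a long as 8 bytes with the high byte first.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class DataOutputLayout {
    // HYPOTHETICAL type id, standing in for an entry of the internal schema (1).
    static final int HYPOTHETICAL_LONG_ID = 1;

    /** Writes [type-id byte][8-byte big-endian long] and returns the bytes. */
    static byte[] encodeLong(long value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(HYPOTHETICAL_LONG_ID); // schema layer: 1 byte
            out.writeLong(value);                // DataOutput layer (2): 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

A reader on another platform would need both pieces: the id-to-type mapping, and an implementation of the same base-type byte layout.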

For your particular use case- how far would it get you if Nippy core included something like a public nippy/type-ids, maybe with explicit length in bytes?

Seems that'd allow you to cut out ~90% of your branch code, and not depend on any fragile implementation details?

ptaoussanis commented on September 25, 2024

@refset 👍 Created #151 for next steps on public nippy/type-ids.

Leaving this issue open specifically for custom bytebuffer support.

ptaoussanis commented on September 25, 2024

@huahaiy Hi Huahai Yang, thanks for bringing this to my attention - sounds promising!

Would be happy to see a PR for this 👍

refset commented on September 25, 2024

Hi 🙂

We have been looking at this for XTDB recently in support of speeding up the ingestion pipeline and reducing unnecessary allocations. Specifically, we want to avoid the current need to thaw documents returned by the 'document-store', which then get immediately re-encoded/frozen into KV bytebuffers for the 'index-store' (backed by RocksDB / LMDB etc.).

Instead the document-store could return a bytebuffer per document and from this XT should be able to construct the necessary KV bytebuffers by simply slicing and merging wrapped buffers (i.e. views with defined offsets and lengths) without any duplication or thawing at all.
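A minimal sketch of what I mean by a wrapped view, using only `java.nio.ByteBuffer` (XT itself uses Agrona, which is not shown here; `BufferViews` and `view` are hypothetical names):

```java
import java.nio.ByteBuffer;

final class BufferViews {
    /**
     * Returns a view over bytes [offset, offset + length) of buf.
     * The view shares the underlying memory, so nothing is copied,
     * and the source buffer's own position and limit are left alone.
     */
    static ByteBuffer view(ByteBuffer buf, int offset, int length) {
        ByteBuffer dup = buf.duplicate(); // shared content, independent position/limit
        dup.limit(offset + length);
        dup.position(offset);
        return dup.slice();               // index 0 of the slice is buf[offset]
    }
}
```

Since the view aliases the source, a later write to the source is visible through the view, which is exactly the "no duplication" property described above. This only covers the slicing half; how the views are combined into the final KV buffers is a separate question.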

In this branch, I have already extracted the necessary Nippy-internal codec information and created a get-len function that can satisfy our immediate requirements to avoid any thawing or copying:
https://github.com/refset/xtdb/blob/df210146d1744b14c31fa29e994ac3932c54e8d5/core/src/xtdb/nippy_utils.clj

Note that we use Agrona extensively across XT already.

Do you have any feedback or thoughts on how this approach could perhaps evolve into a PR?

The capability to freeze to bytebuffers would also be useful but is not a current focus.

ptaoussanis commented on September 25, 2024

@refset Hi Jeremy-

I'm not expecting to have significant time this week to dig into this in detail.
And heads-up that I'm not familiar with XTDB or Agrona off-hand.

Would it be possible to try to give a simplified high-level (/ ELI5) explanation of:

  • What your objective is
  • How this relates to the current issue re: support for freezing to a user-supplied bytebuffer
  • What kind of API/functionality would you ideally want Nippy to expose

The easier you can make this for me to follow, the likelier I'll be able to get you a quick response.

Cheers!

refset commented on September 25, 2024

I suppose there's quite a lot of overlap here with #147 - much (all?) of this could happily be done in userspace if the codec definitions in nippy.clj were introspectable/exposed somehow.

Agreed. Again, see that branch I mentioned for the ~small sections of nippy.clj I needed to copy across so that I could write my own get-len function - essentially just the type-id mappings and all the implied lengths (calculated by hand).

refset commented on September 25, 2024

For your particular use case- how far would it get you if Nippy core included something like a public nippy/type-ids, maybe with explicit length in bytes?
Seems that'd allow you to cut out ~90% of your branch code, and not depend on any fragile implementation details?

Agreed - I think that would work great 🙂

ptaoussanis commented on September 25, 2024

Just to summarise current status re: support for user-supplied bytebuffers:

  • #151 is being worked on separately, which may or may not be useful for some related use cases.
  • #140 (support for user-supplied bytebuffers) is still independently interesting.
  • Next steps would be for someone to provide a sketch/PoC PR, or ideas re: what the API might look like.

ptaoussanis commented on September 25, 2024

Closing for inactivity as part of issue triage.
