freeze/thaw backed by a bytebuffer about nippy HOT 10 CLOSED

huahaiy commented on September 25, 2024 1

freeze/thaw backed by a bytebuffer

from nippy.

Comments (10)

refset commented on September 25, 2024 1

What your objective is

Given an already frozen Nippy serialization held in an existing bytebuffer, I'm looking for a capability to parse through and extract some of the inner contents without actually thawing any of the values (to avoid allocations for objects that aren't strictly needed for anything), such that I can work with additional bytebuffers that hold wrapped slices of still-frozen nested values (i.e. not copying the underlying bytes either). For example, given an already frozen map, I would like to be able to locate and (potentially later) thaw only a specific value under a known key, if that key exists in that map, as demonstrated here.
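To make the objective above concrete, here is a minimal Java sketch of locating a value under a known key and handing back a zero-copy view of its still-frozen bytes. The layout is a made-up stand-in (an entry count, then length-prefixed keys and values), not Nippy's real wire format, and `FrozenMapScan`, `findValue` and `sample` are hypothetical names used only for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical layout, NOT Nippy's real format:
//   [entryCount:int] then per entry [keyLen:short][key bytes][valLen:int][val bytes]
final class FrozenMapScan {

    /** Returns a zero-copy view of the value stored under key, or null if the key is absent. */
    static ByteBuffer findValue(ByteBuffer frozen, String key) {
        byte[] wanted = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = frozen.duplicate(); // own position/limit, shared bytes
        int count = buf.getInt();
        for (int i = 0; i < count; i++) {
            int keyLen = buf.getShort();
            int keyStart = buf.position();
            buf.position(keyStart + keyLen); // skip the key bytes
            int valLen = buf.getInt();
            int valStart = buf.position();
            if (keyLen == wanted.length && regionEquals(buf, keyStart, wanted)) {
                ByteBuffer view = buf.duplicate();
                view.position(valStart);
                view.limit(valStart + valLen);
                return view.slice(); // wraps the same memory, nothing is copied
            }
            buf.position(valStart + valLen); // skip the value without decoding it
        }
        return null;
    }

    private static boolean regionEquals(ByteBuffer buf, int start, byte[] expected) {
        for (int i = 0; i < expected.length; i++) {
            if (buf.get(start + i) != expected[i]) return false; // absolute reads
        }
        return true;
    }

    private static void putEntry(ByteBuffer b, String k, byte[] v) {
        byte[] kb = k.getBytes(StandardCharsets.UTF_8);
        b.putShort((short) kb.length).put(kb).putInt(v.length).put(v);
    }

    /** Builds a small two-entry buffer in the hypothetical layout. */
    static ByteBuffer sample() {
        ByteBuffer b = ByteBuffer.allocate(64);
        b.putInt(2);
        putEntry(b, "id", new byte[] {1, 2, 3});
        putEntry(b, "name", "xt".getBytes(StandardCharsets.UTF_8));
        b.flip();
        return b;
    }

    public static void main(String[] args) {
        ByteBuffer value = findValue(sample(), "name");
        System.out.println("value bytes: " + value.remaining());
    }
}
```

The returned slice can be kept around and thawed later, or never, which is the point: values that are not needed are only ever skipped by length.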

How this relates to the current issue re: support for freezing to a user-supplied bytebuffer

I can't speak for the Datalevin project but I believe the overall goals are somewhat similar: a bytebuffer API would allow for memory to be re-used in tight loops and avoid creating unnecessary garbage. I can imagine that the initial scope of this issue for thawing might only require thawing from an entire buffer at a time, but I need something slightly more specific in addition, which is to be able to parse without thawing, so that I can later decide exactly which inner values I would like to thaw, if any. I am not currently looking for support to freeze to a user-specified bytebuffer.
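As a rough illustration of the memory re-use point, the following Java sketch encodes many values into one caller-owned buffer instead of allocating a fresh byte array per value. The `[tag][payload]` layout and the names `ReusableBufferLoop`, `encodeAll` and `consume` are assumptions for the example and say nothing about how Nippy would expose this.

```java
import java.nio.ByteBuffer;

final class ReusableBufferLoop {

    /** Stand-in for whatever receives the encoded bytes (e.g. a KV store put). */
    static int consume(ByteBuffer encoded) {
        return encoded.remaining();
    }

    /** Encodes every value into the same scratch buffer and returns the total bytes produced. */
    static long encodeAll(long[] values, ByteBuffer scratch) {
        long total = 0;
        for (long v : values) {
            scratch.clear();                  // reset position/limit, no allocation
            scratch.put((byte) 1).putLong(v); // hypothetical [tag:byte][payload:long]
            scratch.flip();                   // switch the buffer from writing to reading
            total += consume(scratch);
        }
        return total;
    }

    public static void main(String[] args) {
        ByteBuffer scratch = ByteBuffer.allocateDirect(16); // allocated once, off-heap
        System.out.println(encodeAll(new long[] {1L, 2L, 3L}, scratch));
    }
}
```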

What kind of API/functionality would you ideally want Nippy to expose

An API similar to the get-len function in my commit, which, given a buf and an offset, could return the type and length. Note that my current implementation doesn't return the type, but on reflection I've realised that I need it as well.
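A sketch of the shape such a function could take, written in Java against `java.nio.ByteBuffer`. The three type ids and their length rules are invented for the example; the real ids and lengths would have to come from Nippy itself. `GetLenSketch`, `TypeAndLength` and `sample` are hypothetical names.

```java
import java.nio.ByteBuffer;

final class GetLenSketch {
    // Hypothetical type ids, NOT Nippy's real ones.
    static final byte TYPE_LONG = 1;     // fixed size: 8 payload bytes
    static final byte TYPE_BYTES_SM = 2; // 1-byte unsigned length prefix
    static final byte TYPE_BYTES_LG = 3; // 4-byte length prefix

    static final class TypeAndLength {
        final byte typeId;
        final int totalLength; // id byte + any length prefix + payload
        TypeAndLength(byte typeId, int totalLength) {
            this.typeId = typeId;
            this.totalLength = totalLength;
        }
    }

    /** Reads only the header at offset; the payload itself is never decoded. */
    static TypeAndLength getLen(ByteBuffer buf, int offset) {
        byte typeId = buf.get(offset); // absolute read, leaves position untouched
        switch (typeId) {
            case TYPE_LONG:
                return new TypeAndLength(typeId, 1 + 8);
            case TYPE_BYTES_SM:
                return new TypeAndLength(typeId, 1 + 1 + (buf.get(offset + 1) & 0xFF));
            case TYPE_BYTES_LG:
                return new TypeAndLength(typeId, 1 + 4 + buf.getInt(offset + 1));
            default:
                throw new IllegalArgumentException("Unknown type id: " + typeId);
        }
    }

    /** Three values back to back in the hypothetical encoding. */
    static ByteBuffer sample() {
        ByteBuffer b = ByteBuffer.allocate(32);
        b.put(TYPE_LONG).putLong(42L);
        b.put(TYPE_BYTES_SM).put((byte) 3).put(new byte[] {'a', 'b', 'c'});
        b.put(TYPE_BYTES_LG).putInt(2).put(new byte[] {'x', 'y'});
        b.flip();
        return b;
    }

    public static void main(String[] args) {
        ByteBuffer buf = sample();
        // Walk the buffer value by value using only the reported lengths.
        for (int offset = 0; offset < buf.limit(); ) {
            TypeAndLength tl = getLen(buf, offset);
            System.out.println("offset=" + offset + " type=" + tl.typeId + " length=" + tl.totalLength);
            offset += tl.totalLength;
        }
    }
}
```

Returning the type alongside the length lets the caller decide whether a value is worth slicing out at all before skipping to the next offset.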

I'm not familiar with [...] Agrona off-hand

The only reason I brought Agrona up was because I used it in the code I linked to. Specifically, Agrona provides an ergonomic API for working with on-heap and off-heap bytebuffers.

Thank you for the fast response 🙂

ptaoussanis commented on September 25, 2024 1

Hi Jeremy, thanks for the clarifications - that's helpful 👍

I suppose there's quite a lot of overlap here with #147 - much (all?) of this could happily be done in userspace if the codec definitions in nippy.clj were introspectable/exposed somehow.

To be clear, I'd make a distinction between:

1. Nippy's internal schema: mostly just the set of [byte-id type length] tuples.
2. The encoding of base types as per java.io.DataOutput and optional compression/encryption.

Exposing a public view of the internal schema (1) should in principle be relatively straight-forward.
As I understood it, #147 also concerns itself with (2) - which isn't Nippy specific, and potentially more of an undertaking depending on what the target platform offers.
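For the second point, the part that is not Nippy specific, a short Java sketch of what `java.io.DataOutput` produces: integers are written big-endian, and `writeUTF` writes a 2-byte length followed by modified UTF-8. Since `ByteBuffer` is big-endian by default, the same bytes can be read back with absolute or relative gets. The leading byte `7` is just a stand-in for a type id, and `DataOutputLayout` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DataOutputLayout {

    /** Writes a few base types the way java.io.DataOutput defines them. */
    static byte[] encode() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeByte(7);      // stand-in for a type id (hypothetical value)
            out.writeInt(1024);    // 4 bytes, big-endian
            out.writeLong(-1L);    // 8 bytes, big-endian
            out.writeUTF("nippy"); // 2-byte length, then modified UTF-8
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reads the trailing string back through a ByteBuffer, with no DataInput involved. */
    static String decodeString(byte[] encoded) {
        ByteBuffer buf = ByteBuffer.wrap(encoded); // big-endian by default
        buf.position(1 + 4 + 8);                   // skip the id, the int and the long
        int len = buf.getShort() & 0xFFFF;
        byte[] utf = new byte[len];
        buf.get(utf);
        // For ASCII-only text, modified UTF-8 and standard UTF-8 are byte-identical.
        return new String(utf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encode();
        ByteBuffer buf = ByteBuffer.wrap(encoded);
        System.out.println(buf.getInt(1) + " " + buf.getLong(5) + " " + decodeString(encoded));
    }
}
```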

For your particular use case- how far would it get you if Nippy core included something like a public nippy/type-ids, maybe with explicit length in bytes?

Seems that'd allow you to cut out ~90% of your branch code, and not depend on any fragile implementation details?

ptaoussanis commented on September 25, 2024 1

@refset 👍 Created #151 for next steps on public nippy/type-ids.

Leaving this issue open specifically for custom bytebuffer support.

ptaoussanis commented on September 25, 2024

@huahaiy Hi Huahai Yang, thanks for bringing this to my attention - sounds promising!

Would be happy to see a PR for this 👍

refset commented on September 25, 2024

Hi 🙂

We have been looking at this for XTDB recently, to speed up the ingestion pipeline and reduce unnecessary allocations. Specifically, we want to avoid the current need to thaw documents returned by the 'document-store', which then get immediately re-encoded/frozen into KV bytebuffers for the 'index-store' (backed by RocksDB / LMDB etc.).

Instead the document-store could return a bytebuffer per document and from this XT should be able to construct the necessary KV bytebuffers by simply slicing and merging wrapped buffers (i.e. views with defined offsets and lengths) without any duplication or thawing at all.

In this branch, I have already extracted the necessary Nippy-internal codec information and created a get-len function that can satisfy our immediate requirements to avoid any thawing or copying:
https://github.com/refset/xtdb/blob/df210146d1744b14c31fa29e994ac3932c54e8d5/core/src/xtdb/nippy_utils.clj

Note that we use Agrona extensively across XT already.

Do you have any feedback or thoughts on how this approach could perhaps evolve into a PR?

The capability to freeze to bytebuffers would also be useful but is not a current focus.

ptaoussanis commented on September 25, 2024

@refset Hi Jeremy-

I'm not expecting to have significant time this week to dig into this in detail.
And heads-up that I'm not familiar with XTDB or Agrona off-hand.

Would it be possible to try to give a simplified high-level (/ ELI5) explanation of:

1. What your objective is
2. How this relates to the current issue re: support for freezing to a user-supplied bytebuffer
3. What kind of API/functionality would you ideally want Nippy to expose

The easier you can make this for me to follow, the likelier I'll be able to get you a quick response.

Cheers!

refset commented on September 25, 2024

I suppose there's quite a lot of overlap here with #147 - much (all?) of this could happily be done in userspace if the codec definitions in nippy.clj were introspectable/exposed somehow.

Again, see that branch I mentioned for the small sections of nippy.clj I needed to copy across so that I could write my own get-len function - essentially just the type-id mappings and all the implied lengths (calculated by hand).

refset commented on September 25, 2024

For your particular use case- how far would it get you if Nippy core included something like a public nippy/type-ids, maybe with explicit length in bytes?
Seems that'd allow you to cut out ~90% of your branch code, and not depend on any fragile implementation details?

Agreed - I think that would work great 🙂

ptaoussanis commented on September 25, 2024

Just to summarise current status re: support for user-supplied bytebuffers:

#151 is being worked on separately, and may or may not be useful for some related use cases.
#140 (support for user-supplied bytebuffers) is still independently interesting.
Next steps would be for someone to provide a sketch/PoC PR, or ideas re: what the API might look like.

ptaoussanis commented on September 25, 2024

Closing for inactivity as part of issue triage
