
Comments (15)

zapov avatar zapov commented on September 23, 2024 1

If you are returning an instance to the outside, I don't think you should be holding on to that instance and using it for the next request. Meaning: I think it's fine to keep a cache of FlatBuffers instances which you deserialize into, but I don't think it's fine to keep a cache of the data.media objects.

from jvm-serializers.

cakoose avatar cakoose commented on September 23, 2024

from jvm-serializers.

cowtowncoder avatar cowtowncoder commented on September 23, 2024

I think it would be quite difficult to fairly measure even just these 2 variations across formats and libraries. So while a test for "partial binding" (or whatever it'd be called) can be valuable, it probably makes the most sense as a separate project focused on such usage.

from jvm-serializers.

mzaks avatar mzaks commented on September 23, 2024

Well, I guess this is a common misconception people have about the random value access feature. It is not a lazy parsing feature, where the data is transformed only on demand.
It is rather a read-access-friendly representation of the data.
The line MediaContent.getRootAsMediaContent(bb, mc);
turns MediaContent mc = new MediaContent(); into a read-only accessor over the received data.
Because the buffer stores its own local references in a virtual table, there is no actual parsing needed.
The getters of MediaContent just add and subtract relative offsets to get the absolute buffer offset, which is similar to what a runtime does when we access a property of an object. There is also the conversion of the value at that offset to a typed value via the java.nio.ByteBuffer class, but that can hardly be called parsing.
Compare this to formats where the values are stored as text, which actually has to be parsed, or formats where a linear traversal is needed to find a value; in those cases it is indeed better to transform everything in one go.
It is as if we were comparing a balanced binary search tree to a linked list.
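
To make the "addition and subtraction of relative offsets" point concrete, here is a minimal sketch of how such a getter resolves a scalar field directly against the ByteBuffer. This is not the generated code; the layout details are simplified and the slot number passed in would be an assumption about the schema.

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // Illustrative only: resolve a 32-bit field of the root table with pure offset arithmetic.
    final class OffsetAccessSketch {
        static int readIntField(ByteBuffer bb, int vtableSlot, int defaultValue) {
            bb.order(ByteOrder.LITTLE_ENDIAN);
            int root = bb.getInt(bb.position()) + bb.position(); // offset to the root table
            int vtable = root - bb.getInt(root);                 // table header points back to its vtable
            int rel = bb.getShort(vtable + vtableSlot);          // field offset relative to the table, 0 if absent
            return rel != 0 ? bb.getInt(root + rel) : defaultValue;
        }
    }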

So what I find a bit unfair in the current implementation of the test is that in the deserialize method, serializers.flatbuffers.media.MediaContent is converted to data.media.MediaContent, and then JavaBuiltIn.mediaTransformer is used to check the values. In the suggestion I posted as a gist, the deserialize method returns serializers.flatbuffers.media.MediaContent, and I implemented a static final class Transformer extends MediaTransformer<MediaContent> which knows how to forward and reverse data between data.media.MediaContent and serializers.flatbuffers.media.MediaContent.
As I don't have a complete understanding of the test I might be missing something, but I believe that with an appropriate Transformer we do not even change the semantics of the test. It still reads all the values from the buffer; it just does not convert them into unnecessary intermediate Java objects which have to be allocated and then garbage collected.
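
Roughly, the shape of that class is the following. This is only a sketch, not the gist verbatim: the MediaTransformer method names follow the description above, the bodies are elided, and any other abstract methods of the base class are omitted.

    // Sketch only; method names assumed from the description above, bodies elided.
    static final class Transformer extends MediaTransformer<serializers.flatbuffers.media.MediaContent> {
        // data.media POJO -> FlatBuffers accessor: build a fresh buffer from the POJO's fields.
        public serializers.flatbuffers.media.MediaContent forward(data.media.MediaContent content) {
            FlatBufferBuilder builder = new FlatBufferBuilder();
            // ... write all fields of `content` into `builder` ...
            return serializers.flatbuffers.media.MediaContent.getRootAsMediaContent(builder.dataBuffer());
        }

        // FlatBuffers accessor -> data.media POJO: read every value back out of the buffer.
        public data.media.MediaContent reverse(serializers.flatbuffers.media.MediaContent mc) {
            data.media.MediaContent content = new data.media.MediaContent();
            // ... copy mc.media(), mc.images(i), ... into `content` ...
            return content;
        }
    }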

The only use case where using serializers.flatbuffers.media.MediaContent directly would not be an option is if we wanted to mutate the values. As I mentioned before, it is a read-only accessor. But as far as I can tell, that is not a use case in this test.

So I am not pleading for a test of partial data access; I just want to avoid the intermediate Java objects.

from jvm-serializers.

cakoose avatar cakoose commented on September 23, 2024

from jvm-serializers.

mzaks avatar mzaks commented on September 23, 2024

Some of the tests do have their own custom transformers, for example the winner of the benchmark 😉
https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/colfer/Colfer.java#L64

So then we are back at the "unfair" point 🙂.

from jvm-serializers.

cowtowncoder avatar cowtowncoder commented on September 23, 2024

@mzaks be that as it may, I do not see the point in spending huge amounts of effort to make the test more favorable for certain kinds of decoders. The use case being tested is the common "decode the whole message" use case, and for that, things work.

If you want to create a separate set of tests, or perhaps a new project, that can be valuable. But for modifying the existing tests, I see little value relative to the effort.

from jvm-serializers.

cakoose avatar cakoose commented on September 23, 2024

from jvm-serializers.

mzaks avatar mzaks commented on September 23, 2024

Will do. I also saw that there is a new version of FlatBuffers, 1.7.0; I will update to it in the PR as well.

from jvm-serializers.

zapov avatar zapov commented on September 23, 2024

I wrote the original FlatBuffers submission, and while it's possible that I did not write an optimal implementation, I have issues with @mzaks' gist.
I did not write any transformer because there was no POJO class to use for that purpose.
This implementation creates two cached instances:
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L40
and https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L112
which does not seem aligned with the other implementations.
While it's cool that FlatBuffers can access fields by lookup instead of parsing, it likely won't be much faster once it has to create garbage like everyone else.
By reusing those instances it avoids doing the work everyone else is doing (there are other serializers which can also deserialize into an existing instance).

from jvm-serializers.

cakoose avatar cakoose commented on September 23, 2024

from jvm-serializers.

zapov avatar zapov commented on September 23, 2024

I don't remember the actual reason why I wrote it like that instead of using a transformer. I vaguely remember something about not being able to convert it (but of course it's possible that I just didn't know how to do it).
I'm fine with the transformer as long as there is no stuff like this: https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L113

Since this benchmark tests a single object and this optimization is not part of the library, I don't think tricks like that are "fair" to the other codecs.

On an unrelated note, as far as improvements to the benchmark go, I think it would be worth trying to get numbers without the copy in forward/reverse: https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/JavaBuiltIn.java#L93
If that were the case, the current implementation would be fair to FlatBuffers in the same way as to all other codecs with custom types. While on the surface it might seem unfair to the others, since they have to create more garbage for the conversion into the provided POJO, to me that is what the average Java dev expects: "I have my class, how fast can I send it around and get it back?"

If we made that change, I would drop the current dsl-json implementations and just put the databinding one in, as that's the most common use case.

from jvm-serializers.

mzaks avatar mzaks commented on September 23, 2024

Here is a cleaned-up gist where I removed all the "smart" techniques.
https://gist.github.com/mzaks/3e432df395fa343847701b8735a706b4

I would like to discuss whether those techniques are "fair" or not, though.

With a binary serialisation strategy we always have two parts:

  1. The actual binary representation.
  2. An implementation of how this binary representation can be read and written.

The FlatBuffers binary format allows the user to do this:
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L113
This is because strings are stored by reference and not in place, meaning I can perform de-duplication when I serialise, and skip decoding a string from the binary if I see that multiple tables point to the same string. Why do the work twice?
This behaviour is sadly not supported by the implementation, but it can easily be added on top by the user. So the binary representation supports it, but the technique is offloaded to the user and is not part of the generated code.
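
The read-side half of that idea, skipping repeated decoding, can be sketched like this; it sits on top of the generated API rather than inside it, and the class and method names here are made up.

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;
    import java.nio.charset.StandardCharsets;
    import java.util.HashMap;
    import java.util.Map;

    // Because strings are stored by reference, tables pointing at the same string share one
    // absolute buffer offset, so a decoded String can be cached per offset and reused.
    final class StringPool {
        private final ByteBuffer bb;
        private final Map<Integer, String> decoded = new HashMap<>();

        StringPool(ByteBuffer bb) {
            this.bb = bb.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        }

        // `offset` is the absolute buffer offset of the string: a length prefix, then UTF-8 bytes.
        String stringAt(int offset) {
            return decoded.computeIfAbsent(offset, off -> {
                int len = bb.getInt(off);
                byte[] bytes = new byte[len];
                for (int i = 0; i < len; i++) {
                    bytes[i] = bb.get(off + 4 + i);
                }
                return new String(bytes, StandardCharsets.UTF_8);
            });
        }
    }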

Same applies to:
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L112
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L40

As a matter of fact, the generated API encourages the user to reuse the table accessor classes:
https://github.com/eishay/jvm-serializers/blob/master/tpc/pregen/media.fbs/serializers/flatbuffers/media/MediaContent.java#L13

Those accessor classes are just a convenient way to store two things:

  1. Reference to ByteBuffer
  2. Offset in the byte buffer

It would be possible to avoid them altogether and just provide pure static methods which receive a reference to the byte buffer and an offset, and return the value or a reference to a table. This way we could avoid class instantiation and garbage collection altogether.

Again, this is a case of the binary representation supporting something that the implementation does not provide.
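
A rough illustration of the difference; the class and method names and the vtable slot are invented, and this is not generated code.

    import java.nio.ByteBuffer;

    // The accessor object merely wraps a (buffer, position) pair...
    final class TableAccessor {
        final ByteBuffer bb;
        final int pos;
        TableAccessor(ByteBuffer bb, int pos) { this.bb = bb; this.pos = pos; }
        int bitrate() { return Tables.bitrate(bb, pos); }
    }

    // ...so the same lookup can be exposed as a static method that allocates nothing.
    final class Tables {
        static int bitrate(ByteBuffer bb, int pos) {    // bb assumed already little-endian ordered
            int vtable = pos - bb.getInt(pos);          // table header points back to its vtable
            int rel = bb.getShort(vtable + 16);         // 16 is an invented slot for "bitrate"
            return rel != 0 ? bb.getInt(pos + rel) : 0; // 0 as the default when the field is absent
        }
    }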

Is it fair to restrict a driver to using only 3 gears of his 5-gear vehicle because the other drivers have vehicles with only 3 gears? Hard to say 😉.

from jvm-serializers.

mzaks avatar mzaks commented on September 23, 2024

Yes, absolutely. The "caches" are meant only for sequential deserialisation in the same "layer". If the instance goes to a different layer of the application, it should not be reused. The same applies to a multi-threaded scenario. In the test, as far as I can tell, the decoding is done sequentially, so there is no need to create new instances on every iteration.
I also totally agree about the data.media objects.

Let's consider a real-world scenario. Our endpoint gets requests with a payload, from which we need to read the values and spin off processes that will produce a response. In this case the payload is processed sequentially in one layer. Only the values will leave this layer. And this is where FlatBuffers shines: extracting the values, especially if we need only a portion of the data.
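
A minimal sketch of that sequential, same-layer decode loop; the two-argument getRootAsMediaContent overload is the one quoted earlier in the thread, while incomingBuffers, process(...) and the particular field accessors are illustrative.

    // Reuse one accessor for a stream of buffers decoded sequentially on the same layer.
    // Only plain values leave this layer; the accessor and the buffers stay here.
    MediaContent mc = new MediaContent();                // serializers.flatbuffers.media.MediaContent
    for (ByteBuffer bb : incomingBuffers) {              // incomingBuffers: assumed source of payloads
        MediaContent.getRootAsMediaContent(bb, mc);      // repoint the accessor, no new allocation
        process(mc.media().uri(), mc.media().bitrate()); // extract only the values that are needed
    }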

from jvm-serializers.

cakoose avatar cakoose commented on September 23, 2024

from jvm-serializers.
