Comments (15)
If you are returning an instance to the outside I don't think you should be holding on that instance and using it for next request. Meaning I think it's fine to keep a cache of flatbuffer instances which you deserialize into, but I don't think it's fine to keep a cache of the data.media
objects
from jvm-serializers.
from jvm-serializers.
I think that it would be quite difficult to fairly measure even just these 2 variations across formats and libraries. So I think that while test for "partial binding" (or whatever it'd be called) can be valuable, it may make most sense as separate project, focused on such usage.
from jvm-serializers.
Well I guess this is the common misconception which people have towards random value access feature. It is not a lazy parsing feature, where the data is transformed only on demand.
It is rather read access friendly representation of data.
This line MediaContent.getRootAsMediaContent(bb, mc);
Turns MediaContent mc = new MediaContent();
into read only accessor of received data.
As buffer has its own local references stored in a virtual table there is no actual parsing needed.
The getters of MediaContent
just do some addition and subtraction of relative offsets, to get the absolute buffer offset, but this is similar to what a runtime does when we want to access a property of an object. OK there is also the conversion of the value at the offset to a type by using java.nio.ByteBuffer
class, but this hardly can be called as parsing.
Comparing to other formats, where the values are stored as text, which actually has to be parsed, or in a format where we need to perform a linear traversal to find the value, and in this case it rather better to transform everything in one go.
It is, as if we would compare balanced binary search tree to a linked list.
So what I find a bit unfair in the current implementation of the test, is the fact that serializers.flatbuffers.media.MediaContent
in deserialize
method is converted to data.media.MediaContent
and than JavaBuiltIn.mediaTransformer
is used to check the values. In suggestion I posted as gist, deserialize
method returns serializers.flatbuffers.media.MediaContent
and I implemented static final class Transformer extends MediaTransformer<MediaContent>
which knows how to forward and reverse data given data.media.MediaContent
and serializers.flatbuffers.media.MediaContent
.
As I don't have a complete understanding of the test I might miss something, but I guess by having an appropriate Transformer we do not even change the semantics of the test. It is still reading all the values from the buffer, it just does not convert it into unnecessary intermediate Java objects which has to be allocated and than garbage collected.
The only use case where using serializers.flatbuffers.media.MediaContent
directly would not be an option is if we would like to mutate the values. As I mentioned before it is a read only accessor. But as far as I can tell this is not a use case in this test.
So I don't plead for making a test for partial access of data, I just want to avoid intermediate Java objects.
from jvm-serializers.
from jvm-serializers.
Some of the test do have there custom transformers, like for example the winner of the benchmark 😉
https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/colfer/Colfer.java#L64
So than we are back at the "unfair" point 🙂.
from jvm-serializers.
@mzaks be that as may, I do not see point in trying to spend huge amounts of effort to make test more favorable for certain kinds of decoders. Use case being tested is a common "decode the whole message" use case, and for that things work.
If you want to create separate set of tests, or perhaps new project, that can be valuable. But I see low effort/value ratio for modifying existing tests.
from jvm-serializers.
from jvm-serializers.
Will do. I also saw that there is a new version of FlatBuffers 1.7.0 will also update it in the PR.
from jvm-serializers.
I wrote the original flatbuffer submission and while its possible that I did not write an optimal implementation I have issues with @mzaks gist.
I did not write any transformer, because there was no POJO class to use for such purpose.
This implementation creates two cached instances:
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L40
and https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L112
which seems not aligned with other implementation.
While it's cool that flatbuffers can access field by lookup instead of parsing, it's more likely that it won't be much faster when it needs to create garbage like everyone else.
By reusing those instances it avoids doing the work everyone else is doing (there are other serializers which can deserialize into an instance)
from jvm-serializers.
from jvm-serializers.
I don't remember the actual reason why I wrote it like that instead of using a transformer. I vaguely remember something about not being able to convert it (but of course it's possible that I just didn't know how to do it).
I'm fine with the transformer as long as there are no stuff like this: https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L113
Since this bench tests for a single object and this optimization is not part of the library I don't think stuff like that are "fair" to other codecs.
On an unrelated note, as far as the improvements to the bench goes, I think it would be worth to try and get numbers without the copy in forward/reverse: https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/JavaBuiltIn.java#L93
If that was the case current implementation would be fair to Flatbuffers in the same manner as to all other codecs with custom types. While on the surface it might seem not fair to the others since they have to create more garbage for conversion into provided POJO, to me it seems that's what the average Java dev expects: "I have my class, how fast can I send it around and get it back".
If we made that change I would drop the current dsl-json implementations and just put the databinding one in, as thats the most common use case.
from jvm-serializers.
Here is a cleaned up gist where I removed all the "smart" techniques.
https://gist.github.com/mzaks/3e432df395fa343847701b8735a706b4
I would like to discuss if those techniques are "fair" or not though.
With a binary serialisation strategy we have always two parts:
- The actual binary representation.
- Implementation how this binary representation can be read and written.
FlatBuffers binary format allows user to do this:
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L113
Because strings are stored by reference and not in place. Meaning that I can perform de-duplication when I serialise. And skip decoding of the string from binary if I see that multiple table point to the same string. Why do the work twice?
This behaviour is sadly not supported by the implementation, but can be easily added on top by the user. So the binary representation supports it, but the technique is offloaded to the user and is not par of the generated code.
Same applies to:
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L112
https://gist.github.com/mzaks/a0dd3d68f8958da72068022749469531#file-flatbuffers-java-L40
As a matter of fact, the generated API encourage user to reuse the table accessor classes:
https://github.com/eishay/jvm-serializers/blob/master/tpc/pregen/media.fbs/serializers/flatbuffers/media/MediaContent.java#L13
Those accessor classes are just convenience to store two things:
- Reference to ByteBuffer
- Offset in the byte buffer
It would be possible to avoid them all together and just provide pure static methods which receive reference to byte buffer and offset, and return the value or reference to a table. This way we could avoid class instantiation and garbage collection all together.
Again this is a case of binary representation supporting something that the implementation does not provide.
Does restricting a driver to use only 3 gears on his 5 gears vehicle, because other drivers drive vehicles with only 3 gears fair? Hard to say 😉.
from jvm-serializers.
Yes absolutely. The "caches" are meant only for sequential deserialisation in the same "layer". If the instance goes to a different layer of the application, it should not be reused. Same applies for multi threaded scenario. In the test as far as I can tell the decoding is done sequentially, so there is no need to create new instances on every iteration.
Also totally agree on data.media
objects.
Let's consider a real world scenario. Our endpoint get requests with a payload, where we need to read the values and spin of processes, which will result in a response. In this case the payload is processed sequentially in one layer. Only the values will leave this layer. And this is where FlatBuffers shines - extracting of the values, specially if we need only a portion of the data.
from jvm-serializers.
from jvm-serializers.
Related Issues (20)
- Add codec for "universal binary json" HOT 2
- Add command-line switch to enable/disable reuse of ByteArrayOutputStream
- Jackson Avro Dataformat jar is missing HOT 1
- Add Scala Pickling? HOT 1
- Doesn't work on Java 8 HOT 4
- Compile warnings for Jackson HOT 3
- Check Results HOT 1
- Include protobuf-json as well in the benchmarks HOT 1
- include capnproto HOT 4
- Add support for Non-ASCII characters? HOT 5
- Add wiki for how to run with latest version of protocols will be helpful. HOT 3
- delete (dealloc) time missing from performance wiki HOT 1
- Make is failing HOT 2
- Add Moshi HOT 1
- Can't seem to find the referenced Avro media classes: Image, Media, and MediaContent HOT 1
- Update Wiki Stats HOT 25
- Add Amazon Ion HOT 4
- Benchmarks for Java Records using java.io.Serializable HOT 1
- Benchmark for fastjson2 HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from jvm-serializers.