Comments (28)
Agreed, though rather than just comparing it against C++, it would be great to compare against Protocol Buffers for Java and other similar solutions.
This would be a great thing for an external contributor to try out (hint hint :)
from flatbuffers.
I know this is ~11 months old, but are these benchmarks something that we can use today? I'm looking at porting them to C# so we can compare C++/Java/C#, and ideally add other languages too.
I was planning on writing some internal benchmarks comparing the Java implementations for FlatBuffers and Protobuf within the next few weeks.
If you give me access to the schema and maybe the source files you've used in http://google.github.io/flatbuffers/md__benchmarks.html, I may be able to provide the (non-Android) Java numbers.
I'll get you those files.
I've implemented a first version. Note that:
- I currently don't have any information about GC triggers, so the numbers (especially for encoding/decoding protobufs) may be worse in real-world use.
- FlatBuffers-Java still needs some optimization.
- The test data may not reflect your use case (!!!)
Edit: 4) The encode/decode numbers look a bit better than I'd expect. There may be some optimizations that I'm not properly accounting for.
So far the numbers (seconds for 1M operations, i.e. us per operation) look like the following:
OS: Windows 8.1
JDK: Oracle JDK 1.8.0_20-b26
CPU: Intel i5-3427U @ 1.80 GHz
Decode + Traverse
(FlatBuf direct) 0 + 0.639 = 0.639 us
(FlatBuf heap) 0 + 0.732 = 0.732 us
(ProtoBuf-Java) 6.903 + 0.037 = 6.94 us
(ProtoBuf-JavaNano) 1.101 + 0.024 = 1.125 us
Encode
(FlatBuf direct) 1.137 us
(FlatBuf heap) 1.576 us
(ProtoBuf-Java) 1.652 us
(ProtoBuf-JavaNano) 1.379 us
Preliminary Result
Traversing a ByteBuffer is very expensive compared to traversing objects directly. However, the deserialization time still ruins the overall performance of Protobuf.
[Edit 2015-12-10] Added units and results for protobuf-javanano.
Where's the benchmark code?
On Wed, Jan 28, 2015 at 11:49 AM, Florian Enner [email protected] wrote:
I've implemented a first version. Note that:
- I currently don't have any information about GC triggers, so the numbers (especially for encoding/decoding protobufs) may be worse in real-world use.
- FlatBuffers-Java still requires some optimizations.
- The test data may not reflect your use case!
So far the numbers (seconds for 1M operations) look like the following:
OS: Windows 8.1
JDK: Oracle JDK 1.8.0_20-b26
CPU: Intel i5-3427U @ 1.80 GHz
Decode + Traverse
(FlatBuf direct) 0 + 0.639 = 0.639
(FlatBuf heap) 0 + 0.732 = 0.732
(ProtoBuf) 6.903 + 0.037 = 6.94
Encode
(FlatBuf direct) 1.137
(FlatBuf heap) 1.576
(ProtoBuf) 1.652
Preliminary Result
The data looks quite a bit different from the C++ implementation's. Traversing a ByteBuffer is very expensive compared to traversing objects directly. However, the deserialization time still ruins the overall performance of Protobuf. As long as you don't fully traverse each message more than 10 times, FlatBuffers will be faster (for similar use cases). Serializing data via FlatBuffers is faster as well, but not by a huge margin.
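The "~10 traversals" break-even claim can be checked with a bit of arithmetic on the numbers above: FlatBuffers pays 0.639 us per full traversal with no decode step, while Protobuf pays 6.903 us once plus 0.037 us per traversal. A minimal sketch (class and method names are illustrative, not from the benchmark code):

```java
public class BreakEven {
    // FlatBuffers is cheaper while n * fbTraverse < pbDecode + n * pbTraverse,
    // so the break-even point is pbDecode / (fbTraverse - pbTraverse).
    static double breakEven(double fbTraverse, double pbDecode, double pbTraverse) {
        return pbDecode / (fbTraverse - pbTraverse);
    }

    public static void main(String[] args) {
        // Numbers from the comment above (us per operation).
        System.out.printf("Break-even at ~%.1f traversals%n",
                breakEven(0.639, 6.903, 0.037));
    }
}
```

With these figures the crossover lands between 11 and 12 full traversals, consistent with the "more than 10 times" rule of thumb.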
Btw, there may be some optimizations that I'm not accounting for; the protobuf encode/decode numbers look better than I'd expect. Let me know if you go through the code and find something wrong with it.
Excellent work!
In my C++ benchmark I simply add up all numbers I read when traversing the FlatBuffers, and print it at the end, guaranteeing the optimizer can't cheat.
Yes, the accessor overhead is a lot bigger in Java than it is in C++, as to be expected. FlatBuffers is definitely optimized for use cases where access is infrequent (usually just once). I'd say that even if you access the data more than 10x, FlatBuffers may still be worth it because of the more predictable performance (no startup cost, less GC overhead).
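The "add up all numbers you read and print the sum" trick mentioned above keeps the compiler or JIT from eliminating the traversal as dead code. A minimal Java sketch of the idea, using a placeholder message class rather than the real FlatBuffers-generated accessors:

```java
public class TraverseChecksum {
    // Stand-in for a decoded message; a real benchmark would read these
    // fields through FlatBuffers accessors instead of plain fields.
    static final class Message {
        int hp = 80;
        int mana = 150;
        int[] inventory = {1, 2, 3, 4, 5};
    }

    // Sum every value read during traversal into a checksum.
    static long traverse(Message m) {
        long sum = 0;
        sum += m.hp;
        sum += m.mana;
        for (int v : m.inventory) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        Message m = new Message();
        long total = 0;
        for (int i = 0; i < 1_000_000; i++) {
            total += traverse(m);
        }
        // Printing the accumulated sum makes the work observable, so the
        // JIT cannot prove the loop is dead and remove it.
        System.out.println(total);
    }
}
```

JMH's Blackhole serves the same purpose in a more principled way when it is available.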
Thanks :) For the traversal ("use") step I add the numbers up in the same way you did, but I split the encoding/decoding into separate steps in order to get individual results. In the past few days I've tried several different variations (including one monolithic function) and the numbers are coming back roughly the same. So maybe they are what they are.
I agree that predictability is a much more important benefit than raw performance.
There's been no progress on improving the benchmarks. The problem with releasing them is that a) they only work on Windows (Windows timing functions and project files; shouldn't be too hard to fix) and b) they depend on a bunch of external projects (protobuf etc.), for which proper dependencies (submodules) and build dependencies need to be set up, preferably in a cross-platform way (CMake).
If someone wants to work on cleaning this mess up, I'd be happy to give them the code.
For Java, there's already something here: https://github.com/ennerf/flatbuffers-java-benchmark
As a simple first step, a unified benchmark to compare FlatBuffers implementations against each other may be good.
I've based the Java benchmark on the original C++ benchmark. I unfortunately don't know whether or not the Java benchmark would still run, but please feel free to use the code in any way you want.
Ok, I'll make an effort to get the benchmark code out there. Probably initially with the warts mentioned above, on a separate branch in the FlatBuffers repo, and then we can start cleaning it up from there (and add languages).
I threw together a quick port of the benchmark to .NET (https://github.com/evolutional/flatbuffers-java-benchmark/commits/cs-port).
I didn't have the .fbs, so I crafted my own from the generated Java code.
I'm not getting anywhere near the Java numbers above on the PC I ran it on (Win7 x64, Intel Xeon E5-1650v2 @ 3.5 GHz, .NET FX 3.5):
Name Mean StdD Unit 1M/sec
Encode 0.003 0.001 ms/op 3.4
Decode 0.000 0.000 ms/op 0.0
Traverse 0.002 0.001 ms/op 1.7
(10 warm up, 50 measurement iterations)
Either:
a) my benchmarks are bad (possible: we don't have JMH for .NET, and I used a lambda for the test action, which may have added overhead)
or
b) the .NET version of FlatBuffers needs optimization :) (possible)
I'll do some tinkering: re-run the benchmark on my main PC across .NET 3.5, 4.0, and 4.5, with and without the "unsafe" version of the ByteBuffer. I'll run the Java benchmark too, so I can compare them.
Here is the corresponding fbs https://gist.github.com/ennerf/cb7999bd30973acaf794
I've uploaded some numbers for my current Desktop machine in case you'd like more data for comparison: https://gist.github.com/ennerf/d108294623a8c52f984f
Thanks!
Not being too familiar with the Java benchmark suite: is the "score" the measured average?
So 1.137 us/op would mean one operation takes 1.137 microseconds?
The JMH score depends on the settings. In this case, it does represent the average time:
@BenchmarkMode(Mode.AverageTime)
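As a sanity check on that interpretation, an average-time score converts directly into throughput. This small snippet (not part of the benchmark suite) does the conversion for the 1.137 us/op figure:

```java
public class ScoreToThroughput {
    // Convert a JMH AverageTime score in us/op into operations per second.
    static double opsPerSecond(double usPerOp) {
        return 1_000_000.0 / usPerOp; // 1e6 microseconds per second
    }

    public static void main(String[] args) {
        // 1.137 us/op works out to roughly 0.88 million ops/sec.
        System.out.printf("%.0f ops/sec%n", opsPerSecond(1.137));
    }
}
```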
I think I was measuring the timings wrongly. I've adopted the approach here (http://stackoverflow.com/questions/1206367/c-sharp-time-in-microseconds) to measure in microseconds and am now getting these figures:
50 iterations, Win7 x64, Intel Xeon E5-1650v2 @ 3.5 GHz, .NET FX 3.5
Safe ByteBuffer
Name Mean StdD Unit
Encode 1.264 0.486 us/op
Decode 0.018 0.039 us/op
Traverse 0.476 0.181 us/op
Unsafe ByteBuffer
Name Mean StdD Unit
Encode 1.068 0.377 us/op
Decode 0.022 0.042 us/op
Traverse 0.366 0.183 us/op
The unsafe ByteBuffer is noticeably faster on the encode action. The performance still appears to be lower than the figures you posted for the Java version, though, but I can't verify that until I can run the same Java benchmarks on this machine.
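For reference, the mean/StdD-per-operation measurement style above can be sketched in Java with System.nanoTime (a minimal stand-in for both the .NET Stopwatch approach and JMH; the Runnable workload here is a trivial placeholder, not the real benchmark body):

```java
import java.util.concurrent.ThreadLocalRandom;

public class MicroTimer {
    // Run warmup iterations, then time each measured iteration and return
    // { mean, standard deviation } in microseconds per operation.
    public static double[] measure(Runnable op, int warmup, int iterations) {
        for (int i = 0; i < warmup; i++) op.run();
        double[] samplesUs = new double[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.run();
            samplesUs[i] = (System.nanoTime() - start) / 1_000.0; // ns -> us
        }
        double mean = 0;
        for (double s : samplesUs) mean += s;
        mean /= iterations;
        double var = 0;
        for (double s : samplesUs) var += (s - mean) * (s - mean);
        double stdDev = Math.sqrt(var / iterations);
        return new double[] { mean, stdDev };
    }

    public static void main(String[] args) {
        double[] stats = measure(() -> ThreadLocalRandom.current().nextLong(), 10, 50);
        System.out.printf("Mean %.3f StdD %.3f us/op%n", stats[0], stats[1]);
    }
}
```

Note that timing a single sub-microsecond operation per sample puts the nanoTime call overhead inside every measurement; real harnesses like JMH time large batches of operations per sample for exactly this reason.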
Cracking out dotTrace from JetBrains, it looks like the hottest function is FlatBufferBuilder.CreateString(), taking over 50% of the time within the FlatBuffers assembly. Specifically, the call to Encoding.UTF8.GetBytes() is expensive; this is the function that converts .NET wide strings to UTF-8. The next most expensive call is FlatBufferBuilder.EndObject(), but it's nowhere near as expensive as CreateString().
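Since the profile points at wide-string-to-UTF-8 conversion, one common mitigation is to encode frequently reused strings once and hand the cached bytes to the builder. This is only a sketch of that general idea in Java (the actual CreateString optimization in the PR may be different, and the class here is hypothetical, not part of FlatBuffers):

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class Utf8Cache {
    private final Map<String, byte[]> cache = new HashMap<>();

    // Return the UTF-8 bytes for s, encoding at most once per distinct string.
    public byte[] utf8(String s) {
        return cache.computeIfAbsent(s, k -> k.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        Utf8Cache cache = new Utf8Cache();
        byte[] a = cache.utf8("MyMonster");
        byte[] b = cache.utf8("MyMonster");
        // Same array instance on the second lookup: no re-encoding.
        System.out.println(a == b);
    }
}
```

This only pays off when identical strings recur across messages; for unique strings the encoding cost is unavoidable and the win has to come from a faster conversion path instead.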
Made an optimization to CreateString (PR inbound), which changes the timings to:
Safe ByteBuffer
Name Mean StdD Unit
Encode 1.108 0.372 us/op
Decode 0.018 0.039 us/op
Traverse 0.478 0.249 us/op
Unsafe ByteBuffer
Name Mean StdD Unit
Encode 0.988 0.386 us/op
Decode 0.018 0.039 us/op
Traverse 0.380 0.218 us/op
Making a few more tweaks to how Pad/Prep is called, I can save a few more cycles on the benchmark (I'll prepare another PR):
Safe
Name Mean StdD Unit
Encode 1.042 0.358 us/op
Decode 0.018 0.039 us/op
Traverse 0.486 0.246 us/op
Unsafe
Name Mean StdD Unit
Encode 0.906 0.374 us/op
Decode 0.020 0.040 us/op
Traverse 0.358 0.205 us/op
Ran the Java and C# versions of the benchmarks on the same machine (Intel i7-4770K @ 3.50 GHz, Win10 x64).
Results in here https://gist.github.com/evolutional/85b9c6fac33a8455945d
With the optimizations I discuss in the gist, we achieve roughly equivalent performance (with the Java direct ByteBuffer being slightly quicker).
The C++ benchmark code is now in the repo (in its own branch; see the comment at the end of https://google.github.io/flatbuffers/md__benchmarks.html).
Anyone interested in integrating benchmarks for other languages: these could go in the same branch.
Nice, I'll pull over the C# benchmarks
It would be nice to have a comparison with JSON parsed/serialized by LoganSquare too (that library doesn't use reflection, so a benchmark would be fair).
This issue has been automatically marked as stale because it has not had activity for 1 year. It will be automatically closed if no further activity occurs. To keep it open, simply post a new comment. Maintainers will re-open on new activity. Thank you for your contributions.