Comments (5)
Note that on ARM/Graviton processors, options 2 and 3 result in a 20%+ throughput increase when running grpc-go's benchmarks with concurrency=100 and a 1 KB payload:
unary-networkMode_none-bufConn_false-keepalive_false-benchTime_5m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1024B-respSize_1024B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false
Title        Before          After           Percentage
TotalOps     34150108        42452710        24.31%
Bytes/op     18312.83        18288.20        -0.13%
Allocs/op    170.81          169.13          -0.59%
ReqT/op      932525615.79    1159242001.07   24.31%
RespT/op     932525615.79    1159242001.07   24.31%
50th-Lat     856.568µs       675.408µs       -21.15%
90th-Lat     976.363µs       817.218µs       -16.30%
99th-Lat     1.559802ms      1.401467ms      -10.15%
Avg-Lat      877.845µs       705.82µs        -19.60%
GoVersion    go1.22.0        go1.22.0
GrpcVersion  1.63.0-dev      1.63.0-dev
The same benchmark on Intel shows a smaller gain, but it is still a clear improvement:
unary-networkMode_none-bufConn_false-keepalive_false-benchTime_5m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1024B-respSize_1024B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false
Title        Before          After           Percentage
TotalOps     34144564        36850123        7.92%
Bytes/op     18305.01        18282.50        -0.13%
Allocs/op    170.25          168.91          -1.17%
ReqT/op      932374227.63    1006254025.39   7.92%
RespT/op     932374227.63    1006254025.39   7.92%
50th-Lat     854.773µs       783.94µs        -8.29%
90th-Lat     992.547µs       946.702µs       -4.62%
99th-Lat     1.564684ms      1.582244ms      1.12%
Avg-Lat      877.519µs       813.102µs       -7.34%
GoVersion    go1.22.0        go1.22.0
GrpcVersion  1.63.0-dev      1.63.0-dev
from grpc-go.
Thanks for looking into this and thinking about potential solutions. After reading both issue threads, I like options 1 and 3, especially if 3 increases throughput and also doesn't allocate when the connection is idle. However, option 1 seems very simple (I don't know too much about the history of these pragmas in this codebase). Doug is out this week and back next week, and I trust his judgement on this, so I'll defer to him on the final decision, but we'd definitely be willing to review any PRs/patches for this :).
Sorry for the delay here.
I'm fine with option (1) as a quick fix here.
What are your thoughts about this option:
- Allocate on the heap, and use a sync.Pool to re-use memory and prevent too much garbage from being created? IIUC stack allocations can permanently grow the stack and would not ever be reduced for idle connections.
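A minimal sketch of the heap-plus-sync.Pool idea described above, not the actual grpc-go change. The names `bufSize`, `bufPool`, and `writeFrame` are hypothetical and exist only for illustration; the buffer size is an assumption as well.

```go
package main

import (
	"fmt"
	"sync"
)

// bufSize is a hypothetical scratch-buffer size, chosen for illustration only.
const bufSize = 16 * 1024

// bufPool hands out pointers to fixed-size arrays. Storing pointers in the
// pool avoids an extra allocation when the value is boxed into an interface.
var bufPool = sync.Pool{
	New: func() any { return new([bufSize]byte) },
}

// writeFrame is a hypothetical stand-in for a hot-path function that needs
// scratch space. It stages payload in a pooled buffer and returns the number
// of bytes staged (at most bufSize).
func writeFrame(payload []byte) int {
	buf := bufPool.Get().(*[bufSize]byte)
	defer bufPool.Put(buf)

	// The pooled buffer is not zeroed between uses, so only the first n
	// bytes are meaningful after the copy.
	n := copy(buf[:], payload)
	return n
}

func main() {
	fmt.Println(writeFrame([]byte("hello")))
}
```

Because the array lives on the heap and is returned to the pool after each use, the goroutine's stack does not have to grow to hold it, and idle pooled buffers can be reclaimed by the garbage collector.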
What are your thoughts about this option:
Allocate on the heap, and use a sync.Pool to re-use memory and prevent too much garbage from being created? IIUC stack allocations can permanently grow the stack and would not ever be reduced for idle connections.
Thanks for asking. I agree that option 4 sounds like the best. I had run benchmarks with option 4 (sync.Pool) and got a visible performance degradation (with and without PGO). The change I tried is here. I didn't report my results here because I couldn't figure out why it was slower. One theory is that loopy ends up not gathering as much data before flushing, resulting in more syscalls; if that's the case, then I'm not quite sure what we should do.
I did find out where the perf improvement on ARM comes from, though: it is from zeroing the array.
Option 4 also gets rid of this zeroing in the hot path, so there's something else going on with sync.Pool that I couldn't explain. I'll try to finish that investigation and then decide on option 4 vs option 1.
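To illustrate the zeroing point, here is a small sketch contrasting a locally declared array with a reused one. It is not taken from grpc-go; `scratchSize`, `withLocalArray`, and `withReusedArray` are hypothetical names. Go's language semantics require a locally declared array to start out zeroed on every call, whereas a reused buffer keeps whatever bytes were left in it. Whether the compiler actually emits zeroing instructions for a given function depends on the compiler and the code.

```go
package main

import "fmt"

// scratchSize is a hypothetical buffer size, chosen for illustration only.
const scratchSize = 4096

// withLocalArray declares its scratch array locally. The language guarantees
// the array is all zeros at the start of every call, so the last byte is 0
// unless the payload overwrites it.
func withLocalArray(payload []byte) (n int, tail byte) {
	var scratch [scratchSize]byte
	n = copy(scratch[:], payload)
	return n, scratch[scratchSize-1]
}

// withReusedArray writes into a caller-owned array. Nothing is zeroed here,
// so bytes beyond the first n are stale data from earlier uses.
func withReusedArray(scratch *[scratchSize]byte, payload []byte) (n int, tail byte) {
	n = copy(scratch[:], payload)
	return n, scratch[scratchSize-1]
}

func main() {
	var reused [scratchSize]byte
	reused[scratchSize-1] = 0xFF // simulate leftover data from a previous use

	_, localTail := withLocalArray([]byte("abc"))
	_, reusedTail := withReusedArray(&reused, []byte("abc"))
	fmt.Println(localTail, reusedTail) // prints "0 255"
}
```

The trade-off is that code using a reused buffer must track how many bytes are valid, since it cannot rely on the rest being zero.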
This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.