Comments (5)

atollena commented on May 23, 2024

Note that on ARM/Graviton processors, options 2 and 3 result in a 20%+ throughput increase when running grpc-go's benchmarks with concurrency=100 and a 1 KB payload:

unary-networkMode_none-bufConn_false-keepalive_false-benchTime_5m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1024B-respSize_1024B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false
               Title       Before        After Percentage
            TotalOps     34150108     42452710    24.31%
            Bytes/op     18312.83     18288.20    -0.13%
           Allocs/op       170.81       169.13    -0.59%
             ReqT/op 932525615.79 1159242001.07    24.31%
            RespT/op 932525615.79 1159242001.07    24.31%
            50th-Lat    856.568µs    675.408µs   -21.15%
            90th-Lat    976.363µs    817.218µs   -16.30%
            99th-Lat   1.559802ms   1.401467ms   -10.15%
             Avg-Lat    877.845µs     705.82µs   -19.60%
           GoVersion     go1.22.0     go1.22.0
         GrpcVersion   1.63.0-dev   1.63.0-dev

The same benchmark run on Intel shows a smaller gain, but it is still a clear improvement:

unary-networkMode_none-bufConn_false-keepalive_false-benchTime_5m0s-trace_false-latency_0s-kbps_0-MTU_0-maxConcurrentCalls_100-reqSize_1024B-respSize_1024B-compressor_off-channelz_false-preloader_false-clientReadBufferSize_-1-clientWriteBufferSize_-1-serverReadBufferSize_-1-serverWriteBufferSize_-1-sleepBetweenRPCs_0s-connections_1-recvBufferPool_nil-sharedWriteBuffer_false
               Title       Before        After Percentage
            TotalOps     34144564     36850123     7.92%
            Bytes/op     18305.01     18282.50    -0.13%
           Allocs/op       170.25       168.91    -1.17%
             ReqT/op 932374227.63 1006254025.39     7.92%
            RespT/op 932374227.63 1006254025.39     7.92%
            50th-Lat    854.773µs     783.94µs    -8.29%
            90th-Lat    992.547µs    946.702µs    -4.62%
            99th-Lat   1.564684ms   1.582244ms     1.12%
             Avg-Lat    877.519µs    813.102µs    -7.34%
           GoVersion     go1.22.0     go1.22.0
         GrpcVersion   1.63.0-dev   1.63.0-dev

from grpc-go.

zasweq commented on May 23, 2024

Thanks for looking into this and thinking about potential solutions. After reading both issue threads, I like options 1 and 3, especially if 3 increases throughput and also doesn't allocate when the connection is idle. However, option 1 seems very simple (I don't know much about the history of these pragmas in this codebase). Doug is out this week and back next week; I trust his judgement on this, so I'll defer to him on the final decision, but we'd definitely be willing to review any PRs/patches for this :).

dfawley commented on May 23, 2024

Sorry for the delay here.

I'm fine with option (1) as a quick fix here.

What are your thoughts about this option:

  1. Allocate on the heap, and use a sync.Pool to re-use memory and prevent too much garbage from being created? IIUC stack allocations can permanently grow the stack and would not ever be reduced for idle connections.

atollena commented on May 23, 2024

What are your thoughts about this option:
Allocate on the heap, and use a sync.Pool to re-use memory and prevent too much garbage from being created? IIUC stack allocations can permanently grow the stack and would not ever be reduced for idle connections.

Thanks for asking. I agree that option 4 sounds like the best. I had run benchmarks with option 4 (sync.Pool) and got a visible performance degradation (with and without PGO). The change I tried is here. I didn't report my results here because I couldn't figure out why it was slower. One theory is that loopy ends up not gathering as much data before flushing, resulting in more syscalls. If that's the case, then I'm not quite sure what we should do.

I did find out where the perf improvement on ARM comes from, though: it comes from no longer zeroing the array:

[Screenshot 2024-02-22 at 13 39 16]

Option 4 also gets rid of this zeroing in the hot path, so there's something else going on with sync.Pool that I couldn't explain. I'll try to finish that investigation and then decide on option 4 vs option 1.

github-actions commented on May 23, 2024

This issue is labeled as requiring an update from the reporter, and no update has been received after 6 days. If no update is provided in the next 7 days, this issue will be automatically closed.
