
Comments (21)

chandradeepak commented on July 21, 2024

@tamalsaha, yeah, it might be. We also saw high CPU usage while the memory growth was high. From your call graph, it seems most of the time is spent reading the settings frame, writing the settings frame, and writing the settings frame ack, so it might be related to what I'm seeing.


chandradeepak commented on July 21, 2024

We don't see the memory leak any more. Even if it is leaking elsewhere, the reason pprof points at cmux is that this is where the allocation happens when a new request comes in. In any case, the leak is gone now that we have set framer.SetMaxReadFrameSize(65536) and capped the loop at 128 iterations, as you suggested before.

I wonder whether those changes had an impact.

Thanks for your patience.


soheilhy commented on July 21, 2024

Well, even if that code path is the major source of allocations in pprof, it doesn't really mean there is a legit leak.

Having said that, I can think of two explanations:

  1. A client or a collection of clients are sending frames that are neither settings nor header.
  2. A client or a collection of clients are sending very large frames.

I'm not sure which one is the root cause. Can you please apply the following (hacky) patch to your project and test whether you see a major change? Feel free to increase the 64K and 128 hard-coded values if needed.

diff --git a/matchers.go b/matchers.go
index 485ede8..74a3d10 100644
--- a/matchers.go
+++ b/matchers.go
@@ -146,6 +146,7 @@ func matchHTTP2Field(w io.Writer, r io.Reader, name, value string) (matched bool

        done := false
        framer := http2.NewFramer(w, r)
+       framer.SetMaxReadFrameSize(65536)
        hdec := hpack.NewDecoder(uint32(4<<10), func(hf hpack.HeaderField) {
                if hf.Name == name {
                        done = true
@@ -154,7 +155,7 @@ func matchHTTP2Field(w io.Writer, r io.Reader, name, value string) (matched bool
                        }
                }
        })
-       for {
+       for i := 0; i < 128; i++ {
                f, err := framer.ReadFrame()
                if err != nil {
                        return false
@@ -181,4 +182,5 @@ func matchHTTP2Field(w io.Writer, r io.Reader, name, value string) (matched bool
                        return matched
                }
        }
+       return matched
 }


chandradeepak commented on July 21, 2024

Yeah, we have Java clients that connect. In the second scenario, when you say large frames, what size do you expect them to be? I don't expect that to be the cause, because the memory growth happens over a long time frame, typically over a span of 15 days. So can we rule that out?

As for clients that send frames that are neither settings nor headers, I suspect some Java clients may be behaving like that; I couldn't reproduce the scenario with Go clients in my tests.

Thanks for the workaround. I will try it and let you know, but it will take a while to confirm since the memory growth happens slowly.


chandradeepak commented on July 21, 2024

Hello soheilhy,
I have more details about when this memory growth might happen, since we see it consistently in our production. I tried the above fix, but it caused all the clients to have their connections rejected. That revealed an interesting aspect: the clients send the correct headers frame late, after more than 128 frames.

So we tried simulating that using the sample gRPC hello-world program, and these were our observations. Please download the latest sample client and server from
https://github.com/chandradeepak/memgrowth.

I have your fixed code, with a couple of print statements added, in
memgrowth/server/vendor/github.com/soheilhy/cmux/matchers.go

I tweaked it to find out how long it takes to see the matching frame when multiple connections happen: I loop forever, and once the frame is found I print the iteration count. With multiple connections, the number of loop iterations before the correct frame appears goes up (see the sketch below).

I have seen the count go up to 1000K as well. If a single connection has to read 1MM frames before matching, you can expect memory growth, and that would look like a leak.
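Roughly, the instrumentation looks like this (a simplified sketch in the spirit of the patch above, not the exact code in the memgrowth repo; the count variable and the log line are illustrative and need a "log" import, and the ContinuationFrame handling is omitted for brevity):

        count := 0
        for {
                count++
                f, err := framer.ReadFrame()
                if err != nil {
                        return false
                }
                switch f := f.(type) {
                case *http2.SettingsFrame:
                        // cmux answers every SETTINGS frame with one of its own.
                        if err := framer.WriteSettings(); err != nil {
                                return false
                        }
                case *http2.HeadersFrame:
                        // Feed the header block to the hpack decoder; its callback
                        // sets done/matched once the field we look for shows up.
                        if _, err := hdec.Write(f.HeaderBlockFragment()); err != nil {
                                return false
                        }
                }
                if done {
                        log.Printf("cmux: matched after reading %d frames", count)
                        return matched
                }
        }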

One other notable point: I didn't see this with the sample hello-world pb. So I added a new RPC with a bidirectional stream, called Send, and that triggered it.
It is defined in server/vendor/google.golang.org/grpc/examples/helloworld/helloworld/helloworld.proto:
rpc Send (stream HelloRequest) returns (stream HelloReply) {}

We have a pool of connections, each creating multiple streams and sending data one after another, as sketched below. Assuming there is a new bufferedReader for each MuxConn, multiple connections shouldn't impact each other.
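The client-side load is roughly the following (an illustrative fragment only: it assumes the new Send RPC was added to the stock Greeter service, and names such as conns and streamsPerConn are placeholders rather than code from the memgrowth repo):

        for _, conn := range conns { // conns: pool of *grpc.ClientConn
                client := pb.NewGreeterClient(conn)
                for i := 0; i < streamsPerConn; i++ {
                        // Open a bidirectional stream and exchange one message on it.
                        stream, err := client.Send(context.Background())
                        if err != nil {
                                log.Fatal(err)
                        }
                        if err := stream.Send(&pb.HelloRequest{Name: "world"}); err != nil {
                                log.Fatal(err)
                        }
                        if _, err := stream.Recv(); err != nil {
                                log.Fatal(err)
                        }
                }
        }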

Please run the sample client and server to find out more. I am not sure where the problem is; it might be in gRPC.

Please let us know what you think, and let me know if you need help reproducing it.


chandradeepak commented on July 21, 2024

One thing I found out is in matchHTTP2Field:

func matchHTTP2Field(w io.Writer, r io.Reader, name, value string) (matched bool) {
        ...
        case *http2.SettingsFrame:
                if !f.IsAck() {
                        if err := framer.WriteSettings(); err != nil {
                                return false
                        }
                }
        ...
}

If we ignore the settings frames that are acks (as above), we can get rid of what I saw earlier. It seems the matcher currently treats settings acks like ordinary settings frames and replies to them too, which caused this. But settings frames carry no payload, so that doesn't explain the memory increase; still debugging.


tamalsaha commented on July 21, 2024

@chandradeepak, any chance #42 is also related?


soheilhy commented on July 21, 2024

I'm still trying to understand whether this is expected or not. After all, I do need to write out the settings frame, and that consumes more CPU and memory.

Can you guys test what happens if we change WriteSettings to WriteSettingsAck in this line:

if err := framer.WriteSettings(); err != nil {
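In other words, in the same style as the earlier patch (only the one call changes; WriteSettingsAck writes an empty SETTINGS frame with the ACK flag instead of a full SETTINGS frame):

 	case *http2.SettingsFrame:
-		if err := framer.WriteSettings(); err != nil {
+		if err := framer.WriteSettingsAck(); err != nil {
 			return false
 		}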


chandradeepak commented on July 21, 2024

Sure, I will test it and let you know.


soheilhy commented on July 21, 2024

Is there any progress on this issue?


tamalsaha commented on July 21, 2024

We have stopped using cmux because of the CPU usage issue.


soheilhy commented on July 21, 2024

Sorry to hear that, but you are using m.MatchWithWriters(...), and that's the expected consequence. Your problem will go away if you use m.Match; the two setups are contrasted in the sketch below.
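For reference, the two setups look roughly like this (a sketch using cmux's stock HTTP/2 matchers; the "content-type"/"application/grpc" pair is the usual gRPC example rather than something from this thread, and you would register only one of the two for your gRPC traffic):

	m := cmux.New(l) // l is your net.Listener

	// Option A: plain matcher. cmux only reads the client's frames and writes
	// nothing back. Cheapest, but grpc-java waits for the server's SETTINGS
	// frame before sending its headers, so Java clients stall on this path.
	grpcL := m.Match(cmux.HTTP2HeaderField("content-type", "application/grpc"))

	// Option B: MatchWriter. cmux replies with a SETTINGS frame while matching,
	// which unblocks grpc-java at the cost of extra work per new connection.
	grpcLForJava := m.MatchWithWriters(
		cmux.HTTP2MatchHeaderFieldSendSettings("content-type", "application/grpc"))

Whichever listener you pick then goes to grpc.Server.Serve as usual.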


tamalsaha commented on July 21, 2024

The issue is that we have Java clients, and those need MatchWithWriters.


chandradeepak commented on July 21, 2024

Hi,
In our case I deployed the fix on some nodes, but due to load balancing the traffic is not being routed to the nodes that have the fix. I will come back with results once we get real traffic. Sorry for the delay.


soheilhy commented on July 21, 2024

Thanks Chandra!


tamalsaha commented on July 21, 2024

@soheilhy, do you know why Java clients need that settings frame? Do you think it would be worth filing a bug with the java-grpc repo to remove this requirement?


soheilhy commented on July 21, 2024

@tamalsaha, it doesn't have to be implemented the way Java gRPC did it, even though that behavior is standards-conformant. It's definitely worth trying to get it fixed there. Thanks!


tamalsaha commented on July 21, 2024

I filed a bug and cc-ed you. We will see what they say. Thanks!


soheilhy commented on July 21, 2024

I just merged a patch that might explain why you were seeing the leak. @chandradeepak, could you try with head again, please?


chandradeepak commented on July 21, 2024

Sure, @soheilhy, I will try it out. Recently we haven't been seeing the issue, and it only rarely occurs on some machines, so I will deploy this fix in our next push and let you know if it fixes the problem. But it takes some time before it reproduces.

Thanks for the fix.


soheilhy commented on July 21, 2024

Thank you, @chandradeepak. I have a hunch that commit 0068a46 has resolved the root cause. Would you mind reverting framer.SetMaxReadFrameSize(65536) and the 128-iteration limit on the loop for a quick test?

