
Comments (6)

krapie commented on September 2, 2024

After more research on gRPC server-side streaming, I found a KubeCon 2018 talk, "Using gRPC for Long-lived and Streaming RPCs" by Eric Anderson (Google), which explains the problems with long-lived gRPC RPCs and how to mitigate them.

Here is what I concluded from that talk.

  1. A gRPC server-side streaming RPC can stay connected for days or longer (typical use cases are watch/notification RPCs).
  2. Server-side streaming can cause problems with load balancing. Load balancing is performed per RPC, so an RPC that has already been created stays connected to its old backend for its whole lifetime and does not move to a new backend, even when one comes up.
  3. MAX_CONNECTION_AGE does not close the connection by itself, so this option alone does not resolve the issue (just sending GOAWAY does not close the connection).
  4. To mitigate the load balancing problem, the server should close the RPC occasionally, and MAX_CONNECTION_AGE_GRACE should be set together with MAX_CONNECTION_AGE to forcefully close the connection. gRPC suggests using these options as a backup so that the connection is eventually closed.

Therefore, I suggest two options for closing the RPC connection.

  1. RPC timer + MAX_CONNECTION_AGE + MAX_CONNECTION_AGE_GRACE: introduce a timer in the WatchDocument RPC to close the connection periodically, and set the MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE options as a backup to close the RPC.
  2. stream_idle_timeout + MAX_CONNECTION_AGE + MAX_CONNECTION_AGE_GRACE: use stream_idle_timeout to detect an idle connection and close it, which minimizes the split-brain time of the connection when the upstream host changes, and set the MAX_CONNECTION_AGE and MAX_CONNECTION_AGE_GRACE options as a backup to close the RPC.

Option 1 is the "graceful" and "suggested" way to resolve this issue, but I think option 2 suits our use case better. Yorkie is used for real-time collaboration, so keeping peers in sync is very important. Noticing a split-brain and closing the connection as soon as possible therefore matters more than closing connections gracefully at long intervals.

Note that stream_idle_timeout will emit errors periodically when a single user keeps a document open without doing anything. So I think we should catch and hide the P2_PROTOCOL_ERROR error caused by stream_idle_timeout from clients.

from yorkie.

krapie commented on September 2, 2024

I've confirmed that the server sends a GOAWAY frame when a stream exceeds MAX_CONNECTION_AGE. To see this, I enabled HTTP/2 tracing by setting the GODEBUG=http2debug=2 environment variable.


--- After stream exceeds MAX_CONNECTION_AGE ---
2023/04/28 18:16:31 http2: Framer 0x14000188000: wrote GOAWAY len=8 LastStreamID=2147483647 ErrCode=NO_ERROR Debug=""
2023/04/28 18:16:31 http2: Framer 0x14000188000: wrote PING len=8 ping="\x01\x06\x01\b\x00\x03\x03\t"
2023/04/28 18:16:31 http2: Framer 0x140006b2000: read GOAWAY len=8 LastStreamID=2147483647 ErrCode=NO_ERROR Debug=""
2023/04/28 18:16:31 http2: Framer 0x140006b2000: read PING len=8 ping="\x01\x06\x01\b\x00\x03\x03\t"
2023/04/28 18:16:31 http2: Framer 0x140006b2000: wrote PING flags=ACK len=8 ping="\x01\x06\x01\b\x00\x03\x03\t"
2023/04/28 18:16:31 http2: Framer 0x14000188000: read PING flags=ACK len=8 ping="\x01\x06\x01\b\x00\x03\x03\t"
2023/04/28 18:16:31 http2: Framer 0x14000188000: wrote GOAWAY len=8 LastStreamID=5 ErrCode=NO_ERROR Debug=""
2023/04/28 18:16:31 http2: Framer 0x140006b2000: read GOAWAY len=8 LastStreamID=5 ErrCode=NO_ERROR Debug=""

But I'm still looking for a way to capture the HTTP/2 GOAWAY frame in gRPC.

krapie commented on September 2, 2024

As far as I understand, gRPC's HTTP/2 transport layer (http2_server and http2_client) handles GOAWAY, but it does not close the stream when it receives GOAWAY. I also couldn't find a way to access HTTP/2 frames through the gRPC Go SDK.

So I posted a question in grpc/grpc-go, hoping the gRPC members can give me a good explanation.

krapie commented on September 2, 2024

I discussed this issue with the gRPC community and found out that our WatchDocument RPC handler is not implemented properly.

Graceful close of connections wait for existing streams to be closed before the connection is closed. If your server RPC handler never returns, then existing streams will not be closed, and therefore graceful connection close will not happen.

Since our WatchDocument server-side streaming RPC handler never returns, its streams are never closed, so the connection is never gracefully closed, even after GOAWAY is sent.

Therefore, we might need to add a timer to our WatchDocument RPC handler so that the handler returns when the timer expires, which lets the graceful connection close proceed. I think we can combine this with MaxConnectionAge and MaxConnectionAgeGrace to ensure the connection is closed properly (e.g., MaxConnectionAge set to 60, the RPC timer to 70, and MaxConnectionAgeGrace to 80 or so).
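The timer idea can be sketched with the standard library alone. This is a minimal illustration, not Yorkie's actual handler: watch, errMaxAge, runDemo, the events channel, and the send callback are hypothetical stand-ins for the handler body, the document event source, and the stream's Send method. How the server should signal "please reconnect" to the client is also left as an assumption.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// errMaxAge is a hypothetical sentinel: the stream was closed on purpose
// because it reached its maximum age, and the client may reconnect.
var errMaxAge = errors.New("watch stream reached max age")

// watch forwards events until the client goes away, the event source
// closes, a send fails, or maxAge elapses. Returning from the handler is
// what allows a graceful connection close to complete.
func watch(ctx context.Context, events <-chan string, send func(string) error, maxAge time.Duration) error {
	timer := time.NewTimer(maxAge)
	defer timer.Stop()

	for {
		select {
		case <-ctx.Done(): // client cancelled or connection dropped
			return ctx.Err()
		case <-timer.C: // stream lived long enough; return so it can be closed
			return errMaxAge
		case ev, ok := <-events:
			if !ok {
				return nil // event source closed
			}
			if err := send(ev); err != nil {
				return err
			}
		}
	}
}

// runDemo sends two buffered events and then lets the timer expire.
func runDemo() (int, error) {
	events := make(chan string, 2)
	events <- "doc-changed"
	events <- "peer-joined"

	sent := 0
	send := func(string) error { sent++; return nil }
	err := watch(context.Background(), events, send, 100*time.Millisecond)
	return sent, err
}

func main() {
	sent, err := runDemo()
	fmt.Println("events sent:", sent, "result:", err)
}
```

In a real gRPC handler, ctx would come from the stream's Context method, and the client would be expected to re-establish the watch after the stream ends.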


krapie commented on September 2, 2024

To conclude:

  • Keep using the stream_idle_timeout + MAX_CONNECTION_AGE + MAX_CONNECTION_AGE_GRACE options.
  • But it would be better to catch and hide, in our clients, the P2_PROTOCOL_ERROR error caused by the forceful connection close of stream_idle_timeout.

hackerwins commented on September 2, 2024

Related to https://github.com/yorkie-team/devops/issues/21
