Git Product home page Git Product logo

Comments (15)

serbrech avatar serbrech commented on July 18, 2024

So, the way this works is that, when a local "peer" connection is created, the "remote" peer communicates its idle threshold in the responding frame.
The local peer should then heartbeat at half that rate to make the remote peer happy.

You can see these values when enabling debug on this library :

TX: Open{
  ContainerID : kfGUo_XNZWC6sPfAY_XU1gLrP3btGtZ9wZoupp5vEs_pIlksex5OJQ, 
  Hostname: xxxx.servicebus.windows.net, 
  MaxFrameSize: 65536, 
  ChannelMax: 65534, 
  **IdleTimeout: 1m0s,** 
  OutgoingLocales: [], 
  IncomingLocales: [], 
  OfferedCapabilities: [], 
  DesiredCapabilities: [], 
  Properties: map[framework:go1.14.14 platform:linux product:MSGolangClient user-agent:/golang-service-bus version:0.10.3]
}
21:36:16.171162 RX: Open{
  ContainerID : c668cc23453c4f8489a26897f8f38793_G1, 
  Hostname: , 
  MaxFrameSize: 65536, 
  ChannelMax: 4999, 
  **IdleTimeout: 4m0s,** 
  OutgoingLocales: [], 
  IncomingLocales: [], 
  OfferedCapabilities: [], 
  DesiredCapabilities: [], 
  Properties: map[]
}

So I think the code is correct, you can see that L579 is setting a keepalive timer at half the remote peer's idle timeout value, to make sure we don't reach it.
And if the remote peer does not send a heartbeat to the local peer within the communicated idle-timeout, it should close the connection.

I agree that the spec wording isn't perfect, but the section ends with :

If during operation a peer exceeds the remote peer's idle timeout's threshold, e.g., because it is heavily loaded, it SHOULD gracefully close the connection by using a close frame with an error explaining why."

Which I think makes it clear that, the threshold is a hard threshold, but the heartbeat should be sent at half the peer's timeout.

from go-amqp.

imirkin avatar imirkin commented on July 18, 2024

OK, so then the library is too strict with the idle timeout (not peer idle timeout)? It does:

https://github.com/Azure/go-amqp/blob/master/conn.go#L465

which sets a read deadline of e.g. 1 minute. But then it sends the same 1-minute timeout to the remote side:

https://github.com/Azure/go-amqp/blob/master/conn.go#L808

That should be 1/2 c.idleTimeout (or the read deadline should be set to 2x). Or am I misreading?

from go-amqp.

serbrech avatar serbrech commented on July 18, 2024

no the timeout that it sends as its threshold is a hard timeout, it's the peer's job to heartbeat at half the rate.
So, it sends 1 min. the peer should hearbeat every 30 sec. if after 1 min it does not get a heartbeat, it's over.

*my understanding from the spec...

from go-amqp.

imirkin avatar imirkin commented on July 18, 2024

Right yeah. Assuming your interpretation is correct, this lib does the right thing.

OK, so what do you think about the quote from section 2.4.5, i.e.:

To avoid spurious timeouts, the value in idle-time-out SHOULD be half the peer’s
actual timeout threshold.

Which to my understanding says not that you should heartbeat at 2x the idle timeout (although you are welcome to), but that you should allow a 2x grace period over the idle timeout specified in the protocol? Or are they referring to something else?

from go-amqp.

serbrech avatar serbrech commented on July 18, 2024

this sentence is poorly worded I think. I think it means it needs to heartbeat at half the rate communicated by the peer.
All the other parts of the spec seem to point at this lib's implementation.

it's a peer to peer protocol, so the peer is the "remote" one. your local max idling should be half of the peer's timeout threshold.

i.e :

"If a peer needs to satisfy the need to send traffic to prevent idle timeout, and has nothing to send, it MAY send an empty frame"

This also suggest that the threshold is hard.

from go-amqp.

imirkin avatar imirkin commented on July 18, 2024

As I read it, it means the threshold is allowed to be hard (so this lib is not out of spec), but does not implement the "SHOULD" recommendation. i.e. the "actual timeout threshold" * 1/2 = "value in idle-time-out sent over the wire". But like I said, I'm somewhat new to AMQP 1.0.

"If a peer needs to satisfy the need to send traffic to prevent idle timeout, and has nothing to send, it MAY send an empty frame"

This also suggest that the threshold is hard.

OK. That's not really how I read it -- I see that as mentioning how heartbeats should work, and that heartbeats can be used to prevent idle timeouts.

from go-amqp.

imirkin avatar imirkin commented on July 18, 2024

BTW, it should be noted that IBM MQ does exactly that -- you set the AMQPKA value to x, and it puts x/2 into the IdleTimeout over the wire. I believe it will only time out after x time passes.

I've worked around this issue in the meanwhile by setting the AMQPKA on the IBM MQ side, and setting the idle timeout to 0 on the golang side.

from go-amqp.

serbrech avatar serbrech commented on July 18, 2024

hmm, thanks for sharing.
That seems strange to me. It means they would recommend to both :

  • heartbeat at half the timeout communicated by your peer
  • communicate half your timeout threshold to your peer

so you would get

  • effectiveTimeout : T
  • communicatedTimeout: T/2
  • peerHeartbeat: T/4 (communicatedTimeout/2)

That seems excessive, and a bit undeterministic.

from go-amqp.

imirkin avatar imirkin commented on July 18, 2024

I'm not sure I see where the spec recommends heartbeating at 1/2 the Idle Timeout interval. Instead I think it recommends being forgiving, i.e. heartbeat at <interval>, but allow some variance on it.

Either way, what's important is what other AMQP 1.0 implementations do, since they all have to interop together. With IBM MQ (whose implementation of AMQP 1.0 is a relatively recent feature), I receive heartbeats at the rate specified in IdleTimeout (I did the debug prints, and perhaps added some of my own, I forget precisely). So the go code would sometimes close the connection due to slight differences in timing, so I had to set it to 0. But this is one implementation... not sure what the others do. I know rabbitmq has a AMQP 1.0 plugin, perhaps others out there too. Worth an audit, IMO.

from go-amqp.

serbrech avatar serbrech commented on July 18, 2024

understood.
this other issue on a different library seems to agree with go-amqp : Azure/amqpnetlite#238 (comment)

i.e :

-Developer can configure AMQP idle-timeout for peer 1 (done)
-idle-timeout is communicated to peer 2 from peer 1
-peer 2 sends heartbeat messages to peer 1 at a rate of half of the idle-timeout period
-peer 1 closes the connection if it has not received messages in the specified idle-timeout period

here is the PR implementing the heartbeat at half the peer's timeout value
Azure/amqpnetlite#239

from go-amqp.

serbrech avatar serbrech commented on July 18, 2024

So, I dove into the qpid project, which can be seen as the "canonical" implementation for amqp.
They had the same disagreement between the different implementation.

It seems that they understood the spec as you did, and I stand corrected : peers advertise half of their timeout value, instead of halving it on the heartbeat side:

https://github.com/apache/qpid-proton/pull/15/files

from go-amqp.

serbrech avatar serbrech commented on July 18, 2024

This would indeed make this library close the connection, as the peers would send the heartbeat only on the timeout. In other words, the connection will drop almost always. (but it gets re-established right after).

So

  • sending heartbeat at half the timeout rate is good, it makes sure that this kind of spec misunderstanding is covered.
  • the timeout we send out in the connection frame should be divided by 2

I think that's where we need to divide by 2 :

go-amqp/conn.go

Line 808 in 1a8b5f4

IdleTimeout: c.idleTimeout,

It bothers me, because from a client's perspective, I advertise a timeout, and I don't get a heartbeat, so I close my connection. it should be the peer's responsibility to make sure that a heartbeat is sent in time.
I would say that amqp-go is implemented right, but relies on other clients to treat the timeout as a timeout and not as a signal to send the heartbeat. Advertising half your timeout is safer, as it ensures that you cover other client's laziness :)

from go-amqp.

serbrech avatar serbrech commented on July 18, 2024

@devigned @jhendrixMSFT @catalinaperalta FYI

from go-amqp.

devigned avatar devigned commented on July 18, 2024

@xinchen10 and @clemensv do you folks have insight you would be willing to share about the spec and disagreement across implementations. Thank you ahead of time.

from go-amqp.

kwdubuc avatar kwdubuc commented on July 18, 2024

For what it's worth, the Solace PubSub+ broker will send something (an empty frame, if nothing else) at the interval specified in the idle-time-out value received from its peer's open frame. In the open frame sent by the Solace PubSub+ broker, the idle-time-out value is half of the actual Solace timeout period.

So, when this library connects to a Solace PubSub+ broker, this library does indeed do many spurious disconnections.

from go-amqp.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.