Comments (6)

violetagg commented on June 13, 2024

@jtorkkel I can reproduce it and I'm working on a fix. Thanks for the detailed explanation!

from reactor-netty.

jtorkkel commented on June 13, 2024

Thanks. Please consider having separate labels for the proxy and remote_address. That would also show whether a proxy is in use.

I also noticed that "reactor_netty_tcp_client_errors_total", and probably "reactor_netty_http_client_errors_total" too, is inconsistent.

In the logs I can see 6 separate errors: the first 2 are 1s apart and the remaining 4 are a few minutes apart from each other.

 r.netty.http.client.HttpClientConnect    -[ece1fcd2-3, L:/11.11.11.11:11111! R:remote_address=proxy:443/10.10.10.10:8080] The connection observed an error reactor.netty.http.client.PrematureCloseException: Connection prematurely closed BEFORE response

In the Spring Framework metrics I can see the same number of "WebClientRequestException" errors in the HTTP client and the HTTP server: a change of 2 in one scrape and a change of 4 spread over 4 other scrapes.

# error label in SpringBoot 3, in Spring 2 no error label in http client
http_client_requests_seconds_count{dnsname="service", status=~"", error="WebClientRequestException", status="CLIENT_ERROR", method="POST", remote_address="xxx-api:443"} 6

http_server_requests_seconds_count{dnsname="service", status=~"5..", error="WebClientRequestException", method="POST"} 6

But in the Reactor Netty metrics I can see only 3, and they happen at the same time as the last of the 6 errors above, even though the logs show only one error at that point.

reactor_netty_tcp_client_errors_total{dnsname="service", remote_address="xxx-api:443", uri="tcp"} 3

So it sounds like "reactor_netty_*_client_errors_total" is not counting all errors, and is sometimes counting a single error multiple times.

It would also be great to have a label for the error reason, such as the exception type. Spring Framework seems to aggregate all request exceptions into "WebClientRequestException", so you cannot differentiate a connect timeout, a read timeout and a queue timeout. That might be hard, though, as the exceptions happen on so many layers; a read timeout, for example, surfaces as "io.netty.handler.timeout.ReadTimeoutException".

violetagg commented on June 13, 2024

@jtorkkel For Reactor Netty version 1.0.x I'm gonna fix it as I will guarantee the remote address is always the real one and not the proxy address. The new tag with proxy address information I'm gonna add to Reactor Netty version 1.1.x. Wdyt?

For the issue with the number of errors, can you provide a reproducible example?

jtorkkel commented on June 13, 2024

Thanks, it makes sense to add the new label only to 1.1.x.

We were seeing "The connection observed an error reactor.netty.http.client.PrematureCloseException: Connection prematurely closed BEFORE response".

We never found the root cause, but we noticed that after increasing the web client "maxIdleTime" from 2min to 10min and turning on background eviction (every 2min, instead of checking the age on acquire/release) we started to get 100x more "prematurely closed" errors.
It turned out that our load balancer had a 180s max idle timeout plus a 120s background eviction. Requests sent on a connection that had been idle for more than 180s could be closed immediately, but the load balancer actually closes idle sockets with a delay, ~181-301s after they become idle (180 + 0-120s + jitter). Thus there was a race condition.
Most of these errors can be eliminated by decreasing the idle timeout or by adding TCP_KEEPALIVE < 180s.
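
To make the timing concrete, here is a minimal Java sketch of the window described above. It is only an illustration: the class and method names are hypothetical, it assumes the 120s eviction sweep is what delays the close on the load balancer side, and it ignores jitter.

```java
import java.time.Duration;

// Hypothetical helper for illustration; not part of Reactor Netty or any load balancer API.
final class IdleRaceWindow {
    // Figures quoted in this comment: 180s idle timeout, 120s eviction sweep.
    static final Duration IDLE_TIMEOUT = Duration.ofSeconds(180);
    static final Duration EVICTION_INTERVAL = Duration.ofSeconds(120);

    enum State { SAFE, RACY, CLOSED }

    // Classifies a pooled connection by how long it has been idle (jitter ignored).
    static State classify(Duration idle) {
        Duration latestClose = IDLE_TIMEOUT.plus(EVICTION_INTERVAL);
        if (idle.compareTo(IDLE_TIMEOUT) <= 0) {
            return State.SAFE;   // the idle timeout has not expired yet
        }
        if (idle.compareTo(latestClose) <= 0) {
            return State.RACY;   // timeout expired, the sweep may or may not have run
        }
        return State.CLOSED;     // at least one full sweep has passed since the timeout
    }

    // A client-side max idle time avoids the window only if it is below the remote timeout.
    static boolean avoidsRace(Duration clientMaxIdleTime) {
        return clientMaxIdleTime.compareTo(IDLE_TIMEOUT) < 0;
    }

    public static void main(String[] args) {
        System.out.println(classify(Duration.ofSeconds(200)));
        System.out.println(avoidsRace(Duration.ofMinutes(2)));
        System.out.println(avoidsRace(Duration.ofMinutes(10)));
    }
}
```

With these figures the old 2min setting stays below the 180s timeout, while the 10min setting lets connections sit in the pool through the whole racy window.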

While looking for the root cause I was hoping to see the errors in the "reactor_netty_tcp_client_errors_total" metric, but as said, sometimes "prematurely closed" errors were reported, sometimes they were not, and sometimes a single exception resulted in multiple errors (no other exception was seen).

I then tested 5 different test cases:

  1. connect error (configure a localhost:port address, which results in an RST)
  2. connect error after timeout (use 1.1.1.1:11111)
  3. readTimeout (delay the response longer than the read timeout)
  4. pendingAcquireMaxCount
  5. pendingAcquireTimeout (set the pool size to 1 and make multiple parallel calls while delaying responses, which results in queuing)

Cases 1 and 2 correctly resulted in a connect error and, naturally, no error in "reactor_netty_tcp_client_errors_total":

reactor_netty_http_client_connect_time_seconds_count{remote_address="localhost:31111",springBoot="3.2.2",status="ERROR"} 1.0
reactor_netty_http_client_connect_time_seconds_count{remote_address="1.1.1.1:11111",springBoot="3.2.2",status="ERROR"} 1.0

I was expecting that cases 3-5 would also be seen in "reactor_netty_tcp_client_errors_total", but I did not see the counter increase (the series is actually missing).
And naturally, if an error is reported, it would be nice to see which kind it is:

  • pendingAcquireTimeout
  • writeTimeout
  • readTimeout
  • PrematureCloseException
  • etc.

as long as the cardinality does not explode too much (read: < ~50).

Apparently Reactor Netty does not handle most of the exceptions; WebClient handles them instead and aggregates them into a single WebClientRequestException.

org.springframework.web.reactive.function.client.WebClientRequestException: Pool#acquire(Duration) has been pending for more than the configured timeout of 1000ms
reactor.netty.internal.shaded.reactor.pool.PoolAcquireTimeoutException: Pool#acquire(Duration) has been pending for more than the configured timeout of 1000ms

Caused by: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: no further information: localhost/127.0.0.1:11111
Caused by: java.net.ConnectException: Connection refused: no further information
org.springframework.web.reactive.function.client.ExchangeFunctions$DefaultExchangeFunction.lambda$wrapException$9(ExchangeFunctions.java:136) ~[spring-webflux-6.1.3.jar:6.1.3]
io.netty.handler.timeout.ReadTimeoutException: null
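
The traces above suggest that the original cause is still reachable through the cause chain, so an application could derive a low-cardinality reason itself. The following is only a sketch under that assumption: the class, the method names and the tag values are hypothetical, and nothing like this is provided by Reactor Netty or Spring. It matches on simple class names so that it compiles without either library on the classpath.

```java
import java.net.ConnectException;
import java.util.Map;

// Hypothetical helper for illustration; Reactor Netty does not expose such a tag.
final class ErrorReason {
    // Simple class names taken from this thread, mapped to made-up tag values.
    private static final Map<String, String> BY_SIMPLE_NAME = Map.of(
            "ReadTimeoutException", "read_timeout",
            "WriteTimeoutException", "write_timeout",
            "PoolAcquireTimeoutException", "pending_acquire_timeout",
            "PrematureCloseException", "premature_close");

    static final String OTHER = "other";

    // Looks up a tag value by simple class name; unknown names fall back to "other".
    static String forSimpleName(String simpleName) {
        return BY_SIMPLE_NAME.getOrDefault(simpleName, OTHER);
    }

    // Walks the cause chain from the outermost exception inward
    // and returns the first known reason, or "other" if none matches.
    static String of(Throwable error) {
        Throwable current = error;
        // The depth limit guards against pathological cyclic cause chains.
        for (int depth = 0; current != null && depth < 16; depth++) {
            if (current instanceof ConnectException) {
                return "connect";
            }
            String tag = forSimpleName(current.getClass().getSimpleName());
            if (!OTHER.equals(tag)) {
                return tag;
            }
            current = current.getCause();
        }
        return OTHER;
    }

    public static void main(String[] args) {
        Throwable wrapped = new RuntimeException("request failed",
                new ConnectException("Connection refused"));
        System.out.println(of(wrapped));
    }
}
```

The set of possible values is fixed (five reasons plus "other"), which keeps the label well under the cardinality limit mentioned above.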

violetagg commented on June 13, 2024

@jtorkkel PR #3081 adds support for proxy address

violetagg commented on June 13, 2024

@jtorkkel

  • Errors related to connection establishment are reported by reactor.netty.http.client.connect.time with status ERROR
  • Errors related to connection acquisition are reported by reactor.netty.connection.provider.pending.connections.time with status ERROR. This is a new metric introduced in https://github.com/reactor/reactor-netty/releases/tag/v1.1.14 with #2980
  • Errors related to read timeout - there is an issue and this PR #3090 should fix it

For the moment we do not plan to add the type of the error.
