Comments (3)

jto commented on August 25, 2024

Hey! This is rather odd. It looks like there may be a network issue somewhere. Could you check your namespace events and see if anything stands out?

from flink-on-k8s-operator.

pzim commented on August 25, 2024

@jto - I appreciate the quick response. I believe the networking is fine: I deployed an Ubuntu container with networking tools in the same namespace and can reach each of the jobmanager and taskmanager pods and ports. The only interesting events relate to the taskmanager experiencing the gating issue, whose readiness probes are failing with a 500 response.
The logs in this failing taskmanager still show:

Tried to associate with unreachable remote address [akka.tcp://flink@flinkjobcluster-sample-jobmanager:6123]. Address is now gated for 50 ms, all messages to this address will be delivered to dead letters. Reason: [The remote system has quarantined this system. No further associations to the remote system are possible until this system is restarted.]

Another data point: if I delete the failing taskmanager pod, it comes back up healthy, and the other, previously healthy taskmanager pod becomes unhealthy with the gated message above.
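For anyone repeating the reachability check from a debug pod, here is a minimal sketch of a TCP probe. The helper name `can_connect` is made up for illustration; the service name and port in the usage comment come from the log line above. Note that a successful TCP connect only shows that something accepted the connection. It does not prove that Akka remoting or heartbeats work end to end, which may be why a check like this can pass while the taskmanager is still gated.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it only tests the TCP handshake,
    not any application-level protocol on top of it.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Example usage from a pod in the same namespace (name and port from the log above):
#   can_connect("flinkjobcluster-sample-jobmanager", 6123)
```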

from flink-on-k8s-operator.

pzim commented on August 25, 2024

After taking some packet captures (pcaps), the network traffic between the taskmanagers and the jobmanager looks pretty healthy and normal.

Seeing heartbeat timeouts like the one below led to the thought that our Envoy proxies were blocking the heartbeat communication.

org.apache.flink.runtime.taskexecutor.exceptions.TaskManagerException: The heartbeat of ResourceManager with id cf8116891cd5aec6f77ac3f897359ff6 timed out.

After adding an Istio annotation to exclude traffic for port 6123, both taskmanagers were able to register and stay up.
Closing this issue now, as this resolved the problem.
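The comment above does not say which annotation was used. As a hedged sketch, Istio provides the pod annotations `traffic.sidecar.istio.io/excludeInboundPorts` and `traffic.sidecar.istio.io/excludeOutboundPorts`, which keep the listed ports out of the Envoy sidecar's traffic redirection. The snippet below builds those annotations for port 6123 and prints them as JSON; the function name, and the choice to set both directions, are assumptions for illustration, not the author's confirmed configuration.

```python
import json
from typing import Dict

# Port taken from the akka.tcp address in the log line earlier in this thread.
FLINK_RPC_PORT = 6123


def istio_exclude_port_annotations(port: int) -> Dict[str, str]:
    """Build pod annotations asking Istio not to redirect traffic on `port`.

    Hypothetical helper. Setting both inbound and outbound exclusion is an
    assumption; the thread only says "an istio annotation" was added.
    """
    return {
        "traffic.sidecar.istio.io/excludeInboundPorts": str(port),
        "traffic.sidecar.istio.io/excludeOutboundPorts": str(port),
    }


# Print the annotations so they can be copied into the pod metadata
# of the jobmanager and taskmanager pods.
print(json.dumps(istio_exclude_port_annotations(FLINK_RPC_PORT), indent=2))
```

These annotations have to land on the pods themselves (jobmanager and taskmanagers), and the pods need to be recreated for the sidecar injection to pick them up. How the annotations are passed through depends on how the cluster resource is defined, so check the operator's documentation for the relevant field.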

from flink-on-k8s-operator.
