Git Product home page Git Product logo

Comments (6)

manugarg avatar manugarg commented on April 28, 2024 2

I found the bug. I'll send a fix and add more details shortly.

from cloudprober.

manugarg avatar manugarg commented on April 28, 2024 1

@bekriebel I am in the process of exporting the change, but in the mean time, problem was in the fact that we were resetting pktbuf inside the loop to the size of the bytes read:

pktbuf = pktbuf[:n]

This is not a problem when using datagram sockets because in that case we don't receive packets not belonging to us (kernel takes care of sending only the relevant packets), hence we never receive a smaller packet and pktbuf size doesn't go further down. In the case of raw sockets, we end up reading all ICMP packets. As soon as we'll read a smaller packet (possibly sent by something else on the network), pktbuf will become smaller.

Regarding your diff, I actually wanted to keep memory allocation outside the for loop. For small number of targets it doesn't matter much, but for large number of targets (say 1000+), like we have in some of our setups, it matters quite a bit.

from cloudprober.

manugarg avatar manugarg commented on April 28, 2024

@bekriebel Thanks for reporting the issue. We did make some changes to the ping probe's code recently (https://github.com/google/cloudprober/commits/master/probes/ping) to improve the memory performance. I'll take a look at what might be going on here.

from cloudprober.

bekriebel avatar bekriebel commented on April 28, 2024

Thanks, @manugarg. I just did a bisect and did indeed find that the first point where the error presents itself is in commit 8932ffb. I'm going to dig into it a little further, but may not have time to find the root cause right now.

In case it makes a difference, I'm building on Windows Subsystem for Linux. I've attempted to run on both WSL and an ARM board, both platforms have the same issue.

from cloudprober.

manugarg avatar manugarg commented on April 28, 2024

Thanks @bekriebel. I confirmed that it's the commit 8932ffb#diff-c9bd39f324facba8690a0ef9b7be50d6 that introduced the bug. In this change, we started parsing ICMP messages ourselves instead of relying on icmp.ParseMessage as icmp.ParseMessage is not memory efficient. There should be a problem somewhere in that parsing logic. I'll continue to look.

Very much appreciate the bug report. It's clear that we need to enhance our testing. I'll do that after finding the root cause of this bug.

from cloudprober.

bekriebel avatar bekriebel commented on April 28, 2024

Taking a guess that it has to do with the pktbuf channel. It looks like it isn't being reused cleanly. I noticed if I move its creation into the tracker channel's for loop, it fixes the issue; but I'm not sure if that is the cleanest fix.

diff --git a/probes/ping/ping.go b/probes/ping/ping.go
index 6b6b2be..c2ade86 100644
--- a/probes/ping/ping.go
+++ b/probes/ping/ping.go
@@ -306,7 +306,6 @@ func (p *Probe) recvPackets(runID uint16, tracker chan bool) {
        received := make(map[packetKey]bool, int(p.c.GetPacketsPerProbe())*len(p.targets))
        outstandingPkts := 0
        p.conn.setReadDeadline(time.Now().Add(p.opts.Timeout))
-       pktbuf := make([]byte, 1500)
        for {
                // To make sure that we have picked up all the packets sent by the sender, we
                // use a tracker channel. Whenever sender successfully sends a packet, it notifies
@@ -325,6 +324,7 @@ func (p *Probe) recvPackets(runID uint16, tracker chan bool) {
                }

                // Read packet from the socket
+               pktbuf := make([]byte, icmpHeaderSize+p.c.GetPayloadSize())
                n, peer, err := p.conn.read(pktbuf)

                if err != nil {

from cloudprober.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.