Comments (6)
I found the bug. I'll send a fix and add more details shortly.
from cloudprober.
@bekriebel I am in the process of exporting the change, but in the mean time, problem was in the fact that we were resetting pktbuf inside the loop to the size of the bytes read:
cloudprober/probes/ping/ping.go
Line 343 in 667a6db
This is not a problem when using datagram sockets because in that case we don't receive packets not belonging to us (kernel takes care of sending only the relevant packets), hence we never receive a smaller packet and pktbuf size doesn't go further down. In the case of raw sockets, we end up reading all ICMP packets. As soon as we'll read a smaller packet (possibly sent by something else on the network), pktbuf will become smaller.
Regarding your diff, I actually wanted to keep memory allocation outside the for loop. For small number of targets it doesn't matter much, but for large number of targets (say 1000+), like we have in some of our setups, it matters quite a bit.
from cloudprober.
@bekriebel Thanks for reporting the issue. We did make some changes to the ping probe's code recently (https://github.com/google/cloudprober/commits/master/probes/ping) to improve the memory performance. I'll take a look at what might be going on here.
from cloudprober.
Thanks, @manugarg. I just did a bisect and did indeed find that the first point where the error presents itself is in commit 8932ffb. I'm going to dig into it a little further, but may not have time to find the root cause right now.
In case it makes a difference, I'm building on Windows Subsystem for Linux. I've attempted to run on both WSL and an ARM board, both platforms have the same issue.
from cloudprober.
Thanks @bekriebel. I confirmed that it's the commit 8932ffb#diff-c9bd39f324facba8690a0ef9b7be50d6 that introduced the bug. In this change, we started parsing ICMP messages ourselves instead of relying on icmp.ParseMessage as icmp.ParseMessage is not memory efficient. There should be a problem somewhere in that parsing logic. I'll continue to look.
Very much appreciate the bug report. It's clear that we need to enhance our testing. I'll do that after finding the root cause of this bug.
from cloudprober.
Taking a guess that it has to do with the pktbuf
channel. It looks like it isn't being reused cleanly. I noticed if I move its creation into the tracker channel's for loop, it fixes the issue; but I'm not sure if that is the cleanest fix.
diff --git a/probes/ping/ping.go b/probes/ping/ping.go
index 6b6b2be..c2ade86 100644
--- a/probes/ping/ping.go
+++ b/probes/ping/ping.go
@@ -306,7 +306,6 @@ func (p *Probe) recvPackets(runID uint16, tracker chan bool) {
received := make(map[packetKey]bool, int(p.c.GetPacketsPerProbe())*len(p.targets))
outstandingPkts := 0
p.conn.setReadDeadline(time.Now().Add(p.opts.Timeout))
- pktbuf := make([]byte, 1500)
for {
// To make sure that we have picked up all the packets sent by the sender, we
// use a tracker channel. Whenever sender successfully sends a packet, it notifies
@@ -325,6 +324,7 @@ func (p *Probe) recvPackets(runID uint16, tracker chan bool) {
}
// Read packet from the socket
+ pktbuf := make([]byte, icmpHeaderSize+p.c.GetPayloadSize())
n, peer, err := p.conn.read(pktbuf)
if err != nil {
from cloudprober.
Related Issues (20)
- Running On Kubernetes pod encounter an error on OpenStack HOT 6
- Support allowed_metrics_regex in Prometheus surfacer HOT 1
- Unable to connect gRPC for dynamic configuration of cloudprober
- Support specifying prober interval / timeouts as durations HOT 1
- RDS Kubernetes Endpoints include pod name HOT 5
- HTTP probe with `file_targets` can begin first iteration with 0 targets HOT 1
- GKE Logging StackDriver Metadata error HOT 11
- Default behaviour of RDS Filter and Probing multiple matched services HOT 4
- Datadog surfacer makes cloudprober binary too big HOT 2
- [Documentation] Document file surfacer output HOT 1
- Reduce resource consumption when using file discovery with same file in multiple probes HOT 23
- Document templating language HOT 5
- Implement caching option in RDS protocol HOT 1
- Allow debugging configuration templates, and playbook HOT 3
- file_targets for probe options HOT 5
- Metrics not updated on external probe timeout HOT 6
- `additional_label`'s are missing in custom metrics of `EXTERNAL` probe HOT 3
- Socket: permission denied when run in Kubernetes HOT 2
- Failed to publish metrics to cloudwatch: MissingRegion: could not find region configuration HOT 3
- Improve documentation for cloudwatch surfacer
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from cloudprober.