
nux's Issues

If a process rewrites its cmdline, the current logic cannot ignore the spaces.

nux/proc.go

Line 67 in debb382

if cmdlineBytes[j] != 0 {

Some processes, such as nginx, rewrite their cmdline:

[root@vm-vm114 falcon-agent]# ps aux|grep nginx
root     14811  0.0  0.0 108964  1884 ?        Ss   19:18   0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf
nginx    14812  0.0  0.1 109388  2736 ?        S    19:18   0:00 nginx: worker process
nginx    14813  0.0  0.1 109388  2656 ?        S    19:18   0:00 nginx: worker process
root     15182  0.0  0.0 103336   864 pts/2    S+   19:57   0:00 grep nginx
[root@vm-vm114 falcon-agent]#
[root@vm-vm114 falcon-agent]# xxd /proc/14811/cmdline
0000000: 6e67 696e 783a 206d 6173 7465 7220 7072  nginx: master pr
0000010: 6f63 6573 7320 2f75 7372 2f73 6269 6e2f  ocess /usr/sbin/
0000020: 6e67 696e 7820 2d63 202f 6574 632f 6e67  nginx -c /etc/ng
0000030: 696e 782f 6e67 696e 782e 636f 6e66       inx/nginx.conf

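As the dump shows, the positions that would normally hold 0x00 contain 0x20 (the space character) instead. The process cmdline that falcon-agent obtains through this library therefore contains spaces, so the cmdline cannot be matched.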
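One way to make matching independent of the separator is to treat both 0x00 and whitespace as argument separators. The sketch below is only an illustration: `normalizeCmdline` is a hypothetical helper, not a function in nux, and joining the arguments with a single space is an assumption about what the caller wants to match against.

```go
package main

import (
	"bytes"
	"fmt"
)

// normalizeCmdline is a hypothetical helper, not part of nux.
// It turns the raw bytes of /proc/<pid>/cmdline into one canonical string,
// whether the arguments are NUL-separated (the usual case) or already
// space-separated (a title rewritten by the process, as nginx does).
func normalizeCmdline(raw []byte) string {
	// Turn every NUL separator into a space.
	spaced := bytes.ReplaceAll(raw, []byte{0}, []byte{' '})
	// Split on whitespace, which drops empty pieces, then re-join with
	// a single space. Assumption: the caller matches on this form.
	return string(bytes.Join(bytes.Fields(spaced), []byte{' '}))
}

func main() {
	normal := []byte("/usr/sbin/nginx\x00-c\x00/etc/nginx/nginx.conf\x00")
	rewritten := []byte("nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf")
	fmt.Println(normalizeCmdline(normal))
	fmt.Println(normalizeCmdline(rewritten))
}
```

A caller that prefers the separators dropped entirely could join with an empty slice instead of a space; either way both forms of cmdline end up in the same shape.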

crash on Fedora

ss_s.go

arr := strings.Split(content, ", ")
for _, val := range arr {
        fields := strings.Fields(val)
        if fields[0] == "timewait" {
                timewait_arr := strings.Split(fields[1], "/")
                m["timewait"], _ = strconv.ParseUint(timewait_arr[0], 10, 64)
                m["slabinfo.timewait"], _ = strconv.ParseUint(timewait_arr[1], 10, 64)
                continue
        }
        m[fields[0]], _ = strconv.ParseUint(fields[1], 10, 64)
}
return

ss -s on Fedora returns:

Total: 843
TCP:   89 (estab 56, closed 11, orphaned 0, timewait 2)

Transport Total     IP        IPv6
RAW	  1         0         1        
UDP	  8         5         3        
TCP	  78        48        30       
INET	  87        53        34       
FRAG	  0         0         0

while on CentOS, it returns:

Total: 525 (kernel 0)
TCP:   14 (estab 8, closed 0, orphaned 0, synrecv 0, timewait 0/0), ports 0

Transport Total     IP        IPv6
*	  0         -         -        
RAW	  0         0         0        
UDP	  7         4         3        
TCP	  14        11        3        
INET	  21        15        6        
FRAG	  0         0         0

So, it always crashes on Fedora with:

panic: runtime error: index out of range
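The panic is consistent with `timewait_arr[1]` being read when the Fedora output has `timewait 2` with no `/`. A defensive variant might look like the sketch below. `parseTCPSummary` is a hypothetical name, and the sketch assumes `content` is the text between the parentheses on the `TCP:` line; it is not the actual nux code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseTCPSummary is a hypothetical helper, not the nux implementation.
// content is assumed to be the parenthesised part of the "TCP:" line of
// `ss -s`, e.g. "estab 56, closed 11, orphaned 0, timewait 2".
func parseTCPSummary(content string) map[string]uint64 {
	m := make(map[string]uint64)
	for _, val := range strings.Split(content, ",") {
		fields := strings.Fields(val)
		if len(fields) < 2 {
			continue // skip malformed entries instead of indexing past the end
		}
		if fields[0] == "timewait" {
			parts := strings.Split(fields[1], "/")
			m["timewait"], _ = strconv.ParseUint(parts[0], 10, 64)
			// Only the "N/M" form (seen on CentOS) carries a second number.
			if len(parts) > 1 {
				m["slabinfo.timewait"], _ = strconv.ParseUint(parts[1], 10, 64)
			}
			continue
		}
		m[fields[0]], _ = strconv.ParseUint(fields[1], 10, 64)
	}
	return m
}

func main() {
	fmt.Println(parseTCPSummary("estab 56, closed 11, orphaned 0, timewait 2"))
	fmt.Println(parseTCPSummary("estab 8, closed 0, orphaned 0, synrecv 0, timewait 0/0"))
}
```

With the length checks in place, the Fedora form simply leaves `slabinfo.timewait` unset rather than crashing.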

Double count Guest Time in Total CPU Time

When running a virtual machine (KVM) on a host, I found that the host CPU usage reported by falcon agent differs from the value shown by the top command. falcon agent uses the nux package to calculate CPU usage:
https://github.com/open-falcon-archive/agent/blob/master/funcs/cpustat.go#L19

In the Linux kernel, user time already includes guest time. So when reading CPU info from /proc/stat, we should not add guest time a second time.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/kernel/sched/cputime.c#n138
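To illustrate the point, the sketch below sums a `cpu` line from /proc/stat without the guest columns. `cpuTotal` is a hypothetical helper, not the nux API. It assumes the usual column order (user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice) and also leaves out guest_nice, which the kernel accounts into nice in the same way.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// cpuTotal is a hypothetical helper, not part of nux.
// It sums the counters of one "cpu" line from /proc/stat, stopping after
// the eighth column (steal) so guest and guest_nice are not counted twice.
func cpuTotal(line string) (uint64, error) {
	fields := strings.Fields(line)
	if len(fields) < 5 || !strings.HasPrefix(fields[0], "cpu") {
		return 0, fmt.Errorf("not a cpu line: %q", line)
	}
	counters := fields[1:]
	if len(counters) > 8 {
		counters = counters[:8] // drop guest and guest_nice
	}
	var total uint64
	for _, f := range counters {
		v, err := strconv.ParseUint(f, 10, 64)
		if err != nil {
			return 0, err
		}
		total += v
	}
	return total, nil
}

func main() {
	// Sample values are made up for illustration.
	total, err := cpuTotal("cpu  100 10 50 800 20 5 5 10 30 4")
	fmt.Println(total, err)
}
```

On older kernels that print fewer columns, the slice is simply summed as is.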

ignore file type: squashfs

Ubuntu 18.04 runs the snapd process by default, which automatically mounts squashfs files.

http://squashfs.sourceforge.net/

Squashfs is a compressed read-only filesystem for Linux. Squashfs is intended for general read-only filesystem use, for archival use (i.e. in cases where a .tar.gz file may be used), and in constrained block device/memory systems (e.g. embedded systems) where low overhead is needed.

As squashfs is a read-only filesystem, mounting a squashfs file is like mounting a read-only CD/DVD ISO file. Maybe we should just ignore this filesystem type.

#df -h
Filesystem                         Size  Used Avail Use% Mounted on
udev                               962M     0  962M   0% /dev
tmpfs                              199M  1.1M  198M   1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv  118G   11G  108G  10% /
tmpfs                              992M   12K  992M   1% /dev/shm
tmpfs                              5.0M     0  5.0M   0% /run/lock
tmpfs                              992M     0  992M   0% /sys/fs/cgroup
/dev/loop0                          90M   90M     0 100% /snap/core/6130
/dev/loop1                          90M   90M     0 100% /snap/core/6673
/dev/loop2                          89M   89M     0 100% /snap/core/7396
/dev/xvda2                         976M  143M  767M  16% /boot
tmpfs                              199M     0  199M   0% /run/user/47483
#df -i
Filesystem                          Inodes  IUsed    IFree IUse% Mounted on
udev                                246153    430   245723    1% /dev
tmpfs                               253889    670   253219    1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 61865984 181453 61684531    1% /
tmpfs                               253889      4   253885    1% /dev/shm
tmpfs                               253889      3   253886    1% /run/lock
tmpfs                               253889     18   253871    1% /sys/fs/cgroup
/dev/loop0                           12810  12810        0  100% /snap/core/6130
/dev/loop1                           12819  12819        0  100% /snap/core/6673
/dev/loop2                           12823  12823        0  100% /snap/core/7396
/dev/xvda2                           65536    309    65227    1% /boot
tmpfs                               253889     10   253879    1% /run/user/47483

ignore file type: iso9660

After mounting an ISO file, the filesystem info is shown below.
As ISO files are read-only, maybe we can just ignore them.

mount -o loop ubuntu-18.04.2-server-amd64.iso  /mnt/

df -h
/dev/loop0                  883M  883M     0 100% /mnt

df -i
/dev/loop0                        0      0        0     - /mnt


fs_spec: /dev/loop0
fs_file: /mnt
fs_vfstype: iso9660

dfstat: filesystem type filtering

dfstat.go

CentOS:

proc /run/docker/netns/1fea574dae2e proc rw,nosuid,nodev,noexec,relatime 0 0

Ubuntu:

nsfs /run/docker/netns/b6fb8a43129f nsfs rw 0 0

Ubuntu has an additional nsfs type.
Could it be added to the FSTYPE_IGNORE list?
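The three filesystem types raised in these issues (squashfs, iso9660, nsfs) could be skipped with a simple lookup before collecting disk stats. This is only a sketch: `ignoredFSTypes` and `shouldIgnore` are hypothetical names, and the real shape and contents of FSTYPE_IGNORE in nux are not shown here.

```go
package main

import "fmt"

// ignoredFSTypes is a hypothetical set used for illustration only.
// It lists the types proposed in the issues above; it does not reproduce
// the actual FSTYPE_IGNORE list in nux.
var ignoredFSTypes = map[string]struct{}{
	"squashfs": {}, // read-only snap images mounted by snapd
	"iso9660":  {}, // read-only ISO images
	"nsfs":     {}, // namespace mounts such as /run/docker/netns/*
}

// shouldIgnore reports whether a mount with the given fs_vfstype
// should be left out of the disk statistics.
func shouldIgnore(fsVfsType string) bool {
	_, ok := ignoredFSTypes[fsVfsType]
	return ok
}

func main() {
	for _, t := range []string{"ext4", "squashfs", "iso9660", "nsfs"} {
		fmt.Printf("%-8s ignore=%v\n", t, shouldIgnore(t))
	}
}
```

A set keyed by type name keeps the check constant-time and makes adding another type a one-line change.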
