trace-noschedule's Introduction

Trace-noschedule

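In practice, services often run into problems caused by high latency, and that latency can come from several sources. Our kernels are configured by default without kernel-mode preemption. If process A traps into kernel mode and runs for too long, it inevitably affects other processes waiting to run on the same CPU, which shows up as scheduling latency. For this case we developed a tool that specifically tracks processes which stay in kernel mode for a long time without scheduling. It gives us a direction to follow when troubleshooting. The tool is now complete and is named Trace-noschedule.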

How to install

Installing trace-noschedule is simple. After cloning the code with git clone, run the following commands.

make -j8
make install
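
The build compiles a kernel module against the build tree of the running kernel, so it can help to confirm that this tree exists before running make. A minimal sketch, assuming the conventional /lib/modules/<release>/build location (the path that appears in the make output quoted in the issues); the function name check_kernel_build_dir and its optional root-prefix argument are hypothetical and not part of the project:

```shell
# Hypothetical pre-build check, not part of trace-noschedule.
# Usage: check_kernel_build_dir [kernel-release] [root-prefix]
check_kernel_build_dir() {
    release=${1:-$(uname -r)}   # default: the running kernel
    root=${2:-}                 # optional prefix, mainly useful for testing
    build_dir="$root/lib/modules/$release/build"
    if [ -d "$build_dir" ]; then
        echo "kernel build directory found: $build_dir"
        return 0
    fi
    echo "missing $build_dir; install the kernel headers for $release" >&2
    return 1
}
```

It could be chained in front of the build, for example: check_kernel_build_dir && make -j8 && make install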

How to use

After trace-noschedule has been installed successfully, the following /proc/trace_noschedule directory is created.

ls /proc/trace_noschedule
distribution  enable  stack_trace  threshold

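The /proc/trace_noschedule directory contains 4 files: distribution, enable, stack_trace, and threshold. After installation, the tool is disabled by default.

1. Enable the tracer

Run the following command to enable the tracer.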

echo 1 > /proc/trace_noschedule/enable

2. Disable the tracer

Run the following command to disable the tracer.

echo 0 > /proc/trace_noschedule/enable

Note: remember to disable the tracer once you have finished debugging. The module is implemented on top of the sched tracepoint, so its overhead is not negligible.
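
Since the tracer should not be left enabled, the enable and disable steps can be wrapped around a single command. The following is only a sketch: with_tracer and the TRACE_DIR override are hypothetical names introduced here, and the function body runs in a subshell so that its EXIT trap writes 0 back to enable when the wrapped command finishes.

```shell
# Hypothetical wrapper, not part of trace-noschedule.
# Usage: with_tracer <command> [args...]
# TRACE_DIR may be overridden; it defaults to /proc/trace_noschedule.
with_tracer() (
    dir=${TRACE_DIR:-/proc/trace_noschedule}
    echo 1 > "$dir/enable" || exit 1
    # The EXIT trap of this subshell switches the tracer off again,
    # whether the command below succeeds or fails.
    trap 'echo 0 > "$dir/enable"' EXIT
    "$@"
)
```

For example, with_tracer sleep 30 would keep the tracer on for roughly 30 seconds and then turn it off.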

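3. Set the threshold

trace_noschedule records a stack trace only for processes that run in kernel mode longer than the threshold without scheduling. To keep the tool efficient, set a reasonable threshold. For example, to set a threshold of 60 ms (unit: ns):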

echo 60000000 > /proc/trace_noschedule/threshold
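
Because the threshold is written in nanoseconds, it is easy to miscount zeros. A small sketch of a millisecond-based helper follows; ms_to_ns, set_threshold_ms and the TRACE_DIR override are hypothetical names, not something the tool provides.

```shell
# Hypothetical helpers, not part of trace-noschedule.
# Usage: set_threshold_ms <milliseconds>
ms_to_ns() {
    case $1 in
        ''|*[!0-9]*) echo "expected a non-negative integer, got '$1'" >&2
                     return 1 ;;
    esac
    echo $(( $1 * 1000000 ))    # 1 ms = 1,000,000 ns
}

set_threshold_ms() {
    dir=${TRACE_DIR:-/proc/trace_noschedule}
    ns=$(ms_to_ns "$1") || return 1
    echo "$ns" > "$dir/threshold"
}
```

With these definitions, set_threshold_ms 60 writes 60000000, the same value as the command above.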

4. View the time distribution of processes that ran in kernel mode for a long time without scheduling

cat /proc/trace_noschedule/distribution

Trace noschedule thread:
     msecs      : count   distribution
    20 -> 39    : 1     |**********                              |
    40 -> 79    : 0     |                                        |
    80 -> 159   : 4     |****************************************|
   160 -> 319   : 2     |********************                    |

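In this example, there were 4 occurrences in which a process ran in kernel mode for a duration in the [80, 159] ms range without scheduling.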
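
To get a single number out of the histogram, the bucket counts can be summed with awk. This sketch assumes the output keeps exactly the layout of the sample above ("low -> high : count |bar|"); count_at_least is a hypothetical name, and the optional file argument exists only so the function can be tried on a saved copy.

```shell
# Hypothetical helper, not part of trace-noschedule.
# Usage: count_at_least <min-msecs> [file]
# Sums the counts of all buckets whose lower bound is >= min-msecs.
count_at_least() {
    min_ms=$1
    file=${2:-/proc/trace_noschedule/distribution}
    awk -v min="$min_ms" '
        # Bucket lines look like: "    80 -> 159   : 4     |*****|"
        $2 == "->" && $4 == ":" {
            if ($1 + 0 >= min + 0) total += $5
        }
        END { print total + 0 }
    ' "$file"
}
```

Run against the sample output above, count_at_least 80 adds the 80 -> 159 and 160 -> 319 buckets.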

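5. Find out who is holding the CPU without scheduling

stack_trace records the stacks of processes that held the CPU for longer than the threshold without scheduling.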

cat /proc/trace_noschedule/stack_trace

 cpu: 0
   COMM: sh PID: 1270013 DURATION: 100ms
   delay_tsc+0x21/0x50
   nosched_test_write+0x53/0x90 [trace_noschedule]
   proc_reg_write+0x36/0x60
   __vfs_write+0x33/0x190
   vfs_write+0xb0/0x190
   ksys_write+0x52/0xc0
   do_syscall_64+0x4f/0xe0
   entry_SYSCALL_64_after_hwframe+0x44/0xa9

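This is a kernel-mode test case: mdelay(100) is executed in kernel mode and holds the CPU for 100 ms without scheduling. The stack recorded at that point is shown above. "DURATION" is how long the execution lasted.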
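
When many records have accumulated, a one-line summary per record can be easier to scan than the full stacks. The sketch below assumes the record layout shown above (a "cpu:" line followed by a "COMM: ... PID: ... DURATION: ..." line); summarize_stack_trace is a hypothetical name and the file argument is optional.

```shell
# Hypothetical helper, not part of trace-noschedule.
# Usage: summarize_stack_trace [file]
summarize_stack_trace() {
    file=${1:-/proc/trace_noschedule/stack_trace}
    awk '
        $1 == "cpu:" { cpu = $2 }
        $1 == "COMM:" {
            # Read PID and DURATION from the end of the line so that a
            # process name containing spaces does not shift the fields.
            comm = $2
            for (i = 3; i <= NF - 4; i++) comm = comm " " $i
            print "cpu=" cpu, "comm=" comm, "pid=" $(NF - 2), "duration=" $NF
        }
    ' "$file"
}
```

The stack frames themselves are skipped; the full file is still the place to look once an interesting PID has been identified.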

6. Clear the stack trace

The stack trace buffer has a size limit, so the recorded information needs to be cleared when necessary. To clear it, run:

echo 0 > /proc/trace_noschedule/stack_trace
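
Clearing discards the recorded stacks, so it may be worth saving them first. A minimal sketch, where save_and_clear_stack_trace and the TRACE_DIR override are hypothetical names:

```shell
# Hypothetical helper, not part of trace-noschedule.
# Usage: save_and_clear_stack_trace <output-file>
save_and_clear_stack_trace() {
    dir=${TRACE_DIR:-/proc/trace_noschedule}
    out=$1
    cat "$dir/stack_trace" > "$out" || return 1   # keep a copy first
    echo 0 > "$dir/stack_trace"                   # then clear the buffer
}
```

Records added between the copy and the clear would be lost, so this is best done while the tracer is disabled.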

trace-noschedule's People

Contributors

bh1scw, smcdef

trace-noschedule's Issues

Compile error

3.10.0-1062.12.1.el7.x86_64:

#2
make -C /lib/modules/3.10.0-1062.12.1.el7.x86_64/build M=/root/trace-noschedule modules
make[1]: Entering directory `/usr/src/kernels/3.10.0-1062.12.1.el7.x86_64'
CC [M] /root/trace-noschedule/trace_noschedule.o
/root/trace-noschedule/trace_noschedule.c:116:12: error: static declaration of ‘kstrtobool_from_user’ follows non-static declaration
static int kstrtobool_from_user(const char __user *s, size_t count, bool *res)
^
In file included from include/linux/rbtree.h:32:0,
from include/linux/hrtimer.h:18,
from /root/trace-noschedule/trace_noschedule.c:13:
include/linux/kernel.h:380:18: note: previous declaration of ‘kstrtobool_from_user’ was here
int __must_check kstrtobool_from_user(const char __user *s, size_t count, bool *res);
^
/root/trace-noschedule/trace_noschedule.c: In function ‘trace_nosched_register_tp’:
/root/trace-noschedule/trace_noschedule.c:445:7: warning: passing argument 1 of ‘tracepoint_probe_register’ from incompatible pointer type [enabled by default]
info->stack_trace);
^
In file included from include/linux/module.h:18:0,
from /root/trace-noschedule/trace_noschedule.c:16:
include/linux/tracepoint.h:48:12: note: expected ‘const char *’ but argument is of type ‘struct tracepoint *’
extern int tracepoint_probe_register(const char *name, void *probe, void *data);
^
/root/trace-noschedule/trace_noschedule.c:454:12: warning: passing argument 1 of ‘tracepoint_probe_unregister’ from incompatible pointer type [enabled by default]
info->stack_trace);

Compile error with make -j8

make -C /lib/modules/3.10.0-514.16.1.el7.x86_64/build M=/root/trace-noschedule modules
make[1]: Entering directory `/usr/src/kernels/3.10.0-514.16.1.el7.x86_64'
CC [M] /root/trace-noschedule/trace_noschedule.o
/root/trace-noschedule/trace_noschedule.c: In function ‘trace_nosched_register_tp’:
/root/trace-noschedule/trace_noschedule.c:444:7: warning: passing argument 1 of ‘tracepoint_probe_register’ from incompatible pointer type [enabled by default]
info->stack_trace);
^
In file included from include/linux/module.h:18:0,
from /root/trace-noschedule/trace_noschedule.c:16:
include/linux/tracepoint.h:42:12: note: expected ‘const char *’ but argument is of type ‘struct tracepoint *’
extern int tracepoint_probe_register(const char *name, void *probe, void *data);
^
/root/trace-noschedule/trace_noschedule.c:453:12: warning: passing argument 1 of ‘tracepoint_probe_unregister’ from incompatible pointer type [enabled by default]
info->stack_trace);
^
In file included from include/linux/module.h:18:0,
from /root/trace-noschedule/trace_noschedule.c:16:
include/linux/tracepoint.h:49:1: note: expected ‘const char *’ but argument is of type ‘struct tracepoint *’
tracepoint_probe_unregister(const char *name, void *probe, void *data);
^
/root/trace-noschedule/trace_noschedule.c: In function ‘trace_nosched_unregister_tp’:
/root/trace-noschedule/trace_noschedule.c:472:9: warning: passing argument 1 of ‘tracepoint_probe_unregister’ from incompatible pointer type [enabled by default]
info->stack_trace);
^
In file included from include/linux/module.h:18:0,
from /root/trace-noschedule/trace_noschedule.c:16:
include/linux/tracepoint.h:49:1: note: expected ‘const char *’ but argument is of type ‘struct tracepoint *’
tracepoint_probe_unregister(const char *name, void *probe, void *data);
^
In file included from include/asm-generic/percpu.h:6:0,
from ./arch/x86/include/asm/percpu.h:530,
from ./arch/x86/include/asm/current.h:5,
from ./arch/x86/include/asm/processor.h:15,
from ./arch/x86/include/asm/thread_info.h:22,
from include/linux/thread_info.h:54,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/linux/ktime.h:24,
from include/linux/hrtimer.h:19,
from /root/trace-noschedule/trace_noschedule.c:13:
/root/trace-noschedule/trace_noschedule.c: In function ‘distribution_show’:
include/linux/percpu-defs.h:27:38: error: cast specifies array type
const void __percpu *__vpp_verify = (typeof(ptr))NULL; \
^
include/asm-generic/percpu.h:46:2: note: in expansion of macro ‘__verify_pcpu_ptr’
__verify_pcpu_ptr((__p)); \
^
include/linux/percpu.h:149:31: note: in expansion of macro ‘SHIFT_PERCPU_PTR’
#define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))
^
/root/trace-noschedule/trace_noschedule.c:567:14: note: in expansion of macro ‘per_cpu_ptr’
hist_cpu = per_cpu_ptr(stack_trace->hist, cpu);
^
/root/trace-noschedule/trace_noschedule.c: In function ‘trace_noschedule_init’:
/root/trace-noschedule/trace_noschedule.c:690:2: error: implicit declaration of function ‘for_each_kernel_tracepoint’ [-Werror=implicit-function-declaration]
for_each_kernel_tracepoint(tracepoint_lookup, info);
^
cc1: some warnings being treated as errors
make[2]: *** [/root/trace-noschedule/trace_noschedule.o] Error 1
make[1]: *** [_module_/root/trace-noschedule] Error 2
make[1]: Leaving directory `/usr/src/kernels/3.10.0-514.16.1.el7.x86_64'
make: *** [all] Error 2

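Question about how the no-schedule time is computed

Given that a sched_switch tracepoint probe is already in place, why is an hrtimer also needed to help compute the time? What would go wrong with simply computing the time between switch-in and switch-out?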

make err

In file included from include/asm-generic/percpu.h:6:0,
from ./arch/x86/include/asm/percpu.h:530,
from ./arch/x86/include/asm/current.h:5,
from ./arch/x86/include/asm/processor.h:15,
from ./arch/x86/include/asm/thread_info.h:22,
from include/linux/thread_info.h:54,
from include/linux/preempt.h:9,
from include/linux/spinlock.h:50,
from include/linux/seqlock.h:35,
from include/linux/time.h:5,
from include/linux/ktime.h:24,
from include/linux/hrtimer.h:19,
from /home/xxx/code/trace-noschedule/trace_noschedule.c:13:
/home/xxx/code/trace-noschedule/trace_noschedule.c: In function ‘distribution_show’:
include/linux/percpu-defs.h:27:38: error: cast specifies array type
const void __percpu *__vpp_verify = (typeof(ptr))NULL;
^
include/asm-generic/percpu.h:46:2: note: in expansion of macro ‘__verify_pcpu_ptr’
__verify_pcpu_ptr((__p));
^
include/linux/percpu.h:149:31: note: in expansion of macro ‘SHIFT_PERCPU_PTR’
#define per_cpu_ptr(ptr, cpu) SHIFT_PERCPU_PTR((ptr), per_cpu_offset((cpu)))
^
/home/xxx/code/trace-noschedule/trace_noschedule.c:567:14: note: in expansion of macro ‘per_cpu_ptr’
hist_cpu = per_cpu_ptr(stack_trace->hist, cpu);
^
/home/xxx/code/trace-noschedule/trace_noschedule.c: In function ‘trace_noschedule_init’:
/home/xxx/code/trace-noschedule/trace_noschedule.c:690:2: error: implicit declaration of function ‘for_each_kernel_tracepoint’ [-Werror=implicit-function-declaration]
for_each_kernel_tracepoint(tracepoint_lookup, info);

Compilation failed on CentOS 7.2 with kernel 3.10.0-514.16.1.el7.x86_64.
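
The logs in these reports show that on the 3.10 kernels tracepoint_probe_register expects a const char * name while the module passes a struct tracepoint *, and that for_each_kernel_tracepoint is not declared. A version check before building could be sketched as follows. The 3.15 minimum used in the usage line is an assumption about when upstream Linux gained this tracepoint API, not something the project states, and distribution kernels with backports may behave differently; kernel_at_least is a hypothetical name.

```shell
# Hypothetical check, not part of trace-noschedule.
# Usage: kernel_at_least <major> <minor> [kernel-release]
kernel_at_least() {
    want_major=$1
    want_minor=$2
    release=${3:-$(uname -r)}       # e.g. 3.10.0-514.16.1.el7.x86_64
    major=${release%%.*}            # text before the first dot
    rest=${release#*.}              # everything after the first dot
    minor=${rest%%[!0-9]*}          # leading digits of the remainder
    [ "$major" -gt "$want_major" ] ||
        { [ "$major" -eq "$want_major" ] && [ "$minor" -ge "$want_minor" ]; }
}
```

For example: kernel_at_least 3 15 || echo "this kernel may be too old to build trace-noschedule" >&2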
