andikleen / pmu-tools
Intel PMU profiling tools
License: GNU General Public License v2.0
Hi, thanks for the great tool!
simple-pebs/simple-pebs.c uses CPU_STARTING and CPU_DYING to allow CPUs to be hot-plugged, but these macros are no longer used in Linux 4.9.
http://lxr.free-electrons.com/ident?v=4.9;i=CPU_STARTING
As this commit (https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git/commit/?id=ee1e714b94521b0bb27b04dfd1728ec51b19d4f0) suggests, we should move to the new state machine mechanism to support hot-plugging for kernel 4.9 or later versions.
In the common case where CPU hot-plugging never happens, simply deleting the notifier callbacks, as in soramichi@d175a0b, should work.
I'm trying to run toplev.py with a Docker container as the workload, using the following command:
python toplev.py --core C0 -l1 -I 1000 -x, -o ../benchmarks/mediaStreamingLevel1I1000msC0.csv taskset -c 0 docker run -t --name=streaming_client -v /path/to/output:/output --volumes-from streaming_dataset --net streaming_network cloudsuite/media-streaming:client 172.18.0.2
When I run this, toplev removes the '-v' flag from the docker command, which causes errors. The output is:
Will measure complete system
Using level 1.
perf stat -x\; -e
'{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,any=1,cmask=1/,cpu/event=0xc2,umask=0x2/}' -I 1000 -C 0,18,36,54 -A -a taskset -c 0 docker run -t --name=streaming_client /path/to/output:/output --volumes-from streaming_dataset --net streaming_network cloudsuite/media-streaming:client 172.18.0.2
Unable to find image '/path/to/output:/output:latest' locally
This might be happening because toplev also has a '-v' flag (--verbose or -v). Without toplev, the docker container runs fine.
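The behaviour described above can be reproduced with a few lines of argparse. This is a minimal sketch, assuming the tool parses its own flags with parse_known_args and hands the leftover arguments to perf. The parser below is a two-option stand-in, not toplev's real option table, and split_workload is a hypothetical helper.

```python
import argparse


def build_parser():
    """Stand-in parser with two flags that mirror the report (-l and -v)."""
    p = argparse.ArgumentParser(add_help=False)
    p.add_argument("-l", "--level", type=int, default=1)
    p.add_argument("-v", "--verbose", action="store_true")
    return p


def split_workload(argv):
    """Hypothetical helper: everything after the first '--' is the workload."""
    if "--" in argv:
        cut = argv.index("--")
        return argv[:cut], argv[cut + 1:]
    return argv, []


workload = ["docker", "run", "-v", "/path/to/output:/output", "image"]

# Naive parsing: -v is recognised anywhere on the line, so it is taken
# out of the docker command and its argument is left dangling.
args, rest = build_parser().parse_known_args(["-l1"] + workload)
print(args.verbose, rest)

# Splitting before parsing keeps the workload untouched.
tool_argv, cmd = split_workload(["-l1", "--"] + workload)
args2 = build_parser().parse_args(tool_argv)
print(args2.verbose, cmd)
```

Whether the real tool accepts a '--' separator is not stated in the report, so the second half only illustrates the idea.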
I'm confused about the remote write counter.
I used ocperf.py list to find the remote write counter, and found that the events offcore_response_corewb_llc_miss_any_dram and offcore_response_corewb_llc_hit_any_response may be the write counters.
So I used the following command to monitor the system:
ocperf.py stat -e offcore_response.corewb.llc_miss.any_dram,offcore_response.all_reads.llc_miss.remote_dram,offcore_response.corewb.llc_hit.any_response,mem-stores -I 1000 -C 8
Then I used numactl to bind milc to physical CPU 8 and to remote memory, but the result is confusing.
253.030788963 0 offcore_response_corewb_llc_miss_any_dram (36.40%)
253.030788963 36,177,045 offcore_response_all_reads_llc_miss_remote_dram (36.40%)
253.030788963 0 offcore_response_corewb_llc_hit_any_response (18.18%)
253.030788963 224,853,825 mem-stores (27.21%)
254.030893213 0 offcore_response_corewb_llc_miss_any_dram (36.40%)
254.030893213 35,695,552 offcore_response_all_reads_llc_miss_remote_dram (36.39%)
254.030893213 0 offcore_response_corewb_llc_hit_any_response (18.11%)
254.030893213 230,275,843 mem-stores (27.21%)
255.031004841 0 offcore_response_corewb_llc_miss_any_dram (36.39%)
255.031004841 35,970,716 offcore_response_all_reads_llc_miss_remote_dram (36.31%)
255.031004841 0 offcore_response_corewb_llc_hit_any_response (18.11%)
255.031004841 219,686,387 mem-stores (27.21%)
The result shows that the LLC miss and LLC hit counts are both 0.
So I wonder whether the events I chose are wrong.
Hoping for your reply.
Haswell Xeon CBO.LLC_{DDIO,PCIE}_* events are missing. Any plans to add them in the near future?
ubuntu
cpu: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
kernel :Linux ubuntu 3.16.0-31-generic
root@ubuntu:~/pmu-tools# toplev.py -l2 -p 2004
Running in HyperThreading mode. Will measure complete system.
Using level 2.
perf stat -x, -e '{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1,any=1/,cpu/event=0xc2,umask=0x2/},{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xa3,umask=0x6,cmask=6/,cycles,cpu/event=0xa2,umask=0x8/,cpu/event=0x9c,umask=0x1,cmask=4/},{cpu/event=0x3c,umask=0x0,any=1/,instructions,cpu/event=0x9c,umask=0x1/,cycles,cpu/event=0x9c,umask=0x1,cmask=4/},{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x79,umask=0x30/,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xb1,umask=0x1,cmask=2/,cycles,cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/,cpu/event=0xb1,umask=0x1,cmask=3/},{cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0xa2,umask=0x8/,cpu/event=0x5e,umask=0x1/,instructions}' -A -a -p 2004
7 ----> value of index, printed from ./pmu-tools/toplev.py line 915
(11946234493.0,) ----> value of res[index]
0 ----> value of cpuoff
This then repeats:
6
(3199005707.0,)
0
7
(11946234493.0,)
1
Traceback (most recent call last):
File "/root/pmu-tools/toplev.py", line 1377, in <module>
ret = execute(runner, out, rest)
File "/root/pmu-tools/toplev.py", line 728, in execute
print_keys(runner, res, rev, out, interval, env)
File "/root/pmu-tools/toplev.py", line 682, in print_keys
runner.print_res(r, rev[cpus[0]], out, interval, core_fmt(core), env, Runner.SMT_yes, stat)
File "/root/pmu-tools/toplev.py", line 1161, in print_res
obj.compute(lambda e, level:
File "/root/pmu-tools/ivb_server_ratios.py", line 637, in compute
self.val = (STALLS_MEM_ANY(EV, 2) + EV("RESOURCE_STALLS.SB", 2)) / CLKS(EV, 2 )
File "/root/pmu-tools/ivb_server_ratios.py", line 71, in STALLS_MEM_ANY
return EV(lambda EV , level : min(EV("CPU_CLK_UNHALTED.THREAD", level) , EV("CYCLE_ACTIVITY.STALLS_LDM_PENDING", level)) , level )
File "/root/pmu-tools/toplev.py", line 1162, in <lambda>
lookup_res(res, rev, e, obj, env, level, stat.referenced))
File "/root/pmu-tools/toplev.py", line 902, in lookup_res
for off in range(cpu.threads)])
File "/root/pmu-tools/ivb_server_ratios.py", line 71, in <lambda>
return EV(lambda EV , level : min(EV("CPU_CLK_UNHALTED.THREAD", level) , EV("CYCLE_ACTIVITY.STALLS_LDM_PENDING", level)) , level )
File "/root/pmu-tools/toplev.py", line 901, in <lambda>
lookup_res(res, rev, ev, obj, env, level, referenced, off), level)
File "/root/pmu-tools/toplev.py", line 919, in lookup_res
return res[index][cpuoff]
IndexError: tuple index out of range
We think cpuoff = 1 is out of range because res[index] has only one member.
How can we fix the bug?
Thanks
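The failing condition can be shown in isolation. This is only a diagnostic sketch built from the values printed above: lookup_value is a hypothetical helper, and storing res as a dict keyed by index is an assumption made for the example, not how toplev necessarily holds its results.

```python
def lookup_value(res, index, cpuoff):
    """Return res[index][cpuoff], raising a descriptive error when out of range."""
    values = res[index]
    if cpuoff >= len(values):
        raise ValueError(
            "event %d has %d value(s) but CPU offset %d was requested"
            % (index, len(values), cpuoff))
    return values[cpuoff]


# Values copied from the debug prints in the report.
res = {6: (3199005707.0,), 7: (11946234493.0,)}

print(lookup_value(res, 7, 0))
try:
    lookup_value(res, 7, 1)
except ValueError as err:
    print("lookup failed:", err)
```

A check like this does not fix the underlying mismatch between the number of threads assumed and the number of values perf returned; it only makes the mismatch visible at the point where it occurs.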
Any plan to add BDX support? Is it safe to use bdxde_*.py files instead, in the meantime?
BDX events are now available on https://download.01.org/perfmon/.
I am trying to run this command on an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz:
sudo ../pmu-tools/toplev.py -I 100 -l3 --title "GNU grep" --graph grep -r asdf /etc/*
This is setting off an AssertionError inside toplev.
Traceback (most recent call last):
File "../pmu-tools/toplev.py", line 950, in <module>
ret = execute(runner, out, rest)
File "../pmu-tools/toplev.py", line 509, in execute
env)
File "../pmu-tools/toplev.py", line 549, in do_execute
runner.print_res(res[j], rev[j], out, prev_interval, j, env)
File "../pmu-tools/toplev.py", line 806, in print_res
obj.compute(lambda e, level:
File "/home/subho/pmu-tools/hsw_client_ratios.py", line 713, in compute
self.val = BackendBoundAtEXE(EV, 2)- self.MemoryBound.compute(EV )
File "/home/subho/pmu-tools/hsw_client_ratios.py", line 30, in BackendBoundAtEXE
return BackendBoundAtEXE_stalls(EV, level) / CLKS(EV, level)
File "/home/subho/pmu-tools/hsw_client_ratios.py", line 28, in BackendBoundAtEXE_stalls
return ( EV("CYCLE_ACTIVITY.CYCLES_NO_EXECUTE", level) + EV("UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC", level) - FewUopsExecutedThreshold(EV, level) - EV("RS_EVENTS.EMPTY_CYCLES", level) + EV("RESOURCE_STALLS.SB", level) )
File "../pmu-tools/toplev.py", line 807, in <lambda>
lookup_res(res, rev, e, obj, env, level))
File "../pmu-tools/toplev.py", line 631, in lookup_res
assert event_rmap(rev[index]) == canon_event(ev)
AssertionError
This might be related to #7. I downloaded https://download.01.org/perfmon/HSW/Haswell_core_V15.json and put it in my pmu_events folder as GenuineIntel-6-3C-core.json (instead of the V14 file, which does not exist there and which event_download.py was looking for).
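The file name used for the manual workaround follows a pattern that can be inferred from the names appearing in these reports (GenuineIntel-6-3C-core.json, GenuineIntel-6-3F-uncore.json). A small sketch, assuming the pattern is vendor, CPU family, model in upper-case hex, and file kind; both function names are hypothetical and the default directory is taken from another report on this page, so it may differ on a given setup.

```python
import os


def event_file_name(family, model, kind="core", vendor="GenuineIntel"):
    """Build a name such as GenuineIntel-6-3C-core.json (model in upper-case hex)."""
    return "%s-%d-%X-%s.json" % (vendor, family, model, kind)


def cache_path(family, model, kind="core", cache_dir="~/.cache/pmu-events"):
    """Path where a manually downloaded file would be placed (assumed directory)."""
    return os.path.join(os.path.expanduser(cache_dir),
                        event_file_name(family, model, kind))


print(event_file_name(6, 0x3C))
print(cache_path(6, 0x3F, kind="uncore"))
```

The family and model numbers are the ones reported by lscpu or /proc/cpuinfo (model 60 is 0x3C).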
I've been trying to do something similar to the interrupts.c code, but was having trouble with rdpmc_read() giving seemingly nonsense results. I've narrowed it down to the buf->offset field sometimes having a high bit set (1L << 48). PERF_EVENT_IOC_RESET will reset both the counter and the offset to 0, but at some point buf->offset will jump back. Masking off the high bits (or just ignoring offset) solves the problem, but I can't find any reason it should be necessary.
I don't know if this is a bug in rdpmc_read(), a bug in the kernel, or something I'm doing wrong. I'm using a slightly older kernel (4.2.0) on Skylake, so it's also possible this is something that has already been fixed. Test code against current pmu-tools master is here: https://github.com/nkurz/pmu-tools/tree/test-offset. Help or suggestions of a better venue would be appreciated. I can upgrade to the current kernel and retest if necessary. Thanks!
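The masking workaround mentioned above amounts to the following arithmetic, shown in Python for brevity although the original code is C. It only illustrates the workaround and does not explain why the bit appears; mask_offset and the 48-bit width are assumptions chosen to match the (1L << 48) value in the report.

```python
COUNTER_BITS = 48  # assumed width; the report mentions bit 48 being set


def mask_offset(offset, bits=COUNTER_BITS):
    """Keep only the low `bits` bits of offset, discarding anything above."""
    return offset & ((1 << bits) - 1)


stray = (1 << 48) | 123456  # an offset with the stray high bit set
print(hex(stray), "->", mask_offset(stray))
```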
minor issue,
alp@ws207:~/wrk$ ./pmu-tools/list-events.py
bash: ./pmu-tools/list-events.py: /usr/bin/python^M: bad interpreter: No such file or directory
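The ^M after /usr/bin/python is a carriage return, which suggests the script was saved or checked out with Windows (CRLF) line endings. A minimal sketch for detecting and removing them; both function names are made up for this example.

```python
def strip_carriage_returns(data):
    """Convert CRLF line endings in a bytes buffer to plain LF."""
    return data.replace(b"\r\n", b"\n")


def has_crlf_shebang(data):
    """True if the first line is a '#!' line that ends in a carriage return."""
    first_line = data.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


script = b"#!/usr/bin/python\r\nprint('hello')\r\n"
print(has_crlf_shebang(script))
fixed = strip_carriage_returns(script)
print(has_crlf_shebang(fixed))
```

To apply this to a file, read it in binary mode, pass the contents through strip_carriage_returns, and write the result back in binary mode.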
CSV-enabled output ignores the scale argument. If this is intentional the README should be updated, otherwise a quick workaround would be to modify self.vals in OutputCSV.flush() if args.scale is set.
addr.c uses the macro PERF_ATTR_SIZE_VER1, which isn't available in CentOS's version of /usr/include/linux/perf_event.h. However, that header does have PERF_ATTR_SIZE_VER0, and the file compiles correctly when the macro is changed to that. I'm not sure whether that will cause issues in use, however.
Also, addr.c doesn't compile with gcc 4.4.7, and I'm reasonably certain that it's because that file uses an anonymous union in a struct, which that version of GCC doesn't support (not even with -std=c11, which isn't a supported option in this version). Upgrading my GCC to one that was compiled in this decade fixes the issue, without any source code modification other than the replaced macro above.
I am currently trying to analyze my application with toplev.py. However, it seems that the Xeon E5-2630 v3 is not supported. Specifically, I could not get the Frontend, Retiring and Bad Speculation information. In addition, the Backend output does not include detailed information such as the memory-bound and core-bound breakdown.
$] python toplev.py -l5 my_app
28 events not supported
0 BE Backend_Bound: 67.04%
This category reflects slots where no uops are being
delivered due to a lack of required resources for accepting
more uops in the Backend of the pipeline...
0 CPU utilization: 0.89 CPUs
Number of CPUs used...
1 BE Backend_Bound: 67.43%
1 CPU utilization: 0.89 CPUs
(Sorry for the noise on this issue.)
I believe I've found two issues with --power functionality in current HEAD.
First, commit 9458aea925a20c19c9d15056c5dc623dc3fdbf12 appears to break power events (and likely some others too), because after that change valid_events_str is computed too early, before valid_events are populated in Runner::collect.
Reverting that commit seems to fix the issue for me on an HT CPU. However, on a non-HT CPU another issue remains: the metrics are not printed unless I add the -A flag to the perf command line.
It seems that hsx_server_ratios is not included in the repo. Could you please add it?
I'm currently testing toplev.py on a Xeon v3 (Haswell) and see this output in a level 2 system-wide test:
# ./toplev.py -l2 sleep 5
Will measure complete system.
Using level 2.
warning: removing Memory_Bound Core_Bound due to unsupported events in kernel:
CYCLE_ACTIVITY.CYCLES_NO_EXECUTE CYCLE_ACTIVITY.STALLS_LDM_PENDING
Use --force-events to override (may result in wrong measurements)
Nodes Memory_Bound Core_Bound have errata HSM31 and were disabled.
Override with --ignore-errata
Using the --force-events and --ignore-errata options works as a workaround. However, I'm wondering if the error is valid in the first place?
I see HSM31 in the ~/.cache/pmu-events/GenuineIntel-6-3C-core.json file and find it in the Intel 4th-gen-core mobile specification update as HSM31: Performance Monitor UOPS_EXECUTED Event May Undercount.
However, I don't see HSM31 or anything UOPS_EXECUTED-related mentioned in the Intel Xeon E5 v3 specification update for my processor at all.
So is this warning valid?
The CPU of my Haswell test system:
# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 63
Model name: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
Stepping: 2
CPU MHz: 2888.281
BogoMIPS: 5010.61
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 30720K
NUMA node0 CPU(s): 0-11,24-35
NUMA node1 CPU(s): 12-23,36-47
Running the latest CentOS 7.2 kernel:
# uname -a
Linux haswell1 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
Hi all,
I have noticed that on my CPU (Intel i7-3537U) ocperf lists the following events, which seem to be the same (same event, umask and any flag). What is the meaning of _p?
Thanks
cpu_clk_unhalted.thread: Core cycles when the thread is not in halt state
cpu/event=0x3c,umask=0x0,name=cpu_clk_unhalted_thread/
cpu_clk_unhalted.thread_p: Thread cycles when thread is not in halt state
cpu/event=0x3c,umask=0x0,name=cpu_clk_unhalted_thread_p/
cpu_clk_unhalted.thread_any: Core cycles when at least one thread on the physical core is not in halt state
cpu/event=0x3c,umask=0x0,any=1,name=cpu_clk_unhalted_thread_any/
cpu_clk_unhalted.thread_p_any: Core cycles when at least one thread on the physical core is not in halt state
cpu/event=0x3c,umask=0x0,any=1,name=cpu_clk_unhalted_thread_p_any/
inst_retired.any: Instructions retired from execution.
cpu/event=0xc0,umask=0x0,name=inst_retired_any/
inst_retired.any_p: Number of instructions retired. General Counter - architectural event
cpu/event=0xc0,umask=0x0,name=inst_retired_any_p/
1. Why is the metric "ICache_Misses" not included in jkt_server_ratios.py?
2. What is your view on the "ICache Misses" formula differing from VTune's?
VTune's "ICache Misses" formula is event("ICACHE.MISSES") / query("InstructionsRetired")
pmu-tools uses EV("ICACHE.IFETCH_STALL", 3) / CLKS(EV, 3) - ITLB_Miss_Cycles(EV, 3) / CLKS(EV, 3)
Thank you!
On my machine (running Linux 3.19) the imc uncore PMU device names are not consecutive: instead of /sys/devices/uncore_imc_{0..3} I have /sys/devices/uncore_imc_{0,1,4,5}. But expand_events (possibly among other places in the code) assumes there is no gap in the naming, so I end up with only two values instead of four.
As a quick and dirty workaround I modified ucexpr.py>expand_events to:
for n in range(10):
    if ucevent.box_exists(...):
        l.append(...)
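An alternative to probing a fixed range of numbers is to list the directories that actually exist. The following is a sketch of that idea only: box_numbers and find_boxes are hypothetical names, not functions from ucevent or ucexpr, and the /sys/devices location is taken from the paths quoted above.

```python
import os
import re


def box_numbers(names, prefix):
    """Return the sorted instance numbers of entries named <prefix>_<n>."""
    pattern = re.compile(r"^%s_(\d+)$" % re.escape(prefix))
    nums = [int(m.group(1)) for m in map(pattern.match, names) if m]
    return sorted(nums)


def find_boxes(prefix, sysfs_dir="/sys/devices"):
    """List existing uncore boxes for a prefix; empty if the directory is missing."""
    try:
        names = os.listdir(sysfs_dir)
    except OSError:
        names = []
    return box_numbers(names, prefix)


sample = ["uncore_imc_0", "uncore_imc_1", "uncore_imc_4",
          "uncore_imc_5", "uncore_cbox_0", "cpu"]
print(box_numbers(sample, "uncore_imc"))
print(find_boxes("uncore_imc"))
```

Enumerating what is present avoids both the gap problem and the arbitrary upper bound of 10 in the workaround.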
Running perfpd doesn't seem to properly symbolize perf.data files that contain MMAP2
Hello,
I have a little fix to propose for the process_args function in ocperf.py:
From 21b152a29f59da03769d4db33df720123218de80 Mon Sep 17 00:00:00 2001
From: Omar Awile <[email protected]>
Date: Mon, 12 Sep 2016 10:17:11 +0200
Subject: [PATCH] Pass along optional argv parameter for this case too
---
ocperf.py | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ocperf.py b/ocperf.py
index f9bc904..7b1c068 100755
--- a/ocperf.py
+++ b/ocperf.py
@@ -790,7 +790,7 @@ def process_args(emap, argv=sys.argv):
True if record == yes else False, emap)
cmd.append(prefix + event)
elif argv[i][0:2] == '-c':
- oarg, i, prefix = getarg(i, cmd)
+ oarg, i, prefix = getarg(i, cmd, argv=argv)
if oarg == "default":
if overflow is None:
print >>sys.stderr,"""
cheers!
I was able to build and install simple-pebs without issue.
This is on kernel 4.2.0-19.
When I try pebs-grabber, I get an error.
dmesg says:
pebs_grabber: PEBS version 2
pebs_grabber: Cannot register kprobe: -2
I am running pmu-tools on a Haswell i7 processor with the 3.13.0-35-generic kernel (Ubuntu). I am getting some odd behavior in the output of ocperf and toplev.
I ran ocperf.py stat with the same events as toplev.py. It seems to show that many of the counters are <not counted>. Is this normal behavior? As I understood it, ocperf.py
shouldn't show this behavior because it uses the events directly from Intel's description of the micro-architecture on my computer.'{cycles,cpu/event=0xc2,umask=0x2/,ref-cycles,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/},{cpu/event=0xa2,umask=0x8/,cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0x9c,umask=0x1/,cpu/event=0x9c,umask=0x1,cmask=4/,cycles,instructions},{cpu/event=0xe,umask=0x1/,cycles,cpu/event=0x79,umask=0x30/,cpu/event=0xc2,umask=0x2/},{cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cycles,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cycles,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xb1,umask=0x1,cmask=2/,cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/,cpu/event=0xb1,umask=0x1,cmask=3/},{cpu/event=0xa3,umask=0x6,cmask=6/,cycles,cpu/event=0xa2,umask=0x8/,cpu/event=0x5e,umask=0x1/,instructions},{cpu/event=0xab,umask=0x2/,cpu/event=0x87,umask=0x1/,cycles,cpu/event=0x79,umask=0x30,edge=1,cmask=1/,cpu/event=0x85,umask=0x10/},{cpu/event=0x80,umask=0x4/,cpu/event=0x79,umask=0x24,cmask=4/,cycles,cpu/event=0x79,umask=0x24,cmask=1/,cpu/event=0x85,umask=0x10/},{cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0x79,umask=0x18,cmask=1/,cycles,cpu/event=0xa3,umask=0xc,cmask=12/,cpu/event=0x79,umask=0x18,cmask=4/},{cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0xa3,umask=0xc,cmask=12/,cpu/event=0xa3,umask=0x5,cmask=5/,cpu/event=0xa2,umask=0x8/,cycles},{cpu/event=0xa3,umask=0x5,cmask=5/,cpu/event=0xd1,umask=0x4/,cpu/event=0xd1,umask=0x20/,cycles},{cpu/event=0xc5,umask=0x0/,cpu/event=0xe6,umask=0x1f/,cpu/event=0x5e,umask=0x1/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0x80,umask=0x4/,cycles,cpu/event=0x5e,umask=0x1,edge=1,inv=1,cmask=1/},{cpu/event=0xd1,umask=0x4/,cycles,cpu/event=0xd2,umask=0x2/,cpu/event=0x7,umask=0x1/,cpu/ev
ent=0x3,umask=0x2/},{cpu/event=0x8,umask=0x10/,cycles,cpu/event=0x8,umask=0x60/,cpu/event=0x60,umask=0x1,cmask=6/},{cpu/event=0xd2,umask=0x1/,cpu/event=0x60,umask=0x1,cmask=1/,cycles,cpu/event=0xd2,umask=0x4/,cpu/event=0x60,umask=0x1,cmask=6/},{cpu/event=0xb7,umask=0x1,offcore_rsp=0x10003c0002/,cycles,cpu/event=0xd0,umask=0x42/,cpu/event=0xd2,umask=0x4/,cpu/event=0xd0,umask=0x82/},{cpu/event=0x49,umask=0x60/,cycles,cpu/event=0x49,umask=0x10/},{cpu/event=0xd1,umask=0x8/,cpu/event=0x3,umask=0x8/,cycles,cpu/event=0x48,umask=0x1/}
In the output of this command, a lot of events show up as <not counted>.
Here is a sample of the output:
1.196793465,<not counted>,cpu/event=0xab,umask=0x2/
1.196793465,<not counted>,cpu/event=0x87,umask=0x1/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0x79,umask=0x30,edge=1,cmask=1/
1.196793465,<not counted>,cpu/event=0x85,umask=0x10/
1.196793465,<not counted>,cpu/event=0x80,umask=0x4/
1.196793465,<not counted>,cpu/event=0x79,umask=0x24,cmask=4/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0x79,umask=0x24,cmask=1/
1.196793465,<not counted>,cpu/event=0x85,umask=0x10/
1.196793465,<not counted>,cpu/event=0xa3,umask=0x6,cmask=6/
1.196793465,<not counted>,cpu/event=0x79,umask=0x18,cmask=1/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0xa3,umask=0xc,cmask=12/
1.196793465,<not counted>,cpu/event=0x79,umask=0x18,cmask=4/
1.196793465,<not counted>,cpu/event=0xa3,umask=0x6,cmask=6/
1.196793465,<not counted>,cpu/event=0xa3,umask=0xc,cmask=12/
1.196793465,<not counted>,cpu/event=0xa3,umask=0x5,cmask=5/
1.196793465,<not counted>,cpu/event=0xa2,umask=0x8/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0xa3,umask=0x5,cmask=5/
1.196793465,<not counted>,cpu/event=0xd1,umask=0x4/
1.196793465,<not counted>,cpu/event=0xd1,umask=0x20/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0xc5,umask=0x0/
1.196793465,<not counted>,cpu/event=0xe6,umask=0x1f/
1.196793465,<not counted>,cpu/event=0x5e,umask=0x1/
1.196793465,<not counted>,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/
....
....
1.196793465,<not counted>,cpu/event=0xd1,umask=0x4/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0xd2,umask=0x2/
1.196793465,<not counted>,cpu/event=0x7,umask=0x1/
1.196793465,<not counted>,cpu/event=0x3,umask=0x2/
1.196793465,<not counted>,cpu/event=0x8,umask=0x10/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0x8,umask=0x60/
1.196793465,<not counted>,cpu/event=0x60,umask=0x1,cmask=6/
1.196793465,<not counted>,cpu/event=0xd2,umask=0x1/
1.196793465,<not counted>,cpu/event=0x60,umask=0x1,cmask=1/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0xd2,umask=0x4/
1.196793465,<not counted>,cpu/event=0x60,umask=0x1,cmask=6/
1.196793465,<not counted>,cpu/event=0xb7,umask=0x1,offcore_rsp=0x10003c0002/
1.196793465,<not counted>,cycles
1.196793465,<not counted>,cpu/event=0xd0,umask=0x42/
1.196793465,<not counted>,cpu/event=0xd2,umask=0x4/
1.196793465,<not counted>,cpu/event=0xd0,umask=0x82/
toplev.py seems to be producing stacked bar plots that do not sum to 100%. For example, what does it mean when the first-level figure is zero but the back-end bound metrics at level 2 are non-zero?
I am running pmu-tools on an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz.
When I run the event_download.py script, it tries to fetch https://download.01.org/perfmon/HSW/Haswell_core_V14.json, which results in a 404 error.
The file it should fetch seems to be https://download.01.org/perfmon/HSW/Haswell_core_V15.json. Is there any workaround for this problem?
$ python2 ~/shared/pack/pmu-tools/toplev.py -l4 --user ls
yields
Using level 4.
Nodes Data_Sharing Memory_Bound 1_Port_Utilized Split_Stores L3_Bound
2_Ports_Utilized Contested_Accesses 3m_Ports_Utilized Store_Latency
Lock_Latency L3_Hit_Latency Split_Loads Ports_Utilization Core_Bound
MEM_Bound FB_Full have errata HSM30 HSM31 HSM26, HSM30
perf stat -x\; -e '{cpu/event=0x9c,umask=0x1/u,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xe,umask=0x1/u,cycles:u,cpu/event=0x79,umask=0x30/u,cpu/event=0x9c,umask=0x1,cmask=4/u,cpu/event=0xc5,umask=0x0/u,cpu/event=0xd,umask=0x3,cmask=1/u,instructions:u},{cpu/event=0xa2,umask=0x8/u,cpu/event=0xa3,umask=0x6,cmask=6/u,cpu/event=0xb1,umask=0x2,cmask=1/u,cpu/event=0xb1,umask=0x2,cmask=2/u,cpu/event=0xb1,umask=0x2,cmask=3/u,cpu/event=0x9c,umask=0x1,cmask=4/u,cycles:u,cpu/event=0xa3,umask=0x4,cmask=4/u,cpu/event=0x5e,umask=0x1/u,instructions:u},{cpu/event=0x80,umask=0x4/u,cpu/event=0xab,umask=0x2/u,cpu/event=0xa2,umask=0x8/u,cpu/event=0x87,umask=0x1/u,cpu/event=0x14,umask=0x2/u,cpu/event=0x79,umask=0x30,edge=1,cmask=1/u,cpu/event=0xc1,umask=0x40/u,cycles:u},{cpu/event=0x79,umask=0x24,cmask=4/u,cpu/event=0xa8,umask=0x1,cmask=1/u,cpu/event=0x79,umask=0x24,cmask=1/u,cpu/event=0x85,umask=0x60/u,cpu/event=0x79,umask=0x18,cmask=1/u,cpu/event=0xa8,umask=0x1,cmask=4/u,cycles:u,cpu/event=0x79,umask=0x18,cmask=4/u,cpu/event=0x85,umask=0x10/u},{cpu/event=0xa3,umask=0xc,cmask=12/u,cpu/event=0xd1,umask=0x20/u,cpu/event=0xa3,umask=0x6,cmask=6/u,cpu/event=0xd1,umask=0x4/u,cpu/event=0xa3,umask=0x5,cmask=5/u,cycles:u},{cpu/event=0x80,umask=0x4/u,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/u,cpu/event=0xe6,umask=0x1f/u,cpu/event=0x5e,umask=0x1,edge=1,inv=1,cmask=1/u,cpu/event=0x85,umask=0x60/u,cpu/event=0xc5,umask=0x0/u,cycles:u,cpu/event=0x5e,umask=0x1/u,cpu/event=0x85,umask=0x10/u},{cpu/event=0x60,umask=0x8,cmask=6/u,cpu/event=0x7,umask=0x1/u,cpu/event=0xb7,umask=0x1/puhu,cpu/event=0xd0,umask=0x42/u,cpu/event=0x3,umask=0x2/u,cpu/event=0xb1,umask=0x2,cmask=3/u,cycles:u,cpu/event=0xb2,umask=0x1/u},{cpu/event=0x60,umask=0x8,cmask=1/u,cpu/event=0x8,umask=0x60/u,cpu/event=0xb1,umask=0x2,cmask=2/u,cpu/event=0x60,umask=0x8,cmask=6/u,cpu/event=0xb1,umask=0x2,cmask=1/u,cpu/event=0x49,umask=0x60/u,cpu/event=0x49,umask=0x10/u,cpu/event=0x8,umask=0x10/u,cycles:u},{cpu
/event=0x60,umask=0x4,cmask=1/u,cpu/event=0xc2,umask=0x2/u,cpu/event=0xb1,umask=0x2,cmask=2/u,cpu/event=0xb1,umask=0x2,cmask=3/u,cpu/event=0xd0,umask=0x82/u,cpu/event=0xc0,umask=0x2/u,cycles:u,cpu/event=0xd0,umask=0x21/u,instructions:u},{cpu/event=0x3,umask=0x8/u,cpu/event=0xd1,umask=0x8/u,cpu/event=0xd1,umask=0x40/u,cpu/event=0x9c,umask=0x1,cmask=4/u,cpu/event=0x48,umask=0x2,cmask=1/u,cycles:u,cpu/event=0xa3,umask=0x4,cmask=4/u,cpu/event=0x5e,umask=0x1/u,cpu/event=0x48,umask=0x1/u},{cpu/event=0xd3,umask=0x1/u,cpu/event=0xd1,umask=0x4/u,cpu/event=0xd2,umask=0x4/u,cpu/event=0xd3,umask=0x4/u,cpu/event=0xd2,umask=0x1/u,cpu/event=0xd1,umask=0x40/u,cpu/event=0xd2,umask=0x2/u,cpu/event=0xd1,umask=0x2/u},{cpu/event=0xd3,umask=0x10/u,cpu/event=0xd3,umask=0x20/u,cycles:u}' ls
invalid or unsupported event: '{[snip]}'
Run 'perf list' for a list of valid events
Usage: perf stat [<options>] [<command>]
-e, --event <event> event selector. use 'perf list' to list available events
The issue appears to be this event:
cpu/event=0xb7,umask=0x1/puhu
which has a duplicate u specifier.
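One way to avoid emitting a modifier twice is to check for it before appending. How toplev builds the modifier string is not shown in the report, so the sketch below is purely illustrative: add_modifier and with_modifier are hypothetical helpers, and "puh" is just the reported string with the trailing duplicate removed.

```python
def add_modifier(modifiers, flag):
    """Append flag to a perf modifier string unless it is already present."""
    return modifiers if flag in modifiers else modifiers + flag


def with_modifier(event, flag):
    """Add flag to an event written as 'cpu/.../mods', 'name:mods' or 'name'."""
    if "/" in event:
        base, mods = event.rsplit("/", 1)
        return base + "/" + add_modifier(mods, flag)
    if ":" in event:
        base, mods = event.split(":", 1)
        return base + ":" + add_modifier(mods, flag)
    return event + ":" + flag


print(with_modifier("cpu/event=0xb7,umask=0x1/puh", "u"))
print(with_modifier("cpu/event=0xc2,umask=0x2/", "u"))
print(with_modifier("cycles", "u"))
```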
cc @lcw
perf record -b --call-graph dwarf -- sleep 3
python perfdata.py perf.data
Traceback (most recent call last):
File "/home/ubuntu/Source/pmu-tools/parser/perfdata.py", line 575, in <module>
h = perf_file.parse_stream(f)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 197, in parse_stream
return self._parse(stream, Container())
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 661, in _parse
subobj = sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 661, in _parse
subobj = sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 960, in _parse
obj = self.subcon._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 287, in _parse
return self._decode(self.subcon._parse(stream, context), context)
File "/usr/lib/python2.7/dist-packages/construct/adapters.py", line 261, in _decode
return self.inner_subcon._parse(BytesIO(obj), context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 519, in _parse
obj.append(self.subcon._parse(stream, context))
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 659, in _parse
sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 840, in _parse
obj = self.cases.get(key, self.default)._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 270, in _parse
return self.subcon._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 661, in _parse
subobj = sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 840, in _parse
obj = self.cases.get(key, self.default)._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 661, in _parse
subobj = sc._parse(stream, context)
File "/usr/lib/python2.7/dist-packages/construct/core.py", line 430, in _parse
count = self.countfunc(context)
File "/home/ubuntu/Source/pmu-tools/parser/perfdata.py", line 127, in <lambda>
Array(lambda ctx: sample_regs_user,
NameError: global name 'sample_regs_user' is not defined
Hi,
I'm having some trouble running toplev on a new box, so I wanted to let you know.
Here is the output from a fresh clone of the master branch:
satin@satin-phyexp1:/tmp/pmu-tools$ python --version
Python 2.7.6
satin@satin-phyexp1:/tmp/pmu-tools$ uname -r
3.13.0-37-generic
satin@satin-phyexp1:/tmp/pmu-tools$ ./toplev.py -I 100 -l3 --title "GNU grep" --graph grep -r foo /usr/*
Using level 3.
UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC not found
satin@satin-phyexp1:/tmp/pmu-tools$ []
Traceback (most recent call last):
File "/tmp/pmu-tools//tl-barplot.py", line 185, in <module>
plt.subplot(numplots, 1, 1)
File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 897, in subplot
a = fig.add_subplot(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/figure.py", line 914, in add_subplot
a = subplot_class_factory(projection_class)(self, *args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 9251, in __init__
self._subplotspec = GridSpec(rows, cols)[int(num) - 1]
File "/usr/lib/pymodules/python2.7/matplotlib/gridspec.py", line 176, in __getitem__
raise IndexError("index out of range")
IndexError: index out of range
Is that a bug in toplev?
"UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC not found" looks like the obvious culprit. This event is used in hsw_client_ratios.py.
Or am I missing some additional perf libraries, or is it because my processor is not fully supported (Intel(R) Xeon(R) CPU E3-1246 v3 @ 3.50GHz)?
Thanks in advance for any hints...
PERF=perf315 ./tester works
PERF=perf316 ./tester
does not read any samples and fails
Example:
$ ocperf.py record --event offcore_response.all_reads.l3_hit.hitm_other_core sleep 1
$ perf evlist
offcore_response_all_reads_l3_hit_hitm_other_core
So the ocperf.py event name contains dots, but perf's event name contains only underscores.
This confuses tools that use perf and prevents ocperf.py from being used as a wrapper for perf.
I believe there are a few solutions:
Hi,
I just tried pmu-tools/ocperf.py on a Haswell box:
$ ./ocperf.py
Traceback (most recent call last):
File "./ocperf.py", line 774, in <module>
emap = find_emap()
File "./ocperf.py", line 599, in find_emap
emap = json_with_extra(el)
File "./ocperf.py", line 557, in json_with_extra
add_extra_env(emap, el)
File "./ocperf.py", line 574, in add_extra_env
emap.add_uncore(uc)
File "./ocperf.py", line 551, in add_uncore
self.uncore_events[name] = UncoreEvent(name, row)
File "./ocperf.py", line 241, in __init__
e.desc = row['Description'].strip()
KeyError: 'Description'
It's trying to open ${HOME}/.cache//pmu-events/GenuineIntel-6-3F-uncore.json, which does not contain the field Description at any point.
Any ideas on how to proceed?
Best -
$ uname -a
Linux islay.mpi-cbg.de 3.10.0-229.14.1.el7.x86_64 #1 SMP Tue Sep 15 15:05:51 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch:cxx-4.1-amd64:cxx-4.1-noarch:desktop-4.1-amd64:desktop-4.1-noarch:languages-4.1-amd64:languages-4.1-noarch:printing-4.1-amd64:printing-4.1-noarch
Distributor ID: CentOS
Description: CentOS Linux release 7.1.1503 (Core)
Release: 7.1.1503
Codename: Core
$ cat /proc/cpuinfo|grep -i "name"|head -n1
model name : Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz
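The KeyError in the traceback above comes from indexing a JSON row with a key that the downloaded file does not contain. A defensive lookup is sketched below, under the assumption that an empty description is acceptable when the field is absent; field is a hypothetical helper and the two sample rows are made up for the example, not taken from the real uncore file.

```python
import json


def field(row, name, default=""):
    """Return row[name] stripped of whitespace, or default when the key is absent."""
    value = row.get(name)
    return value.strip() if isinstance(value, str) else default


# Made-up rows: one with a Description field, one without.
rows = json.loads(
    '[{"EventName": "EXAMPLE.EVENT", "Description": " some text "},'
    ' {"EventName": "EXAMPLE.OTHER"}]'
)

for row in rows:
    print(row["EventName"], repr(field(row, "Description")))
```

This only hides the symptom; if the file format changed, the description may live under a different key that still needs to be identified.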
Hi Andi,
I'm trying to measure the unhalted cycles on a per-core basis. I therefore used ocperf and selected the above-mentioned event.
However, what I'm getting from ocperf is somewhat odd. I was expecting to get the same values for the two threads that share the same core, but this does not seem to be the case.
$ sudo ./ocperf.py stat -e cpu_clk_unhalted.thread_any -a -A sleep 5
perf stat -e cpu/event=0x3c,umask=0x0,any=1,name=cpu_clk_unhalted_thread_any/ -a -A sleep 5
Performance counter stats for 'system wide':
CPU0 627.912.025 cpu_clk_unhalted_thread_any
CPU1 627.248.055 cpu_clk_unhalted_thread_any
CPU2 529.161.153 cpu_clk_unhalted_thread_any
CPU3 812.752.677 cpu_clk_unhalted_thread_any
5,001079353 seconds time elapsed
Any hint at what might be the culprit here?
Some info on my system:
OS: Ubuntu 14.04
CPU : Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
Thank you!
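Before comparing per-CPU values it may help to confirm which logical CPUs actually share a physical core, since sibling threads are not necessarily numbered next to each other. A sketch, assuming the standard Linux sysfs topology file; parse_cpu_list and siblings are names chosen for this example.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-11,24-35' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def siblings(cpu):
    """Read the hyper-thread siblings of a CPU from sysfs (Linux only)."""
    path = "/sys/devices/system/cpu/cpu%d/topology/thread_siblings_list" % cpu
    with open(path) as f:
        return parse_cpu_list(f.read())


print(parse_cpu_list("0,2"))
print(parse_cpu_list("0-1"))
```

Calling siblings(0) on the machine in question shows which CPU number should report the same thread_any count as CPU0. This does not by itself explain the differing values, but it rules out a numbering misunderstanding.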
On an Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz:
Downloading https://download.01.org/perfmon/mapfile.csv to mapfile.csv
Downloading https://download.01.org/perfmon/BDX/BroadwellX_core_V10.json to GenuineIntel-6-4F-core.json
Cannot access event server: HTTP Error 404: Not Found
Still, ocperf.py list does produce a reasonable list of events supported on Broadwell, and ocperf.py stat works with my "standard" list of Broadwell events...
The Xeon D-1540 appears to have a problem where only 4 of the 8 perf counters per core actually count, whereas the other 4 remain zero (with hyperthreading disabled). I experienced this issue and then saw that other people have had the same issue: https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/560536
I don't know if this affects other processors in that family, but it obviously ends up giving bogus pmu-tools results for this family of processors when hyperthreading is disabled. It might be worth checking for that particular processor and then limiting the number of counters per perf set to only 4.
Note that in addition to only 4 out of 8 counters being available, the LLC counter values also have their own set of problems as described at that page (also confirmed with my CPU). There are actually a lot of counter-related problems with this processor...
http://www.intel.com/content/www/us/en/processors/xeon/xeon-d-1500-specification-update.html
Hi,
ocperf fails on my machine:
$ ocperf.py stat -e arith.div:k
Downloading https://download.01.org/perfmon/mapfile.csv to mapfile.csv
Downloading https://download.01.org/perfmon/HSW/Haswell_core_V15.json to GenuineIntel-6-3C-core.json
Downloading https://download.01.org/perfmon/HSW/Haswell_matrix_bit_definitions_V15.json to GenuineIntel-6-3C-offcore.json
Downloading https://download.01.org/perfmon/readme.txt to readme.txt
Traceback (most recent call last):
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 690, in <module>
emap = find_emap()
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 536, in find_emap
return json_with_extra(el)
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 482, in json_with_extra
add_extra_env(emap, el)
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 492, in add_extra_env
emap.add_offcore(oc)
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 452, in add_offcore
if row[u"MATRIX_REQUEST"].upper() != "NULL":
KeyError: u'MATRIX_REQUEST'
Same for ocperf.py list:
$ ocperf.py list
Traceback (most recent call last):
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 690, in <module>
emap = find_emap()
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 524, in find_emap
emap = json_with_extra(el)
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 482, in json_with_extra
add_extra_env(emap, el)
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 492, in add_extra_env
emap.add_offcore(oc)
File "/home/tgrabiec/src/pmu-tools/ocperf.py", line 452, in add_offcore
if row[u"MATRIX_REQUEST"].upper() != "NULL":
KeyError: u'MATRIX_REQUEST'
The CPU is Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
Hi community!
I am using perf as: sudo perf stat -e r00c0,r01c0,r01c0:p,r01c0:pp sleep 1
but the result is
Performance counter stats for 'sleep 1':
464,229 r00c0
464,229 r01c0
<not counted> r01c0:p
<not counted> r01c0:pp
1.001639901 seconds time elapsed
Why is PEBS not counted here?
Sorry to ask my question here...
Please help me on this.
saw this with a76c89a
$ ~/software/pmu-tools/repo/toplev.py -l1 sleep 10
Using level 1.
perf stat -x, -e 'task-clock,{cpu/event=0xc2,umask=0x2/,cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cpu/event=0x9c,umask=0x1/,cycles}' sleep 10
Traceback (most recent call last):
File "/home/steinbac/software/pmu-tools/repo/toplev.py", line 1748, in <module>
ret = execute(runner, out, rest)
File "/home/steinbac/software/pmu-tools/repo/toplev.py", line 960, in execute
print_keys(runner, res, rev, valstats, out, interval, env)
File "/home/steinbac/software/pmu-tools/repo/toplev.py", line 885, in print_keys
cores = [key_to_coreid(x) for x in res.keys() if int(x) in runner.allowed_threads]
ValueError: invalid literal for int() with base 10: ''
Hi there. First of all, thank you for the tools. I've learned a lot about how to use perf just by looking at how the pmu-tools do it.
I'm seeing this strange output in commit d70840b, using command line options ../pmu-tools/toplev.py --verbose --no-multiplex -l3 --single-thread -- ./myprogram
I consistently get this printed output, whose % is > 100, on a particular test program I am running.
BE Backend_Bound: 82.25 % [100.00%]
BE/Mem Backend_Bound.Memory_Bound: 57.41 % [100.00%]
BE/Mem Backend_Bound.Memory_Bound.L1_Bound: 5.58 % [100.00%]
This metric estimates how often the CPU was stalled without
loads missing the L1 data cache...
Sampling events: mem_load_retired.l1_hit:pp mem_load_retired.fb_hit:pp
BE/Mem Backend_Bound.Memory_Bound.L1_Bound.DTLB_Load: 196.05 % below [100.00%]
This metric represents cycles fraction where the TLB was
missed by load instructions...
Sampling events: mem_inst_retired.stlb_miss_loads:p
Hi, I want to get the level 2 metrics and some level 3 metrics using the tool "toplev".
I want to confirm which CPU microarchitectures "toplev" currently supports: SNB, IVB, HSW, BDW?
[tgrabiec@muninn ~]$ toplev.py -C 0 sleep 2 --level 2
Using level 2.
perf stat -x, -e '{cycles,cpu/event=0xc2,umask=0x2/,ref-cycles,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/},{cpu/event=0xa2,umask=0x8/,cpu/event=0xa3,umask=0x6,cmask=6/,cpu/event=0x9c,umask=0x1/,cpu/event=0x9c,umask=0x1,cmask=4/,cycles,instructions},{cpu/event=0xe,umask=0x1/,cycles,cpu/event=0x79,umask=0x30/,cpu/event=0xc2,umask=0x2/},{cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cycles,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xe,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/,cycles,cpu/event=0xc2,umask=0x2/},{cpu/event=0xc5,umask=0x0/,cpu/event=0xc3,umask=0x1,edge=1,cmask=1/},{cpu/event=0xb1,umask=0x1,cmask=2/,cpu/event=0xa3,umask=0x4,cmask=4/,cpu/event=0xb1,umask=0x1,cmask=1/,cpu/event=0xb1,umask=0x1,cmask=3/},{cpu/event=0xa3,umask=0x6,cmask=6/,cycles,cpu/event=0xa2,umask=0x8/,cpu/event=0x5e,umask=0x1/,instructions}' --cpu 0 sleep 2
Traceback (most recent call last):
File "/home/tgrabiec/src/pmu-tools/toplev.py", line 950, in <module>
ret = execute(runner, out, rest)
File "/home/tgrabiec/src/pmu-tools/toplev.py", line 511, in execute
runner.print_res(res[j], rev[j], out, interval, j, env)
File "/home/tgrabiec/src/pmu-tools/toplev.py", line 806, in print_res
obj.compute(lambda e, level:
File "/home/tgrabiec/src/pmu-tools/hsw_client_ratios.py", line 713, in compute
self.val = BackendBoundAtEXE(EV, 2)- self.MemoryBound.compute(EV )
File "/home/tgrabiec/src/pmu-tools/hsw_client_ratios.py", line 30, in BackendBoundAtEXE
return BackendBoundAtEXE_stalls(EV, level) / CLKS(EV, level)
File "/home/tgrabiec/src/pmu-tools/hsw_client_ratios.py", line 28, in BackendBoundAtEXE_stalls
return ( EV("CYCLE_ACTIVITY.CYCLES_NO_EXECUTE", level) + EV("UOPS_EXECUTED.CYCLES_GE_1_UOPS_EXEC", level) - FewUopsExecutedThreshold(EV, level) - EV("RS_EVENTS.EMPTY_CYCLES", level) + EV("RESOURCE_STALLS.SB", level) )
File "/home/tgrabiec/src/pmu-tools/toplev.py", line 807, in <lambda>
lookup_res(res, rev, e, obj, env, level))
File "/home/tgrabiec/src/pmu-tools/toplev.py", line 631, in lookup_res
assert event_rmap(rev[index]) == canon_event(ev)
AssertionError
--level 1 seems to work:
[tgrabiec@muninn ~]$ toplev.py -C 0 sleep 2 --level 1
WARNING: HT enabled
Measuring multiple processes/threads on the same core may is not reliable.
Using level 1.
perf stat -x, -e '{cycles,cpu/event=0xc2,umask=0x2/,ref-cycles,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1/}' --cpu 0 sleep 2
Backend Bound: 49.06%
This category reflects slots where no uops are being delivered due to a lack
of required resources for accepting more uops in the Backend of the pipeline.
Frequency: 1.12 metric
Frequency in Ghz
Should be different events.
Level 1: toplev.py sleep 60
Using level 1.
perf stat -x, -e '{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,cmask=1,any=1/,cpu/event=0xc2,umask=0x2/}' -A -a sleep 60
S0-C0 FE Frontend_Bound: 35.21%
S0-C0 BE Backend_Bound: 50.81% //maybe the bound.
S0-C1 FE Frontend_Bound: 34.63%
S0-C1 BE Backend_Bound: 43.94%
.....
Level 2: toplev.py -l2 sleep 60
S0-C0 FE Frontend_Bound: 32.92%
S0-C0 FE Frontend_Bound.Frontend_Latency: 27.60%
S0-C0 BE Backend_Bound: 52.92%
S0-C1 FE Frontend_Bound: 36.04%
S0-C1 FE Frontend_Bound.Frontend_Latency: 29.17%
......
S0-C0-T1BE/Mem Backend_Bound.Memory_Bound: 0.00% mismeasured
Note that no sub-item is shown for S0-C0 BE, whereas FE has one (Frontend_Bound.Frontend_Latency). Why?
My kernel is Ubuntu 3.16.0-31-generic.
The new perf stat csv output (https://lwn.net/Articles/653941/) breaks ucevent.py.
The assertion on line 601 of ucevent.py (assert evp[0] == j) fails because measure() includes the new stats printed after the event name as part of the event name: e.g., in a sample run evp[0] is 'uncore_imc_0/event=0x4,umask=0x3/,103357003,10.39' rather than 'uncore_imc_0/event=0x4,umask=0x3/'.
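One way to tolerate the extra columns, as a rough sketch rather than the actual ucevent.py fix: cut the field back to the closing '/' of the event spec. The helper name strip_trailing_stats is hypothetical, and the approach assumes a PMU-style spec that ends in '/' with no modifier after it:

```python
def strip_trailing_stats(field):
    """Cut a perf CSV remainder back to the event spec itself.

    Assumes a PMU-style spec that ends in '/', e.g. 'pmu/event=..,umask=../'.
    Plain names without '/' are cut at the first comma instead.
    """
    end = field.rfind("/")
    if end == -1:
        return field.split(",", 1)[0]
    return field[:end + 1]


if __name__ == "__main__":
    sample = "uncore_imc_0/event=0x4,umask=0x3/,103357003,10.39"
    print(strip_trailing_stats(sample))
```

Splitting on the comma alone cannot work here because the event spec itself contains commas, which is why the sketch keys on the final '/'.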
def CORE_CLKS(EV, level):
    return (EV("CPU_CLK_UNHALTED.THREAD:amt1", level) / 2) if smt_enabled else CLKS(EV, level)
Thank you! :)
Hi,
From a fresh checkout from master:
File "pmu-tools/toplev.py", line 147
e = e[:e.find(":")]
^
TabError: inconsistent use of tabs and spaces in indentation
And indeed, sometimes there are tabs, and sometimes spaces, and Python 3 doesn't like it.
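To locate the offending lines, a small sketch that lists every line whose indentation contains a tab. It assumes the file is meant to be indented with spaces only; the function name is illustrative:

```python
def tab_indented_lines(source):
    """Return 1-based numbers of lines whose leading whitespace has a tab."""
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        # Leading whitespace is whatever lstrip() would remove.
        indent = line[:len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            bad.append(number)
    return bad


if __name__ == "__main__":
    import sys
    with open(sys.argv[1]) as f:
        for number in tab_indented_lines(f.read()):
            print("%s:%d: tab in indentation" % (sys.argv[1], number))
```

Python's standard tabnanny module performs a more thorough check of ambiguous indentation.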
I noticed that level 3 stats printed for memory bound workloads are incorrect on my machine (Xeon E5-2658 v3, Linux 3.19). Here is a sample output with a program that is DRAM bound (Intel MLC):
BE Backend_Bound: 90.68%
BE/Mem Backend_Bound.Memory_Bound: 84.30%
BE/Mem Backend_Bound.Memory_Bound.L1_Bound: 84.35%
BE/Mem Backend_Bound.Memory_Bound.L3_Bound: 22.48%
BE/Mem Backend_Bound.Memory_Bound.MEM_Bound: 61.69%
L1_Bound value is incorrect. I traced the issue to perf always reporting zero for CYCLE_ACTIVITY.STALLS_L1D_PENDING. Here is a sample perf output for that event:
perf stat -I 1000 -e cpu/event=0xa3,umask=0xc,cmask=12/ -a sleep 5
# time counts unit events
1.000206434 0 cpu/event=0xa3,umask=0xc,cmask=12/
2.000452095 0 cpu/event=0xa3,umask=0xc,cmask=12/
3.000657316 0 cpu/event=0xa3,umask=0xc,cmask=12/
4.000875653 0 cpu/event=0xa3,umask=0xc,cmask=12/
5.001068298 0 cpu/event=0xa3,umask=0xc,cmask=12/
With cmask=4, a value that seems correct is returned. I double checked SDM Vol3b and it seems that cmask value of 12 (0xc) should be correct. I understand this is not directly a pmu-tools bug, but was hoping to hear back if others are affected too.
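To double check how the three fields of that event combine, here is a small sketch that packs and unpacks them using the architectural IA32_PERFEVTSELx layout (event select in bits 0-7, unit mask in bits 8-15, edge at bit 18, any-thread at bit 21, invert at bit 23, counter mask in bits 24-31). The function names are illustrative and not part of pmu-tools or perf:

```python
def encode_raw(event, umask, cmask=0, edge=False, any_thread=False, inv=False):
    """Pack the fields into the IA32_PERFEVTSELx bit layout."""
    value = (event & 0xFF) | ((umask & 0xFF) << 8) | ((cmask & 0xFF) << 24)
    if edge:
        value |= 1 << 18
    if any_thread:
        value |= 1 << 21
    if inv:
        value |= 1 << 23
    return value


def decode_raw(value):
    """Split a packed value back into its named fields."""
    return {
        "event": value & 0xFF,
        "umask": (value >> 8) & 0xFF,
        "edge": bool(value & (1 << 18)),
        "any": bool(value & (1 << 21)),
        "inv": bool(value & (1 << 23)),
        "cmask": (value >> 24) & 0xFF,
    }


if __name__ == "__main__":
    raw = encode_raw(0xA3, 0xC, cmask=12)
    print(hex(raw), decode_raw(raw))
```

This only confirms the encoding; it says nothing about whether the hardware counts the event with that counter mask.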
I'm running pmu-tools on Intel Xeon E5-2660 (Sandy Bridge). ocperf.py runs fine, but toplev.py always gives me an error "IndexError: list index out of range".
Traceback (most recent call last):
File "./pmu-tools/toplev.py", line 765, in <module>
sys.exit(execute(runner.evnum, runner, out, rest))
File "./pmu-tools/toplev.py", line 461, in execute
runner.print_res(res[j], rev[j], out, interval, j)
File "./pmu-tools/toplev.py", line 654, in print_res
obj.compute(lambda e, level:
File "/home/fei/pmu-tools/simple_ratios.py", line 36, in compute
self.val = EV("IDQ_UOPS_NOT_DELIVERED.CORE", 1) / SLOTS(EV)
File "./pmu-tools/toplev.py", line 655, in <lambda>
lookup_res(res, rev, e, obj.res_map[(e, level)]))
File "./pmu-tools/toplev.py", line 482, in lookup_res
return res[index]
IndexError: list index out of range
This fails:
$ ./ocperf.py record --event mem_load_uops_retired.l1_hit echo 1
perf record --event mem_load_uops_retired.l1_hit echo 1
event syntax error: 'mem_load_uops_retired.l1_hit'
\___ parser error
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Passes:
$ ocperf.py record -e mem_load_uops_retired.l1_hit echo 1
The parsing code in ocperf.py does not handle long "--event" properly, apparently:
    elif sys.argv[i][0:2] == '-e': # <--- oops, this is not for "--event"
        event, i, prefix = getarg(i, cmd)
        event, overflow = process_events(event, print_only,
                                         True if record == yes else False)
        cmd.append(prefix + event)
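A possible shape for a fix, as a standalone sketch rather than a patch against ocperf.py: recognise both spellings up front and report whether the event list is attached to the option or comes as the next argument. split_event_option is a hypothetical helper; getarg and process_events from the snippet above are not reproduced here:

```python
def split_event_option(arg):
    """Classify one argv entry.

    Returns (prefix, inline_value) if arg selects events, else None.
    inline_value is None when the event list is the next argv entry.
    """
    if arg in ("-e", "--event"):
        return arg, None
    if arg.startswith("--event="):
        return "--event=", arg[len("--event="):]
    if arg.startswith("-e"):
        # Short form with the value glued on, e.g. -ecycles
        return "-e", arg[2:]
    return None


if __name__ == "__main__":
    for arg in ["-e", "-ecycles", "--event", "--event=cycles", "-a"]:
        print(arg, "->", split_event_option(arg))
```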
Everything below was run as root.
# toplev.py -l1 sleep 10
Will measure complete system.
Using level 1.
perf stat -x\; -e '{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,any=1,cmask=1/,cpu/event=0xc2,umask=0x2/}' -A -a sleep 10
Traceback (most recent call last):
File "/media/dc/B2B200EFB200B9BD/inz/pmu-tools/toplev.py", line 1617, in <module>
ret = execute(runner, out, rest)
File "/media/dc/B2B200EFB200B9BD/inz/pmu-tools/toplev.py", line 792, in execute
env)
File "/media/dc/B2B200EFB200B9BD/inz/pmu-tools/toplev.py", line 907, in do_execute
multiplex = float(n[off + 1])
ValueError: invalid literal for float(): 100,00
Below is the result of perf stat ...:
# perf stat -x\; -e '{cpu/event=0x3c,umask=0x0,any=1/,cpu/event=0xe,umask=0x1/,cpu/event=0x9c,umask=0x1/,cpu/event=0xd,umask=0x3,any=1,cmask=1/,cpu/event=0xc2,umask=0x2/}' -A -a sleep 10
CPU0;2241793414;;cpu/event=0x3c,umask=0x0,any=1/;10007127837;100,00
CPU1;2241051974;;cpu/event=0x3c,umask=0x0,any=1/;10007126109;100,00
CPU2;798878574;;cpu/event=0x3c,umask=0x0,any=1/;10007122595;100,00
CPU3;798025029;;cpu/event=0x3c,umask=0x0,any=1/;10007121927;100,00
CPU4;1479869080;;cpu/event=0x3c,umask=0x0,any=1/;10007136940;100,00
CPU5;1479260102;;cpu/event=0x3c,umask=0x0,any=1/;10007135470;100,00
CPU6;1637764499;;cpu/event=0x3c,umask=0x0,any=1/;10007133938;100,00
CPU7;1637043424;;cpu/event=0x3c,umask=0x0,any=1/;10007132916;100,00
CPU0;1778359179;;cpu/event=0xe,umask=0x1/;10007225730;100,00
CPU1;372610005;;cpu/event=0xe,umask=0x1/;10007224789;100,00
CPU2;423892267;;cpu/event=0xe,umask=0x1/;10007221503;100,00
CPU3;159917631;;cpu/event=0xe,umask=0x1/;10007219288;100,00
CPU4;457584393;;cpu/event=0xe,umask=0x1/;10007232528;100,00
CPU5;741543029;;cpu/event=0xe,umask=0x1/;10007230406;100,00
CPU6;1260524783;;cpu/event=0xe,umask=0x1/;10007228798;100,00
CPU7;402408452;;cpu/event=0xe,umask=0x1/;10007227198;100,00
CPU0;3625922836;;cpu/event=0x9c,umask=0x1/;10007284308;100,00
CPU1;153504280;;cpu/event=0x9c,umask=0x1/;10007281630;100,00
CPU2;1325774321;;cpu/event=0x9c,umask=0x1/;10007277765;100,00
CPU3;74342815;;cpu/event=0x9c,umask=0x1/;10007275369;100,00
CPU4;1632602740;;cpu/event=0x9c,umask=0x1/;10007287236;100,00
CPU5;268262892;;cpu/event=0x9c,umask=0x1/;10007284804;100,00
CPU6;2650705954;;cpu/event=0x9c,umask=0x1/;10007284336;100,00
CPU7;154401725;;cpu/event=0x9c,umask=0x1/;10007282072;100,00
CPU0;61193398;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007317348;100,00
CPU1;61193275;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007314139;100,00
CPU2;22095380;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007310171;100,00
CPU3;22095271;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007307061;100,00
CPU4;32942258;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007305560;100,00
CPU5;32942366;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007302497;100,00
CPU6;46513674;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007288378;100,00
CPU7;46513732;;cpu/event=0xd,umask=0x3,any=1,cmask=1/;10007284929;100,00
CPU0;1471503714;;cpu/event=0xc2,umask=0x2/;10007304010;100,00
CPU1;340643421;;cpu/event=0xc2,umask=0x2/;10007300601;100,00
CPU2;357872568;;cpu/event=0xc2,umask=0x2/;10007296155;100,00
CPU3;126618075;;cpu/event=0xc2,umask=0x2/;10007292269;100,00
CPU4;397747237;;cpu/event=0xc2,umask=0x2/;10007290478;100,00
CPU5;628580576;;cpu/event=0xc2,umask=0x2/;10007286803;100,00
CPU6;1041831779;;cpu/event=0xc2,umask=0x2/;10007272537;100,00
CPU7;355401725;;cpu/event=0xc2,umask=0x2/;10007268529;100,00
I am not sure what the cause is, but maybe the locale?
# locale
LANG=pl_PL.utf8
LANGUAGE=en_US
LC_CTYPE="pl_PL.utf8"
LC_NUMERIC="pl_PL.utf8"
LC_TIME="pl_PL.utf8"
LC_COLLATE="pl_PL.utf8"
LC_MONETARY="pl_PL.utf8"
LC_MESSAGES="pl_PL.utf8"
LC_PAPER="pl_PL.utf8"
LC_NAME="pl_PL.utf8"
LC_ADDRESS="pl_PL.utf8"
LC_TELEPHONE="pl_PL.utf8"
LC_MEASUREMENT="pl_PL.utf8"
LC_IDENTIFICATION="pl_PL.utf8"
LC_ALL=pl_PL.utf8
The /usr/bin/python version (shouldn't you use /usr/bin/env python instead?):
Python 2.7.10 (default, Oct 14 2015, 16:09:02)
[GCC 5.2.1 20151010] on linux2
Probably the issue can be solved by setting the locale in Python to the system one:
Python 2.7.10 (default, Oct 14 2015, 16:09:02)
Type "copyright", "credits" or "license" for more information.
IPython 2.3.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import locale
In [2]: locale.getdefaultlocale()
Out[2]: ('pl_PL', 'UTF-8')
In [3]: locale.atof("23.3")
Out[3]: 23.3
In [4]: locale.atof("23,3")
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-132b6afaec24> in <module>()
----> 1 locale.atof("23,3")
/usr/lib/python2.7/locale.pyc in atof(string, func)
314 string = string.replace(dd, '.')
315 #finally, parse the string
--> 316 return func(string)
317
318 def atoi(str):
ValueError: invalid literal for float(): 23,3
In [5]: locale.setlocale(locale.LC_ALL, '.'.join(locale.getdefaultlocale())
...: )
Out[5]: 'pl_PL.UTF-8'
In [6]: locale.atof("23,3")
Out[6]: 23.3
In [7]: locale.atof("23.3")
Out[7]: 23.3
So in the end it seems that locale.atof should be used instead of float when casting str to float.
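An alternative that avoids depending on the locale at parse time, sketched under assumptions: either run perf with LC_ALL=C so it should print "." as the decimal separator, or accept both separators when parsing. Both helper names are made up, and the lenient parser assumes the field never contains thousands separators:

```python
import os


def c_locale_env(base=None):
    """Copy an environment and force the C locale for the child process."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = "C"  # LC_ALL overrides LC_NUMERIC and LANG
    return env


def parse_decimal(text):
    """Parse '100,00' or '100.00'; assumes no thousands separators."""
    return float(text.strip().replace(",", "."))


if __name__ == "__main__":
    print(parse_decimal("100,00"))
    print(c_locale_env({"LANG": "pl_PL.utf8"}))
```

The environment would be handed to the child via the env= argument of subprocess.Popen. The lenient parser is only safe with the ";" field separator used above, since with "-x," a decimal comma would already have been split into two fields.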
gen-dot.py can't work with latest ratios files, because Runner instance has no attribute 'metric' and 'parent':
Traceback (most recent call last):
File "./gen-dot.py", line 45, in <module>
m.Setup(runner)
File "/home/yefeng/pmu-tools-master/ivb_client_ratios.py", line 1604, in __init__
n = Metric_IPC() ; r.metric(n)
AttributeError: Runner instance has no attribute 'metric'
and
Traceback (most recent call last):
File "./gen-dot.py", line 48, in <module>
runner.fix_parents()
File "./gen-dot.py", line 32, in fix_parents
if not obj.parent:
AttributeError: Frontend_Bound instance has no attribute 'parent'
I think runner.fix_parents() is not needed; after I modified runner.finish() as follows, it works:
class Runner:
    def finish(self):
        for n in self.olist:
            if n.level > 1:
                print '"%s" -> "%s";' % (n.parent.name, n.name)
            else:
                print '"%s";' % (n.name)

    def metric(self, n):
        pass

runner = Runner()
m.Setup(runner)
print >>sys.stderr, runner.olist
#runner.fix_parents()
print "digraph {"
print "fontname=\"Courier\";"
runner.finish()
print "}"