Comments (3)
Can you give the full toplev command line?
The PEBS events don't necessarily count in perf stat.
The problem seems to be this perf error message:
WARNING: A requested CPU in '0' is not supported by PMU 'cpu_atom' (CPUs 8-23) for event 'cycles:pp'
cycles:pp should really work so that's some kind of upstream perf bug.
Does a plain perf record -e cycles:pp ./run.sh
export HYPERVISOR=1 should work around it (will disable PEBS, but also some other features)
from pmu-tools.
Thanks for the reply.
Can you give the full toplev command line?
~/pmu-tools/toplev.py --core S0-C0 -l3 --run-sample --no-desc taskset -c 0 ./run.sh
Playing a bit more with it:
~/pmu-tools/toplev.py --core S0-C0 -l3 --run-sample --no-desc taskset -c 0 sleep 1
# 4.7-full on Intel(R) Core(TM) i9-14900K [adl]
core FE Frontend_Bound % Slots 43.4 [11.0%]
core BE Backend_Bound % Slots 31.8 [22.0%]
core FE Frontend_Bound.Fetch_Latency % Slots 34.7 [22.0%]
core BAD Bad_Speculation.Machine_Clears % Slots 0.8 [11.0%]
core BE/Core Backend_Bound.Core_Bound % Slots 21.2 [22.0%]
core FE Frontend_Bound.Fetch_Latency.ICache_Misses % Clocks 17.1 [22.0%]
core FE Frontend_Bound.Fetch_Latency.ITLB_Misses % Clocks 6.5 [22.0%]
core FE Frontend_Bound.Fetch_Latency.Branch_Resteers % Clocks 32.4 [11.0%]<==
core FE Frontend_Bound.Fetch_Latency.MS_Switches % Clocks_est 11.5 [11.0%]
core BE/Mem Backend_Bound.Memory_Bound.L1_Bound % Stalls 5.9 [11.0%]
core BE/Core Backend_Bound.Core_Bound.Serializing_Operation % Clocks 35.1 [11.0%]
core BE/Core Backend_Bound.Core_Bound.Ports_Utilization % Clocks 45.2 [11.0%]
core RET Retiring.Heavy_Operations.Microcode_Sequencer % Slots 2.7 [11.0%]
core MUX % 11.00
Run toplev --describe Branch_Resteers^ to get more information on bottleneck for core
Add --nodes '!+Branch_Resteers*/4,+Frontend_Bound.Fetch_Latency,+Frontend_Bound,+MUX' for breakdown.
Sampling:
perf record -g -e cpu_core/event=0xc5,umask=0x0,name=Branch_Resteers_BR_MISP_RETIRED_ALL_BRANCHES,period=400009/,cpu_core/event=0xc6,umask=0x1,frontend=0x14,name=ITLB_Misses_FRONTEND_RETIRED_ITLB_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x12,name=ICache_Misses_FRONTEND_RETIRED_L1I_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x13,name=ICache_Misses_FRONTEND_RETIRED_L2_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x601006,name=Fetch_Latency_FRONTEND_RETIRED_LATENCY_GE_16,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x600406,name=Frontend_Bound_FRONTEND_RETIRED_LATENCY_GE_4,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x600806,name=Fetch_Latency_FRONTEND_RETIRED_LATENCY_GE_8,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x8,name=MS_Switches_FRONTEND_RETIRED_MS_FLOWS,period=100007/,cpu_core/event=0xc6,umask=0x1,frontend=0x15,name=ITLB_Misses_FRONTEND_RETIRED_STLB_MISS,period=100007/pp,cpu_core/event=0xc3,umask=0x1,edge=1,cmask=1,name=Machine_Clears_MACHINE_CLEARS_COUNT,period=100003/,cpu_core/event=0xd1,umask=0x40,name=L1_Bound_MEM_LOAD_RETIRED_FB_HIT,period=100007/pp,cpu_core/event=0xd1,umask=0x1,name=L1_Bound_MEM_LOAD_RETIRED_L1_HIT,period=1000003/pp,cpu_core/event=0xa2,umask=0x2,name=Serializing_Operation_RESOURCE_STALLS_SCOREBOARD,period=100003/,cpu_core/event=0xa4,umask=0x2,name=Backend_Bound_TOPDOWN_BACKEND_BOUND_SLOTS,period=10000003/,cpu_core/event=0xc2,umask=0x4,frontend=0x8,name=Microcode_Sequencer_UOPS_RETIRED_MS,period=2000003/,cycles:pp -o perf.data -C 0 taskset -c 0 sleep 1
WARNING: A requested CPU in '0' is not supported by PMU 'cpu_atom' (CPUs 8-23) for event 'cycles:pp'
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu_atom/cycles:pp/).
/bin/dmesg | grep -i perf may provide additional information.
Sampling failed
but if I remove the --core S0-C0
part it works:
~/pmu-tools/toplev.py -l3 --run-sample --no-desc taskset -c 0 sleep 1
70 events not counted
# 4.7-full, 3.51 on Intel(R) Core(TM) i9-14900K [adl]
core FE Frontend_Bound % Slots 42.0
core BE Backend_Bound % Slots 26.3 [28.0%]
core FE Frontend_Bound.Fetch_Latency % Slots 28.5 [28.0%]
core BE/Core Backend_Bound.Core_Bound % Slots 15.3 [28.0%]
core FE Frontend_Bound.Fetch_Latency.ICache_Misses % Clocks 12.6 [75.0%]<==
core FE Frontend_Bound.Fetch_Latency.ITLB_Misses % Clocks 6.0 [75.0%]
warning: 16 nodes had zero counts: Branch_Resteers DRAM_Bound DSB DSB_Switches Divider L1_Bound L2_Bound L3_Bound LSD MITE MS_Switches Other_Mispredicts Other_Nukes Ports_Utilization Serializing_Operation Store_Bound
atom FE Frontend_Bound % Slots 34.0 [28.0%]<==
atom FE Frontend_Bound.Fetch_Latency % Slots 16.5 [28.0%]
atom FE Frontend_Bound.Fetch_Bandwidth % Slots 17.5 [28.0%]
atom BAD Bad_Speculation % Slots 18.0 [28.0%]
atom BAD Bad_Speculation.Branch_Mispredicts % Slots 17.4 [28.0%]
warning: 22 nodes had zero counts: Base Branch_Detect Branch_Resteer Cisc DRAM_Bound Decode FPDIV_uops Fast_Nuke ICache_Misses ITLB_Misses L1_Bound L2_Bound L3_Bound MS_uops Machine_Clears Mem_Scheduler Memory_Bound Nuke Other_FB Other_Ret Predecode Store_Bound
Run toplev --describe ICache_Misses^ to get more information on bottleneck for core
Run toplev --describe Frontend_Bound^ to get more information on bottleneck for atom
Add --nodes '!+Frontend_Bound*/2,+MUX' for breakdown.
Sampling:
perf record -g -e cpu_core/event=0xc6,umask=0x1,frontend=0x14,name=ITLB_Misses_FRONTEND_RETIRED_ITLB_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x12,name=ICache_Misses_FRONTEND_RETIRED_L1I_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x13,name=ICache_Misses_FRONTEND_RETIRED_L2_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x601006,name=Fetch_Latency_FRONTEND_RETIRED_LATENCY_GE_16,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x600406,name=Frontend_Bound_FRONTEND_RETIRED_LATENCY_GE_4,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x600806,name=Fetch_Latency_FRONTEND_RETIRED_LATENCY_GE_8,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x15,name=ITLB_Misses_FRONTEND_RETIRED_STLB_MISS,period=100007/pp,cpu_core/event=0xa4,umask=0x2,name=Backend_Bound_TOPDOWN_BACKEND_BOUND_SLOTS,period=10000003/,cycles:pp -o perf.data taskset -c 0 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (7 samples) ]
Run `perf report' to show the sampling results
Sampling:
perf record -g -e cycles:pp -o perf.data taskset -c 0 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data (15 samples) ]
Run `perf report' to show the sampling results
(~/pmu-tools/toplev.py --core S0-C0 -l3 --run-sample --no-desc sleep 1
also fails)
Does a plain perf record -e cycles:pp ./run.sh
Yes, it does. E.g.:
perf stat -e cycles:pp sleep 1
Performance counter stats for 'sleep 1':
<not supported> cpu_core/cycles:pp/
<not supported> cpu_atom/cycles:pp/
1.003134906 seconds time elapsed
0.002847000 seconds user
0.000000000 seconds sys
perf record -e cycles:pp sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (7 samples) ]
perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 7 of event 'cpu_atom/cycles:pp/'
# Event count (approx.): 5357041
#
# Overhead Command Shared Object Symbol
# ........ ....... ................. .................................
#
89.11% sleep [kernel.kallsyms] [k] __get_user_8
10.59% sleep [kernel.kallsyms] [k] tlb_gather_mmu
0.29% perf-ex [kernel.kallsyms] [k] nmi_restore
0.01% perf-ex [kernel.kallsyms] [k] __intel_pmu_enable_all.isra.0
0.00% perf-ex [kernel.kallsyms] [k] native_write_msr
#
# (Tip: To add Node.js USDT(User-Level Statically Defined Tracing): perf buildid-cache --add `which node`)
from pmu-tools.
but if I remove the --core S0-C0 part it works:
my guess is that the difference boils down to passing a -C 0
to perf
. Without it everything seems to work fine.
from pmu-tools.
Related Issues (20)
- toplev: Make defaults less verbose
- toplev: Add simple options for standard metrics HOT 1
- toplev: Info_Bottlenecks reports negative Scaled_Slots on SKX HOT 4
- toplev: Add option to only show bottleneck HOT 2
- toplev: Fix misaligned values in columns when area is too long
- tl-tester sometimes has comparison failures
- toplev does not print multiplex information for all metrics
- toplev: Compute minimum column widths for numbers / units
- toplev should print % for Scaled_Slots
- toplev should support filtering metrics by their threshold HOT 1
- ADL 100% Machine_Clears
- toplev should accumulate and report running time
- Add support for Meteor Lake HOT 1
- Test suite should cover both models for hybrid targets HOT 1
- toplev add option to only collect bottlenecks, not L1/L2
- MTL support misses Info.Bottlenecks HOT 1
- Trunk version of toplev regressed in generation of valid groups HOT 1
- CLTRAMP3D workload is not available
- Incorrect event for IpCall metric
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from pmu-tools.