
Comments (3)

andikleen commented on July 21, 2024

Can you give the full toplev command line?

The PEBS events don't necessarily count in perf stat.

The problem seems to be this perf error message:
WARNING: A requested CPU in '0' is not supported by PMU 'cpu_atom' (CPUs 8-23) for event 'cycles:pp'

cycles:pp should really work, so that's some kind of upstream perf bug.
Does a plain perf record -e cycles:pp ./run.sh work?

export HYPERVISOR=1 should work around it (will disable PEBS, but also some other features)
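
As a minimal sketch of the workaround above, the variable can also be set for a single toplev run instead of being exported for the whole shell. The helper names are hypothetical, and the toplev path and workload in the trailing comment are simply the ones used in this thread.

```python
import os
import subprocess

def toplev_env_with_hypervisor(base_env=None):
    """Return a copy of the environment with HYPERVISOR=1 set.

    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HYPERVISOR"] = "1"
    return env

def run_toplev(toplev_cmd):
    """Run a toplev command (given as a list) with the workaround applied.

    Returns the exit code of the toplev process.
    """
    return subprocess.run(toplev_cmd, env=toplev_env_with_hypervisor()).returncode

# Illustrative call, not executed here:
# run_toplev([os.path.expanduser("~/pmu-tools/toplev.py"),
#             "--core", "S0-C0", "-l3", "--run-sample", "--no-desc",
#             "taskset", "-c", "0", "./run.sh"])
```

This keeps the side effects mentioned above (PEBS and some other features disabled) limited to that one invocation.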

from pmu-tools.

thetheodor commented on July 21, 2024

Thanks for the reply.

> Can you give the full toplev command line?

~/pmu-tools/toplev.py --core S0-C0 -l3 --run-sample --no-desc taskset -c 0 ./run.sh

Playing a bit more with it:

~/pmu-tools/toplev.py --core S0-C0 -l3 --run-sample --no-desc taskset -c 0 sleep 1
# 4.7-full on Intel(R) Core(TM) i9-14900K [adl]
core FE               Frontend_Bound                                  % Slots                       43.4   [11.0%]
core BE               Backend_Bound                                   % Slots                       31.8   [22.0%]
core FE               Frontend_Bound.Fetch_Latency                    % Slots                       34.7   [22.0%]
core BAD              Bad_Speculation.Machine_Clears                  % Slots                        0.8   [11.0%]
core BE/Core          Backend_Bound.Core_Bound                        % Slots                       21.2   [22.0%]
core FE               Frontend_Bound.Fetch_Latency.ICache_Misses      % Clocks                      17.1   [22.0%]
core FE               Frontend_Bound.Fetch_Latency.ITLB_Misses        % Clocks                       6.5   [22.0%]
core FE               Frontend_Bound.Fetch_Latency.Branch_Resteers    % Clocks                      32.4   [11.0%]<==
core FE               Frontend_Bound.Fetch_Latency.MS_Switches        % Clocks_est                  11.5   [11.0%]
core BE/Mem           Backend_Bound.Memory_Bound.L1_Bound             % Stalls                       5.9   [11.0%]
core BE/Core          Backend_Bound.Core_Bound.Serializing_Operation  % Clocks                      35.1   [11.0%]
core BE/Core          Backend_Bound.Core_Bound.Ports_Utilization      % Clocks                      45.2   [11.0%]
core RET              Retiring.Heavy_Operations.Microcode_Sequencer   % Slots                        2.7   [11.0%]
core MUX                                                              %                             11.00
Run toplev --describe Branch_Resteers^ to get more information on bottleneck for core
Add --nodes '!+Branch_Resteers*/4,+Frontend_Bound.Fetch_Latency,+Frontend_Bound,+MUX' for breakdown.
Sampling:
perf record -g -e cpu_core/event=0xc5,umask=0x0,name=Branch_Resteers_BR_MISP_RETIRED_ALL_BRANCHES,period=400009/,cpu_core/event=0xc6,umask=0x1,frontend=0x14,name=ITLB_Misses_FRONTEND_RETIRED_ITLB_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x12,name=ICache_Misses_FRONTEND_RETIRED_L1I_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x13,name=ICache_Misses_FRONTEND_RETIRED_L2_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x601006,name=Fetch_Latency_FRONTEND_RETIRED_LATENCY_GE_16,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x600406,name=Frontend_Bound_FRONTEND_RETIRED_LATENCY_GE_4,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x600806,name=Fetch_Latency_FRONTEND_RETIRED_LATENCY_GE_8,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x8,name=MS_Switches_FRONTEND_RETIRED_MS_FLOWS,period=100007/,cpu_core/event=0xc6,umask=0x1,frontend=0x15,name=ITLB_Misses_FRONTEND_RETIRED_STLB_MISS,period=100007/pp,cpu_core/event=0xc3,umask=0x1,edge=1,cmask=1,name=Machine_Clears_MACHINE_CLEARS_COUNT,period=100003/,cpu_core/event=0xd1,umask=0x40,name=L1_Bound_MEM_LOAD_RETIRED_FB_HIT,period=100007/pp,cpu_core/event=0xd1,umask=0x1,name=L1_Bound_MEM_LOAD_RETIRED_L1_HIT,period=1000003/pp,cpu_core/event=0xa2,umask=0x2,name=Serializing_Operation_RESOURCE_STALLS_SCOREBOARD,period=100003/,cpu_core/event=0xa4,umask=0x2,name=Backend_Bound_TOPDOWN_BACKEND_BOUND_SLOTS,period=10000003/,cpu_core/event=0xc2,umask=0x4,frontend=0x8,name=Microcode_Sequencer_UOPS_RETIRED_MS,period=2000003/,cycles:pp -o perf.data -C 0 taskset -c 0 sleep 1
WARNING: A requested CPU in '0' is not supported by PMU 'cpu_atom' (CPUs 8-23) for event 'cycles:pp'
Error:
The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (cpu_atom/cycles:pp/).
/bin/dmesg | grep -i perf may provide additional information.

Sampling failed
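
To make the warning easier to read, here is a small sketch that parses it and reports which requested CPUs fall outside the PMU's CPU list. The regular expression is written against the exact wording of the message in this thread and may not match other perf versions; the function names are hypothetical.

```python
import re

# Pattern based on the warning text shown in this thread (an assumption about its format).
WARNING_RE = re.compile(
    r"A requested CPU in '(?P<requested>[^']*)' is not supported by PMU "
    r"'(?P<pmu>[^']*)' \(CPUs (?P<pmu_cpus>[^)]*)\) for event '(?P<event>[^']*)'"
)

def parse_cpu_list(text):
    """Expand a Linux-style CPU list such as '0-7,16' into a set of ints."""
    cpus = set()
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus

def explain_warning(line):
    """Return the PMU, event, and unsupported CPUs named in a perf warning.

    Returns None if the line does not look like the warning above.
    """
    m = WARNING_RE.search(line)
    if m is None:
        return None
    requested = parse_cpu_list(m.group("requested"))
    pmu_cpus = parse_cpu_list(m.group("pmu_cpus"))
    return {
        "pmu": m.group("pmu"),
        "event": m.group("event"),
        "unsupported_cpus": sorted(requested - pmu_cpus),
    }
```

For the warning above, this shows that CPU 0 is not among the CPUs (8-23) served by cpu_atom, which matches the failing event being cpu_atom/cycles:pp/.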

but if I remove the --core S0-C0 part it works:

~/pmu-tools/toplev.py  -l3 --run-sample --no-desc taskset -c 0 sleep 1
70 events not counted
# 4.7-full, 3.51 on Intel(R) Core(TM) i9-14900K [adl]
core FE               Frontend_Bound                              % Slots                       42.0
core BE               Backend_Bound                               % Slots                       26.3   [28.0%]
core FE               Frontend_Bound.Fetch_Latency                % Slots                       28.5   [28.0%]
core BE/Core          Backend_Bound.Core_Bound                    % Slots                       15.3   [28.0%]
core FE               Frontend_Bound.Fetch_Latency.ICache_Misses  % Clocks                      12.6   [75.0%]<==
core FE               Frontend_Bound.Fetch_Latency.ITLB_Misses    % Clocks                       6.0   [75.0%]
warning: 16 nodes had zero counts: Branch_Resteers DRAM_Bound DSB DSB_Switches Divider L1_Bound L2_Bound L3_Bound LSD MITE MS_Switches Other_Mispredicts Other_Nukes Ports_Utilization Serializing_Operation Store_Bound
atom FE               Frontend_Bound                              % Slots                       34.0   [28.0%]<==
atom FE               Frontend_Bound.Fetch_Latency                % Slots                       16.5   [28.0%]
atom FE               Frontend_Bound.Fetch_Bandwidth              % Slots                       17.5   [28.0%]
atom BAD              Bad_Speculation                             % Slots                       18.0   [28.0%]
atom BAD              Bad_Speculation.Branch_Mispredicts          % Slots                       17.4   [28.0%]
warning: 22 nodes had zero counts: Base Branch_Detect Branch_Resteer Cisc DRAM_Bound Decode FPDIV_uops Fast_Nuke ICache_Misses ITLB_Misses L1_Bound L2_Bound L3_Bound MS_uops Machine_Clears Mem_Scheduler Memory_Bound Nuke Other_FB Other_Ret Predecode Store_Bound
Run toplev --describe ICache_Misses^ to get more information on bottleneck for core
Run toplev --describe Frontend_Bound^ to get more information on bottleneck for atom
Add --nodes '!+Frontend_Bound*/2,+MUX' for breakdown.
Sampling:
perf record -g -e cpu_core/event=0xc6,umask=0x1,frontend=0x14,name=ITLB_Misses_FRONTEND_RETIRED_ITLB_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x12,name=ICache_Misses_FRONTEND_RETIRED_L1I_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x13,name=ICache_Misses_FRONTEND_RETIRED_L2_MISS,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x601006,name=Fetch_Latency_FRONTEND_RETIRED_LATENCY_GE_16,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x600406,name=Frontend_Bound_FRONTEND_RETIRED_LATENCY_GE_4,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x600806,name=Fetch_Latency_FRONTEND_RETIRED_LATENCY_GE_8,period=100007/pp,cpu_core/event=0xc6,umask=0x1,frontend=0x15,name=ITLB_Misses_FRONTEND_RETIRED_STLB_MISS,period=100007/pp,cpu_core/event=0xa4,umask=0x2,name=Backend_Bound_TOPDOWN_BACKEND_BOUND_SLOTS,period=10000003/,cycles:pp -o perf.data taskset -c 0 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (7 samples) ]
Run `perf report' to show the sampling results
Sampling:
perf record -g -e cycles:pp -o perf.data taskset -c 0 sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data (15 samples) ]
Run `perf report' to show the sampling results

(~/pmu-tools/toplev.py --core S0-C0 -l3 --run-sample --no-desc sleep 1 also fails)

> Does a plain perf record -e cycles:pp ./run.sh

Yes, it does. E.g.:

perf stat -e cycles:pp sleep 1

 Performance counter stats for 'sleep 1':

   <not supported>      cpu_core/cycles:pp/
   <not supported>      cpu_atom/cycles:pp/

       1.003134906 seconds time elapsed

       0.002847000 seconds user
       0.000000000 seconds sys
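
A rough sketch for spotting such events automatically: it scans perf stat output for lines that start with `<not supported>` and returns the event names. It assumes the column layout shown above (marker first, event name second), and the function name is hypothetical.

```python
def unsupported_events(perf_stat_output):
    """Return the event names that perf stat reported as '<not supported>'."""
    marker = "<not supported>"
    events = []
    for line in perf_stat_output.splitlines():
        stripped = line.strip()
        if stripped.startswith(marker):
            # The event name is the first token after the marker.
            rest = stripped[len(marker):].split()
            if rest:
                events.append(rest[0])
    return events
```

An event showing up here does not by itself mean sampling will fail: in this thread both cycles:pp variants are listed as not supported by perf stat, while perf record -e cycles:pp still collects samples.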


perf record -e cycles:pp sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (7 samples) ]

perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 7  of event 'cpu_atom/cycles:pp/'
# Event count (approx.): 5357041
#
# Overhead  Command  Shared Object      Symbol
# ........  .......  .................  .................................
#
    89.11%  sleep    [kernel.kallsyms]  [k] __get_user_8
    10.59%  sleep    [kernel.kallsyms]  [k] tlb_gather_mmu
     0.29%  perf-ex  [kernel.kallsyms]  [k] nmi_restore
     0.01%  perf-ex  [kernel.kallsyms]  [k] __intel_pmu_enable_all.isra.0
     0.00%  perf-ex  [kernel.kallsyms]  [k] native_write_msr


#
# (Tip: To add Node.js USDT(User-Level Statically Defined Tracing): perf buildid-cache --add `which node`)


thetheodor commented on July 21, 2024

> but if I remove the --core S0-C0 part it works:

My guess is that the difference boils down to passing -C 0 to perf. Without it, everything seems to work fine.
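
One way to test this guess is to generate the perf record variants side by side and run each by hand. The sketch below only builds the command lines. The third variant rewrites the bare event as `cpu_core/cycles/pp`, which is an assumption about how perf accepts PMU-qualified events on hybrid CPUs; it is not something toplev does, and whether it avoids the error is untested. The function names are hypothetical.

```python
import shlex

def qualify_for_core(event):
    """Rewrite a bare 'name:mods' event as 'cpu_core/name/mods'.

    Events that already contain a '/' are assumed to be PMU-qualified
    and are returned unchanged.
    """
    if "/" in event:
        return event
    name, _, mods = event.partition(":")
    return f"cpu_core/{name}/{mods}"

def build_perf_record(events, workload, cpu=None, core_only=False):
    """Build a perf record command line as a list of arguments."""
    if core_only:
        events = [qualify_for_core(e) for e in events]
    cmd = ["perf", "record", "-g", "-e", ",".join(events), "-o", "perf.data"]
    if cpu is not None:
        cmd += ["-C", str(cpu)]
    return cmd + list(workload)

# Print the three variants: no -C, with -C 0, and with -C 0 plus a cpu_core-only event.
for cpu, core_only in [(None, False), (0, False), (0, True)]:
    print(shlex.join(build_perf_record(["cycles:pp"], ["sleep", "1"], cpu, core_only)))
```

If only the second variant fails, that would support the idea that -C 0 combined with the automatic expansion of cycles:pp to cpu_atom is what triggers the error.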
