toplev: Info_Bottlenecks reports negative Scaled_Slots on SKX about pmu-tools HOT 4 OPEN

andikleen commented on June 22, 2024

toplev: Info_Bottlenecks reports negative Scaled_Slots on SKX

from pmu-tools.

Comments (4)

aayasin commented on June 22, 2024

There are at least two problems with this test workload & recent toplev:

The Bottlenecks View requires at least a level-4 tree.
The run time of ~1 second is too short, which runs into multiplexing issues.
Trunk toplev stopped listing the nodes with zero counts, which perf-tools relies on. Please revert that.

Here is a reproducer. The first line is the command to run inside the perf-tools folder, followed by its output on ICX.

The first run, with trunk pmu-tools and --no-multiplex, shows no negative bottlenecks. The actual toplev command is kept for reference.

./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 no-mux' -pm 10 -v1 --pmu-tools ../pmu-tools --toplev-args ' --no-multiplex'                                                                                                                                                    
INFO: App: ./workloads/GITGREP pmu-tools1 no-mux .                                                                                                                                
topdown full tree + All Bottlenecks ..                                                                                                                                            
../pmu-tools/toplev.py --no-desc -vl6 --nodes '+IPC,+Instructions,+UopPI,+Time,+SLOTS,+CLKS,+Mispredictions,+Big_Code,+Instruction_Fetch_BW,+Branching_Overhead,+DSB_Misses,+Cache_Memory_Bandwidth,+Cache_Memory_Latency,+Memory_Data_TLBs,+Memory_Synchronization,+Irregular_Overhead,+Other_Bottlenecks,+Base_Non_Br' -V GITGREP-pmu-tools1-no-mux.toplev-vl6-perf.csv --no-multiplex --tune 'DEDUP_NODE = "MEM_Parallel_Reads,Lock_Latency,Slots_Utilization,Power,L2_Bound,Big_Code,DSB_Misses,IC_Misses,Contested_Accesses,Data_Sharing,PMM_Bound,Memory_Operations,DRAM_Bound,Other_Light_Ops,Mispredictions,Cache_Memory_Bandwidth,Cache_Memory_Latency,Memory_Data_TLBs,Memory_Synchronization,Base_Non_Br,Instruction_Fetch_BW,Irregular_Overhead,Core_Bound_Likely,Branch_Misprediction_Cost,Other_Bottlenecks"' -- ./workloads/GITGREP pmu-tools1 no-mux 2>&1 | tee GITGREP-pmu-tools1-no-mux.toplev-vl6.log | egrep '<==|MUX|Info(\.Bot|.*Time)|warning.*zero' | sort                                                                                                                          
BE/Core          Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_2                                   % Clocks                           18.2   <==                      
Info.Botlnk.L2   DSB_Misses                                                                                      Scaled_Slots                      2.38                           
Info.Bottleneck  Base_Non_Br                                                                                     Scaled_Slots                     32.35                           
Info.Bottleneck  Big_Code                                                                                        Scaled_Slots                      1.67                           
Info.Bottleneck  Branching_Overhead                                                                              Scaled_Slots                      9.56                           
Info.Bottleneck  Cache_Memory_Bandwidth                                                                          Scaled_Slots                      1.26                           
Info.Bottleneck  Cache_Memory_Latency                                                                            Scaled_Slots                      1.55                           
Info.Bottleneck  Instruction_Fetch_BW                                                                            Scaled_Slots                      9.60                           
Info.Bottleneck  Irregular_Overhead                                                                              Scaled_Slots                      4.69                           
Info.Bottleneck  Memory_Data_TLBs                                                                                Scaled_Slots                      1.42                           
Info.Bottleneck  Memory_Synchronization                                                                          Scaled_Slots                      0.01                           
Info.Bottleneck  Mispredictions                                                                                  Scaled_Slots                     19.24                           
Info.Bottleneck  Other_Bottlenecks                                                                               Scaled_Slots                     18.64                           
Info.System      Time                                                                                            Seconds                           1.77                           
MUX                                                                                                            %                                 100.00

This is the failure seen by default, using pmu-tools at the 4.6 release point.

./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 do-mux' -pm 10 -v1
INFO: App: ./workloads/GITGREP pmu-tools1 do-mux .
topdown full tree + All Bottlenecks ..
/usr/bin/python /home/admin1/ayasin/perf-tools/pmu-tools/toplev.py --no-desc -vl6 --nodes '+IPC,+Instructions,+UopPI,+Time,+SLOTS,+CLKS,+Mispredictions,+Big_Code,+Instruction_Fetch_BW,+Branching_Overhead,+DSB_Misses,+Cache_Memory_Bandwidth,+Cache_Memory_Latency,+Memory_Data_TLBs,+Memory_Synchronization,+Irregular_Overhead,+Other_Bottlenecks,+Base_Non_Br' -V GITGREP-pmu-tools1-do-mux.toplev-vl6-perf.csv --frequency --metric-group +Summary --tune 'DEDUP_NODE = "MEM_Parallel_Reads,Lock_Latency,Slots_Utilization,Power,L2_Bound,Big_Code,DSB_Misses,IC_Misses,Contested_Accesses,Data_Sharing,PMM_Bound,Memory_Operations,DRAM_Bound,Other_Light_Ops,Mispredictions,Cache_Memory_Bandwidth,Cache_Memory_Latency,Memory_Data_TLBs,Memory_Synchronization,Base_Non_Br,Instruction_Fetch_BW,Irregular_Overhead,Core_Bound_Likely,Branch_Misprediction_Cost,Other_Bottlenecks"' -- ./workloads/GITGREP pmu-tools1 do-mux 2>&1 | tee GITGREP-pmu-tools1-do-mux.toplev-vl6.log | egrep '<==|MUX|Info(\.Bot|.*Time)|warning.*zero' | sort
BE/Core        Backend_Bound.Core_Bound                                                                      % Slots                           21.2    [30.0%]<==
Info.Botlnk.L2 DSB_Misses                                                                                      Scaled_Slots                     0.58   [ 6.1%]
Info.Bottleneck Base_Non_Br                                                                                    Scaled_Slots                   -75.96   [ 7.5%]
Info.Bottleneck Big_Code                                                                                       Scaled_Slots                     5.49   [85.8%]
Info.Bottleneck Branching_Overhead                                                                             Scaled_Slots                   114.85   [ 7.5%]
Info.Bottleneck Cache_Memory_Bandwidth                                                                         Scaled_Slots                     2.24   [ 7.5%]
Info.Bottleneck Cache_Memory_Latency                                                                           Scaled_Slots                     1.25   [12.0%]
Info.Bottleneck Instruction_Fetch_BW                                                                           Scaled_Slots                     9.59   [23.1%]
Info.Bottleneck Irregular_Overhead                                                                             Scaled_Slots                     8.49   [ 7.0%]
Info.Bottleneck Memory_Data_TLBs                                                                               Scaled_Slots                     0.42   [ 7.0%]
Info.Bottleneck Memory_Synchronization                                                                         Scaled_Slots                     0.02   [ 7.0%]
Info.Bottleneck Mispredictions                                                                                 Scaled_Slots                    14.51   [85.8%]
Info.Bottleneck Other_Bottlenecks                                                                              Scaled_Slots                    19.11   [ 7.0%]
Info.System    Time                                                                                            Seconds                          1.77
MUX                                                                                                          %                                  0.00
warning: 35 nodes had zero counts: ALU_Op_Utilization Clears_Resteers DSB DTLB_Load DTLB_Store Decoder0_Alone L1_Bound L3_Hit_Latency Load_Op_Utilization Local_DRAM MITE MITE_4wide Microcode_Sequencer Mispredicts_Resteers Mixing_Vectors Other_Mispredicts Other_Nukes Port_0 Port_1 Port_5 Port_6 Ports_Utilization Ports_Utilized_0 Ports_Utilized_1 Remote_Cache Remote_DRAM Serializing_Operation Slow_Pause Split_Loads Split_Stores Store_Latency Store_Op_Utilization Store_STLB_Miss Unknown_Branches X87_Use
ERROR: Too many metrics with zero counts; 35 unexpected (ALU_Op_Utilization Clears_Resteers DSB DTLB_Load DTLB_Store Decoder0_Alone L1_Bound L3_Hit_Latency Load_Op_Utilization Local_DRAM MITE MITE_4wide Microcode_Sequencer Mispredicts_Resteers Mixing_Vectors Other_Mispredicts Other_Nukes Port_0 Port_1 Port_5 Port_6 Ports_Utilization Ports_Utilized_0 Ports_Utilized_1 Remote_Cache Remote_DRAM Serializing_Operation Slow_Pause Split_Loads Split_Stores Store_Latency Store_Op_Utilization Store_STLB_Miss Unknown_Branches X87_Use). Run longer or use: --toplev-args ' --no-multiplex' !
 !
ERROR: Command "./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 do-mux' -pm 10 -v1" failed with '256' !
 !

perf-tools flags the zero counts and suggests running longer or using --no-multiplex.
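
For illustration only, a check like the one perf-tools applies could be sketched as below. This is not the actual perf-tools code: the function names and the `max_allowed` threshold are hypothetical, and the only thing taken from the log above is the wording of the `warning: N nodes had zero counts:` line.

```python
import re

# Matches the warning line as it appears in the toplev log above.
ZERO_RE = re.compile(r"warning: (\d+) nodes? had zero counts: (.+)")

def zero_count_nodes(log_text):
    """Return the node names listed in the zero-count warning, or [] if absent."""
    for line in log_text.splitlines():
        m = ZERO_RE.search(line)
        if m:
            return m.group(2).split()
    return []

def check_zero_counts(log_text, max_allowed=0):
    """Return (ok, nodes); ok is False when more than max_allowed nodes had zero counts.

    max_allowed is a hypothetical threshold, not a perf-tools setting.
    """
    nodes = zero_count_nodes(log_text)
    return len(nodes) <= max_allowed, nodes
```

A caller could feed it the contents of the tee'd .log file and fail the run when `ok` is False.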

andikleen commented on June 22, 2024

But even with multiplexing issues, shouldn't the formula guard against bad values? These are not uncommon.

I have an open bug on detecting run times too short for multiplexing in toplev.
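
A minimal sketch of what such a guard might look like. It assumes the Scaled_Slots values are meant to fall between 0 and 100, as the --no-multiplex run suggests (its Info.Bottleneck values sum to roughly 100). The function name, the status strings and the `tolerance` parameter are hypothetical; this is not how toplev or the metric formulas handle it today.

```python
import math

def guard_percent(value, lo=0.0, hi=100.0, tolerance=5.0):
    """Classify a percentage-like metric value.

    Returns (value, "ok") when in range, (clamped_value, "clamped") when it
    overshoots by at most `tolerance`, and (None, "invalid") otherwise.
    The tolerance is an assumed allowance for small measurement noise.
    """
    if value is None or math.isnan(value) or math.isinf(value):
        return None, "invalid"
    if lo <= value <= hi:
        return value, "ok"
    if lo - tolerance <= value <= hi + tolerance:
        return min(max(value, lo), hi), "clamped"
    return None, "invalid"

# Values taken from the failing run above.
print(guard_percent(-75.96))   # Base_Non_Br
print(guard_percent(114.85))   # Branching_Overhead
print(guard_percent(9.59))     # Instruction_Fetch_BW
```

Reporting a value as invalid rather than silently clamping it keeps a multiplexing failure visible instead of turning it into a plausible-looking number.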

andikleen commented on June 22, 2024

Also I'm surprised that 1s is not enough anymore to get through all the groups. It must have really grown a lot.

aayasin commented on June 22, 2024

1s is too short.

There are around a couple dozen groups for the full tree with current toplev; each group gets sampled <5% of the time.
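
As a rough back-of-the-envelope check of that figure, assuming exactly 24 groups and an even round-robin between them (both are simplifying assumptions, and the helper names are made up for this sketch):

```python
def group_time_share(num_groups):
    """Fraction of the run each group is counted, assuming even round-robin."""
    if num_groups < 1:
        raise ValueError("num_groups must be at least 1")
    return 1.0 / num_groups

def counted_seconds(run_seconds, num_groups):
    """Wall time during which any single group is actually counting."""
    return run_seconds * group_time_share(num_groups)

share = group_time_share(24)          # "a couple dozen" taken as 24
print(f"{share * 100:.1f}% per group")
print(f"{counted_seconds(1.77, 24) * 1000:.0f} ms counted per group in a 1.77 s run")
```

Under these assumptions each group observes only a small slice of a run as short as the 1.77 s one above, so the scaled-up counts are sensitive to whatever the workload happened to be doing during that slice.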
