
Comments (4)

aayasin commented on June 22, 2024

There are at least three problems with this test workload and recent toplev:

  1. The Bottlenecks View requires at least a level-4 tree.
  2. The run time is too short (~1 second), which runs into multiplexing issues.
  3. Trunk toplev no longer lists the nodes with zero counts, which perf-tools relies on. Please revert that.

Here is a reproducer. The first line is the command to run inside the perf-tools folder, followed by its output on ICX.

The first run, with trunk pmu-tools and --no-multiplex, shows no negative bottlenecks. The actual toplev command is kept for reference.

./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 no-mux' -pm 10 -v1 --pmu-tools ../pmu-tools --toplev-args ' --no-multiplex'                                                                                                                                                    
INFO: App: ./workloads/GITGREP pmu-tools1 no-mux .                                                                                                                                
topdown full tree + All Bottlenecks ..                                                                                                                                            
../pmu-tools/toplev.py --no-desc -vl6 --nodes '+IPC,+Instructions,+UopPI,+Time,+SLOTS,+CLKS,+Mispredictions,+Big_Code,+Instruction_Fetch_BW,+Branching_Overhead,+DSB_Misses,+Cache_Memory_Bandwidth,+Cache_Memory_Latency,+Memory_Data_TLBs,+Memory_Synchronization,+Irregular_Overhead,+Other_Bottlenecks,+Base_Non_Br' -V GITGREP-pmu-tools1-no-mux.toplev-vl6-perf.csv --no-multiplex --tune 'DEDUP_NODE = "MEM_Parallel_Reads,Lock_Latency,Slots_Utilization,Power,L2_Bound,Big_Code,DSB_Misses,IC_Misses,Contested_Accesses,Data_Sharing,PMM_Bound,Memory_Operations,DRAM_Bound,Other_Light_Ops,Mispredictions,Cache_Memory_Bandwidth,Cache_Memory_Latency,Memory_Data_TLBs,Memory_Synchronization,Base_Non_Br,Instruction_Fetch_BW,Irregular_Overhead,Core_Bound_Likely,Branch_Misprediction_Cost,Other_Bottlenecks"' -- ./workloads/GITGREP pmu-tools1 no-mux 2>&1 | tee GITGREP-pmu-tools1-no-mux.toplev-vl6.log | egrep '<==|MUX|Info(\.Bot|.*Time)|warning.*zero' | sort                                                                                                                          
BE/Core          Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_2                                   % Clocks                           18.2   <==                      
Info.Botlnk.L2   DSB_Misses                                                                                      Scaled_Slots                      2.38                           
Info.Bottleneck  Base_Non_Br                                                                                     Scaled_Slots                     32.35                           
Info.Bottleneck  Big_Code                                                                                        Scaled_Slots                      1.67                           
Info.Bottleneck  Branching_Overhead                                                                              Scaled_Slots                      9.56                           
Info.Bottleneck  Cache_Memory_Bandwidth                                                                          Scaled_Slots                      1.26                           
Info.Bottleneck  Cache_Memory_Latency                                                                            Scaled_Slots                      1.55                           
Info.Bottleneck  Instruction_Fetch_BW                                                                            Scaled_Slots                      9.60                           
Info.Bottleneck  Irregular_Overhead                                                                              Scaled_Slots                      4.69                           
Info.Bottleneck  Memory_Data_TLBs                                                                                Scaled_Slots                      1.42                           
Info.Bottleneck  Memory_Synchronization                                                                          Scaled_Slots                      0.01                           
Info.Bottleneck  Mispredictions                                                                                  Scaled_Slots                     19.24                           
Info.Bottleneck  Other_Bottlenecks                                                                               Scaled_Slots                     18.64                           
Info.System      Time                                                                                            Seconds                           1.77                           
MUX                                                                                                            %                                 100.00                           
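For anyone post-processing a saved toplev log instead of using the shell pipeline, the `egrep '<==|MUX|Info(\.Bot|.*Time)|warning.*zero' | sort` stage at the end of the command above can be approximated in Python. This is a minimal sketch; the function name `summarize` is hypothetical and not part of perf-tools or pmu-tools. Python's `sorted` orders by code point, so the result matches `LC_ALL=C sort` rather than a locale-aware `sort`.

```python
import re

# Same regular expression as the egrep stage of the toplev pipeline above.
SUMMARY_RE = re.compile(r"<==|MUX|Info(\.Bot|.*Time)|warning.*zero")


def summarize(log_lines):
    """Keep only the lines the egrep filter would keep, sorted like `LC_ALL=C sort`."""
    return sorted(line for line in log_lines if SUMMARY_RE.search(line))
```

Usage: `summarize(open("GITGREP-pmu-tools1-no-mux.toplev-vl6.log").read().splitlines())`.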

This is the failure with the default setup (multiplexing enabled), using pmu-tools at the 4.6 release point.

./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 do-mux' -pm 10 -v1
INFO: App: ./workloads/GITGREP pmu-tools1 do-mux .
topdown full tree + All Bottlenecks ..
/usr/bin/python /home/admin1/ayasin/perf-tools/pmu-tools/toplev.py --no-desc -vl6 --nodes '+IPC,+Instructions,+UopPI,+Time,+SLOTS,+CLKS,+Mispredictions,+Big_Code,+Instruction_Fetch_BW,+Branching_Overhead,+DSB_Misses,+Cache_Memory_Bandwidth,+Cache_Memory_Latency,+Memory_Data_TLBs,+Memory_Synchronization,+Irregular_Overhead,+Other_Bottlenecks,+Base_Non_Br' -V GITGREP-pmu-tools1-do-mux.toplev-vl6-perf.csv --frequency --metric-group +Summary --tune 'DEDUP_NODE = "MEM_Parallel_Reads,Lock_Latency,Slots_Utilization,Power,L2_Bound,Big_Code,DSB_Misses,IC_Misses,Contested_Accesses,Data_Sharing,PMM_Bound,Memory_Operations,DRAM_Bound,Other_Light_Ops,Mispredictions,Cache_Memory_Bandwidth,Cache_Memory_Latency,Memory_Data_TLBs,Memory_Synchronization,Base_Non_Br,Instruction_Fetch_BW,Irregular_Overhead,Core_Bound_Likely,Branch_Misprediction_Cost,Other_Bottlenecks"' -- ./workloads/GITGREP pmu-tools1 do-mux 2>&1 | tee GITGREP-pmu-tools1-do-mux.toplev-vl6.log | egrep '<==|MUX|Info(\.Bot|.*Time)|warning.*zero' | sort
BE/Core        Backend_Bound.Core_Bound                                                                      % Slots                           21.2    [30.0%]<==
Info.Botlnk.L2 DSB_Misses                                                                                      Scaled_Slots                     0.58   [ 6.1%]
Info.Bottleneck Base_Non_Br                                                                                    Scaled_Slots                   -75.96   [ 7.5%]
Info.Bottleneck Big_Code                                                                                       Scaled_Slots                     5.49   [85.8%]
Info.Bottleneck Branching_Overhead                                                                             Scaled_Slots                   114.85   [ 7.5%]
Info.Bottleneck Cache_Memory_Bandwidth                                                                         Scaled_Slots                     2.24   [ 7.5%]
Info.Bottleneck Cache_Memory_Latency                                                                           Scaled_Slots                     1.25   [12.0%]
Info.Bottleneck Instruction_Fetch_BW                                                                           Scaled_Slots                     9.59   [23.1%]
Info.Bottleneck Irregular_Overhead                                                                             Scaled_Slots                     8.49   [ 7.0%]
Info.Bottleneck Memory_Data_TLBs                                                                               Scaled_Slots                     0.42   [ 7.0%]
Info.Bottleneck Memory_Synchronization                                                                         Scaled_Slots                     0.02   [ 7.0%]
Info.Bottleneck Mispredictions                                                                                 Scaled_Slots                    14.51   [85.8%]
Info.Bottleneck Other_Bottlenecks                                                                              Scaled_Slots                    19.11   [ 7.0%]
Info.System    Time                                                                                            Seconds                          1.77
MUX                                                                                                          %                                  0.00
warning: 35 nodes had zero counts: ALU_Op_Utilization Clears_Resteers DSB DTLB_Load DTLB_Store Decoder0_Alone L1_Bound L3_Hit_Latency Load_Op_Utilization Local_DRAM MITE MITE_4wide Microcode_Sequencer Mispredicts_Resteers Mixing_Vectors Other_Mispredicts Other_Nukes Port_0 Port_1 Port_5 Port_6 Ports_Utilization Ports_Utilized_0 Ports_Utilized_1 Remote_Cache Remote_DRAM Serializing_Operation Slow_Pause Split_Loads Split_Stores Store_Latency Store_Op_Utilization Store_STLB_Miss Unknown_Branches X87_Use
ERROR: Too many metrics with zero counts; 35 unexpected (ALU_Op_Utilization Clears_Resteers DSB DTLB_Load DTLB_Store Decoder0_Alone L1_Bound L3_Hit_Latency Load_Op_Utilization Local_DRAM MITE MITE_4wide Microcode_Sequencer Mispredicts_Resteers Mixing_Vectors Other_Mispredicts Other_Nukes Port_0 Port_1 Port_5 Port_6 Ports_Utilization Ports_Utilized_0 Ports_Utilized_1 Remote_Cache Remote_DRAM Serializing_Operation Slow_Pause Split_Loads Split_Stores Store_Latency Store_Op_Utilization Store_STLB_Miss Unknown_Branches X87_Use). Run longer or use: --toplev-args ' --no-multiplex' !
 !
ERROR: Command "./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 do-mux' -pm 10 -v1" failed with '256' !
 !
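The do-mux output carries a bracketed percentage on each row, which, as far as I understand toplev's output, is the share of the run during which that node's events were actually counted; the no-mux run has no such column. Note that Base_Non_Br (-75.96) and Branching_Overhead (114.85), the two broken values, were both measured only `[ 7.5%]` of the time. A minimal sketch for flagging such rows, assuming that meaning of the brackets and assuming the node name is the second whitespace-separated field; `low_coverage_nodes` and its 50% threshold are hypothetical:

```python
import re

# Bracketed coverage column, e.g. "[ 7.5%]" or "[85.8%]".
# Assumption: it is the share of run time the node's events were being counted.
COVERAGE_RE = re.compile(r"\[\s*(\d+(?:\.\d+)?)%\]")


def low_coverage_nodes(lines, min_pct=50.0):
    """Return (node, coverage_pct) for rows whose coverage is below min_pct.

    min_pct is an illustrative threshold, not a toplev or perf-tools default.
    """
    flagged = []
    for line in lines:
        m = COVERAGE_RE.search(line)
        if not m:
            continue  # no bracket: row was not multiplexed or has no value
        fields = line.split()
        node = fields[1] if len(fields) > 1 else line
        pct = float(m.group(1))
        if pct < min_pct:
            flagged.append((node, pct))
    return flagged
```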

perf-tools flags the zero counts and suggests running longer or using --no-multiplex.
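A minimal sketch of that kind of check, assuming it parses toplev's `warning: N nodes had zero counts: ...` line (which is why point 3 above matters: such a check has nothing to parse once toplev stops listing those nodes). The function names and the `expected_zero` allow-list are hypothetical; this is not perf-tools' actual implementation, it only reproduces the shape of the error message shown above.

```python
import re

ZERO_RE = re.compile(r"warning: (\d+) nodes had zero counts: (.*)")


def unexpected_zero_nodes(log_lines, expected_zero=frozenset()):
    """Sorted node names reported with zero counts that are not in expected_zero."""
    reported = set()
    for line in log_lines:
        m = ZERO_RE.search(line)
        if m:
            reported.update(m.group(2).split())
    return sorted(reported - set(expected_zero))


def check_zero_counts(log_lines, expected_zero=frozenset(), max_unexpected=0):
    """Raise if more than max_unexpected nodes had zero counts unexpectedly."""
    bad = unexpected_zero_nodes(log_lines, expected_zero)
    if len(bad) > max_unexpected:
        raise RuntimeError(
            f"Too many metrics with zero counts; {len(bad)} unexpected ({' '.join(bad)}). "
            "Run longer or use: --toplev-args ' --no-multiplex'"
        )
    return bad
```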

from pmu-tools.

andikleen commented on June 22, 2024

But even with multiplexing issues, shouldn't the formula guard against bad values? These are not uncommon.

I have an open bug on detecting too-short run times for multiplexing in toplev.
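One possible guard, sketched under the assumption that Scaled_Slots values are on a 0-100 scale (the Info.Bottleneck rows in both runs above each sum to roughly 100). `guard_metric` is a hypothetical helper, not toplev's formula code: it clamps an out-of-range value and records a warning instead of silently printing it.

```python
def guard_metric(name, value, lo=0.0, hi=100.0, warnings=None):
    """Clamp value into [lo, hi]; record a warning when it was out of range.

    Out-of-range results (e.g. from combining counts collected in different
    multiplexed groups) are appended to `warnings` if a list is given.
    """
    if lo <= value <= hi:
        return value
    if warnings is not None:
        warnings.append(f"{name}={value:.2f} outside [{lo}, {hi}]; likely multiplexing error")
    return min(max(value, lo), hi)
```

Applied to the do-mux run, Base_Non_Br (-75.96) becomes 0 and Branching_Overhead (114.85) becomes 100, with two warnings. The trade-off is that clamped values no longer sum to ~100 the way the raw Info.Bottleneck rows do, so flagging the run may be more honest than clamping alone.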

andikleen commented on June 22, 2024

Also, I'm surprised that 1 s is no longer enough to get through all the groups. The number of groups must have grown a lot.

aayasin commented on June 22, 2024

1 s is too short.

There are around a couple dozen groups for the full tree with current toplev, so each group gets sampled less than 5% of the time.
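The arithmetic behind this, as a sketch that assumes ideal round-robin scheduling of groups (real kernel rotation is coarser) and takes "a couple dozen" as 24 groups, a hypothetical figure: each group is counted for 1/24 ≈ 4.2% of the run, i.e. about 74 ms of the 1.77 s run. `scale_count` shows the usual perf-style extrapolation of a multiplexed count, raw × time_enabled / time_running; the other helper names are also illustrative.

```python
def per_group_share(num_groups):
    """Fraction of run time each group is counted under ideal round-robin."""
    return 1.0 / num_groups


def per_group_time(run_seconds, num_groups):
    """Seconds each group is actually counted during the run."""
    return run_seconds * per_group_share(num_groups)


def min_run_time(num_groups, seconds_per_group):
    """Run time needed so every group is counted for at least seconds_per_group."""
    return num_groups * seconds_per_group


def scale_count(raw, time_enabled, time_running):
    """perf-style extrapolation of a multiplexed count: raw * enabled / running."""
    if time_running == 0:
        return 0.0  # event was never scheduled; there is nothing to extrapolate
    return raw * time_enabled / time_running
```

For example, `per_group_time(1.77, 24)` gives 0.07375 s, and asking for a (hypothetical) 1 s per group needs `min_run_time(24, 1.0)`, i.e. a 24 s run. The extrapolation factor grows as coverage shrinks, which is why short, heavily multiplexed runs can yield the negative or >100 values seen above.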
