Comments (4)
There are at least two problems with this test workload & recent toplev:
- The Bottlenecks View requires at least a level 4 tree.
- The run time of ~1 second is too short, so the run hits multiplexing issues.
- Trunk toplev stopped listing the nodes with zero counts, which perf-tools relies on. Please revert that.
Here is a reproducer. The first line is the command to run inside the perf-tools folder, followed by its output on ICX.
The first run, with trunk pmu-tools and --no-multiplex, shows no negative bottlenecks. The actual toplev command is kept for reference.
./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 no-mux' -pm 10 -v1 --pmu-tools ../pmu-tools --toplev-args ' --no-multiplex'
INFO: App: ./workloads/GITGREP pmu-tools1 no-mux .
topdown full tree + All Bottlenecks ..
../pmu-tools/toplev.py --no-desc -vl6 --nodes '+IPC,+Instructions,+UopPI,+Time,+SLOTS,+CLKS,+Mispredictions,+Big_Code,+Instruction_Fetch_BW,+Branching_Overhead,+DSB_Misses,+Cache_Memory_Bandwidth,+Cache_Memory_Latency,+Memory_Data_TLBs,+Memory_Synchronization,+Irregular_Overhead,+Other_Bottlenecks,+Base_Non_Br' -V GITGREP-pmu-tools1-no-mux.toplev-vl6-perf.csv --no-multiplex --tune 'DEDUP_NODE = "MEM_Parallel_Reads,Lock_Latency,Slots_Utilization,Power,L2_Bound,Big_Code,DSB_Misses,IC_Misses,Contested_Accesses,Data_Sharing,PMM_Bound,Memory_Operations,DRAM_Bound,Other_Light_Ops,Mispredictions,Cache_Memory_Bandwidth,Cache_Memory_Latency,Memory_Data_TLBs,Memory_Synchronization,Base_Non_Br,Instruction_Fetch_BW,Irregular_Overhead,Core_Bound_Likely,Branch_Misprediction_Cost,Other_Bottlenecks"' -- ./workloads/GITGREP pmu-tools1 no-mux 2>&1 | tee GITGREP-pmu-tools1-no-mux.toplev-vl6.log | egrep '<==|MUX|Info(\.Bot|.*Time)|warning.*zero' | sort
BE/Core Backend_Bound.Core_Bound.Ports_Utilization.Ports_Utilized_2 % Clocks 18.2 <==
Info.Botlnk.L2 DSB_Misses Scaled_Slots 2.38
Info.Bottleneck Base_Non_Br Scaled_Slots 32.35
Info.Bottleneck Big_Code Scaled_Slots 1.67
Info.Bottleneck Branching_Overhead Scaled_Slots 9.56
Info.Bottleneck Cache_Memory_Bandwidth Scaled_Slots 1.26
Info.Bottleneck Cache_Memory_Latency Scaled_Slots 1.55
Info.Bottleneck Instruction_Fetch_BW Scaled_Slots 9.60
Info.Bottleneck Irregular_Overhead Scaled_Slots 4.69
Info.Bottleneck Memory_Data_TLBs Scaled_Slots 1.42
Info.Bottleneck Memory_Synchronization Scaled_Slots 0.01
Info.Bottleneck Mispredictions Scaled_Slots 19.24
Info.Bottleneck Other_Bottlenecks Scaled_Slots 18.64
Info.System Time Seconds 1.77
MUX % 100.00
This is the failure with the default settings, using pmu-tools at the 4.6 release point.
./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 do-mux' -pm 10 -v1
INFO: App: ./workloads/GITGREP pmu-tools1 do-mux .
topdown full tree + All Bottlenecks ..
/usr/bin/python /home/admin1/ayasin/perf-tools/pmu-tools/toplev.py --no-desc -vl6 --nodes '+IPC,+Instructions,+UopPI,+Time,+SLOTS,+CLKS,+Mispredictions,+Big_Code,+Instruction_Fetch_BW,+Branching_Overhead,+DSB_Misses,+Cache_Memory_Bandwidth,+Cache_Memory_Latency,+Memory_Data_TLBs,+Memory_Synchronization,+Irregular_Overhead,+Other_Bottlenecks,+Base_Non_Br' -V GITGREP-pmu-tools1-do-mux.toplev-vl6-perf.csv --frequency --metric-group +Summary --tune 'DEDUP_NODE = "MEM_Parallel_Reads,Lock_Latency,Slots_Utilization,Power,L2_Bound,Big_Code,DSB_Misses,IC_Misses,Contested_Accesses,Data_Sharing,PMM_Bound,Memory_Operations,DRAM_Bound,Other_Light_Ops,Mispredictions,Cache_Memory_Bandwidth,Cache_Memory_Latency,Memory_Data_TLBs,Memory_Synchronization,Base_Non_Br,Instruction_Fetch_BW,Irregular_Overhead,Core_Bound_Likely,Branch_Misprediction_Cost,Other_Bottlenecks"' -- ./workloads/GITGREP pmu-tools1 do-mux 2>&1 | tee GITGREP-pmu-tools1-do-mux.toplev-vl6.log | egrep '<==|MUX|Info(\.Bot|.*Time)|warning.*zero' | sort
BE/Core Backend_Bound.Core_Bound % Slots 21.2 [30.0%]<==
Info.Botlnk.L2 DSB_Misses Scaled_Slots 0.58 [ 6.1%]
Info.Bottleneck Base_Non_Br Scaled_Slots -75.96 [ 7.5%]
Info.Bottleneck Big_Code Scaled_Slots 5.49 [85.8%]
Info.Bottleneck Branching_Overhead Scaled_Slots 114.85 [ 7.5%]
Info.Bottleneck Cache_Memory_Bandwidth Scaled_Slots 2.24 [ 7.5%]
Info.Bottleneck Cache_Memory_Latency Scaled_Slots 1.25 [12.0%]
Info.Bottleneck Instruction_Fetch_BW Scaled_Slots 9.59 [23.1%]
Info.Bottleneck Irregular_Overhead Scaled_Slots 8.49 [ 7.0%]
Info.Bottleneck Memory_Data_TLBs Scaled_Slots 0.42 [ 7.0%]
Info.Bottleneck Memory_Synchronization Scaled_Slots 0.02 [ 7.0%]
Info.Bottleneck Mispredictions Scaled_Slots 14.51 [85.8%]
Info.Bottleneck Other_Bottlenecks Scaled_Slots 19.11 [ 7.0%]
Info.System Time Seconds 1.77
MUX % 0.00
warning: 35 nodes had zero counts: ALU_Op_Utilization Clears_Resteers DSB DTLB_Load DTLB_Store Decoder0_Alone L1_Bound L3_Hit_Latency Load_Op_Utilization Local_DRAM MITE MITE_4wide Microcode_Sequencer Mispredicts_Resteers Mixing_Vectors Other_Mispredicts Other_Nukes Port_0 Port_1 Port_5 Port_6 Ports_Utilization Ports_Utilized_0 Ports_Utilized_1 Remote_Cache Remote_DRAM Serializing_Operation Slow_Pause Split_Loads Split_Stores Store_Latency Store_Op_Utilization Store_STLB_Miss Unknown_Branches X87_Use
ERROR: Too many metrics with zero counts; 35 unexpected (ALU_Op_Utilization Clears_Resteers DSB DTLB_Load DTLB_Store Decoder0_Alone L1_Bound L3_Hit_Latency Load_Op_Utilization Local_DRAM MITE MITE_4wide Microcode_Sequencer Mispredicts_Resteers Mixing_Vectors Other_Mispredicts Other_Nukes Port_0 Port_1 Port_5 Port_6 Ports_Utilization Ports_Utilized_0 Ports_Utilized_1 Remote_Cache Remote_DRAM Serializing_Operation Slow_Pause Split_Loads Split_Stores Store_Latency Store_Op_Utilization Store_STLB_Miss Unknown_Branches X87_Use). Run longer or use: --toplev-args ' --no-multiplex' !
!
ERROR: Command "./do.py --tune :forgive:0 :help:0 :msr:1 :sample:3 :size:1 :loops:3 :loop-ideal-ipc:1 -v0 profile -a './workloads/GITGREP pmu-tools1 do-mux' -pm 10 -v1" failed with '256' !
!
perf-tools flags the zero counts and suggests running longer or passing --toplev-args ' --no-multiplex'.
from pmu-tools.
But even with multiplex issues shouldn't the formula guard against bad values? These are not uncommon.
I have an open bug on detecting a run time that is too short for multiplexing in toplev.
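As a rough illustration of what guarding against bad values could look like, here is a minimal post-processing sketch in Python. It is not toplev's or perf-tools' actual logic: `check_bottlenecks` and the regular expression are hypothetical helpers that only assume the `Info.Bottleneck <name> Scaled_Slots <value>` line shape seen in the logs above. It flags any bottleneck outside the 0..100 range and also returns the sum of the parsed values.

```python
import re

# Hypothetical helper, not part of toplev or perf-tools.
# Matches lines such as:
#   Info.Bottleneck Base_Non_Br Scaled_Slots -75.96 [ 7.5%]
LINE_RE = re.compile(
    r"^Info\.Bottleneck\s+(\S+)\s+Scaled_Slots\s+(-?\d+(?:\.\d+)?)"
)


def check_bottlenecks(log_text):
    """Return (out_of_range, total) for the Info.Bottleneck lines in log_text.

    out_of_range maps metric name -> value for values below 0 or above 100.
    total is the plain sum of all parsed values.
    """
    values = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            values[match.group(1)] = float(match.group(2))
    out_of_range = {
        name: value
        for name, value in values.items()
        if value < 0.0 or value > 100.0
    }
    return out_of_range, sum(values.values())


# Three lines copied from the multiplexed run above.
sample = """\
Info.Bottleneck Base_Non_Br Scaled_Slots -75.96 [ 7.5%]
Info.Bottleneck Branching_Overhead Scaled_Slots 114.85 [ 7.5%]
Info.Bottleneck Mispredictions Scaled_Slots 14.51 [85.8%]
"""
bad, total = check_bottlenecks(sample)
print(sorted(bad))  # ['Base_Non_Br', 'Branching_Overhead']
```

By my arithmetic, the eleven Info.Bottleneck values sum to roughly 100 in both runs above (about 99.99 and 100.01), so a sum check alone would not catch the multiplexed run. A per-metric range check like the one sketched here would.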
from pmu-tools.
Also I'm surprised that 1s is not enough anymore to get through all the groups. It must have really grown a lot.
from pmu-tools.
1s is too short.
There are around a couple dozen groups for the full tree with current toplev, so each group gets sampled for <5% of the time.
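The arithmetic behind that estimate can be sketched as follows. This is a simplified model that assumes the run time is split evenly across groups. The group count of 24 is an assumed stand-in for "a couple dozen", and the 1.77 s run time is taken from the Info.System Time line in the logs above. Actual kernel multiplexing rotates groups on a timer, so the real shares will differ.

```python
def per_group_share_percent(num_groups):
    """Percentage of the run each group is scheduled, assuming an even split."""
    return 100.0 / num_groups


def per_group_seconds(run_seconds, num_groups):
    """Wall-clock time each group is counted for, assuming an even split."""
    return run_seconds / num_groups


NUM_GROUPS = 24      # assumption: "a couple dozen" groups
RUN_SECONDS = 1.77   # Info.System Time from the logs above

share_pct = per_group_share_percent(NUM_GROUPS)
group_secs = per_group_seconds(RUN_SECONDS, NUM_GROUPS)

print(f"each group is counted ~{share_pct:.1f}% of the time")
print(f"that is ~{group_secs * 1000:.0f} ms of a {RUN_SECONDS} s run")
```

Under these assumptions each group sees only a few tens of milliseconds of the workload, so a short-lived phase of the program can easily be missed by a group, which would leave that group's nodes at zero.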
from pmu-tools.