Comments (19)
run.log
The workload agenda is here:
workloads:
  - name: stress-ng
    iterations: 5
    params:
      cleanup_assets: true
      duration: 10
      extra_args: '--cpu-method gcd --taskset 5,7 -l 100'
      stressor: cpu
      threads: 2
      uninstall: false
    runtime_parameters:
      A55_frequency: 1328000
      A76_frequency: 1328000
      X1_frequency: 1745000
      sysfile_values:
        /sys/devices/system/cpu/cpu1/online: 0
        /sys/devices/system/cpu/cpu2/online: 0
        /sys/devices/system/cpu/cpu3/online: 0
        /sys/devices/system/cpu/cpu4/online: 0
        /sys/devices/system/cpu/cpu5/online: 1
        /sys/devices/system/cpu/cpu6/online: 0
        /sys/devices/system/cpu/cpu7/online: 1
from workload-automation.
Yep, I've been using your branch rather than the upstream implementation.
from workload-automation.
Hi, thanks for reporting this. That should not be the case, so it sounds like we might have a bug somewhere.
As a workaround, could you try explicitly specifying the frequency of each enabled core and see if that allows you to make progress?
e.g.
cpu2_frequency: 1328000
cpu5_frequency: 1328000
cpu6_frequency: 1745000
from workload-automation.
Thanks for the quick response. It still does not seem to work:
workloads:
  - name: stress-ng
    iterations: 10
    params:
      cleanup_assets: true
      duration: 10
      extra_args: '--cpu-method callfunc --taskset 6,7 -l 100'
      stressor: cpu
      threads: 2
      uninstall: false
    runtime_parameters:
      # A55_frequency: 1328000
      # A76_frequency: 1328000
      # X1_frequency: 1745000
      cpu2_frequency: 1328000
      cpu5_frequency: 1328000
      cpu6_frequency: 1745000
      sysfile_values:
        /sys/devices/system/cpu/cpu1/online: 0
        /sys/devices/system/cpu/cpu2/online: 1
        /sys/devices/system/cpu/cpu3/online: 0
        /sys/devices/system/cpu/cpu4/online: 0
        /sys/devices/system/cpu/cpu5/online: 1
        /sys/devices/system/cpu/cpu6/online: 1
        /sys/devices/system/cpu/cpu7/online: 1
With the output as below:
INFO Running job wk1
INFO Configuring augmentations
INFO Configuring target for job wk1 (stress-ng) [1]
ERROR Cannot configure frequencies for CPU4 as no CPUs are online.
INFO Completing job wk1
ERROR Job wk1 iteration 1 completed with status FAILED. retrying...
INFO Running job wk1
INFO Configuring augmentations
INFO Configuring target for job wk1 (stress-ng) [1]
ERROR Cannot configure frequencies for CPU4 as no CPUs are online.
INFO Completing job wk1
ERROR Job wk1 iteration 1 completed with status FAILED. retrying...
INFO Running job wk1
INFO Configuring augmentations
INFO Configuring target for job wk1 (stress-ng) [1]
ERROR Cannot configure frequencies for CPU4 as no CPUs are online.
INFO Completing job wk1
ERROR Job wk1 iteration 1 completed with status FAILED. Max retries exceeded.
from workload-automation.
Hmm, I see. It seems this is happening because WA resolves the cluster to its first CPU without checking whether that CPU is online, instead of finding the first online CPU in the cluster.
If you don't need particular CPUs, only a given number online per cluster, one potential workaround is to online the first CPU of each cluster, which should allow WA's resolution to function as intended.
E.g. for your first example:
sysfile_values:
  /sys/devices/system/cpu/cpu0/online: 1
  /sys/devices/system/cpu/cpu1/online: 0
  /sys/devices/system/cpu/cpu2/online: 1
  /sys/devices/system/cpu/cpu3/online: 0
  /sys/devices/system/cpu/cpu4/online: 1
  /sys/devices/system/cpu/cpu5/online: 0
  /sys/devices/system/cpu/cpu6/online: 1
  /sys/devices/system/cpu/cpu7/online: 0
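The resolution problem described above can be illustrated with a minimal sketch. This is not WA's actual code: the function name `first_online_cpu` and the hard-coded cluster layout (A76 on CPUs 4 and 5, as mentioned elsewhere in this thread) are assumptions made for illustration only.

```python
from typing import Iterable, Optional, Set


def first_online_cpu(cluster_cpus: Iterable[int], online: Set[int]) -> Optional[int]:
    """Return the lowest-numbered CPU in the cluster that is online.

    Hypothetical helper: returns None when every CPU in the cluster is
    offline, rather than blindly returning the first CPU of the cluster.
    """
    for cpu in sorted(cluster_cpus):
        if cpu in online:
            return cpu
    return None


# Assumed layout for illustration: the A76 cluster covers CPUs 4 and 5.
a76_cluster = [4, 5]

# With CPU 4 offline and CPU 5 online, the helper picks CPU 5.
print(first_online_cpu(a76_cluster, online={0, 5, 7}))
```

Picking the first CPU of the cluster unconditionally would select CPU 4 here, which matches the "no CPUs are online" failure reported for CPU4 in the log above.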
from workload-automation.
Ah, I see. I was hoping that wasn't the case, as I would prefer the flexibility of choosing particular CPUs.
from workload-automation.
I think I've found the problem (and a few others in the process). Would you be able to try out this branch [1] on your setup and let me know if it resolves the issue for you?
[1] https://github.com/marcbonnici/workload-automation/tree/cpu_domain_fix
from workload-automation.
Okay, so I switched branches and installed using setup.py:
cd workload-automation
sudo -H python setup.py install
The reported version is 3.4.0.dev1+7c432d74, but the issue still seems to occur.
from workload-automation.
Hmm, thanks for trying that out.
Do you have your run.log available to see if there are any further hints in there?
from workload-automation.
Thanks, would you be able to pull my branch again and see if this resolves this problem for you?
from workload-automation.
Still seems to be happening.
run.log
Also in case you need the agenda:
stressng_w_10iter.txt
from workload-automation.
Hi Honsunhc - what happens if you try to explicitly set the frequency for each online CPU, rather than the cluster frequency?
e.g.
runtime_parameters:
  cpu0_frequency: 1328000
  cpu5_frequency: 1328000
  cpu7_frequency: 1745000
from workload-automation.
Right, it looks like the next issue here is that WA queries the device at the time it validates the input parameters, and the device state can change before the parameters are committed to the device.
At that point the A76 cluster (for example) resolves to both CPUs 4 and 5 (if both are online at the time), so WA picks the first CPU. It later generates the error because, as part of the sysfile settings, that CPU is turned off before WA can actually commit the frequency.
I think Scott's workaround should work as it does not rely on this resolution. However, I've also updated my branch again to change the order in which the runtime parameters are set on the device, so that any frequency configuration happens before we offline CPUs. Would you be able to check if this one gets things working for you?
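The ordering change described above can be sketched as follows. This is a minimal illustration, not the code in the branch: `plan_steps` and the step tuples are hypothetical, and applying onlining writes before the frequency writes is an extra assumption that the thread does not state.

```python
from typing import Dict, List, Tuple

# (kind, target, value); a hypothetical representation of one device write.
Step = Tuple[str, str, int]


def plan_steps(cpu_freqs: Dict[int, int], hotplug: Dict[int, int]) -> List[Step]:
    """Order device writes so frequencies are committed before CPUs go offline.

    cpu_freqs maps a CPU number to the frequency to set.
    hotplug maps a CPU number to 1 (online) or 0 (offline).
    """
    # Assumption: onlining first is harmless and makes more CPUs available.
    onlining = [("hotplug", f"cpu{c}", v) for c, v in sorted(hotplug.items()) if v == 1]
    freqs = [("frequency", f"cpu{c}", f) for c, f in sorted(cpu_freqs.items())]
    # Offlining goes last, so a CPU chosen at validation time is still online
    # when its frequency is written.
    offlining = [("hotplug", f"cpu{c}", v) for c, v in sorted(hotplug.items()) if v == 0]
    return onlining + freqs + offlining


# The A76 cluster was resolved to CPU 4 at validation time, but the agenda
# also asks for CPU 4 to be taken offline and CPU 5 to be kept online.
for step in plan_steps({4: 1328000}, {4: 0, 5: 1}):
    print(step)
```

With this ordering the frequency write for CPU 4 happens while it is still online, which is the situation the reordering is meant to guarantee.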
from workload-automation.
So I tried both Scott's method and the normal cluster method, and they both work great! There was one instance using the A76 method where the first iteration ran fine but the remaining four iterations had the same CPU issues, though this only happened once. If that error persists, I'll open a new issue, but at the moment I think it's fixed! Thanks!
from workload-automation.
Thanks for confirming, I'm glad we finally have a working setup for you.
I think I might know what could cause the issue with the cluster approach, but I would need to look into this further, so I'll keep this issue open for now as well.
from workload-automation.
So it seems that this could be a more persistent issue.
I attached the run log below:
run.log
from workload-automation.
I think the issue here is the combination of cluster names, hotplugging and iterations. The resolution of the CPUs is still performed at the start of the run, so when trying to configure the device on subsequent iterations we run into the same problem.
Does using the cpuX_frequency notation still work here?
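The explicit per-CPU notation can also be derived from cluster-level values, which avoids relying on cluster name resolution. The following is a minimal sketch, assuming the cluster layout implied by this thread (A55 on CPUs 0 to 3, A76 on CPUs 4 and 5, X1 on CPUs 6 and 7); `expand_to_cpu_params` is a hypothetical helper and not part of WA.

```python
from typing import Dict, Iterable, List


def expand_to_cpu_params(
    cluster_freqs: Dict[str, int],
    clusters: Dict[str, List[int]],
    online: Iterable[int],
) -> Dict[str, int]:
    """Expand cluster frequencies into cpuN_frequency entries for online CPUs."""
    online_set = set(online)
    params: Dict[str, int] = {}
    for name, freq in cluster_freqs.items():
        for cpu in clusters[name]:
            if cpu in online_set:
                params[f"cpu{cpu}_frequency"] = freq
    return params


# Assumed layout, inferred from the thread rather than read from a device.
clusters = {"A55": [0, 1, 2, 3], "A76": [4, 5], "X1": [6, 7]}
cluster_freqs = {"A55": 1328000, "A76": 1328000, "X1": 1745000}

for key, value in expand_to_cpu_params(cluster_freqs, clusters, online=[0, 5, 7]).items():
    print(f"{key}: {value}")
```

For CPUs 0, 5 and 7 online, the generated entries correspond to the explicit runtime parameters suggested earlier in the thread.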
from workload-automation.
Yep, using cpuX_frequency works great.
from workload-automation.
OK, thanks for confirming. It looks like making the cluster parameters work in combination with hotplugging would require some more invasive changes to the runtime parameter mechanism.
Just to double check, are you still using my topic branch to get things working on your end rather than the upstream implementation? If so, I'll look at merging those changes so we at least have a workable solution upstream as well.
from workload-automation.