The following JSON configuration was used on a Juno R0, big core running with "perform

rtapp execution gives unreliable actual duty cycle

scheduler-tools commented on May 2, 2024

rtapp execution gives unreliable actual duty cycle

Comments (21)

douglas-raillard-arm commented on May 2, 2024

Actually, it seems more like a calibration issue: rt-app gives very different calibration results from one run to another, but when a good value is found, rt-app seems to execute workloads reliably.

EDIT: both issues are present, and they seem to share a common cause.

douglas-raillard-arm commented on May 2, 2024

The known-good version I tried was 482e47a

vingu-linaro commented on May 2, 2024

Hi Douglas,

I ran "rt-app doc/examples/template.json" 20 times on my hikey with the master branch, and the calibration stays the same: 151ns. I also checked the log file of the thread: the duration of the 10msec run event varies by 18usec.

vingu-linaro commented on May 2, 2024

I also tried with a statically linked binary:
The calibration returns 500ns 18 times and 501ns 2 times (obviously the ldexp is not the same), and the variation of the 10msec run event is 123usec.

I also ran tests on the big core of hikey960 and the results are quite similar: the calibration always returns 138ns and the variation of the 10msec run event is only 7usec.

douglas-raillard-arm commented on May 2, 2024

Do you see anything strange in the traces I've attached to the issue ?

vingu-linaro commented on May 2, 2024

I don't see anything strange.
Have you tried to run without enabling trace, and only using the log file with a memory buffer?

douglas-raillard-arm commented on May 2, 2024

> Have you tried to run without enabling trace ?

Calibration runs as root without tracing. We just run rt-app with idle states disabled, the performance cpufreq governor and the rest of the userspace frozen.

> and only using the log file with a memory buffer ?

Do you mean logging to a file in tmpfs, or a specific rt-app option ?

vingu-linaro commented on May 2, 2024

> > Have you tried to run without enabling trace ?
>
> Calibration runs as root without tracing. We just run rt-app with idle states disabled, performance cpufreq gov and the rest of the userspace frozen.
>
> > and only using the log file with a memory buffer ?
>
> Do you mean logging to a file in tmpfs, or a specific rt-app option ?

"log_size" : 1024, as an example, so that the log file is not accessed while events are running
or even
"log_size" : "disable"

douglas-raillard-arm commented on May 2, 2024

I'll give it a go. Are those the same logs as the ones enabled by logstats (we disable it) ? I can't find a reference to logstats in the documentation.

I also found that the util of the task has some kind of low-frequency component (~120ms period) with the current rt-app. That happened on CPU1 of my Juno R0, although CPU2 was apparently unaffected (they are the same kind of big core).

vingu-linaro commented on May 2, 2024

logstats ? Do you mean log_timing() ?
For the low-frequency component, I will try to reproduce it.

douglas-raillard-arm commented on May 2, 2024

@derkling added "logstats" global option in the JSON produced by LISA, so I assume rt-app knows about it. This is supposed to enable/disable the generation of slack logs, now that they can also be emitted as ftrace events.

Let me know if you are interested in a trace where this strange util variation is observed, and fixed by the change to the busy loop

vingu-linaro commented on May 2, 2024

There is no "logstats" in the master branch (sha1: 9a50d76).
This raises the question : have you tried the master branch of rt-app ?

douglas-raillard-arm commented on May 2, 2024

The tests with the modified busy loop were conducted on master @ 9a50d76.
So the comparison between rt-app versions built with the same toolchain is somewhat valid.

But rt-app used within LISA is:

rt-app v1.0-95-g72ab18b (2019-09-05 14:26:07 BST)

This commit SHA1 does not seem to exist in rt-app. @derkling Do you remember which build of rt-app you upstreamed in LISA ?
EDIT: it's probably 72ab18b (and not g72ab18b), so it's almost the latest master. However, I also cannot find references to logstats in the code ...
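
For reference, the version string above follows the usual `git describe` layout of `<tag>-<commits since tag>-g<abbreviated sha>`, where the `g` prefix is not part of the hash. A minimal sketch of splitting such a string (the helper name `parse_git_describe` is made up for this example and is not part of rt-app or LISA):

```python
import re

# `git describe` output: <tag>-<number of commits since tag>-g<abbreviated sha>
DESCRIBE_RE = re.compile(r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$")


def parse_git_describe(version):
    """Split a `git describe` string into (tag, commits_since_tag, sha)."""
    match = DESCRIBE_RE.match(version)
    if match is None:
        raise ValueError(f"not in <tag>-<n>-g<sha> form: {version!r}")
    return match["tag"], int(match["count"]), match["sha"]


tag, count, sha = parse_git_describe("v1.0-95-g72ab18b")
print(tag, count, sha)  # prints: v1.0 95 72ab18b
```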

douglas-raillard-arm commented on May 2, 2024

I'll try with "log_size" : "disable" rather than logstats
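
As a rough sketch of how the suggested `log_size` setting could be applied to an existing configuration before handing it to rt-app: the snippet below assumes `log_size` belongs in the `global` object of the JSON file, and both the helper name `with_log_size` and the sample configuration are made up for illustration.

```python
import copy
import json


def with_log_size(config, log_size):
    """Return a copy of an rt-app configuration with global.log_size set.

    Assumption: log_size lives in the "global" object of the JSON file.
    """
    patched = copy.deepcopy(config)
    patched.setdefault("global", {})["log_size"] = log_size
    return patched


# Hypothetical minimal configuration, for illustration only.
config = {
    "tasks": {"thread0": {"run": 10000, "sleep": 90000}},
    "global": {"duration": 5},
}

# Either an integer buffer size or "disable", as suggested in the thread.
print(json.dumps(with_log_size(config, "disable"), indent=4))
```

The original dictionary is left untouched, so the same base configuration can be reused to compare runs with and without logging.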

douglas-raillard-arm commented on May 2, 2024

I've tried with "log_size": "disable" and I get the following results on Juno R2.
JSON: rta_ntaskscpumigration.json.txt

Task utilization with current version of rt-app:

With rt-app from #90 :

These results seem to be reproducible (I ran multiple iterations with each, always getting similar results).

I also ran an integration cycle with the modified version of rt-app and it removed some failures, most notably on the CPUMigration tests that these graphs are taken from.

vingu-linaro commented on May 2, 2024

I have run your json file on my hikey but still can't reproduce your instability. I would say that this is even quite stable.
chart.pdf

Then, the theoretical range of util_avg for your migrX-X tasks is [159-211] but with #90 your range is around [110-150]

Could you try to run my rt-app binary, so we can check whether your instability comes from your compilation environment ?
rt-app.gz

vingu-linaro commented on May 2, 2024

Douglas,
Any update on this problem ? Have you tried the binary that I posted ?

vingu-linaro commented on May 2, 2024

Hi Douglas,

Have you made any progress on this topic ?

douglas-raillard-arm commented on May 2, 2024

Hi @vingu-linaro, sorry for the response delay. The current state of things seems to be:

- I think we get issues more often when (at least) two tasks are scheduled at the same time on the same CPU, like in the trace I posted here. There is plenty of idle time so I can't really explain it, and that may be a wrong lead.
- PELT can give surprising results at times (although not at that scale AFAIK), so the relation duty cycle <=> utilization might not be so simple. In the meantime, I've added some duty-cycle oriented plots to LISA so it might be time to revisit the trace I posted here.

note: The CPUMigration test in LISA will soon not trigger that issue anymore, since it is going to measure the duty cycle from the trace directly to avoid this kind of issue.
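
To illustrate what measuring the duty cycle directly could look like, here is a minimal sketch, not the LISA implementation. It assumes the wakeup and sleep timestamps of each activation have already been extracted from the trace; the function name and the timestamps are made up.

```python
def duty_cycles(activations):
    """Compute one duty cycle per activation.

    activations: list of (wakeup, sleep) timestamps in seconds, sorted by time.
    The period of an activation is taken as the time until the next wakeup,
    so the last activation has no duty cycle.
    """
    cycles = []
    for (start, stop), (next_start, _) in zip(activations, activations[1:]):
        period = next_start - start
        cycles.append((stop - start) / period)
    return cycles


# Made-up timestamps: 10ms period, with one longer activation.
activations = [(0.000, 0.002), (0.010, 0.012), (0.020, 0.023), (0.030, 0.032)]
print([round(dc, 3) for dc in duty_cycles(activations)])
```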

vingu-linaro commented on May 2, 2024

> Hi @vingu-linaro, sorry for the response delay. The current state of things seems to be:
>
> I think we get issues more when (at least) two tasks are scheduled at the same time on the same CPU like in the trace I posted here. There is plenty of idle time so I can't really explain it, and that may be a wrong lead

I have just rerun your json file on my hikey, and tasks are scheduled simultaneously on the same CPU AFAICT. The util_avg of all tasks stays quite stable: +/-1 at most.

> PELT can give surprising results at times (although not at that scale AFAIK), so the relation duty cycle <=> utilization might not be so simple. In the meantime, I've added some duty-cycle oriented plots to LISA so it might be time to revisit the trace I posted here.

Yes, there is no linear relation between duty cycle and utilization, because the period also impacts the utilization. The formulas are:
max utilization: (1 - y^r) / (1 - y^p), with r the running time and p the period (keep in mind that the step is 1024us)
min utilization: max utilization * y^(p - r)

> note: The CPUMigration test in LISA will soon not trigger that issue anymore, since it is going to measure the duty cycle from the trace directly to avoid this kind of issue.
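
The max/min utilization formulas given in this comment can be evaluated with a few lines of Python. This is only a sketch under stated assumptions: `y` is the PELT decay factor with `y^32 = 0.5`, `r` and `p` are expressed in 1024us steps, the result is scaled to the 0..1024 utilization range, and the CPU runs at full capacity and maximum frequency (no invariance scaling). The function name is made up.

```python
PELT_STEP_US = 1024       # PELT accounting step, as mentioned in the thread
Y = 0.5 ** (1 / 32)       # decay factor per step, assuming y^32 = 0.5
UTIL_SCALE = 1024         # assumed full-scale utilization value


def util_range(run_us, period_us):
    """Return the steady-state (min, max) utilization of a periodic task."""
    r = run_us / PELT_STEP_US
    p = period_us / PELT_STEP_US
    max_util = (1 - Y ** r) / (1 - Y ** p)   # reached at the end of the run phase
    min_util = max_util * Y ** (p - r)       # after decaying during the sleep phase
    return min_util * UTIL_SCALE, max_util * UTIL_SCALE


# Same 20% duty cycle, two different periods: the longer period
# gives a wider utilization range.
for run_us, period_us in [(2000, 10000), (20000, 100000)]:
    low, high = util_range(run_us, period_us)
    print(f"run={run_us}us period={period_us}us -> [{low:.0f}, {high:.0f}]")
```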

douglas-raillard-arm commented on May 2, 2024

> PELT can give surprising results at times

I was referring to this kind of behavior

The Y coordinate of the red lines gives the duration of the corresponding rtapp activation. Black lines show the same for the sleep part. The util signal has a weird non-symmetrical oscillation even in parts where the duty cycle stays stable (between t=3.75 and t=4).

The current PELT simulator we have seems to work well, except in that case where it gives different results, so I assume there is something tricky going on, but I've not been able to pinpoint what it is exactly. The issue can be reproduced with the Invariance test in LISA [1] (it does not fail every time, but you should get at least 2% of failed runs or so).

I don't think this issue and the one we are discussing here are related, but who knows ...

[1] https://lisa-linux-integrated-system-analysis.readthedocs.io/en/master/kernel_tests.html#lisa.tests.scheduler.load_tracking.InvarianceItem.test_util_correctness

PS: This kind of plot can be reproduced in a notebook with:

        from lisa.trace import Trace

        task = 'the_rtapp_task_name'
        trace = Trace('trace.dat')
        # Plot the util signal of the task
        axis = trace.analysis.load_tracking.plot_task_signals(task, signals=['util'])
        # Second Y axis sharing the same time axis, used for the activation plot
        activation_axis = axis.twinx()
        # Plot activation/sleep and activation "background bands". You can also replace "duration=True" by "duty_cycle=True" with similar results
        trace.analysis.tasks.plot_task_activation(task, alpha=0.2, axis=activation_axis, duration=True)
        axis.legend()
