
Comments (36)

raykzhao commented on July 24, 2024

Hi @hamadmarri @Alt37

It seems that CacULE scheduler works best for me after enabling full preemption and removing all the scheduler tweaks from zen-kernel, xanmod, etc.

from cacule-cpu-scheduler.

raykzhao commented on July 24, 2024

Hi @hamadmarri

The new patch seems to be as smooth as the previous smoother patch on my machine. Thank you!

raykzhao commented on July 24, 2024

Hi @hamadmarri

I guess the same starvation bug also exists in the original Cachy scheduler, so I just tried the original Cachy scheduler (without idle-balance) with the fix. It also seems to make the original Cachy scheduler smoother. Now both Cachy and CacULE with the fix feel similar on my machine under heavy load. Not sure which scheduler performs better during microbenchmarks.

xalt7x commented on July 24, 2024

ssr-2020-10-31_15.58.41_edit.zip
Here's a video demonstration of the issue. Strangely, even the recording came out badly (near the end the sound returned to normal, but SimpleScreenRecorder (or the audio monitor) still captured it with interruptions).
How to reproduce on KDE Plasma:

  1. Downgrade some package so Discover will inform you about updates
  2. Play some YT video (optionally force HD quality to increase CPU usage to make it more noticeable)
  3. Click on tray Discover icon
  4. Notice sound and video interruptions

I'm using an LTS distro (Kubuntu 20.04) and a mobile CPU with Hyperthreading disabled and lots of kernel tweaks, so results may vary.

hamadmarri commented on July 24, 2024

Hi @Alt37

What was the last revision that didn't have this issue? I need to check which changes could be causing the problem.

What is the hrrn_max_life? 30s?

Thank you

xalt7x commented on July 24, 2024

I tested only the last 2 revisions (r7 with 5.4 and r8 with 5.8). Both of them have this issue.
The "hrrn_max_life" parameter had its default value (30). I tried increasing it to 60 but it didn't help. As you can see from my video, the sound is interrupted immediately when some other process stresses the CPU.

hamadmarri commented on July 24, 2024

Is it possible to test r6 on v5.8?
The changes between r7 and r8 are big. But since both have this issue, I am thinking the problem might be in the small changes between r6 and r7.

xalt7x commented on July 24, 2024

Tried a build with "cachy-5.8-r6.patch" (minus the "sysctl_sched_nr_migrate" tweak). Nothing changes: frames and audio samples start to drop immediately after I click on KDE Discover's "Updates" system tray icon.

hamadmarri commented on July 24, 2024

> I'm using LTS distro (Kubuntu 20.04) and mobile CPU with disabled Hyperthreading and lots of kernel tweaks so results may vary.

Hi @Alt37

Maybe some kernel tweaks affect the scheduler? I believe disabling Hyperthreading increases security and reduces performance, but since it works fine with CFS, I don't think it is the issue. What kind of other tweaks are applied to the fair.c scheduler? f-sync? Or some sysctl_sched_latency changes?

What kind of hard drive is used? Sometimes a slow HD takes a long time to load tasks, which increases their wait time, so Cachy will run them ahead of other running tasks.

Thanks

xalt7x commented on July 24, 2024

> What kind of other tweaks that are on fair.c scheduler

sched_latency_ns=(sysctl kernel.sched_latency_ns / 6) * 4
sched_min_granularity_ns=sched_latency_ns/8
sched_wakeup_granularity_ns * 2.5

For dual-core CPU without HT it's

sysctl kernel.sched_latency_ns=8000000
sysctl kernel.sched_min_granularity_ns=1000000
sysctl kernel.sched_wakeup_granularity_ns=5000000
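
The dual-core values above follow from the three formulas if the stock CFS defaults are assumed (base values of 6 ms latency and 1 ms wakeup granularity, scaled by 1 + log2 of the CPU count, capped at 8 CPUs). Here is a minimal Python sketch of that arithmetic. The function names are made up for illustration, and the base values are an assumption about 5.x kernels rather than something stated in this thread:

```python
# Assumed stock CFS base values in nanoseconds (an assumption, see above).
BASE_LATENCY_NS = 6_000_000
BASE_WAKEUP_GRAN_NS = 1_000_000


def scaling_factor(ncpus: int) -> int:
    """Default logarithmic scaling: 1 + ilog2(min(ncpus, 8))."""
    return 1 + (min(ncpus, 8).bit_length() - 1)


def tweaked_sysctls(ncpus: int) -> dict:
    """Apply the three formulas from the comment to the assumed defaults."""
    factor = scaling_factor(ncpus)
    default_latency = BASE_LATENCY_NS * factor
    default_wakeup = BASE_WAKEUP_GRAN_NS * factor
    latency = default_latency // 6 * 4          # (latency / 6) * 4
    return {
        "kernel.sched_latency_ns": latency,
        "kernel.sched_min_granularity_ns": latency // 8,
        "kernel.sched_wakeup_granularity_ns": default_wakeup * 5 // 2,  # * 2.5
    }


if __name__ == "__main__":
    for key, value in tweaked_sysctls(2).items():
        print(f"sysctl {key}={value}")
```

Calling `tweaked_sysctls(2)` gives 8000000, 1000000 and 5000000, which matches the three sysctl lines listed for the dual-core CPU.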

Also

sysctl kernel.sched_nr_migrate=128
sysctl kernel.sched_rt_runtime_us=800000

I tried reverting all those parameters to their default values, increasing "sched_latency_ns" and "sched_hrrn_max_lifetime_ms", and booting with "nothreadirqs"; nothing solved the problem completely. At this point I doubt that there's anything wrong on my end, because under the same conditions CFS works absolutely fine in this regard.

I saw that you're also a KDE Plasma user, but on openSUSE. Did you try to reproduce it on your system?

hamadmarri commented on July 24, 2024

> What kind of other tweaks that are on fair.c scheduler
>
> sched_latency_ns=(sysctl kernel.sched_latency_ns / 6) * 4
> sched_min_granularity_ns=sched_latency_ns/8
> sched_wakeup_granularity_ns * 2.5
>
> For dual-core CPU without HT it's
>
> sysctl kernel.sched_latency_ns=8000000
> sysctl kernel.sched_min_granularity_ns=1000000
> sysctl kernel.sched_wakeup_granularity_ns=5000000
>
> Also
>
> sysctl kernel.sched_nr_migrate=128
> sysctl kernel.sched_rt_runtime_us=800000
>
> I tried to revert all those parameters to default values, tried to increase "sched_latency_ns" and "sched_hrrn_max_lifetime_ms", tried to boot with "nothreadirqs" - nothing solved this problem completely. At this point I doubt that there's anything wrong on my end because with the same conditions CFS works absolutely fine in this regard.
>
> I saw that you're also KDE Plasma user but on openSUSE distro. Did you try to reproduce it on your system?

I am running YouTube right now and downloading a 567M update through Discover without any interruption. Everything is smooth. I usually use zypper to update; neither zypper nor Discover causes any freezes on my machine.

hamadmarri commented on July 24, 2024

The mini-freezes on Cachy happen because some tasks have waited much longer than the other running tasks. Cachy picks those long-waiting tasks and favors them over other tasks (to enhance responsiveness), but sometimes I/O tasks wait a long time and there is no way to tell Cachy that those tasks are not interactive. Some tasks wait, run only once, and then new threads are created, which leaves Cachy nothing to track. I wonder what causes the long I/O delay (either HD or network, I am guessing) on your setup?
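
To illustrate the effect described above, here is a small sketch of the textbook HRRN (Highest Response Ratio Next) rule, where the ratio is (wait + run) / run. This is not Cachy's actual code; the task names and the numbers are invented purely for the example:

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    wait_ms: float  # time spent waiting (e.g. blocked on slow I/O)
    run_ms: float   # CPU time consumed so far


def response_ratio(task: Task) -> float:
    """Textbook HRRN ratio; a tiny floor avoids division by zero."""
    return (task.wait_ms + task.run_ms) / max(task.run_ms, 1e-9)


def pick_next(tasks: list) -> Task:
    """HRRN picks the task with the highest response ratio."""
    return max(tasks, key=response_ratio)


# Hypothetical workload: a media player that runs often, and a package
# manager thread that just spent a long time waiting on disk or network.
tasks = [
    Task("video-player", wait_ms=2, run_ms=8),
    Task("package-io", wait_ms=500, run_ms=1),
]

if __name__ == "__main__":
    for t in tasks:
        print(t.name, response_ratio(t))
    print("picked:", pick_next(tasks).name)
```

With these made-up numbers the long-waiting I/O task wins by a wide margin, even though the media player is the one the user would notice stalling.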

xalt7x commented on July 24, 2024

> The mini freezes on Cachy is caused because there are tasks waited so long compared to other running tasks. Cachy will pick those waited tasks and favor them over other tasks.

Looks like on my machine the newer tasks (Discover/packagekitd) are more "favored" than the Chromium ones.
The only thing I can still try is to rebuild the kernel without other patches, using a config similar to yours (if you don't mind uploading it here).

hamadmarri commented on July 24, 2024

Sure
config-5.9.1-1-default.zip

xalt7x commented on July 24, 2024

Unfortunately on my system it's reproducible even with "generic" kernel configuration (HZ_250, PREEMPT_VOLUNTARY).

hamadmarri commented on July 24, 2024

The only cause I can guess at is FAIR_GROUP. I am not using fair_group in my config; I disabled it. Can you please try with FAIR_GROUP disabled? If disabling FAIR_GROUP fixes the interruptions, then I think I have a bug in Cachy with FAIR_GROUP.

Thanks

xalt7x commented on July 24, 2024

I'm confused...
menuconfig/nconfig allows disabling
FAIR_GROUP_SCHED (General Setup > Control Group support > CPU Controller > Group scheduling for SCHED_OTHER)
only after
CONFIG_SCHED_AUTOGROUP (General Setup > Automatic process group scheduling)
is unselected
But your config has CONFIG_SCHED_AUTOGROUP disabled and FAIR_GROUP_SCHED enabled...

hamadmarri commented on July 24, 2024

> I'm confused...
> menuconfig/nconfig allows to disable
> FAIR_GROUP_SCHED (General Setup > Control Group support > CPU Controller > Group scheduling for SCHED_OTHER)
> only after
> CONFIG_SCHED_AUTOGROUP (General Setup > Automatic process group scheduling)
> is unselected
> But your config has CONFIG_SCHED_AUTOGROUP disabled and FAIR_GROUP_SCHED enabled...

Selecting autogroup automatically selects fairgroup, but not vice versa, because autogroup needs fairgroup to work at its best. You can disable both autogroup and fairgroup, enable both, or disable autogroup and enable fairgroup, but as far as I can see in the kconfigs you can't disable fair_group and keep autogroup enabled.

Please try with both disabled. I hope this fixes the issue, so that we know what caused the interruptions.

Thanks

hamadmarri commented on July 24, 2024

> The only thing that I am guessing the cause of the issue is FAIR_GROUP. I am not using fair_group in my config, I disabled it. Can you please try with FAIR_GROUP disabled? If disabling FAIR_GROUP solved the interrupts then I think I have a bug in Cachy with FAIR_GROUP.
>
> Thanks

Well, usually I disable it. IDK how I didn't disable it with this build. Sorry about that.

xalt7x commented on July 24, 2024

So I tried with FAIR_GROUP_SCHED disabled. I guess it helps, but it doesn't eliminate the problem. Sound interruptions still happen, not immediately like before (when I launched Discover for updates), but sometimes when CPU usage increases.

hamadmarri commented on July 24, 2024

Could you please try this patch?
cacule5.9-r9.zip

Please let me know if you need a patch for a specific version to try.

I really hope this solves the problem, because the mini-freezes have existed since cachy-r1: the problem disappeared on my machine but still existed on some other machines when using the Chromium browser, or in some other cases. The root of the problem is HRRN. In the attached patch I replaced HRRN with a different policy, an interactivity score (the idea and the math equation are taken from the FreeBSD ULE scheduler).
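
For readers unfamiliar with ULE, the sketch below shows the general shape of an interactivity score modelled on the one in FreeBSD's ULE scheduler, as far as I understand it: tasks that sleep more than they run score low (interactive), and tasks that run more than they sleep score high. The constants and function name are illustrative assumptions, and the attached CacULE patch may compute its score differently:

```python
# Illustrative constants: scores fall in the range 0..INTERACT_MAX.
INTERACT_MAX = 100
INTERACT_HALF = INTERACT_MAX // 2


def interactivity_score(run_time: int, sleep_time: int) -> int:
    """Lower score = more interactive (mostly sleeping) task."""
    if run_time > sleep_time:
        # CPU-bound side: score lands in the upper half of the range.
        div = max(1, run_time // INTERACT_HALF)
        return INTERACT_HALF + (INTERACT_HALF - sleep_time // div)
    if sleep_time > run_time:
        # Mostly-sleeping side: score lands in the lower half.
        div = max(1, sleep_time // INTERACT_HALF)
        return run_time // div
    # Equal run and sleep time sits in the middle; a brand-new task gets 0.
    return INTERACT_HALF if run_time else 0


if __name__ == "__main__":
    print(interactivity_score(run_time=10, sleep_time=1000))   # mostly sleeping
    print(interactivity_score(run_time=1000, sleep_time=10))   # CPU hog
```

Unlike a wait-time ratio, this score depends on the long-run balance between running and sleeping, so a single long I/O wait does not by itself make a task look interactive.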

Thank you

hamadmarri commented on July 24, 2024

> Hi @hamadmarri @Alt37
>
> It seems that CacULE scheduler works best for me after enabling full preemption and removing all the scheduler tweaks from zen-kernel, xanmod, etc.

Hi @raykzhao

Please let me know if CacULE has a better response time with and without heavy load (whether the mini-freezes and sound interruptions are resolved), and also how its overall performance compares to Cachy and CFS.

Thank you so much

xalt7x commented on July 24, 2024

Tried to rebuild Ubuntu's version "5.9.0-2" (based on 5.9.0) with the latest CacULE patch (2020-12-08). With the patch it fails to boot (hangs right after GRUB); without CacULE it loads fine. The config is nothing special compared to my previous rebuilds (tickless kernel, 500HZ, RCU_BOOST_DELAY, BFQ by default).
config-5.9.0-2-cacule-2020-12-08.tar.gz

hamadmarri commented on July 24, 2024

> Tried to rebuild Ubuntu's version "5.9.0-2" (based on 5.9.0) with latest patch of CacULE (2020-12-08). With patch it fails to boot (hangs right after GRUB), without CacULE - loads fine. Config has nothing special to my previous rebuilds (tickless kernel, 500HZ, RCU_BOOST_DELAY, BFQ by default)
> config-5.9.0-2-cacule-2020-12-08.tar.gz

Hi @Alt37

Could you please try without CONFIG_SCHED_AUTOGROUP?

xalt7x commented on July 24, 2024

Hi @hamadmarri

> Could you please try without CONFIG_SCHED_AUTOGROUP

As expected, that didn't help. Boot is still stuck at "Loading initial ramdisk" and I can't find any information in /var/log.

hamadmarri commented on July 24, 2024

It is related to FAIR_GROUP.

I got this error in qemu when I enabled FAIR_GROUP:

[    0.474757] BUG: kernel NULL pointer dereference, address: 0000000000000000
[    0.475304] #PF: supervisor read access in kernel mode
[    0.475466] #PF: error_code(0x0000) - not-present page
[    0.475721] PGD 0 P4D 0 
[    0.475916] Oops: 0000 [#1] SMP NOPTI
[    0.475916] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.9.12+ #1
[    0.475916] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a-rebuilt.opensuse.org 04/01/2014
[    0.475916] RIP: 0010:pick_next_entity.isra.0+0xc/0x80
[    0.475916] Code: d2 49 89 c0 48 89 c8 49 f7 f0 44 01 d8 44 29 d0 c1 f8 1f 83 e0 02 83 e8 01 c3 0f 1f 40 00 41 55 41 54 49 89 f4 55 48 89 fd 53 <48> 8b 1f e8 5c 12 f9 ff 49 89 c5 48 85 db 74 27 48 8b 4b 10 48 8b
[    0.475916] RSP: 0018:ffffc9000000bd18 EFLAGS: 00000046
[    0.475916] RAX: 0000000000000000 RBX: ffff88807dc29100 RCX: ffff88807d5400f8
[    0.475916] RDX: 0000000000000000 RSI: ffff88807d530080 RDI: 0000000000000000
[    0.475916] RBP: 0000000000000000 R08: 000000001c3806f8 R09: 0000000000000001
[    0.475916] R10: 0000000000000590 R11: 00000000000001df R12: ffff88807d530080
[    0.475916] R13: ffffc9000000bd88 R14: ffff88807dc29180 R15: ffff88807dc29180
[    0.475916] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[    0.475916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.032134] smpboot: CPU 1 Converting physical 0 to logical die 1
[    0.475916] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000006f0
[    0.475916] Call Trace:
[    0.475916]  pick_next_task_fair+0xb6/0x340
[    0.475916]  __schedule+0xf5/0x6d0
[    0.475916]  schedule+0x45/0xb0
[    0.475916]  native_cpu_up+0x346/0x620
[    0.475916]  ? cpuhp_kick_ap+0xd0/0xd0
[    0.475916]  bringup_cpu+0x26/0xc0
[    0.475916]  ? cpuhp_kick_ap+0xd0/0xd0
[    0.475916]  cpuhp_invoke_callback+0x95/0x510
[    0.475916]  _cpu_up+0xa0/0x130
[    0.475916]  cpu_up+0x6f/0x90
[    0.475916]  bringup_nonboot_cpus+0x43/0x50
[    0.475916]  smp_init+0x21/0x5f
[    0.475916]  kernel_init_freeable+0xb0/0x1ce
[    0.475916]  ? rest_init+0x95/0x95
[    0.475916]  kernel_init+0x5/0xfb
[    0.475916]  ret_from_fork+0x22/0x30
[    0.475916] Modules linked in:
[    0.475916] CR2: 0000000000000000
[    0.475916] ---[ end trace 7e5cca4425e9453b ]---
[    0.475916] RIP: 0010:pick_next_entity.isra.0+0xc/0x80
[    0.475916] Code: d2 49 89 c0 48 89 c8 49 f7 f0 44 01 d8 44 29 d0 c1 f8 1f 83 e0 02 83 e8 01 c3 0f 1f 40 00 41 55 41 54 49 89 f4 55 48 89 fd 53 <48> 8b 1f e8 5c 12 f9 ff 49 89 c5 48 85 db 74 27 48 8b 4b 10 48 8b
[    0.475916] RSP: 0018:ffffc9000000bd18 EFLAGS: 00000046
[    0.475916] RAX: 0000000000000000 RBX: ffff88807dc29100 RCX: ffff88807d5400f8
[    0.475916] RDX: 0000000000000000 RSI: ffff88807d530080 RDI: 0000000000000000
[    0.475916] RBP: 0000000000000000 R08: 000000001c3806f8 R09: 0000000000000001
[    0.475916] R10: 0000000000000590 R11: 00000000000001df R12: ffff88807d530080
[    0.475916] R13: ffffc9000000bd88 R14: ffff88807dc29180 R15: ffff88807dc29180
[    0.475916] FS:  0000000000000000(0000) GS:ffff88807dc00000(0000) knlGS:0000000000000000
[    0.475916] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.475916] CR2: 0000000000000000 CR3: 000000000240a000 CR4: 00000000000006f0
[    0.475916] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009
[    0.475916] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000009 ]---
qemu-system-x86_64: terminating on signal 2

Sorry, this is my bad. I will fix it soon.

hamadmarri commented on July 24, 2024

Please check this fix: https://github.com/hamadmarri/cacule-cpu-scheduler/blob/ce293ce0d324e1b35ae534ae3281871ee1c19cfc/patches/CacULE/v5.9/cacule5.9.patch

xalt7x commented on July 24, 2024

@hamadmarri
Fair comparison requires identical configs. Do I need to disable CONFIG_SCHED_AUTOGROUP and FAIR_GROUP_SCHED for both builds (with & without CacULE) ?

hamadmarri commented on July 24, 2024

> @hamadmarri
> Fair comparison requires identical configs. Do I need to disable CONFIG_SCHED_AUTOGROUP and FAIR_GROUP_SCHED for both builds (with & without CacULE) ?

CacULE (after the above fix) works with CONFIG_SCHED_AUTOGROUP and FAIR_GROUP_SCHED.
However, I prefer disabling both, since fair/auto groups are specific to CFS and exist to improve its latency. Cachy/CacULE use policies that already account for user interactivity/latency by the nature of HRRN or the interactivity score. Therefore, enabling fair/auto group in CacULE will probably not provide any more responsiveness or interactivity; it will just add the overhead of processing and updating the fair group data.

Based on my previous Cachy testing (with this test: https://github.com/hamadmarri/os-scheduler-responsiveness-test), I didn't see any gain in responsiveness when enabling auto/fair group. So I assume it is not needed.

For a fair comparison, I think it is good to compare the best of both, i.e. CFS with fair/auto groups and CacULE with or without them (whichever is best on your machine). Having said that, the other tunables, such as latency_ns and the variables related to load balancing, should be the same.

hf29h8sh321 commented on July 24, 2024

CacULE has some mini-freezes on my machine.

raykzhao commented on July 24, 2024

@hamadmarri @hf29h8sh321

Unfortunately, it seems that on my machine the mini-freezes are more noticeable than with the Cachy scheduler under heavy load, e.g. when compiling the kernel in the background.

I think it's probably better to let the users/kernel maintainers select between Cachy/CacULE scheduler in kconfig, similar to how PDS/BMQ schedulers did, see #19.

hamadmarri commented on July 24, 2024

Could you please try this patch on top of commit: 1dd9bff04302d65999dbfe3fed53c08ec957525b

Please disable FAIR_GROUP and AUTOGROUP, since cachy/cacule don't need them.

In my tests with https://www.youtube.com/watch?v=LXb3EKWsInQ
at 1080p60 while compiling the Linux kernel with make -j6 (on a 4-CPU machine):

CFS: many mini-freezes and sometimes the video freezes with audio running
CacULE: some mini-freezes but never pauses the video
CacULE with the attached patch: few mini-freezes (hard to spot) and never pauses the video
Cachy5.9: very similar to CacULE but few times pauses the video

All of those kernels use the exact openSUSE Tumbleweed default .config, except that fair_group is disabled.

Please let me know if you get any improvement with this patch
smoother.zip

hf29h8sh321 commented on July 24, 2024

CacULE with the smoother patch has audio interruptions under load, more than original cachy. I found that the now deleted rdb branch from the kernel tree has the best results, with only occasional stuttering under load.

raykzhao commented on July 24, 2024

Hi @hamadmarri

CacULE with the smoother patch seems to be better than both the original Cachy scheduler and CacULE without the smoother patch under heavy load on my machine. Thank you for the great work!

hamadmarri commented on July 24, 2024

patch2.zip
Could you please try this patch instead of smoother?

On top of commit: 1dd9bff04302d65999dbfe3fed53c08ec957525b

While working on a global queue, I noticed starvation. It turned out that the task's run time is reset on every task migration. IDK how I didn't notice this until now.
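
A toy model can show why losing the run-time accounting on migration leads to starvation. This is only a sketch under simplifying assumptions that are mine, not the patch's: priority is a lifetime / run-time ratio, there are two tasks on one queue, and one of them "migrates" (and loses its run time) every time it runs:

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    life: int = 10  # ticks since creation (hypothetical starting values)
    run: int = 5    # ticks of CPU time accounted so far


def ratio(task: Task) -> float:
    """Assumed priority: older tasks with little run time rank higher."""
    return task.life / max(task.run, 1)


def simulate(reset_on_migration: bool, ticks: int = 100) -> dict:
    migrating, pinned = Task("migrating"), Task("pinned")
    picks = {"migrating": 0, "pinned": 0}
    for _ in range(ticks):
        chosen = max((migrating, pinned), key=ratio)  # ties go to the first
        chosen.run += 1
        picks[chosen.name] += 1
        migrating.life += 1
        pinned.life += 1
        if reset_on_migration and chosen is migrating:
            migrating.run = 0  # the bug: accounting is lost on migration
    return picks


if __name__ == "__main__":
    print("with reset:   ", simulate(reset_on_migration=True))
    print("without reset:", simulate(reset_on_migration=False))
```

In this model the migrating task keeps looking like it has never run, so it wins every pick and the other task never gets the CPU; with the accounting preserved, the two tasks alternate.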

Please let me know if it is better with this little patch.

Thank you

hamadmarri commented on July 24, 2024

I hope the last patch fixes the freezing issues for everyone. I have updated the patch in this commit: de32c14

Please reopen this issue if the problem still exists.

Thank you
