Comments (11)

solardiz avatar solardiz commented on June 13, 2024

We had (a different fork of) recent LKRG run on Ubuntu 22.04.1 with kernel 5.15.0-1019-aws #23-Ubuntu SMP Wed Aug 17 18:35:04 UTC 2022 aarch64 aarch64 in AWS instance type c6g.medium for 4+ months with no such issue showing up. However, that instance has only 1 vCPU, so perhaps the issue is a race condition showing up on multi-[v]CPU systems.

from lkrg.

Adam-pi3 avatar Adam-pi3 commented on June 13, 2024

> However, parsing errors may have caused the hash of kernel.stext that needs to be updated but missed in p_arch_jump_label_transform_ret

What parsing error are you referring to? If something is read incorrectly, you are likely seeing a different memory layout than LKRG does, which may result in this type of problem.

Additionally, LKRG synchronizes with JUMP_LABEL using various locks, which means the integrity routine cannot miss the result of JUMP_LABEL's work. It looks like you might have hit an issue that is not the root cause, and the patch is masking the real problem. Did you try running at a very verbose log level to see what JUMP_LABEL really does?

solardiz avatar solardiz commented on June 13, 2024

> the patch is masking the real problem.

Sure, which I assume is @root-hardenedvault's understanding too, which is why he calls this a "workaround" and doesn't send us a PR with these changes right away. Ideally, we'd figure out the real problem and arrive at a proper fix.

root-hardenedvault avatar root-hardenedvault commented on June 13, 2024

It appears that the issue is caused by a race condition. LKRG does not require any lock to be held when accessing p_db.p_jump_label.state. The panic consistently occurs while the core text hash is being updated in arch_jump_label_transform_ret. We have also observed that p_db.p_jump_label.state is set to 1 (P_JUMP_LABEL_CORE_TEXT) when the integrity_timer calculates and compares the core text hash. It's likely that LKRG updates the core text hash while it is also checking whether the hash has changed, which could lead to the race condition. Is there a mechanism in LKRG to avoid this situation? However, this cannot explain why the above patch works, since those updates would not be executed. Another scenario that can trigger the panic (with similar kernel logs) is when nftables runs as a systemd service at boot time.
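To make the suspected interleaving concrete, here is a minimal userspace sketch in C. It is not LKRG code: the `text_db` structure, the FNV-1a placeholder hash, and all function names are hypothetical, and the interleaving is replayed by hand in a single thread rather than produced by a real race. It only shows why a check that runs between "text patched" and "recorded hash refreshed" would report a mismatch, assuming nothing serializes the two sides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a protected text region and its recorded hash. */
struct text_db {
    unsigned char text[16];
    uint32_t recorded_hash;
};

/* FNV-1a, used here only as a simple placeholder hash. */
static uint32_t fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/* Fill the region and record its hash, so the starting state is consistent. */
static void db_init(struct text_db *db)
{
    memset(db->text, 0x90, sizeof(db->text));
    db->recorded_hash = fnv1a(db->text, sizeof(db->text));
}

/* Returns 1 if the current text matches the recorded hash, 0 otherwise. */
static int integrity_ok(const struct text_db *db)
{
    return fnv1a(db->text, sizeof(db->text)) == db->recorded_hash;
}

/* Step 1 of a legitimate update: the text is patched. */
static void patch_text(struct text_db *db)
{
    db->text[3] ^= 0xff;
}

/* Step 2 of a legitimate update: the recorded hash catches up. */
static void refresh_hash(struct text_db *db)
{
    db->recorded_hash = fnv1a(db->text, sizeof(db->text));
}

/*
 * Replays one interleaving by hand. Returns the check result observed
 * between the two update steps and stores the result observed after
 * both steps in *after.
 */
int check_between_steps(int *after)
{
    struct text_db db;
    int during;

    assert(after != NULL);
    db_init(&db);
    patch_text(&db);
    during = integrity_ok(&db);   /* a check landing mid-update */
    refresh_hash(&db);
    *after = integrity_ok(&db);   /* a check landing after the update */
    return during;
}
```

Calling `check_between_steps(&after)` from a small `main` returns 0 (mismatch) for the mid-update check and sets `after` to 1 (match). Whether LKRG can actually reach the mid-update state is exactly what is disputed in this thread.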

Adam-pi3 avatar Adam-pi3 commented on June 13, 2024

The function arch_jump_label_transform is called under the JUMP_LABEL lock. When LKRG intercepts the call, it is also running under the JUMP_LABEL lock, and we do synchronize against it. The integrity verification routine won't run before acquiring this lock:
https://github.com/lkrg-org/lkrg/blob/main/src/modules/database/p_database.h#L192

If LKRG has acquired this lock, the JUMP_LABEL engine won't modify the .text section. I don't think this is the correct root cause.
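The locking argument above can be illustrated with a small userspace analogy in C. This is a sketch under stated assumptions, not LKRG or kernel code: a pthread mutex stands in for the JUMP_LABEL lock, and `text_region`, `recorded_hash`, `patcher`, `verifier`, and `run_locked_demo` are hypothetical names. The patching side changes the text and refreshes the hash while holding the lock, and the checking side hashes only while holding the same lock, so it should never observe a half-finished update.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROUNDS 20000

/* Hypothetical shared state: a text region, its recorded hash, one lock. */
static unsigned char text_region[16];
static uint32_t recorded_hash;
static pthread_mutex_t text_lock = PTHREAD_MUTEX_INITIALIZER;

/* FNV-1a, used here only as a simple placeholder hash. */
static uint32_t fnv1a(const unsigned char *buf, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 16777619u;
    }
    return h;
}

/* Patching side: the text change and the hash refresh share one critical section. */
static void *patcher(void *arg)
{
    (void)arg;
    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&text_lock);
        text_region[(size_t)i % sizeof(text_region)] ^= 0xff;
        recorded_hash = fnv1a(text_region, sizeof(text_region));
        pthread_mutex_unlock(&text_lock);
    }
    return NULL;
}

/* Checking side: hashes and compares only while holding the same lock. */
static void *verifier(void *arg)
{
    int *mismatches = arg;
    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&text_lock);
        if (fnv1a(text_region, sizeof(text_region)) != recorded_hash)
            (*mismatches)++;
        pthread_mutex_unlock(&text_lock);
    }
    return NULL;
}

/* Runs both threads to completion and returns the number of mismatches seen. */
int run_locked_demo(void)
{
    pthread_t p, v;
    int mismatches = 0;
    int rc;

    memset(text_region, 0x90, sizeof(text_region));
    recorded_hash = fnv1a(text_region, sizeof(text_region));

    rc = pthread_create(&p, NULL, patcher, NULL);
    assert(rc == 0);
    rc = pthread_create(&v, NULL, verifier, &mismatches);
    assert(rc == 0);
    pthread_join(p, NULL);
    pthread_join(v, NULL);
    return mismatches;
}
```

Built with `-pthread` and called from a small `main`, `run_locked_demo()` returns 0 because every check sees either the state before a patch or the state after the hash refresh. If the panic still occurs with this discipline in place, the cause would have to lie elsewhere, for example in an update path that does not go through the lock.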

accelbread avatar accelbread commented on June 13, 2024

I'm also seeing this issue, also on a Raspberry Pi 4. It occurs consistently, a few seconds after my system makes it to the login prompt.

solardiz avatar solardiz commented on June 13, 2024

@root-hardenedvault @accelbread We think we've just fixed this issue with #294 here - can you please test and let us know? Thank you!

accelbread avatar accelbread commented on June 13, 2024

I'll give it a test over the weekend, thanks!

accelbread avatar accelbread commented on June 13, 2024

Unfortunately, this does not fix the issue for me :(

Adam-pi3 avatar Adam-pi3 commented on June 13, 2024

@accelbread can you provide some details about the problem? What is the kernel version? How easy is it to reproduce? Can you recompile LKRG with P_LKRG_JUMP_LABEL_STEXT_DEBUG, enable log_level=3, and show the logs?

By the way, I heavily tested Ubuntu 23.10 with kernel 6.5.0-1005-raspi and the issue is not there. If you have an opportunity to check the same OS/kernel, it would be helpful.

accelbread avatar accelbread commented on June 13, 2024

I am on 6.1.57-hardened1 on NixOS. I have LKRG built into the kernel.

It is easy to reproduce. If I have default settings, a few seconds after boot, the device restarts. If I boot with "lkrg.kint_validate=1", the device does not restart a few seconds after boot, and runs fine.

I can recompile and retest later with debug and logs, and get back to you. It seems the 6.5.9 kernel is available now too, so I will upgrade first.

I could also produce a minimal reproducing sd-card image if you'd like.
