google / kafel
A language and library for specifying syscall filtering policies.
Home Page: http://google.github.io/kafel
License: Apache License 2.0
Currently I find myself needing to define a base policy for any executables I run:
//
// Kafel policy to allow nsjail a few syscalls to launch the executable.
//
POLICY NsJail {
  ALLOW {
    execve,
    prctl,
    prlimit64
  }
}
Otherwise I am getting audit messages like this:
type=SECCOMP msg=audit(1714125050.685:56): auid=4294967295 uid=0 gid=0 ses=4294967295 subj=kernel pid=12366 comm="nsjail" exe="/usr/bin/nsjail" sig=31 arch=c000003e syscall=59 compat=0 ip=0x7f6dcd26b55b code=0x0 AUID="unset" UID="root" GID="root" ARCH=x86_64 SYSCALL=execve
It seems to me that intuitively, nsjail should not apply the seccomp rules to itself, but only to the child, sandboxed process.
Some syscalls (such as connect) take in a struct as one of their arguments. Is it possible to filter on a field within this struct?
So far, I've tried arg.field and arg->field, but haven't had any luck.
I would like to do some IP address filtering with the connect syscall, but it seems like I can't access struct members:
$ nsjail -Mo --chroot / --disable_clone_newnet --seccomp_string "KILL { connect (fd, sockaddr_in) { (sockaddr_in.sin_addr == 2927745225) } } DEFAULT LOG" -- $(which curl) -I https://google.com/
[I][2021-02-24T18:17:36+0000] Mode: STANDALONE_ONCE
[I][2021-02-24T18:17:36+0000] Jail parameters: hostname:'NSJAIL', chroot:'/', process:'/home/vagrant/.nix-profile/bin/curl', bind:[::]:0, max_conns_per_ip:0, time_limit:0, personality:0, daemonize:false, clone_newnet:false, clone_newuser:true, clone_newns:true, clone_newpid:true, clone_newipc:true, clone_newuts:true, clone_newcgroup:true, keep_caps:false, disable_no_new_privs:false, max_cpus:0
[I][2021-02-24T18:17:36+0000] Mount: '/' -> '/' flags:MS_RDONLY|MS_BIND|MS_REC|MS_PRIVATE type:'' options:'' dir:true
[I][2021-02-24T18:17:36+0000] Mount: '/proc' flags:MS_RDONLY type:'proc' options:'' dir:true
[I][2021-02-24T18:17:36+0000] Uid map: inside_uid:900 outside_uid:900 count:1 newuidmap:false
[I][2021-02-24T18:17:36+0000] Gid map: inside_gid:900 outside_gid:900 count:1 newgidmap:false
[W][2021-02-24T18:17:36+0000][19584] bool sandbox::preparePolicy(nsjconf_t*)():122 Could not compile policy: 1:48: unexpected char `.'
[F][2021-02-24T18:17:36+0000][19584] int main(int, char**)():327 Couldn't prepare sandboxing policy
Linux 4.14 introduced SECCOMP_RET_KILL_PROCESS. The former SECCOMP_RET_KILL is now known as SECCOMP_RET_KILL_THREAD.
The difference is quite profound, actually. Suggestion:
Extend the language, allowing KILL THREAD and KILL PROCESS (as two tokens) in addition to KILL;
KILL / DENY kills the process on newer kernels and the calling thread on older ones.
It looks like returning SECCOMP_RET_KILL_PROCESS (0x80000000) on older kernels will be interpreted as SECCOMP_RET_KILL (0): unrecognized return values are treated as SECCOMP_RET_KILL, and the high bit is ignored when computing which action has the highest priority.
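The masking behaviour described above can be sketched in a few lines of C. This is only an illustration: the constants mirror values from <linux/seccomp.h> but are redefined locally with an EX_ prefix (my own naming) so the snippet compiles without kernel headers, and old_kernel_action / new_kernel_action are hypothetical helpers, assuming that kernels before 4.14 extracted the action with the 0x7fff0000 mask.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirror <linux/seccomp.h>; the EX_ prefix is illustrative only. */
#define EX_SECCOMP_RET_KILL_PROCESS 0x80000000U
#define EX_SECCOMP_RET_KILL_THREAD  0x00000000U /* formerly SECCOMP_RET_KILL */
#define EX_SECCOMP_RET_ACTION_OLD   0x7fff0000U /* action mask assumed for pre-4.14 kernels */
#define EX_SECCOMP_RET_ACTION_FULL  0xffff0000U /* action mask that includes the high bit */

/* Compile-time check: with the old mask, KILL_PROCESS collapses to KILL_THREAD. */
static_assert((EX_SECCOMP_RET_KILL_PROCESS & EX_SECCOMP_RET_ACTION_OLD) ==
                  EX_SECCOMP_RET_KILL_THREAD,
              "high bit is dropped by the old action mask");

/* Hypothetical model of how an older kernel extracts the action. */
uint32_t old_kernel_action(uint32_t filter_ret) {
  return filter_ret & EX_SECCOMP_RET_ACTION_OLD;
}

/* Hypothetical model of how a 4.14+ kernel extracts the action. */
uint32_t new_kernel_action(uint32_t filter_ret) {
  return filter_ret & EX_SECCOMP_RET_ACTION_FULL;
}
```

Under these assumptions, old_kernel_action(EX_SECCOMP_RET_KILL_PROCESS) yields the kill-thread action, while new_kernel_action keeps the two actions distinct.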
I would like to ALLOW only one specific filename to be executed by execve and KILL all the others.
I looked through the examples and documentation but couldn't find anything similar. I made a few attempts, but they were unsuccessful.
The specific use case of this policy would be to use it with nsjail to kill all execve calls made inside the sandbox. The issue is that nsjail itself calls execve immediately after applying the seccomp-bpf filter, so it crashes.
There is already an issue for this on the nsjail repo, but it hasn't been addressed since 2020.
It looks like bitwise OR is defined in some parts of the lexer, but expression.c|h do not reference it. It looks like it's not possible to OR values, is that correct, and is this an oversight?
Kafel generates an incorrect BPF program for the profile pasted at the end of this message (thanks to Enis Lavery for making the PoC). Note that the sched_setscheduler syscall, 0x90 on AMD64, should be allowed. However, Kafel generates the following BPF when built from commit 32768d3:
$ ./tools/dump_policy_bpf/dump_policy_bpf ./kafel-bug/broken.kafel | grep -C4 0x90
46: if A < 0x96 goto 53
47: if A < 0x9d goto 52
48: A := arg 0 low
49: if A == 0 then 53 else 52
50: if A < 0x90 goto 52
51: if A >= 0x91 goto 52
52: ERRNO 0xd
53: ALLOW
54: if A < 0x8a goto 57
We end up on line 52 regardless of A's value on line 50, and clearly there's something fishy going on on lines 50 and 51.
I believe the root cause is that add_jump accounts for tpos being changed when floc is resolved, but doesn't account for the possibility that this tpos update can similarly invalidate fpos. I don't think that this can happen more than once for each fpos and tpos. Assuming this is correct, then the fix is fairly simple; however, I'll defer to you on what you believe the correct fix is.
Example profile that triggers the bug:
POLICY A {
  ALLOW {
    clone{
      clone_flags == 0 || clone_flags == 0 || clone_flags == 0 ||
      clone_flags == 0 || clone_flags == 0 || clone_flags == 0
    },
    ioctl{
      cmd == 0 || cmd == 0 || cmd == 0 || cmd == cmd & 1 ||
      cmd & 1 == cmd & cmd || cmd == 0 || cmd == 0 || cmd == 0 ||
      cmd == 0 || cmd == 0 || cmd == 0 || cmd == 0 || cmd == 0 ||
      cmd == 0 || cmd == 0 || cmd == 0 || cmd == 0 || cmd == 0 ||
      cmd == 0 || cmd == 0 || cmd == 0 || cmd == 0 || cmd == cmd & 1
      || cmd & 1 == cmd & 1
    },
    prctl{
      option == 0
    },
    ptrace{
      request == 0 || request == 0 || request == 0 || request == 0 ||
      request == 0 || request == 0 || request == 0
    },
    getsockopt{
      level == optname ||
      level == optname ||
      level == optname ||
      level == optname
    },
    setsockopt{level == optname || level == optname || level == optname ||
               level == optname || level == optname || level == optname ||
               level == optname || level == optname || level == optname ||
               level == optname || level == optname || level == optname ||
               level == optname || level == optname || level == optname},
    shutdown{how == 0 || how == 0 || how == 0},
    setrlimit{resource == 0},
    access,
    arch_prctl,
    bind,
    chdir,
    chroot,
    clock_getres,
    close,
    connect,
    epoll_create,
    epoll_create1,
    execve,
    exit,
    fdatasync,
    flock,
    fstatfs,
    ftruncate,
    getppid,
    getrandom,
    getresuid,
    getrlimit,
    gettid,
    getuid,
    inotify_add_watch,
    kcmp,
    mlock,
    mmap,
    mprotect,
    newfstat,
    newfstatat,
    open,
    openat,
    prlimit64,
    read,
    readlink,
    recvmsg,
    restart_syscall,
    rmdir,
    rt_sigreturn,
    sched_setaffinity,
    sched_setscheduler,
    sendmmsg,
    set_robust_list,
    setpriority,
    setsid,
    setuid,
    shmget,
    sigaltstack,
    timerfd_create,
    waitid,
    write
  }
}
USE A DEFAULT ERRNO(13)
ATT.
The latest version of seccomp now supports the SECCOMP_RET_LOG action (https://lkml.org/lkml/2017/8/11/16). This is incredibly useful for debugging seccomp policies while in development. It would be great if kafel could support a LOG action that corresponds to SECCOMP_RET_LOG.
Currently libkafel.so takes 440KiB (x86_64, release, stripped), which seems excessive.
On top of that, it has 7122 relocations, and the relocation definitions themselves take 167KiB. This has a runtime cost as well.
Proposal:
exploit redundancy in the data set to reduce the footprint;
get rid of relocations: instead of storing string pointers in tables, use integral offsets into a string pool. The latter is cumbersome to do manually, so use code generation.
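A minimal sketch of the string-pool idea from the proposal, assuming a tiny three-entry table. The names name_pool, syscall_entry, syscall_name and syscall_lookup are hypothetical and are not taken from kafel's sources; in practice the pool and the offsets would come from a generator rather than being typed by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names are stored back to back in one array. The table below holds
 * integer offsets instead of pointers, so it needs no relocations. */
static const char name_pool[] =
    "read\0"   /* offset 0  */
    "write\0"  /* offset 5  */
    "open";    /* offset 11; the literal's implicit NUL ends the pool */

struct syscall_entry {
  uint16_t name_off; /* offset of the name inside name_pool */
  uint16_t nr;       /* syscall number (x86_64 values as sample data) */
};

static const struct syscall_entry entries[] = {
    {0, 0},  /* read  */
    {5, 1},  /* write */
    {11, 2}, /* open  */
};

#define ENTRY_COUNT (sizeof(entries) / sizeof(entries[0]))

/* Returns the name of the idx-th table entry. */
const char *syscall_name(size_t idx) {
  assert(idx < ENTRY_COUNT);
  return name_pool + entries[idx].name_off;
}

/* Returns the syscall number for name, or -1 if it is not in the table. */
int syscall_lookup(const char *name) {
  for (size_t i = 0; i < ENTRY_COUNT; i++) {
    if (strcmp(syscall_name(i), name) == 0) {
      return entries[i].nr;
    }
  }
  return -1;
}
```

Because both arrays contain only integers and characters, they can live in read-only data without any load-time fixups; the cost is one extra addition per name lookup.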
$ /tmp/k/dump_policy_bpf <SIGABRT.PC.7ffff6efc7ef.STACK.de2c48334.CODE.-6.ADDR.\(nil\).INSTR.mov____%r8d\,%eax.fuzz
Compile error
=================================================================
==11651==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x611000000280 at pc 0x00000046c162 bp 0x7fff6c8c2900 sp 0x7fff6c8c20b0
READ of size 257 at 0x611000000280 thread T0
#0 0x46c161 in printf_common(void*, char const*, __va_list_tag*) (/tmp/k/dump_policy_bpf+0x46c161)
#1 0x4d7713 in __asan::ReportGenericError(unsigned long, unsigned long, unsigned long, unsigned long, bool, unsigned long, unsigned int, bool) [clone .part.11] (/tmp/k/dump_policy_bpf+0x4d7713)
#2 0x46b9c7 in printf_common(void*, char const*, __va_list_tag*) (/tmp/k/dump_policy_bpf+0x46b9c7)
#3 0x46c9ba in __interceptor_vfprintf (/tmp/k/dump_policy_bpf+0x46c9ba)
#4 0x46ca72 in __interceptor_fprintf (/tmp/k/dump_policy_bpf+0x46ca72)
#5 0x50a895 in main /home/jagger/src/nsjail/kafel/tools/dump_policy_bpf/main.c:48:5
#6 0x7f7d1a4c53f0 in __libc_start_main /build/glibc-jxM2Ev/glibc-2.24/csu/../csu/libc-start.c:291
#7 0x419679 in _start (/tmp/k/dump_policy_bpf+0x419679)
0x611000000280 is located 0 bytes to the right of 256-byte region [0x611000000180,0x611000000280)
allocated by thread T0 here:
#0 0x4d2b38 in realloc (/tmp/k/dump_policy_bpf+0x4d2b38)
#1 0x50e83c in grow_errors_buffer /home/jagger/src/nsjail/kafel/context.c:154:18
#2 0x50e83c in append_error /home/jagger/src/nsjail/kafel/context.c:195
#3 0x526dfa in kafel_yyerror /home/jagger/src/nsjail/kafel/parser.y:406:5
#4 0x526dfa in kafel_yyparse /home/jagger/src/nsjail/kafel/parser.c:2083
#5 0x50c9ff in parse /home/jagger/src/nsjail/kafel/kafel.c:61:7
#6 0x50c9ff in kafel_compile /home/jagger/src/nsjail/kafel/kafel.c:101
#7 0x50a765 in main /home/jagger/src/nsjail/kafel/tools/dump_policy_bpf/main.c:42:12
#8 0x7f7d1a4c53f0 in __libc_start_main /build/glibc-jxM2Ev/glibc-2.24/csu/../csu/libc-start.c:291
SUMMARY: AddressSanitizer: heap-buffer-overflow (/tmp/k/dump_policy_bpf+0x46c161) in printf_common(void*, char const*, __va_list_tag*)
Shadow bytes around the buggy address:
0x0c227fff8000: fa fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
0x0c227fff8010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c227fff8020: 00 00 fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c227fff8040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c227fff8050:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff8090: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c227fff80a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==11651==ABORTING
Testcase attached
SIGABRT.PC.7ffff6efc7ef.STACK.de2c48334.CODE.-6.ADDR.(nil).INSTR.mov____%r8d,%eax.txt
Please consider adding a tag to the current HEAD. This will enable other projects to download kafel source as of this particular version, and it also enables a "Releases" page in Github where it's possible to download a ZIP or tarball of the source, e.g. as in https://github.com/abseil/abseil-cpp/releases. Thanks!
Nice to have: shared/common policy files that can be included in other kafel policies via an "#include"-style interface.
Hi, I want to use kafel but don't want programs to depend on the library itself.
What I was thinking of was a sort of skafel("kafel string", &prog) (static kafel) and a simple tool that wraps kafel to dump errors and update prog to contain the sock_filter, etc.
Issues and some solutions:
skafel
?
Common syscalls like send, sendfile, recv, stat, uname, waitpid, and a few others are missing from syscalls/amd64_syscalls.c
Here's the list of syscalls that are missing if you were to use the default Docker seccomp profile with kafel.
chown32
fadvise64_64
fchown32
fcntl64
fstat64
fstatat64
fstatfs
fstatfs64
ftruncate64
getegid32
geteuid32
getgid32
getgroups32
getresgid32
getresuid32
getuid32
ipc
lchown32
_llseek
lstat
lstat64
mmap2
_newselect
recv
send
sendfile
sendfile64
setfsgid32
setfsuid32
setgid32
setgroups32
setregid32
setresgid32
setresuid32
setreuid32
setuid32
sigreturn
stat
stat64
statfs
statfs64
truncate64
ugetrlimit
uname
waitpid
As an aside, is there a reason these common syscalls are missing?
Tests fail if SIGSYS dumps core. Quick and dirty fix:
--- a/kafel/test/runner/harness.c
+++ b/kafel/test/runner/harness.c
@@ -160,10 +160,10 @@ int test_policy_enforcment(test_func_t test_func, void* data,
TEST_FAIL("non-zero (%d) exit code", si.si_status);
}
}
- if (should_kill && (si.si_code != CLD_KILLED || si.si_status != SIGSYS)) {
+ if (should_kill && ((si.si_code != CLD_DUMPED && si.si_code != CLD_KILLED) || si.si_status != SIGSYS)) {
TEST_FAIL("should be killed by seccomp");
}
- if (si.si_code == CLD_KILLED) {
+ if (si.si_code == CLD_KILLED || si.si_code == CLD_DUMPED) {
if (si.si_status == SIGSYS) {
if (!should_kill) {
TEST_FAIL("should not be killed by seccomp");
man seccomp:
The arch field is not unique for all calling conventions. The x86-64 ABI and the x32 ABI both use AUDIT_ARCH_X86_64 as arch, and they run on the same processors. Instead, the mask __X32_SYSCALL_BIT is used on the system call number to tell the two ABIs apart.
This means that in order to create a seccomp-based blacklist for system calls performed through the x86-64 ABI, it is necessary to not only check that arch equals AUDIT_ARCH_X86_64, but also to explicitly reject all system calls that contain __X32_SYSCALL_BIT in nr.
Apparently, __X32_SYSCALL_BIT is not checked. This means that if a policy is compiled for x86_64 and blacklists certain syscalls but the default action is ALLOW, a caller using the x32 ABI will bypass the blacklist.
$ echo "DENY{SYSCALL[10]}DEFAULT ALLOW" | ./tools/dump_policy_bpf/dump_policy_bpf
BPF program with 7 instructions
0: A := architecture
1: if A != 0xc000003e goto 5
2: A := syscall number
3: if A < 0xa goto 6
4: if A >= 0xb goto 6
5: KILL
6: ALLOW
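To make the bypass concrete, the dumped program can be modelled in plain C. This is only a sketch: dumped_filter_allows transcribes the seven instructions above, hardened_filter_allows shows one possible fix along the lines the man page suggests, and both function names as well as the EX_-prefixed constants are my own (the values mirror AUDIT_ARCH_X86_64 and __X32_SYSCALL_BIT from the kernel headers).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the kernel headers; the EX_ prefix is illustrative only. */
#define EX_AUDIT_ARCH_X86_64 0xc000003eU
#define EX_X32_SYSCALL_BIT   0x40000000U

/* An x32 variant of syscall 10 lies outside the denied range [0xa, 0xb). */
static_assert((10U | EX_X32_SYSCALL_BIT) >= 0xbU,
              "x32 syscall numbers skip the range check");

/* Model of the dumped program: true means ALLOW, false means KILL. */
bool dumped_filter_allows(uint32_t arch, uint32_t nr) {
  if (arch != EX_AUDIT_ARCH_X86_64) return false; /* 1: goto 5 (KILL)  */
  if (nr < 0xa) return true;                      /* 3: goto 6 (ALLOW) */
  if (nr >= 0xb) return true;                     /* 4: goto 6 (ALLOW) */
  return false;                                   /* 5: KILL           */
}

/* Hypothetical fix: reject any syscall number carrying the x32 bit first. */
bool hardened_filter_allows(uint32_t arch, uint32_t nr) {
  if (arch != EX_AUDIT_ARCH_X86_64) return false;
  if (nr & EX_X32_SYSCALL_BIT) return false;
  return dumped_filter_allows(arch, nr);
}
```

With this model, syscall 10 is killed as intended, but 10 | EX_X32_SYSCALL_BIT passes the dumped filter because it is simply a large number outside the denied range; the hardened variant rejects it.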
Hi team, I am able to build libkafel.a on my x86 box, but linking the dump_policy_bpf binary against libkafel.a fails.
Here is the error message:
cc -std=gnu11 -I../../include -Wall -Wextra -Werror -O2 -Wno-error=type-limits main.o disasm.o print.o ../../libkafel.a -o dump_policy_bpf
/home/linuxbrew/.linuxbrew/bin/ld: ../../libkafel.a(libkafel.o): in function `kafel_yyerror':
arm_syscalls.c:(.text+0x6463): undefined reference to `YYUSE'
collect2: error: ld returned 1 exit status
Any idea what's going on or how to debug this? I can provide more system info if this is environment related. Thanks for your help!
The latest stable version is over two years old and there’s a bunch of new commits in the master since then. Can you please tag a new release?
#define TCGETS 0x5401
#define PROT_NONE 0x0
#define PROT_READ 0x1
#define PROT_WRITE 0x2
#define PROT_EXEC 0x4
#define O_RDONLY 0x00000000
#define O_WRONLY 0x00000001
#define O_RDWR 0x00000002
#define O_CREAT 0x00000100
#define O_EXCL 0x00000200
#define O_NOCTTY 0x00000400
#define O_NONBLOCK 0x00004000
#define O_DIRECTORY 0x00200000
#define O_CLOEXEC 0x02000000
#define OPEN_MASK 0xFDDFBBFF
#define SAFE_WRONLY 0x00000301
#define SAFE_RDRW 0x00000302
#define SIGHUP 1
#define SIGINT 2
#define SIGQUIT 3
#define SIGILL 4
#define SIGTRAP 5
#define SIGABRT 6
#define SIGIOT 6
#define SIGBUS 7
#define SIGFPE 8
#define SIGKILL 9
#define SIGUSR1 10
#define SIGSEGV 11
#define SIGUSR2 12
#define SIGPIPE 13
#define SIGALRM 14
#define SIGTERM 15
#define SIGSTKFLT 16
#define SIGCHLD 17
#define SIGCONT 18
#define SIGSTOP 19
#define SIGTSTP 20
#define SIGTTIN 21
#define SIGTTOU 22
#define SIGURG 23
#define SIGXCPU 24
#define SIGXFSZ 25
#define SIGVTALRM 26
#define SIGPROF 27
#define SIGWINCH 28
#define SIGIO 29
#define SIGPOLL 29
POLICY AllowAllocations {
  ALLOW {
    brk,
    mmap,
    mprotect,
    munmap
  }
}
POLICY AllowBasicFsCalls {
  ALLOW {
    access,
    close {
      fd > 2
    },
    getcwd,
    getdents,
    ioctl {
      fd <= 2
    },
    lseek,
    newlstat,
    newfstat,
    newstat,
    open {
      (flags == O_RDONLY) ||
      (flags & OPEN_MASK == O_RDONLY) ||
      (flags & OPEN_MASK == SAFE_WRONLY) ||
      (flags & OPEN_MASK == SAFE_RDRW)
    },
    read,
    readlink,
    write {
      fd > 2
    }
  }
}
POLICY AllowBasicIO {
  ALLOW {
    ioctl {
      cmd == TCGETS
    },
    read {
      fd == 0
    },
    readv {
      fd == 0
    },
    write {
      fd == 1 || fd == 2
    },
    writev {
      fd == 1 || fd == 2
    }
  }
}
POLICY AllowIPC {
  ALLOW {
    pipe,
    pipe2
  }
}
POLICY AllowMisc {
  ALLOW {
    arch_prctl,
    exit_group,
    futex,
    getrlimit,
    set_robust_list,
    set_tid_address,
    sysinfo
  }
}
POLICY AllowSignals {
  ALLOW {
    rt_sigaction,
    rt_sigprocmask
  }
}
POLICY Main {
  ALLOW {
    execve
  },
  USE AllowAllocations,
  USE AllowBasicFsCalls,
  USE AllowBasicIO,
  USE AllowMisc,
  USE AllowSignals
}
USE Main DEFAULT KILL