go-sandbox

The original goal was to replicate uoj-judger/run_program in Go using libseccomp. Over time, the project has also adopted newer technologies, including Linux namespaces and cgroups.

The idea of rootfs and interval CPU usage checking comes from syzoj/judge-v3 and the pooled pre-forked container comes from vijos/jd4.

If you are looking for a sandbox implementation exposed through a REST / gRPC API, please check go-judge.

Notice: this project only works on Linux, since ptrace, unshare and cgroup are available only on Linux.

Build & Install

  • install the latest Go compiler from golang/download
  • install the libseccomp library: (for Ubuntu) apt install libseccomp-dev
  • build & install: go install github.com/criyle/go-sandbox/...

Technologies

libseccomp + ptrace (improved UOJ sandbox)

  1. Restricted computing resource by POSIX rlimit: Time & Memory (Stack) & Output
  2. Restricted syscall access (by libseccomp & ptrace)
  3. Restricted file access (read & write & access & exec). Evaluated by UOJ FileSet

Improvements:

  1. Precise resource limits (s -> ms, mb -> kb)
  2. More architectures (arm32, arm64)
  3. Allow multiple traced programs in different threads
  4. Allow pipes as input / output files

Default file access syscall check:

  • check file read / write: open, openat
  • check file read: readlink, readlinkat
  • check file write: unlink, unlinkat, chmod, rename
  • check file access: stat, lstat, access, faccessat
  • check file exec: execve, execveat
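
The default checks above can be pictured as a lookup table from syscall name to the kind of file check to perform. The following is a minimal sketch only: `accessKind`, `defaultChecks` and `classify` are hypothetical names introduced for illustration and are not taken from the project's code.

```go
package main

import "fmt"

// accessKind is a hypothetical label for the kind of file check to perform.
type accessKind string

const (
	checkReadWrite accessKind = "read / write"
	checkRead      accessKind = "read"
	checkWrite     accessKind = "write"
	checkAccess    accessKind = "access"
	checkExec      accessKind = "exec"
)

// defaultChecks mirrors the list above: syscall name -> kind of check.
var defaultChecks = map[string]accessKind{
	"open": checkReadWrite, "openat": checkReadWrite,
	"readlink": checkRead, "readlinkat": checkRead,
	"unlink": checkWrite, "unlinkat": checkWrite, "chmod": checkWrite, "rename": checkWrite,
	"stat": checkAccess, "lstat": checkAccess, "access": checkAccess, "faccessat": checkAccess,
	"execve": checkExec, "execveat": checkExec,
}

// classify reports which file check applies to a traced syscall, if any.
func classify(name string) (accessKind, bool) {
	k, ok := defaultChecks[name]
	return k, ok
}

func main() {
	for _, name := range []string{"openat", "unlink", "getpid"} {
		if k, ok := classify(name); ok {
			fmt.Printf("%s: check file %s\n", name, k)
		} else {
			fmt.Printf("%s: no file check\n", name)
		}
	}
}
```

A real tracer would additionally read the path argument from the traced process and test it against the configured file set; that part is left out here.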

linux namespace + cgroup

  1. Unshare & bind mount rootfs based on hostfs (eliminated ptrace)
  2. Use Linux Control Groups to limit and account for CPU & memory usage (eliminated wait4.rusage)
  3. Container tech with execveat memfd, sethostname, setdomainname
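
As a rough illustration of the namespace part, the sketch below builds a command that would start in new namespaces. It uses the standard library's `os/exec` and `syscall.SysProcAttr` instead of the project's own forkexec package, so it is an approximation of the idea and not the project's implementation; `newIsolatedCmd` is a hypothetical helper. The command is only constructed, not started, and the bind mounts, cgroup setup and memfd-based exec are omitted.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// newIsolatedCmd (hypothetical helper) prepares a command that would run in
// new user, mount, PID, UTS, IPC and network namespaces. The caller's uid and
// gid are mapped to root inside the new user namespace.
func newIsolatedCmd(uid, gid int, name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{
		Cloneflags: syscall.CLONE_NEWUSER | syscall.CLONE_NEWNS | syscall.CLONE_NEWPID |
			syscall.CLONE_NEWUTS | syscall.CLONE_NEWIPC | syscall.CLONE_NEWNET,
		UidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: uid, Size: 1}},
		GidMappings: []syscall.SysProcIDMap{{ContainerID: 0, HostID: gid, Size: 1}},
	}
	return cmd
}

func main() {
	cmd := newIsolatedCmd(os.Getuid(), os.Getgid(), "/bin/true")
	// Only inspect the configuration; starting the command requires a kernel
	// that permits unprivileged user namespaces.
	fmt.Printf("clone flags: %#x\n", cmd.SysProcAttr.Cloneflags)
}
```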

Design

Result Status

  • Normal (no error)
  • Program Error
    • Resource Limit Exceeded
      • Time
      • Memory
      • Output
    • Unauthorized Access
      • Disallowed Syscall
    • Runtime Error
      • Signalled
        • SIGXCPU / SIGKILL are treated as TimeLimitExceeded (raised by rlimit or by the caller's kill)
        • SIGXFSZ is treated as OutputLimitExceeded (raised by rlimit)
        • SIGSYS is treated as Disallowed Syscall (raised by seccomp)
        • Other signals indicate a potential runtime error, e.g. SIGSEGV (segmentation fault)
      • Nonzero Exit Status
  • Program Runner Error
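
The signal handling described in the list above could be sketched as a single switch. The `status` type, its constants and `statusFromSignal` are hypothetical names used for illustration; only the signal-to-status mapping itself comes from the list.

```go
package main

import (
	"fmt"
	"syscall"
)

// status is a hypothetical stand-in for the project's result status type.
type status string

const (
	statusTimeLimitExceeded   status = "Time Limit Exceeded"
	statusOutputLimitExceeded status = "Output Limit Exceeded"
	statusDisallowedSyscall   status = "Disallowed Syscall"
	statusSignalled           status = "Signalled"
)

// statusFromSignal maps the signal that terminated the program to a status,
// following the rules listed above.
func statusFromSignal(sig syscall.Signal) status {
	switch sig {
	case syscall.SIGXCPU, syscall.SIGKILL:
		return statusTimeLimitExceeded // CPU rlimit hit, or killed by the caller
	case syscall.SIGXFSZ:
		return statusOutputLimitExceeded // file size rlimit hit
	case syscall.SIGSYS:
		return statusDisallowedSyscall // blocked by seccomp
	default:
		return statusSignalled // e.g. SIGSEGV: a runtime error
	}
}

func main() {
	for _, sig := range []syscall.Signal{syscall.SIGXCPU, syscall.SIGXFSZ, syscall.SIGSYS, syscall.SIGSEGV} {
		fmt.Printf("%v -> %s\n", sig, statusFromSignal(sig))
	}
}
```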

Result Structure

type Result struct {
    Status            // result status
    ExitStatus int    // exit status (signal number if signalled)
    Error      string // potential detailed error message (for program runner error)

    Time   time.Duration // used user CPU time  (underlying type int64 in ns)
    Memory Size          // used user memory    (underlying type uint64 in bytes)
    // metrics for the program runner
    SetUpTime   time.Duration
    RunningTime time.Duration
}

Runner Interface

A configured runner that runs the program. The context is used to cancel the run (for example, when the time limit is exceeded) and must not be nil.

type Runner interface {
    Run(context.Context) <-chan runner.Result
}
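
A caller might drive such a runner with a context deadline, roughly as sketched below. To keep the example self-contained it declares a trimmed-down local `Result` and `Runner` in place of the project's types, and `sleepRunner` and `runWithLimit` are hypothetical; a real runner would kill the child process when the context is cancelled.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Result is a trimmed-down, hypothetical stand-in for runner.Result.
type Result struct {
	Status string
	Time   time.Duration
}

// Runner mirrors the interface above, using the local Result type.
type Runner interface {
	Run(context.Context) <-chan Result
}

// sleepRunner is a hypothetical Runner that pretends to run a program for d.
type sleepRunner struct{ d time.Duration }

func (r sleepRunner) Run(ctx context.Context) <-chan Result {
	ch := make(chan Result, 1) // buffered so the goroutine never blocks on send
	go func() {
		start := time.Now()
		select {
		case <-time.After(r.d):
			ch <- Result{Status: "Normal", Time: time.Since(start)}
		case <-ctx.Done():
			ch <- Result{Status: "Time Limit Exceeded", Time: time.Since(start)}
		}
	}()
	return ch
}

// runWithLimit cancels the run once the wall-clock limit has passed.
func runWithLimit(r Runner, limit time.Duration) Result {
	ctx, cancel := context.WithTimeout(context.Background(), limit)
	defer cancel()
	return <-r.Run(ctx)
}

func main() {
	fmt.Println(runWithLimit(sleepRunner{d: 10 * time.Millisecond}, time.Second).Status)
	fmt.Println(runWithLimit(sleepRunner{d: time.Second}, 10*time.Millisecond).Status)
}
```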

Pre-forked Container Protocol

  1. Pre-fork container to run programs inside
  2. Unix socket to pass fd inside / outside

Container / Host Communication Protocol (single thread):

  • ping (alive check):
    • reply: pong
  • conf (set configuration):
    • reply: pong
  • open (open files in given mode inside container):
    • send: []OpenCmd
    • reply: "success", file fds / "error"
  • delete (unlink file / rmdir dir inside container):
    • send: path
    • reply: "finished" / "error"
  • reset (clean up container for later use (clear workdir / tmp)):
    • send:
    • reply: "success"
  • execve: (execute file inside container):
    • send: argv, env, rLimits, fds
    • reply:
      • success: "success", pid
      • failed: "failed"
    • send (success): "init_finished" (as cmd)
      • reply: "finished" / send: "kill" (as cmd)
      • send: "kill" (as cmd) / reply: "finished"
    • reply:

Any socket-related error causes the container to exit (together with all processes inside the container).
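
The request / reply shape of this protocol can be sketched with an in-memory connection. This is a toy model under several assumptions: it uses `net.Pipe` and `encoding/gob` in place of a unix socket with fd passing, the `cmd` and `reply` message types are hypothetical, and the handlers are stubs that only return the reply strings listed above (open and execve are omitted).

```go
package main

import (
	"encoding/gob"
	"fmt"
	"net"
)

// cmd and reply are hypothetical message shapes, not the project's wire format.
type cmd struct {
	Name string
	Path string
}

type reply struct {
	Msg string
}

// serve plays the container side: it handles one command at a time and exits
// on any connection error, as the protocol requires.
func serve(conn net.Conn) {
	defer conn.Close()
	dec, enc := gob.NewDecoder(conn), gob.NewEncoder(conn)
	for {
		var c cmd
		if err := dec.Decode(&c); err != nil {
			return
		}
		var r reply
		switch c.Name {
		case "ping", "conf":
			r.Msg = "pong"
		case "reset":
			r.Msg = "success"
		case "delete":
			r.Msg = "finished" // stub: a real handler would unlink c.Path
		default:
			r.Msg = "error"
		}
		if err := enc.Encode(r); err != nil {
			return
		}
	}
}

// client plays the host side: send one command, wait for one reply.
type client struct {
	enc *gob.Encoder
	dec *gob.Decoder
}

func (c *client) call(m cmd) (string, error) {
	if err := c.enc.Encode(m); err != nil {
		return "", err
	}
	var r reply
	if err := c.dec.Decode(&r); err != nil {
		return "", err
	}
	return r.Msg, nil
}

// newClient starts the container loop and returns the host-side client plus
// a function that closes the connection.
func newClient() (*client, func()) {
	host, container := net.Pipe()
	go serve(container)
	return &client{gob.NewEncoder(host), gob.NewDecoder(host)}, func() { host.Close() }
}

func main() {
	c, stop := newClient()
	defer stop()
	for _, name := range []string{"ping", "reset", "delete"} {
		msg, err := c.call(cmd{Name: name, Path: "/tmp/a"})
		fmt.Println(name, "->", msg, err)
	}
}
```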

Pre-forked Container Environment

The restricted container environment is accessed through the RPC interface defined by the protocol above.

Provides:

  • File access
    • Open: create / access files
    • Delete: remove file
  • Management
    • Ping: alive check
    • Reset: remove temporary files
    • Destroy: destroy the container environment
  • Run program
    • Execve: execute program with given parameters

type Environment interface {
    Ping() error
    Open([]OpenCmd) ([]*os.File, error)
    Delete(p string) error
    Reset() error
    Execve(context.Context, ExecveParam) <-chan runner.Result
    Destroy() error
}

Packages (/pkg)

  • seccomp: provides seccomp type definitions
    • libseccomp: provides utility functions that wrap libseccomp
  • forkexec: fork-exec that performs mount, unshare, ptrace, seccomp, capset before exec
  • memfd: reads a regular file and creates a sealed memfd for its contents
  • unixsocket: sends / receives OOB messages over a unix socket
  • cgroup: creates cgroup directories and collects resource usage / limits
  • mount: provides utility functions that wrap the mount syscall
  • rlimit: provides utility functions for the rlimit syscall
  • pipe: provides a wrapper to collect all content written through a pipe

Packages

  • cmd/runprog/config: defines arch- and language-specific trace conditions for the ptrace runner, taken from UOJ
  • container: creates pre-forked containers to run programs inside
  • runner: interface to run programs
    • ptrace: wrapper to call forkexec and ptracer
      • filehandler: an example implementation of the UOJ file set
    • unshare: wrapper to call forkexec with unshared namespaces
  • ptracer: ptrace tracer; provides a syscall trap filter context

Executable

  • runprog: safely run program by unshare / ptrace / pre-forked containers

Configurations

  • config/config.go: all configuration for run specs (similar to UOJ)

Kernel Versions

  • 5.19: memory.peak in cgroup v2
  • 4.15: cgroup v2
  • 4.14: SECCOMP_RET_KILL_PROCESS
  • 4.6: CLONE_NEWCGROUP
  • 3.19: execveat()
  • 3.17: seccomp, memfd_create
  • 3.10: CentOS 7
  • 3.8: CLONE_NEWUSER without CAP_SYS_ADMIN, CAP_SETUID, CAP_SETGID
  • 3.5: prctl(PR_SET_NO_NEW_PRIVS)
  • 2.6.36: prlimit64
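
To see which of these features a given kernel offers, the table can be compared against a release string such as the output of `uname -r`. The sketch below is illustrative only: `version`, `feature`, `parseRelease` and `supported` are hypothetical names, and the 3.10 entry is left out because it marks a distribution baseline and not a feature.

```go
package main

import "fmt"

// version holds major, minor and patch numbers of a kernel release.
type version [3]int

// feature pairs a kernel feature with the first version that provides it.
type feature struct {
	since version
	name  string
}

// features mirrors the list above.
var features = []feature{
	{version{5, 19, 0}, "memory.peak in cgroup v2"},
	{version{4, 15, 0}, "cgroup v2"},
	{version{4, 14, 0}, "SECCOMP_RET_KILL_PROCESS"},
	{version{4, 6, 0}, "CLONE_NEWCGROUP"},
	{version{3, 19, 0}, "execveat()"},
	{version{3, 17, 0}, "seccomp, memfd_create"},
	{version{3, 8, 0}, "CLONE_NEWUSER without CAP_SYS_ADMIN, CAP_SETUID, CAP_SETGID"},
	{version{3, 5, 0}, "prctl(PR_SET_NO_NEW_PRIVS)"},
	{version{2, 6, 36}, "prlimit64"},
}

// parseRelease reads the leading "major.minor[.patch]" of a release string
// and ignores any suffix such as "-32-generic".
func parseRelease(s string) (version, error) {
	var v version
	n, _ := fmt.Sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2])
	if n < 2 {
		return v, fmt.Errorf("cannot parse kernel release %q", s)
	}
	return v, nil
}

// atLeast reports whether v is the same as or newer than o.
func (v version) atLeast(o version) bool {
	for i := range v {
		if v[i] != o[i] {
			return v[i] > o[i]
		}
	}
	return true
}

// supported lists the features available on the given kernel release.
func supported(release string) ([]string, error) {
	v, err := parseRelease(release)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range features {
		if v.atLeast(f.since) {
			names = append(names, f.name)
		}
	}
	return names, nil
}

func main() {
	names, err := supported("3.10.0-1160.el7.x86_64")
	fmt.Println(names, err)
}
```

Distribution kernels often backport features, so a version comparison like this is only a first approximation; probing the syscall or file directly is more reliable.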

Benchmarks

ForkExec

$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/forkexec
BenchmarkSimpleFork-4              	   12409	    996096 ns/op
BenchmarkUnsharePid-4              	   10000	   1065168 ns/op
BenchmarkUnshareUser-4             	   10000	   1061770 ns/op
BenchmarkUnshareUts-4              	   10000	   1056558 ns/op
BenchmarkUnshareCgroup-4           	   10000	   1049446 ns/op
BenchmarkUnshareIpc-4              	     709	  16114052 ns/op
BenchmarkUnshareMount-4            	     745	  16207754 ns/op
BenchmarkUnshareNet-4              	    3643	   3492924 ns/op
BenchmarkFastUnshareMountPivot-4   	     612	  20967318 ns/op
BenchmarkUnshareAll-4              	     837	  14047995 ns/op
BenchmarkUnshareMountPivot-4       	     488	  24198331 ns/op
PASS
ok  	github.com/criyle/go-sandbox/pkg/forkexec	147.186s

Container

$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/container
BenchmarkContainer-4   	    5907	   2062070 ns/op
PASS
ok  	github.com/criyle/go-sandbox/container	21.763s

Cgroup

$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/cgroup
BenchmarkCgroup-4   	   50283	    245094 ns/op
PASS
ok  	github.com/criyle/go-sandbox/pkg/cgroup	14.744s

Socket

Blocking:

$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/unixsocket
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkBaseline-8             12170148              1048 ns/op
BenchmarkGoroutine-8             2658846              4910 ns/op
BenchmarkChannel-8               8454133              1431 ns/op
BenchmarkChannelBuffed-8         8767264              1357 ns/op
BenchmarkChannelBuffed4-8        9670935              1230 ns/op
BenchmarkEmptyGoroutine-8       34927512               342.8 ns/op
PASS
ok      github.com/criyle/go-sandbox/pkg/unixsocket     83.669s

Non-block:

$ go test -bench . -benchtime 10s
goos: linux
goarch: amd64
pkg: github.com/criyle/go-sandbox/pkg/unixsocket
cpu: Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz
BenchmarkBaseline-8             11609772              1001 ns/op
BenchmarkGoroutine-8             2470767              4788 ns/op
BenchmarkChannel-8               8488646              1427 ns/op
BenchmarkChannelBuffed-8         8876050              1345 ns/op
BenchmarkChannelBuffed4-8        9813187              1212 ns/op
BenchmarkEmptyGoroutine-8       34852828               342.2 ns/op
PASS
ok      github.com/criyle/go-sandbox/pkg/unixsocket     81.679s

go-sandbox's People

Contributors

alphanecron, criyle, dependabot[bot]

go-sandbox's Issues

RunnerFailed(no such process)

Hello!
First of all, you have done a great job! The project is awesome.
To tell the truth, there are a lot of difficult details in the code. I guess strong Linux knowledge is required %)

Sometimes an error happens:

results: Result[RunnerFailed(no such process)][2.937ms 2.8 MiB][2.893533ms 3.997806ms] <nil>
setupTime:  2.893533ms
runningTime:  3.997806ms
Runner Error

The problem is intermittent, and a restart usually helps. It doesn't depend on the program or the input parameters.

Maybe you have some thoughts about this?

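How should I read this code?

I would like to learn how your sandbox is implemented, but with so many files I have no idea where to start. Could you give me a few pointers? Thanks in advance.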

Deadlock with RawSyscall in syncWithChild()

I found a deadlock when hammering the forkexec-runner with a bunch of executions.

System: Ubuntu Jammy amd64

Steps to reproduce:

  • Compile this source: https://gist.github.com/nename0/1d77883609ea26f4f52b35e848a3665e
  • Run it with GOMAXPROCS=4 ./deadlock (with permissions to create a cgroup e.g. root)
  • If the above exits normally, just try again. It might take ~10 tries.
  • If the execution is stuck, use Ctrl+z to get the parent pid
  • Then use gdb -p <pid> to attach
  • And run thread apply all bt in gdb
    You will see that four threads are stuck at RawSyscall() called from syncWithChild() in Line 117.
    This effectively deadlocks the go scheduler.

If you dig a bit you will also see a few forked children that have not yet called exec (all children past exec should be zombies because of the rlimit).
If you gdb into them they are stuck at forkAndExecInChild1() in Line 427.

I'm not really sure what causes this, or why those reads from the socketpair block.
You can fix the symptoms by replacing the RawSyscall with Syscall like I did in 73c169b.
However, I'm not sure if this fixes the cause or just the symptoms.
Especially because sometimes you get a runner unknown: broken pipe output from the execution, even with the above fix.

Problems with runner=ptrace in docker

Hello! I'm still researching your great codebase.
I have some problems running the sandbox in Docker.
For example, with this Dockerfile:

FROM golang:1.19
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build ./cmd/runprog
CMD ./runprog -runner ptrace -show-trace-details ls

The output is

rlimit:  RLimits[CPU[1 s:3 s],Data[256.0 MiB:256.0 MiB],File[64.0 MiB:64.0 MiB],Stack[256.0 MiB:256.0 MiB],OpenFile[256:256],Core[0 B:0 B]]
tracer started:  13 <nil>
------  13  ------ 13
process exited:  13 38
results: Result[RunnerFailed(child process exit before execve)][329µs 1.7 MiB][57.369µs 565.723µs] <nil>
setupTime:  57.369µs
runningTime:  565.723µs
Runner Error
7 0 1732 0

But, If I changed the runner type to container, the output is

rlimit:  RLimits[CPU[1 s:3 s],Data[256.0 MiB:256.0 MiB],File[64.0 MiB:64.0 MiB],Stack[256.0 MiB:256.0 MiB],OpenFile[256:256],Core[0 B:0 B]]
/usr/bin/ls: error while loading shared libraries: libselinux.so.1: cannot stat shared object: Error 38
results: Result[Nonzero Exit Status( 127)][855µs 1.6 MiB][507.993µs 8.342936ms] <nil>
2 1 1632 127
setupTime:  507.993µs
runningTime:  8.342936ms
Nonzero Exit Status

On my local Linux 5.19.0-32-generic #33~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Jan 30 17:03:34 UTC 2 x86_64 x86_64 x86_64 GNU/Linux, both the container and ptrace runners run successfully. It looks like ptrace inside a Docker container can't start the subprocess.
The command to run the Dockerfile: docker build -t app . && docker run --privileged --rm app

Do you have any thoughts about this problem?
And could you provide a short description of which type of executor is faster / safer? If I understand correctly, with the container executor we can't control file access, but the ptrace executor, I guess, is slower.
