gramineproject / gramine Goto Github PK

A library OS for Linux multi-process applications, with Intel SGX support

License: GNU Lesser General Public License v3.0

Shell 0.83% Dockerfile 0.30% Emacs Lisp 0.01% Makefile 1.06% Meson 1.83% C 87.35% Assembly 1.55% Python 6.63% C++ 0.28% GDB 0.13% Rust 0.02% HTML 0.01%

gramine's Introduction

Gramine Library OS with Intel SGX Support

A Linux-compatible Library OS for Multi-Process Applications

What is Gramine?

Gramine (formerly called Graphene) is a lightweight library OS, designed to run a single application with minimal host requirements. Gramine can run applications in an isolated environment with benefits comparable to running a complete OS in a virtual machine -- including guest customization, ease of porting to different OSes, and process migration.

Gramine supports native, unmodified Linux binaries on any platform. Currently, Gramine runs on Linux and Intel SGX enclaves on Linux platforms.

In untrusted cloud and edge deployments, there is a strong desire to shield the whole application from rest of the infrastructure. Gramine supports this “lift and shift” paradigm for bringing unmodified applications into Confidential Computing with Intel SGX. Gramine can protect applications from a malicious system stack with minimal porting effort.

Gramine is a growing project and we have a growing contributor and maintainer community. The code and overall direction of the project are determined by a diverse group of contributors, from universities, small and large companies, as well as individuals. Our goal is to continue this growth in both contributions and community adoption.

Note that the Gramine project was formerly known as Graphene. However, the name "Graphene" was deemed too common, could be impossible to trademark, and collided with several other software projects. Thus, a new name "Gramine" was chosen.

Gramine documentation

The official Gramine documentation can be found at https://gramine.readthedocs.io. Below are quick links to some of the most important pages:

Users of Gramine

We maintain a list of companies experimenting with Gramine for their confidential computing solutions.

Getting help

For any questions, please use GitHub Discussions or join us on our Gitter chat.

For bug reports and feature requests, post an issue on our GitHub repository.

If you prefer emails, please send them to [email protected] (public archive).

Reporting security issues

Please report security issues to [email protected]. See also our security policy.

gramine's People

Contributors

Stargazers

Watchers

Forkers

vijaydhanraj veenasai2 sahason libinliu0189 boryspoplawski jkr0103 svenkata9 aneessahib duanbing nirusu rodgerzhu mkow gabrielus xzhangxa stefanberger cunyang rowalay laplacekorea guolsnetgap sbellem anna1532 mkbhanda igor-davidyuk mythi liangintel agentrx minskeyguo fnerdman tigerly optimistyzy wdsun1008 surfndez teotro acamtech kailun-qin masdevas phala-network omycow dongx1x satya1493 ying2liu jinengandhi-intel ben2077 g302ge themoonshotfactory meithecatte michael-m-zhang villain88 oshogbo aep llly analytics-zoo grydz jerryrhyu rodrigoieh bigdata-memory cyberflamego guzongmin sanvol xinyao1994 lejunzhu aniket-intelx yanzhichao anjalirai-intel spacecase123 lzha101 cashmaney horovitzdan cchen113 yao-ji fengjixuchui phoenix597 thetrident cpucorecore jiazhang0 linuxsecuritymodules man9ourah yeonbangsong dl8 deb-intel pbnather monavij bronzeme manju956 tejaswineel filiphagan pinkdiamond1 nvbodong woju hagipoker techfitmaster glassofwhiskey jinghe-intc heisaa randoruf sammyne sungjungk seccask ashahba ruide

gramine's Issues

Move to C11 atomics

Description of the problem

We should migrate away from the mixture of GCC built-ins and some legacy custom atomics to C11 atomics.

For reasons why not to stay with GCC built-ins please see the discussion under gramineproject/graphene#1593.

pwrite03, write02 LTP tests passing with graphene direct but failing with GSGX

Description of the problem

With the Fix filesystem corner cases commit 0012183743c3a9b4ea4b85a97e7859da4836b1ca, the tests are passing with Graphene native but fail with GSGX. Originally these tests had started failing with the inode PR and the above commit was the fix for the tests.

Graphene direct output:

error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
/home/intel/graphene/LibOS/shim/test/ltp/src/lib/tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s
/home/intel/graphene/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/pwrite/pwrite03.c:25: TPASS: pwrite(fd, NULL, 0) == 0

Summary:
passed   0
failed   0
skipped  0
warnings 0

GSGX output:

error: Using insecure argv source. Graphene will continue application execution, but this configuration must not be used in production!
/home/intel/graphene/LibOS/shim/test/ltp/src/lib/tst_test.c:1248: TINFO: Timeout per run is 0h 05m 00s
/home/intel/graphene/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/pwrite/pwrite03.c:20: TFAIL: pwrite() should have succeeded with ret=0: EACCES (13)

Summary:
passed   0
failed   0
skipped  0
warnings 0

Steps to reproduce

In order to run the LTP tests with GSGX, you need to make a change to the src/lib/tst_test.c file and change the flag from MAP_SHARED to MAP_PRIVATE at this line: https://github.com/linux-test-project/ltp/blob/da2f34028f046a208aa2fed5e287df2538e69f91/lib/tst_test.c#L108

Rebuild LTP with SGX=1 flag.

cd install/testcases/bin/

graphene-sgx pwrite03

Expected results

Actual results

RFC: Reorganize the repository

Currently this repository has a bit messy layout. Proposed changes:

Move .h files into "include" directories (now there are symlinks pointing to src/.h)
Move shared code (between Pal and LibOS) out of the Pal directory
...
Please suggest more

[LibOS] dentry cache for filesystem can cause conflict between stat() result and host file status

Description of the problem

shim_do_stat(), shim_do_lstat() etc. check dentry cache for filesystem first. If the file is already looked up and exists in dentry cache, these function return the cached data. However the cache will be out of sync after other programs modify it. Then program in Graphene can get conflict result when calling open() to access real file.

Steps to reproduce

A scenario that a program stat() a file, exec rm program of OS to delete it, stat() the file again to conform that rm works.

Expected results

The second stat() return ENOENT.

Actual results

The second stat() return 0 and cached stat.

Additional information

Here is the related log of the scenario.

[P27950:T2:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x9293f03f0) = -2
[P27950:T2:java] ---- shim_mkdir("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",511) = 0
[P27950:T2:java] ---- shim_lstat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x9293ee340) = 0
20/09/24 10:30:00 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79
[P27950:T90:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x888990410) = 0
[P28139:T91:java] ---- shim_execve("/bin/rm",[rm,-rf,/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79,],[LD_LIBRARY_PATH=/lib:/lib/x86_64-linux-gnu:/usr//lib/x86_64-linux-gnu:/opt/jdk:/opt/jdk/lib/jli,PATH=/opt/jdk/bin:/usr/sbin:/usr/bin:/sbin:/bin,]..
[P28139:T91:rm] ---- shim_newfstatat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x8e91dd648,256) = 0
[P28139:T91:rm] ---- shim_openat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",O_RDONLY|604400,0000) = 3
[P28139:T91:rm] ---- shim_openat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",O_RDONLY|2604400,0000) = 3
[P28139:T91:rm] ---- shim_unlinkat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",512) = 0
[P27950:T90:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x888990250) = 0
20/09/24 10:33:27 WARN JavaUtils: Attempt to delete using native Unix OS command failed for path = /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79. Falling back to Java IO way
java.io.IOException: Failed to delete: /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79
[P27950:T90:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x888990220) = 0
[P27950:T90:java] ---- shim_lstat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x88898e0c0) = 0
[P27950:T90:java] ---- shim_stat("/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",0x8889901d0) = 0
[P27950:T90:java] ---- shim_openat(AT_FDCWD,"/tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79",O_RDONLY|2204000,0000) = -2
20/09/24 10:33:27 ERROR DiskBlockManager: Exception while deleting local spark dir: /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79
java.io.IOException: Failed to list files for dir: /tmp/blockmgr-87d441cc-2080-455c-837f-d4337e45ba79

Integration with sanitizers

Description of the problem

It would be good to enable some sanitizers in CI (e.g. AddressSanitizer from Clang) to allow for better and more reliable detection of low-level bugs.

Current plan:

Clang support (gramineproject/graphene#1794; done, but not tested in CI yet).
Enable Clang's UBSan in CI (gramineproject/graphene#2657, gramineproject/graphene#2653).
Enable Clang's ASan in CI for heap allocations (#32).
Enable Clang's ASan in CI for global variables (#251).
Enable Clang's ASan in CI for stack allocations
- Enable stack sanitization (#240)
- ~~(if feasible) Detect also use-after-return~~ - skipped, see comment

The tricky part is that we're running without stdlib and manage memory on our own (especially on SGX), so most likely we'll need some hacks to get ASan working.

Previous work: @yamahata and @stefanberger ran GCC's UBSan (see gramineproject/graphene#724, gramineproject/graphene#871, gramineproject/graphene#2089, gramineproject/graphene#2094, gramineproject/graphene#2097, gramineproject/graphene#2101, gramineproject/graphene#2102, gramineproject/graphene#2103). It's much less powerful than ASan, but still something! I think we can try to add it to CI as a first step, to have at least something.

Progress bar for large enclave or app

Description of the problem

When launching a large enclave with graphene, e.g, 32-128GB, we need to wait for a few minutes for enclave creation and app/lib copy etc. However, current command line output show nothing to end users. We don't know which stage we are and how much time we need to wait for. That's not user friendly. :(

Adding a progress bar can simply fix this problem.

Steps to reproduce

Changing enclave size to 128GB in tensorflow example.

Expected results

Preparing Graphene Env 
[######         20%]  Creating enclave
Time used XXs

Netlink socket family support

Description of the problem

It is a Linux kernel interface used for inter-process communication (IPC) between both the kernel and userspace processes, and between different userspace processes, in a way similar to the Unix domain sockets. There are bunch of applications leverage ZeroMQ protocol that relies on netlink for high-performance asynchronous messaging. GSGX doesn't support it yet but It worth to adding this feature to GSGX.

[LibOS] Emulate mlock/mlock2/munlock/etc family of syscalls

Implement mlock, mlock2, munlock, mlockall, munlockall (lock and unlock memory). These can be stubbed easily by always returning success (Graphene cannot guarantee that the host OS will perform lock/unlock anyway, and a malicious OS will swap pages anyway).

This will allow to re-enable more LTP tests such as mincore02.

Regression with Inode, chroot rewrite commit 74420be

Description of the problem

Internal CI flagged the following tests as failures with the Inode, chroot FS rewrite feature that was merged recently.
The following tests are regressions with the commit: ftruncate03, ftruncate03_64, pwrite02, pwrite02_64, pwrite03, pwrite03_64, write02

These tests are also run in the open source CI but were missed may be due to the parser not checking for TFAIL in the tests.

Console link for the commit in open source CI:

https://localhost:8080/job/graphene-18.04/6305/testReport/apps/LTP/test_direct___pwrite02/
https://localhost:8080/job/graphene-18.04/6305/testReport/apps/LTP/test_direct___pwrite03/
https://localhost:8080/job/graphene-18.04/6305/testReport/apps/LTP/test_direct___ftruncate03/

Console link from an earlier commit in opensource CI where the tests had passed:

https://localhost:8080/job/graphene-18.04/6250/testReport/apps/LTP/test_direct___pwrite02/
https://localhost:8080/job/graphene-18.04/6250/testReport/apps/LTP/test_direct___pwrite03/
https://localhost:8080/job/graphene-18.04/6250/testReport/apps/LTP/test_direct___ftruncate03/

Sample output for pwrite03 where the test passed:

/home/jenkins/workspace/graphene-18.04/LibOS/shim/test/ltp/src/lib/tst_test.c:1250: TINFO: Timeout per run is 0h 05m 00s
/home/jenkins/workspace/graphene-18.04/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/pwrite/pwrite03.c:25: TPASS: pwrite(fd, NULL, 0) == 0

Summary:
passed   0
failed   0
skipped  0
warnings 0

Sample output for pwrite03 where the test failed:

/home/jenkins/workspace/graphene-18.04/LibOS/shim/test/ltp/src/lib/tst_test.c:1250: TINFO: Timeout per run is 0h 05m 00s
/home/jenkins/workspace/graphene-18.04/LibOS/shim/test/ltp/src/testcases/kernel/syscalls/pwrite/pwrite03.c:21: TFAIL: pwrite() should have succeeded with ret=0: EINVAL (22)

Summary:
passed   0
failed   0
skipped  0
warnings 0

Steps to reproduce

Run the above LTP tests locally with graphene-direct

Expected results

All subtests should be TPASS.

Actual results

At least 1 subtest has a TFAIL.

Deprecate IAS API v3 and `ias_request` tool

IAS API v3 (which is currently used by attestation utilities in /Pal/**/Linux-SGX/tools/*) is undocumented as of this writing. The documentation describes only v4 API: https://api.trustedservices.intel.com/documents/sgx-attestation-api-spec.pdf.

The ias_request tool needs not to be written in C, I wrote graphene-ias-query in Python as a wrapper of a simple library (graphenelibos.ias). When graphene-ias-query reaches functional parity with ias_request, the latter might be removed.

Check for TOCTOU bugs through compiler optimization

We sometimes have code which does roughly this:

int val = *uvar; // uvar is a pointer to untrusted memory
if (val > MAX_VAL) { // some check
    fail();
}
do_something(val); // now use the checked value

The idea here to first copy the value to trusted memory and then check and use the checked value. If this is done in this order this is TOCTOU safe. But we need to prevent the compiler from messing with this (for example by using the untrusted memory directly).

Changing the copy to a memcpy is (AFAIK) not enough since the compiler is allowed to optimize the memcpy away since it's part of the C standard (It's for example quite common to replace it with a mov for small fixed sized).

Enable configurable signing algorithm in RA-TLS (currently forced to use RSA)

This is a feature request to support ECDSA in addition to RSA as the choice of signing algorithm used by RA-TLS. From https://github.com/oscarlab/graphene/blob/master/Pal/src/host/Linux-SGX/tools/ra-tls/ra_tls.h, it appears that currently the certs are forced to be RSA based certs. It would be great if ECDSA based RA-TLS certs can be supported.

I would further request to permit the ECDSA curve choice as a configuration parameter.

In our application, we use components such as the Microsoft Confidential Consortium Framework (CCF) blockchain, which currently only supports ECDSA based client certs for mTLS.

Thanks
Prakash

Description of the problem

Steps to reproduce

Expected results

Actual results

Stress-ng open syscall test failing with assert fd < handle_map->fd_size on RHEL and CentOS

Description of the problem

On RHEL and CentOS, stress-ng open syscall test is failing with the following errors:

[P1:shim] error: Child process (vmid: 0x84) got disconnected [P124:T124:stress-ng] assert failed ../LibOS/shim/src/bookkeep/shim_handle.c:419 fd < handle_map->fd_size

Complete log file is attached below:
stress-ng_open_rhel.txt

Steps to reproduce

graphene-direct stress-ng --open 0 --timeout 60s --verbose

Expected results

Actual results

[Pal/Linux-SGX] Protected Files do not work with Protected argv/env

This design issue was detected while working on gramineproject/graphene#1674.

We provision the secret (the master key to encrypt/decrypt protected files) on top of RA-TLS which runs on top of LibOS which is executed only after PAL bootstrap code. However, reading protected-argv and protected-envp file is implemented in the PAL bootstrap code.

This results in a situation when there is no key to decrypt the file with protected argv/env.

We could introduce additional Dk* APIs to read these files and rewrite argv and envp write before calling main. Even this is probably complex, because this is the job of the Glibc loader. Moreover, it's technically wrong, because dynamic libraries have to start with the envs set up securely.

The only thing I can come up with right now is using the LD_PRELOAD trick further in the Secret Provisioning library, like this: https://github.com/thomasknauth/sgx-ra-tls/blob/9d457d35dba6910fde6eb0483acb0582997a7bd4/apps/redis-secrect-provisioning-example/redis-server-provision-secret.c#L128. But this is an ugly hack.

Rewrite signal handling in SGX

The same design change is made for Linux (gramineproject/graphene#281). The SGX PAL is left for future work but should be addressed in near future.

Mountpoint order is not preserved in current manifest syntax

Description of the problem

Currently we don't use proper TOML syntax for declaring mountpoints, instead, we use a syntax which resembles the pre-TOML one used in Graphene. As a result, the entries are not ordered, but Graphene actually relies on the specific mounting order (e.g. you can't mount /lib/asdf first and then /lib, but the other way around works). The problem is, that TOML structure is just a dictionary, so the order of keys is not preserved.

This will be solved by finally migrating to a proper TOML syntax for everything, which will be eventually done (after this, mount points list should be a TOML array). Cleaning up mounting code should also help here - I think that in Graphene mount order shouldn't matter, we only need to ensure that there are no duplicated mounts.

Steps to reproduce

Swap the order of entries in e.g. manifest.template in LibOS regression, so that "$(ARCH_LIBDIR)" is listed before "/lib".

Expected results

Everything still works.

Actual results

[P6098:T1:bootstrap_cpp] Mount /lib already exists, verify that there are no duplicate mounts in manifest
[P6098:T1:bootstrap_cpp] (note that /proc and /dev are automatically mounted in Graphene).
[P6098:T1:bootstrap_cpp] Mounting file:../../../../Runtime on /lib (type=chroot) failed (17)
[P6098:T1:bootstrap_cpp] Error during shim_init() in init_mount (-17)

Additional information

gramineproject/graphene#2210 contains an ugly workaround, but we need a proper solution.

Security mitigations and hardening

Ideas for security mitigations and "bug prevention"

Mitigations/sanitizations

We need better sanitization of OCALL arguments. Current version most likely is not dangerous, but there's a chance that some apps may be manipulated this way into doing something unexpected. See gramineproject/graphene#1236.

Bug detection

Run tests with sanitizers in CI (see #19).
Run some linters in CI. Problem: most have high false-positive ratios.
Implement __user-like specifier to check for TOCTOU bugs during compilation (gramineproject/graphene#635; most relevant for OCALLs).

[Pal/SGX] Rework `enclave_entry.S`

Description of the problem

enclave_entry.S is a place holding code which manages, well, enclave entry. This includes all kinds of entries:

ecalls,
ocalls,
ocall returns,
exception handling entries,
ocalls from exceptions,
...

The code has a couple of problems:

some already described in: gramineproject/graphene#2603 gramineproject/graphene#2532 #83 gramineproject/graphene#637
stack overflows, e.g. issuing multiple exception handling entries - while NSSA disallows for multiple of such entries, the routine just rewires in-enclave registers (in SSA frame) to point to signal handling routines and then exits the enclave; this can be arbitrarily nested
allows untrusted part to inject arbitrary exception (from the set of exceptions defined in PAL)
untrusted part injects -EINTR if a async signals comes while handling an ocall. This per se is not inside enclave_entry.S, but the way it's handled is a strict result of the way SGX PAL handles signals, which heavily affects code in enclave_entry.S.

The whole mechanism of handling ocalls, exceptions and ocalls from exception handlers needs to be rethinked (and possibly redesigned).

Known issues for production deployment

This issue lists items that need to be kept in mind as you consider using Graphene in a production deployment scenario.

Issues: (checked means "already fixed on master")

Leftovers from signal rework

Description of the problem

To finish signal rework (gramineproject/graphene#2090) following TOODs/issues need to be fixed:

initial, Pal allocated stack can be reused as LibOS syscall emulation stack,
revisit locking around each thread->field - since there is no nested syscall calling, some might be accessed lockless,
support SA_AUTODISARM,
~~revisit PAL-Linux-SGX signal handling, also tracked in #85~~
~~remove LEAVE_PAL_CALL and friends,~~ gramineproject/graphene#2149
~~remove DkRaiseFailure (but this needs a PAL API rework - to return errors directly).~~ gramineproject/graphene#2182

[LibOS] Double free of epoll_item

Description of the problem

Under some racy conditions (see the attached code) it is possible to free an epoll_item twice.
The main issue is that epoll_item is always on two lists: each epoll instance has a list of watched handles and a handle has a list of epoll instances it's being watched by. When closing an epoll, we need to walk the former list, removing and freeing all epoll_items and when closing a handle we need to do the same for the latter list. While the accesses to those lists are guarded by appropriate locks, it does not prevent an epoll_item from being taken out of those two lists concurrently, leading to double free.
Removing a handle from all epoll instances (happens at handle close): https://github.com/oscarlab/graphene/blob/master/LibOS/shim/src/sys/shim_epoll.c#L90
Removing all handles from an epoll instance (happens at epoll close): https://github.com/oscarlab/graphene/blob/master/LibOS/shim/src/sys/shim_epoll.c#L411

After giving it a quick thought it seems that this is not solvable without some global synchronization and it seems that the Linux kernel indeed uses a global lock for similar problem. More info: https://elixir.bootlin.com/linux/latest/source/fs/eventpoll.c#L42

Steps to reproduce

As this is a race, you might need to run it a bunch of times.

#include <err.h>
#include <errno.h>
#include <pthread.h>
#include <sys/epoll.h>
#include <unistd.h>

static int ready = 0;
static int start = 0;

static void waitfor(int* x) {
    while (!__atomic_load_n(x, __ATOMIC_SEQ_CST)) {
        __asm__ volatile ("pause");
    }
}

static void* f(void* x) {
    __atomic_store_n(&ready, 1, __ATOMIC_SEQ_CST);
    waitfor(&start);
    close((int)(long)x);
    return NULL;
}

int main(int argc, char* argv[]) {
    int efd, fd, ret;
    struct epoll_event event;

    int p[2];
    ret = pipe(p);
    if (ret < 0) { 
        err(1, "pipe");
    }

    if (close(p[1]) < 0) {
        err(1, "close"); 
    }

    fd = p[0];

    efd = epoll_create1(0);
    if (efd < 0) {
        err(1, "epoll_create");
    }

    event.data.fd = fd;
    event.events = EPOLLIN;
    ret = epoll_ctl(efd, EPOLL_CTL_ADD, fd, &event);
    if (ret < 0) {
        err(1, "epoll_ctl");
    }

    pthread_t th;
    if (pthread_create(&th, NULL, f, (void*)(long)fd) < 0) {
        err(1, "pthread_create");
    }

    waitfor(&ready);
    __atomic_store_n(&start, 1, __ATOMIC_SEQ_CST);

    close(efd);

    if (pthread_join(th, NULL) < 0) {
        err(1, "pthread_join");
    }

    return 0;
}

Expected results

No crashes :)

Actual results

Crashes :)

[LibOS] Add support for Musl libc

Currently Graphene supports only Glibc as the standard C library. Add support for Musl libc.

This will be beneficial for IoT/embedded users.

With Go program, when server program does a read, getting permission denied.

Description of the problem

With Go program, when server program does a read, getting permission denied.

Steps to reproduce

Able to reproduce on a recent graphene pull(Aug 30th, 2021), commit-id-> c321726229eaf0a1b52dc5e2507c9cfab423ea94
Also able to reproduce on https://github.com/oscarlab/graphene/releases/tag/v1.2-rc1

Providing Sample Go program and scripts to reproduce the issue.

In graphene repo, under your /home->/graphene/Examples directory, copy this zip file->(
go_sample.zip) , and then unzip it,
to create go_sample directory under /graphene/Examples/go_sample.

Under /graphene/Examples/go_sample$
Run the script -> ./launch_in_graphene_locally.sh
This will build the sample Go program(in a docker container), and then do a graphene-sgx build, and
then it will launch it locally on your host system.

When the Go Server code does a read, getting permission denied.

Expected results

Output below, when running the same Go program, outside of Graphene.
Examples/go_sample$ ./main
SK_DBG: listening on 172.17.0.1:8805
client: wrote: hello
server: read: hello

Actual results

In Graphene, when the Go Server code does a read, getting permission denied.

[P17460:T1:main] debug: Allocating stack at 0x0 (size = 8388608)
[P17460:T1:main] debug: loading "file:./main"
[P17460:T1:main] debug: adding a library for gdb: file:./main
[P17460:T1:main] debug: Creating pipe: pipe.srv:17460
debug: sock_getopt (fd = 11, sockopt addr = 0x7ffda07f52b0) is not implemented and always returns 0
[P17460:T1:main] debug: Shim process initialized
[P17460:shim] debug: IPC worker started
[P17460:T1:main] debug: Created sigframe for sig: 23 at 0xc4009390 (handler: 0x460be0, restorer: 0x460d20)
debug: sock_getopt (fd = 12, sockopt addr = 0x7ffda07f52b0) is not implemented and always returns 0
[P17460:T1:main] debug: Creating pipe: pipe.srv:8fcf6dd6dc08723b8328139ef1955390c16982da3379e1b7f6c07bc4bdc66514
debug: sock_getopt (fd = 15, sockopt addr = 0x7ffda07f52b0) is not implemented and always returns 0
debug: sock_getopt (fd = 16, sockopt addr = 0x7ffda07f52b0) is not implemented and always returns 0
debug: sock_getopt (fd = 17, sockopt addr = 0x7ffda07f52b0) is not implemented and always returns 0
[P17460:T1:main] debug: add fd 5 (handle 0xfb098610) to epoll handle 0xfb098550
[P17460:T1:main] debug: add fd 3 (handle 0xfb0983c0) to epoll handle 0xfb098550
SK_DBG: listening on 172.17.0.1:8805
Allowing access to an unknown file due to file_check_policy settings: file:/etc/localtime
2021/09/03 18:08:33 read udp 172.17.0.1:8805: read: permission denied

Additional information

Go sample code under gopro2 folder in zip file attached.
When Go Program calls net.ListenUDP, this api invokes 2 syscalls, 1. to create socket, 2. bind
When Go server program calls net.ListenUDP these 2 syscalls are successful.
Go program launches a light-weight thread that runs the Client-function, which tries to connect to the server using
Go's net.DialUDP function, and then does a write. Later, server code tries to do a read, and this is where it fails,
with permission denied error(shown in the log above).

Enhance `/proc/self/cmdline` emulation similarly to the Linux implementation

Currently, we implement a naïve emulation of /proc/self/cmdline in Graphene, see gramineproject/graphene#2180.

In particular:

Graphene copies cmdline to a global variable on LibOS initialization, while in normal Linux it just stores pointers to argv
- for example, if you modify argv[4], the /proc/self/cmdline must show this modification
- this is not just a quirk but documented behavior (see man 5 proc)
Graphene has a static buffer of one page (4KB) in size to hold cmdline, but Linux allows cmdline to be bigger
- should change that to dynamic allocation and correctly propagate on fork/clone

For the moment, our naïve emulation is enough. But some workloads may bump into one of the above limitations, so we need to fix it in the future.

[LibOS] `__process_pending_options()` is buggy in `shim_socket.c`

Hi,

In want to run the mumble server (https://github.com/mumble-voip/mumble) called murmur in Graphene. My OS is Ubuntu 20.04.

I copied the memcached manifest template and the Makefile and customized it to my needs.
When I run graphene in direct mode I got the following error from murmur:

Failed to set initial capabilities

Could be the issue, that some system calls are missing? The log throws a couple of warnings around the error message:

[P255363:T1:murmurd] debug: glibc register library /lib/x86_64-linux-gnu/libgpg-error.so.0 loaded at 0x5599043ca000
[P255363:T1:murmurd] debug: adding a library for gdb: file:/lib/x86_64-linux-gnu/libgpg-error.so.0
[P255363:T1:murmurd] warning: Unsupported system call prctl
[P255363:T1:murmurd] warning: Unsupported system call prctl
[P255363:T1:murmurd] warning: Unsupported system call prctl
[P255363:T1:murmurd] warning: Unsupported system call prctl
[P255363:T1:murmurd] warning: Unsupported system call prctl
[P255363:T1:murmurd] warning: Unsupported system call prctl
[P255363:T1:murmurd] warning: Unsupported system call sysinfo
[P255363:T1:murmurd] debug: Creating pipe: pipe.srv:8e8056f4f079e6d4c14a29f20d09a6d4039c90784f069159aa22a9691f2098b8
[P255363:T1:murmurd] debug: Creating pipe: pipe.srv:a2c4dc9fd2c5151d7fae45d205f377c50f1ab2868222fcba9469e3875db0d86e
[P255363:T1:murmurd] debug: Creating pipe: pipe.srv:2410492390f15c7266c199e70a5f57b1cb967c8ea258aa795b33a60f338ebc22
[P255363:T1:murmurd] debug: Creating pipe: pipe.srv:25d0a8cb62f22e2b187fbf820d0362b0fcf4cbfa05f7a46da3f8ff5e3d0e50a5
[P255363:T1:murmurd] warning: Unsupported system call capget
[P255363:T1:murmurd] warning: Unsupported system call capset
[P255363:T1:murmurd] warning: Unsupported system call statx
[P255363:T1:murmurd] warning: Unsupported system call statx
[P255363:T1:murmurd] warning: Unsupported system call statx
[P255363:T1:murmurd] warning: Unsupported system call statx
[P255363:T1:murmurd] warning: Unsupported system call statx
Failed to set initial capabilities                                                             <- murmur error message
[P255363:T1:murmurd] warning: Unsupported system call statx
[P255363:T1:murmurd] warning: Unsupported system call statx
[P255363:T1:murmurd] warning: Unsupported system call statx
[P255363:T1:murmurd] warning: Unsupported system call statx

Before Graphene stops to work it writes the following debug message into the log:

[P255501:T1:murmurd] debug: process 255501 exited with status 0
[P255503:T2:murmurd] debug: ipc send to 255501: IPC_MSG_LEASE
[P255503:T2:murmurd] debug: Sending ipc message to 255501
[P255503:T2:murmurd] debug: Waiting for response (seq = 2)
[P255503:i1:murmurd] debug: IPC leader disconnected
[P255503:i1:murmurd] debug: Unknown process (vmid: 0x3e60d) disconnected

Is my assumption correct or do I miss something else?

[LibOS] Epoll events reported only for the first fd in case of duplicates

Description of the problem

If an epoll instance monitors two fds that have the very same pal_handle (e.g. duplicated via dup) events will be reported only for the first one.
Code at fault: https://github.com/oscarlab/graphene/blob/5dd0c5a64b9ae4705825da5d7c4bc2c8ad3311ec/LibOS/shim/src/sys/shim_epoll.c#L347-L364
The inner loop (epoll->fds list) looks for the first matching pal_handl, which actually might not be unique.

[Pal/Linux-SGX] Encrypted pipes: opening same end of pipe twice is possible in multiple processes

Description of the problem:

PR gramineproject/graphene#1400 introduces Encrypted IPC: all communication via pipes, UNIX domain sockets, and socketpairs is transparently wrapped in TLS sessions. This is implemented via mbedTLS API.

An interesting corner case is forks: typically, a parent process creates two ends of the pipe via pipe() and spawns a child process via fork(). After that the parent and the child close their respective ends of the pipe and start communication on the still-open ends of the pipe.

The child inherits the TLS session of both ends of the pipe from the parent by serialization/deserialization (just like any other PAL handle, see db_streams.c). Serialization is achieved via standard mbedTLS API mbedtls_ssl_context_save() and deserialization -- via mbedtls_ssl_context_load(). However, these APIs were intended for use in another context (save TLS context to disk, exit app, start app again and load TLS context from disk). So these APIs destroy (reset) the TLS context upon calls to these functions.

This doesn't work for our use-case. Graphene needs to preserve the TLS session even during fork. For example, it is unknown whether the child will use the pipe at all -- maybe the parent wants to use this pipe as means of communication between its own two threads. Even worse, it is unknown which end of the pipe the child will close -- so it's impossible to guess which TLS session to destroy.

This led me to comment out the part with mbedtls_ssl_session_reset_int() in these APIs. Theoretically, a "sane" application will never use the same end of the pipe in two processes. However, if any application does so, then it will result in a forked xor stream and possibility to decrypt significant portions of data.

There seems to be no easy way to make this scenario secure in the above case. Hopefully someone has a clever idea how to predict/figure out an unused TLS session; then we'll be able to destroy such TLS sessions, and possibility of decryption disappears.

IPC leader does not wait for all processes to finish

Description of the problem

If the first process, which is the IPC leader, exits, all other processes cannot use IPC any more. We need to decide on a first process exit policy and implement it. I see two options:

IPC leader kills all other processes before exiting
IPC leader just cleans up app resources, but stays alive as long as other processes are still running.

We already have a TODO about it:
https://github.com/oscarlab/graphene/blob/4f8d6fb8b323d62f00be2ac2c7239878b14afc1e/LibOS/shim/src/sys/shim_exit.c#L23-L27

Regression tests for malicious outside

It would be great to be able to simulate a malicious host in regression tests. With this SGX fixes like gramineproject/graphene#511 or gramineproject/graphene#522 could be tested automatically.

Not sure what the best way for this is. The best idea I had so far was to commit patches and have a script which copies the source, applies the patch, compiles it and then run the test with the resulting (i.e. modified) pal_loader.

Edit: fixed grammar/typo in last sentence.

[LibOS] Missing support for flock syscall

Hello,
I don't know if there is a better place to track this, but would it be possible to add support for the flock() system call in Graphene LibOS?

At the moment, in shim_syscalls.c this is declared as unsupported.

Gramine doesn't emulate `/proc/stat` file

I am seeing this error while running PyTorch. The application is not spawning over multiple threads as the CPUs stat are not accessible.

shim_openat(AT_FDCWD,"/proc/stat",O_RDONLY|2000000,0000) = -2
FileNotFoundError: [Errno 2] No such file or directory: '/proc/stat'

Requirements are missing in .deb release of v1.2-rc1

Description of the problem

When using the new .deb packages of v1.2-rc1 on a relatively clean system using the instructions listed in the release, Graphene cannot be used correctly due to missing Python, protobuf & maybe Intel AESM packages, depending on the environment.

Steps to reproduce

Have a fairly clean system with essential development tools
Install Graphene .deb pre-release as stated in the release notes (I used the out-of-tree DCAP package):

sudo apt-key adv --fetch-keys https://packages.grapheneproject.io/graphene.asc
echo 'deb [arch=amd64 signed-by=EA3C2D624681AC968521587A5EE1171912234070] https://packages.grapheneproject.io/ unstable main' | sudo tee /etc/apt/sources.list.d/graphene-unstable.list
sudo apt update
sudo apt install graphene-dcap     # for out-of-tree DCAP driver

Build an application
Try to sign it with Graphene

Expected results

Graphene signs the application successfully.

Actual results

Graphene fails due to missing packages, e.g.:

graphene-sgx-sign \
  --key signer-key.pem \
  --manifest redis-server.manifest \
  --output redis-server.manifest.sgx
Traceback (most recent call last):
   File "/usr/bin/graphene-sgx-sign", line 4, in <module>;
   from graphenelibos.sgx_sign import main
   File "/usr/lib/python3/dist-packages/graphenelibos/sgx_sign.py", line 16, in <module>;
   import toml
ModuleNotFoundError: No module named "toml";
make: *** [sgx_outputs] Error 1
Makefile:97: recipe for target 'sgx_outputs' failed

Additional information

I know this is still a pre-release in early stages and the first package, just wanted to "note this down" here as an issue to be fixed for a future packaged release.

Ideally, the package can be directly used within a Docker environment as I am trying to use here (warning: it's a bit messy, haven't really cleaned it up):
edgelesssys/marblerun@1ba7157

Note the commented out packages. These contain most of the ones needed to get Graphene to sign the application and return the SIGSTRUCT. However, AESM is still not completely installed as a requirement there. For experiments, I passed the AESM service from the host.

Also, when I am already here creating this issue: Any plans to also adjust your samples to get rid of the GRAPHENEDIR variable, which will not quite work with the release packages?

Stress-ng sigsuspend test failing with IPC error -13 error on CentOS and Ubuntu

Description of the problem

Error message on RHEL:

[P1:shim] error: IPC worker: DkStreamWaitForClient failed: -13
[P493:T3110:stress-ng] error: process creation failed
[P496:T3111:stress-ng] error: process creation failed
[P488:T3109:stress-ng] error: process creation failed

Error message on Ubuntu:

[P1:shim] error: IPC worker: error running IPC callback 5: -13
[::] error: Error during shim_init() in receive_checkpoint_and_restore (-61)
rasp@rasp-WHITLEY:~/validation/oscarlab/Examples/stress-ng$ [P498:T3111:stress-ng] error: process creation failed
[P489:T3109:stress-ng] error: process creation failed
[P483:T3108:stress-ng] error: process creation failed

The test was executed with PR 2595 as well as PR 2597 but the issue is not resolved.

The memory used after the test shoots up by 40GB and doesn't free automatically.

Log files attached:
sigsuspend_rhel.txt
sigsuspend_ubuntu.txt

Steps to reproduce

graphene-direct stress-ng --sigsuspend 0 --timeout 60s --verbose

Expected results

Actual results

[LibOS,Pal] Refactor RTLD code in both LibOS and PAL

Description of the problem

While working on a subtle bug in gramineproject/graphene#1428, I came up with a quick-and-dirty fix: gramineproject/graphene#1434.

In reality, there are two issues with Graphene's RTLD (loading of binaries) subsystem:

The code quality is bad. For example, in file Pal/src/db_rtld.c, map_elf_object_by_handle() and add_elf_object() are 90% similar, but the former is used by Linux PAL (used for mapping an executable to non-specified address) while the latter is used by Linux-SGX PAL (used for mapping an executable with an SGX-specified address). Similarly, ./LibOS/shim/src/elf/shim_rtld.c is barely readable. Ideally, we would extract the shared functionality of these files in Pal/lib/rtld/ and remove duplicates/unused code.
The main (and only) executable is treated completely differently as shared libraries. For example, shared libraries are checkpointed (both their shim handles and their VMA mappings) and sent to the child. However, the executable is not sent to the child (though its shim handle is sent but never used), and the PAL layer provides this executable. We must consolidate this logic, somehow treating the executable as "just another binary to load and checkpoint".

Packaging

Development:

Documentation:

NEWS/ChangeLog/whatever
new installation procedure (#244)
new building procedure
guide for distro maintainers/packagers

Release:

maintainer scripts for releasing
make a release

Outside effort:

Upstream SGX driver (merged in 5.11)
/dev exec mount options (systemd/systemd#17940)
udev rules for /dev/sgx/enclave (systemd/systemd#18944, released in 248)

Filesystem refactoring

This describes the current state of my filesystem refactoring project.

Legend:

✔️ Done (merged to master)
🚧 In progress (usually has a PR open)
⭐ Next (usually will be unlocked by current "in progress")

Bug fixes and new features

✔️ gramineproject/graphene#952 Listing a directory doesn't show mountpoints
✔️ lseek on directories (PR gramineproject/graphene#2406)
✔️ Fix /proc code (will fix gramineproject/graphene#948, gramineproject/graphene#1387; PR gramineproject/graphene#2453)
✔️ Fix lseek overflow on big offsets (PR gramineproject/graphene#2478)
✔️ File locking (fcntl) (will fix gramineproject/graphene#437) (PR gramineproject/graphene#2481)
✔️ Fix poll on pseudo-handles (PR gramineproject/graphene#2498) (found in gramineproject/graphene#2419)
✔️ Fix crashes after mknod on path reuse (PR gramineproject/graphene#2499) (found in gramineproject/graphene#2419)
🚧 Investigate the master list of issues (gramineproject/graphene#1803): probably will lead to more minor bug fixes.
Symlinks (will fix gramineproject/graphene#516): as in-memory files emulated by Graphene

Detailed list

Multi-process synchronization

✔️ Initial design and discussion (gramineproject/graphene#2158)
✔️ Prototype in Python to clarify the API (https://github.com/pwmarcz/fs-demo)
✔️ Proof of concept: FD position sync (PR gramineproject/graphene#2264, gramineproject/graphene#2267)
✔️ File locking (fcntl) (will fix gramineproject/graphene#437) (PR gramineproject/graphene#281, ~~gramineproject/graphene#2522~~)
✔️ Fix mknod crash (PR gramineproject/graphene#2499)
Synchronize dentry cache

Dentry/low level filesystem cleanup

✔️ Initial design and discussion (issue gramineproject/graphene#2321)
✔️ Rewrite low-level dentry functions (PR gramineproject/graphene#2324)
✔️ Remove ino (PR gramineproject/graphene#2374)
✔️ Make mode and type functional (PR gramineproject/graphene#2379)
✔️ Simplify readdir (PR gramineproject/graphene#2383)
✔️ Change treatment of synthetic files (pipes, sockets) - needed for changing mount semantics (PR gramineproject/graphene#2402)
✔️ Change mount semantics (don't overwrite existing dentries by mount) (PR gramineproject/graphene#2370)
✔️ Rewrite pseudo-filesystems (PR gramineproject/graphene#2453)
✔️ Remove qstr (PR gramineproject/graphene#2585, #267 )
✔️ Handle unlink and rename correctly
✔️ Migrate to inodes: this will take several steps, but should unblock further changes (PR gramineproject/graphene#2646, #5, issue #279)
Make sure dentry stores stat data; remove mode() and stat() operations
Remove lseek() operation (don't pass it through to underlying filesystem)
Clear semantics for remaining filesystem operations (issue #279)

Path lookup

✔️ Rewrite path handling (PR gramineproject/graphene#2342)
✔️ Rewrite path lookup logic (PR gramineproject/graphene#2333, gramineproject/graphene#2354)

[RFC] New interface for calling host-level syscalls

Description

For reasons why we would want this, please look at pros&cons.

The new version would change the way blocking host-level syscalls are done, the non-blocking, or rather "fast" syscalls would be called directly (same way as now). We would consider futex with FUTEX_WAKE to be a "fast" syscall and readto be a "slow" syscall. The "slow" syscalls wouldn't be called directly, but using a helper thread. E.g. it could look like:

We want to issue a host-level syscall read(fd, buf, size)
Write syscall number and arguments in some shared buffer.
Wake-up a dedicated host helper thread (probably using futexes).
Wait for helper thread being done - sleep using special sleeping function.
Helper thread issues a syscall, saves the return value, wakes up the original thread and goes back to sleep.

The assumption is that host-level thread are cheap and we could have one for each Graphene thread (which would basically double the number of host threads).

If an interrupt (signal) comes, then we notify the helper thread about it and wait for it finishing the job (it either goes back to sleep or writes the return values, if syscall was completed already).
Why even bother with all of this and not just call syscalls directly? To use the special sleeping function (basically a futex, but that's not really important), which would be signals (Graphene level signals) aware.

Pros:

We would be able to handle signals coming at any moment, even when we are inside LibOS/PAL before issuing the syscall (currently we can block in such cases).
This would simplify signal handling, especially in Linux-SGX PAL (e.g. no need for weird EINTR injection and losing PAL state in case of ocall interrupt).

Cons:

Double the number of threads.
Some additional overhead on each "slow" syscall. I'm not sure how much would that slow the execution, but I suspect not much - need to be determined empirically. Note that we would do this only on "slow" syscalls, so this shouldn't be that bad (as they are "slow" anyway). The most (only?) noticeable overhead would be in case of "slow" syscall that actually do not block, e.g. read on file descriptor with some data already ready to be read.

Note that in case of PAL Linux-SGX we would do all of this in the untrusted part, so the overhead would probably be negligible.

Idea v2

Another approach could be wrapping each blocking syscall with something like this:

xor r11, r11
xchg r11, [some_per_thread_variable] ; this variable would be set to 1 by signal handling routines
cmp r11, 0
jne .skip
syscall
jmp .syscall_done
.skip:
mov rax, -EINTR
.syscall_done:

some_per_thread_variable would be set by LibOS code in an appropriate upcall iff we were interrupted inside LibOS or PAL code.
What I don't like about this approach:

There is a 3 instruction window, which can still miss a signal (between xchg and syscall). The window could probably be narrowed down to 2 instructions, e.g. by replacing xchg and cmp with sub r11, [addr], but that does not solve the issue.
This would require accessing an untrusted variable some_per_thread_variable from LibOS. While this can be done in a secure manner (e.g. writing inline asm), the idea does sound nice and sets a dangerous precedence. Besides we would need to provide an interface for such access just because SGX needs it.

I personally dislike this idea even more than the first one.

RFC: interruptible POSIX locks

I'm trying to fix recently added POSIX locks (gramineproject/graphene#2481) so that you can interrupt them, but I ran into a problem trying to do it with current IPC. I don't want to jump and change too much before I learn your opinion.

@boryspoplawski @mkow @dimakuv

The problem

Graphene's current implementation of POSIX locks (fcntl) do not support interrupting the request:

void handler(int signum) {
    printf("got signal %d\n", signum);
}
signal(SIGALRM, &handler);

fcntl(fd, F_SETLK, ...);
pid_t pid = fork();
if (pid == 0) {
    // child process:

    // trigger SIGALRM after 10 seconds
    alarm(10);

    // this should be interrupted:
    fcntl(fd, F_SETLKW, ...);
    ...
}

The fcntl(F_SETLKW) in the child process should wait for 10 seconds and then fail with EINTR (after the process receives SIGALRM). Under Graphene, it hangs indefinitely.

So far, I've seen this checked by LTP (test fcntl16), and I'm pretty sure it's used by stress-ng (see gramineproject/graphene#2510). I'm not sure how much it's used in actual applications, but interrupting the fcntl call seems to be the only sensible way of trying to take a lock with timeout.

How to interrupt an operation over IPC?

Interrupting the lock operation is easy in the single-process version. However, if we are in a remote process, the operation is requested over IPC, and it's not that easy to achieve the same semantics.

The current IPC implementation does something like this:

lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

// this waits for response, and retries on EINTR
ipc_send_message_and_get_response(lock_msg, &result);
return result;

As long as ipc_send_message_and_get_response keeps retrying, we have no way to return earlier. I've been thinking about some solutions:

Stop waiting?: You could be tempted to modify the ipc_send_message_and_get_response() function to not wait (perhaps with some retry = false parameter). However, this does not actually cancel the operation, so the lock will be taken eventually. This is wrong - we want the F_SETLKW operation to fail without side effects.

Stop waiting and issue another call? We could send another IPC message to cancel the pending operation. Keep in mind that the operation might get finished before the "cancel" message is delivered, so we need to learn what the actual result is.

lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

// this returns -EINTR if the wait is interrupted
ret = ipc_send_message_and_get_response(lock_msg, /*retry=*/false, &result);

if (ret == -EINTR) {
    // Cancel, and get result. `result` will be -EINTR if we succesfully canceled the operation, 
    // or something else (0, -ENOMEM, ...) if the operation finished in the meantime
    cancel_msg = { POSIX_LOCK_CANCEL, msg.seq, ... };
    ipc_send_message_and_get_response(cancel_msg, /*retry=*/true, &result);
}
return result;

The problem with that is that if the operation does get finished in the meantime, the remote side has to keep storing the result in case we receive POSIX_LOCK_CANCEL. So we would need some way of cleaning up that result.

Also, if the response to POSIX_LOCK_SET is sent in the meantime, we are going to just drop it.

Keep waiting for the original response? Ideally, we want to always get the response for the original message, but sometimes we want to trigger the response to come early. So we could register a wait, but on EINTR, send additional message in the meantime, then return to the previous wait:

lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

ipc_send_message_and_register_wait(lock_msg);
ret = wait_for_response(lock_msg, /*retry=*/false);
if (ret == -EINTR) {
     // Cancel. That will cause response to `lock_msg` to be sent without further delay
     // (if it hasn't been sent already).
     cancel_msg = { POSIX_LOCK_CANCEL, msg.seq, ... };
     ipc_send_message(cancel_msg);

     // Wait for response to the original message. `result` will be -EINTR if we canceled the operation,
     // or something else (0, -ENOMEM...) if the operation finished in the meantime.
     wait_for_response(lock_msg, /*retry=*/true, &result);
}
return result;

I like that, but it makes the IPC API more complicated: instead of one function that does everything, there are separate stages (first send a message and register a wait, then wait for response). That means managing some state between calls (a "waiter" registered for the IPC worker) that currently is local to ipc_send_message_and_get_response.

Add an on_interrupt callback? The above could be simplified by keeping the ipc_send_message_and_get_response as is, but making it perform some additional action when the wait is interrupted.

int send_cancel(struct shim_ipc_message* msg) {
     cancel_msg = { POSIX_LOCK_CANCEL, msg->seq, ... };
     return ipc_send_message(cancel_msg);        
}

lock_msg = { POSIX_LOCK_SET, .wait = true, ... };

// this calls `on_interrupt` when the wait is interrupted, then keeps on waiting
// (or fails, if `on_interrupt` returned failure)
ipc_send_message_and_get_response(lock_msg, /*on_interrupt=*/&send_cancel, &result);
return result;

This would certainly be easiest to use, but it's a pretty specific use-case. I think it's worth it only if we expect to use it again soon.

So I think either 3 or 4 are viable. But maybe I'm missing something and there is an easier way?

UDP Client does not get the correct address using getsockname

Description of the problem

UDP Client does not get the correct address using getsockname.
When UDP client connects to a server address, kernel assigns the IP-address:port-number for
the client's end-point. After call to connect system call, the client IP address:port-number can be retrieved using
getsockname system call.
I am using a golang program, that calls a Go library api-> net.DialUDP(which invokes system call connect).
Inside golang's net.DialUDP function, after call to connect system call, they call getsockname to retreive
the client end-points IP-address:port-number, and store it in their handle(as LocalAddress).
When go application code retrieves LocalAddress, it is NOT getting the correct address/port.

Steps to reproduce

Issue reproduced on -> commit fb71e4376a1fa797697832ca5cbd7731dc7f8793
in gramine-project.
If you run the attached go program, you can notice the LocalAddress
retreived by the UDP client is same as RemoteAddress(to which client is connecting to).
Details on how to reproduce are in additional info section below.

Expected results

LocalAddress of client is expected to be different from the server address it is connecting to.

Actual results

LocalAddress of client is same as server address it is connecting to.

From the logs:
Remote UDP address : 127.0.0.1:6000
Local UDP client address : 127.0.0.1:6000

Additional information -->

Attaching zip file, that has the sources to reproduce issue-> go_udp_client.zip

The Go source code is in a sub-folder inside the zip file-> gopro_udp_client/main.go

In graphene repo, under your /home->/graphene/Examples directory, copy this zip file-> , and then unzip it,
to create go_xx directory under /graphene/Examples/go_xx.

Under /graphene/Examples/go_xx$
Run the script -> ./launch_in_graphene_locally.sh
This will build the sample Go program(in a docker container), and then do a graphene-sgx build, and
then it will launch it locally on your host system.

Graphene debug logs, are in the zip file, titled as -> udp_graphene_connect_issue_debug_logs

GSC Failed to load entrypoint (missing shebang support in `execve()`)

Hi,

as you can see in issue gramineproject/graphene#2632 we're trying to run Postgres in Graphene. OS is Ubuntu 20.04.

Contrary to issue gramineproject/graphene#2632 we didn't use our own dockerfile, but instead, we used one from Dockerhub https://hub.docker.com/_/postgres

Then again we build and signed it using gsc.
When trying to run this image however we encountered the following problem:

[P9:T1:docker-entrypoint.sh] error: Failed to load /docker-entrypoint.sh. This may be caused by the binary being non-PIE, in which case Graphene requires a specially-crafted memory layout. You can enable it by adding 'sgx.nonpie_binary = 1' to the manifest.
[P9:T1:docker-entrypoint.sh] error: Error during shim_init() in init_loader (-22)

If we follow the instruction to add the sgx.nonpie_binary = 1 we get the following error:

Parsing /entrypoint.manifest as TOML failed: Duplicate keys! (line 26 column 1 char 1073)

We have also used the -L option during the gsc build, but there are no more debug information available.

Do you have any ideas to fix this issue?

Thanks in advance.

[Pal/Linux-SGX] DkVirtualMemoryAlloc() crashes when called on a non-enclave page

Description of the problem

DkVirtualMemoryAlloc() crashes when called on a page which is inside the "user range" (aka. LibOS memory range) but which was not added to the enclave.

Steps to reproduce

See gramineproject/graphene#1698 (this issue is the reason why the bug in gramineproject/graphene#1698 manifested in a crash instead of an error).

Expected results

Should never crash.

Actual results

memset() inside __create_vma_and_merge(), inside get_enclave_pages() crashes when trying to zero out the newly allocated region.

Output a Graphene configuration on startup (including insecure options!)

Right now we print some ad-hoc messages. We need some nice always outputted multi-line configuration message which is printed at every Graphene startup. This configuration message should print out the most important manifest options and also (in red) the enabled insecure options.

RFC: Separate signing

This is a proposal for split signing enclaves.

Definitions

rendering: processing jinja+toml to toml only
trusted file expansion: recursive walk through directories to pick regular files inside them
trusted file measurement: calculating sha256 over the regular file
TBSSIGSTRUCT: SIGSTRUCT to be signed with RSA
SIGSTRUCT: after signing

Current status

There are two tools: graphene-manifest, which only renders manifest templates, and graphene-sgx-sign, which expands and measures trusted files, then calculates MRENCLAVE (the inputs are: manifest from previous step and libpal.so), TBSSIGSTRUCT, signs it using provided key to SIGSTRUCT and exports .manifest.sgx (manifest after expansion and measurement) and .sig files (SIGSTRUCT). Those two files need to be provided when launching the enclave.

Problems

production enclave signing key needs to be protected
some trusted files might not be available and/or too big to be conveniently re-measured every time the manifest is processed
there are scenarios where people need to recalculate MRENCLAVE themselves: when connecting to an enclave via RA-TLS, client needs to verify MRENCLAVE provided by the other end (enclave)
last but not least, current manifest schema around trusted files is suboptimal, there is O(n²) processing complexity (gramineproject/graphene#2593); this is related, because this affects the same manifest attribute

Proposed architecture

App developer would be able to add trusted files hashes to manifest template. Those pre-hashed files will be skipped when doing measurement.

Proposed changes

Manifest changes

New schema:

sgx.trusted_files = [
  {uri = "file:...", sha256 = "..."},
  "file:...",
]

Manifests will be renamed: app.toml.jinja (the template) and app.toml (after rendering and trusted files measurement).

CLI tools

graphene-manifest will, after manifest rendering, expand and measure trusted files; files with pre-filled hashes will not be measured again
graphene-sgx-sign will not expand nor measure trusted files, instead will raise an error if there are directories or missing hashes
graphene-sgx-get-mrenclave new tool, will calculate MRENCLAVE based on manifest and libpal.so
graphene-sgx-keygen new tool, will generate serviceable RSA-3072 key in proper place

The default key location will be in ~/.config/graphene/enclave-key.pem (really in $XDG_CONFIG_HOME).

Python API

The following functions will be documented:

graphenelibos.manifest.render(template, ...)
graphenelibos.manifest.expand_and_measure_trusted_files(manifest)
graphenelibos.sgx_sign.get_mrenclave(manifest[, libpal])
graphenelibos.sgx_sign.get_tbssigstruct(manifest[, mrenclave])
graphenelibos.sgx_sign.sign(tbssigstruct, key)

Python implementation

use cryptography (https://cryptography.io/) instead of subprocess.run('openssl') and hand-crafted RSA signature
deprecate argparse, standardise on click (https://click.palletsprojects.com/)

Discussion

Advantages

simplicity
no (obvious) footguns

Drawbacks

graphene-manifest becomes mandatory: it's technically possible to skip this tool currently, but even now all the examples already use it

Considered alternatives

Manifest merging

There is no convincing use case for this, which would not be better served by either protected_files or just copying the hash into manifest template. I might have backup proposal should there be a real need.

Unsolved problems

manifest signing for submission to signing facility, and audit log
in the basic setup, libpal.so is needed in signing facility, and parsed using binutils

Future work

allowed_files, trusted_files and protected_files are poorly named, they should be renamed passthrough_files, measured_files and encrypted_files (not sure about the last one)

Known security issues

This issue lists known security problems with Graphene. Everything we're aware of but didn't have time to fix yet should be listed here. We'll also need to resolve all of them before we can say we're ready for production.

I'll try to keep this list up-to-date.

Critical: (checked means "already fixed on master")

Probably not-so-critical: (or rather: we don't know any app which would be vulnerable because of these)

CPUID sanitization (gramineproject/graphene#966) - currently we sanitize everything we know may lead to corruptions in users' apps (like xsave area sizes), but maybe there are more dangerous cpuid leaves? We also allow the values to change in time in some cases, which is usually impossible on real hardware.
There are tons of subtle attack angles related to differences between the upstream Linux API and what we actually implement or can support on a specific backend. E.g. on SGX1 we can't change memory permissions, so everything is RWX, or that we support flag X but not flag Y in some syscalls (while in Linux support for X implies support for Y). It's quite hard for me to imagine a real-world application which would rely on such things for its correctness (and I mean correctness, not exploitability if another bug is present), but it's still worth noting that such a possibility exists.
Time readings may be easily spoofed, but unless we won't allow it going backwards then it's hard to imagine an application exploitable by changing system time. There may be some issues with TLS certificate validation logic in applications (e.g. the attacker shifts back time and shows an old, leaked cert for some domain). I guess users would need to ship the applications with a reasonably up-to-date revocation lists?

Update: I moved the hardening ideas to a separate issue: #54.

Most LTP testcases will not work under SGX

Multiple test cases fail with this message:
stdout:

tst_test.c:109: BROK: mmap((nil),4096,3,1,3,0) failed: EACCES

stderr:

*** No client info specified in the manifest. Graphene will not perform remote attestation ***
file_map does not currently support writable pass-through mappings on SGX.  You may add the PAL_PROT_WRITECOPY (MAP_PRIVATE) flag to your file mapping to keep the writes inside the enclave but they won't be reflected outside of the enclave.

The reason is because most test cases use standardised main function:

https://github.com/linux-test-project/ltp/blob/f2926be6514023bb94141792e4e2bfffd2aac0fc/include/tst_test.h#L279-L288

This function calls do_setup():

https://github.com/linux-test-project/ltp/blob/f2926be6514023bb94141792e4e2bfffd2aac0fc/lib/tst_test.c#L1237

And do_setup() calls setup_ipc():

https://github.com/linux-test-project/ltp/blob/f2926be6514023bb94141792e4e2bfffd2aac0fc/lib/tst_test.c#L816

setup_ipc() calls mmap(..., MAP_SHARED, ...):
https://github.com/linux-test-project/ltp/blob/f2926be6514023bb94141792e4e2bfffd2aac0fc/lib/tst_test.c#L99

MAP_SHARED is unsupported under SGX.

Only tests which have their own main() function can possibly run under SGX. But I'd expect more, not less, testcases which would use this common setup function, because it is less code to write and maintain.

Tests that are disabled for this bug, because they're failing

abort01
accept01
accept4_01
access01
access02
access03
access04
acct01
add_key01
add_key02
add_key03
add_key04
alarm02
alarm03
alarm05
alarm06
alarm07
bind03
brk01
chdir03
chmod05
chmod06
chmod07
clock_adjtime01
clock_adjtime02
clock_gettime02
clone09
copy_file_range01
creat01
creat03
creat04
creat05
creat06
creat07
delete_module01
delete_module02
delete_module03
dirtyc0w
epoll_create1_01
epoll_ctl01
epoll_ctl02
epoll_wait02
execl01
execle01
execlp01
execv01
execve01
execve02
execve03
execve04
execve05
execveat01
execveat02
execveat03
execvp01
exit02
fallocate05
fanotify01
fanotify02
fanotify03
fanotify04
fanotify05
fanotify07
fanotify08
fanotify09
fanotify10
fanotify11
fanotify12
fchdir03
fchmod01
fchmod02
fchmod05
fchmod06
fcntl02
fcntl02_64
fcntl03
fcntl03_64
fcntl04
fcntl04_64
fcntl33
fcntl33_64
fcntl35
fcntl35_64
fdatasync03
fgetxattr01
fgetxattr02
fgetxattr03
flistxattr01
flistxattr02
flistxattr03
flock01
flock02
flock04
flock06
fremovexattr01
fremovexattr02
fsetxattr01
fsetxattr02
fsync01
fsync04
ftruncate04
ftruncate04_64
futex_wait05
getcpu01
getcwd01
getcwd02
getcwd03
getcwd04
getpriority01
getpriority02
getrandom01
getrandom02
getrandom03
getrandom04
getrlimit03
getsockopt02
getxattr04
inotify01
inotify02
inotify03
inotify04
inotify05
inotify07
inotify08
ioctl03
ioctl04
ioctl05
ioctl06
ioctl07
kcmp01
kcmp02
kcmp03
keyctl01
keyctl02
keyctl03
keyctl04
keyctl05
keyctl06
keyctl08
lgetxattr01
lgetxattr02
link08
listxattr02
listxattr03
llistxattr01
llistxattr02
llistxattr03
lseek01
lseek07
lseek11
madvise01
madvise02
madvise05
madvise06
madvise07
madvise08
madvise09
madvise10
memfd_create01
memfd_create02
mkdir02
mkdir05
mlock201
mlock202
mlock203
mmap12
mmap16
mprotect02
msgctl01
msgctl02
msgctl03
msgctl04
msgctl12
msgget01
msgget02
msgget03
msgsnd02
msgsnd05
msgsnd06
msync03
msync04
nanosleep01
nanosleep02
nice02
nice03
nice04
open01
open02
open08
open11
pause01
pipe01
pipe02
pipe11
pivot_root01
poll01
poll02
posix_fadvise01
posix_fadvise01_64
posix_fadvise02
posix_fadvise02_64
posix_fadvise03
posix_fadvise03_64
posix_fadvise04
posix_fadvise04_64
ppoll01
prctl01
prctl02
prctl03
preadv03
preadv03_64
preadv201_64
preadv202
preadv202_64
pselect01
pselect01_64
pselect03
pselect03_64
ptrace07
pwrite02
pwrite02_64
pwrite03
pwrite03_64
pwritev03
pwritev03_64
pwritev201
pwritev201_64
pwritev202
pwritev202_64
quotactl01
read01
read02
readahead01
readahead02
readlink01
readlink03
realpath01
recvmsg02
remap_file_pages02
request_key01
request_key02
request_key03
request_key04
request_key05
rmdir01
rmdir02
rmdir03
rt_sigsuspend01
rt_tgsigqueueinfo01
sbrk03
sched_getaffinity01
sched_setscheduler03
select04
semctl01
sendto02
setpriority01
setpriority02
setregid01
setregid01_16
setregid02
setregid02_16
setregid03
setregid03_16
setregid04
setregid04_16
setrlimit02
setrlimit03
setrlimit04
setrlimit06
setsockopt02
setsockopt03
setuid01_16
setuid03
setuid03_16
setuid04
setuid04_16
setxattr01
setxattr02
shmat01
shmat02
shmat03
sigpending02
socket02
socketpair01
socketpair02
splice01
splice03
splice04
splice05
stat01
stat01_64
stat03
stat03_64
stime01
stime02
sync03
sync_file_range02
syncfs01
syscall01
sysctl01
sysctl03
sysctl04
tee01
tee02
tgkill01
tgkill02
tgkill03
umount01
umount02
umount03
uname04
unlink05
ustat01
ustat02
utimes01
vhangup01
vhangup02
vmsplice01
vmsplice02
wait401
waitpid01
waitpid06
waitpid07
waitpid08
waitpid09
waitpid10
waitpid11
waitpid12
waitpid13
write01
write02
write03
write04
write05
writev01
writev07

Tests which have own main function (from grep), so are unaffected (by this issue)

adjtimex01
adjtimex02
asyncio02
bdflush01
bind01
bind02
cacheflush01
capget01
capget02
capset01
capset02
chdir01
chdir02
chdir04
chmod01
chmod02
chmod03
chmod04
chown01
chown02
chown03
chown04
chown05
chroot01
chroot02
chroot03
chroot04
clone01
clone02
clone03
clone04
clone05
clone06
clone07
close01
close02
close08
confstr01
connect01
creat07_child
creat08
dup01
dup02
dup03
dup04
dup05
dup06
dup07
dup201
dup202
dup203
dup204
dup205
dup3_01
dup3_02
epoll-ltp
epoll_pwait01
epoll-test
epoll_wait03
eventfd01
eventfd2_01
eventfd2_02
eventfd2_03
execl01_child
execle01_child
execlp01_child
execv01_child
execve01_child
execveat_child
execveat_errno
execve_child
execvp01_child
exit01
exit_group01
faccessat01
fallocate01
fallocate02
fallocate03
fanotify_child
fchdir01
fchdir02
fchmod03
fchmod04
fchmodat01
fchown01
fchown02
fchown03
fchown04
fchown05
fchownat01
fchownat02
fcntl01
fcntl05
fcntl06
fcntl07
fcntl08
fcntl09
fcntl10
fcntl11
fcntl12
fcntl13
fcntl14
fcntl15
fcntl16
fcntl17
fcntl18
fcntl19
fcntl20
fcntl21
fcntl22
fcntl23
fcntl24
fcntl25
fcntl26
fcntl27
fcntl28
fcntl29
fcntl30
fcntl31
fcntl32
fdatasync01
fdatasync02
fmtmsg01
fork01
fork02
fork03
fork04
fork05
fork06
fork07
fork08
fork09
fork10
fork11
fork12
fork13
fork14
fpathconf01
fstat01
fstat02
fstat03
fstat05
fstatat01
fstatfs01
fstatfs02
fsync02
fsync03
ftruncate01
ftruncate02
ftruncate03
ftruncate04
futex_wait01
futex_wait02
futex_wait03
futex_wait04
futex_wake01
futex_wake02
futex_wake03
futex_wake04
futimesat01
getcontext01
getdents01
getdents02
getdomainname01
getdtablesize01
getegid01
getegid02
geteuid01
geteuid02
getgid01
getgid03
getgroups01
getgroups03
gethostbyname_r01
gethostid01
gethostname01
getitimer01
getitimer02
getitimer03
get_mempolicy01
getpagesize01
getpeername01
getpgid01
getpgid02
getpgrp01
getpid01
getpid02
getppid01
getppid02
getresgid01
getresgid02
getresgid03
getresuid01
getresuid02
getresuid03
getrlimit01
getrlimit02
get_robust_list01
getrusage01
getrusage02
getrusage03
getrusage03_child
getrusage04
getsid01
getsid02
getsockname01
getsockopt01
gettid01
gettimeofday01
getuid01
getuid03
getxattr01
getxattr02
getxattr03
inotify_init1_01
inotify_init1_02
io_cancel01
ioctl01
ioctl02
io_getevents01
ioperm01
ioperm02
iopl01
iopl02
kill01
kill02
kill03
kill04
kill05
kill06
kill07
kill08
kill09
kill10
kill11
kill12
lchown01
lchown02
lchown03
link02
link03
link04
link05
link06
link07
linkat01
linkat02
listen01
llseek01
llseek02
llseek03
lstat01
lstat02
lstat03
mallopt01
mem03
memcmp01
memcpy01
memset01
migrate_pages01
mincore01
mincore02
mkdir09
mkdirat01
mknod01
mknod02
mknod03
mknod04
mknod05
mknod06
mknod07
mknod08
mknod09
mknodat01
mknodat02
mlock01
mlock02
mlock03
mlock04
mlockall01
mlockall02
mlockall03
mmap001
mmap01
mmap02
mmap03
mmap04
mmap05
mmap06
mmap07
mmap08
mmap09
mmap10
mmap11
mmap13
mmap14
mmap15
mmap16
modify_ldt01
modify_ldt02
modify_ldt03
mount01
mount02
mount03
mount03_setuid_test
mount04
mount05
mount06
move_pages01
move_pages02
move_pages03
move_pages04
move_pages05
move_pages06
move_pages07
move_pages08
move_pages09
move_pages10
move_pages11
mprotect01
mprotect02
mprotect03
mprotect04
mq_notify02
mremap01
mremap02
mremap03
mremap04
mremap05
msgrcv01
msgrcv02
msgrcv03
msgrcv04
msgrcv05
msgrcv06
msgrcv07
msgrcv08
msgstress01
msgstress02
msgstress03
msgstress04
msync01
msync02
msync03
munlock01
munlock02
munlockall01
munmap01
munmap02
munmap03
nanosleep03
nanosleep04
newuname01
nftw
nftw64
open03
open04
open05
open06
open07
open09
open10
open12
open12_child
open13
open14
openat01
openat02
openat02_child
openat03
pathconf01
pause02
pause03
perf_event_open01
perf_event_open02
personality01
personality02
pipe04
pipe05
pipe06
pipe07
pipe08
pipe09
pipe10
pipe2_01
pipe2_02
pread01
pread02
pread03
process_vm01
process_vm_readv02
process_vm_readv03
process_vm_writev02
profil01
pselect02
ptrace01
ptrace02
ptrace03
ptrace04
ptrace05
ptrace06
pwrite01
pwrite04
read03
read04
readdir01
readdir21
readlinkat01
readlinkat02
readv01
readv02
readv03
reboot01
reboot02
recv01
recvfrom01
recvmsg01
remap_file_pages01
removexattr01
removexattr02
rename01
rename02
rename03
rename04
rename05
rename06
rename07
rename08
rename09
rename10
rename11
rename12
rename13
rename14
renameat01
renameat201
renameat202
rt_sigaction01
rt_sigaction02
rt_sigaction03
rt_sigprocmask01
rt_sigprocmask02
rt_sigqueueinfo01
sbrk01
sbrk02
sched_getattr01
sched_getattr02
sched_getparam01
sched_getparam02
sched_getparam03
sched_get_priority_max01
sched_get_priority_max02
sched_get_priority_min01
sched_get_priority_min02
sched_getscheduler01
sched_getscheduler02
sched_rr_get_interval01
sched_rr_get_interval02
sched_rr_get_interval03
sched_setaffinity01
sched_setattr01
sched_setparam01
sched_setparam02
sched_setparam03
sched_setparam04
sched_setparam05
sched_setscheduler01
sched_setscheduler02
sched_yield01
select01
select02
select03
semctl01
semctl02
semctl03
semctl04
semctl05
semctl06
semctl07
semget01
semget02
semget03
semget05
semget06
semop01
semop02
semop03
semop04
semop05
send01
sendfile02
sendfile03
sendfile04
sendfile05
sendfile06
sendfile07
sendfile08
sendfile09
sendmsg01
sendmsg02
sendto01
setdomainname01
setdomainname02
setdomainname03
setegid01
setegid02
setfsgid01
setfsgid02
setfsgid03
setfsuid01
setfsuid02
setfsuid03
setfsuid04
setgid01
setgid02
setgid03
setgroups01
setgroups02
setgroups03
setgroups04
sethostname01
sethostname02
sethostname03
setitimer01
setitimer02
setitimer03
setns01
setns02
setpgid01
setpgid02
setpgid03
setpgid03_child
setpgrp01
setpgrp02
setresgid01
setresgid02
setresgid03
setresgid04
setresuid01
setresuid02
setresuid03
setresuid04
setresuid05
setreuid01
setreuid02
setreuid03
setreuid04
setreuid05
setreuid06
setreuid07
setrlimit01
set_robust_list01
setsid01
setsockopt01
set_thread_area01
set_tid_address01
settimeofday01
settimeofday02
setxattr03
sgetmask01
shmctl01
shmctl02
shmctl03
shmctl04
shmdt01
shmdt02
shmget01
shmget02
shmget03
shmget04
shmget05
sigaction01
sigaction02
sigaltstack01
sigaltstack02
sighold02
signal01
signal02
signal03
signal04
signal05
signal06
signalfd01
signalfd4_01
signalfd4_02
sigprocmask01
sigrelse01
sigsuspend01
sigwaitinfo01
simple_tracer
socketcall02
socketcall03
socketcall04
sockioctl01
ssetmask01
statfs01
statfs02
statfs03
statvfs01
statvfs02
string01
swapoff01
swapoff02
swapon01
swapon02
swapon03
symlink01
symlink02
symlink03
symlink04
symlink05
symlinkat01
sync01
sync02
sync_file_range01
sysconf01
sysfs01
sysfs02
sysfs03
sysfs04
sysfs05
sysfs06
sysinfo01
sysinfo02
syslog11
syslog12
syslogtst
time01
time02
timerfd01
timerfd02
timerfd03
timerfd_create01
timerfd_gettime01
timerfd_settime01
timer_getoverrun01
timer_gettime01
times01
tkill01
tkill02
truncate01
truncate02
truncate03
ulimit01
umount2_01
umount2_02
umount2_03
uname01
uname02
uname03
unlinkat01
unshare01
unshare02
utime01
utime02
utime03
utime04
utime05
utime06
utimensat01
vfork01
vfork02
wait01
wait02
wait402
waitid01
waitid02
waitpid02
waitpid03
waitpid04
waitpid05
writev02
writev05
writev06

[RFC] Standard installation paths

Statement of the problems

Those are problems that we need to be aware when solving

Stable paths

After signing the manifest, paths are baked in, so Graphene needs a standardised set of installed paths. The immediate problem concerns libsysdb.so and libc. Paths need to be the same across distributions.

Multiple libcs installed together

There is an ongoing proposal to make graphene work with executables compiled against musl: gramineproject/graphene#2269. It's a choice of enclave developer, which libc will be used (probably using fs.mount. manifest option), so we need to make multiple libcs available concurrently.

Multiple versions installed together, for signing

libpal.so is needed to calculate the expected measurement of the enclave. Also, loader might change between versions. So for each Graphene version, we need to install libpal.so and the respective python script which emulates loading for the purpose of signing. This signing script will be discovered/loaded/invoked by common graphene-sgx-sign command.

Proposed solution

PAL, LibOS, runtime

/usr/lib/graphene-<version>/direct/libpal.so
/usr/lib/graphene-<version>/direct/loader
/usr/lib/graphene-<version>/sgx/libpal.so
/usr/lib/graphene-<version>/sgx/loader
/usr/lib/graphene-<version>/libsysdb.so
/usr/lib/graphene-<version>/runtime/{glibc,musl,...}-<version>/*.so

TODO: gdb, headers, -dev packages,

Command line tools

Those might be ELFs, python scripts and/or shell scripts. I'm not sure if those need to be standardised. Possibly related: #13.

/usr/bin/graphene-direct
/usr/bin/graphene-sgx
/usr/bin/graphene-sgx-sign
/usr/bin/graphene-sgx-get-token
/usr/bin/graphene-manifest
# and possibly others

Package names

Packages (whether deb, rpm or else) should be named:

`graphene`

Metapackage depending on graphene-direct and graphene-sgx.

`graphene-direct`

Non-SGX Graphene (libpal.so, loader, /usr/bin/ loader script, manpage).

`graphene-sgx`

metapackage depending on the latest version

`graphene-sgx-1.23`

Graphene SGX compiled against upstream 5.11+ driver.

`graphene-sgx-oot-1.23`

Graphene SGX compiled against SDK driver.

`graphene-sgx-dcap-1.23`

Graphene SGX compiled against DCAP driver. (Not sure if needed, possibly the same as upstream driver).

`graphene-libos`

libsysdb.so

`graphene-runtime-glibc-1.23`

runtime/glibc-1.23

`graphene-runtime-musl-4.56`

Musl.

Not part of this RFC

All those don't need to be standardised, instead the decision should be taken by packager for the respective distro.

Documentation, manpages, readmes, ... all this goes according to distro guidelines.
Python package (depends on distro and system Python version).
Non-essential tools: Linux-SGX/tools, GSC, ...

Move duplicated constants to common headers

There are some constants which are duplicated between PALs and LibOS (e.g. RED_ZONE_SIZE is being introduced by both gramineproject/graphene#625 and gramineproject/graphene#561).

We should move such constants and macros to common headers.

RFC: graphene invocation

Introduction

I'd like to propose a unified way of running files in graphene. This will certainly be a very visible, very incompatible change, so I'd like to write some explanation and seek review, so we can get this right. Also, I'm not that knowledgeable about the architecture, so I apologize in advance if i had some misconceptions.

Requirements

command line invocations (hard)
invocation from shebang's manifest (nice to have)
easy way of running under GDB (nice to have, developers love it, so probably a must)

Status quo ante

Today it is mainly done with Runtime/pal_loader script. Script may be invoked in various ways. SGX can be specified in two ways, as argument (without -- in front, which breaks convention) or environment variable. Interaction between those is either-or, ie. it is not possible to "disable again", only ensure it is enabled. The same is true for GDB.

It is autodetected if the file specified is executable or manifest, and different decisions taken. Which one, it is not always obvious.

Related work

#14
that CMake/meson/whatever effort to rewrite buildsystem, which I can't find the issue number for
gramineproject/graphene#994

the proposal (DRAFT)

intended interface

From command line:

graphene[-sgx] {--manifest|-f} MANIFEST, and manifest contains path to executable and argv
graphene[-sgx] {--manifest|-f} MANIFEST [--] EXECUTABLE [ARGS ...], and the manifest contains neither executable nor argv
graphene[-sgx] [--] EXECUTABLE [ARGS] for the cases when the explicit manifest file specification is not needed (it is either autogenerated, discovered from standard location like /etc/, inferred from executable name or for other reasons)
graphene-gdb graphene[-sgx] [...], which does required GDB magic (this is expected to be obsoleted when rewriting for in-tree SGX driver to be upstreamed)

No Linux, Linux-SGX or FreeBSD, since user knows what the uname is or should be.

It is an error to specify arguments twice, both in command line and in manifest, even if they match.

Environment variables are not accepted.

From shebang in the manifest:

#!/usr/bin/graphene[-sgx] -f

This will be documented as One, True, Portable Way irrespective of any specific distro-related issues (for *BSD installation guidelines, see below).

installation

After proper installation (apt-get install or make DESTDIR=/ install), the following paths are available (either executables or symlinks thereto):

/usr/bin/graphene
/usr/bin/graphene-sgx
/usr/bin/graphene-gdb

On *BSD, the executables may be in /usr/local, but symlinks in /usr/bin are nevertheless mandatory at least for graphene and graphene-sgx. graphene-gdb symlink is optional.

LibOS does not know about PAL allocating pages

Currently when PAL allocates a page (for its internal purposes e.g. memory for event objects) LibOS has no idea about it. This sometimes leads to LibOS allocating a page on the same address and lies of Man ALL IS LOST ALL IS LOST the pon̷y he comes he c̶̮om~~es he come~~s the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*gle̠̅s ͎a̧͈͖re not reaͨl ZALGΌ IS TOƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ HE COMES
(source)

Rename external APIs

(As discussed in gramineproject/graphene#2662)

Graphene's patched code currently uses an API that contains some internal names ("shim_*", "syscalldb"). I propose the following changes:

SYSCALLDB macro -> GRAPHENE_SYSCALL
shim_register_library and possibly other functions -> graphene_register_library
SHIM_*_OFFSET -> GRAPHENE_*_OFFSET
shim_entry_api.h -> graphene_entry_api.h
- Other shim_*.h files are internal, but this one is a public header file included in patched code

etc.