Modern data analytics pipelines are highly dynamic, as they are constantly monitored and fine-tuned by both data engineers and data scientists. Recent systems for managing pipelines make it easy to create, deploy, and track their evolution. However, they consider the privacy of neither the data nor the computation. This hinders their adoption, since many of these systems are offered as services on the public cloud, where there is little or no trust. Our goal is to design a secure and efficient data analytics pipeline management system in which multiple parties can collaborate by providing datasets and libraries to form pipelines.
To this end, we present SecCask, which leverages hardware-based trusted execution environments (TEEs) to ensure privacy. By administering enclaves and runtimes, our system overcomes the problems of a naive design that isolates the complete pipeline execution in a single enclave. Moreover, supporting fast-changing pipelines whose components require low latency raises the challenge of reducing the cold start overhead, which is the main performance bottleneck on modern hardware. Our approach reuses trusted runtimes across different pipeline components and caches them to avoid the cost of initialization.
- Secure collaborative pipeline evolution with Intel SGX
- Runtime reusing and worker caching based on worker-component compatibility check
- Pipeline-aware Caching (PAC) strategy
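The runtime reuse and worker caching described above can be pictured with a toy model. The following is a minimal, hypothetical sketch, not SecCask's implementation. It assumes that a cached worker is compatible with a component when the worker already provides every library the component requires. It also evicts idle workers in least-recently-released (LRU) order, which is a plain stand-in for, not an implementation of, the Pipeline-aware Caching (PAC) strategy. `Worker`, `WorkerCache`, and the superset rule are illustrative names and assumptions.

```python
from collections import OrderedDict
from dataclasses import dataclass
from itertools import count
from typing import FrozenSet, Optional

_next_id = count()


@dataclass
class Worker:
    """Hypothetical stand-in for an initialized trusted runtime (e.g., an enclave)."""
    worker_id: int
    libraries: FrozenSet[str]


class WorkerCache:
    """Toy model: reuse a compatible idle worker instead of paying a cold start.

    Assumptions (not SecCask's actual logic): compatibility means the worker
    already provides every required library; eviction is plain LRU, not PAC.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.idle: "OrderedDict[int, Worker]" = OrderedDict()
        self.cold_starts = 0

    def acquire(self, required: FrozenSet[str]) -> Worker:
        # Look for an idle worker that passes the (assumed) compatibility check.
        match: Optional[Worker] = None
        for worker in self.idle.values():
            if required <= worker.libraries:
                match = worker
                break
        if match is not None:
            del self.idle[match.worker_id]
            return match
        # No compatible warm worker: count a (simulated) cold start.
        self.cold_starts += 1
        return Worker(worker_id=next(_next_id), libraries=frozenset(required))

    def release(self, worker: Worker) -> None:
        # Keep the worker warm; evict the least recently released one if over capacity.
        self.idle[worker.worker_id] = worker
        self.idle.move_to_end(worker.worker_id)
        while len(self.idle) > self.capacity:
            self.idle.popitem(last=False)


# Usage: the second component reuses the warm worker of the first.
cache = WorkerCache(capacity=2)
first = cache.acquire(frozenset({"numpy", "sklearn"}))
cache.release(first)
second = cache.acquire(frozenset({"numpy"}))
print(second is first, cache.cold_starts)  # True 1
```

A real system would also need to check that reusing a runtime across components from different parties preserves isolation. This toy model deliberately ignores that concern.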
To set up SecCask:

- Install Gramine LibOS globally. For reproducibility, clone, build, and install the LibOS from this repo.
- Clone, build, and install EncFSPython 3.9.13 to `$PYTHONHOME`.
- Copy the project directory to `$APP_HOME`.
- Build the SecCask binary:

      # In $APP_HOME
      $ mkdir build && cd build
      $ conan install .. -pr=default -s build_type=Debug --build=missing
      $ cmake -DCMAKE_BUILD_TYPE=Debug -GNinja ..
      $ ninja
- Use the requirements file to create a Python 3.9 virtual environment that includes the packages needed by the system as well as those needed for the experiments:

      # In $APP_HOME
      $ python -m venv venv
      $ source venv/bin/activate
      (venv) $ python -m pip install -r requirements.txt
- Build and install PyTorch from source using the following command. For more information, see this Dockerfile.

      git clone https://github.com/pytorch/pytorch \
        && cd pytorch \
        && git checkout v1.4.1 \
        && git submodule update --init --recursive \
        && DEBUG=0 USE_CUDA=0 USE_MKLDNN=0 USE_OPENMP=0 ATEN_THREADING=NATIVE BUILD_BINARY=0 \
        CFLAGS="-D__EMSCRIPTEN__" \
        python setup.py install
- Modify `$APP_HOME` and `$ENCFSPYTHON_HOME` in `gramine_manifest/Makefile` to match your directory layout.
- Build the SecCask Gramine manifest in the virtual environment:

      (venv) $ cd $APP_HOME/gramine_manifest
      (venv) $ make && make SGX=1

  NOTE: The build may fail with `ModuleNotFoundError: No module named 'graminelibos'`. If so, copy the corresponding package from the system-level package directory to that of `venv`:

      (venv) $ cp -r /usr/lib/python3/dist-packages/graminelibos /PATH/TO/VENV/lib/python3.9/site-packages
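To avoid typing the venv path by hand, the `graminelibos` workaround in the note can be scripted. This is a minimal sketch that assumes it runs with the venv's interpreter, so that `sysconfig` reports the venv's `site-packages`. It also assumes that Gramine installed `graminelibos` under `/usr/lib/python3/dist-packages`, as in the note. `venv_site_packages` and `copy_graminelibos` are hypothetical helpers, not part of SecCask.

```python
import shutil
import sysconfig
from pathlib import Path
from typing import Optional, Union

# System-level location given in the note; adjust if Gramine installed it elsewhere.
DEFAULT_SRC = Path("/usr/lib/python3/dist-packages/graminelibos")


def venv_site_packages() -> Path:
    """site-packages of the interpreter running this code (run it with the venv's python)."""
    return Path(sysconfig.get_paths()["purelib"])


def copy_graminelibos(src: Union[str, Path] = DEFAULT_SRC,
                      dest_dir: Optional[Union[str, Path]] = None) -> Path:
    """Copy the graminelibos package into dest_dir (default: the venv's site-packages)."""
    src = Path(src)
    dest = Path(dest_dir) if dest_dir is not None else venv_site_packages()
    target = dest / src.name
    if not target.exists():  # leave an existing copy untouched
        shutil.copytree(src, target)
    return target
```

Calling `copy_graminelibos()` from the activated venv performs the same copy as the `cp -r` command, without hard-coding `/PATH/TO/VENV`.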
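To run an experiment without SGX (using `gramine-direct`):

- Comment out the line `SGX = "1"` in `gramine_manifest/seccask.manifest`:

      [loader.env]
      # SGX = "1"

- Run experiment `$EXP_NAME` with the following command, shown here with `$EXP_NAME` set to `sklmnist`:

      # In $APP_HOME
      (venv) $ gramine-direct ./gramine_manifest/seccask --coordinator --mode=tls --manifest=sklmnist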
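The manifest edit in the first step can be scripted, which also makes it easy to undo later. This is a minimal sketch that assumes the line appears exactly as `SGX = "1"` in `gramine_manifest/seccask.manifest`, optionally preceded by `#`. `set_sgx_enabled` and `toggle_manifest` are hypothetical helpers, not part of SecCask.

```python
import re
from pathlib import Path

# Matches the `SGX = "1"` line, whether or not it is already commented out with '#'.
_SGX_LINE = re.compile(r'^([ \t]*)#?[ \t]*(SGX[ \t]*=[ \t]*"1")[ \t]*$', re.MULTILINE)


def set_sgx_enabled(manifest_text: str, enabled: bool) -> str:
    """Return manifest_text with the SGX line restored (enabled) or commented out."""
    replacement = r"\1\2" if enabled else r"\1# \2"
    return _SGX_LINE.sub(replacement, manifest_text)


def toggle_manifest(path: str = "gramine_manifest/seccask.manifest", enabled: bool = False) -> None:
    """Rewrite the manifest file in place (default path is relative to $APP_HOME)."""
    manifest = Path(path)
    manifest.write_text(set_sgx_enabled(manifest.read_text(), enabled))
```

The regular expression accepts both the commented and uncommented forms, so applying either direction twice has no further effect.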
To run experiment `$EXP_NAME` with SGX (using `gramine-sgx`), use one of the following commands from `$APP_HOME`, again shown with `$EXP_NAME` set to `sklmnist`:

    # Without remote attestation
    (venv) $ gramine-sgx ./gramine_manifest/seccask --coordinator --mode=tls --manifest=sklmnist
    # With remote attestation
    (venv) $ gramine-sgx ./gramine_manifest/seccask --coordinator --mode=ratls --manifest=sklmnist
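The three launch commands differ only in the Gramine launcher and the `--mode` value. The following minimal sketch assembles the argument list, which can be useful when scripting several runs from Python. `seccask_command` is a hypothetical helper. Because the commands only show `--mode=ratls` together with `gramine-sgx`, the sketch assumes that combination is the only valid one and rejects the others.

```python
from typing import List


def seccask_command(exp_name: str, sgx: bool = True, remote_attestation: bool = False) -> List[str]:
    """Build the argv for running experiment exp_name, mirroring the commands above."""
    if remote_attestation and not sgx:
        # Assumption: remote attestation (ratls) is only used with gramine-sgx.
        raise ValueError("--mode=ratls requires gramine-sgx")
    launcher = "gramine-sgx" if sgx else "gramine-direct"
    mode = "ratls" if remote_attestation else "tls"
    return [launcher, "./gramine_manifest/seccask", "--coordinator",
            f"--mode={mode}", f"--manifest={exp_name}"]
```

For example, `subprocess.run(seccask_command("mlp2"), cwd=os.environ["APP_HOME"], check=True)` would launch the `mlp2` experiment with SGX and without remote attestation. This assumes `APP_HOME` is exported and the venv is active.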
For other common commands, see `CommonCommands.md`. Remember to replace the absolute paths in it with those of your own folder structure.
For configuration options, see the commented configuration file `.conf/config.ini`.
SecCask runs experiments specified by experiment manifests. Sample manifests are available under the `exp/` directory. The `$EXP_NAME` in the experiment-running commands is the manifest file name without the `.yaml` extension. For example, to execute the experiment specified by `mlp2.yaml`, set `$EXP_NAME` to `mlp2`.
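Because `$EXP_NAME` is simply the manifest file name with `.yaml` removed, the available names can be listed with `pathlib`. This is a minimal sketch that assumes it runs from `$APP_HOME` so that `exp/` resolves correctly. `experiment_names` is a hypothetical helper, not part of SecCask.

```python
from pathlib import Path
from typing import List


def experiment_names(exp_dir: str = "exp") -> List[str]:
    """Return the valid $EXP_NAME values: manifest file names in exp_dir without .yaml."""
    return sorted(p.stem for p in Path(exp_dir).glob("*.yaml"))
```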
Download the experimental workflow from this link and extract all components to `$EXP_PREFIX`. The YAML files in the `exp/` directory must also be modified:

    ...
    env:
      test_source_base: The absolute path of $EXP_PREFIX
      workspace_base: The temp path set in config.ini
    ...
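Every manifest in `exp/` needs the same two values, so the edit can be applied in bulk. This is a minimal sketch that assumes each key sits on its own line with a plain scalar value, as in the snippet, and that the paths contain no characters that require YAML quoting. It rewrites lines textually rather than parsing YAML, so it runs without third-party packages. `set_env_paths` and `patch_manifests` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Dict


def set_env_paths(yaml_text: str, values: Dict[str, str]) -> str:
    """Rewrite `key: value` lines for the given keys, keeping their indentation.

    Line-based on purpose (assumption: each key appears once, on its own line).
    """
    out = []
    for line in yaml_text.splitlines(keepends=True):
        m = re.match(r"^([ \t]*)([A-Za-z_]+)[ \t]*:", line)
        if m and m.group(2) in values:
            newline = "\n" if line.endswith("\n") else ""
            line = f"{m.group(1)}{m.group(2)}: {values[m.group(2)]}{newline}"
        out.append(line)
    return "".join(out)


def patch_manifests(exp_dir: str, exp_prefix: str, workspace: str) -> None:
    """Set test_source_base and workspace_base in every *.yaml manifest of exp_dir."""
    for path in Path(exp_dir).glob("*.yaml"):
        path.write_text(set_env_paths(path.read_text(), {
            "test_source_base": exp_prefix,  # absolute path of $EXP_PREFIX
            "workspace_base": workspace,     # temp path set in config.ini
        }))
```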
The sample experiment manifests provided under `exp/` are:

- mlp2
- mnist
- resnet18_c10
- sklmnist
- sklmnist_v1
- vgg16_c10
- vgg16_c10_v1
- cs_autolearn
- cs_sa