HarmonyBatch is a cost-efficient resource provisioning framework designed to achieve predictable performance for multi-SLO DNN inference with heterogeneous serverless functions.
HarmonyBatch comprises three main modules: a model profiler, a performance predictor, and a function provisioner. The model profiler profiles the model on both CPU and GPU functions to acquire model-specific and hardware-specific coefficients. The performance predictor estimates the inference latency and cost using our performance model, and guides the function provisioner to identify an appropriate grouping strategy and function provisioning plans that guarantee the SLOs of all applications. The configurations produced by HarmonyBatch include both batch-related and resource-related configurations: the batch-related configurations are sent to the batch manager to control the request queue, while the resource-related configurations are sent to the serverless platform to update the functions.
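The pipeline above (profile, predict, provision) can be sketched in a few lines of Python. All names, coefficients, and the simple latency/cost model below are hypothetical placeholders for illustration only; they do not match the actual source code or the performance model in the paper.

```python
# Illustrative sketch of the HarmonyBatch pipeline. All names and numbers
# here are hypothetical; see the repository and the paper for the real model.
from dataclasses import dataclass

@dataclass
class App:
    slo: float   # latency SLO in seconds
    rps: float   # request arrival rate (requests/second)

def profile(model: str) -> dict:
    # Model profiler: would run the model on CPU/GPU functions and fit
    # coefficients; here we just return fixed placeholder values.
    return {"base_lat": 0.05, "per_req_lat": 0.02, "per_gb_s_cost": 1.0e-5}

def predict(coeff: dict, batch: int, mem_gb: float) -> tuple:
    # Performance predictor: estimate batch latency and per-request cost
    # with a toy model (latency shrinks with memory, cost amortizes over batch).
    lat = (coeff["base_lat"] + coeff["per_req_lat"] * batch) / mem_gb
    cost = lat * mem_gb * coeff["per_gb_s_cost"] / batch
    return lat, cost

def provision(apps: list, coeff: dict) -> dict:
    # Function provisioner: pick the cheapest (batch, memory) pair whose
    # predicted latency still fits the tightest SLO in the group.
    slo = min(a.slo for a in apps)
    best = None
    for batch in (1, 2, 4, 8):
        for mem_gb in (1.0, 2.0, 4.0):
            lat, cost = predict(coeff, batch, mem_gb)
            if lat <= slo and (best is None or cost < best["cost"]):
                best = {"batch": batch, "mem_gb": mem_gb, "cost": cost}
    return best

plan = provision([App(slo=0.5, rps=5), App(slo=1.0, rps=10)], profile("VGG19"))
print(plan)
```

With this toy cost model, larger batches amortize cost better, so the provisioner selects the largest batch size whose latency still meets the 0.5 s SLO.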
Given a set of inference applications, each with its own latency SLO and request arrival rate, HarmonyBatch groups the applications and provisions heterogeneous (CPU and GPU) serverless functions so that every application's SLO is guaranteed at low cost.
```shell
git clone https://github.com/HarmonyBatch/HarmonyBatch
pip install -r HarmonyBatch/requirements.txt
```
Set the model name (e.g., VGG19) and the algorithm name (e.g., HarmonyBatch) in the configuration file conf/config.json.
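As an illustration, conf/config.json might look roughly like the snippet below; the actual keys in the repository may differ, so treat this only as the general shape of the file.

```json
{
    "model": "VGG19",
    "algorithm": "HarmonyBatch"
}
```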
Set the application SLOs and arrival rates in main.py, then run the algorithm:

```shell
cd HarmonyBatch
python3 main.py
```
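For the SLO and arrival-rate settings in main.py, the workload description might look roughly like the snippet below. The variable and key names are hypothetical and may not match the actual code; they only illustrate that each application carries a latency SLO (seconds) and an arrival rate (requests/second).

```python
# Hypothetical illustration of a multi-SLO workload specification;
# the actual variable names in main.py may differ.
apps = [
    {"slo": 0.5, "rps": 5},    # application 0: 500 ms SLO, 5 req/s
    {"slo": 1.0, "rps": 10},   # application 1: 1 s SLO, 10 req/s
]
print(apps)
```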
After running the code, you will get a provisioning plan, for example:
```
Provisioning plan:
The configurations of the group 0 is:
cpu: 1.60
batch: 1
rps: 5
timeout: 0.0
cost: 4.350e-05
slo: 0.5
----
The configurations of the group 1 is:
...
```
Jiabin Chen, Fei Xu, Yikun Gu, Li Chen, Fangming Liu, Zhi Zhou, "HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions".
Our paper has been accepted by IWQoS 2024 and is also available on arXiv. If you are interested in our work, we encourage you to cite our paper:
```bibtex
@misc{chen2024harmonybatch,
    title={HarmonyBatch: Batching multi-SLO DNN Inference with Heterogeneous Serverless Functions},
    author={Jiabin Chen and Fei Xu and Yikun Gu and Li Chen and Fangming Liu and Zhi Zhou},
    year={2024},
    eprint={2405.05633},
    archivePrefix={arXiv},
    primaryClass={cs.DC}
}
```