This repository contains the authors' implementation of several deep hierarchical VAE-based methods for lossy image compression.
The code is under active development, and the API is subject to change.
Progressive coding: our models learn a deep hierarchy of latent variables and compress/decompress images in a coarse-to-fine fashion. This feature comes from the hierarchical nature of ResNet VAEs.
Compression efficiency: our models are competitive in both rate-distortion performance (bpp-PSNR) and decoding speed.
Model Name | CPU* Enc. | CPU* Dec. | 1080 Ti Enc. | 1080 Ti Dec. | BD-rate* |
---|---|---|---|---|---|
qres34m | 0.899s | 0.441s | 0.213s | 0.120s | -3.95 |
qarv_base | 0.880s | 0.295s | 0.211s | 0.096s | -6.54 |
*Time is the latency to encode/decode a 512x768 image, averaged over 24 Kodak images. Tested in plain PyTorch (v1.13 + CUDA 11.7) code, i.e., no mixed precision, TorchScript, ONNX/TensorRT, etc.
*CPU is an Intel 10700K.
*BD-rate is w.r.t. VTM 18.0, averaged on three common test sets (Kodak, Tecnick TESTIMAGES, and CLIC 2022 test set).
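The latency figures above are per-image averages. A minimal sketch of how such an average could be measured is shown below. It is not the benchmarking script used for the table: `average_latency`, the `warmup` pass, and the `sync` hook are illustrative assumptions, and `fn` stands in for whatever encode or decode call a model exposes. When timing on a GPU, `torch.cuda.synchronize` can be passed as `sync` so that pending kernels finish before the clock is read.

```python
import time
from statistics import mean


def average_latency(fn, inputs, warmup=1, sync=None):
    """Return the mean wall-clock time (seconds) of fn(x) over all inputs.

    fn     : callable under test (hypothetical stand-in for encode/decode)
    inputs : list of inputs, e.g. the 24 Kodak images
    warmup : number of untimed calls made first (an assumption, not from the README)
    sync   : optional zero-argument callable run right before and after each
             timed call, e.g. torch.cuda.synchronize for GPU timing
    """
    for x in inputs[:warmup]:
        fn(x)  # untimed warm-up calls
    timings = []
    for x in inputs:
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn(x)
        if sync is not None:
            sync()
        timings.append(time.perf_counter() - start)
    return mean(timings)


if __name__ == "__main__":
    # Dummy workload in place of a real model call.
    fake_images = list(range(24))
    latency = average_latency(lambda img: time.sleep(0.001), fake_images)
    print(f"average latency: {latency:.4f}s")
```

Reading the clock with `time.perf_counter` around a synchronized call measures end-to-end latency per image, which matches how the table's numbers are described, though details such as file I/O handling may differ.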
- Lossy Image Compression with Quantized Hierarchical VAEs [arXiv]
  - Published at WACV 2023, Best Algorithms Paper Award
  - [Abstract]: a 12-layer VAE model named QRes-VAE. Good compression performance.
  - [Code]: lossy-vae/lvae/models/qres
- QARV: Quantization-Aware ResNet VAE for Lossy Image Compression [arXiv]
  - Technical report
  - [Abstract]: improved version of QRes-VAE. Variable-rate, faster decoding, better performance.
  - [Code]: lossy-vae/lvae/models/qarv
- An Improved Upper Bound on the Rate-Distortion Function of Images
  - [Abstract]: a 15-layer VAE model used to estimate the information rate-distortion function R(D). Shows that -30% BD-rate w.r.t. VTM is theoretically achievable.
  - [Code]: lossy-vae/lvae/models/rd
Requirements:
- Python
- PyTorch >= 1.9 : https://pytorch.org/get-started/locally
- tqdm : conda install tqdm
- CompressAI : https://github.com/InterDigitalInc/CompressAI
- timm >= 0.8.0 : https://github.com/huggingface/pytorch-image-models
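As a quick way to confirm that the requirements above are met, the following sketch reads the installed package versions with the standard-library `importlib.metadata` module. It is not part of this repository: the helper names (`parse_version`, `meets`, `check_requirements`) are illustrative, and it assumes the pip distribution names are `torch`, `tqdm`, `compressai`, and `timm`. The version parser is deliberately simple and ignores suffixes such as `+cu117` or `.dev0`.

```python
import re
from importlib import metadata

# Minimum versions from the requirements list; None means any version is accepted.
REQUIREMENTS = {"torch": (1, 9), "tqdm": None, "compressai": None, "timm": (0, 8, 0)}


def parse_version(text):
    """Turn '1.13.0+cu117' into (1, 13, 0); anything after the leading digits is ignored."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


def meets(installed, minimum):
    """True if the installed version string is at least the minimum version tuple."""
    found = parse_version(installed)
    width = max(len(found), len(minimum))
    pad = lambda v: tuple(v) + (0,) * (width - len(v))  # zero-pad so (0, 8) == (0, 8, 0)
    return pad(found) >= pad(minimum)


def check_requirements(requirements=REQUIREMENTS):
    """Map each package name to 'missing', '<version> (ok)', or '<version> (too old)'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = minimum is None or meets(installed, minimum)
        report[name] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name:12s} {status}")
```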
Download and Install:
- Download the repository;
- Modify the dataset paths in lossy-vae/lvae/paths.py.
- [Optional] pip install the repository in development mode:
cd /path/to/lossy-vae
python -m pip install -e .
Detailed usage is provided in each model's folder.
COCO
- Download the COCO dataset "2017 Train images [118K/18GB]" from https://cocodataset.org/#download
- Unzip the images anywhere, e.g., at /path/to/datasets/coco/train2017
- Edit lossy-vae/lvae/paths.py such that known_datasets['coco-train2017'] = '/path/to/datasets/coco/train2017'
Kodak, Tecnick TESTIMAGES, and CLIC
- Download each test set with the provided script, setting --name to kodak, clic2022-test, or tecnick:
python scripts/download-dataset.py --name kodak --datasets_root /path/to/datasets
- Then, edit lossy-vae/lvae/paths.py such that known_datasets['kodak'] = '/path/to/datasets/kodak', and similarly for the other datasets.
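After editing the paths, it can help to confirm that each configured folder exists and actually contains images. The sketch below is a stand-alone illustration, not code from this repository: the `known_datasets` dictionary here is a local stand-in that mirrors the entries described above, and `summarize_datasets` and `IMAGE_SUFFIXES` are hypothetical names.

```python
from pathlib import Path

# Local stand-in for the dictionary edited in lossy-vae/lvae/paths.py.
known_datasets = {
    'coco-train2017': '/path/to/datasets/coco/train2017',
    'kodak': '/path/to/datasets/kodak',
}

# File extensions counted as images (an assumption; adjust as needed).
IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png'}


def summarize_datasets(datasets):
    """Map each dataset name to its image count, or None if the folder is missing."""
    summary = {}
    for name, root in datasets.items():
        root = Path(root)
        if not root.is_dir():
            summary[name] = None
            continue
        # Search recursively, since some datasets unzip into nested folders.
        summary[name] = sum(
            1 for p in root.rglob('*') if p.suffix.lower() in IMAGE_SUFFIXES
        )
    return summary


if __name__ == "__main__":
    for name, count in summarize_datasets(known_datasets).items():
        status = "folder not found" if count is None else f"{count} images"
        print(f"{name:16s} {status}")
```

A count far from the expected size of a dataset (for example, 24 images for Kodak) suggests a wrong path or an incomplete download.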
TBD