learned-image-compression-with-gmm-and-attention's Introduction

Learned-Image-Compression-with-GMM-and-Attention

This repository contains the code for reproducing the results with trained models, in the following paper:

Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules. arXiv, CVPR2020.

Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

Paper Summary

Recently, learned compression methods have developed rapidly and shown promising results. However, there is still a performance gap between learned compression algorithms and reigning compression standards, especially in terms of the widely used PSNR metric. In this paper, we explore the remaining redundancy of recent learned compression algorithms. We have found that accurate entropy models for rate estimation largely affect the optimization of network parameters and thus the rate-distortion performance. We propose to use discretized Gaussian Mixture Likelihoods to parameterize the distributions of latent codes, which yields a more accurate and flexible entropy model. In addition, we take advantage of recent attention modules and incorporate them into the network architecture to enhance performance. Experimental results demonstrate that our proposed method achieves state-of-the-art performance compared to existing learned compression methods on both Kodak and high-resolution datasets.

To our knowledge, our approach is the first to achieve performance comparable to the latest compression standard, Versatile Video Coding (VVC), in terms of PSNR. More importantly, our approach generates more visually pleasing results when optimized for MS-SSIM.

Environment

    pip3 install range-coder
    pip3 install tensorflow-compression
    # or, alternatively, install the bundled wheel:
    pip3 install tensorflow_compression-1.2-cp36-cp36m-manylinux1_x86_64.whl

Test Usage

  • Download the pre-trained model (optimized for MS-SSIM with lambda = 14) and unzip it.

  • Put your images in the directory valid/ and run the Python scripts:

    python3 encoder.py
    python3 decoder.py
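
To compare an original image with its reconstruction after running the two scripts, the usual metrics are PSNR and bitrate in bits per pixel. The following is a minimal sketch, not part of this repository: the function names `psnr` and `bits_per_pixel` are hypothetical, and images are assumed to be available as flat sequences of 8-bit sample values.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """PSNR in dB between two equal-length sequences of pixel samples."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of samples")
    # Mean squared error over all samples.
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def bits_per_pixel(num_bytes, height, width):
    """Bitrate of a compressed file, in bits per pixel."""
    return 8.0 * num_bytes / (height * width)
```

For a Kodak-sized image (512 x 768), `num_bytes` would be the size of the compressed file written by the encoder.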

Reconstructed Samples

Comparisons of reconstructed samples are given in the following.

Evaluation Results

Notes

This implementation is not the original code of our CVPR2020 paper, because the original code is based on TensorFlow 1.9.0 and many of its features have since been removed. This repo is a re-implementation, but the core code is almost the same and the results are consistent with the original results. This repo was also submitted to the CVPR Workshop and Challenge on Learned Image Compression ([CLIC](http://www.compression.cc/)) with the entry Kattolab in the Leaderboard.

If you find it useful for your research, please cite our CVPR2020 paper. Our original RD data from the paper is contained in the folder RDdata/.

@inproceedings{cheng2020image,
    title={Learned Image Compression with Discretized Gaussian Mixture Likelihoods and Attention Modules},
    author={Cheng, Zhengxue and Sun, Heming and Takeuchi, Masaru and Katto, Jiro},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2020}
}

learned-image-compression-with-gmm-and-attention's Issues

About the GMM

Could you please tell me how to construct the loss function of the GMM model in the process of network training?
I am now puzzled by the Gaussian mixture model training process.

Thanks very much!
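
As a rough illustration of what such a loss can look like, the sketch below evaluates a discretized Gaussian mixture likelihood (the probability mass of the bin [y - 0.5, y + 0.5]) and combines the resulting rate with a distortion term. It is not the authors' training code, which is not included in this repository; the function names, the `floor` value, and the way `lam` weights the distortion are assumptions, and a real training loop would use a differentiable framework and also add the rate of z.

```python
import math


def gaussian_cdf(x, mean, scale):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (scale * math.sqrt(2.0))))


def gmm_likelihood(y, weights, means, scales, floor=1e-9):
    """Probability mass of the bin [y - 0.5, y + 0.5] under a Gaussian mixture."""
    p = 0.0
    for w, m, s in zip(weights, means, scales):
        p += w * (gaussian_cdf(y + 0.5, m, s) - gaussian_cdf(y - 0.5, m, s))
    # Lower bound keeps the logarithm in the rate term finite.
    return max(p, floor)


def rate_bits(latents, params):
    """Estimated bits for a list of latents; params[i] = (weights, means, scales)."""
    return sum(-math.log2(gmm_likelihood(y, *p)) for y, p in zip(latents, params))


def rd_loss(rate, distortion, num_pixels, lam):
    """Rate-distortion objective: bits per pixel plus weighted distortion."""
    return rate / num_pixels + lam * distortion
```

The mixture weights are assumed to be already normalized (for example by a softmax) and the scales strictly positive.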

In network.py, why do you separate "phi" and pad it when you are encoding y?

Sorry for bothering you, Prof. Cheng. While reading your code, I came up with two questions. First, given that you use the entropy bottleneck from the tensorflow_compression package to encode z and decode its string, why did you choose the range-coder package to encode y and decode its string? Why not keep using the entropy model for the rest of the encoding and decoding process? Second, why do you separate "phi" and pad it when encoding y? What is the role of phi (the output of the hyper_synthesis function)?

About Discretized Gaussian Mixture Likelihoods?

Thanks for your paper and code!

I have doubts about the code for the Discretized Gaussian Mixture Likelihoods. The noise added in network.py at line 371 does not match the description in Eq. 8 of the paper. When we test (compress) an image, the input y_hat is already an integer because of the np.int operation, so the code at line 375 is redundant and does not serve the purpose of the discretization.
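
For context, learned codecs of this family commonly replace rounding with additive uniform noise during training and use true rounding at test time. The sketch below illustrates that general convention only; it is not taken from network.py, and the `quantize` helper and its `training` flag are hypothetical.

```python
import random


def quantize(y, training, rng=random):
    """Noise-based relaxation in training, hard rounding at test time."""
    if training:
        # Uniform noise in [-0.5, 0.5] is a differentiable stand-in for rounding.
        return [v + rng.uniform(-0.5, 0.5) for v in y]
    # Python's round() rounds halves to the nearest even integer.
    return [float(round(v)) for v in y]
```

Under this convention the noise branch is never taken while compressing, so any noise-related code on the test path has no effect on an already integer-valued y_hat.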

write the train function according to the compress function

Sorry for bothering you, Prof. Cheng. I am a beginner in image compression. I am reading your code and want to write a train function based on your compress function, but I have some questions and need your help. First, you split "y" and "phi" into "tiny_y" and "tiny_phi" before calling the entropy_parameter function. Can I pass "y" and "phi" directly into entropy_parameter during training to get y_tilde and y_likehoods, which are needed for the next steps? Second, could you please tell me what I should pay particular attention to when writing the training function? Thank you. I look forward to your reply.

Encoding Decoding Time

How much time does your method take to encode and decode an image?
On the Kodak dataset, I see around 40 seconds for encoding and 5 seconds for decoding. Is that correct?
@ZhengxueCheng
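
Timings like these are easiest to compare when they are measured the same way. A minimal sketch of a wall-clock timer follows; `time_call` is a hypothetical helper, not part of the repository.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage, assuming compress() from network.py is importable:
# _, enc_seconds = time_call(compress, input, output, num_filters, checkpoint_dir)
```

Note that a measurement taken this way would also include graph construction and checkpoint loading if those happen inside the timed call.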

python encoder.py causes NameError: name 'model_flag' is not defined

Hi, thanks for sharing the code. When I run 'python encoder.py', I get the following error message:

Traceback (most recent call last):
  File "encoder.py", line 27, in <module>
    main()
  File "encoder.py", line 23, in main
    compress(input, output, num_filters, checkpoint_dir)
  File "/home/yq223369/workspace/ImageCompression/GMMA-tf/network.py", line 572, in compress
    fileobj.write(np.array(model_flag, dtype=np.uint8).tobytes())

NameError: name 'model_flag' is not defined

I found in network.py, the variable 'model_flag' is indeed undefined, could you help me with this error?
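
The failing line writes a single `uint8` header byte, so the name only needs to be bound to an integer in 0..255 before that point. What value the decoder expects is not stated in this thread, so the sketch below only shows the general pattern of writing and reading such a one-byte flag; `write_header` and `read_header` are hypothetical helpers, and `struct.pack("B", ...)` is used as a stand-in that produces the same single byte as the NumPy expression in the traceback.

```python
import io
import struct


def write_header(fileobj, model_flag):
    """Write a one-byte model flag at the current position of fileobj."""
    if not 0 <= model_flag <= 255:
        raise ValueError("model_flag must fit in one unsigned byte")
    fileobj.write(struct.pack("B", model_flag))


def read_header(fileobj):
    """Read the one-byte model flag back."""
    (model_flag,) = struct.unpack("B", fileobj.read(1))
    return model_flag


# Round trip through an in-memory file; the value 0 is an arbitrary placeholder.
buffer = io.BytesIO()
write_header(buffer, 0)
buffer.seek(0)
flag = read_header(buffer)
```

Passing the flag in explicitly, rather than relying on a module-level name, avoids this class of NameError.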

Hello Cheng, a request about the original code!

Hello, I am a graduate student and a beginner in image compression. I am studying this code, but it contains neither a training part nor a training dataset. It would be great if you were willing to share these (the original code, including the training data) with me. Thank you very much! I look forward to your reply.

My email: [email protected].

And if you are Chinese:
Hello, I am a new graduate student and a beginner in image compression. I am studying the code for this paper of yours, but the code has no training part and no training dataset, so it would be wonderful if you were willing to share them with me. Many thanks!

My email: [email protected]

NaN loss during training process

Dr. Cheng, sorry to bother you again. I have written a train function modeled on your compress function. The difference is that my whole model is based on the mean-scale architecture from the paper "Joint Autoregressive and Hierarchical Priors for Learned Image Compression". I used your range coder and the likelihood calculation function, which you have annotated in "network.py". The problem is that I get a NaN loss. Did you run into this while debugging your training? If so, how did you solve it?
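
Whether this is the cause here is not confirmed in the thread, but NaN losses in the rate term often trace back to taking the logarithm of a zero likelihood or dividing by a zero or negative scale. A minimal sketch of the usual guards follows; the helper names and bound values are assumptions.

```python
import math


def safe_scale(raw_scale, minimum=1e-6):
    """Keep the predicted standard deviation strictly positive."""
    return max(abs(raw_scale), minimum)


def safe_bits(likelihood, floor=1e-9):
    """Bits for one symbol, with the likelihood bounded away from zero."""
    # In a tensor framework, log(0) yields -inf and then NaN gradients;
    # plain Python would raise ValueError instead. The floor avoids both.
    return -math.log2(max(likelihood, floor))
```

In a TensorFlow model the same two bounds would be applied to the scale tensor before computing the CDFs and to the likelihood tensor before the logarithm.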

AttributeError: module 'tensorflow.compat.v1' has no attribute 'variabled_scope'

TF 1.14.0
python 3.6.4

Traceback (most recent call last):
  File "encoder.py", line 31, in <module>
    main()
  File "encoder.py", line 27, in main
    compress(input, output, num_filters, checkpoint_dir)
  File "/home/qiusuo/Desktop/Learned-Image-Compression-with-GMM-and-Attention-master/network.py", line 518, in compress
    _, _, y_means, y_variances, y_probs = entropy_parameter(tiny_phi, tiny_y, num_filters, training=False)
  File "/home/qiusuo/Desktop/Learned-Image-Compression-with-GMM-and-Attention-master/network.py", line 369, in entropy_parameter
    with tf.variabled_scope("entropy_parameter", reuse=tf.AUTO_REUSE):
AttributeError: module 'tensorflow.compat.v1' has no attribute 'variabled_scope'
(python36) qiusuo@qiusuo-CN15S:~/Desktop/Learned-Image-Compression-with-GMM-and-Attention-master$ python encoder.py
Requirement already satisfied: tensorflow-compression==1.2 from file:///home/qiusuo/Desktop/Learned-Image-Compression-with-GMM-and-Attention-master/tensorflow_compression-1.2-cp36-cp36m-manylinux1_x86_64.whl in /home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages (1.2)
Requirement already satisfied: scipy>=1 in /home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages (from tensorflow-compression==1.2) (1.5.4)
Requirement already satisfied: numpy>=1.14.5 in /home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages (from scipy>=1->tensorflow-compression==1.2) (1.19.4)
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/qiusuo/anaconda3/envs/python36/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
1.14.0
kodim21.png
Traceback (most recent call last):
  File "encoder.py", line 31, in <module>
    main()
  File "encoder.py", line 27, in main
    compress(input, output, num_filters, checkpoint_dir)
  File "/home/qiusuo/Desktop/Learned-Image-Compression-with-GMM-and-Attention-master/network.py", line 518, in compress
    _, _, y_means, y_variances, y_probs = entropy_parameter(tiny_phi, tiny_y, num_filters, training=False)
  File "/home/qiusuo/Desktop/Learned-Image-Compression-with-GMM-and-Attention-master/network.py", line 369, in entropy_parameter
    with tf.variabled_scope("entropy_parameter", reuse=tf.AUTO_REUSE):
AttributeError: module 'tensorflow.compat.v1' has no attribute 'variabled_scope'
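
The attribute name in the traceback looks like a misspelling of `tf.variable_scope`, which does exist in `tensorflow.compat.v1`. Assuming that is the only problem on that line, editing network.py by hand is enough. The sketch below expresses the same one-word fix as a text substitution; `fix_variable_scope_typo` is a hypothetical helper.

```python
def fix_variable_scope_typo(source):
    """Replace the misspelled tf.variabled_scope( with tf.variable_scope(."""
    return source.replace("tf.variabled_scope(", "tf.variable_scope(")


# The offending line from the traceback, before and after the fix.
broken = 'with tf.variabled_scope("entropy_parameter", reuse=tf.AUTO_REUSE):'
fixed = fix_variable_scope_typo(broken)
```

Applying the function to the full text of network.py and writing the result back would patch every occurrence at once.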

Cross-platform encoding and decoding

Dear Zhengxue,

I tested cross-platform encoding and decoding with your model, and the reconstructed images show errors.
Both encoding and decoding use the CPU, just on different platforms.

Did you run into this cross-platform issue when participating in CLIC 2020? If so, how did you solve it?

Look forward to your reply! Thank you.
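
One commonly reported cause of such errors in learned codecs is that floating-point results differ slightly between machines, so the encoder and decoder build slightly different probability tables and the range decoder loses synchronization. Whether that applies to this model is not confirmed here. A mitigation sometimes used is to snap each predicted scale to a small table shared by both sides, so that tiny numerical differences usually map to the same entry. The sketch below illustrates the idea; the table bounds, size, and function names are assumptions.

```python
import bisect
import math


def make_scale_table(minimum=0.11, maximum=256.0, levels=64):
    """Log-spaced scale levels shared by encoder and decoder."""
    log_min = math.log(minimum)
    step = (math.log(maximum) - log_min) / (levels - 1)
    return [math.exp(log_min + i * step) for i in range(levels)]


def scale_index(scale, table):
    """Index of the first table entry >= scale, clamped to the last entry."""
    return min(bisect.bisect_left(table, scale), len(table) - 1)


table = make_scale_table()
# Two scales that differ only by a tiny numerical error map to the same entry.
index_a = scale_index(1.2345678, table)
index_b = scale_index(1.2345679, table)
```

This reduces the chance of a mismatch but does not remove it, since a value can still fall on either side of a table boundary; fully deterministic decoding generally needs integer arithmetic in the networks that predict the parameters.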

About mask and pad size?

Hi, sorry for asking question again :(

pad_size in network.py (line 589) is 2, and y_mean_values, y_variances_values, and y_probs_values (lines 616, 617, and 618) are extracted using pad_size. I think that is right when the masked_conv2d starts. However, once the padding of y_hat_value runs out, we use the actual values from y_hat_value rather than zeros from the padding, so using pad_size (line 589) there seems out of place for estimating the mean and variance.

I also noticed issue #3, and I wonder about the consistency between training and testing. During training, we may feed the whole y_hat into the masked CNN to get its estimate. During testing, however, only the patch of y_hat around h_idx:h_idx+kernel, w_idx:w_idx+kernel is used. But you may have trained the network differently from what I describe.
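
To make the indexing concrete, the sketch below shows how a causal (masked) 5x5 convolution can be evaluated one position at a time from a zero-padded array, which is why a pad size of 2 pairs with a 5x5 kernel. It is an illustration under assumptions, not the code in network.py: a single channel, a type-A mask that excludes the centre, and hypothetical helper names.

```python
def causal_mask(kernel=5):
    """1 for positions strictly before the centre in raster order, else 0."""
    centre = kernel // 2
    return [[1 if (i < centre or (i == centre and j < centre)) else 0
             for j in range(kernel)] for i in range(kernel)]


def zero_pad(y, pad):
    """Surround a 2-D list with `pad` rows and columns of zeros."""
    width = len(y[0]) + 2 * pad
    border = [[0.0] * width for _ in range(pad)]
    middle = [[0.0] * pad + list(row) + [0.0] * pad for row in y]
    return [list(r) for r in border] + middle + [list(r) for r in border]


def context_at(y_pad, h, w, weights, kernel=5):
    """Masked-convolution output at (h, w), read from y_pad[h:h+k, w:w+k]."""
    mask = causal_mask(kernel)
    return sum(weights[i][j] * mask[i][j] * y_pad[h + i][w + j]
               for i in range(kernel) for j in range(kernel))
```

Because the mask zeroes the centre and every later position, the result at (h, w) depends only on already decoded values. Near the border those come from the padding zeros and in the interior from real entries of y_hat, and in both cases the patch yields the same number that a masked convolution over the whole padded array would produce at that position.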

How to train

Hi ZhengxueCheng~

Would it be possible for you to release the end-to-end training code?
I would appreciate it if you could.
