
l-gm-loss's Introduction

L-GM-loss

For Caffe and TensorFlow.
Implementation of our CVPR 2018 paper "Rethinking Feature Distribution for Loss Functions in Image Classification".
Paper authors: Weitao Wan, Yuanyi Zhong, Tianpeng Li, Jiansheng Chen.

The experiments in our paper were carried out with the Caffe implementation.
The tensorflow folder contains the TensorFlow demo.

Code is written by Yuanyi Zhong and Weitao Wan.

Abstract

We propose a large-margin Gaussian Mixture (L-GM) loss for deep neural networks in classification tasks. Different from the softmax cross-entropy loss, our proposal is established on the assumption that the deep features of the training set follow a Gaussian Mixture distribution. By involving a classification margin and a likelihood regularization, the L-GM loss facilitates both a high classification performance and an accurate modeling of the training feature distribution. As such, the L-GM loss is superior to the softmax loss and its major variants in the sense that besides classification, it can be readily used to distinguish abnormal inputs, such as the adversarial examples, based on their features' likelihood to the training feature distribution. Extensive experiments on various recognition benchmarks like MNIST, CIFAR, ImageNet and LFW, as well as on adversarial examples demonstrate the effectiveness of our proposal.
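
For intuition, here is a minimal NumPy sketch of the L-GM loss with identity covariances (a simplification for illustration only, not the repository's code; the actual Caffe and TensorFlow implementations, including the variance handling, are in this repository):

    import numpy as np

    def lgm_loss(feat, labels, means, alpha=0.1, lam=0.01):
        """Simplified L-GM loss sketch with identity covariances.
        feat: (N, D) features, labels: (N,) ints, means: (K, D) class means."""
        idx = np.arange(len(labels))
        diff = feat[:, None, :] - means[None, :, :]        # (N, K, D)
        sq_dist = 0.5 * np.sum(diff ** 2, axis=2)          # half squared distances, (N, K)
        logits = -sq_dist                                  # class log-likelihoods up to a constant
        # classification margin: scale the ground-truth distance by (1 + alpha)
        margin_logits = logits.copy()
        margin_logits[idx, labels] *= (1.0 + alpha)
        # softmax cross-entropy over the margined logits
        margin_logits -= margin_logits.max(axis=1, keepdims=True)
        log_prob = margin_logits - np.log(np.exp(margin_logits).sum(axis=1, keepdims=True))
        cls_loss = -log_prob[idx, labels].mean()
        # likelihood regularization: pull each feature toward its class mean
        lkd_loss = sq_dist[idx, labels].mean()
        return cls_loss + lam * lkd_loss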

Instructions

(For TensorFlow, enter the tensorflow folder.)

  ./train.sh 0 simple  # 0 is the GPU id, simple is the folder containing network definitions and solver

Layer details

  • Specify the margin parameter α and the likelihood weight λ, which are margin_mul and center_coef in the layer param, respectively.

    margin_mul {
      policy: STEPUP
      value: 0.1
      step: 5000
      gamma: 2
      max: 0.3
    }

This specifies a gradually growing value for α (starting at 0.1, multiplied by 2 every 5000 iterations, and capped at 0.3), which is helpful for training.
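
For clarity, the STEPUP policy above can be read as the following schedule (a sketch of the policy as described, not the Caffe implementation itself):

    def margin_alpha(iteration, value=0.1, step=5000, gamma=2.0, max_value=0.3):
        """STEPUP policy: multiply `value` by `gamma` every `step` iterations,
        capped at `max_value`. Mirrors the margin_mul block above."""
        return min(value * gamma ** (iteration // step), max_value)

    # e.g. alpha = 0.1 for iterations 0-4999, 0.2 for 5000-9999, then capped at 0.3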

  • Other options:

update_sigma: false

Fix the variances to their initial values (1.0).

isotropic: true

The variances of different dimensions are identical.
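
To illustrate what these options control, a small Python sketch (our assumption of the shapes involved, not the layer's actual code):

    import numpy as np

    def class_variances(num_classes, feature_dim, isotropic=True):
        """Sketch: with isotropic variances, all dimensions of a class share one
        value; otherwise each dimension has its own. All initialized to 1.0,
        which is what they stay at when update_sigma is false."""
        if isotropic:
            return np.ones((num_classes, 1))   # broadcasts over feature dimensions
        return np.ones((num_classes, feature_dim))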

Data

We've described how the data is pre-processed in our paper. For example, the CIFAR-100 training data (32x32) is padded to 40x40 with zero pixels and then randomly cropped with a 32x32 window for training.
In the CIFAR-100 example, we use data in HDF5 format. You can choose other formats by changing the data layer accordingly.
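
A minimal NumPy sketch of the pad-and-crop augmentation described above (our reading of the pre-processing; the exact pipeline is described in the paper):

    import numpy as np

    def pad_and_crop(img, pad=4, crop=32):
        """Zero-pad a (32, 32, 3) image to 40x40, then take a random
        32x32 crop, as described for the CIFAR-100 training data."""
        padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='constant')
        y = np.random.randint(0, padded.shape[0] - crop + 1)
        x = np.random.randint(0, padded.shape[1] - crop + 1)
        return padded[y:y + crop, x:x + crop]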

The CIFAR-100 training data (with or without augmentation) and test data can be downloaded from Baidu Drive (.h5 files).

Citations

If you find this work useful, please consider citing it.

@inproceedings{LGM2018,
  title={Rethinking Feature Distribution for Loss Functions in Image Classification},
  author={Wan, Weitao and Zhong, Yuanyi and Li, Tianpeng and Chen, Jiansheng},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}


l-gm-loss's Issues

PyTorch source code for L-GM loss

Hello, I work on the application side. Is there PyTorch source code for the L-GM loss that can be used directly?

About adding the feature variances

I am trying to add learnable feature variances in the Python code, but I found that the accuracy decreases a lot.
My approach is as follows:
1. Reshape the var and mean to (1, num_classes, feature_dim), and reshape the input feature to (batch_size, 1, feature_dim).
2. Subtract the mean from the input feature and divide by the var, giving a tensor of shape (batch_size, num_classes, feature_dim).
3. Batch-dot this tensor with the mean, giving a tensor of shape (batch_size, num_classes, num_classes).
4. Take the diagonal over the last two dimensions to obtain the final margin_distance tensor.
But the accuracy is terrible. Can you give me some advice?
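
For comparison, the per-class weighted squared distances can be computed directly, without forming a (batch_size, num_classes, num_classes) tensor and taking its diagonal; a sketch assuming diagonal covariances:

    import numpy as np

    def weighted_sq_dist(feat, means, var):
        """feat: (B, D), means: (K, D), var: (K, D) diagonal variances.
        Returns (B, K) squared Mahalanobis distances sum((x - mu)^2 / var)."""
        diff = feat[:, None, :] - means[None, :, :]   # (B, K, D)
        return np.sum(diff ** 2 / var[None, :, :], axis=2)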

Question about m_add_

I am reading your cpp code and have some questions about the parameter m_add_:

    template <typename Dtype>
    static __global__ void margin_top(const int M_, const int N_,
        Dtype *top_data, const Dtype *label,
        const Dtype margin_mul, const Dtype margin_add) {
      CUDA_KERNEL_LOOP(i, M_) {
        const int y = (int)label[i];
        top_data[i*N_ + y] += top_data[i*N_ + y] * margin_mul - margin_add;
      }
    }

But I do not find an initialization value for margin_add in the CIFAR-100 example's trainval.prototxt. Can you tell me something about the parameter margin_add?
Also, I found that you do not use the margin_add parameter in the TensorFlow edition. Does this parameter not matter?
Looking forward to your reply.
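
For reference, the kernel above is equivalent to the following NumPy operation on the (batch, classes) score matrix (a sketch of its effect; we assume top_data holds the per-class scores):

    import numpy as np

    def apply_margin(top_data, labels, margin_mul, margin_add=0.0):
        """Effect of the margin_top kernel: scale the ground-truth entry by
        (1 + margin_mul) and subtract margin_add. top_data: (M, N), labels: (M,)."""
        idx = np.arange(len(labels))
        top_data[idx, labels] += top_data[idx, labels] * margin_mul - margin_add
        return top_data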

Is the variance update code contained in the TensorFlow code? And a question about Caffe

It seems that only the tensors for the mean variables are defined in lgm_logits (def lgm_logits(...)).

According to your paper, a variance variable should be defined and updated by gradient descent,

but I cannot find that kind of code.

Which part of the TensorFlow code is related to the variance update?

Also, maybe I am a novice with Caffe, but I cannot find the Caffe code in your documents.

It would be appreciated if you could tell me which file contains the main Caffe code.

Question about evaluation

In your TensorFlow code, I found that you use dist as logits_eval and take the argmax as the predicted label. But dist is the distance between the example feature and the distribution mean, so why does the largest distance correspond to the right label?
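
Note that in the TensorFlow code the evaluation logits are negative squared distances (see neg_sqr_dist quoted in a later issue), so the argmax in fact selects the nearest class mean; a minimal sketch:

    import numpy as np

    def predict(feat, means):
        """Sketch: with logits = -0.5 * squared distance, the argmax over the
        logits picks the *nearest* class mean (argmax of the negative distance
        equals argmin of the distance)."""
        sq_dist = np.sum((feat[:, None, :] - means[None, :, :]) ** 2, axis=2)
        return np.argmax(-0.5 * sq_dist, axis=1)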

Questions about Face Verification

Hi, thanks for your work.
I am doing some experiments on the LFW dataset for my paper, and I find your experiments on face verification interesting but unclear. I could not reproduce the same results as yours.
Could you share your face verification code with me?
Thank you very much.

How to update the variance

Thank you very much for sharing your code. I want to know how you update the variance of each class, or do you just use a constant value?

About loss

I have a question: why is there still a softmax loss layer after the ClassDistance layer? Doesn't that change your distribution again?

Is eq. 23 implemented correctly?

In Eq. 23 in the paper, the derivative w.r.t. x_i is

    [(1 - p_{z_i})(1 + α) + λ] Σ^{-1} (x - μ_{z_i}) - ...,

but the implementation (class_distance_layer.cu, line 227) looks like

    [(1 - p_{z_i})(1 + α) Σ^{-1} + λ] (x - μ_{z_i}).

I wonder whether the writing or the implementation is incorrect, though it is highly likely that I am the one who is wrong (I am not very good at math).

Question about feature var

In the tensorflow readme, you said: "This is the tensorflow demo for the LGM loss. While the caffe version is more complete, this version does not support updating of the feature variances." But in the Caffe edition, you also said "Fix the variances to initial values (1.0)." What is the difference?

Is the TensorFlow code "L-GM-Loss" or "L-Center-Loss"?

Hello, I am very interested in this research project and I believe what you demonstrated in the paper is novel. I am not sure whether this also holds for your original Caffe code, but in your TensorFlow code (lines 451-454 of resnet_model.py), what you are computing is a "margined center loss" instead of a "margined GM loss". There is no covariance involved in the computation (or the covariance matrix is the identity matrix), which I believe is completely different from what you showed in the paper.

How was neg_sqr_dist formulated? Could you point out the equation in the paper?

In the tensorflow folder, the Python code resnet_model.py has the following lines:

    XY = tf.matmul(feat, means, transpose_b=True)
    XX = tf.reduce_sum(tf.square(feat), axis=1, keep_dims=True)
    YY = tf.reduce_sum(tf.square(tf.transpose(means)), axis=0, keep_dims=True)
    neg_sqr_dist = -0.5 * (XX - 2.0 * XY + YY)

I have read the paper but could not understand how the expression neg_sqr_dist = -0.5 * (XX - 2.0 * XY + YY) was formed. Could you point out the equation in the paper which gives -0.5 * (XX - 2.0 * XY + YY)?
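
For what it is worth, the expression is the standard expansion of the squared Euclidean distance, ||x - mu||^2 = ||x||^2 - 2 x·mu + ||mu||^2, applied to all (sample, class) pairs at once; a quick numerical check (a sketch, not part of the repository):

    import numpy as np

    feat = np.random.randn(8, 64)     # (batch, feature_dim)
    means = np.random.randn(10, 64)   # (num_classes, feature_dim)

    # vectorized form used in resnet_model.py
    XY = feat @ means.T
    XX = np.sum(feat ** 2, axis=1, keepdims=True)
    YY = np.sum(means.T ** 2, axis=0, keepdims=True)
    neg_sqr_dist = -0.5 * (XX - 2.0 * XY + YY)

    # direct form: -0.5 * ||x_i - mu_k||^2 for every sample/class pair
    direct = -0.5 * np.sum((feat[:, None, :] - means[None, :, :]) ** 2, axis=2)
    assert np.allclose(neg_sqr_dist, direct)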
