
l-gm-loss's Introduction

L-GM-loss

For Caffe and TensorFlow.
Implementation of our CVPR 2018 paper "Rethinking Feature Distribution for Loss Functions in Image Classification".
Paper authors: Weitao Wan, Yuanyi Zhong, Tianpeng Li, Jiansheng Chen.

The experiments in our paper were carried out with the Caffe implementation.
The tensorflow folder contains the TensorFlow demo.

Code is written by Yuanyi Zhong and Weitao Wan.

Abstract

We propose a large-margin Gaussian Mixture (L-GM) loss for deep neural networks in classification tasks. Different from the softmax cross-entropy loss, our proposal is established on the assumption that the deep features of the training set follow a Gaussian Mixture distribution. By involving a classification margin and a likelihood regularization, the L-GM loss facilitates both a high classification performance and an accurate modeling of the training feature distribution. As such, the L-GM loss is superior to the softmax loss and its major variants in the sense that besides classification, it can be readily used to distinguish abnormal inputs, such as the adversarial examples, based on their features' likelihood to the training feature distribution. Extensive experiments on various recognition benchmarks like MNIST, CIFAR, ImageNet and LFW, as well as on adversarial examples demonstrate the effectiveness of our proposal.
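
For intuition, here is a minimal NumPy sketch of the L-GM loss with identity covariances (a simplification for illustration only, not the repository's code; the actual Caffe and TensorFlow implementations, including the variance handling, are in this repository):

    import numpy as np

    def lgm_loss(feat, labels, means, alpha=0.1, lam=0.01):
        """Simplified L-GM loss sketch with identity covariances.
        feat: (N, D) features, labels: (N,) ints, means: (K, D) class means."""
        idx = np.arange(len(labels))
        diff = feat[:, None, :] - means[None, :, :]        # (N, K, D)
        sq_dist = 0.5 * np.sum(diff ** 2, axis=2)          # half squared distances, (N, K)
        logits = -sq_dist                                  # class log-likelihoods up to a constant
        # classification margin: scale the ground-truth distance by (1 + alpha)
        margin_logits = logits.copy()
        margin_logits[idx, labels] *= (1.0 + alpha)
        # softmax cross-entropy over the margined logits
        margin_logits -= margin_logits.max(axis=1, keepdims=True)
        log_prob = margin_logits - np.log(np.exp(margin_logits).sum(axis=1, keepdims=True))
        cls_loss = -log_prob[idx, labels].mean()
        # likelihood regularization: pull each feature toward its class mean
        lkd_loss = sq_dist[idx, labels].mean()
        return cls_loss + lam * lkd_loss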

Instructions

(For TensorFlow, enter the tensorflow folder.)

  ./train.sh 0 simple  # 0 is the GPU id, simple is the folder containing network definitions and solver

Layer details

  • Specify the margin parameter α and the likelihood weight λ, which are margin_mul and center_coef in the layer param, respectively.

    margin_mul {
      policy: STEPUP
      value: 0.1
      step: 5000
      gamma: 2
      max: 0.3
    }

This specifies a gradually growing value for α (starting at 0.1, multiplied by 2 every 5000 iterations, and capped at 0.3), which is helpful for training.
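
For clarity, the STEPUP policy above can be read as the following schedule (a sketch of the policy as described, not the Caffe implementation itself):

    def margin_alpha(iteration, value=0.1, step=5000, gamma=2.0, max_value=0.3):
        """STEPUP policy: multiply `value` by `gamma` every `step` iterations,
        capped at `max_value`. Mirrors the margin_mul block above."""
        return min(value * gamma ** (iteration // step), max_value)

    # e.g. alpha = 0.1 for iterations 0-4999, 0.2 for 5000-9999, then capped at 0.3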

  • Other options:

update_sigma: false

Fix the variances to their initial values (1.0).

isotropic: true

The variances of different dimensions are identical.
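
To illustrate what these options control, a small Python sketch (our assumption of the shapes involved, not the layer's actual code):

    import numpy as np

    def class_variances(num_classes, feature_dim, isotropic=True):
        """Sketch: with isotropic variances, all dimensions of a class share one
        value; otherwise each dimension has its own. All initialized to 1.0,
        which is what they stay at when update_sigma is false."""
        if isotropic:
            return np.ones((num_classes, 1))   # broadcasts over feature dimensions
        return np.ones((num_classes, feature_dim))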

Data

We've described how the data is pre-processed in our paper. For example, the CIFAR-100 training data (32x32) is padded to 40x40 with zero pixels and then randomly cropped with a 32x32 window for training.
In the CIFAR-100 example, we use data in HDF5 format. You can choose other formats by changing the data layer accordingly.
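
A minimal NumPy sketch of the pad-and-crop augmentation described above (our reading of the pre-processing; the exact pipeline is described in the paper):

    import numpy as np

    def pad_and_crop(img, pad=4, crop=32):
        """Zero-pad a (32, 32, 3) image to 40x40, then take a random
        32x32 crop, as described for the CIFAR-100 training data."""
        padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode='constant')
        y = np.random.randint(0, padded.shape[0] - crop + 1)
        x = np.random.randint(0, padded.shape[1] - crop + 1)
        return padded[y:y + crop, x:x + crop]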

The CIFAR-100 training data (with or without augmentation) and test data can be downloaded from Baidu Drive (.h5 files).

Citations

If you find this work useful, please consider citing it.

@inproceedings{LGM2018,
  title={Rethinking Feature Distribution for Loss Functions in Image Classification},
  author={Wan, Weitao and Zhong, Yuanyi and Li, Tianpeng and Chen, Jiansheng},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2018}
}


l-gm-loss's Issues

PyTorch source code for L-GM loss

Hello, I work on the application side. Is there PyTorch source code for the L-GM loss that can be used directly?

About adding the feature variances

I am trying to add learnable feature variances in the Python code, but I found that the accuracy decreases a lot.
My approach is as follows:
1. Reshape the var and mean to (1, num_classes, feature_dim), and reshape the input feature to (batch_size, 1, feature_dim).
2. Subtract the mean from the input feature and divide by the var, giving a tensor of shape (batch_size, num_classes, feature_dim).
3. Batch-dot this tensor with the mean, giving a tensor of shape (batch_size, num_classes, num_classes).
4. Take the diagonal over the last two dimensions to obtain the final margin_distance tensor.
But the accuracy is terrible. Can you give me some advice?
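
For comparison, the per-class weighted squared distances can be computed directly, without forming a (batch_size, num_classes, num_classes) tensor and taking its diagonal; a sketch assuming diagonal covariances:

    import numpy as np

    def weighted_sq_dist(feat, means, var):
        """feat: (B, D), means: (K, D), var: (K, D) diagonal variances.
        Returns (B, K) squared Mahalanobis distances sum((x - mu)^2 / var)."""
        diff = feat[:, None, :] - means[None, :, :]   # (B, K, D)
        return np.sum(diff ** 2 / var[None, :, :], axis=2)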

Question about m_add_

I am reading your cpp code and have some questions about the parameter m_add_:

    template <typename Dtype>
    static __global__ void margin_top(const int M_, const int N_,
        Dtype *top_data, const Dtype *label,
        const Dtype margin_mul, const Dtype margin_add) {
      CUDA_KERNEL_LOOP(i, M_) {
        const int y = (int)label[i];
        top_data[i*N_ + y] += top_data[i*N_ + y] * margin_mul - margin_add;
      }
    }

But I do not find an initialization value for margin_add in the CIFAR-100 example's trainval.prototxt. Can you tell me something about the parameter margin_add?
Also, I found that you do not use the margin_add parameter in the TensorFlow edition. Does this parameter not matter?
Looking forward to your reply.
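
For reference, the kernel above is equivalent to the following NumPy operation on the (batch, classes) score matrix (a sketch of its effect; we assume top_data holds the per-class scores):

    import numpy as np

    def apply_margin(top_data, labels, margin_mul, margin_add=0.0):
        """Effect of the margin_top kernel: scale the ground-truth entry by
        (1 + margin_mul) and subtract margin_add. top_data: (M, N), labels: (M,)."""
        idx = np.arange(len(labels))
        top_data[idx, labels] += top_data[idx, labels] * margin_mul - margin_add
        return top_data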

Is the variance update code contained in the TensorFlow code? And a question about Caffe

It seems that only the tensors for the mean variables are defined in lgm_logits (def lgm_logits(...)).

According to your paper, a variance variable should be defined and updated by gradient descent,

but I cannot find that kind of code.

Which part of the TensorFlow code is related to the variance update?

Also, maybe I am a novice with Caffe, but I cannot find the Caffe code in your documents.

It would be appreciated if you could tell me which file contains the main Caffe code.

Question about evaluation

In your TensorFlow code, I found that you use dist as logits_eval and take the argmax as the predicted label. But dist is the distance between the example feature and the distribution mean, so why does the largest distance correspond to the right label?
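
Note that in the TensorFlow code the evaluation logits are negative squared distances (see neg_sqr_dist quoted in a later issue), so the argmax in fact selects the nearest class mean; a minimal sketch:

    import numpy as np

    def predict(feat, means):
        """Sketch: with logits = -0.5 * squared distance, the argmax over the
        logits picks the *nearest* class mean (argmax of the negative distance
        equals argmin of the distance)."""
        sq_dist = np.sum((feat[:, None, :] - means[None, :, :]) ** 2, axis=2)
        return np.argmax(-0.5 * sq_dist, axis=1)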

Questions about Face Verification

Hi, thanks for your work.
I am doing some experiments on the LFW dataset for my paper, and I find your experiments on face verification interesting but unclear. I could not reproduce the same results as yours.
Could you share your face verification code with me?
Thank you very much.

How to update the variance

Thank you very much for sharing your code. I want to know how you update the variance of each class, or do you just use a constant value?

About loss

I have a question: why is there still a softmax loss layer after the ClassDistance layer? Doesn't that change your distribution again?

Is eq. 23 implemented correctly?

In Eq. 23 in the paper, the derivative w.r.t. x_i is

    [(1 - p_{z_i})(1 + α) + λ] Σ^{-1} (x - μ_{z_i}) - ...,

but the implementation (class_distance_layer.cu, line 227) looks like

    [(1 - p_{z_i})(1 + α) Σ^{-1} + λ] (x - μ_{z_i}).

I wonder whether the writing or the implementation is incorrect, though it is highly likely that I am the one who is wrong (I am not very good at math).

Question about feature var

In the tensorflow readme, you said: "This is the tensorflow demo for the LGM loss. While the caffe version is more complete, this version does not support updating of the feature variances." But in the Caffe edition, you also said "Fix the variances to initial values (1.0)." What is the difference?

Is the TensorFlow code "L-GM-Loss" or "L-Center-Loss"?

Hello, I am very interested in this research project and I believe what you demonstrated in the paper is novel. I am not sure whether this also holds for your original Caffe code, but in your TensorFlow code (lines 451-454 of resnet_model.py), what you are computing is a "margined center loss" instead of a "margined GM loss". There is no covariance involved in the computation (or the covariance matrix is the identity matrix), which I believe is completely different from what you showed in the paper.

How was neg_sqr_dist formulated? Could you point out the equation in the paper?

In the tensorflow folder, the Python code resnet_model.py has the following lines:

    XY = tf.matmul(feat, means, transpose_b=True)
    XX = tf.reduce_sum(tf.square(feat), axis=1, keep_dims=True)
    YY = tf.reduce_sum(tf.square(tf.transpose(means)), axis=0, keep_dims=True)
    neg_sqr_dist = -0.5 * (XX - 2.0 * XY + YY)

I have read the paper but could not understand how the expression neg_sqr_dist = -0.5 * (XX - 2.0 * XY + YY) was formed. Could you point out the equation in the paper which gives -0.5 * (XX - 2.0 * XY + YY)?
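
For what it is worth, the expression is the standard expansion of the squared Euclidean distance, ||x - mu||^2 = ||x||^2 - 2 x·mu + ||mu||^2, applied to all (sample, class) pairs at once; a quick numerical check (a sketch, not part of the repository):

    import numpy as np

    feat = np.random.randn(8, 64)     # (batch, feature_dim)
    means = np.random.randn(10, 64)   # (num_classes, feature_dim)

    # vectorized form used in resnet_model.py
    XY = feat @ means.T
    XX = np.sum(feat ** 2, axis=1, keepdims=True)
    YY = np.sum(means.T ** 2, axis=0, keepdims=True)
    neg_sqr_dist = -0.5 * (XX - 2.0 * XY + YY)

    # direct form: -0.5 * ||x_i - mu_k||^2 for every sample/class pair
    direct = -0.5 * np.sum((feat[:, None, :] - means[None, :, :]) ** 2, axis=2)
    assert np.allclose(neg_sqr_dist, direct)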
