kaidic / LDAM-DRW
[NeurIPS 2019] Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Home Page: https://arxiv.org/pdf/1906.07413.pdf
License: MIT License
It was a very interesting paper to read :)
I have some questions regarding the hyper-parameters for LDAM loss.
What is the value of C
, the hyper-parameter to be tuned (according to the paper)? Is it the (max_m / np.max(m_list))
factor introduced below?
https://github.com/kaidic/LDAM-DRW/blob/master/losses.py#L28
Is s=30
in LDAM loss also a hyper-parameter to be tuned? I could not find any explanation in the paper. Did I miss something?
What was the tendency of these hyper-parameters during training? How are these hyper-parameter selections related to the imbalance level (or to different datasets)? Do the found parameters also work for the other datasets in the paper (Tiny ImageNet, iNaturalist)?
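For reference, here is my reading of the margin construction at the linked line (an assumption on my part, not confirmed by the paper): margins are proportional to n_j^(-1/4), then rescaled so the largest margin equals max_m, so the factor (max_m / np.max(m_list)) would play the role of C.

```python
import numpy as np

# Example long-tailed class counts (my own illustration, not from the repo)
cls_num_list = [5000, 2997, 1796, 1077, 645, 387, 232, 139, 83, 50]
max_m = 0.5  # repo default

m_list = 1.0 / np.sqrt(np.sqrt(cls_num_list))  # n_j^(-1/4), largest for the rarest class
C = max_m / np.max(m_list)                     # rescaling constant
m_list = m_list * C                            # rarest class now gets margin exactly max_m
```

Under this reading, tuning max_m is equivalent to tuning C.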
Thanks.
Thanks a lot for your code!
I have read your paper and code; it's really a good idea, but I have a question about the LDAM loss. It's about the last line, where we call the basic cross_entropy function in PyTorch.
def forward(self, x, target):
    index = torch.zeros_like(x, dtype=torch.uint8)
    index.scatter_(1, target.data.view(-1, 1), 1)
    index_float = index.type(torch.cuda.FloatTensor)
    # self.m_list[None, :] adds one dimension to the original m_list
    batch_m = torch.matmul(self.m_list[None, :], index_float.transpose(0, 1))
    # equivalently, transpose back to a column vector
    batch_m = batch_m.view((-1, 1))
    x_m = x - batch_m
    # only the target-label position uses the margin-shifted logit x_m
    output = torch.where(index, x_m, x)
    return F.cross_entropy(self.s * output, target, weight=self.weight)
Why is the output multiplied by s (here 30 times)? Just to make the loss larger? However, we don't do this for the focal loss.
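My current guess (not the authors' explanation): when use_norm is enabled, the logits are cosine similarities bounded in [-1, 1], so without a scale factor the softmax stays nearly flat even for confident predictions; s restores a usable logit range. A small numpy sketch of the effect:

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

# Cosine logits from a normalized layer live in [-1, 1]
cos_logits = np.array([0.9, -0.2, -0.9])  # example: correct class scores 0.9

p_unscaled = softmax(cos_logits)[0]        # stuck well below 1 -> loss cannot vanish
p_scaled = softmax(30.0 * cos_logits)[0]   # with s=30 the prediction saturates
```

If focal loss here is used with an ordinary (unnormalized) linear layer, its logits are already unbounded, which would explain why it needs no scale.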
I found that the lr in log_train.csv is multiplied by 0.1, and the line marked TODO is written like this:
data_time=data_time, loss=losses, top1=top1, top5=top5, lr=optimizer.param_groups[-1]['lr'] * 0.1)) # TODO
also can be seen in:
https://github.com/kaidic/LDAM-DRW/blame/master/cifar_train.py#L291
I wonder why the lr is multiplied by 0.1?
Hi, I get an "AttributeError" when running "cifar_train.py". Could you please tell me how to fix it?
Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./datasets/imbalance_cifar10/cifar-10-python.tar.gz
212664376it [00:19, 38731722.29it/s]
Traceback (most recent call last):
  File "/xinfu/code/long_tail/BBN/main/train.py", line 69, in <module>
    train_set = eval(cfg.DATASET.DATASET)("train", cfg)
  File "/xinfu/code/long_tail/BBN/lib/dataset/imbalance_cifar.py", line 25, in __init__
    img_num_list = self.get_img_num_per_cls(self.cls_num, imb_type, imb_factor)
  File "/xinfu/code/long_tail/BBN/lib/dataset/imbalance_cifar.py", line 44, in get_img_num_per_cls
    img_max = len(self.data) / cls_num
AttributeError: 'IMBALANCECIFAR10' object has no attribute 'data'
Thanks for your paper and your code, they are great work and help me a lot.
I did experiments on the Tiny ImageNet dataset following the settings in your paper; however, I can't achieve similar results. For
long tailed 1:100 tiny imagenet, the top-1 validation error I got is:
ERM SGD: 80.05
LDAM SGD: 72.8
There is a big gap from the results shown in your paper, so I wonder if there is any setting or trick I have missed?
In the paper you mentioned: "We perform 1 crop test with the validation images." I wonder how this is done specifically.
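Not the author, but "1-crop test" usually just means evaluating on a single center crop (as opposed to 10-crop averaging). A minimal numpy sketch of my understanding, assuming 64x64 Tiny ImageNet images; this is an assumption about the protocol, not the authors' exact pipeline:

```python
import numpy as np

def center_crop(img, size):
    """Take a single center crop of side `size` from an H x W x C image array."""
    h, w = img.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return img[top:top + size, left:left + size]

val_img = np.zeros((72, 72, 3))   # e.g. after resizing the short side to 72
crop = center_crop(val_img, 64)   # the one crop fed to the network
```

With torchvision this would just be transforms.CenterCrop(64) in the validation transform.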
For ResNet-18, I use:
backbone = models.resnet18(pretrained=True)
backbone.avgpool = nn.AdaptiveAvgPool2d(1)
num_ftrs = backbone.fc.in_features
if USE_NORM:
backbone.fc = NormedLinear(num_ftrs, 200)
else:
backbone.fc = nn.Linear(num_ftrs, 200)
Is it correct?
Looking forward to your reply, thank you very much!
Please let me know, thanks
Thanks for sharing your great work.
I have a question about the sampler in your code.
train_sampler is first declared at L167 in cifar_train.py; then train_loader gets the sampler in L169-171.
This seems fine in itself. But in the middle of training (train + validation) there is a part that seems to update the sampler, in L186-L208. I understand this part is needed for LDAM and DRW, but I think this new train_sampler object does not affect train_loader.
What's your opinion?
Thanks!
Lines 186 to 208 in 3193f05
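To illustrate what I mean: a DataLoader keeps a reference to the sampler object it was constructed with, so rebinding the train_sampler name later does not change an existing loader. A minimal Python sketch (the Loader class here is a stand-in, not the repo's code):

```python
class Loader:
    """Stand-in for torch.utils.data.DataLoader: stores the sampler at construction."""
    def __init__(self, sampler):
        self.sampler = sampler

train_sampler = None
train_loader = Loader(sampler=train_sampler)  # loader captures the current object (None)

train_sampler = object()                      # later rebinding, as in L186-L208
assert train_loader.sampler is None           # the loader still holds the original value
```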
Hi @kaidic
Thanks for your fantastic work. When I tried to reproduce the focal loss result, I found that with gamma=0.5 my focal loss led to NaN loss during training, but the focal loss in this repo works fine.
I compared the two differently designed focal losses carefully and found that their forward passes are the same, but the model parameters become different after backward. I am quite confused; could you please give me some advice?
Thanks for your contribution again!
Thanks for your paper and your code, they are great work and help me a lot.
Your paper says that DRW is based on the number of samples, but your code is based on the class-balanced (CB) weights. I want to know whether the DRW reported in your paper is CE + CB or CE + 1/N?
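Not the author, but numerically the two are close: with beta = 0.9999 the class-balanced weight (1 - beta) / (1 - beta^n) is approximately 1/n for rare classes and only drifts upward relative to 1/n for head classes. A quick numpy check (example counts are my own):

```python
import numpy as np

cls_num_list = np.array([5000, 1000, 200, 50, 10])  # example long-tailed counts
beta = 0.9999

cb_w = (1 - beta) / (1 - beta ** cls_num_list)  # class-balanced weights
inv_w = 1.0 / cls_num_list                      # plain inverse-frequency weights

ratio = cb_w / inv_w  # -> 1 as beta -> 1; exceeds 1 for large n (head classes)
```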
Thanks for your great work.
I've got a question here, since LDAM extends the margin softmax loss which is commonly used in face recognition, have you ever tried some experiments on some face-recognition datasets?
Hi,
I believe you have a wrong implementation of focal loss (I hope I have not misunderstood the code). Although the wrong implementation of focal loss will not affect the method you proposed, I hope the authors will spend some time correcting it.
You should compute -(1-p)^r * log(p) for every sample in the batch.
However, after you use F.cross_entropy at line 21 of losses.py, the output is already a single "value".
You then use this value as p to compute the focal loss, which is completely wrong.
An obvious indication of the wrong implementation is that you can actually remove the .mean() at line 11 in losses.py without causing any errors. It shows that you're indeed dealing with a single value, not a vector.
This might explain why your implementation is so different from https://github.com/Hsuxu/Loss_ToolBox-PyTorch/blob/master/FocalLoss/FocalLoss.py
or https://github.com/clcarwin/focal_loss_pytorch/blob/master/focalloss.py
You can also check the previous work you've cited https://github.com/vandit15/Class-balanced-loss-pytorch/blob/master/class_balanced_loss.py where the key point is that they make sure "reduction=none" when using F.binary_cross_entropy_with_logits.
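To illustrate, the per-sample computation I mean can be sketched in numpy (a sketch of the standard formula, not the repo's code): cross-entropy must stay per-sample (reduction='none' in PyTorch terms) before the (1 - p_t)^gamma modulation, and only then be averaged.

```python
import numpy as np

def focal_loss(logits, targets, gamma=2.0):
    """Per-sample focal loss: mean of -(1 - p_t)^gamma * log(p_t)."""
    z = logits - logits.max(axis=1, keepdims=True)            # stabilized log-softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    log_pt = log_p[np.arange(len(targets)), targets]          # per-sample log p_t
    ce = -log_pt                                              # reduction='none' CE
    pt = np.exp(log_pt)
    return ((1 - pt) ** gamma * ce).mean()                    # modulate, THEN reduce
```

With gamma = 0 this reduces exactly to the mean cross-entropy, which is a handy sanity check.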
Hi,
Thank you for open-sourcing the code! The CIFAR models are currently initialized with random parameters. I wonder if these models can be initialized from a pretrained model?
Thank you!
Firstly, thanks for sharing your code and paper.
I read your paper, used your code, and was impressed.
As I read the paper, there are experiments for two cases.
But as you know, unfortunately real data is more imbalanced and challenging.
So here's my question:
Do you think this LDAM-DRW loss would also work on such a dataset?
I'm doing experiments varying the betas, delta_j (m_list), and so on.
I would very much appreciate it if you could answer me!
Ty so much kaidic :)
Thank you for your great work!
I find the backbone in your code isn't the standard ResNet; it is quite different from the usual one.
I tried to replace the resnet32 mentioned in the paper with resnet34, but the loss cannot converge and eventually turns to NaN.
This is the bash command I tried (resnet32 has been replaced with the resnet34 implemented in torchvision):
python cifar_train.py --arch resnet32 --gpu 0 --imb_type exp --imb_factor 0.01 --loss_type LDAM --train_rule DRW
Could you please provide further explanation?
Hello, thanks for the paper and the code. I just want to confirm in the code snip:
elif args.train_rule == 'DRW':
    train_sampler = None
    idx = epoch // 160
    betas = [0, 0.9999]
    # when epoch < 160, beta=0 so effective_num=1 (no reweighting);
    # when epoch >= 160, reweight with beta=0.9999
    effective_num = 1.0 - np.power(betas[idx], cls_num_list)
    per_cls_weights = (1.0 - betas[idx]) / np.array(effective_num)
    per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list)
    per_cls_weights = torch.FloatTensor(per_cls_weights).cuda(args.gpu)
This is the Class-Balanced implementation (slightly different from the inverse-frequency reweighting reported in the paper). Any reason to select beta = 0.9999?
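Just to confirm the comment in the snippet (checking the arithmetic, not speaking for the authors): with beta = 0 the effective number is 1 - 0^n = 1 for every class, so after normalization each class weight is exactly 1, i.e. plain unweighted CE for the first 160 epochs.

```python
import numpy as np

cls_num_list = [5000, 500, 50]  # example counts
beta = 0.0                      # the value used while epoch < 160

effective_num = 1.0 - np.power(beta, cls_num_list)  # all ones
per_cls_weights = (1.0 - beta) / np.array(effective_num)
per_cls_weights = per_cls_weights / np.sum(per_cls_weights) * len(cls_num_list)
# per_cls_weights == [1., 1., 1.]  ->  no reweighting
```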