kjunelee / metaoptnet

Meta-Learning with Differentiable Convex Optimization (CVPR 2019 Oral)

License: Apache License 2.0

Python 100.00%

meta-learning few-shot convex-optimization few-shot-learning image-classification few-shot-recognition cvpr2019 metalearning

metaoptnet's Introduction

Meta-Learning with Differentiable Convex Optimization

This repository contains the code for the paper:
Meta-Learning with Differentiable Convex Optimization
Kwonjoon Lee, Subhransu Maji, Avinash Ravichandran, Stefano Soatto
CVPR 2019 (Oral)

Abstract

Many meta-learning approaches for few-shot learning rely on simple base learners such as nearest-neighbor classifiers. However, even in the few-shot regime, discriminatively trained linear predictors can offer better generalization. We propose to use these predictors as base learners to learn representations for few-shot learning and show they offer better tradeoffs between feature size and performance across a range of few-shot recognition benchmarks. Our objective is to learn feature embeddings that generalize well under a linear classification rule for novel categories. To efficiently solve the objective, we exploit two properties of linear classifiers: implicit differentiation of the optimality conditions of the convex problem and the dual formulation of the optimization problem. This allows us to use high-dimensional embeddings with improved generalization at a modest increase in computational overhead. Our approach, named MetaOptNet, achieves state-of-the-art performance on miniImageNet, tieredImageNet, CIFAR-FS and FC100 few-shot learning benchmarks.

Citation

If you use this code for your research, please cite our paper:

@inproceedings{lee2019meta,
  title={Meta-Learning with Differentiable Convex Optimization},
  author={Kwonjoon Lee and Subhransu Maji and Avinash Ravichandran and Stefano Soatto},
  booktitle={CVPR},
  year={2019}
}

Dependencies

Python 2.7+ (not tested on Python 3)
PyTorch 0.4.0+
qpth 0.0.11+
tqdm

Usage

Installation

Clone this repository:

git clone https://github.com/kjunelee/MetaOptNet.git
cd MetaOptNet

Download and decompress dataset files: miniImageNet (courtesy of Spyros Gidaris), tieredImageNet, FC100, CIFAR-FS
For each dataset loader, specify the path to the directory. For example, in MetaOptNet/data/mini_imagenet.py line 30:
```
_MINI_IMAGENET_DATASET_DIR = 'path/to/miniImageNet'
```

Meta-training

To train MetaOptNet-SVM on 5-way miniImageNet benchmark:
```
python train.py --gpu 0,1,2,3 --save-path "./experiments/miniImageNet_MetaOptNet_SVM" --train-shot 15 \
--head SVM --network ResNet --dataset miniImageNet --eps 0.1
```
As shown in Figure 2 of our paper, we can meta-train the embedding once with a high shot for all meta-testing shots. Unlike Prototypical Networks, we don't need to meta-train separately for each possible meta-test shot.
You can experiment with different base learners by changing the '--head' argument to ProtoNet or Ridge. You can also change the backbone architecture to a vanilla 4-layer conv net by setting the '--network' argument to ProtoNet. For other arguments, please see MetaOptNet/train.py, lines 85 to 114.

To train MetaOptNet-SVM on 5-way tieredImageNet benchmark:

python train.py --gpu 0,1,2,3 --save-path "./experiments/tieredImageNet_MetaOptNet_SVM" --train-shot 10 \
--head SVM --network ResNet --dataset tieredImageNet

To train MetaOptNet-RR on 5-way CIFAR-FS benchmark:

python train.py --gpu 0 --save-path "./experiments/CIFAR_FS_MetaOptNet_RR" --train-shot 5 \
--head Ridge --network ResNet --dataset CIFAR_FS

To train MetaOptNet-RR on 5-way FC100 benchmark:

python train.py --gpu 0 --save-path "./experiments/FC100_MetaOptNet_RR" --train-shot 15 \
--head Ridge --network ResNet --dataset FC100

Meta-testing

To test MetaOptNet-SVM on 5-way miniImageNet 1-shot benchmark:

python test.py --gpu 0,1,2,3 --load ./experiments/miniImageNet_MetaOptNet_SVM/best_model.pth --episode 1000 \
--way 5 --shot 1 --query 15 --head SVM --network ResNet --dataset miniImageNet

Similarly, to test MetaOptNet-SVM on 5-way miniImageNet 5-shot benchmark:

python test.py --gpu 0,1,2,3 --load ./experiments/miniImageNet_MetaOptNet_SVM/best_model.pth --episode 1000 \
--way 5 --shot 5 --query 15 --head SVM --network ResNet --dataset miniImageNet

Acknowledgments

This code is based on the implementations of Prototypical Networks, Dynamic Few-Shot Visual Learning without Forgetting, and DropBlock.


metaoptnet's Issues

Pretrained model

Hi,

Thanks for sharing the source code!
I'm wondering if you could provide the model file used to report the numbers in the paper. In particular, I would like to ask for the model file "MetaOptNet-SVM-trainval" evaluated on 5-shot mini-ImageNet classification (reported as 80.00 ± 0.45%). Thanks!

Parameter config for CIFAR-FS: the accuracy is only 63%

Could you tell me the config for ProtoNet on CIFAR-FS (backbone is ResNet-12, 5-way 1-shot)? I used the default config to train, and the accuracy is just 63%, much lower than the 72%.

qpth Error: Cannot perform LU factorization on Q.

Pytorch 1.0.1. Still have this problem with RTX2080ti.

How can I use this code to predict on a custom dataset without ground-truth labels?

Thank you for your great work. I wonder if you can help me figure out how to run prediction on a dataset without ground-truth labels. The label space of this dataset might be the same as that of the training dataset. Thanks.

OOM issue

Continuing from the previous issue:
I have temporarily modified line 45 of train.py to
network = torch.nn.DataParallel(network, device_ids=[0])

and encountered this OOM issue:
(metaopnet) :~/MetaOptNet$ python train.py --gpu 0 --save-path "./experiments/miniImageNet_MetaOptNet_SVM" --train-shot 15 --head SVM --network ResNet --dataset miniImageNet --eps 0.1
Loading mini ImageNet dataset - phase train
Loading mini ImageNet dataset - phase val
('using gpu:', '0')
{'episodes_per_batch': 8, 'head': 'SVM', 'val_query': 15, 'test_way': 5, 'train_way': 5, 'eps': 0.1, 'save_epoch': 10, 'val_episode': 2000, 'num_epoch': 60, 'train_query': 6, 'save_path': './experiments/miniImageNet_MetaOptNet_SVM', 'train_shot': 15, 'val_shot': 5, 'gpu': '0', 'dataset': 'miniImageNet', 'network': 'ResNet'}
Train Epoch: 1 Learning Rate: 0.1000
0%| | 0/1000 [00:00<?, ?it/s]THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Exception KeyError: KeyError(<weakref at 0x7fd4c4687db8; to 'tqdm' at 0x7fd4a07cdc90>,) in <bound method tqdm.del of 0%| | 0/1000 [00:01<?, ?it/s]> ignored
Traceback (most recent call last):
File "train.py", line 201, in
emb_support = embedding_net(data_support.reshape([-1] + list(data_support.shape[-3:])))
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 112, in forward
return self.module(*inputs[0], **kwargs[0])
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/MetaOptNet/models/ResNet12_embedding.py", line 113, in forward
x = self.layer1(x)
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/modules/container.py", line 91, in forward
input = module(input)
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/MetaOptNet/models/ResNet12_embedding.py", line 51, in forward
out = self.bn3(out)
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in call
result = self.forward(*input, **kwargs)
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/modules/batchnorm.py", line 49, in forward
self.training or not self.track_running_stats, self.momentum, self.eps)
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/functional.py", line 1194, in batch_norm
training, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1524577523076/work/aten/src/THC/generic/THCStorage.cu:58

Error in the SVM Head

Hi,
I'm trying to run your code with the SVM-CS head. I get an error at line 396 of the "classification_heads.py" file. The error is as follows:

File "/home/user/Desktop/MetaLearning/models/classification_heads.py", line 396, in MetaOptNetHead_SVM_CS
qp_sol = QPFunction(verbose=False, maxIter=maxIter)(G, e.detach(), C.detach(), h.detach(), A.detach(), b.detach())
File "/home/user/.local/lib/python2.7/site-packages/qpth/qp.py", line 91, in forward
self.Q_LU, self.S_LU, self.R = pdipm_b.pre_factor_kkt(Q, G, A)
File "/home/user/.local/lib/python2.7/site-packages/qpth/solvers/pdipm/batch.py", line 399, in pre_factor_kkt
G_invQ_GT = torch.bmm(G, G.transpose(1, 2).btrisolve(*Q_LU))
TypeError: btrisolve() takes exactly 3 arguments (4 given)

I can't figure out how to fix this issue. I'm running the code in Python2 with the same package versions.
Thank you for your help,
Micah

Why dual formulation in Bertinetto's regression solver?

Hi, I'm new to this. I read both your paper and Bertinetto's. In their paper, the closed-form solver is derived from classical regression or Newton's method, and no dual form is needed. But in your paper, their solution is given in a dual form. Does it involve some hidden steps? Thank you!
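
For readers comparing the two derivations, the following standard linear-algebra identity may help. It is only a sketch in generic notation: the support embeddings $X \in \mathbb{R}^{n \times d}$, the one-hot labels $Y \in \mathbb{R}^{n \times k}$ and the regularizer $\lambda > 0$ are assumed symbols, not necessarily the notation of either paper. It shows that the closed-form ridge solution can be written with either a $d \times d$ or an $n \times n$ inverse:

```latex
\begin{aligned}
W^\star &= \arg\min_W \; \|XW - Y\|_F^2 + \lambda \|W\|_F^2 \\
        &= (X^\top X + \lambda I_d)^{-1} X^\top Y
           && \text{(primal form, } d \times d \text{ system)} \\
        &= X^\top (X X^\top + \lambda I_n)^{-1} Y
           && \text{(dual form, } n \times n \text{ system)}
\end{aligned}
```

The two forms agree because $X^\top (X X^\top + \lambda I_n) = (X^\top X + \lambda I_d) X^\top$. In a few-shot episode the number of support examples $n$ is usually much smaller than the embedding dimension $d$, so the second form solves a much smaller linear system.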

Parameters for ProtoNet using ResNet12 as backbone

Hello, I was trying to replicate the results reported in the paper but failed. Could you share the parameter settings you used with ResNet12 as the backbone and the ProtoNet head as the classifier? I am trying to replicate the reported accuracy of 75.6% for 5-way 5-shot on miniImageNet. Thanks!

def MetaOptNetHead_Ridge()

Your work is really excellent. Could you give me a simple analysis of the function MetaOptNetHead_Ridge in classification_heads.py? Thank you.

About "TypeError: btrisolve() takes 3 positional arguments but 4 were given"

Hello, thank you very much for your contribution to this field. I ran into the following error when reproducing your results: "TypeError: btrisolve() takes 3 positional arguments but 4 were given". I downloaded your code and ran it as is, using only the MetaOptNet-RR training command for the CIFAR-FS benchmark. Looking forward to your reply. Thank you!

Why is the accuracy of the Prototypical Network higher than the one reported in the paper?

When I use your Prototypical Network for miniImageNet classification, the accuracy I obtain is higher than that reported in the paper. Specifically, I use

python train.py --gpu 0 --save_path 'Logs/miniImageNet_Proto_5w_1s' --train_way 5 --val_way 5 --train_shot 1 --val_shot 1 --train_query 6 --val_query 15 --head ProtoNet --network ProtoNet --dataset miniImageNet

The accuracy I get is 52.47 ± 0.67, which is higher than the 49.42 ± 0.78 reported in the paper.
Then I use

python train.py --gpu 0 --save_path 'Logs/miniImageNet_Proto_5w_5s' --train_way 5 --val_way 5 --train_shot 5 --val_shot 5 --train_query 6 --val_query 15 --head ProtoNet --network ProtoNet --dataset miniImageNet

The accuracy I get is 70.65 ± 0.52, which is higher than the 68.20 ± 0.66 reported in the paper.
Could you explain the reason? Is it related to data reading, or is it due to some standardization techniques?

Couldn't repeat your results [Instance Norm/Batch Norm]

Hello,
I've been trying to repeat your results for a bachelor project where I will be applying your algorithm in different meta datasets.

I obtained lower accuracies, by about 5 to 10%, on each meta-dataset you used (except FC100, where my accuracy was relatively close).
I made some changes to your code to fit my GPU memory (lowered the number of episodes per batch to 4).
My first guess was that this mainly affects Batch Normalization in the ResNet, so I switched to Instance Normalization instead, but the results were even lower.

Would you help me understand why?
Other than that, would you have any suggestions of meta-datasets we could use?

Thank you in advance.
Lucas Tonon Rodrigues.

Reproducibility of ProtoNet on mini-ImageNet

I am currently trying to reproduce results on mini-ImageNet using ProtoNet based on your code.

However, when I run the below script for training ProtoNet, it achieves 57% on the test set, but the reported accuracy is 59%.

python train.py --gpu 0,1,2,3 --save-path "./experiments/miniImageNet_MetaOptNet_ProtoNet" --train-shot 1 --val-shot 1 \
--head ProtoNet --network ResNet --dataset miniImageNet --eps 0.1

I wonder how to achieve the reported accuracy.

If you don't mind, can you tell me how you train ProtoNet on mini-ImageNet?

Loading qpth fails on Enum

Hi, thanks for sharing your code!

It looks like qpth uses the enum module which, as I understand it, does not exist in python 2.7, and running train.py fails with the following error

Traceback (most recent call last):
  File "train.py", line 12, in <module>
    from models.classification_heads import ClassificationHead
  File "/misc/cephfs/home/smarts/ian/MetaOptNet/models/classification_heads.py", line 7, in <module>
    from qpth.qp import QPFunction
  File "/opt/smarts/envs/py27_pytorch/lib/python2.7/site-packages/qpth/__init__.py", line 3, in <module>
    from . import qp
  File "/opt/smarts/envs/py27_pytorch/lib/python2.7/site-packages/qpth/qp.py", line 6, in <module>
    from .solvers.pdipm import batch as pdipm_b
  File "/opt/smarts/envs/py27_pytorch/lib/python2.7/site-packages/qpth/solvers/pdipm/batch.py", line 2, in <module>
    from enum import Enum
ImportError: No module named enum

Can you explain how you're loading this in a python 2.7 environment? Thanks again!

question about "--episodes-per-batch"

Hi,

Thanks for your excellent work. Due to the limitations of my GPUs, I couldn't set "--episodes-per-batch" to 8 as you did in your paper. Instead I set it to 2 and used only one GPU to run your code. However, the accuracy I achieved for miniImageNet 5-way 1-shot is 59%, which is much lower than your reported result. Could you please tell me why "--episodes-per-batch" influences the result so significantly?

Thanks

What is the difference between novel categories and base categories?

RuntimeError with qpth

Hello,

I've been trying to run some MetaOptNet experiments. I've created a conda environment with the packages you list as required. However, when I try to train MetaOptNet with SVM head, I'm getting the error:

"RuntimeError: Incompatible matrix sizes for lu_solve: each A matrix is 25 by 25 but each b matrix is 8 by 25"

I've tried using environments with different versions of python, but can't seem to get around this issue. I'm at a loss. Have you run into this issue before? Any advice would be greatly appreciated.

Data type problem

Thank you for your code.
I ran your program and encountered this problem, as shown in the figure. I can't solve it for now. I am using python3.6.

some questions about the paper and code

Thank you for sharing your paper and code.
After reading them, I have some questions. The paper states: ‘For SVM and ridge regression, we observe that keeping meta-training shot higher than meta-testing shot leads to better test accuracies as shown in Figure 2.’

Does this mean that the number of training samples during meta-training differs from the number during meta-testing? I would have thought that, to measure generalization properly, these two parameters should be consistent. In addition, when I studied the code, I was confused by ‘train-shot’, ‘val-shot’, ‘train-query’ and ‘val-query’. I hope you can explain these.
I look forward to your reply.
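
As background for the terminology in this question, here is a minimal sketch of how an N-way K-shot episode is usually sampled. It is not taken from the repository's data loader: the function name `sample_episode` and the `images_by_class` dictionary are hypothetical. "Way" is the number of classes per episode, "shot" the number of labeled support examples per class, and "query" the number of held-out examples per class that the base learner is scored on. Under this reading, the train-* flags would describe meta-training episodes and the val-* flags validation episodes.

```python
import random


def sample_episode(images_by_class, n_way, n_shot, n_query, rng):
    """Sample one few-shot episode (hypothetical helper, not the repo's loader).

    images_by_class maps a class name to a list of examples.
    Returns (support, query), each a list of (example, episode_label) pairs.
    """
    # Pick n_way distinct classes; labels are re-indexed 0..n_way-1 per episode.
    classes = rng.sample(sorted(images_by_class), n_way)
    support, query = [], []
    for label, cls in enumerate(classes):
        # Draw disjoint support and query examples from the same class.
        picked = rng.sample(images_by_class[cls], n_shot + n_query)
        support += [(img, label) for img in picked[:n_shot]]
        query += [(img, label) for img in picked[n_shot:]]
    return support, query


# Toy data: 10 classes with 30 dummy "images" each.
toy = {"class%d" % c: ["img_%d_%d" % (c, i) for i in range(30)] for c in range(10)}
support, query = sample_episode(toy, n_way=5, n_shot=15, n_query=6,
                                rng=random.Random(0))
```

With the values used in the README training command (5-way, train-shot 15) and the train_query of 6 shown in the logs, each episode in this sketch has 75 support and 30 query examples.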

Does the performance of different SVM heads vary largely?

qpth error

When I run the code using "python train.py --gpu 0 --save-path "./experiments/CIFAR_FS_MetaOptNet_RR" --train-shot 5 --head Ridge --network ResNet --dataset CIFAR_FS", I get the following error:

RuntimeError:
qpth Error: Cannot perform LU factorization on Q.
Please make sure that your Q matrix is PSD and has
a non-zero diagonal.

How can I solve this error?

Protonet re-implementation details

Hi,

Thanks for the detailed documentation -- It was very helpful!
I have a question regarding the re-implementation of Protonet with the Resnet-12 backbone. How many ways were used to train for both 5-shot and 1-shot, and was the same number used for all datasets? Also, was label smoothing applied for the Protonet experiments too (and again, was it done for all datasets)?

Thanks,
Amrith
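
For context on the label-smoothing question, here is a minimal sketch of one common form of label smoothing for an n-way episode. The function name `smooth_one_hot` is hypothetical, and whether train.py applies exactly this form (or applies it for every head and dataset) is an assumption that should be checked against the source; `eps` plays the role of the `--eps` argument from the README command.

```python
def smooth_one_hot(label, n_way, eps):
    """Return a smoothed one-hot target for an n-way episode (illustrative).

    The true class gets 1 - eps; the remaining eps is spread evenly over
    the other n_way - 1 classes, so the target still sums to 1.
    """
    off_value = eps / (n_way - 1)
    return [1.0 - eps if c == label else off_value for c in range(n_way)]


# Example with the README setting --eps 0.1 in a 5-way episode.
target = smooth_one_hot(label=2, n_way=5, eps=0.1)
```

Setting `eps` to 0 recovers the ordinary one-hot target, which is one way to compare runs with and without smoothing.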

Formula (11) in your paper.

Is ridge regression not linear regression plus L2 loss? Is this formula also ridge regression?

Is Label smoothing for SVM only used in miniImageNet?

I am sorry to ask so many questions. The papers I will follow will definitely include and reference MetaOptNet.

Why the SVM implemented by yourself is better than that in sklearn

Some external libraries, e.g. sklearn, have already implemented the SVM algorithm. I think sklearn's SVM is also implemented through the dual form. Why don't you just use the SVM in sklearn? What are the advantages of your own SVM implementation, or what is special about it?

Could you tell me the link for miniImageNet_category_split_train_phase_train.pickle?

The number of val-shot choice.

Same as before: thanks for your code! I think it is really great work.
The test epoch is chosen based on the accuracy on the val set, right? I want to ask whether we need to use the 5-shot val to choose the model for the 5-shot test and the 1-shot val to choose the model for the 1-shot test, or just use the 5-shot val to choose the model for both the 5-shot and 1-shot tests.

I ask because when I use the 5-shot val to choose the model for the 1-shot test, I can never reach the accuracy in your paper.

qpth warning: Returning an inaccurate and potentially incorrect solutino.

Some residual is large.
Your problem may be infeasible or difficult.

You can try using the CVXPY solver to see if your problem is feasible
and you can use the verbose option to check the convergence status of
our solver while increasing the number of iterations.

Advanced users:
You can also try to enable iterative refinement in the solver:
locuslab/qpth#6

Hi, could you tell me, is this normal? Why does this happen? Thank you.

Error when the network is ResNet and the head is SVM

When the head function MetaOptNetHead_SVM_CS runs, it calls QPFunction, and that call raises: TypeError: btrisolve() takes 3 positional arguments but 4 were given.
All other options are the defaults, except network and head.

Some questions about MetaOptNetHead_Ridge

I have read your code, and I think there may be something wrong with the function 'MetaOptNetHead_Ridge' in classification_heads.py. I think it should be

                    e = -1.0 * support_labels_one_hot

not

                   e = -2.0 * support_labels_one_hot

Also, equation (11) should be a minimization, not a maximization. Am I right?
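
As a generic illustration of why a factor of 1 or 2 on the linear term can both be valid, consider a regularized least-squares objective written in the form a QP solver expects. The symbols $X$, $y$, $\lambda$ and $w$ below are assumptions for the sketch and are not taken from classification_heads.py; the point is only that the linear coefficient depends on how the quadratic term is scaled:

```latex
\begin{aligned}
\|Xw - y\|^2 + \lambda \|w\|^2
  &= w^\top (X^\top X + \lambda I)\, w \;-\; 2\, y^\top X w \;+\; y^\top y \\
  &= \tfrac{1}{2}\, w^\top \underbrace{2\,(X^\top X + \lambda I)}_{Q}\, w
     \;+\; \underbrace{(-2\, X^\top y)}_{p}{}^{\!\top} w \;+\; \text{const} \\[4pt]
\arg\min_w \; \tfrac{1}{2}\, w^\top Q\, w + p^\top w
  &= \arg\min_w \; \tfrac{1}{2}\, w^\top (X^\top X + \lambda I)\, w - (X^\top y)^\top w
\end{aligned}
```

Dividing the objective by a positive constant does not change the minimizer, so a linear term of $-2 X^\top y$ paired with $Q = 2(X^\top X + \lambda I)$ and a linear term of $-X^\top y$ paired with $Q = X^\top X + \lambda I$ describe the same problem. Likewise, maximizing an objective $f$ is the same as minimizing $-f$, so a "maximize" in a paper and a "minimize" in a solver call are not necessarily in conflict. Checking which scaling the code uses requires comparing the quadratic matrix and the linear term passed to QPFunction together.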

Can this code be used for MetaOptNet-SVM-trainval?

I don't see any command in the code that can train MetaOptNet-SVM-trainval. Can this code be used for it?

about input data (pickle files)

Thanks for sharing the code.

I noticed that you use "miniImageNet_category_split_train_phase_val.pickle", where this file consists of 300 images of train class. How did you choose the images from the train class?
And, what is the difference between "miniImageNet_category_split_train_phase_val.pickle" and 'miniImageNet_category_split_train_phase_test.pickle'?

The results on miniImageNet

Thanks for your code! It is a really good job.

I have tried almost all the experiments in the paper, and most of them reach the accuracy reported in the paper, except for the results on miniImageNet.

The results of MetaOptNet-RR and MetaOptNet-SVM with and without label smoothing are close to 60.57 ± 0.44 (1shot) and 77.44 ± 0.33 (5shot).

Embedding: ResNet-12 without a global average pooling after the last residual block.
Drop_rate: 0.1, use DropBlock in last two resnet blocks and dropblock size is 5.
training shot: 15
training query: 6
nesterov momentum: 0.9
weight decay: 0.0005
mini-batch: 8
each epoch: 1000 episodes. (Here is 8000 tasks for each epoch, right?)
initial learning rate: 0.1
learning rate: changed to 0.006, 0.0012, and 0.00024 at epochs 20, 40 and 50, respectively.
label smoothing: 0.1
C of SVM: 0.1
Regularization of Ridge regression: 50.0
Iteration of QP solver: 15 (training), 3 (testing).

ProtoNets with Resnet-12 can get a result close to that in your paper if we don't use label smoothing. Is there something that I am missing in MetaOptNet-RR and MetaOptNet-SVM? And is label smoothing used in MetaOptNet-RR, MetaOptNet-SVM and ProtoNets?
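
The learning-rate schedule listed above can be written as a small piecewise-constant function. This is a sketch of the settings as quoted in this issue, not a copy of train.py: the names `INITIAL_LR`, `MILESTONES` and `learning_rate` are hypothetical, and it assumes each new rate takes effect from the stated epoch index onward.

```python
INITIAL_LR = 0.1
# (first epoch at which the rate applies, learning rate), as quoted in the issue.
MILESTONES = [(20, 0.006), (40, 0.0012), (50, 0.00024)]


def learning_rate(epoch):
    """Piecewise-constant schedule; the epoch indexing convention is an assumption."""
    lr = INITIAL_LR
    for start_epoch, value in MILESTONES:
        if epoch >= start_epoch:
            lr = value
    return lr


# Rates over the 60 training epochs shown in the logs (num_epoch = 60).
schedule = [learning_rate(e) for e in range(60)]
```

If the framework counts epochs from 1 rather than 0, the boundaries shift by one epoch, which is worth checking when comparing against a reference run.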

About accuracy in CIFAR_FS 5-way 5-shot and how to implement MetaOptNet-SVM-trainval

I like your work very much and it has given us a lot of inspiration.
But when I ran it on CIFAR_FS 5-way 5-shot, I got 83.89 ± 0.51% accuracy. I think the reason may be that I used MetaOptNet-SVM rather than MetaOptNet-SVM-trainval, so I would like to know how to implement MetaOptNet-SVM-trainval. Thank you very much.

Reproduced results on miniImagenet with prototypical network with 4-layer network

Hi, many thanks for the sharing of the code.
I reproduced the results of the prototypical network (4-layer network)on miniImagenet, under python=2.7 with following packages.
torch==1.0.0.

I trained with only train dataset, and the best model is selected on the validation set.
I tested the best model on the test data, the results are:
Accuracy: (1-shot) 43.79 ± 0.65 --- your paper: ( 53.47 )

I am wondering if the difference is reasonable in this task? I am also wondering if the difference is due to my running environments.

Many thanks!!

ResNet-12 channels is different to TADAM

Hello,

Thanks for your impressive work and sharing the code.

I am having a question about the ResNet-12 structure, that in the paper

We use a ResNet-12 network following [20, 18] in our experiments

However, in TADAM[20], they said

The number of filters for the first
ResNet block was set to 64 and it was doubled after each max-pool block

That is, TADAM's ResNet-12 channels are 64,128,256,512, while in this code it is 64,160,320,640 channels per layer.

I am planning to study few-shot learning, and I am confused about which backbone I should choose for a fair comparison. Is my understanding correct, and what led to this design choice for the model?

Thank you!

Where is the parameter gamma

In Equation (12) of your paper, you mentioned the learnable parameter 'gamma', which I couldn't find in your code. Could you point out the line of code?

TypeError: init() got an unexpected keyword argument 'dropblock_size'

Thanks for your help in previous issue (Sorry for having to create a new issue due to "you can't comment at this time." issue @github, and reported)

I can begin miniImageNet (without changing any original settings): but getting an issue:
TypeError: init() got an unexpected keyword argument 'dropblock_size'
Could you help check this issue? Thanks!

The detailed log is below (you can see my Python version is 2.7.16):
(metaopnet) /MetaOptNet$ python -V
Python 2.7.16 :: Anaconda, Inc.
(metaopnet) /MetaOptNet$ python train.py --gpu 0 --save-path "./experiments/miniImageNet_MetaOptNet_SVM" --train-shot 15 --head SVM --network ResNet --dataset miniImageNet --eps 0.1
Loading mini ImageNet dataset - phase train
Loading mini ImageNet dataset - phase val
('using gpu:', '0')
{'episodes_per_batch': 8, 'head': 'SVM', 'val_query': 15, 'test_way': 5, 'train_way': 5, 'eps': 0.1, 'save_epoch': 10, 'val_episode': 2000, 'num_epoch': 60, 'train_query': 6, 'save_path': './experiments/miniImageNet_MetaOptNet_SVM', 'train_shot': 15, 'val_shot': 5, 'gpu': '0', 'dataset': 'miniImageNet', 'network': 'ResNet'}
Traceback (most recent call last):
File "train.py", line 164, in
(embedding_net, cls_head) = get_model(opt)
File "train.py", line 44, in get_model
network = resnet12(avg_pool=False, drop_rate=0.1, dropblock_size=5).cuda()
File "/MetaOptNet/models/ResNet12_embedding.py", line 124, in resnet12
model = ResNet(BasicBlock, keep_prob=keep_prob, avg_pool=avg_pool, **kwargs)
TypeError: init() got an unexpected keyword argument 'dropblock_size'

val and test set in FC100

Thanks for your code! I think it is really great work. But I have a problem with FC100. During training, the training loss behaves as expected and gradually decreases as the number of epochs increases. However, the accuracy on the val and test sets remains constant at around 50%.
Here are my package versions:
Python 3.6.3
torch 1.0.1.post2
torchnet 0.0.4
qpth 0.0.13

Questions on layer1&2 's dropblock

Hi there,

Thanks for the work!

I am a bit confused about some details of DropBlock. Please correct me if I misunderstand something.
As mentioned in the paper, both for ImageNet and CIFAR dataset, the layer1 and layer2's setting of dropblock is DB(0.9, 1) which means that in the code the parameters should be drop_rate=0.1, drop_size=1;

However when I looked through the code, the initialization of layer1 and layer2 serves as:
self.layer1 = self._make_layer(block, 64, stride=2, drop_rate=drop_rate);
Although the default drop_size=1, the default drop_block=False, which means that if initialized as this, with the code:
if self.drop_rate > 0:
    if self.drop_block == True:
        feat_size = out.size()[2]
        keep_rate = max(1.0 - self.drop_rate / (20 * 2000) * (self.num_batches_tracked), 1.0 - self.drop_rate)
        gamma = (1 - keep_rate) / self.block_size ** 2 * feat_size ** 2 / (feat_size - self.block_size + 1) ** 2
        out = self.DropBlock(out, gamma=gamma)
    else:
        out = F.dropout(out, p=self.drop_rate, training=self.training, inplace=True)

So dropout is actually used rather than DropBlock. I am wondering whether this is correct and consistent with what is described in the paper?

Thanks for your time!
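
To make the schedule in the quoted snippet easier to inspect, the two formulas can be isolated in a standalone function. This sketch copies the arithmetic from the lines quoted above; the function name `dropblock_gamma` and the example feature-map size of 10 are hypothetical choices for illustration.

```python
def dropblock_gamma(num_batches_tracked, drop_rate, block_size, feat_size):
    """Reproduce the keep_rate and gamma arithmetic from the quoted snippet."""
    # keep_rate decays linearly from 1.0 and is clamped at 1 - drop_rate
    # once 20 * 2000 batches have been tracked.
    keep_rate = max(1.0 - drop_rate / (20 * 2000) * num_batches_tracked,
                    1.0 - drop_rate)
    # gamma rescales the drop probability for the block size and for the
    # number of valid block positions in a feat_size x feat_size map.
    gamma = ((1 - keep_rate) / block_size ** 2
             * feat_size ** 2 / (feat_size - block_size + 1) ** 2)
    return keep_rate, gamma


# At the first batch nothing is dropped; after 40000 batches the rate is 0.1.
start = dropblock_gamma(0, drop_rate=0.1, block_size=5, feat_size=10)
end = dropblock_gamma(40000, drop_rate=0.1, block_size=5, feat_size=10)
```

Because both values depend on `num_batches_tracked`, a counter that never advances keeps `keep_rate` near 1.0 and `gamma` near 0, so almost nothing is dropped.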

Accuracy of CIFAR-FS

Hi, I downloaded your code and ran it on CIFAR-FS with this command:
python train.py --gpu 0 --save-path "./experiments/CIFAR_FS_MetaOptNet_RR" --train-shot 5 \
--head Ridge --network ResNet --dataset CIFAR_FS

but I got 'best accuracy' of 78%. What can I do in order to get 84%?

Thank you!
Sivan

Keep-rate scheduling of DropBlock in a multi-GPU environment

Hello,
I found an issue while trying to train your model.
In your code, the variable 'self.num_batches_tracked' should count the progress of the episode by increasing when the model is called.
But in the multi-GPU environment, the modification of the variable in the forward() is ignored because a DataParallel replicates the model into each GPU and the updates are destroyed after forward(). So the variable just moves up and down with 0 and 1.
I think this should be fixed. Thanks :)

some questions

Thank you for sharing the codebase

In your material: Download and decompress dataset files: miniImageNet << this link does not work for now.
miniImageNet_category_split_train_phase_train.pickle seem to be needed (maybe it can be downloaded in that link of question 1?)

device_ids

Thanks for the quick reply for the previous issue, you may close it.

Another issue is the device_ids seem to be hardcoded, I wonder if my observation is correct.

In train.py line 45:
network = torch.nn.DataParallel(network, device_ids=[0, 1, 2, 3])

python train.py --gpu 0 --save-path "./experiments/miniImageNet_MetaOptNet_SVM" --train-shot 15 --head SVM --network ResNet --dataset miniImageNet --eps 0.1

Traceback (most recent call last):
File "train.py", line 164, in
(embedding_net, cls_head) = get_model(opt)
File "train.py", line 45, in get_model
network = torch.nn.DataParallel(network, device_ids=[0, 1, 2, 3])
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 102, in init
_check_balance(self.device_ids)
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 17, in _check_balance
dev_props = [torch.cuda.get_device_properties(i) for i in device_ids]
File "/home/xxx/anaconda3/envs/metaopnet/lib/python2.7/site-packages/torch/cuda/init.py", line 292, in get_device_properties
raise AssertionError("Invalid device id")
AssertionError: Invalid device id
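
One possible workaround is to derive the device list from the `--gpu` string instead of hardcoding it. The sketch below assumes that the script exports that string as CUDA_VISIBLE_DEVICES (an assumption about train.py, suggested by the 'using gpu:' log line); in that case CUDA renumbers the visible devices starting from 0, so only the count matters. The helper name `device_ids_from_gpu_arg` is hypothetical.

```python
def device_ids_from_gpu_arg(gpu_arg):
    """Map a '--gpu' string such as '0' or '2,3' to DataParallel device ids.

    Assumes the same string was exported as CUDA_VISIBLE_DEVICES, so the
    visible devices are renumbered 0..n-1 regardless of their physical ids.
    """
    visible = [part.strip() for part in gpu_arg.split(",") if part.strip()]
    return list(range(len(visible)))


# Hypothetical use in get_model (torch is not imported in this sketch):
#   network = torch.nn.DataParallel(
#       network, device_ids=device_ids_from_gpu_arg(opt.gpu))
single = device_ids_from_gpu_arg("0")
pair = device_ids_from_gpu_arg("2,3")
```

With `--gpu 0` this yields `[0]`, which matches the manual edit described in the OOM issue above, and with four GPUs it reproduces the original hardcoded list.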

Meta gradient Computation

Hi,

Thanks for making your implementation public! Really loved your work :) I wasn't able to find the exact function which returned the meta gradient (computed using the implicit function theorem). Specifically I was looking for the Jacobian computation (involved in the inverse in Thm. 1 of the paper). Could you please point me to the correct function for this?

Thanks,
Amrith

Reproduced results on miniImagenet

Hi, many thanks for the sharing of the code.
I reproduced the results on miniImagenet, under python=3.7.2 with following packages.

torch==1.0.0.dev20190112
torchfile==0.1.0
torchnet==0.0.4
torchvision==0.2.1
tornado==6.0.2
tqdm==4.31.1
cvxpy==1.0.22

I trained with only train dataset, and the best model is selected on the validation set.
I tested the best model on the test data, the results are:
Accuracy: (1-shot) 61.34 ± 0.65 --- your paper: ( 62.64 ± 0.61 )
(5-shot) 77.95 ± 0.47 -- your paper: (78.63 ± 0.46)

I am wondering if the difference is reasonable in this task? I am also wondering if the difference is due to my running environments.

Many thanks!!

TypeError: btrisolve() takes 3 positional arguments but 4 were given

Loading mini ImageNet dataset - phase train
Loading mini ImageNet dataset - phase val
using gpu: 0,1,2,3
{'num_epoch': 60, 'save_epoch': 10, 'train_shot': 5, 'val_shot': 5, 'train_query': 6, 'val_episode': 2000, 'val_query': 15, 'train_way': 5, 'test_way': 5, 'save_path': './experiments/miniImageNet_MetaOptNet_SVM', 'gpu': '0,1,2,3', 'network': 'ResNet', 'head': 'SVM', 'dataset': 'miniImageNet', 'episodes_per_batch': 8, 'eps': 0.1}
Train Epoch: 1 Learning Rate: 0.1000
0%| | 0/1000 [00:00<?, ?it/s]/usr/local/python3/lib/python3.6/site-packages/qpth/solvers/pdipm/batch.py:14: UserWarning: torch.btrifact is deprecated in favour of torch.lu and will be removed in the next release. Please use torch.lu instead.
return x.btrifact(pivot=not x.is_cuda)
0%| | 0/1000 [00:08<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 207, in <module>
    logit_query = cls_head(emb_query, emb_support, labels_support, opt.train_way, opt.train_shot)
  File "/usr/local/python3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zzt/MetaOptNet/models/classification_heads.py", line 550, in forward
    return self.scale * self.head(query, support, support_labels, n_way, n_shot, **kwargs)
  File "/home/zzt/MetaOptNet/models/classification_heads.py", line 396, in MetaOptNetHead_SVM_CS
    qp_sol = QPFunction(verbose=False, maxIter=maxIter)(G, e.detach(), C.detach(), h.detach(), A.detach(), b.detach())
  File "/usr/local/python3/lib/python3.6/site-packages/qpth/qp.py", line 91, in forward
    self.Q_LU, self.S_LU, self.R = pdipm_b.pre_factor_kkt(Q, G, A)
  File "/usr/local/python3/lib/python3.6/site-packages/qpth/solvers/pdipm/batch.py", line 401, in pre_factor_kkt
    G_invQ_GT = torch.bmm(G, G.transpose(1, 2).btrisolve(*Q_LU))
TypeError: btrisolve() takes 3 positional arguments but 4 were given
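
A note on reading this error: the count in the message includes `self`, so `btrisolve(*Q_LU)` received three unpacked values where the method accepts two. The sketch below reproduces only that arity mismatch with a hypothetical stand-in class. It is not PyTorch or qpth code, and the tuple contents are placeholders. It suggests the installed qpth and PyTorch disagree about what the factorization returns. I have not verified which versions pair up.

```python
class FakeTensor:
    """Hypothetical stand-in for illustration; NOT the real torch.Tensor."""

    def btrisolve(self, lu_data, lu_pivots):
        # Accepts self plus two arguments, i.e. "3 positional arguments".
        return "solved"

b = FakeTensor()

# Caller expects the factorization to be a 2-tuple.
two_values = ("LU", "pivots")
print(b.btrisolve(*two_values))

# If the factorization returns a third item, star-unpacking passes
# self + 3 values = 4 positional arguments and the call fails.
three_values = ("LU", "pivots", "info")
try:
    b.btrisolve(*three_values)
except TypeError as err:
    message = str(err)
    print(message)
```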

Question about meta-validation and meta-testing

Thanks for sharing the code.
While reading it, I noticed that 'CIFAR_FS_train.pickle' is used as the base categories in the validation and testing stages. As I understand it, the support set and query set in the validation stage should be constructed only from 'CIFAR_FS_val.pickle'.
Why is 'CIFAR_FS_train.pickle' used as the base categories (support set) and 'CIFAR_FS_val.pickle' as the novel categories (query set) in the validation stage?

Cannot reproduce the paper's results

python3 test.py --gpu 0,1,2,3 --load ./experiments/miniImageNet_MetaOptNet_SVM/best_model.pth --episode 1000 --way 5 --shot 5 --query 15 --head SVM --network ResNet --dataset miniImageNet
Loading mini ImageNet dataset - phase test
using gpu: 0,1,2,3
{'gpu': '0,1,2,3', 'load': './experiments/miniImageNet_MetaOptNet_SVM/best_model.pth', 'episode': 1000, 'way': 5, 'shot': 5, 'query': 15, 'network': 'ResNet', 'head': 'SVM', 'dataset': 'miniImageNet'}
Episode [50/1000]: Accuracy: 77.12 ± 2.30 % (94.67 %)
Episode [100/1000]: Accuracy: 76.51 ± 1.55 % (78.67 %)
Episode [150/1000]: Accuracy: 76.93 ± 1.16 % (77.33 %)
Episode [200/1000]: Accuracy: 77.11 ± 1.02 % (92.00 %)
Episode [250/1000]: Accuracy: 77.03 ± 0.95 % (78.67 %)
Episode [300/1000]: Accuracy: 77.30 ± 0.86 % (81.33 %)
Episode [350/1000]: Accuracy: 77.24 ± 0.81 % (76.00 %)
Episode [400/1000]: Accuracy: 77.34 ± 0.77 % (80.00 %)
Episode [450/1000]: Accuracy: 77.58 ± 0.72 % (88.00 %)
Episode [500/1000]: Accuracy: 77.84 ± 0.68 % (78.67 %)
Episode [550/1000]: Accuracy: 77.78 ± 0.64 % (78.67 %)
Episode [600/1000]: Accuracy: 77.80 ± 0.61 % (69.33 %)
Episode [650/1000]: Accuracy: 77.82 ± 0.59 % (82.67 %)
Episode [700/1000]: Accuracy: 77.75 ± 0.57 % (80.00 %)
Episode [750/1000]: Accuracy: 77.67 ± 0.55 % (76.00 %)
Episode [800/1000]: Accuracy: 77.65 ± 0.54 % (70.67 %)
Episode [850/1000]: Accuracy: 77.60 ± 0.52 % (89.33 %)
Episode [900/1000]: Accuracy: 77.67 ± 0.51 % (80.00 %)
Episode [950/1000]: Accuracy: 77.62 ± 0.50 % (68.00 %)
Episode [1000/1000]: Accuracy: 77.55 ± 0.49 % (73.33 %)

python3 test.py --gpu 0,1,2,3 --load ./experiments/miniImageNet_MetaOptNet_SVM/best_model.pth --episode 1000 --way 5 --shot 1 --query 15 --head SVM --network ResNet --dataset miniImageNet
Loading mini ImageNet dataset - phase test
using gpu: 0,1,2,3
{'gpu': '0,1,2,3', 'load': './experiments/miniImageNet_MetaOptNet_SVM/best_model.pth', 'episode': 1000, 'way': 5, 'shot': 1, 'query': 15, 'network': 'ResNet', 'head': 'SVM', 'dataset': 'miniImageNet'}
Episode [50/1000]: Accuracy: 59.89 ± 3.07 % (69.33 %)
Episode [100/1000]: Accuracy: 60.99 ± 2.20 % (62.67 %)
Episode [150/1000]: Accuracy: 60.69 ± 1.78 % (60.00 %)
Episode [200/1000]: Accuracy: 60.57 ± 1.55 % (62.67 %)
Episode [250/1000]: Accuracy: 60.69 ± 1.40 % (60.00 %)
Episode [300/1000]: Accuracy: 61.19 ± 1.24 % (69.33 %)
Episode [350/1000]: Accuracy: 61.64 ± 1.13 % (62.67 %)
Episode [400/1000]: Accuracy: 61.48 ± 1.05 % (40.00 %)
Episode [450/1000]: Accuracy: 61.34 ± 0.99 % (60.00 %)
Episode [500/1000]: Accuracy: 61.12 ± 0.92 % (34.67 %)
Episode [550/1000]: Accuracy: 61.09 ± 0.88 % (50.67 %)
Episode [600/1000]: Accuracy: 60.95 ± 0.83 % (60.00 %)
Episode [650/1000]: Accuracy: 60.97 ± 0.79 % (65.33 %)
Episode [700/1000]: Accuracy: 61.00 ± 0.76 % (60.00 %)
Episode [750/1000]: Accuracy: 60.86 ± 0.73 % (60.00 %)
Episode [800/1000]: Accuracy: 60.86 ± 0.71 % (64.00 %)
Episode [850/1000]: Accuracy: 60.87 ± 0.68 % (61.33 %)
Episode [900/1000]: Accuracy: 60.84 ± 0.66 % (73.33 %)
Episode [950/1000]: Accuracy: 60.84 ± 0.66 % (26.67 %)
Episode [1000/1000]: Accuracy: 60.89 ± 0.64 % (72.00 %)

miniImageNet 5-way 1-shot accuracy

I trained on miniImageNet with the default settings and evaluated best_model.pth, which gives 59.28% accuracy. That is a large gap from the reported number, and I don't think the random choice of episodes explains it. Any idea?