lenscloth / rkd Goto Github PK

Official pytorch Implementation of Relational Knowledge Distillation, CVPR 2019

Python 100.00%

deep-neural-networks computer-vision metric-learning knowledge-distillation deep-learning

rkd's Issues

EOFError: Ran out of Input

Hello,

So I followed the instruction in Readme but when I ran the last section to distill the trained model to student network I got the following error

Traceback (most recent call last): File "run.py", line 191, in <module> eval(model, loader_train_eval, 0) File "run.py", line 174, in eval for images, labels in test_iter: File "C:\ProgramData\Anaconda3\lib\site-packages\tqdm\std.py", line 1165, in __iter__ for obj in iterable: File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 355, in __iter__ return self._get_iterator() File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 301, in _get_iterator return _MultiProcessingDataLoaderIter(self) File "C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 914, in __init__ w.start() File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 121, in start self._popen = self._Popen(self) File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 224, in _Popen return _default_context.get_context().Process._Popen(process_obj) File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 327, in _Popen return Popen(process_obj) File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__ reduction.dump(process_obj, to_child) File "C:\ProgramData\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump ForkingPickler(file, protocol).dump(obj) _pickle.PicklingError: Can't pickle <function <lambda> at 0x000002CB8B93A3A0>: attribute lookup <lambda> on __main__ failed Traceback (most recent call last): File "<string>", line 1, in <module> File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main exitcode = _main(fd, parent_sentinel) File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 126, in _main self = reduction.pickle.load(from_parent) EOFError: Ran out of input

I'm not too sure what went wrong since all previous 3 steps were done properly. I tried checking with the best.pth file but even when I didn't have a proper application to open it it is not empty either. I don't understand why the code is prompting me that it's at the end of file already.

Hi，could you release the cifar-100 classification experiment?

torch type may not matching in pytorch==1.2

Hi~ I think there is a small problem in pairsampler.py.
In pytorch ==1.0 works fine, but it's not work in pytorch == 1.2.

bool type can not opt with uint8 type
In pytorch == 1.2 you can change like this to fix it .

^——^

Eval vs Train

Sorry if this question seems rudimentary. After every epoch there are two numbers, eval and train, and I was wondering that what exactly the differences between these two are.

Ask about your model ResNet50 model??

Hello sir:
My name is Paul Chou, I am interested in your paper about " Relational Knowledge Distillation". I am curious about what is your ResNet50 architecture when you are running the CIFAR100 dataset?? Is this architecture the same as the ResNet50 with ImageNet224*224??

A question about paper setting

we need a pair of training examples in RKD-D and a triplet samples in RKD-A.
In paper, you sample all possible tuples in mini-batch.
I think the number of tuples are too many in common classification setting.(ex. CIFAR10, ImageNet)
How to sample these pair in Image Classification setting?

the calculation of relation

Where is the calculation of relation in the code？

can't run the "run_distill.py"

when I run the program follow the Quick Start,it failed with err as blow:

I can't find what's wrong with it, could you help me ?

Cifar 100 and Tiny Imagenet Training

Hello,
is there a version that works with Cifar 10 and Tiny Image net? The examples in the repository seems to have been designed for metric learning.

Can you release your trained model on cub200 ?

Hi @lenscloth ~
I can't get the same recall score on dataset cub200 emm.......
My test:
Teacher : resnet50 embedding : 512 batchsize: 128 sample: distanceweight
and the result is 61.0 (very close to your score 61.24)
But...student resnet18 is low...(58.74)without L2normal , dist_ratio=1 and angle_ratio=2

So can you release your trained model on cub200 ?(teacher and student model )
Any help would be deeply appreciated!

Student accuracy way below expected

Hello,

When I trained with the CUB200 dataset with the embedding size of 64 using resnet18, the student's accuracy is only at about 54 while I saw that on the paper it should be around 58. The teacher is trained under the same dataset with a embedding size of 512 and resnet 50, and its accuracy is still lower but not nearly as much at 59 against 61 from the paper. I was wondering if you knew anything that could cause this. No change was made to the code when I ran it.

metric/utils.py

Hello,Thanks for your work, I run the code, it raise the follwing error: In metric/utils.py prod = e @ e.t(): SyntaxError: invalid syntax

Few-shot learning experiments

Hi,
it would be really helpful if the few-shot learning experiments in the paper were pushed to the repo. Is this possible?

Greetings,
Sebastian

What is its value equal to opts.dark_ratio

dark_loss = opts.dark_ratio * dark_criterion(e, t_e)

Do you have any suggestions on how to choice the loss ratio of different distill methods?

dataset link don't work

http://www.vision.caltech.edu/visipedia-data/CUB-200-2011/CUB_200_2011.tgz
don't work

A question about recall

why the eval recall decrease with the training process goes on ?
Teacher model: resnet 50
student model: resnet 18
dataset: CUB2011

What are the numbers in the student and teacher tensors?

Hello,

So I was looking at the loss.py file and the distance and angle metric functions in it. There are two tensors used for parameter when calculating both angle and distance, the student and teacher. I believe each one is a 64 x 518 tensor. So, I was wondering, what do the values in each tensor refer to? My guess is that it is some sort of RGB value-like number indicating the one out of all the pixels, but I am unsure.

Hi, there is a similar idea in <<Correlation Congruence for Knowledge Distillation arXiv:1904.01802v1>>. Do you think which is more efficent ?

Hi, there is a similar idea in "Correlation Congruence for Knowledge Distillation arXiv:1904.01802v1". Do you think which is more efficent ?

test_set and train_set's lengths mismatch with those based on README from the dataset

Hello. When I ran the code it shows that there are 5864 images used for training and 5924 used for testing. However, based on the train_test_split.txt provided in the README file in the CUB200 dataset, it shows that there are supposed to be 5994 used for training and 5794 used for testing. I was wondering if you know what caused this inconsistency, and if so, do you mind pointing out which specific 130 images you swapped from testing to training?

Thanks a lot

in loss design , is the code " torch.no_grad " essential ?

Hi, glad to see u ,
i am reading your loss design now , and found code below
`
class RKdAngle(nn.Module):
def forward(self, student, teacher):
# N x C
# N x N x C

    with torch.no_grad():
        td = (teacher.unsqueeze(0) - teacher.unsqueeze(1))
        norm_td = F.normalize(td, p=2, dim=2)
        t_angle = torch.bmm(norm_td, norm_td.transpose(1, 2)).view(-1)

    sd = (student.unsqueeze(0) - student.unsqueeze(1))
    norm_sd = F.normalize(sd, p=2, dim=2)
    s_angle = torch.bmm(norm_sd, norm_sd.transpose(1, 2)).view(-1)

    loss = F.smooth_l1_loss(s_angle, t_angle, reduction='elementwise_mean')
    return loss

`
both in rkd angle and rkd distance , there is a " torch.no_grad" in teacher related code .
is that essential ? can that be removed ?

lenscloth / rkd Goto Github PK

rkd's Issues

Recommend Projects

Recommend Topics

Recommend Org