xyutao / fscil
Official repository for Few-Shot Class-Incremental Learning (FSCIL)
Hi, I have read your paper "Few-Shot Class-Incremental Learning". It is a very interesting piece of groundbreaking work, and the experimental results are impressive. We are also working on few-shot class-incremental learning and hope to compare against your method in our experiments, so we would be very grateful if you could share your code. Mainly the following two parts:
The undisclosed parts of the code on GitHub, such as "tools.ng_anchor", "tools.loss", and "tools.plot".
The implementation code of the three few-shot class-incremental comparison methods in Table 1. Would you please send me the code? It would speed up my experiments. Thank you very much!
Hello, I have a question about the role of edge 'E' in graph G indicated in the paper.
Is the collection of the edge 'E' (and age 'a') used in the learning process? I interpret that edge 'E' is not used in the learning process anywhere. Was it just used for plotting?
Thank you for your great work.
On page 4, it is mentioned that the variance Λ_j is estimated using the feature vectors whose winner is j.
Can you elaborate on the meaning of "winner"? How exactly did you define the winner used for the variance?
Thank you for reading, and stay safe.
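In classic neural-gas training (my reading, not confirmed by the authors), the "winner" for a feature vector is simply the node whose centroid is nearest to it, and Λ_j would then be estimated from the features assigned to node j. A minimal NumPy sketch of that interpretation; the function name and layout are hypothetical:

```python
import numpy as np

def winner_variance(features, centroids):
    """For each node j, estimate a diagonal variance from the feature
    vectors whose nearest centroid (their "winner") is node j.
    One plausible reading of the paper, not the official code."""
    # Pairwise distances: (N, K) for N features and K node centroids.
    d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=-1)
    winners = d.argmin(axis=1)                  # winner index per feature
    variances = np.zeros_like(centroids)
    for j in range(len(centroids)):
        assigned = features[winners == j]       # features won by node j
        if len(assigned) > 1:
            variances[j] = assigned.var(axis=0) # diagonal Λ_j estimate
    return winners, variances
```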
Hello, is there any related code missing, such as "tools.ng_anchor", "tools.loss", and "tools.plot"?
The work really looks great. The only problem is applying the same method to our own data for a classification task; I am unable to understand the data-preparation part for training.
It would be very helpful if you could take a new dataset (say, 2 classes: dog and cat) and then incrementally train the network on another 2 classes (say, horse and cow), so that the resulting model can classify all 4 classes.
Looking forward to your reply.
For CUB200, the ImageNet-pretrained ResNet18 model needs to be loaded for initialization. However, ImageNet contains prior knowledge about birds, and some of its images overlap with CUB-200. I am worried about whether this operation is consistent with the few-shot setting.
For datasets like CUB, do you have a test.txt? If not, how did you gather your test dataset (is it all the test samples for each class in each session)?
Thank you for your good research.
I'm practicing implementation, using the code from your research as an example, and a question came up. When verifying the accuracy of the base graph I built, I wondered what accuracy it should reach, at a minimum, for the base graph to be considered well constructed.
By any chance, could you tell me the accuracy you obtained on the base dataset when building the graph for the base classes (G1)?
Hi, I am interested in this few-shot class-incremental learning setting. I have the following question regarding dataloader.py:
fscil/dataloader/dataloader.py
Line 31 in 6dd827f
Hi @xyutao,
Thanks for the great work. I have a question about the ablation study "Comparison between exemplars and NG nodes": does "Memory" represent the number of exemplars and the number of NG nodes used for knowledge representation, respectively? If so, I am confused about how the two can be compared. In my opinion, since the representation types are different, the definition of "Memory" may not be aligned between the two settings; for example, the unit storage cost of an exemplar and of a node may differ, so comparing these two settings may be unfair, or they may not be comparable at all. Could you elaborate on this ablation setting, such as the motivation and the implementation details?
I really appreciate any help you can provide.
Hello,
Thank you for your interesting work.
I have a question concerning the full-shot setting. When you compared TOPIC with other state-of-the-art approaches, how many exemplars did you use for those approaches? And did you choose the exemplars randomly or by herding?
Thank you.
Thanks for your work. I have just entered the few-shot incremental field and am not familiar with the datasets. Could you please release datasets.py and the dataloader first, for reference? Thank you very much.
Hi, thanks for all of your help.
Do you have the exact numbers for the accuracies in Fig 4 of the paper?
Dear Xiaoyu,
I would like to congratulate you on your very interesting and novel problem of few-shot class-incremental learning. I am interested in working further in this direction, and I was wondering whether it would be possible to release the pre-trained network weights of your QuickNet and ResNet18 for every dataset, so that fair comparisons can be made by new methods that use this work as a reference point.
Hi, do you use the torchvision ResNet-18 architecture for all the datasets?
I am a bit confused, since CIFAR-100 images are 32x32 while miniImageNet images are 84x84.
Did you paste the wrong data for CIFAR100, ResNet18, 5-way 5-shot (Fig. 4(b)) in the README? It looks the same as the miniImageNet one: miniImageNet, ResNet18, 5-way 5-shot (Fig. 4(d)).
Can I ask how you calculate the accuracy for each session? I ran resnet18-ft-cnn.sh for CIFAR100 without -resume-from './params/CIFAR100/resnet18/baseline/baseline_resnet18_cifar_64.10.params' and got a best first-session (base session) accuracy of 70.23%, which is higher than the roughly 64% first-session accuracy of Ft-CNN in Fig. 4(b). I am not sure why there is such a big gap.
Hello, I think your work is great. I don’t know if there is any intention to publish the code.
Hi,
For your experiments, e.g. Figure 4 and Table 2, how many nodes did you save, and what is the memory overhead of saving that many nodes? Also, for iCaRL and EEIL, how many exemplars did you save?
I wonder how to get the results for sessions > 2. We train on the 25 images for some epochs; do we evaluate the model after each epoch and report the best result, or only after the last epoch?
Hello, I am reproducing the method in the paper (using the CIFAR100 dataset and the quick base net), but I am a little confused:
What is the centroid vector in the NG network you chose? Is it the output of the softmax layer? If I choose the softmax output as my centroid vector (100 dims), the values of the diagonal matrix Λ are very small, like 0.000000673, so Λ^-1 (the inverse of Λ) is very large, and the problem occurs.
What is the approximate value of the AL loss? Why are my AL loss values so extremely large?
About the MML loss: ''Given the new class training set D(t) and NG G(t), for a training sample (x, y) ∈ D(t), we extract its feature vector f = f(x; θ(t)) and feed f to the NG. We hope f matches the node v_j whose label c_j = y, and d(f, m_j) ≪ d(f, m_i), i ≠ j, so that x is more probable to be correctly classified.''
f is the feature of a new-class sample. How can I find a node whose label equals f's label y, given that f comes from the new classes' data?
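For what it's worth, one common reading of such label-matched losses is that the NG already contains nodes labelled with the new classes (e.g. nodes inserted when the session starts), and the matched node is the nearest node among those carrying label y. A hedged sketch; the function name and the hinge form are my assumptions, not the authors' implementation:

```python
import numpy as np

def min_margin_match(f, centroids, labels, y, margin=1.0):
    """Match f to the nearest node whose label equals y, and penalize it
    for being farther from f than the nearest node of any other label.
    Assumes nodes labelled with the new classes already exist in the NG."""
    d = np.linalg.norm(centroids - f, axis=1)   # distance of f to every node
    same = labels == y
    d_pos = d[same].min()                       # matched node v_j with c_j = y
    d_neg = d[~same].min()                      # closest node of another label
    # Hinge: encourage d(f, m_j) + margin <= d(f, m_i) for i != j.
    return max(0.0, d_pos - d_neg + margin)
```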
fscil/model/cifar_resnet_v1.py
Line 217 in e76d37c
Hi,
It seems that you are using ResNet-20 for CIFAR100 (1 + 3×2 + 3×2 + 3×2 + 1 = 20). Have I misunderstood it?
I had the honor of reading your work! I have a simple, maybe stupid, question about the mechanism of neural gas. Most CIL works set a parameter for a fixed memory capacity. I wonder how the memory size for old data is controlled in neural gas (e.g. via node deletion)?
Thank you for taking the time to read this!
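Not an answer from the authors, but in classic (growing) neural gas, memory is bounded implicitly: edges age on every adaptation step, over-age edges are pruned, and nodes left without any edge are deleted. A toy sketch of that pruning step; the data layout and names are my own, purely illustrative:

```python
def prune_neural_gas(nodes, edges, ages, max_age=50):
    """nodes: {node_id: centroid}; edges: set of frozenset({i, j});
    ages: {edge: age}. Drop over-age edges, then isolated nodes."""
    # Remove edges whose age exceeds the threshold.
    edges = {e for e in edges if ages.get(e, 0) <= max_age}
    # A node survives only if it still participates in some edge.
    connected = {i for e in edges for i in e}
    nodes = {i: c for i, c in nodes.items() if i in connected}
    return nodes, edges
```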
Thanks for your great work. I am interested in it and am attempting to implement it in PyTorch, but I have run into several problems. I would appreciate it if you could answer my questions.
Q1: At session t=1, how do you initialize the centroid vector of each NG node? With k-means, or randomly?
Q2: When calculating the anchor loss, you extract a subgraph of G(t). Is there any restriction on the subgraph? G(t) has many subgraphs, so which one should be chosen to calculate the anchor loss?
Thank you very much!
Hello, is there any related code missing, such as "tools.loss" and "tools.plot"? When will the full code be released?
I took 100 CUB classes as base classes and trained the ResNet-18 network. I followed the same setting (50 epochs) mentioned in the paper, and it gives 69% base-task accuracy. But when I start from the same base classes using the NCM method (as in UCIR, where cosine normalization is applied to the class weights and the features are L2-normalized), the base-task performance is around 74%. In the paper, the base-task performance is given as 68.8% for both methods. Can you explain how that is possible, or whether a different training setting is needed for the NCM method on the base classes?
Thank you.
Hi @xyutao, thanks for your amazing work on FSCIL! I can't seem to find how the training image indices for few-shot training of new classes are chosen. Is it random? I'm trying to do FSCIL on a new dataset. If it is random, is there a specific seed being used as standard practice?
Do you report the average incremental accuracy [1], i.e. the weighted average accuracy over only those classes that have already been trained, as in the code of SDC [2] (https://github.com/yulu0724/SDC-IL/blob/master/test.py)?
```
if k == 0:
    acc_ave += acc * (float(args.base) / (args.base + task_id * num_class_per_task))
else:
    acc_ave += acc * (float(num_class_per_task) / (args.base + task_id * num_class_per_task))
```
[1] R. Aljundi, P. Chakravarty, and T. Tuytelaars. Expert gate: Lifelong learning with a network of experts. In CVPR, pages 3366–3375, 2017.
[2] L. Yu et al. Semantic drift compensation for class-incremental learning. In CVPR, 2020.
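For reference, the weighted average computed by the SDC snippet above can be written as a small standalone function (the names `base` and `num_class_per_task` follow that snippet; this is a sketch, not this repo's evaluation code):

```python
def average_incremental_accuracy(accs, base, num_class_per_task):
    """Weighted average of per-task accuracies at the final session:
    each task is weighted by its share of all classes trained so far.
    accs[0] is the base-task accuracy; accs[k] for k > 0 are new tasks."""
    task_id = len(accs) - 1                       # index of the last new task
    total = base + task_id * num_class_per_task   # classes trained so far
    acc_ave = 0.0
    for k, acc in enumerate(accs):
        weight = base if k == 0 else num_class_per_task
        acc_ave += acc * (float(weight) / total)
    return acc_ave
```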
When will the complete code be available?
Meanwhile, can you provide some instructions on how to run your code? For example: environment configuration, data preparation, running scripts, and result analysis. Besides, could you provide a PyTorch version of the code? Perhaps most people are not familiar with MXNet. Thank you very much.
Hi, thanks for sharing the training sets. I want to know how the 5-way 5-shot evaluation stage is constructed. Do you sample 5 ways and 5 shots from the training set (all classes and samples of the specific session)?
Please correct me if I'm wrong.
So let's say we are at session 2 (adding 5 classes) and there are 60+5 classes.
The 5 ways are sampled from the 65 classes, and the 5 shots are sampled from the training set (60 × initial samples + 5×5). -> This is an episode.
Then, for each episode, are all queries from the remaining samples (those not included in the training set) evaluated?
Thanks for your fantastic work, but I have a question about the validation dataset: did you split the training set into two parts (train and val) when training the base model, and how many validation samples per class did you use in the base and new sessions?