
moon's People

Contributors

qinbinli

moon's Issues

Processing of Datasets

Hey, I had a question about the dataset processing: does the final version of MOON's transformation in the dataloader apply only `transforms.ToTensor()`, with nothing like `transforms.Normalize()`?
I really need your help so that I can carry my own work forward properly.
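
For reference, the two pipelines the question contrasts would look like this in torchvision (illustrative; the normalization statistics shown are the commonly used CIFAR-10 values, not necessarily what MOON's dataloader uses):

```python
from torchvision import transforms

# ToTensor only: converts HWC uint8 images to CHW float tensors in [0, 1]
to_tensor_only = transforms.Compose([transforms.ToTensor()])

# ToTensor + Normalize: additionally standardizes each channel
# (commonly cited CIFAR-10 statistics, shown as an example)
with_normalize = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616)),
])
```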

Questions about settings of negative samples

Hi, I have read your paper and I am very interested in your work! I think it's a very good article!
I have some questions about the setting of negative samples. I wonder why z and z_prev are pushed as far apart as possible.
Does this speed up training or improve accuracy?
Do you think the increase in accuracy has anything to do with this, or is it due to the contrastive loss itself?
Looking forward to your kind reply.
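
For reference, the model-contrastive loss from the MOON paper is (with sim the cosine similarity and τ the temperature):

```latex
\ell_{con} = -\log
  \frac{\exp\big(\mathrm{sim}(z, z_{glob}) / \tau\big)}
       {\exp\big(\mathrm{sim}(z, z_{glob}) / \tau\big) + \exp\big(\mathrm{sim}(z, z_{prev}) / \tau\big)}
```

Minimizing it both pulls z toward the global representation z_glob and pushes it away from the previous local representation z_prev, so making sim(z, z_prev) small directly decreases the loss. The paper's stated motivation is that this limits the drift of local training back toward the previous round's biased model, which is where the accuracy gain is attributed.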

Convergence theory analysis

Hello, I think this article is very novel, and the application of contrastive learning ideas is simply brilliant. But when I tried to integrate it into my own work, I ran into the question of convergence analysis. Is there a convergence analysis for MOON?

About the model ResNet50

What's the difference between the "models.resnet50(pretrained=False)" and "ResNet50_cifar10()" in model.py?
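
A hedged note: in many CIFAR codebases, a `*_cifar10` ResNet differs from torchvision's ImageNet ResNet mainly in the stem, since CIFAR images are 32x32 rather than 224x224. Whether `ResNet50_cifar10()` in model.py follows exactly this pattern should be checked against the file; the sketch below only shows the typical adaptation:

```python
import torch.nn as nn
from torchvision import models

# torchvision's ResNet-50: 7x7 stride-2 stem conv plus max-pool, sized for ImageNet
net = models.resnet50(pretrained=False)

# Typical CIFAR adaptation (illustrative; may differ from ResNet50_cifar10):
# 3x3 stride-1 stem, no max-pool, 10-way classifier head
net.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
net.maxpool = nn.Identity()
net.fc = nn.Linear(net.fc.in_features, 10)
```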

Questions about the reported test accuracy.

Hi! Thanks for your inspiring work.
I have a few questions about the reported accuracy in your paper, because I am not familiar with the field of federated learning.

  1. In Section 4.2 (Accuracy Comparison), do all clients participate in training (participation ratio = 1.0)?
  2. Under the partial-participation setting, we need to evaluate the current model each round, so which data is the evaluation conducted on: the selected clients or all clients? If we adopt the first strategy (closer to the practical scenario), how do we get the reported top accuracy? Is the reported top accuracy the average accuracy over all clients?

About w_i^t in the paper

I have learned a lot and this is an excellent piece of work!

I have some questions about the paper and the code:

  1. In the paper, I do not really understand how to get w_i^t (Algorithm 1, line 10, shows that we obtain w_i^t from w^t; how does this work? — see the sketch after this list).

  2. In main.py, line 499:

     local_train_net(nets_this_round, args, net_dataidx_map, train_dl=train_dl, test_dl=test_dl, global_model=global_model, prev_model_pool=old_nets_pool, round=round, device=device)

     I think w_i^t is in nets_this_round, and this is initialized from nets(). Does this mean w_i^t does not change with training?
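
For what it's worth, the usual FedAvg-style reading of Algorithm 1 (a hedged sketch, not verified against main.py) is that each round the selected local nets are re-initialized by copying the current global weights, and local training then updates them, so w_i^t does change within the round:

```python
import torch.nn as nn

# Toy stand-ins for the global model and this round's local models
global_model = nn.Linear(4, 2)
nets_this_round = {i: nn.Linear(4, 2) for i in range(3)}

# "w_i^t <- w^t": each party's model starts the round from the global weights
global_w = global_model.state_dict()
for net in nets_this_round.values():
    net.load_state_dict(global_w)
# local training (SGD) would then update each net, so w_i^t changes during the round
```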

Questions about SCAFFOLD code

Hi, I am very interested in your work. I found some results for SCAFFOLD in your paper, but there is no code for it. Could I know how you reproduced it with your code? If convenient, could you release your SCAFFOLD code?

The code seems inconsistent with the algorithm in the paper

Thanks to the authors for providing the code for MOON; it is very useful to me. But one thing confuses me in these lines:

MOON/main.py, lines 309 to 311 (commit 6c7a4ed):

```python
labels = torch.zeros(x.size(0)).cuda().long()
loss2 = mu * criterion(logits, labels)
```

Could you help me understand why the all-zero "labels" vector in line 309 is necessary? And I think loss2 then returns 0 every iteration, doesn't it?
I hope to receive your response.
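
For context, this is the standard trick for implementing an InfoNCE-style contrastive loss with `CrossEntropyLoss`: column 0 of `logits` holds the positive similarity and the later columns hold the negatives, so an all-zero label vector means "the correct class is the positive in column 0". The loss is therefore not 0. A minimal self-contained sketch (tensor names are illustrative, not copied from main.py):

```python
import torch
import torch.nn as nn

cos = nn.CosineSimilarity(dim=-1)
criterion = nn.CrossEntropyLoss()
temperature, mu = 0.5, 1.0

z = torch.randn(8, 256)        # local representation
z_glob = torch.randn(8, 256)   # positive: global-model representation
z_prev = torch.randn(8, 256)   # negative: previous local-model representation

posi = cos(z, z_glob).reshape(-1, 1)                  # column 0: positive pair
nega = cos(z, z_prev).reshape(-1, 1)                  # column 1: negative pair
logits = torch.cat((posi, nega), dim=1) / temperature

# Label 0 selects column 0 as the target class; the loss is
# -log softmax(logits)[:, 0], i.e. an InfoNCE term, not a constant 0.
labels = torch.zeros(z.size(0)).long()
loss2 = mu * criterion(logits, labels)
print(loss2.item())   # strictly positive in general
```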

Some questions about the metrics.

Hi, thanks very much for the code. Recently, I re-ran the released code under the default settings. However, I cannot find the metric presented in your paper: the released code seems to compute only the metrics of the global network or of each local network. Could you please tell us which metric is used in your paper?

L2 norm code issue

Hi, I am very interested in your work. Is there no code for the loss-function variant with the L2 norm?
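
For reference, the paper's loss-function ablation includes a variant that replaces the contrastive term with an L2 distance between representations. A hypothetical one-line sketch of that variant (tensor names are illustrative; this is not code from the repository, and whether the paper uses the squared or plain distance should be checked against it):

```python
import torch

z = torch.randn(8, 256)        # local representation (illustrative)
z_glob = torch.randn(8, 256)   # global-model representation (illustrative)
mu = 1.0

# L2-norm variant: penalize the distance between local and global representations
loss2 = mu * torch.mean(torch.norm(z - z_glob, p=2, dim=1) ** 2)
```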

Questions about t-SNE

Hello,

Thanks for sharing this interesting work. I just have one question about the t-SNE part: could I get more detail on how you generated such nice t-SNE results? I want to reproduce them. I have tried the TSNE from sklearn and from openTSNE, but they did not work.
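
For what it's worth, a basic recipe that often works is to collect the projection-layer outputs (e.g. pro1) on test data and embed them with scikit-learn's TSNE. This sketch uses random stand-in features, and the exact settings the authors used are not stated here:

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

feats = np.random.randn(1000, 256)        # stand-in for collected representations
labels = np.random.randint(0, 10, 1000)   # stand-in for class labels

# 2-D embedding; PCA init and a moderate perplexity are common starting points
emb = TSNE(n_components=2, perplexity=30, init='pca', random_state=0).fit_transform(feats)
plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=3, cmap='tab10')
plt.savefig('tsne.png')
```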

Same label for positive and negative cases

Hi,

Thank you for your interesting work. I have a question about the label used for the contrastive loss. Why do the positive and negative cases have the same label of 0 (lines 309, 297, 303)? Shouldn't it be 0 for positive and 1 for negative pairs?

Another question, about the SCAFFOLD results published in your paper: this repository does not include SCAFFOLD, so did you run it from https://github.com/Xtra-Computing/NIID-Bench?

Thank you in advance for your feedback.

Question about the code

Hi, I have read your paper and code and found it to be interesting work!

I have a question. In your code:

```python
loss2 = mu * criterion(logits, labels)  # main.py, line 311
```

I know it is used to calculate the contrastive loss (Eq. 3 in your paper), but why is it implemented with cross-entropy loss and all-zero labels?

Hi,

Thank you for your wonderful work.
I am a little confused about this code:

```python
for previous_net in previous_nets:
    previous_net.cuda()
    _, pro3, _ = previous_net(x)
    nega = cos(pro1, pro3)
    logits = torch.cat((logits, nega.reshape(-1, 1)), dim=1)
    previous_net.to('cpu')
```

First, in the paper the negative representation seems to come from the local model of the last round only, rather than from all previous rounds; I want to know whether the loop here makes sense. Second, in `logits = torch.cat((logits, nega.reshape(-1,1)), dim=1)`, what is the purpose of the concatenation? Looking forward to your reply. Thank you!

Time for Training on CIFAR-100 and Tiny-ImageNet

Hello, thanks for the good work. I'm trying to reproduce the results shown in the paper, but training on CIFAR-100 and Tiny-ImageNet seems to be very slow. I'm using a Titan Xp. For CIFAR-100, it took 4 hours to train for 4 global epochs; the test accuracy right now is 7.1%. For Tiny-ImageNet, it took 3 hours to train 1 global epoch; the test accuracy right now is 0.5%. I followed the preprocessing steps you suggested, the command lines are exactly what you had on the GitHub project page, and I used the hyperparameters outlined in the paper.

Here are the command lines I used:

python main.py --dataset=cifar100 --alg=moon --lr=0.01 --mu=1 --epochs=10 --comm_round=100 --n_parties=10 --partition=noniid --beta=0.5 --logdir='./logs/' --datadir='./data/'

python main.py --dataset=tinyimagenet --alg=moon --lr=0.01 --mu=1 --epochs=10 --comm_round=20 --n_parties=10 --partition=noniid --beta=0.5 --logdir='./logs/' --datadir='./data/'

Is it normal to take such a long time to train on CIFAR-100 and Tiny-ImageNet? Can I ask how long it took you to finish training on CIFAR-100 and Tiny-ImageNet?

Question about Dirichlet non-IID partitioning

Hi, thanks for your excellent work. I am confused about the construction of non-IID data using the Dirichlet distribution, located at line 126 in utils.py. What is the function of line 126? Could you please give me some intuition? I have no idea what `len(idx_j) < N / n_parties` means. Thanks a lot!
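
For context, here is a sketch of the common Dirichlet partitioning recipe (modelled on the widely used NIID-Bench pattern, not copied from utils.py). The expression `len(idx_j) < N / n_parties` zeroes out the sampling proportion of any party that already holds at least an average share (N / n_parties samples), which keeps party sizes from becoming too unbalanced:

```python
import numpy as np

def dirichlet_partition(y_train, n_parties, beta, seed=0):
    """Sketch of Dirichlet label partitioning (NIID-Bench style)."""
    rng = np.random.default_rng(seed)
    N, K = len(y_train), len(np.unique(y_train))
    min_size = 0
    while min_size < 10:  # resample until every party gets a minimum amount of data
        idx_batch = [[] for _ in range(n_parties)]
        for k in range(K):
            idx_k = np.where(y_train == k)[0]
            rng.shuffle(idx_k)
            p = rng.dirichlet(np.repeat(beta, n_parties))
            # the line in question: drop parties that already hold >= N / n_parties
            p = np.array([pi * (len(idx_j) < N / n_parties)
                          for pi, idx_j in zip(p, idx_batch)])
            p = p / p.sum()
            cuts = (np.cumsum(p) * len(idx_k)).astype(int)[:-1]
            idx_batch = [idx_j + part.tolist()
                         for idx_j, part in zip(idx_batch, np.split(idx_k, cuts))]
        min_size = min(len(idx_j) for idx_j in idx_batch)
    return {i: np.array(idx) for i, idx in enumerate(idx_batch)}

parts = dirichlet_partition(np.random.randint(0, 10, 5000), n_parties=10, beta=0.5)
```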

Getting debug messages

When I apply the code to a different dataset, I get the debug messages below in the log file:
```
02-15 15:45 INFO cuda:0
02-15 15:45 INFO ####################################################################################################
02-15 15:45 INFO Partitioning data
02-15 15:45 INFO Data statistics: {0: {0: 1383}, 1: {0: 6, 1: 441, 2: 21}, 2: {0: 457, 1: 428}, 3: {0: 1, 1: 24, 2: 566}, 4: {0: 17, 1: 39, 2: 345}}
02-15 15:45 INFO Initializing nets
02-15 15:45 INFO in comm round:0
02-15 15:45 INFO Training network 0. n_training: 1383
02-15 15:45 INFO Training network 0
02-15 15:45 INFO n_training: 21
02-15 15:45 INFO n_test: 37
02-15 15:45 DEBUG STREAM b'IHDR' 16 13
02-15 15:45 DEBUG STREAM b'tIME' 41 7
02-15 15:45 DEBUG b'tIME' 41 7 (unknown)
02-15 15:45 DEBUG STREAM b'IDAT' 60 8192
```

(the four DEBUG lines repeat for every image that is loaded)
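
For context, the `STREAM b'IHDR'` / `b'IDAT'` lines look like the PNG chunk messages that Pillow's PngImagePlugin emits at DEBUG level; they typically appear when logging is configured with a DEBUG root level. Assuming Python's standard logging is in use, they can usually be silenced with:

```python
import logging

# Pillow logs PNG chunk reads under the 'PIL' logger namespace at DEBUG level;
# raising its threshold hides the STREAM/IDAT messages without touching other logs
logging.getLogger('PIL').setLevel(logging.WARNING)
```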

For Tiny-ImageNet, the test accuracy is very low (0.009)

I ran the model with the same command as the author for Tiny-ImageNet:

python main.py --dataset=tinyimagenet --model=resnet50 --alg=moon --lr=0.01 --mu=1 --epochs=10 --comm_round=20 --n_parties=10 --partition=noiid --beta=0.5

About the contrastive loss

This is an excellent piece of work!
However, when I read the code, I did not understand the labels: doesn't pairing the logits with an all-zero labels tensor make the contrastive loss always 0?

```python
_, pro1, out = net(x)
_, pro2, _ = global_net(x)

posi = cos(pro1, pro2)           # similarity to the global model (the positive pair)
logits = posi.reshape(-1, 1)     # (from earlier in main.py) column 0 holds the positive
for previous_net in previous_nets:
    previous_net.cuda()
    _, pro3, _ = previous_net(x)
    nega = cos(pro1, pro3)       # similarity to a previous local model (a negative pair)
    logits = torch.cat((logits, nega.reshape(-1, 1)), dim=1)
    previous_net.to('cpu')
logits /= temperature
labels = torch.zeros(x.size(0)).cuda().long()
loss2 = mu * criterion(logits, labels)  # <-- here
```

Question about FedAvg code

Thanks for sharing your code; it is amazing work!
But I have a question about the implementation of FedAvg:

MOON/main.py, line 587 (commit dbf6344):

```python
global_w[key] += net_para[key] * fed_avg_freqs[net_id]
```

Why does the aggregation here not include a "mean" operation?
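
For context, the usual answer (an assumption about how `fed_avg_freqs` is constructed elsewhere in main.py) is that the frequencies are already normalized data fractions n_i / Σ n_j that sum to 1, so the weighted sum is itself a weighted mean and no separate division is needed. A minimal sketch with illustrative names:

```python
# Illustrative FedAvg aggregation: the weights sum to 1, so the accumulation
# below already computes a (weighted) mean of the client parameters.
data_counts = [500, 300, 200]
total = sum(data_counts)
fed_avg_freqs = [n / total for n in data_counts]        # 0.5, 0.3, 0.2 -> sums to 1

client_params = [{'w': 1.0}, {'w': 2.0}, {'w': 3.0}]    # stand-in state_dicts
global_w = {key: 0.0 for key in client_params[0]}
for net_id, net_para in enumerate(client_params):
    for key in net_para:
        global_w[key] += net_para[key] * fed_avg_freqs[net_id]

print(global_w['w'])   # 0.5*1 + 0.3*2 + 0.2*3 = 1.7, already the weighted mean
```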
