
fedprox's Introduction

Federated Optimization in Heterogeneous Networks

This repository contains the code and experiments for the paper:

Federated Optimization in Heterogeneous Networks

MLSys 2020

Federated Learning is a distributed learning paradigm with two key challenges that differentiate it from traditional distributed optimization: (1) significant variability in terms of the systems characteristics on each device in the network (systems heterogeneity), and (2) non-identically distributed data across the network (statistical heterogeneity). In this work, we introduce a framework, FedProx, to tackle heterogeneity in federated networks, both theoretically and empirically.
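For reference, the local subproblem that FedProx solves on each device k at round t augments the local objective F_k with a proximal term (Eq. (2) in the paper; setting mu = 0 recovers FedAvg's local update):

min_w  h_k(w; w^t) = F_k(w) + (mu/2) ||w - w^t||^2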

This repository also contains a detailed empirical evaluation across a suite of federated datasets. We show that FedProx allows for more robust convergence than FedAvg. In particular, in highly heterogeneous settings, FedProx demonstrates significantly more stable and accurate convergence behavior relative to FedAvg, improving absolute test accuracy by 22% on average.

General Guidelines

Note the following if you would like to use FedProx as a baseline and run our code:

  • If you are using different datasets, then at least the learning rate and the mu parameter need to be tuned for your metric. You might want to tune mu over {0.001, 0.01, 0.1, 0.5, 1}; there are no default mu values that work for all settings (see the sketch after this list).

  • If you are using the same datasets as those used here, then you need to use the same learning rates and mu values reported in our paper.
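A minimal sketch of such a mu sweep (my_dataset and the log paths are placeholders; run_fedprox.sh takes the dataset name, drop percentage, and mu, as in the examples below):

for mu in 0.001 0.01 0.1 0.5 1; do
    bash run_fedprox.sh my_dataset 0 $mu | tee log/my_dataset_mu$mu
done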

Preparation

Dataset generation

We provide the four synthetic datasets used in the paper under the corresponding folders. For all datasets, see the README files in the separate data/$dataset folders for instructions on preprocessing and/or sampling the data.

The statistics of real federated datasets are summarized as follows.

Dataset       Devices   Samples   Samples/device: mean (stdev)
MNIST           1,000    69,035        69 (106)
FEMNIST           200    18,345        92 (159)
Shakespeare       143   517,106     3,616 (6,808)
Sent140           772    40,783        53 (32)

Downloading dependencies

pip3 install -r requirements.txt  
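Note that requirements.txt does not pin package versions (see the "module version" and "Tensorflow installation" issues below), so pip may pull incompatible latest releases. A sketch of pinning to the TensorFlow 1.x era this code targets, assuming TensorFlow 1.10 on a Python 3.5/3.6 environment:

pip3 install "tensorflow==1.10.*"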

Run on synthetic federated data

(1) You don't need a GPU to run the synthetic data experiments:

export CUDA_VISIBLE_DEVICES=

(2) Run the following commands; the log files will be stored automatically for drawing figures later.

bash run_fedavg.sh synthetic_iid 0 | tee log_synthetic/synthetic_iid_client10_epoch20_mu0
bash run_fedprox.sh synthetic_iid 0 1 | tee log_synthetic/synthetic_iid_client10_epoch20_mu1
bash run_fedavg.sh synthetic_0_0 0 | tee log_synthetic/synthetic_0_0_client10_epoch20_mu0
bash run_fedprox.sh synthetic_0_0 0 1 | tee log_synthetic/synthetic_0_0_client10_epoch20_mu1
bash run_fedavg.sh synthetic_0.5_0.5 0 | tee log_synthetic/synthetic_0.5_0.5_client10_epoch20_mu0
bash run_fedprox.sh synthetic_0.5_0.5 0 1 | tee log_synthetic/synthetic_0.5_0.5_client10_epoch20_mu1
bash run_fedavg.sh synthetic_1_1 0 | tee log_synthetic/synthetic_1_1_client10_epoch20_mu0
bash run_fedprox.sh synthetic_1_1 0 1 | tee log_synthetic/synthetic_1_1_client10_epoch20_mu1
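Note that tee will not create the log_synthetic directory on its own; create it first if needed:

mkdir -p log_synthetic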

(3) Draw figures to reproduce results on synthetic data

python plot_fig2.py loss     # training loss
python plot_fig2.py accuracy # testing accuracy
python plot_fig2.py dissim   # dissimilarity metric

The training loss, testing accuracy, and dissimilarity metric figures are saved as loss.pdf, accuracy.pdf, and dissim.pdf, respectively, in the folder from which you call plot_fig2.py. You can check that these figures reproduce the results in Figure 2 of the paper. Make sure to use the default hyper-parameters in run_fedavg.sh/run_fedprox.sh for the synthetic data.

For example, the training loss curves for the synthetic datasets would look like this: [figure omitted: training loss on the synthetic datasets]

Run on real federated datasets

(1) Specify a GPU id if needed:

export CUDA_VISIBLE_DEVICES=available_gpu_id

Otherwise, just run on CPUs (this might be slow when testing neural network models):

export CUDA_VISIBLE_DEVICES=

(2) Run on one dataset. First, modify the run_fedavg.sh and run_fedprox.sh scripts: specify the corresponding model for that dataset (choose from flearn/models/$DATASET/$MODEL.py and use $MODEL as the model name), specify a log file name, and configure all other parameters such as the learning rate (see all hyper-parameter values in the appendix of the paper).

For example, for all the synthetic data:

run_fedavg.sh:

python3  -u main.py --dataset=$1 --optimizer='fedavg'  \
            --learning_rate=0.01 --num_rounds=200 --clients_per_round=10 \
            --eval_every=1 --batch_size=10 \
            --num_epochs=20 \
            --drop_percent=$2 \
            --model='mclr' 

run_fedprox.sh:

python3  -u main.py --dataset=$1 --optimizer='fedprox'  \
            --learning_rate=0.01 --num_rounds=200 --clients_per_round=10 \
            --eval_every=1 --batch_size=10 \
            --num_epochs=20 \
            --drop_percent=$2 \
            --model='mclr' \
            --mu=$3

Then run:

mkdir synthetic_1_1
bash run_fedavg.sh synthetic_1_1 0 | tee synthetic_1_1/fedavg_drop0
bash run_fedprox.sh synthetic_1_1 0 0 | tee synthetic_1_1/fedprox_drop0_mu0
bash run_fedprox.sh synthetic_1_1 0 1 | tee synthetic_1_1/fedprox_drop0_mu1

bash run_fedavg.sh synthetic_1_1 0.5 | tee synthetic_1_1/fedavg_drop0.5
bash run_fedprox.sh synthetic_1_1 0.5 0 | tee synthetic_1_1/fedprox_drop0.5_mu0
bash run_fedprox.sh synthetic_1_1 0.5 1 | tee synthetic_1_1/fedprox_drop0.5_mu1

bash run_fedavg.sh synthetic_1_1 0.9 | tee synthetic_1_1/fedavg_drop0.9
bash run_fedprox.sh synthetic_1_1 0.9 0 | tee synthetic_1_1/fedprox_drop0.9_mu0
bash run_fedprox.sh synthetic_1_1 0.9 1 | tee synthetic_1_1/fedprox_drop0.9_mu1

And the test accuracy, training loss, and dissimilarity numbers will be saved in the log files.
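For a quick look at one curve in such a log, a shell one-liner works; this sketch assumes the "At round N accuracy: ..." log format visible in the issue reports further below:

grep 'At round' synthetic_1_1/fedavg_drop0 | grep -v training   # test accuracy lines only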

(3) After you collect logs for all 5 datasets in Figure 1 (synthetic, mnist, femnist, shakespeare, sent140; the log directories should be [synthetic_1_1, mnist, femnist, shakespeare, sent140]), run:

python plot_final_e20.py loss

to reproduce results in Figure 1 (the generated figure is called loss_full.pdf).

Note: If you only want to quickly verify the results on the first synthetic dataset, you can modify the plot_final_e20.py script by changing range(5) in Line 54 to range(1), and run python plot_final_e20.py loss.
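A sketch of that one-line change (the loop variable name is illustrative, not the actual source):

# plot_final_e20.py, Line 54
for idx in range(1):  # was range(5); plot only the first (synthetic) dataset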

Note: It might take much longer to run on the real datasets than on the synthetic data, because the real federated datasets are larger and some of the models are deep neural networks.

References

See our FedProx paper for more details as well as all references.

fedprox's People

Contributors: litian96

fedprox's Issues

Algorithm 2 inconsistent with code

Thanks for the work :)
I have read the code and the corresponding issue #10, but some places still seem inconsistent with the paper. Please correct me if I am wrong.

  • In Algorithm 2, line 7, we calculate the norm between the local model and the global model. But the code uses the l2 norm of the local model alone, without considering the global model. Take mnist/mclr.py, line 40, for example (see the sketch after this list).

  • I also checked the NLP experiments on Shakespeare, but I didn't find the regularization part in create_model (shakespeare/stacked_lstm.py, create_model).
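For context, a hypothetical sketch (not the repo's code; all names are illustrative) of the proximal term that Algorithm 2, line 7 describes, written in TensorFlow 1.x style:

import tensorflow as tf

def proximal_loss(local_loss, w_local, w_global, mu):
    # h_k(w; w^t) = F_k(w) + (mu/2) * ||w - w^t||^2, summed over all weight tensors
    prox = tf.add_n([tf.reduce_sum(tf.square(wl - tf.stop_gradient(wg)))
                     for wl, wg in zip(w_local, w_global)])
    return local_loss + (mu / 2.0) * prox

In this repository the proximal update appears to live in the fedprox trainer and pgd optimizer code paths (mentioned in the "Obtain \nabla h_k(w_t, w_t) in FedProx" issue below) rather than in create_model.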

Thank you!


Dynamic μ

Does the current implementation provide the option for heuristic μ as discussed in "C.3.3 Adaptively setting μ" from https://arxiv.org/pdf/1812.06127.pdf?

We decrease μ by 0.1 when the loss continues to decrease for 5 rounds and increase μ by 0.1 when we see the loss increase.

I assume that you mean that you use the same μ for all clients, and that you refer to the global loss, right?

Thank you

Where is the gamma in the code implementation?

According to Algorithm 2, there is a parameter gamma in the input which measures how much local computation is performed to solve the local subproblem on device k at the t-th round.
[screenshot of Algorithm 2 from the paper]
But I can't find gamma in the code implementation. In https://github.com/litian96/FedProx/blob/master/flearn/models/mnist/mclr.py there is only a variable num_epochs:
def solve_inner(self, data, num_epochs=1, batch_size=32):
    '''Solves local optimization problem'''
    for _ in trange(num_epochs, desc='Epoch: ', leave=False, ncols=120):
        for X, y in batch_data(data, batch_size):
            with self.graph.as_default():
                # one local SGD step on a mini-batch
                self.sess.run(self.train_op,
                              feed_dict={self.features: X, self.labels: y})
    soln = self.get_params()
    # estimated local computation: epochs * batches * batch size * per-sample FLOPs
    comp = num_epochs * (len(data['y'])//batch_size) * batch_size * self.flops
    return soln, comp
So could you please help me find gamma?

No module named 'flearn.models.nist.stacked_lstm'

Hey, when I run main.py I get this error:
Traceback (most recent call last):

File "", line 1, in
runfile('C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py', wdir='C:/Users/Administrator/Desktop/federated learning/code/FedProx-master')

File "E:\anaconda\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 678, in runfile
execfile(filename, namespace)

File "E:\anaconda\Anaconda\lib\site-packages\spyder_kernels\customize\spydercustomize.py", line 106, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)

File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 130, in
main()

File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 118, in main
options, learner, optimizer = read_options()

File "C:/Users/Administrator/Desktop/federated learning/code/FedProx-master/main.py", line 94, in read_options
mod = importlib.import_module(model_path)

File "E:\anaconda\Anaconda\lib\importlib_init_.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)

File "", line 978, in _gcd_import

File "", line 961, in _find_and_load

File "", line 936, in _find_and_load_unlocked

File "", line 205, in _call_with_frames_removed

File "", line 978, in _gcd_import

File "", line 961, in _find_and_load

File "", line 948, in _find_and_load_unlocked

ModuleNotFoundError: No module named 'flearn.models.nist.stacked_lstm'

And flearn.models.nist.stacked_lstm indeed does not exist. Why?
Thank you so much.

about personalized FL

In FedProx, only performance on clients' own test sets is considered, without a global test set. I know that personalized FL sometimes focuses on each client's own test set, but why don't we compare personalized FL against purely local training on each client?
If we only care about clients' own test sets, I think comparison experiments against the performance of local training are necessary.

module version

Can you pin the correct versions of the modules in requirements.txt? Otherwise pip will download the latest versions.

Tensorflow installation

Hi, I got this problem on macOS and Windows:

~ % pip install tensorflow-gpu==1.10
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu==1.10 (from versions: none)
ERROR: No matching distribution found for tensorflow-gpu==1.10

The same happens with pip3.

Did I miss anything?
Thanks

Obtain \nabla h_k(w_t, w_t) in FedProx

Hi,

I studied your paper/code and I am trying to obtain \nabla h_k(w_t, w_t) to use as a local optimization criterion. In the fedprox and pgd codes, it is not clear to me where the gradients \nabla h_k(w, w_t) are evaluated. Could you help me with this?

If I can understand where these gradients are evaluated, I could simply pass w_t, the self.latest_model in fedprox, to this function instead of using the local model.

Best Regards,
Mairton

All clients are sharing the same underlying learner.

self.clients = self.setup_clients(dataset, self.client_model)

Please take a look at this line. It seems that all clients use the same ML model object for local training. In other words, there is no local model, only a global model that is sequentially trained on each client.

This can be verified by the following code snippet (I have tested it on flearn/trainers/fedavg.py).

            csolns = []  # buffer for receiving client solutions

            lastc = None
            for idx, c in enumerate(active_clients.tolist()):  # simply drop the slow devices
                print(i, idx)
                if lastc is not None:
                  for j in range(len(lastc)):
                    print('Is the parameters of the current client (before training) the same as the parameters of the previous client (after training)?: %s' % (c.get_params()[j] == lastc[j]).all())
                  from time import sleep
                  sleep(1)
                else:
                  print('The first client.')
                # communicate the latest model
                c.set_params(self.latest_model)

                # solve minimization locally
                soln, stats = c.solve_inner(num_epochs=self.num_epochs, batch_size=self.batch_size)
                lastc = c.get_params()

                # gather solutions from client
                csolns.append(soln)

                # track communication cost
                self.metrics.update(rnd=i, cid=c.id, stats=stats)

            # update models
            self.latest_model = self.aggregate(csolns)

In my opinion, this is not expected for federated learning.

No model for 'mnist.cnn'

Hi there,
I do see you have the option of using a CNN on the MNIST dataset, but I don't see the implementation among the models.
Would you provide it later?

BTW, I was also at ICML this year, but was unable to attend the poster session. Would you put your poster on your homepage as well?

H.

Should the global model replace the client model?

Hi, I read your paper and code, and this work has inspired me a lot in my work on federated learning optimization. I am trying to reproduce FedProx in PyTorch and I am confused about a small detail. In the algorithm in the paper, the local client model seems to have no replacement operation, i.e., $w_k^t = w^t$.

[screenshot of the algorithm from the paper]

But when I read your code, I found that there is actually a REPLACE operation.

self.latest_model = self.aggregate(csolns)
self.client_model.set_params(self.latest_model)

And I also found a similar operation in a PyTorch replication repo, FedMA:

https://github.com/IBM/FedMA/blob/4b586a5a22002dc955d025b890bc632daa3c01c7/main.py#L863-L883

Q1: Actually, should I use this aggregated model to replace the local client model after aggregation?

Q2: When not replacing, it can be interpreted as the local model $w_k^t$ trying to approximate the global model $w^t$. From another point of view, does this count as alleviating the catastrophic forgetting problem?

If I have misunderstood something, please let me know. I look forward to hearing from you.

python version

What Python version are you using? I use Python 3.6 and some packages fail to import, so I switched to 3.5, but 3.5 has been deprecated and the dependencies cannot be downloaded.

problems when run shakespeare and sent140

Dear Tian:
When I run the following on CPU:

python3 -u main.py --dataset='sent140' --optimizer='fedprox' \
    --learning_rate=0.01 --num_rounds=200 --clients_per_round=10 \
    --mu=0 --eval_every=1 --batch_size=10 \
    --num_epochs=1 \
    --model='stacked_lstm' | tee logs/logs_sent140_mu0_E1_fedprox

it runs very, very slowly, and worse, the outputs keep alternating between the same numbers! The result is below:
5726 Clients in Total
Training with 10 workers ---
At round 0 accuracy: 0.4060871469235822
At round 0 training accuracy: 0.40770690942001303
At round 0 training loss: 0.6931471925528921
gradient difference: 0.3779687893000023
At round 1 accuracy: 0.5939128530764178
At round 1 training accuracy: 0.5922930905799869
At round 1 training loss: 0.682659032131717
gradient difference: 0.6406151359028104
At round 2 accuracy: 0.4060871469235822
At round 2 training accuracy: 0.40770690942001303
At round 2 training loss: 0.6951613189004014
gradient difference: 1.0240842395041418
At round 3 accuracy: 0.5939128530764178
At round 3 training accuracy: 0.5922930905799869
At round 3 training loss: 0.6845133630735032
gradient difference: 1.334649037607692
At round 4 accuracy: 0.4060871469235822
At round 4 training accuracy: 0.40770690942001303
At round 4 training loss: 0.7872438000397856
gradient difference: 3.8706158347478246
At round 5 accuracy: 0.5939128530764178
At round 5 training accuracy: 0.5922930905799869
At round 5 training loss: 0.676954747225743
gradient difference: 2.8532703690523324
At round 6 accuracy: 0.4060871469235822
At round 6 training accuracy: 0.40770690942001303
At round 6 training loss: 0.6952778442305486
gradient difference: 2.9297919740883964
At round 7 accuracy: 0.5939128530764178
At round 7 training accuracy: 0.5922930905799869
At round 7 training loss: 0.7021283723042158
gradient difference: 4.2864026772781
At round 8 accuracy: 0.5939128530764178
At round 8 training accuracy: 0.5922930905799869
At round 8 training loss: 0.6761318949424154
gradient difference: 4.987087255237341
At round 9 accuracy: 0.4060871469235822
At round 9 training accuracy: 0.40770690942001303
At round 9 training loss: 0.8113437744137745
gradient difference: 9.235964830922306
At round 10 accuracy: 0.5939128530764178
At round 10 training accuracy: 0.5922930905799869
At round 10 training loss: 0.7755919640498169
gradient difference: 6.982072813031079
At round 11 accuracy: 0.5939128530764178
At round 11 training accuracy: 0.5922930905799869
At round 11 training loss: 0.7091725448816267
gradient difference: 6.115867566149534
At round 12 accuracy: 0.5939128530764178
At round 12 training accuracy: 0.5922930905799869
At round 12 training loss: 0.7398191231275261
gradient difference: 7.72441549160035
At round 13 accuracy: 0.5939128530764178
At round 13 training accuracy: 0.5922930905799869
At round 13 training loss: 1.0417891773572328
gradient difference: 15.32712477985914

And the same thing happens when I run shakespeare, but mnist and nist perform well.
How can I solve this? Is there something wrong with stacked_lstm?

problem about setup

Hello, when I tried to run this code and typed pip3 install -r requirements.txt in my Anaconda environment, an error showed up: [error screenshot omitted]
How do I solve it? Thanks.

The FEMNIST data generation

In the my_sample.py file for generating the FEMNIST data, the < seems like it should be a >. Otherwise, the retrieved samples will be the same for the same class at the beginning. I checked the data files shared via Google Drive; there are indeed several identical images for the same class for each user.

ModuleNotFoundError: No module named 'FedML'

After running this: !python experiments/centralized/moleculenet/molecule_classification_multilabel.py

I get this error message:
Traceback (most recent call last):
File "experiments/centralized/moleculenet/molecule_classification_multilabel.py", line 11, in
from data_preprocessing.molecule.data_loader import get_dataloader, get_data
File "/content/drive/My Drive/Colab Notebooks/FedGraphNN/data_preprocessing/molecule/data_loader.py", line 12, in
from FedML.fedml_core.non_iid_partition.noniid_partition import partition_class_samples_with_dirichlet_distribution
ModuleNotFoundError: No module named 'FedML'
