
nvflare's Introduction


NVIDIA FLARE

NVIDIA FLARE (NVIDIA Federated Learning Application Runtime Environment) is a domain-agnostic, open-source, extensible SDK that allows researchers and data scientists to adapt existing ML/DL workflows to a federated paradigm. It enables platform developers to build a secure, privacy-preserving offering for a distributed multi-party collaboration.

Features

FLARE is built on a componentized architecture that allows you to take federated learning workloads from research and simulation to real-world production deployment.

Application Features

  • Support both deep learning and traditional machine learning algorithms (e.g., PyTorch, TensorFlow, Scikit-learn, XGBoost, etc.)
  • Support horizontal and vertical federated learning
  • Built-in Federated Learning algorithms (e.g., FedAvg, FedProx, FedOpt, Scaffold, Ditto, etc.)
  • Support multiple server and client-controlled training workflows (e.g., scatter & gather, cyclic) and validation workflows (global model evaluation, cross-site validation)
  • Support both data analytics (federated statistics) and machine learning lifecycle management
  • Privacy preservation with differential privacy, homomorphic encryption, private set intersection (PSI)

From Simulation to Real-World

  • FLARE Client API to transition seamlessly from ML/DL to FL with minimal code changes
  • Simulator and POC mode for rapid development and prototyping
  • Fully customizable and extensible components with modular design
  • Deployment on cloud and on-premise
  • Dashboard for project management and deployment
  • Security enforcement through federated authorization and privacy policy
  • Built-in support for system resiliency and fault tolerance

Take a look at NVIDIA FLARE Overview for a complete overview, and What's New for the latest changes.

Installation

To install the current release:

$ python3 -m pip install nvflare

Getting Started

You can quickly get started using the FL simulator. A detailed getting started guide is available in the documentation.

Examples and notebook tutorials are located at NVFlare/examples.

Community

We welcome community contributions! Please refer to the contributing guidelines for more details.

Ask and answer questions, share ideas, and engage with other community members at NVFlare Discussions.

Related Talks and Publications

Take a look at our growing list of talks, blogs, and publications related to NVIDIA FLARE.

License

NVIDIA FLARE is released under an Apache 2.0 license.

nvflare's People

Contributors

apatole, can-zhao, chesterxgchen, dependabot[bot], eordentlich, guopengf, holgerroth, isaacyangsla, jeffwan, kkersten, madil90, nanaha1003, nvidianz, nvkevlu, pxli, rongou, shuoer86, syangster, taleinat, wangxiaoyunnv, wyli, xander-aphe-hatschi, yanchengnv, yanxuanliu, yhwen, yiheng-wang-nv, yinqingh, yuantinghsieh, zhijinl, ziyuexu77


nvflare's Issues

Errors in streaming.py

@yanchengnv noticed some issues in nvflare/app_common/widgets/streaming.py:

  • Lines 47 to 52: the argument checks and their error messages are wrong.
  • All of the write_xxx() methods should check the tag and data args and make sure they are of the expected types (str, dict, ...).
  • Line 257: in the call to self.log_xxx(), we should set send_event=False; otherwise it may cause recursive events.
  • Since fed events are handled by a separate thread, there is a potential race condition in which a fed event is fired after the END_RUN event. In the Receiver code, we need to make sure to discard any events that arrive after END_RUN (and hence finalize) is done.
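
The second and fourth points could be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in streaming.py: the names validate_record, SketchReceiver, handle_event, and saved are hypothetical, and the expected data type is passed in explicitly because it differs between write_xxx() methods.

```python
import threading


def validate_record(tag, data, data_type):
    """Reject a record whose tag or payload has an unexpected type."""
    if not isinstance(tag, str):
        raise TypeError(f"expect tag to be str but got {type(tag)}")
    if not isinstance(data, data_type):
        raise TypeError(f"expect data to be {data_type} but got {type(data)}")


class SketchReceiver:
    """Hypothetical receiver that discards events arriving after finalize()."""

    def __init__(self):
        self._lock = threading.Lock()  # guards _finalized and saved
        self._finalized = False
        self.saved = []

    def handle_event(self, tag, data, data_type=dict):
        validate_record(tag, data, data_type)
        with self._lock:
            if self._finalized:
                return False  # late event fired after END_RUN: drop it
            self.saved.append((tag, data))
            return True

    def finalize(self):
        # Called once END_RUN has been handled.
        with self._lock:
            self._finalized = True


receiver = SketchReceiver()
receiver.handle_event("loss", {"value": 0.5})
receiver.finalize()
late = receiver.handle_event("loss", {"value": 0.4})  # arrives too late
```

Holding the same lock in both handle_event and finalize means an event is either saved before finalization completes or dropped, never half-processed.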

shell scripts missing x permission in poc

After switching to shutil.unpack_archive, the shell script files generated by the poc command no longer keep their original permission settings, in particular the execute permission.
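
One possible workaround, sketched below, is to restore the execute bits after unpacking. The helper name make_scripts_executable and the choice to match only *.sh files are assumptions made for illustration; the actual fix in NVFlare may differ.

```python
import stat
from pathlib import Path


def make_scripts_executable(root):
    """Add execute bits to every *.sh file under root; return the changed paths."""
    changed = []
    for script in Path(root).rglob("*.sh"):
        mode = script.stat().st_mode
        # Keep the existing bits and add execute for user, group, and others.
        script.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        changed.append(script)
    return changed
```

It would be called once on the extraction directory, right after the shutil.unpack_archive call.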

Deploying FL on multiple computers.

I am trying to run NVFlare in a realistic setup with multiple computers. After the provisioning steps, I started the server, clients, and admin from their startup packages. The server started, but the client and admin computers reported a communication error.

2022-01-05 21:37:08,624 - Communicator - ERROR - Action: client_registration grpc communication error. retry: 1500, First start till now: 0.0013239383697509766 seconds.
2022-01-05 21:37:08,624 - Communicator - ERROR - Could not connect to server: imtl-85545-3:8765 Setting flag for stopping training. failed to connect to all addresses

I listed the listening ports on the server with nmap, and it showed 127.0.1.1:8002, which means the server is listening only on localhost and not to other computers. This makes me wonder whether NVFlare currently supports realistic deployments or only POC (proof of concept) mode. Please help me solve this problem, thank you.

NVFlare python version not compatible with Google colab or Google Vertex AI Notebooks

NVFlare requires python 3.8.10 or higher per the pypi page, and both Google Colab and Google Vertex AI Notebooks currently run python 3.7.12 and 3.7.10 respectively. Upgrading these environments is relatively undocumented and complex.

For reference PyTorch 1.10 works with python 3.5 or greater.

Can the dependencies on python 3.8.10 be reduced so that python 3.7 will suffice?

Tenseal dependency for HE is not available on ARM aarch64

The tenseal dependency is not available for the ARM aarch64 platform, causing installation to fail. This has been reported for local development on Mac M1 and will affect other non-x86 architectures (Jetson, Clara AGX, IBM POWER, etc.).

The tenseal dependency is only required when using the HEBuilder module, and it looks like all other functionality could be used without this dependency. Can tenseal be made optional, with the caveat that HE is not available without tenseal?

One option would be providing an alternate install, a requirements-no-tenseal.txt that includes everything but tenseal. For example, I generated this file in a clean venv on my linux machine using:

pip download nvflare -d /tmp -v \
    | grep Collecting \
    | awk '{print $2}' \
    | tr '[:upper:]' '[:lower:]' \
    | grep -v tenseal \
    | tee requirements-no-tenseal.txt

and verified that I can install nvflare and all deps except tenseal by copying to an aarch64 system (in this case a Jetson TX2) with:

python3 -m pip install --no-deps -r requirements-no-tenseal.txt

This is a pretty awkward solution. It would be much cleaner to remove the tenseal dependency in the default packaging, since HE is optional, and note in the docs that tenseal must be installed when using HE.
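
On the code side, making tenseal optional could look roughly like the guarded import below. This is only a sketch of the general pattern: load_optional, HE_AVAILABLE, and require_he are hypothetical names, not part of NVFlare.

```python
import importlib


def load_optional(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# tenseal is only needed for homomorphic encryption (HE).
tenseal = load_optional("tenseal")
HE_AVAILABLE = tenseal is not None


def require_he():
    """Call at the start of HE-only code paths to fail with a clear message."""
    if not HE_AVAILABLE:
        raise RuntimeError("HE is not available: tenseal is not installed")
```

With this pattern, everything that does not touch HE imports and runs normally on platforms without a tenseal build, and only the HE code paths raise.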

Prostate example

Add a multi-site prostate example to show MONAI usage, the FedProx algorithm, and a non-IID FL scenario.

License header in source files is outdated.

The copyright year in the license header needs to include this year, 2022. The first line should therefore change to

Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.

Multiple FL servers on the same machine

When running multiple FL servers on the same machine, even with individual ports for admin and client communications, the secure grpc communication encounters issues:

E0120 10:25:20.267690287 12242 ssl_transport_security.cc:1468] Handshake failed with fatal error SSL_ERROR_SSL: error:1000007d:SSL routines:OPENSSL_internal:CERTIFICATE_VERIFY_FAILED.

It seemed that /tmp/fl_server contained the configuration of only one of the FL servers.

Workflow for automatically building documentation is not working for apidocs

The apidocs are not being checked in because of the following line in .gitignore:

docs/apidocs/nvflare.*

Since the workflow runs the docs build from whatever is checked out from the main branch, the .gitignore applies. As a result, the generated apidocs HTML files are not checked into the docs branch and do not make it to pages.

Improve examples

The hello examples are not as refined as the cifar10 example. Improve all examples so they are of the same quality.

Deploy command

Hi there,

When I tried to deploy hello-monai with deploy_app hello-monai, as stated in the README, I got an error.
It works if I add either client or server after the command:

deploy_app hello-monai server
or
deploy_app hello-monai client

Document page states NVFlare only compatible with one single Python version

Previously, NVFlare 1.x was compatible with (and ran on) Python 3.8.10 because the pip package was released with pyc files only. Those pyc files were compiled by the Python 3.8.10 interpreter and thus had to run in a Python 3.8.10 environment.
In NVFlare 2.x, the pip packages contain source code instead of pyc files. Therefore, the original statement may cause confusion.


Time lag on fed events

The server side fed event runner can handle 10 events per sec. When lots of fed events are coming, it could take too long to process all of them.
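
To get a feel for the lag, here is a back-of-the-envelope sketch. Only the service rate of 10 events per second comes from the report above; the function name drain_seconds and the backlog and arrival figures are made up for illustration.

```python
def drain_seconds(backlog, arrival_rate, service_rate=10.0):
    """Seconds needed to clear `backlog` queued events.

    Returns None when events arrive at least as fast as they are
    processed, since the queue then never drains.
    """
    net_rate = service_rate - arrival_rate
    if net_rate <= 0:
        return None
    return backlog / net_rate


# 600 queued events and no new arrivals: one minute of lag.
idle = drain_seconds(600, arrival_rate=0)
# Same backlog while 4 new events arrive per second.
busy = drain_seconds(600, arrival_rate=4)
# Arrivals match the service rate: the backlog is never cleared.
saturated = drain_seconds(600, arrival_rate=10)
```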

admin command "sys_info client" error

The admin command "sys_info client" results in an error with the following stack trace:

  File "/opt/conda/lib/python3.8/site-packages/nvflare/fuel/hci/server/reg.py", line 104, in process_command
    self._do_command(conn, command)
  File "/opt/conda/lib/python3.8/site-packages/nvflare/fuel/hci/server/reg.py", line 92, in _do_command
    handler(conn, args)
  File "/opt/conda/lib/python3.8/site-packages/nvflare/private/fed/server/sys_cmd.py", line 66, in sys_info
    self._process_replies(conn, replies)
  File "/opt/conda/lib/python3.8/site-packages/nvflare/private/fed/server/sys_cmd.py", line 77, in _process_replies
    conn.append_string("Client " + r.client_name)
AttributeError: 'ClientReply' object has no attribute 'client_name'

Use Learner API for examples

NVFLARE now defines a Learner class and a built-in executor that can work with a Learner implementation. Federated deep learning apps should be written as Learners instead of Executors.

Currently all examples use Executors. Please change them to use the Learner API.

No module named 'pt'

Hi there,

I am trying to get the cifar10 example running with Federated Learning.

I followed all the steps mentioned here https://nvidia.github.io/NVFlare/quickstart.html and then uploaded and deployed the app from the admin terminal. When I am trying to start the app, I am getting the following error:

./run_1/app_server/config/config_fed_server.json in JSON element components.#5: No module named 'pt'

This happens even though the run_1 folder of the clients contains a folder called pt with the specified learners. Do I have to configure the path to the custom folders somewhere?
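
As general Python background (not a description of how NVFlare loads custom code), a package can only be imported by name when its parent folder is on sys.path. The sketch below demonstrates this with a throwaway package; demo_custom_pkg and can_import are hypothetical names.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def can_import(name):
    """Return True if `name` can be imported with the current sys.path."""
    importlib.invalidate_caches()
    try:
        importlib.import_module(name)
        return True
    except ModuleNotFoundError:
        return False


with tempfile.TemporaryDirectory() as custom_dir:
    # Create a tiny package inside a folder Python does not know about.
    pkg = Path(custom_dir) / "demo_custom_pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("VALUE = 1\n")

    before = can_import("demo_custom_pkg")  # parent folder not on sys.path
    sys.path.insert(0, custom_dir)
    after = can_import("demo_custom_pkg")   # parent folder now on sys.path
    sys.path.remove(custom_dir)
```

Printing sys.path in the failing process is a quick way to check whether the folder that contains pt is actually visible to the server.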

Error in pt_file_model_persistor.py

I am using NVFlare version 2.0.6.
When I start the app on my system (which includes 4 clients), the server fails with the following error:

2022-01-27 04:48:10,374 - ServerRunner - ERROR - [run=1]: Aborting current RUN due to FATAL_SYSTEM_ERROR received: expect model to be torch.nn.Module but got <class 'dict'>
2022-01-27 04:48:10,374 - ServerRunner - INFO - [run=1]: asked to abort - triggered abort_signal to stop the RUN
2022-01-27 04:48:10,374 - ServerRunner - INFO - [run=1]: starting workflow scatter_gather_ctl (<class 'nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather'>) ...
2022-01-27 04:48:10,374 - ScatterAndGather - INFO - [run=1]: Initializing ScatterAndGather workflow.
2022-01-27 04:48:10,374 - PTFileModelPersistor - ERROR - [run=1]: error getting state_dict from model object
Traceback (most recent call last):
  File "/home/jupyter-test/.conda/envs/fl/lib/python3.8/site-packages/nvflare/app_common/pt/pt_file_model_persistor.py", line 202, in load_model
    data = self.model.state_dict() if self.model is not None else OrderedDict()
AttributeError: 'dict' object has no attribute 'state_dict'
2022-01-27 04:48:10,374 - ServerRunner - ERROR - [run=1]: Aborting current RUN due to FATAL_SYSTEM_ERROR received: cannot create state_dict from model object
2022-01-27 04:48:10,374 - ServerRunner - INFO - [run=1]: asked to abort - triggered abort_signal to stop the RUN
2022-01-27 04:48:10,375 - ServerRunner - INFO - [run=1]: Workflow scatter_gather_ctl (<class 'nvflare.app_common.workflows.scatter_and_gather.ScatterAndGather'>) started
2022-01-27 04:48:10,375 - ScatterAndGather - INFO - [run=1, wf=scatter_gather_ctl]: Beginning ScatterAndGather training phase.
2022-01-27 04:48:10,375 - ScatterAndGather - INFO - [run=1, wf=scatter_gather_ctl]: Abort signal received. Exiting at round 0.
2022-01-27 04:48:10,375 - ServerRunner - INFO - [run=1, wf=scatter_gather_ctl]: Workflow: scatter_gather_ctl finalizing ...
2022-01-27 04:48:12,877 - ServerRunner - INFO - [run=1, wf=scatter_gather_ctl]: ABOUT_TO_END_RUN fired
2022-01-27 04:48:12,877 - ServerRunner - INFO - [run=1, wf=scatter_gather_ctl]: END_RUN fired
2022-01-27 04:48:12,878 - ServerRunner - INFO - [run=1, wf=scatter_gather_ctl]: Server runner finished.
2022-01-27 04:48:13,376 - FederatedServer - INFO - Server app stopped.

Please help me resolve this problem, thank you.

Add log_critical to FLComponent

We have log_info, log_warning, log_error, log_exception, log_debug functions already.

Add log_critical to be consistent with Python logger.
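
A rough sketch of the idea, using a hypothetical stand-in class rather than the real FLComponent. The method signatures here are simplified and take only a message; the actual log_* signatures are not shown in this report.

```python
import logging


class SketchComponent:
    """Hypothetical stand-in for a component that exposes log_* helpers."""

    def __init__(self, name="SketchComponent"):
        self.logger = logging.getLogger(name)

    def log_error(self, msg):
        # Existing-style helper: forwards to the standard logger.
        self.logger.log(logging.ERROR, msg)

    def log_critical(self, msg):
        # Proposed addition: mirrors logging.Logger.critical().
        self.logger.log(logging.CRITICAL, msg)


component = SketchComponent()
component.log_critical("unrecoverable failure")
```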
