fets-ai / challenge

The repo for the FeTS Challenge

Home Page: https://www.synapse.org/#!Synapse:syn28546456

Python 89.54% R 9.05% Dockerfile 1.41%

challenge federated-learning

challenge's Issues

openfl-fets error

I installed the dependencies following the official FeTS Challenge guide, but running the script fails with the following error:

`No 'TrainOrVal' column found in split_subdirs csv, so performing automated split using percent_train of 0.8
[08:53:18] INFO Updating aggregator.settings.rounds_to_train to 70... native.py:83
INFO Updating aggregator.settings.db_store_rounds to 2... native.py:83
/home/zss/anaconda3/envs/FL/lib/python3.7/site-packages/pandas/core/frame.py:4913: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
errors=errors,
Traceback (most recent call last):
File "fets_challenge_task1.py", line 654, in
device=device)
File "/data1/zsscode/Fong/code/FL/HFL/Task_1/fets_challenge/experiment.py", line 289, in run_challenge_experiment
task_runner = copy(plan).get_task_runner(list(collaborator_data_loaders.values())[0])
File "/home/zss/anaconda3/envs/FL/lib/python3.7/site-packages/openfl/federated/plan/plan.py", line 340, in get_task_runner
self.runner_ = Plan.Build(**defaults)
File "/home/zss/anaconda3/envs/FL/lib/python3.7/site-packages/openfl/federated/plan/plan.py", line 179, in Build
module = import_module(module_path)
File "/home/zss/anaconda3/envs/FL/lib/python3.7/importlib/init.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "", line 1006, in _gcd_import
File "", line 983, in _find_and_load
File "", line 967, in _find_and_load_unlocked
File "", line 677, in _load_unlocked
File "", line 728, in exec_module
File "", line 219, in _call_with_frames_removed
File "/root/.local/workspace/impl/fets_challenge_model.py", line 53, in
from fets.models.pytorch.brainmage.losses import MCD_loss, MCD_MSE_loss, dice_loss
File "/home/zss/anaconda3/envs/FL/lib/python3.7/site-packages/fets/models/pytorch/brainmage/init.py", line 1, in
from .brainmage import BrainMaGeModel
File "/home/zss/anaconda3/envs/FL/lib/python3.7/site-packages/fets/models/pytorch/brainmage/brainmage.py", line 38, in
from openfl import load_yaml
ImportError: cannot import name 'load_yaml' from 'openfl' (/home/zss/anaconda3/envs/FL/lib/python3.7/site-packages/openfl/init.py)
(FL) root@omnisky:/data1/zsscode/Fong/code/FL/HFL/Task_1# python
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

from openfl.models.pytorch import PyTorchFLModel
Traceback (most recent call last):
File "", line 1, in
ModuleNotFoundError: No module named 'openfl.models'

`
I also tried installing from source using the specified branch, but I get the same error. Is there something wrong with my installation process?

What is the cross-validation setup?

What is the cross-validation setup? What fraction of the training data is held out to evaluate the model and compute the performance metrics? Is the data from one collaborator used for evaluation after the U-Net has been trained on the other collaborators in a given epoch, or is some other scheme used?

Baseline functions for aggregator & collaborator

Which aggregator and collaborator functions are you going to use as the baseline?

GANDLF hash in setup for task_1 doesn't seem to pip install

Hi, I tried using the setup.py and also independently installing GANDLF with pip,

pip install GANDLF@git+https://github.com/CBICA/GaNDLF.git@e4d0d4bfdf4076130817001a98dfb90189956278

but both get stuck on a git checkout. Was wondering if there's an alternate hash?

git checkout -q e4d0d4bfdf4076130817001a98dfb90189956278

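Timeout when installing Singularity from a ** IP: I ran into this, so here is how to avoid it

I lost two days to this. Proxying the console through ss or clash does not fix the timeout, and whether the installation succeeds is down to luck, because the Go language installation proxy is blocked in **. The installation succeeded on my server but failed in a Linux virtual machine.

After running ./mconfig, the timeout occurs when running cd builddir and make.
Enter the builddir folder.
Open the Makefile in a text editor.
Find GOPROXY := https://proxy.golang.org and change it to GOPROXY := https://proxy.golang.cn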

So, why not use Docker? emmmm...

Installation issue : conflicting dependencies

Hey, I'm following the instructions in https://github.com/FETS-AI/Challenge/tree/main/Task_1#getting-started

But at step pip install .
I get this error:


INFO: pip is looking at multiple versions of nnunet to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of medpy to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of batchgenerators to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of fets to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of <Python from Requires-Python> to determine which version is compatible with other requirements. This could take a while.
INFO: pip is looking at multiple versions of fets-challenge to determine which version is compatible with other requirements. This could take a while.
ERROR: Cannot install fets and fets-challenge because these package versions have conflicting dependencies.

The conflict is caused by:
    nnunet 1.6.6 depends on torch>=1.6.0a
    gandlf 0.0.14.dev0 depends on torch==1.8.2

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

ERROR: ResolutionImpossible: for help visit https://pip.pypa.io/en/latest/topics/dependency-resolution/#dealing-with-dependency-conflicts

Note that I installed pytorch using this command:
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Checkpoint folder error when running again after a failure

Traceback (most recent call last):
File ".\FeTS_Challenge.py", line 584, in
restore_from_checkpoint_folder = restore_from_checkpoint_folder)
File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\fets_challenge\experiment.py", line 364, in run_challenge_experiment
checkpoint_folder = setup_checkpoint_folder()
File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\fets_challenge\checkpoint_utils.py", line 18, in setup_checkpoint_folder
checkpoint_num = sorted([int(x.replace('checkpoint/experiment_','')) for x in existing_checkpoints])[-1] + 1
File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\fets_challenge\checkpoint_utils.py", line 18, in
checkpoint_num = sorted([int(x.replace('checkpoint/experiment_','')) for x in existing_checkpoints])[-1] + 1
ValueError: invalid literal for int() with base 10: 'checkpoint\experiment_1'
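
The traceback above suggests that the string replacement assumes forward slashes, while the folder name on Windows contains a backslash (`'checkpoint\experiment_1'`). A minimal sketch of a separator-independent parse follows. `next_checkpoint_num` is a hypothetical helper, not the repository's code, and it assumes folder names of the form `experiment_<n>`.

```python
import re
from pathlib import PureWindowsPath


def next_checkpoint_num(existing_checkpoints):
    """Return 1 + the highest experiment number found in the given paths.

    Assumes (hypothetically) that each folder's last component looks like
    'experiment_<n>'. Paths that do not match are ignored.
    """
    numbers = []
    for path in existing_checkpoints:
        # PureWindowsPath treats both '/' and '\\' as separators, so the
        # result does not depend on the OS that produced the path string.
        name = PureWindowsPath(path).name
        match = re.fullmatch(r"experiment_(\d+)", name)
        if match:
            numbers.append(int(match.group(1)))
    # With no existing checkpoints, start numbering at 1.
    return max(numbers, default=0) + 1
```

Taking only the last path component avoids hard-coding either separator, so the same call works for `checkpoint/experiment_1` and `checkpoint\experiment_1`.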

Code access for Federated Training (task 1) and evaluation metrics?

Is there any code available for the optimisation part of Task 1, which requires improving the weight aggregation? Is there also code available for the evaluation metrics, namely the Dice Similarity Coefficient (DSC) and the 95% Hausdorff distance (HD)?

No module named 'openfl.federated.data.loader_fets_challenge'

I had an error:
No module named 'openfl.federated.data.loader_fets_challenge'

Would anyone help me out? Thank you so much!

Cannot find where is package 'fets'

Hi all, I am working on Task 1 and when I try to run the baseline code for the very first time, I got this error

"Task_1/fets_challenge/gandlf_csv_adapter.py", line 13, in from fets.data.base_utils import get_appropriate_file_paths_from_subject_dir
ModuleNotFoundError: No module named 'fets'

Please also refer to the following screenshot.

BTW, I have already installed openfl.

Thank you very much. Looking forward to your kind help.

Installation error on step 'pip install .'

Hi,
I am getting this error at the pip install . step given in the readme.
I have been trying to solve this for the past week, and I do not understand the cause of this error.

Here's my pytorch config:

Please let me know what I might be doing wrong or what changes I need to make. I'm running exactly the commands given in the readme, including the venv setup.

Thank you

Related to GaNDLF pathing issues with Windows

OpenFL had to be installed independently of setup.py because OpenFL itself doesn't install on Windows.

Successfully installed packages from C:\Users\15702\.local\workspace/requirements.txt.

New workspace directory structure:
workspace
├── .workspace
├── agg_to_col_one_signed_cert.zip
├── agg_to_col_two_signed_cert.zip
├── cert
├── data
├── logs
├── partitioning_1.csv
├── partitioning_2.csv
├── plan
│   ├── cols.yaml
│   ├── data.yaml
│   ├── defaults
│   └── plan.yaml
├── requirements.txt
├── save
├── small_split.csv
├── src
│   ├── challenge_assigner.py
│   ├── fets_challenge_model.py
│   └── __init__.py
└── validation.csv

6 directories, 15 files
Setting Up Certificate Authority...

1.  Create Root CA
1.1 Create Directories
1.2 Create Database
1.3 Create CA Request and Certificate
2.  Create Signing Certificate
2.1 Create Directories
2.2 Create Database
2.3 Create Signing Certificate CSR
2.4 Sign Signing Certificate CSR
3   Create Certificate Chain

Done.
Creating AGGREGATOR certificate key pair with following settings: CN=openvessel.ptd.net, SAN=DNS:openvessel.ptd.net
  Writing AGGREGATOR certificate key pair to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/server
The CSR Hash for file server/agg_openvessel.ptd.net.csr = 2dffb3b6b3429066358c48f7817b37def87f94c4b6538a7511d9ec15d3eb64227561744b638709da4b7b3119e3f8062d
 Signing AGGREGATOR certificate
Creating COLLABORATOR certificate key pair with following settings: CN=one, SAN=DNS:one       
  Moving COLLABORATOR certificate to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/col_one
The CSR Hash for file col_one.csr = 788406d2db5277603ff520c9f86237fead29fd890db72e5f01eaec7671b2c92dd13eb69b28eb8f821958e5d67e3c7e2d
 Signing COLLABORATOR certificate

Registering odeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert\client\col_one in C:\Users\15702\.local\workspace\plan\cols.yaml
Creating COLLABORATOR certificate key pair with following settings: CN=two, SAN=DNS:two
  Moving COLLABORATOR certificate to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/col_two
The CSR Hash for file col_two.csr = cc4568bc2f495a46ceff0f3708b1a24a43ddff4aa6797cfd00eb51abdfa9a1078b8ed349a0fd0fcfb5ccabc9815b7f88
 Signing COLLABORATOR certificate

Registering odeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert\client\col_two in C:\Users\15702\.local\workspace\plan\cols.yaml
C:\Users\15702\.local\workspace
No 'TrainOrVal' column found in split_subdirs csv, so performing automated split using percent_train of 0.8
Traceback (most recent call last):
  File ".\FeTS_Challenge.py", line 566, in <module>
    restore_from_checkpoint_folder = restore_from_checkpoint_folder)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\fets_challenge\experiment.py", line 254, in run_challenge_experiment
    gandlf_csv_path)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\fets_challenge\gandlf_csv_adapter.py", line 147, in construct_fedsim_csv
    inner_dict = get_appropriate_file_paths_from_subject_dir(os.path.join(pardir, subdir), include_labels=True)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\fets\data\base_utils.py", line 14, in get_appropriate_file_paths_from_subject_dir
    filesInDir = os.listdir(dir_path)
FileNotFoundError: [WinError 3] The system cannot find the path specified: '/raid/datasets/FeTS22/MICCAI_FeTS2022_TrainingData\\FeTS2022_01333'

So the workspace is set up in VS Code, but experiment.py cannot find /raid/datasets/FeTS22/MICCAI_FeTS2022_TrainingData\FeTS2022_01333
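
The failing path in the traceback mixes a POSIX-style root (`/raid/datasets/...`) with a Windows separator, which hints that the configured data directory does not exist on this machine. That is an assumption, not something the log confirms. A minimal sketch of a check that could run before the experiment starts: `missing_subject_dirs` is a hypothetical helper and not part of the challenge code.

```python
import os


def missing_subject_dirs(data_root, subject_ids):
    """Return the subject folders under data_root that do not exist.

    data_root and subject_ids are whatever the experiment would be
    started with; nothing here is specific to the FeTS code base.
    """
    # normpath converts separators to the local convention and abspath
    # makes the reported paths unambiguous in the error message.
    root = os.path.abspath(os.path.normpath(data_root))
    missing = []
    for subject in subject_ids:
        candidate = os.path.join(root, subject)
        if not os.path.isdir(candidate):
            missing.append(candidate)
    return missing
```

An empty result means every listed subject folder was found. Otherwise the returned paths show exactly which locations the run would fail on.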

Validation function performance issues

In many of my tests, validation metrics consume the lion's share of the compute time, resulting in very long experiment runtimes (e.g. a week+ to converge). This makes it hard to iterate on ideas.

I'd like to propose two ideas to address this:

1. We add a "choose_collaborators_to_validate" function, just like our training equivalent, that allows for controlling how often these functions are run. For example, I may want to aggregate often in order to reduce local weight divergence, but I may want to reduce validation to once every 10 rounds. This would be very easy to implement in the custom assigner, experiment loop, and simulated time calculator.
2. We try to optimize the validation metrics functions themselves. The validation outputs could be cached, then the metrics all computed at once at the end of the round, where we might be able to leverage parallelism/acceleration better.
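
The first idea could be sketched roughly as follows. The function name comes from the proposal, but its signature, the `validate_every` parameter, and the return convention are assumptions made for illustration. The challenge code does not define this hook.

```python
def choose_collaborators_to_validate(collaborators, fl_round, validate_every=10):
    """Return the collaborators that should run validation this round.

    Hypothetical hook: validation runs for all collaborators every
    `validate_every` rounds (starting with round 0) and is skipped
    otherwise, while training can still happen every round.
    """
    if validate_every < 1:
        raise ValueError("validate_every must be at least 1")
    if fl_round % validate_every == 0:
        return list(collaborators)
    return []
```

With the default of 10, rounds 0, 10, 20, and so on validate on every collaborator, and all other rounds skip validation entirely.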

ignore_label_validation error

(venv) PS C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1> python .\FeTS_Challenge.py
Creating Workspace Directories
Creating Workspace Templates
Requirement already satisfied: torchvision in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from -r C:\Users\15702\.local\workspace/requirements.txt (line 1)) (0.9.2+cu111)
Requirement already satisfied: torch in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from -r C:\Users\15702\.local\workspace/requirements.txt (line 2)) (1.8.2+cu111)
Requirement already satisfied: numpy in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from torchvision->-r C:\Users\15702\.local\workspace/requirements.txt (line 1)) (1.21.0)
Requirement already satisfied: pillow>=4.1.1 in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from torchvision->-r C:\Users\15702\.local\workspace/requirements.txt (line 1)) (9.1.1)
Requirement already satisfied: typing-extensions in c:\coderepos\moanisandbox\fets-ai\challenge\task_1\venv\lib\site-packages (from torch->-r C:\Users\15702\.local\workspace/requirements.txt (line 2)) (4.2.0)
Successfully installed packages from C:\Users\15702\.local\workspace/requirements.txt.

New workspace directory structure:
workspace
├── .workspace
├── agg_to_col_one_signed_cert.zip
├── agg_to_col_two_signed_cert.zip
├── cert
├── checkpoint
├── data
├── gandlf_paths.csv
├── logs
├── output_validation
│   └── 0
├── partitioning_1.csv
├── partitioning_2.csv
├── plan
│   ├── cols.yaml
│   ├── data.yaml
│   ├── defaults
│   └── plan.yaml
├── raid
│   └── datasets
│       └── FeTS22
├── requirements.txt
├── save
│   └── fets_seg_test_init.pbuf
├── seg_test_train.csv
├── seg_test_val.csv
├── small_split.csv
├── src
│   ├── challenge_assigner.py
│   ├── fets_challenge_model.py
│   ├── __init__.py
│   └── __pycache__
│       ├── challenge_assigner.cpython-37.pyc
│       ├── fets_challenge_model.cpython-37.pyc
│       └── __init__.cpython-37.pyc
└── validation.csv

13 directories, 22 files
Setting Up Certificate Authority...

1.  Create Root CA
1.1 Create Directories
1.2 Create Database
1.3 Create CA Request and Certificate
2.  Create Signing Certificate
2.1 Create Directories
2.2 Create Database
2.3 Create Signing Certificate CSR
2.4 Sign Signing Certificate CSR
3   Create Certificate Chain

Done.
Creating AGGREGATOR certificate key pair with following settings: CN=openvessel.ptd.net, SAN=DNS:openvessel.ptd.net
  Writing AGGREGATOR certificate key pair to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/server
The CSR Hash for file server/agg_openvessel.ptd.net.csr = f713b37863866bd5a82473efd30b8e494ef0243b4470fae2ae40e7d75f5415475f38c91986391d95436bce024df14bf1
 Signing AGGREGATOR certificate
Creating COLLABORATOR certificate key pair with following settings: CN=one, SAN=DNS:one
  Moving COLLABORATOR certificate to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/col_one
The CSR Hash for file col_one.csr = 58fdc5a503366177f1556335d22295b6d598078341ad3b40ad7301c2cf3dac5252d8feea1f03bb7fa6077b2541562860
 Signing COLLABORATOR certificate

Registering odeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert\client\col_one in C:\Users\15702\.local\workspace\plan\cols.yaml
Creating COLLABORATOR certificate key pair with following settings: CN=two, SAN=DNS:two
  Moving COLLABORATOR certificate to: C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert/col_two
The CSR Hash for file col_two.csr = 374efb23a8b7af15d53eb824db7136e5996b418c38e9b65a12384788aff27fb0c5d59de2418784030bc3196d4342cf27
 Signing COLLABORATOR certificate

Registering odeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\cert\client\col_two in C:\Users\15702\.local\workspace\plan\cols.yaml
C:\Users\15702\.local\workspace\gandlf_paths.csv
No 'TrainOrVal' column found in split_subdirs csv, so performing automated split using percent_train of 0.8
[]
[]
[]
[20:09:15] INFO     Updating aggregator.settings.rounds_to_train to 5...                                                                           native.py:102
           INFO     Updating aggregator.settings.db_store_rounds to 5...                                                                           native.py:102
           WARNING  Did not find tasks.train.aggregation_type in config. Make sure it should exist. Creating...                                    native.py:105
           INFO     Updating task_runner.settings.device to cpu...                                                                                 native.py:102
           WARNING  Did not find task_runner.settings.fets_config_dict.data_preprocessing in config. Make sure it should exist. Creating...        native.py:105
           WARNING  Did not find task_runner.settings.fets_config_dict.ignore_label_validation in config. Make sure it should exist. Creating...   native.py:105
           INFO     FL-Plan hash is 601cd0b67629af4d8ea0527f65b8a6613cc7d60f28d1a035e5167db87264c20e2fc1f2844d0df0c45d72ae1b29dcff48                 plan.py:234
{
    "aggregator.settings.best_state_path": "save/fets_seg_test_best.pbuf",
    "aggregator.settings.db_store_rounds": 2,
    "aggregator.settings.init_state_path": "save/fets_seg_test_init.pbuf",
    "aggregator.settings.last_state_path": "save/fets_seg_test_last.pbuf",
    "aggregator.settings.rounds_to_train": 3,
    "aggregator.settings.write_logs": true,
    "aggregator.template": "openfl.component.Aggregator",
    "assigner.settings.training_tasks.0": "aggregated_model_validation",
    "assigner.settings.training_tasks.1": "train",
    "assigner.settings.training_tasks.2": "locally_tuned_model_validation",
    "assigner.settings.validation_tasks.0": "aggregated_model_validation",
    "assigner.template": "src.challenge_assigner.FeTSChallengeAssigner",
    "collaborator.settings.db_store_rounds": 1,
    "collaborator.settings.delta_updates": false,
    "collaborator.settings.opt_treatment": "RESET",
    "collaborator.template": "openfl.component.Collaborator",
    "compression_pipeline.settings": {},
    "compression_pipeline.template": "openfl.pipelines.NoCompressionPipeline",
    "data_loader.settings.feature_shape.0": 32,
    "data_loader.settings.feature_shape.1": 32,
    "data_loader.settings.feature_shape.2": 32,
    "data_loader.template": "openfl.federated.data.loader_fets_challenge.FeTSChallengeDataLoaderWrapper",
    "network.settings.agg_addr": "openvessel.ptd.net",
    "network.settings.agg_port": 54937,
    "network.settings.cert_folder": "cert",
    "network.settings.client_reconnect_interval": 5,
    "network.settings.disable_client_auth": false,
    "network.settings.hash_salt": "auto",
    "network.settings.tls": true,
    "network.template": "openfl.federation.Network",
    "task_runner.settings.device": "cpu",
    "task_runner.settings.fets_config_dict.batch_size": 1,
    "task_runner.settings.fets_config_dict.data_augmentation": {},
    "task_runner.settings.fets_config_dict.data_postprocessing": {},
    "task_runner.settings.fets_config_dict.enable_padding": false,
    "task_runner.settings.fets_config_dict.in_memory": true,
    "task_runner.settings.fets_config_dict.inference_mechanism.grid_aggregator_overlap": "crop",
    "task_runner.settings.fets_config_dict.inference_mechanism.patch_overlap": 0,
    "task_runner.settings.fets_config_dict.learning_rate": 0.001,
    "task_runner.settings.fets_config_dict.loss_function": "dc",
    "task_runner.settings.fets_config_dict.medcam_enabled": false,
    "task_runner.settings.fets_config_dict.metrics.0": "dice",
    "task_runner.settings.fets_config_dict.metrics.1": "dice_per_label",
    "task_runner.settings.fets_config_dict.metrics.2": "hd95_per_label",
    "task_runner.settings.fets_config_dict.model.amp": true,
    "task_runner.settings.fets_config_dict.model.architecture": "resunet",
    "task_runner.settings.fets_config_dict.model.base_filters": 32,
    "task_runner.settings.fets_config_dict.model.class_list.0": 0,
    "task_runner.settings.fets_config_dict.model.class_list.1": 1,
    "task_runner.settings.fets_config_dict.model.class_list.2": 2,
    "task_runner.settings.fets_config_dict.model.class_list.3": 4,
    "task_runner.settings.fets_config_dict.model.dimension": 3,
    "task_runner.settings.fets_config_dict.model.final_layer": "softmax",
    "task_runner.settings.fets_config_dict.model.norm_type": "instance",
    "task_runner.settings.fets_config_dict.nested_training.testing": 1,
    "task_runner.settings.fets_config_dict.nested_training.validation": -5,
    "task_runner.settings.fets_config_dict.num_epochs": 1,
    "task_runner.settings.fets_config_dict.optimizer.type": "sgd",
    "task_runner.settings.fets_config_dict.output_dir": ".",
    "task_runner.settings.fets_config_dict.parallel_compute_command": "",
    "task_runner.settings.fets_config_dict.patch_sampler": "label",
    "task_runner.settings.fets_config_dict.patch_size.0": 64,
    "task_runner.settings.fets_config_dict.patch_size.1": 64,
    "task_runner.settings.fets_config_dict.patch_size.2": 64,
    "task_runner.settings.fets_config_dict.patience": 100,
    "task_runner.settings.fets_config_dict.pin_memory_dataloader": false,
    "task_runner.settings.fets_config_dict.print_rgb_label_warning": true,
    "task_runner.settings.fets_config_dict.q_max_length": 100,
    "task_runner.settings.fets_config_dict.q_num_workers": 0,
    "task_runner.settings.fets_config_dict.q_samples_per_volume": 40,
    "task_runner.settings.fets_config_dict.q_verbose": false,
    "task_runner.settings.fets_config_dict.save_output": false,
    "task_runner.settings.fets_config_dict.save_training": false,
    "task_runner.settings.fets_config_dict.scaling_factor": 1,
    "task_runner.settings.fets_config_dict.scheduler.type": "triangle_modified",
    "task_runner.settings.fets_config_dict.track_memory_usage": false,
    "task_runner.settings.fets_config_dict.verbose": false,
    "task_runner.settings.fets_config_dict.version.maximum": "0.0.14",
    "task_runner.settings.fets_config_dict.version.minimum": "0.0.14",
    "task_runner.settings.fets_config_dict.weighted_loss": true,
    "task_runner.settings.train_csv": "seg_test_train.csv",
    "task_runner.settings.val_csv": "seg_test_val.csv",
    "task_runner.template": "src.fets_challenge_model.FeTSChallengeModel",
    "tasks.aggregated_model_validation.function": "validate",
    "tasks.aggregated_model_validation.kwargs.apply": "global",
    "tasks.aggregated_model_validation.kwargs.metrics.0": "valid_loss",
    "tasks.aggregated_model_validation.kwargs.metrics.1": "valid_dice",
    "tasks.locally_tuned_model_validation.function": "validate",
    "tasks.locally_tuned_model_validation.kwargs.apply": "local",
    "tasks.locally_tuned_model_validation.kwargs.metrics.0": "valid_loss",
    "tasks.locally_tuned_model_validation.kwargs.metrics.1": "valid_dice",
    "tasks.settings": {},
    "tasks.train.function": "train",
    "tasks.train.kwargs.epochs": 1,
    "tasks.train.kwargs.metrics.0": "loss",
    "tasks.train.kwargs.metrics.1": "train_dice"
}
           INFO     Building 🡆  Object FeTSChallengeDataLoaderWrapper from openfl.federated.data.loader_fets_challenge Module.                        plan.py:173
           INFO     Building 🡆  Object FeTSChallengeDataLoaderWrapper from openfl.federated.data.loader_fets_challenge Module.                        plan.py:173
           INFO     Building 🡆  Object FeTSChallengeDataLoaderWrapper from openfl.federated.data.loader_fets_challenge Module.                        plan.py:173
           INFO     Building 🡆  Object FeTSChallengeModel from src.fets_challenge_model Module.                                                       plan.py:173
Constructing queue for train data: 100%|████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.10it/s]
Calculating weights
Constructing queue for penalty data: 100%|██████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.26it/s] 
Looping over training data for penalty calculation: 100%|███████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  1.77it/s] 
Constructing queue for validation data: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.38it/s]
All Keys :  ['subject_id', '2', 'spacing', '3', '4', '5', 'label', 'path_to_metadata']
Since Device is CPU, Mixed Precision Training is set to False
[20:09:22] INFO     Building 🡆  Object FeTSChallengeModel from src.fets_challenge_model Module.                                                       plan.py:173
Constructing queue for train data: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.71it/s] 
Calculating weights
Constructing queue for penalty data: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.66it/s] 
Looping over training data for penalty calculation: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.01it/s] 
Constructing queue for validation data: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.51it/s]
All Keys :  ['subject_id', '2', 'spacing', '3', '4', '5', 'label', 'path_to_metadata']
Since Device is CPU, Mixed Precision Training is set to False
[20:09:25] INFO     Building 🡆  Object FeTSChallengeModel from src.fets_challenge_model Module.                                                       plan.py:173
Constructing queue for train data: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.65it/s] 
Calculating weights
Constructing queue for penalty data: 100%|██████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.86it/s] 
Looping over training data for penalty calculation: 100%|███████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  2.10it/s] 
Constructing queue for validation data: 100%|███████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.80it/s]
All Keys :  ['subject_id', '2', 'spacing', '3', '4', '5', 'label', 'path_to_metadata']
Since Device is CPU, Mixed Precision Training is set to False
Loading pretrained model...
[20:09:28] INFO     Building 🡆  Object NoCompressionPipeline from openfl.pipelines Module.                                                            plan.py:173
[20:09:29] INFO     Creating aggregator...                                                                                                     experiment.py:323
           INFO     Building 🡆  Object FeTSChallengeAssigner from src.challenge_assigner Module.                                                      plan.py:173
           INFO     Building 🡆  Object Aggregator from openfl.component Module.                                                                       plan.py:173
           INFO     Creating collaborators...                                                                                                  experiment.py:330
           INFO     Building 🡆  Object Collaborator from openfl.component Module.                                                                     plan.py:173
           INFO     Building 🡆  Object Collaborator from openfl.component Module.                                                                     plan.py:173
           INFO     Building 🡆  Object Collaborator from openfl.component Module.                                                                     plan.py:173
           INFO     Starting experiment                                                                                                        experiment.py:338
           INFO                                                                                                                                experiment.py:366
                    Created experiment folder experiment_1...                                                                                          

           INFO     Collaborators chosen to train for round 0:                                                                                 experiment.py:403
                            ['1', '2', '3']                                                                                                            

           INFO     Hyper-parameters for round 0:                                                                                              experiment.py:425
                            learning rate: 5e-05                                                                                                       

                            epochs_per_round: 1                                                                                                        

           INFO     Waiting for tasks...                                                                                                     collaborator.py:178
           INFO     Sending tasks to collaborator 3 for round 0                                                                                aggregator.py:312
           INFO     Received the following tasks: ['aggregated_model_validation', 'train', 'locally_tuned_model_validation']                 collaborator.py:168
[20:09:30] INFO     Using TaskRunner subclassing API                                                                                         collaborator.py:253
********************
Starting validation :
********************
Looping over validation data:   0%|                                                                                             | 0/1 [00:02<?, ?it/s] 
Traceback (most recent call last):
  File ".\FeTS_Challenge.py", line 584, in <module>
    restore_from_checkpoint_folder = restore_from_checkpoint_folder)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\fets_challenge\experiment.py", line 468, in run_challenge_experiment
    collaborators[col].run_simulation()
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\openfl\component\collaborator\collaborator.py", line 170, in run_simulation
    self.do_task(task, round_number)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\openfl\component\collaborator\collaborator.py", line 259, in do_task 
    **kwargs)
  File "C:\Users\15702\.local\workspace\src\fets_challenge_model.py", line 48, in validate
    mode="validation")
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\forward_pass.py", line 276, in validate_network       
    result = step(model, image, label, params, train=True)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\step.py", line 88, in step
    loss, metric_output = get_loss_and_metrics(image, label, output, params)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\loss_and_metric.py", line 141, in get_loss_and_metrics
    metric_function, predicted, ground_truth, params
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\loss_and_metric.py", line 13, in get_metric_output    
    metric_output = metric_function(predicted, ground_truth, params).detach().cpu()
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\metrics\segmentation.py", line 42, in multi_class_dice        
    if i != params["model"]["ignore_label_validation"]:
KeyError: 'ignore_label_validation'

Solution: override the plan.yaml settings as shown below, setting ignore_label_validation to False.

overrides = {
    'aggregator.settings.rounds_to_train': rounds_to_train,
    'aggregator.settings.db_store_rounds': db_store_rounds,
    'tasks.train.aggregation_type': aggregation_wrapper,
    'task_runner.settings.device': device,
    'task_runner.settings.fets_config_dict.data_preprocessing': {},
    'task_runner.settings.fets_config_dict.model.ignore_label_validation': False
}
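To make the effect of the dotted keys concrete, here is a minimal sketch of how such overrides could be mapped onto a nested configuration. It is not OpenFL's implementation; `apply_overrides` and the small `plan` dict are hypothetical and only illustrate why the last entry stops the KeyError in `multi_class_dice`.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with every dotted key in `overrides` set.

    Hypothetical helper for illustration; OpenFL applies overrides internally.
    """
    result = deepcopy(config)
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            # Create intermediate dicts that do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


# Stand-in for the relevant part of plan.yaml (assumed structure).
plan = {
    "task_runner": {
        "settings": {"fets_config_dict": {"model": {"architecture": "resunet"}}}
    }
}

patched = apply_overrides(
    plan,
    {
        "task_runner.settings.fets_config_dict.data_preprocessing": {},
        "task_runner.settings.fets_config_dict.model.ignore_label_validation": False,
    },
)

fets_cfg = patched["task_runner"]["settings"]["fets_config_dict"]
# The lookup that raised KeyError in the traceback now finds a value.
print(fets_cfg["model"]["ignore_label_validation"])
```

The original `plan` is left untouched, so the patched and unpatched configurations can be compared side by side.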

RAM issues

File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\torch\nn\functional.py", line 2186, in instance_norm
input, weight, bias, running_mean, running_var, use_input_stats, momentum, eps, torch.backends.cudnn.enabled
RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:75] data. DefaultCPUAllocator: not enough memory: you tried to allocate 67108864 bytes. Buy new RAM!
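For scale, the failed request is only 64 MiB, which suggests the process had already used nearly all available RAM before this allocation. The sketch below checks the arithmetic. The tensor shape is an assumption chosen for illustration (batch 1, 64 channels, a 64x64x64 patch in float32); the traceback does not say which tensor was being allocated.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    count = 1
    for dim in shape:
        count *= dim
    return count * bytes_per_element


requested = 67108864  # bytes, copied from the error message
requested_mib = requested / 2**20

# Assumed shape, for illustration only: (batch, channels, depth, height, width).
candidate = tensor_bytes((1, 64, 64, 64, 64))

print(f"{requested_mib:.0f} MiB requested; candidate tensor needs {candidate} bytes")
```

If the assumption holds, reducing the patch size or the number of filters would shrink every such activation, but the traceback alone cannot confirm this.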

[bug report] UnicodeDecodeError occurring in openfl

Describe the bug
During step 8 of https://github.com/FeTS-AI/Challenge/tree/main/Task_1, an openfl installation error occurred while installing 'openfl @ git+https://github.com/intel/openfl.git@f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f', one of the install_requires entries in setup.py.

To Reproduce
Steps to reproduce the behavior:

Execute step 8 (pip install .)

Challenge/Task_1/README.md

Line 29 in 524d6b9

8. ```pip install .```
Error occurred at line 33 of the setup.py code

Challenge/Task_1/setup.py

Line 31 in 524d6b9

'openfl @ git+https://github.com/intel/openfl.git@f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f',

The following error occurs:

Expected behavior
I found an issue with the same bug in the openfl repository and a commit that improved it.
Please check the link below
securefederatedai/openfl@771fc05

So I took this commit number and modified the path to install openfl in setup.py.

change

Challenge/Task_1/setup.py

Line 31 in 524d6b9

 'openfl @ git+https://github.com/intel/openfl.git@f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f', 

to
'openfl @ git+https://github.com/intel/openfl.git@771fc05d57612e2fd0f133ee301f5cd9678cf9d9',
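The same substitution can be expressed as a small helper, which is handy when checking the new pin for typos before editing setup.py. This is only a sketch; `repin_git_requirement` is a hypothetical name and is not part of pip or setuptools.

```python
import re


def repin_git_requirement(requirement, new_commit):
    """Swap the commit at the end of a 'name @ git+URL@commit' requirement string."""
    # A full Git SHA-1 is exactly 40 lowercase hexadecimal characters.
    if not re.fullmatch(r"[0-9a-f]{40}", new_commit):
        raise ValueError(f"not a full commit hash: {new_commit!r}")
    head, sep, _old_commit = requirement.rpartition("@")
    if not sep or "git+" not in head:
        raise ValueError(f"not a pinned git requirement: {requirement!r}")
    return f"{head}@{new_commit}"


old = "openfl @ git+https://github.com/intel/openfl.git@f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f"
new = repin_git_requirement(old, "771fc05d57612e2fd0f133ee301f5cd9678cf9d9")
print(new)
```

The length check rejects a hash with a stray extra character, which pip would otherwise only report after cloning the repository.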

Desktop (please complete the following information):

OS: Windows 10
Version: 22H2

What is the pretrained model?

Is the model based on ResUnet? I used a ResUnet as the baseline and loaded the weights, but the results show that it does not help at all.

Missing preprocessing in dict parameters

OS: Windows 11
Python 3.7.9

We run python .\FeTS_Challenge.py and get the error below. As I understand it, the data is loaded by the data_loader, which receives a dict with headers built in experiment.py.

The error is raised while the task runner is created:

File ".\FeTS_Challenge.py", line 584, in <module>
    restore_from_checkpoint_folder = restore_from_checkpoint_folder)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\fets_challenge\experiment.py", line 292, in run_challenge_experiment
    task_runner = copy(plan).get_task_runner(collaborator_data_loaders[col])
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\openfl\federated\plan\plan.py", line 389, in get_task_runner
    self.runner_ = Plan.build(**defaults)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\openfl\federated\plan\plan.py", line 182, in build
    instance = getattr(module, class_name)(**settings)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\openfl\federated\task\runner_fets_challenge.py", line 43, in __init__
    model, optimizer, train_loader, val_loader, scheduler, params = create_pytorch_objects(fets_config_dict, train_csv=train_csv, val_csv=val_csv, device=device)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\compute\generic.py", line 48, in create_pytorch_objects
    train_loader = get_train_loader(parameters)
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\data\__init__.py", line 24, in get_train_loader
    loader_type="train",
  File "C:\CodeRepos\MoaniSandbox\FETS-AI\Challenge\Task_1\venv\lib\site-packages\GANDLF\data\ImagesFromDataFrame.py", line 65, in ImagesFromDataFrame
    preprocessing = parameters["data_preprocessing"]
KeyError: 'data_preprocessing'

When we print() the parameters, we can see that "data_preprocessing" is indeed missing. Is this a breaking change in GANDLF?

{'batch_size': 1, 'data_augmentation': {}, 
'data_postprocessing': {},
 'enable_padding': False,
 'in_memory': True,
 'inference_mechanism': {'grid_aggregator_overlap': 'crop', 'patch_overlap': 0},
 'learning_rate': 0.001, 
'loss_function': 'dc',
 'medcam_enabled': False,
 'metrics': ['dice', 'dice_per_label', 'hd95_per_label'],
 'model': {'amp': True, 'architecture': 'resunet', 'base_filters': 32, 'class_list': [0, 1, 2, 4], 'dimension': 3, 'final_layer': 'softmax', 'norm_type': 'instance', 'type': 'torch', 'num_channels': 4, 'num_classes': 4},
 'nested_training': {'testing': 1, 'validation': -5}, 
'num_epochs': 1, 'optimizer': {'type': 'sgd'}, 
'output_dir': '.', 'parallel_compute_command': '', 'patch_sampler': 'label', 
'patch_size': [64, 64, 64], 'patience': 100, 'pin_memory_dataloader': False, 'print_rgb_label_warning': True, 'q_max_length': 100, 'q_num_workers': 0,
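A quick way to spot this kind of gap before GANDLF raises a KeyError is to check the parameter dict for the keys the tracebacks on this page complain about. The sketch below is an assumption-laden illustration: `REQUIRED_KEYS` and `missing_keys` are hypothetical, the list is not GANDLF's full schema, and `params` is a shortened copy of the printed dict.

```python
# Keys taken from the KeyErrors reported on this page; not GANDLF's full schema.
REQUIRED_KEYS = ("data_preprocessing", "model.ignore_label_validation")


def missing_keys(params, required=REQUIRED_KEYS):
    """Return the dotted keys from `required` that are absent from `params`."""
    missing = []
    for dotted in required:
        node = params
        for part in dotted.split("."):
            if not isinstance(node, dict) or part not in node:
                missing.append(dotted)
                break
            node = node[part]
    return missing


# Shortened version of the printed parameters.
params = {
    "batch_size": 1,
    "data_augmentation": {},
    "data_postprocessing": {},
    "model": {"architecture": "resunet", "base_filters": 32},
}

print(missing_keys(params))
```

Any key reported here can be supplied through the `overrides` dict, for example 'task_runner.settings.fets_config_dict.data_preprocessing': {}.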

Error(s) in loading state_dict for FeTSChallengeModel

Thank you for being generous with your time and organizing this challenge in such a polished manner. I followed the instructions exactly as they are detailed in README and went through your notebook, changing only the dataset path to the one corresponding to my own directory where the MICCAI_FeTS2021_TrainingData is stored.
However, when I try to run all cells, I am getting this error at the final cell:

Has anyone else had this problem so far?

Questions about Task2

Dear Organizers:
Thank you very much for your hard work in organizing this challenge. I have three questions about Task 2:

1. What are the training and validation data used in Task 2? Are they the same as in Task 1?
2. Is there an official federated split (data partition for federated training)?
3. How can you judge whether a submitted model was trained in a federated way? The paper says: "Note that training on pooled data will be allowed in this task, so that the participants can develop methods that optimally exploit the meta-information of data origin." Does this mean that training on directly pooled datasets is also allowed in Task 2, so that participants are not restricted to federated learning and only the testing is done in a federated way?

Could you please answer the questions above when you have time? Thanks a lot!

Session got killed

Hi,

I tried to run the code with partition_1.csv, but my session was killed at a very early stage; see the screenshot below.

Does anyone have a clue about this?

No url found for submodule path 'GANDLF' in .gitmodules

Dear FeTS-AI team,

following the instructions on how to install the infrastructure for Task 1, I encountered the following error for step 8, the pip install step:

~/Challenge/Task_1$ pip install .
Processing ~/Challenge/Task_1
  Preparing metadata (setup.py) ... done
Collecting openfl@ git+https://github.com/intel/openfl.git@f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f (from fets-challenge==2.0)
  Cloning https://github.com/intel/openfl.git (to revision f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f) to /tmp/pip-install-tglnj1g1/openfl_bf417e151ecd4f06983321e60bc4d466
  Running command git clone --filter=blob:none --quiet https://github.com/intel/openfl.git /tmp/pip-install-tglnj1g1/openfl_bf417e151ecd4f06983321e60bc4d466
  Running command git rev-parse -q --verify 'sha^f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f'
  Running command git fetch -q https://github.com/intel/openfl.git f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f
  Running command git checkout -q f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f
  Resolved https://github.com/intel/openfl.git to commit f4b28d710e2be31cdfa7487fdb4e8cb3a1387a5f
  Preparing metadata (setup.py) ... done
Collecting GANDLF@ git+https://github.com/CBICA/GaNDLF.git@e4d0d4bfdf4076130817001a98dfb90189956278 (from fets-challenge==2.0)
  Cloning https://github.com/CBICA/GaNDLF.git (to revision e4d0d4bfdf4076130817001a98dfb90189956278) to /tmp/pip-install-tglnj1g1/gandlf_c4f0036a896d455eae2d9f7d2fd57d46
  Running command git clone --filter=blob:none --quiet https://github.com/CBICA/GaNDLF.git /tmp/pip-install-tglnj1g1/gandlf_c4f0036a896d455eae2d9f7d2fd57d46
  Running command git rev-parse -q --verify 'sha^e4d0d4bfdf4076130817001a98dfb90189956278'
  Running command git fetch -q https://github.com/CBICA/GaNDLF.git e4d0d4bfdf4076130817001a98dfb90189956278
  Running command git checkout -q e4d0d4bfdf4076130817001a98dfb90189956278
  Resolved https://github.com/CBICA/GaNDLF.git to commit e4d0d4bfdf4076130817001a98dfb90189956278
  Running command git submodule update --init --recursive -q
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Collecting fets@ git+https://github.com/FETS-AI/Algorithms.git@fets_challenge (from fets-challenge==2.0)
  Cloning https://github.com/FETS-AI/Algorithms.git (to revision fets_challenge) to /tmp/pip-install-tglnj1g1/fets_a6232463deb046dfa00c89ef47b688d0
  Running command git clone --filter=blob:none --quiet https://github.com/FETS-AI/Algorithms.git /tmp/pip-install-tglnj1g1/fets_a6232463deb046dfa00c89ef47b688d0
  Running command git checkout -b fets_challenge --track origin/fets_challenge
  Switched to a new branch 'fets_challenge'
  Branch 'fets_challenge' set up to track remote branch 'fets_challenge' from 'origin'.
  Resolved https://github.com/FETS-AI/Algorithms.git to commit 60e0b8761229edde18e3d707e3e3e5eb0c0fb80f
  Running command git submodule update --init --recursive -q
  fatal: No url found for submodule path 'GANDLF' in .gitmodules
  error: subprocess-exited-with-error

  × git submodule update --init --recursive -q did not run successfully.
  │ exit code: 128
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git submodule update --init --recursive -q did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

Looking at Algorithms/.gitmodules, I noticed that the GANDLF submodule has been renamed to GANDLF_module a while back.

I got around this by manually cloning openfl, GANDLF and fets from the repos listed in the install_requires block of Challenge/Task_1/setup.py, checking out the corresponding commits and pip installing them into the env. Then I commented out the install_requires block and ran pip install . on Challenge/Task_1 again.

My conda env was then still missing the correct libstdc++ version, erroring out with libstdc++.so.6: version `GLIBCXX_3.4.30' not found.
conda install -c conda-forge libgcc=5.2.0 fixed this for me.

Lastly, when installing torchvision, one needs to pin it to an older version, as the newer ones do not provide the required torch.ao module:

pip install torchvision==0.9.1

So, easy fix. Just leave this here in case someone else stumbles upon this. 🙂

Best,
Manu

OpenFL installation error with pip

Hello, I received the following errors when installing OpenFL using pip on Windows 10:

ERROR: Could not find a version that satisfies the requirement openfl (from versions: none)
ERROR: No matching distribution found for openfl

My Python version is 3.7.

Adding manual seed for reproducibility

Dear Organizers,

You may consider adding these lines to the internal code. I think it would also help when comparing team results at the end of the challenge, but I am not sure they are enough :)

torch.manual_seed(torch_manual_seed)
torch.cuda.manual_seed_all(torch_manual_seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

(maybe random_state can be fixed for train_test_split )
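As a sketch of how these lines might be wrapped for reuse, the helper below also seeds Python's `random` module and NumPy when available. `seed_everything` is a hypothetical name, not part of the challenge code; the torch and NumPy imports are optional so the snippet runs even where they are not installed.

```python
import random


def seed_everything(seed):
    """Seed the common random number generators used in a training run."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed.
    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; skip the torch-specific settings.
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # Safe to call without a GPU.
    # Trade some speed for repeatable cuDNN kernel selection.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(1234)
first_draw = random.random()
seed_everything(1234)
second_draw = random.random()
print(first_draw == second_draw)
```

Some GPU operations can remain non-deterministic even with these flags, which matches the doubt expressed above, and a data split would still need its own fixed random_state.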

Best,
Ece

Originally posted by @eceisik in #80 (comment)

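I spent two days stumbling over pitfalls. For the timeout problem, proxying in the console with ss or Clash does not work, and whether the installation succeeds is pure luck, because the Go installation proxy is blocked in **. The installation succeeded on the server but failed on the Linux virtual machine.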

So, why not use Docker? emmmm...
