Comments (4)
Thank you very much @sharkovsky, your solution works pretty well! Here are some minor modifications that need to be added to the `YourTrainingPlan` class:
- Add imports to your notebook, because you copy two functions from `_torchnn.py`:

    from typing import Any, Dict, List, Tuple, OrderedDict, Optional, Union, Iterator
    from fedbiomed.common.training_plans._training_iterations import MiniBatchTrainingIterationsAccountant
    from fedbiomed.common.logger import logger
    from fedbiomed.common.training_plans._base_training_plan import BaseTrainingPlan
    ModelInputType = Union[torch.Tensor, Dict, List, Tuple]

- Add those imports to your training plan's `init_dependencies` function:

    def init_dependencies(self):
        deps = [
            ...
            'from typing import Any, Dict, List, Tuple, OrderedDict, Optional, Union, Iterator',
            'ModelInputType = Union[torch.Tensor, Dict, List, Tuple]',
            'from fedbiomed.common.training_plans._training_iterations import MiniBatchTrainingIterationsAccountant',
            'from fedbiomed.common.logger import logger',
            'from fedbiomed.common.training_plans._base_training_plan import BaseTrainingPlan',
        ]
        return deps

- Concerning step 6 from the initial issue, I added code to the `training_routine` function, inside the if statement that checks `if history_monitor is not None`, to iterate through my custom metrics dictionary using `_create_metric_result_dict` from the `BaseTrainingPlan` class:

    # Report each custom metric to the history monitor under its own name
    for metric_name, metric_value in metrics.items():
        m_dict = BaseTrainingPlan._create_metric_result_dict(metric=metric_value, metric_name=metric_name)
        history_monitor.add_scalar(
            metric=m_dict,
            iteration=num_iter,
            epoch=epoch_to_report,
            train=True,
            num_samples_trained=num_samples,
            num_batches=num_iter_max,
            total_samples=num_samples_max,
            batch_samples=batch_size
        )

Amazing work @manos-mark, thank you for getting back to us!
It was not an easy customization, congrats on making it work!
Do you think we can close the issue?
Thanks for your kind words, but you provided all the instructions 😄
From my point of view, the issue is resolved, but there is still room for improvement. Right now, the training metrics are plotted per batch, while the validation metrics are plotted per global epoch. To plot the two side by side, the per-batch training metrics would first have to be averaged over each epoch. I am not sure my explanation is very clear, so I will try to implement the code soon and provide it in a following comment.
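To make the idea more concrete, here is a minimal sketch of the averaging step in plain Python. It is not Fed-BioMed code: `EpochMetricAverager` and its methods are hypothetical names, and it assumes each batch yields a dictionary of scalar metric values together with its batch size. Batches are weighted by their size so that a smaller final batch does not skew the epoch mean.

```python
from collections import defaultdict
from typing import Dict


class EpochMetricAverager:
    """Hypothetical helper: accumulates per-batch metrics and returns
    sample-weighted means for the epoch."""

    def __init__(self) -> None:
        self._weighted_sums: Dict[str, float] = defaultdict(float)
        self._sample_counts: Dict[str, int] = defaultdict(int)

    def update(self, metrics: Dict[str, float], batch_size: int) -> None:
        # Weight each batch value by the number of samples it was computed on.
        for name, value in metrics.items():
            self._weighted_sums[name] += float(value) * batch_size
            self._sample_counts[name] += batch_size

    def compute(self) -> Dict[str, float]:
        # Mean of each metric over all samples seen since the last reset.
        return {
            name: self._weighted_sums[name] / self._sample_counts[name]
            for name in self._weighted_sums
            if self._sample_counts[name] > 0
        }

    def reset(self) -> None:
        # Call at the start of every epoch.
        self._weighted_sums.clear()
        self._sample_counts.clear()


averager = EpochMetricAverager()
averager.update({"loss": 1.0, "dice": 0.5}, batch_size=4)
averager.update({"loss": 0.5, "dice": 0.75}, batch_size=4)
epoch_means = averager.compute()  # {"loss": 0.75, "dice": 0.625}
```

Under these assumptions, `update` would be called once per batch inside the training loop, and the result of `compute` would be reported once at the end of each epoch instead of once per batch.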
Yeah, there is a lot of room for improvement in our metrics reporting in general...
I will close this issue for now, but I'll be happy to receive a pull request from you if you want to extend or improve Fed-BioMed's reporting some time in the future 😄 😄