An issue raised while I was trying to plot training curves with Tensorboard on trainin


Training-Validation curves on Tensorboard about fedbiomed HOT 4 CLOSED

manos-mark commented on September 26, 2024

Comments (4)

manos-mark commented on September 26, 2024

Thank you very much @sharkovsky, your solution works really well! Here are some minor modifications that need to be made to the YourTrainingPlan class:

Add these imports to your notebook, because you copy two functions from _torchnn.py:

import torch  # torch.Tensor is used in ModelInputType below; skip if already imported
from typing import Any, Dict, List, Tuple, OrderedDict, Optional, Union, Iterator
from fedbiomed.common.training_plans._training_iterations import MiniBatchTrainingIterationsAccountant
from fedbiomed.common.logger import logger
from fedbiomed.common.training_plans._base_training_plan import BaseTrainingPlan

ModelInputType = Union[torch.Tensor, Dict, List, Tuple]

Add the same imports to your training plan's init_dependencies function:

def init_dependencies(self):
    deps = [
        ...           
        'from typing import Any, Dict, List, Tuple, OrderedDict, Optional, Union, Iterator',
        'ModelInputType = Union[torch.Tensor, Dict, List, Tuple]',
        'from fedbiomed.common.training_plans._training_iterations import MiniBatchTrainingIterationsAccountant',
        'from fedbiomed.common.logger import logger',
        'from fedbiomed.common.training_plans._base_training_plan import BaseTrainingPlan',
    ]
    return deps

Regarding step 6 of the initial issue: in the training_routine function, inside the if statement that checks whether history_monitor is not None, I added code that iterates through my custom metrics dictionary and wraps each value with _create_metric_result_dict from the BaseTrainingPlan class:

for metric_name, metric_value in metrics.items():
    m_dict = BaseTrainingPlan._create_metric_result_dict(metric=metric_value, metric_name=metric_name)
    history_monitor.add_scalar(
        metric=m_dict,
        iteration=num_iter,
        epoch=epoch_to_report,
        train=True,
        num_samples_trained=num_samples,
        num_batches=num_iter_max,
        total_samples=num_samples_max,
        batch_samples=batch_size,
    )
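The shape of this loop can be dry-run outside a Fed-BioMed node. The sketch below is only an illustration under stated assumptions: `RecordingMonitor`, `to_metric_dict` and `report_metrics` are hypothetical stand-ins (they are not Fed-BioMed classes or functions), and the only thing taken from the snippet above is the set of keyword arguments passed to add_scalar.

```python
from typing import Any, Dict, List


class RecordingMonitor:
    """Hypothetical stand-in for history_monitor: it only records calls."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def add_scalar(self, **kwargs: Any) -> None:
        # Keep the keyword arguments exactly as the reporting loop passed them.
        self.calls.append(kwargs)


def to_metric_dict(metric: float, metric_name: str) -> Dict[str, float]:
    # Assumption: a one-entry {name: value} dict is enough for this dry run.
    # This is NOT BaseTrainingPlan._create_metric_result_dict.
    return {metric_name: float(metric)}


def report_metrics(monitor: RecordingMonitor, metrics: Dict[str, float],
                   num_iter: int, epoch: int, num_samples: int,
                   num_iter_max: int, num_samples_max: int,
                   batch_size: int) -> None:
    # One add_scalar call per custom metric, mirroring the loop above.
    for metric_name, metric_value in metrics.items():
        monitor.add_scalar(
            metric=to_metric_dict(metric_value, metric_name),
            iteration=num_iter,
            epoch=epoch,
            train=True,
            num_samples_trained=num_samples,
            num_batches=num_iter_max,
            total_samples=num_samples_max,
            batch_samples=batch_size,
        )


monitor = RecordingMonitor()
report_metrics(monitor, {"loss": 0.52, "accuracy": 0.81},
               num_iter=3, epoch=1, num_samples=96,
               num_iter_max=10, num_samples_max=320, batch_size=32)
for call in monitor.calls:
    print(call["metric"], "iteration", call["iteration"])
```

Each custom metric produces its own add_scalar call, so every metric name should end up as a separate curve.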

sharkovsky commented on September 26, 2024

Amazing work @manos-mark, thank you for getting back to us!
It was not an easy customization, congrats on making it work!

Do you think we can close the issue?

manos-mark commented on September 26, 2024

Thanks for your kind words, but you provided all the instructions 😄

From my point of view, the issue is resolved, but there is still room for improvement. Right now, the training metrics are plotted per batch, whereas the validation metrics are plotted per global epoch. The per-batch training metrics should therefore be averaged so that they can be plotted next to the validation metrics. I am not sure my explanation is very clear, so I will try to implement the code soon and share it in a follow-up comment.
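A minimal sketch of the averaging idea, not the implementation promised above: it assumes the averaging window is one epoch, that every batch reports the same metric names, and that weighting by batch size is wanted. `EpochMetricAverager` is a hypothetical helper and is not part of Fed-BioMed.

```python
from collections import defaultdict
from typing import Dict


class EpochMetricAverager:
    """Hypothetical helper: accumulates per-batch metrics and averages them."""

    def __init__(self) -> None:
        self._weighted_sums: Dict[str, float] = defaultdict(float)
        self._num_samples = 0

    def update(self, batch_metrics: Dict[str, float], batch_size: int) -> None:
        # Weight each batch by its size so a smaller last batch
        # does not skew the epoch mean.
        for name, value in batch_metrics.items():
            self._weighted_sums[name] += float(value) * batch_size
        self._num_samples += batch_size

    def compute(self) -> Dict[str, float]:
        # Sample-weighted mean of every metric seen since the last reset.
        if self._num_samples == 0:
            return {}
        return {name: total / self._num_samples
                for name, total in self._weighted_sums.items()}

    def reset(self) -> None:
        # Call at the start of each epoch.
        self._weighted_sums.clear()
        self._num_samples = 0


averager = EpochMetricAverager()
averager.update({"loss": 1.0, "accuracy": 0.5}, batch_size=4)
averager.update({"loss": 0.5, "accuracy": 1.0}, batch_size=4)
epoch_metrics = averager.compute()
print(epoch_metrics)
```

The averaged dictionary could then be reported once per epoch instead of once per batch; whether the monitor displays that cadence correctly next to the validation curves is untested here.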

sharkovsky commented on September 26, 2024

Yeah, there is a lot of room for improvement in our metrics reporting in general...
I will close this issue for now, but I'll be happy to receive a pull request from you in case you want to extend/improve Fed-BioMed's reporting some time in the future 😄 😄
