Comments (4)

manos-mark commented on September 26, 2024

Thank you very much @sharkovsky, your solution works pretty well! Here are some minor modifications that need to be added to the YourTrainingPlan class:

  1. Add imports to your notebook, because you copy two functions from _torchnn.py:
from typing import Any, Dict, List, Tuple, OrderedDict, Optional, Union, Iterator
from fedbiomed.common.training_plans._training_iterations import MiniBatchTrainingIterationsAccountant
from fedbiomed.common.logger import logger
from fedbiomed.common.training_plans._base_training_plan import BaseTrainingPlan

ModelInputType = Union[torch.Tensor, Dict, List, Tuple]
  2. Add those imports to your training plan's init_dependencies function:
def init_dependencies(self):
    deps = [
        ...           
        'from typing import Any, Dict, List, Tuple, OrderedDict, Optional, Union, Iterator',
        'ModelInputType = Union[torch.Tensor, Dict, List, Tuple]',
        'from fedbiomed.common.training_plans._training_iterations import MiniBatchTrainingIterationsAccountant',
        'from fedbiomed.common.logger import logger',
        'from fedbiomed.common.training_plans._base_training_plan import BaseTrainingPlan',
    ]
    return deps
  3. Concerning step 6 from the initial issue, I added code to the training_routine function, inside the if statement that checks that history_monitor is not None. The code iterates through my custom metrics dictionary and uses _create_metric_result_dict from the BaseTrainingPlan class:
for metric_name, metric_value in metrics.items():
    m_dict = BaseTrainingPlan._create_metric_result_dict(metric=metric_value, metric_name=metric_name)
    history_monitor.add_scalar(
        metric=m_dict,
        iteration=num_iter,
        epoch=epoch_to_report,
        train=True,
        num_samples_trained=num_samples,
        num_batches=num_iter_max,
        total_samples=num_samples_max,
        batch_samples=batch_size,
    )
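
The dependency entries in step 2 are plain strings, so a typo in one of them only shows up once the training plan is loaded. A minimal sketch of a local syntax check, assuming nothing about Fed-BioMed itself: `find_invalid_dependencies` is a hypothetical helper (not part of the library) that uses only the standard-library `ast` module. It checks syntax only, not whether the imported names exist.

```python
import ast


def find_invalid_dependencies(deps):
    """Return (line, error message) pairs for entries that are not valid Python.

    Hypothetical helper for illustration; it only parses each string and
    never imports or executes anything.
    """
    problems = []
    for line in deps:
        try:
            ast.parse(line)
        except SyntaxError as exc:
            problems.append((line, exc.msg))
    return problems


# A few of the dependency strings from step 2 above.
deps = [
    "from typing import Any, Dict, List, Tuple, OrderedDict, Optional, Union, Iterator",
    "ModelInputType = Union[torch.Tensor, Dict, List, Tuple]",
    "from fedbiomed.common.logger import logger",
]

problems = find_invalid_dependencies(deps)
for line, message in problems:
    print(f"Invalid dependency {line!r}: {message}")
```

Note that the `ModelInputType` line parses fine but still needs `torch` to be imported by an earlier entry when the strings are actually executed.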

sharkovsky commented on September 26, 2024

Amazing work @manos-mark, thank you for getting back to us!
It was not an easy customization, congrats on making it work!

Do you think we can close the issue?

manos-mark commented on September 26, 2024

Thanks for your kind words, but you provided all the instructions 😄

From my point of view, the issue is resolved, but there is still room for improvement. Right now, the training metrics are plotted per batch, while the validation metrics are plotted per global epoch. The per-batch training metrics should therefore be averaged over each epoch so that they can be plotted next to the validation metrics. I am not sure my explanation is very clear, so I will try to implement the code soon and provide it in the following comments.
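
A minimal sketch of the averaging idea described above, in plain Python. It is not code from this thread or from Fed-BioMed: `EpochMetricAverager` and its method names are hypothetical, and the sample-weighted mean is an assumption about how batches of different sizes should be combined.

```python
from collections import defaultdict


class EpochMetricAverager:
    """Accumulate per-batch metric values and return sample-weighted means.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        self._weighted_sums = defaultdict(float)
        self._sample_counts = defaultdict(int)

    def update(self, metrics, batch_size):
        # Weight each batch by its size so a smaller final batch
        # does not skew the epoch average.
        for name, value in metrics.items():
            self._weighted_sums[name] += float(value) * batch_size
            self._sample_counts[name] += batch_size

    def compute(self):
        # One averaged value per metric name seen since the last reset.
        return {
            name: self._weighted_sums[name] / self._sample_counts[name]
            for name in self._weighted_sums
        }

    def reset(self):
        # Call at the start of each epoch.
        self._weighted_sums.clear()
        self._sample_counts.clear()


averager = EpochMetricAverager()
averager.update({"loss": 0.9, "accuracy": 0.50}, batch_size=4)
averager.update({"loss": 0.5, "accuracy": 0.75}, batch_size=4)
averager.update({"loss": 0.2, "accuracy": 1.00}, batch_size=2)
epoch_means = averager.compute()
print(epoch_means)
```

In a training loop, `update` would be called once per batch with the custom metrics dictionary, and the result of `compute` would be reported once at the end of the epoch instead of once per batch.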

sharkovsky commented on September 26, 2024

Yeah, there is a lot of room for improvement in our metrics reporting in general...
I will close this issue for now, but I'll be happy to receive a pull request from you if you want to extend or improve Fed-BioMed's reporting some time in the future 😄 😄
