
Comments (4)

achoum commented on May 4, 2024

Hi Jose,

There are different reasons for, and different solutions to, training multiple models on the same (or a similar) dataset. Can you give details of your objective?

For example, if you are re-training multiple models to test hyper-parameters, or to create an ensemble of sub-models, there is no generic solution to avoid re-training the full models.

However, if you want to update an existing model with a small number of new training examples, there is existing work on this. Take a look at the "online learning" domain. Currently, YDF does not offer any direct solution for that.

That said, and while this is likely not as good as an online learning method, you can either resume training of a model on a new dataset, or ensemble a set of models, each trained on a different snapshot of the data.

from yggdrasil-decision-forests.

JoseAF commented on May 4, 2024

Hi Mathieu

Thanks for the reply.

It's the second option that I'm interested in: updating an existing model several times with small amounts of new training examples. I'd like to look into the possibility you mention of resuming training of a model with a new dataset. I assume the new dataset would be only the small extra training examples. Can you point me to the code that does this?


achoum commented on May 4, 2024

"I assume the new dataset would be only the small extra training examples."

This will depend on your problem, the size of the dataset and the type of learning algorithm.
You might need to include some of the past data (but not all of it, otherwise you lose the benefit of training only on the update).
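
To make the "some of the past data" idea concrete, here is a minimal sketch that builds an update set from all the new examples plus a random sample of the old ones. The function name `build_update_set` and the `past_fraction` parameter are hypothetical and not part of YDF or TF-DF; the right fraction is something you would have to tune for your problem.

```python
import random


def build_update_set(past_examples, new_examples, past_fraction=0.2, seed=0):
    """Return a random sample of the past examples followed by all new examples.

    Hypothetical helper for illustration only; it is not a YDF / TF-DF API.
    """
    if not 0.0 <= past_fraction <= 1.0:
        raise ValueError("past_fraction must be in [0, 1]")
    rng = random.Random(seed)  # Fixed seed so the sample is reproducible.
    num_past = int(len(past_examples) * past_fraction)
    replayed = rng.sample(list(past_examples), num_past)  # Without replacement.
    return replayed + list(new_examples)
```

The returned list would then be converted to whatever dataset format your training code expects before resuming training on it.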

You need to experiment to figure out what works.
This is something I have little experience with, but here is what I would try:

  1. Look in the literature for published meta-methods that work on top of a standard learning algorithm.

  2. With TF-DF, if you use GBT:

Set the temporary directory argument (e.g. temp_directory="/tmp/training_cache") and enable resuming training (try_resume_training=True).
Train 200 trees on the first dataset (i.e. model.fit(first_dataset)) and then train an extra 200 trees on the second dataset:

model.learner_params["num_trees"] = 200 + 200  # Add an extra 200 trees
model.fit(second_dataset)

  3. With both GBT and RF:

Train the two models independently on the different datasets, and then combine them (e.g. average the predictions) using the Keras functional API. Look at the composition colab for some examples of model composition.
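
The averaging idea in the last option can be sketched without any library. This is only an illustration of the arithmetic, not the Keras functional API or the composition colab code; `average_predictions` and the two stand-in models are hypothetical, and each "model" is assumed to be a callable that returns a single numeric prediction.

```python
def average_predictions(models, example):
    """Average the predictions of several models on one example.

    Hypothetical helper for illustration only; each model is assumed to be
    a callable returning a single number (e.g. a probability).
    """
    if not models:
        raise ValueError("need at least one model")
    predictions = [model(example) for model in models]
    return sum(predictions) / len(predictions)


# Hypothetical stand-ins for models trained on two data snapshots.
def model_on_first_snapshot(example):
    return 0.25


def model_on_second_snapshot(example):
    return 0.75


ensemble_prediction = average_predictions(
    [model_on_first_snapshot, model_on_second_snapshot], example={}
)
```

A plain average weights both snapshots equally; if the newer data matters more, a weighted average would be a natural variant to try.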


JoseAF commented on May 4, 2024

Hi Mathieu

Thanks for the suggestions. I'll think these through.
