Comments (4)
Hi Jose,
There are different reasons to train multiple models on the same (or a similar) dataset, and different solutions for each. Can you give details about your objective?
For example, if you are re-training multiple models to test hyper-parameters, or to create an ensemble of sub-models, there is no generic solution to avoid re-training the full models.
However, if you want to update an existing model with a small number of new training examples, there is existing work on this. Take a look at the "online learning" domain. Currently, YDF does not offer any direct solution for that.
That said, and while this is likely not as good as an online learning method, you can either resume training of a model on a new dataset, or ensemble a set of models, each trained on a different snapshot of the data.
from yggdrasil-decision-forests.
Hi Mathieu
Thanks for the reply.
It's the second option that I'm interested in, updating an existing model several times with small amounts of new training examples. I'd be interested to look into the possibility you mention regarding resuming training of a model with a new dataset. I assume the new dataset would be only the small extra training examples. Can you point me to the code that does this?
"I assume the new dataset would be only the small extra training examples."
This will depend on your problem, the size of the dataset and the type of learning algorithm.
It might be that you need to include some of the past data (not all of it; otherwise, you lose the benefit of not re-training on everything).
You need to experiment to figure out what works.
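One way to experiment with "some of the past data" is to mix a random sample of the old examples with the new ones before resuming training. The sketch below is only an illustration in plain Python: the helper name `build_update_set`, the `replay_fraction` parameter, and its default value are assumptions made for the example, not part of YDF or TF-DF.

```python
import random


def build_update_set(past_examples, new_examples, replay_fraction=0.2, seed=0):
    """Return a random sample of the past examples followed by all new examples.

    `replay_fraction` is a hypothetical knob: the share of past data to keep.
    """
    rng = random.Random(seed)
    num_replayed = int(len(past_examples) * replay_fraction)
    replayed = rng.sample(list(past_examples), num_replayed)
    return replayed + list(new_examples)


# Toy data standing in for real training examples.
past = [{"x": i, "label": i % 2} for i in range(100)]
new = [{"x": 100 + i, "label": i % 2} for i in range(10)]

update_set = build_update_set(past, new, replay_fraction=0.2)
print(len(update_set))  # 20 replayed past examples + 10 new examples
```

The right fraction depends on the problem, so it is worth trying several values and comparing on held-out data.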
This is something I have little experience with, but here is what I would try:
- Look at the literature to see whether there are published meta-methods that work on top of a standard learning algorithm.
- With TF-DF, if you use GBT:
  Set the temporary directory argument (e.g. temp_directory="/tmp/training_cache") and enable resuming training (try_resume_training=True).
  Train 200 trees on the first dataset (i.e. model.fit(first_dataset)), and then train an extra 200 trees on the second dataset:
    model.learner_params["num_trees"] = 200 + 200  # Add an extra 200 trees
    model.fit(second_dataset)
- With both GBT and RF:
  Train the two models independently on the different datasets, and then combine them (e.g. average the predictions) using the Keras functional API. Look at the composition colab for some examples of model composition.
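The "average the predictions" idea from the last option can be sketched independently of any framework. The code below is a minimal illustration, not the TF-DF or Keras functional API: `average_ensemble` and the two stand-in models are hypothetical names, and each "model" is just a function that returns a score.

```python
from typing import Callable, Sequence

# A "model" here is any callable mapping a feature vector to a score.
Predictor = Callable[[Sequence[float]], float]


def average_ensemble(models: Sequence[Predictor]) -> Predictor:
    """Combine several models by averaging their predictions."""
    if not models:
        raise ValueError("At least one model is required.")

    def predict(features: Sequence[float]) -> float:
        return sum(model(features) for model in models) / len(models)

    return predict


# Hypothetical stand-ins for models trained on two data snapshots.
def model_on_first_snapshot(features: Sequence[float]) -> float:
    return 0.25


def model_on_second_snapshot(features: Sequence[float]) -> float:
    return 0.75


ensemble = average_ensemble([model_on_first_snapshot, model_on_second_snapshot])
print(ensemble([1.0, 2.0]))  # mean of the two scores
```

A plain mean weights every snapshot equally; if recent data matters more, a weighted average is a natural variation to try.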
Hi Mathieu
Thanks for the suggestions. I'll think through these...