Comments (10)
Added for classification and regression. Is this also needed for probability prediction and survival?
from ranger.
Thanks! I personally need it only for regression at the moment, but I guess it would be consistent to have this for all forests.
from ranger.
This may be related, so I'm posting it here. Are the out-of-bag predictions in the ranger object's $predictions also majority votes (for classification) or averages (for regression)?
I would be interested in all oob predictions from all trees.
from ranger.
Sorry for the delay. Yes, these are also majority votes or averages. I will try to add the predict.all option for growing, too.
from ranger.
Is it possible for the predict method to predict with fewer trees than the original model contains? This may be useful for model selection.
from ranger.
You could use predict.all = TRUE and compute manual majority votes or averages on a subset of the trees.
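The manual aggregation could look roughly like the sketch below. It assumes the per-tree predictions come back as a matrix with one row per sample and one column per tree, which is how the ranger documentation describes predict.all = TRUE for classification and regression. The helper names aggregate_first_k and vote_first_k are made up for illustration and are not part of ranger; the toy matrices stand in for real predictions.

```r
# With a fitted forest "rf" and new data "test_data" (both assumed here),
# the per-tree matrix would be obtained along these lines:
# tree_preds <- predict(rf, data = test_data, predict.all = TRUE)$predictions

# Hypothetical helper: regression estimate from the first k trees only.
aggregate_first_k <- function(tree_preds, k) {
  stopifnot(is.matrix(tree_preds), k >= 1, k <= ncol(tree_preds))
  rowMeans(tree_preds[, seq_len(k), drop = FALSE])
}

# Hypothetical helper: majority vote over the first k trees.
# Ties go to the label that sorts first; labels are returned as character.
vote_first_k <- function(tree_preds, k) {
  stopifnot(is.matrix(tree_preds), k >= 1, k <= ncol(tree_preds))
  apply(tree_preds[, seq_len(k), drop = FALSE], 1, function(v) {
    counts <- table(v)
    names(counts)[which.max(counts)]
  })
}

# Toy regression predictions: 2 samples x 3 trees
toy_reg <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
reg_2_trees <- aggregate_first_k(toy_reg, 2)

# Toy class predictions (class codes): 2 samples x 3 trees
toy_cls <- matrix(c(1, 2, 1, 2, 2, 2), nrow = 2)
cls_3_trees <- vote_first_k(toy_cls, 3)
```

Calling either helper for k = 1, 2, ... and comparing against held-out labels gives an error curve over the number of trees, which is one way to use this for model selection.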
from ranger.
Hi, I was wondering if "predict.all" could work for probability trees as well, i.e. return a probability result per tree. This would be useful for statistical stability analysis via sub-sampling and aggregating results, as well as for easy estimation of the optimal number of trees (similar to the initial post). Thanks.
from ranger.
Yes, that's possible, and it's the reason why this issue is still open. I'll definitely add it for probability prediction and survival.
from ranger.
Ok, great.
I devised an interim solution by slicing the forest into smaller sub-forests and making individual predictions on these sub-forests. One can then aggregate a number of sub-predictions for robustness studies etc.
A quick implementation reads as follows:
# the original forest is called "rf"
slice.size <- 10 # size of sub-forests
slice.number <- ceiling(rf$num.trees / slice.size) # number of sub-forests
rf_slice <- list()
for (slice.index in seq_len(slice.number)) {
  print(paste("Building slice", slice.index, "of", slice.number))
  From <- (slice.index - 1) * slice.size + 1
  To <- min(slice.index * slice.size, rf$num.trees)
  trees.in.slice <- To - From + 1 # the last slice may hold fewer than slice.size trees
  rf_temp <- rf # copy forest object
  # extract trees From:To from the original forest
  rf_temp$num.trees <- trees.in.slice
  rf_temp$forest$child.nodeIDs <- rf$forest$child.nodeIDs[From:To]
  rf_temp$forest$split.values <- rf$forest$split.values[From:To]
  rf_temp$forest$split.varIDs <- rf$forest$split.varIDs[From:To]
  rf_temp$forest$terminal.class.counts <- rf$forest$terminal.class.counts[From:To]
  rf_temp$forest$num.trees <- trees.in.slice
  rf_slice[[slice.index]] <- rf_temp
}
# Prediction
Prediction_slice <- list()
for (slice in seq_along(rf_slice)) {
  print(paste("Predicting slice", slice, "of", length(rf_slice)))
  # [, 2] extracts only the probability of the second class
  Prediction_temp <- predict(object = rf_slice[[slice]], data = testFeatures, num.threads = 1, predict.all = FALSE)$predictions[, 2]
  Prediction_slice[[slice]] <- Prediction_temp
}
Prediction_slice <- do.call(cbind, Prediction_slice) # matrix of predictions, one column per sub-forest
I opted for a slice.size of 10, i.e. each of the sub-forests contains 10 trees. Note that this piece of code only extracts the probabilities of the second class; extending it to more classes is straightforward.
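As a rough illustration of the aggregation step, the sketch below takes a matrix shaped like Prediction_slice (one row per observation, one column per sub-forest) and computes a running ensemble estimate plus a per-observation spread across sub-forests. The toy numbers are invented, and the equal-weight average assumes every slice holds the same number of trees, which is not quite true if the last slice is smaller.

```r
# Toy stand-in for Prediction_slice: 3 observations x 4 sub-forests
slice_preds <- matrix(c(0.2, 0.4, 0.6, 0.8,
                        0.5, 0.5, 0.5, 0.5,
                        0.9, 0.7, 0.8, 0.6),
                      nrow = 3, byrow = TRUE)

# Running estimate after 1, 2, ... sub-forests:
# cumulative row sums divided by the number of sub-forests used so far.
running_mean <- t(apply(slice_preds, 1, cumsum)) / col(slice_preds)

# Spread across sub-forests as a simple stability indicator per observation.
slice_sd <- apply(slice_preds, 1, sd)

# Estimate from all sub-forests (the last column of running_mean).
full_estimate <- running_mean[, ncol(running_mean)]
```

Plotting the columns of running_mean against the number of trees used (column index times slice.size) shows where the estimates flatten out, which gives a hint for choosing the number of trees.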
Marvin, do you think this is a possible (quick) solution?
from ranger.
Done.
from ranger.