When you have an empty tree in a Random Forest model, Learni

Dealing with empty trees in Random Forests about pyxai HOT 4 CLOSED

atakemura commented on June 11, 2024

Dealing with empty trees in Random Forests

from pyxai.

Comments (4)

szczepanskiNicolas commented on June 11, 2024

Hello, thanks for all the information in your previous reply (it really helped me).

The problem is actually solved in a new version (1.0.8), on the git and on pypi.
I've added a unit test for this issue.

However, I've found that the given hyperparameters build a model in which every tree is a single leaf (for the Iris dataset). Because the trees test no conditions on the features, the model gives every instance the same prediction. In this specific case, explanations are useless.
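To illustrate why such a model gives every instance the same prediction, here is a minimal sketch. It assumes a plain majority vote over the trees, and the function `forest_predict` and the list `leaf_classes` are hypothetical names for illustration, not part of pyxai or scikit-learn.

```python
from collections import Counter


def forest_predict(leaf_classes, instance):
    """Majority vote over trees that are each a single leaf.

    `instance` is deliberately unused: a leaf-only tree tests no feature,
    so its vote cannot depend on the input.
    """
    votes = Counter(leaf_classes)
    # most_common(1) returns a list with one (class, count) pair.
    return votes.most_common(1)[0][0]


# Hypothetical forest: each entry is the class stored in one leaf-only tree.
leaf_classes = [1, 1, 2, 1, 0]

# Two very different Iris-like instances receive the same prediction.
instances = [[5.1, 3.5, 1.4, 0.2], [6.7, 3.0, 5.2, 2.3]]
predictions = [forest_predict(leaf_classes, x) for x in instances]
```

Since no feature is ever tested, there is nothing for an explanation to refer to, which matches the remark above that explanations are useless in this case.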

Please let me know if you still have a problem with this new version.

Best regards

szczepanskiNicolas commented on June 11, 2024

Hello, thank you very much for reporting this problem.

We consider that each tree must necessarily return a class for the calculation of a prediction of a given instance. This line of code doesn't solve the problem because other bugs will appear in the calculation of explanations:

root = nodes[0] if 0 in nodes else DecisionNode(1, left=None, right=None)

I don't think the tree is empty, because theoretically, all trees must represent at least one class. On the other hand, it's possible that the tree is just a leaf representing the class. I've seen this happen with boosted trees:

elif "leaf" in tree_JSON:
    # Special case when the tree is just a leaf: this happens when the solver makes no split, but the weight still has to be taken into account
    return LeafNode(tree_JSON["leaf"])

And in this case, a leaf must be created.

To solve this problem, I need to reproduce it and obtain additional information. Could you please send me your model, either here or privately ([email protected]), or reproduce the problem on a simple public dataset like Iris? It may even be enough to print all the fields of the sk_raw_tree object.

The problem is that this situation is very rare: I have never encountered it in the more than 20 datasets I have tested.

In the meantime, I'll try to reproduce it on my own. I'll keep you informed.

Don't hesitate to contact me if you have any new information or questions.

atakemura commented on June 11, 2024

@szczepanskiNicolas Thank you for the reply.

On the other hand, it's possible that the tree is just a leaf representing the class.

Yes, this seems to be the case here.

And in this case, a leaf must be created.

This makes sense to me, so I guess this line should be more like

root = nodes[0] if 0 in nodes else LeafNode(sk_raw_tree.value[0][0])
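A minimal, self-contained sketch of this fallback is shown below. The `LeafNode` and `DecisionNode` classes here are hypothetical stand-ins, not pyxai's real classes, and `build_tree` is an illustrative helper working on plain lists laid out like the scikit-learn tree arrays. One assumption to note: the sketch stores the index of the majority class in the leaf (an argmax over the class counts), mirroring the `numpy.argmax(...)` calls that appear for child leaves in the traceback in this thread, rather than the raw count array.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeafNode:  # hypothetical stand-in for a leaf holding a class index
    value: int


@dataclass
class DecisionNode:  # hypothetical stand-in for an internal split node
    id_feature: int
    threshold: float
    left: Optional[object] = None
    right: Optional[object] = None


TREE_LEAF = -1  # scikit-learn's marker for "no child"


def argmax(counts):
    """Index of the largest count (first one wins on ties)."""
    return max(range(len(counts)), key=lambda i: counts[i])


def build_tree(children_left, children_right, feature, threshold, value):
    # Only nodes that actually split get a DecisionNode.
    nodes = {
        i: DecisionNode(feature[i], threshold[i])
        for i in range(len(feature))
        if children_left[i] != TREE_LEAF
    }

    def node_or_leaf(i):
        # Any index without a DecisionNode is a leaf, including the root.
        return nodes[i] if i in nodes else LeafNode(argmax(value[i][0]))

    for i, node in nodes.items():
        node.left = node_or_leaf(children_left[i])
        node.right = node_or_leaf(children_right[i])

    # If node 0 never split, the whole tree is a single leaf.
    return node_or_leaf(0)


# Single-leaf tree, using the array values reported in this thread.
single = build_tree([-1], [-1], [-2], [-2.0], [[[49.0, 51.0, 50.0]]])

# An ordinary one-split tree still converts as before (made-up values).
stump = build_tree(
    [1, -1, -1], [2, -1, -1], [2, -2, -2], [2.45, -2.0, -2.0],
    [[[50.0, 50.0, 50.0]], [[50.0, 0.0, 0.0]], [[0.0, 50.0, 50.0]]],
)
```

Routing the root through the same `node_or_leaf` helper as the children means the single-leaf case needs no separate branch.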

I've managed to reproduce it with the Iris dataset, although the parameters are a bit contrived.
In the original example, I was using the Australian credit approval dataset (https://archive.ics.uci.edu/dataset/143/statlog+australian+credit+approval) after some hyperparameter tuning.

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from pyxai import Learning

iris = load_iris()
features = iris.data
rf = RandomForestClassifier(n_estimators=300, max_depth=8, min_samples_leaf=0.5,
                            min_samples_split=0.3, min_weight_fraction_leaf=0.1, criterion='entropy',
                            random_state=2023)
rf.fit(features, iris.target)

px_learner, px_model = Learning.import_models(rf, feature_names=['sepal_length', 'sepal_width',
                                                                 'petal_length', 'petal_width'])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-5a023b46d641> in <module>
     12 rf.fit(features, iris.target)
     13
---> 14 px_learner, px_model = Learning.import_models(rf, feature_names=['sepal_length', 'sepal_width',
     15                                                                  'petal_length', 'petal_width'])
     16

/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/Learning.py in import_models(models, feature_names)
    103         for l in learner_information: l.set_feature_names(feature_names)
    104
--> 105     result_output = learner.convert_model(evaluation_output, learner_information)
    106
    107     Tools.verbose("---------------   Explainer   ----------------")

/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/learner.py in convert_model(self, output, learner_information)
    264                 return self.to_DT_CLS(self.learner_information)
    265             elif output == EvaluationOutput.RF:
--> 266                 return self.to_RF_CLS(self.learner_information)
    267             elif output == EvaluationOutput.BT:
    268                 return self.to_BT_CLS(self.learner_information)

/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/scikitlearn.py in to_RF_CLS(self, learner_information)
    103             for sk_tree in random_forest:
    104                 sk_raw_tree = sk_tree.tree_
--> 105                 decision_trees.append(self.classifier_to_DT(sk_tree, sk_raw_tree, id_solver_results))
    106             random_forests.append(
    107                 RandomForest(decision_trees, n_classes=len(sk_tree.classes_), learner_information=self.learner_information[id_solver_results]))

/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/scikitlearn.py in classifier_to_DT(self, sk_tree, sk_raw_tree, id_solver_results)
    140                 nodes[i].right = nodes[id_right] if id_right in nodes else LeafNode(numpy.argmax(sk_raw_tree.value[id_right][0]))
    141
--> 142         root = nodes[0] if 0 in nodes else DecisionNode(1, 0, sk_raw_tree.value[0][0])
    143         return DecisionTree(sk_tree.n_features_in_, root, sk_tree.classes_, id_solver_results=id_solver_results,
    144                             learner_information=self.learner_information[id_solver_results])

TypeError: __init__() takes 2 positional arguments but 4 were given
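This kind of message typically appears when a constructor accepts one positional argument besides `self` and is called with three. The sketch below reproduces it with a hypothetical class; the real `DecisionNode` signature in pyxai is not shown in this thread, so the keyword-only parameters here are an assumption made purely for illustration.

```python
class DecisionNode:
    """Hypothetical stand-in: one positional parameter, the rest keyword-only."""

    def __init__(self, id_feature, *, threshold=0, left=None, right=None):
        self.id_feature = id_feature
        self.threshold = threshold
        self.left = left
        self.right = right


message = ""
try:
    # Same call shape as the failing line: three positional arguments
    # (plus the implicit `self`, which makes four).
    DecisionNode(1, 0, [49.0, 51.0, 50.0])
except TypeError as exc:
    message = str(exc)

# Passing the extra values by keyword is accepted by this stand-in class.
node = DecisionNode(1, threshold=0)
```

Here `message` ends with "takes 2 positional arguments but 4 were given"; newer Python versions prefix it with the class name.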

scikit-learn                       1.3.2
numpy                              1.20.3

fields in the sk_raw_tree object

sk_raw_tree.feature
Out[1]: array([-2], dtype=int64)
sk_raw_tree.value
Out[2]: array([[[49., 51., 50.]]])
sk_raw_tree.children_left
Out[3]: array([-1], dtype=int64)
sk_raw_tree.children_right
Out[4]: array([-1], dtype=int64)
sk_raw_tree.n_leaves
Out[5]: 1
sk_raw_tree.node_count
Out[6]: 1
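As a rough way to read the dump above: in scikit-learn's tree arrays, -1 marks a missing child and -2 marks an undefined split feature, so these values describe a tree whose root is its only node. The sketch below checks this on plain Python lists copied from the output; `is_single_leaf` and `majority_class` are hypothetical helper names, not pyxai or scikit-learn functions.

```python
TREE_LEAF = -1       # scikit-learn: node has no child on this side
TREE_UNDEFINED = -2  # scikit-learn: node has no split feature


def is_single_leaf(node_count, children_left, children_right, feature):
    """True when the root is the only node and it does not split."""
    return (
        node_count == 1
        and children_left[0] == TREE_LEAF
        and children_right[0] == TREE_LEAF
        and feature[0] == TREE_UNDEFINED
    )


def majority_class(value):
    """Index of the class with the highest count at the root."""
    counts = value[0][0]
    return max(range(len(counts)), key=lambda i: counts[i])


# Values copied from the dump above, as plain lists.
fields = {
    "node_count": 1,
    "children_left": [-1],
    "children_right": [-1],
    "feature": [-2],
    "value": [[[49.0, 51.0, 50.0]]],
}

single_leaf = is_single_leaf(
    fields["node_count"],
    fields["children_left"],
    fields["children_right"],
    fields["feature"],
)
root_class = majority_class(fields["value"])
```

With these values, `single_leaf` is `True` and `root_class` is `1`, the index of the class with 51 samples.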

Thanks!

atakemura commented on June 11, 2024

@szczepanskiNicolas I can confirm that this issue is now solved in 1.0.8. Thank you for the update!
