Comments (4)
Hello, thanks for all the information in your previous reply (it really helped me).
The problem is now fixed in a new version (1.0.8), available both in the Git repository and on PyPI.
I've added a unit test for this issue.
However, I've found that the given hyperparameters build a model where all the trees are, in each case, a leaf (for the Iris dataset). This gives us a model where all instances have the same prediction because no conditions on features are used in the trees. In this specific case, explanations are useless.
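To illustrate why a forest made only of leaves predicts the same class for every instance, here is a minimal sketch. The tree encoding and both helper functions are hypothetical and written only for this example (they are not PyXAI's classes), and the forest is assumed to aggregate its trees by majority vote.

```python
from collections import Counter

# Hypothetical minimal encoding, for illustration only (not PyXAI's classes):
# a leaf is an int class label; an internal node is a tuple
# (feature_index, threshold, left_subtree, right_subtree).


def predict_tree(tree, instance):
    """Walk one tree and return the class label of the leaf that is reached."""
    while not isinstance(tree, int):
        feature, threshold, left, right = tree
        tree = left if instance[feature] <= threshold else right
    return tree


def predict_forest(forest, instance):
    """Aggregate the trees by majority vote (an assumption of this sketch)."""
    votes = Counter(predict_tree(tree, instance) for tree in forest)
    return votes.most_common(1)[0][0]


# Every tree is a single leaf, so no feature is ever tested.
leaf_only_forest = [1, 1, 2, 0, 1]

instances = [(5.1, 3.5, 1.4, 0.2), (6.7, 3.0, 5.2, 2.3), (5.9, 3.0, 4.2, 1.5)]
predictions = [predict_forest(leaf_only_forest, x) for x in instances]
print(predictions)
```

Since `predict_tree` never reads `instance` when the tree is a bare leaf, the vote is identical for all inputs, and there is no feature condition an explanation could refer to.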
Please let me know if you still have a problem with this new version.
Best regards
from pyxai.
Hello, thank you very much for reporting this problem.
We assume that every tree must return a class when the prediction for a given instance is computed. The following line of code doesn't solve the problem, because other bugs will then appear when explanations are computed:
root = nodes[0] if 0 in nodes else DecisionNode(1, left=None, right=None)
I don't think the tree is empty, because theoretically, all trees must represent at least one class. On the other hand, it's possible that the tree is just a leaf representing the class. I've seen this happen with boosted trees:
elif "leaf" in tree_JSON:
    # Special case when the tree is just a leaf: this happens when the solver makes no split, but the leaf weight still has to be taken into account
    return LeafNode(tree_JSON["leaf"])
And in this case, a leaf must be created.
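As a rough sketch of the boosted-tree case described above, the following parser handles a tree dump whose root is itself a leaf. It assumes an XGBoost-style JSON dump (keys `nodeid`, `split`, `split_condition`, `yes`, `no`, `children`, `leaf`). The `Leaf` and `Split` classes and `parse_tree` are hypothetical stand-ins, not PyXAI's implementation.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    weight: float


@dataclass
class Split:
    feature: str
    threshold: float
    yes: "Node"
    no: "Node"


Node = Union[Leaf, Split]


def parse_tree(tree_json: dict) -> Node:
    """Recursively convert one dumped tree into Leaf/Split objects."""
    if "children" in tree_json:
        children = {child["nodeid"]: child for child in tree_json["children"]}
        return Split(
            feature=tree_json["split"],
            threshold=tree_json["split_condition"],
            yes=parse_tree(children[tree_json["yes"]]),
            no=parse_tree(children[tree_json["no"]]),
        )
    if "leaf" in tree_json:
        # The whole tree (or this subtree) is a single leaf: keep its weight.
        return Leaf(tree_json["leaf"])
    raise ValueError("node has neither 'children' nor 'leaf'")


# A tree in which no split was made: the root is directly a leaf.
leaf_only = json.loads('{"nodeid": 0, "leaf": 0.25}')

# A one-split tree, for comparison.
stump = json.loads(
    '{"nodeid": 0, "split": "f2", "split_condition": 2.45, "yes": 1, "no": 2,'
    ' "children": [{"nodeid": 1, "leaf": 0.4}, {"nodeid": 2, "leaf": -0.2}]}'
)

print(parse_tree(leaf_only))
print(parse_tree(stump))
```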
To solve this problem, I need to reproduce it and gather more information. Could you please send me your model, either here or privately ([email protected])? Or could you reproduce it on a simple public dataset such as Iris? It may even be enough to print all the fields of the sk_raw_tree object.
The difficulty is that this situation is very rare: I have never encountered it on the more than 20 datasets I have tested.
In the meantime, I'll try to reproduce it on my own. I'll keep you informed.
Don't hesitate to contact me if you have any new information or questions.
@szczepanskiNicolas Thank you for the reply.
> On the other hand, it's possible that the tree is just a leaf representing the class.
Yes, this seems to be the case here.
> And in this case, a leaf must be created.
This makes sense to me, so I guess this line should be more like
root = nodes[0] if 0 in nodes else LeafNode(sk_raw_tree.value[0][0])
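A minimal sketch of that idea, assuming the leaf should hold the index of the majority class (as the `numpy.argmax(...)` calls for child leaves in the traceback below suggest). The `LeafNode` and `DecisionNode` classes and the `build_root` helper are simplified, hypothetical stand-ins rather than PyXAI's real classes, and the recursion replaces the original dictionary of nodes purely for brevity. The threshold of -2.0 on leaves follows scikit-learn's convention and is not shown in the thread.

```python
from dataclasses import dataclass
from typing import Union

TREE_LEAF = -1  # scikit-learn marks a missing child with -1


@dataclass
class LeafNode:  # hypothetical stand-in, not PyXAI's LeafNode
    value: int


@dataclass
class DecisionNode:  # hypothetical stand-in, not PyXAI's DecisionNode
    id_feature: int
    threshold: float
    left: Union["DecisionNode", LeafNode]
    right: Union["DecisionNode", LeafNode]


def argmax(values):
    """Index of the first largest entry (pure-Python numpy.argmax for a 1-D list)."""
    return max(range(len(values)), key=values.__getitem__)


def build_root(children_left, children_right, feature, threshold, value):
    """Rebuild a tree from scikit-learn-style parallel arrays."""
    def build(i):
        if children_left[i] == TREE_LEAF:
            # Also covers the case where node 0 itself is a leaf.
            return LeafNode(argmax(value[i][0]))
        return DecisionNode(feature[i], threshold[i],
                            build(children_left[i]), build(children_right[i]))
    return build(0)


# Arrays as reported in this thread for the failing tree (a single leaf).
root = build_root(children_left=[-1], children_right=[-1], feature=[-2],
                  threshold=[-2.0], value=[[[49.0, 51.0, 50.0]]])
print(root)
```

With the reported class counts `[49., 51., 50.]`, the root becomes a leaf for class index 1.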
I've managed to reproduce it with the iris dataset, although the parameters are a bit contrived.
In the original example, I was using the Australian credit dataset (https://archive.ics.uci.edu/dataset/143/statlog+australian+credit+approval) after some hyperparameter tuning.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from pyxai import Learning

iris = load_iris()
features = iris.data

rf = RandomForestClassifier(n_estimators=300, max_depth=8, min_samples_leaf=0.5,
                            min_samples_split=0.3, min_weight_fraction_leaf=0.1, criterion='entropy',
                            random_state=2023)
rf.fit(features, iris.target)

px_learner, px_model = Learning.import_models(rf, feature_names=['sepal_length', 'sepal_width',
                                                                 'petal_length', 'petal_width'])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-5a023b46d641> in <module>
12 rf.fit(features, iris.target)
13
---> 14 px_learner, px_model = Learning.import_models(rf, feature_names=['sepal_length', 'sepal_width',
15 'petal_length', 'petal_width'])
16
/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/Learning.py in import_models(models, feature_names)
103 for l in learner_information: l.set_feature_names(feature_names)
104
--> 105 result_output = learner.convert_model(evaluation_output, learner_information)
106
107 Tools.verbose("--------------- Explainer ----------------")
/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/learner.py in convert_model(self, output, learner_information)
264 return self.to_DT_CLS(self.learner_information)
265 elif output == EvaluationOutput.RF:
--> 266 return self.to_RF_CLS(self.learner_information)
267 elif output == EvaluationOutput.BT:
268 return self.to_BT_CLS(self.learner_information)
/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/scikitlearn.py in to_RF_CLS(self, learner_information)
103 for sk_tree in random_forest:
104 sk_raw_tree = sk_tree.tree_
--> 105 decision_trees.append(self.classifier_to_DT(sk_tree, sk_raw_tree, id_solver_results))
106 random_forests.append(
107 RandomForest(decision_trees, n_classes=len(sk_tree.classes_), learner_information=self.learner_information[id_solver_results]))
/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/scikitlearn.py in classifier_to_DT(self, sk_tree, sk_raw_tree, id_solver_results)
140 nodes[i].right = nodes[id_right] if id_right in nodes else LeafNode(numpy.argmax(sk_raw_tree.value[id_right][0]))
141
--> 142 root = nodes[0] if 0 in nodes else DecisionNode(1, 0, sk_raw_tree.value[0][0])
143 return DecisionTree(sk_tree.n_features_in_, root, sk_tree.classes_, id_solver_results=id_solver_results,
144 learner_information=self.learner_information[id_solver_results])
TypeError: __init__() takes 2 positional arguments but 4 were given
scikit-learn 1.3.2
numpy 1.20.3
fields in the sk_raw_tree object
sk_raw_tree.feature
Out[1]: array([-2], dtype=int64)
sk_raw_tree.value
Out[2]: array([[[49., 51., 50.]]])
sk_raw_tree.children_left
Out[3]: array([-1], dtype=int64)
sk_raw_tree.children_right
Out[4]: array([-1], dtype=int64)
sk_raw_tree.n_leaves
Out[5]: 1
sk_raw_tree.node_count
Out[6]: 1
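The fields above can be used to detect the single-leaf case before conversion. The sketch below is only an illustration: `leaf_ids` and `is_single_leaf` are hypothetical helpers, and they work on plain lists copied from the output above rather than on a real `sk_raw_tree` object. In scikit-learn, a child index of -1 marks a leaf and a feature index of -2 marks a node with no split.

```python
TREE_LEAF = -1        # value of children_left / children_right at a leaf
TREE_UNDEFINED = -2   # value of feature at a leaf


def leaf_ids(children_left, children_right):
    """Return the ids of all nodes that have no children."""
    return [i for i, (left, right) in enumerate(zip(children_left, children_right))
            if left == TREE_LEAF and right == TREE_LEAF]


def is_single_leaf(children_left, children_right):
    """True when the tree consists of the root only, and the root is a leaf."""
    return len(children_left) == 1 and leaf_ids(children_left, children_right) == [0]


# Values copied from the fields printed above.
feature = [-2]
children_left = [-1]
children_right = [-1]

print(is_single_leaf(children_left, children_right))
print(feature[0] == TREE_UNDEFINED)
```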
Thanks!
@szczepanskiNicolas I can confirm that this issue is now solved in 1.0.8. Thank you for the update!