
Comments (4)

szczepanskiNicolas commented on June 11, 2024

Hello, thanks for all the information in your previous reply (it really helped me).

The problem is now solved in a new version (1.0.8), available both in the Git repository and on PyPI.
I've added a unit test for this issue.

However, I've found that the given hyperparameters build a model in which every tree is a single leaf (for the Iris dataset). As a result, all instances receive the same prediction, because the trees contain no conditions on features. In this specific case, explanations are useless.
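
To illustrate why a forest made only of leaves gives the same prediction everywhere, here is a minimal sketch. The `Leaf` class and the `forest_predict` function are hypothetical stand-ins written for this example; they are not PyXAI's classes, and majority voting is assumed as the aggregation rule.

```python
from collections import Counter


class Leaf:
    """Hypothetical stand-in for a tree that consists of a single leaf."""

    def __init__(self, label):
        self.label = label

    def predict(self, instance):
        # No feature is ever tested, so the instance is ignored.
        return self.label


def forest_predict(trees, instance):
    """Majority vote over the predictions of the individual trees."""
    votes = Counter(tree.predict(instance) for tree in trees)
    return votes.most_common(1)[0][0]


# Three leaf-only trees: two vote for class 1, one votes for class 0.
forest = [Leaf(1), Leaf(1), Leaf(0)]
instances = [(5.1, 3.5, 1.4, 0.2), (6.7, 3.0, 5.2, 2.3), (5.9, 3.0, 4.2, 1.5)]

predictions = [forest_predict(forest, x) for x in instances]
print(predictions)  # [1, 1, 1]
```

On a fitted scikit-learn forest, the same situation can be detected with `all(est.tree_.node_count == 1 for est in rf.estimators_)`.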

Please let me know if you still have a problem with this new version.

Best regards

from pyxai.

szczepanskiNicolas commented on June 11, 2024

Hello, thank you very much for reporting this problem.

We consider that each tree must return a class when the prediction for a given instance is computed. The following line of code doesn't solve the problem, because other bugs will appear when explanations are computed:

root = nodes[0] if 0 in nodes else DecisionNode(1, left=None, right=None)

I don't think the tree is empty, because theoretically, all trees must represent at least one class. On the other hand, it's possible that the tree is just a leaf representing the class. I've seen this happen with boosted trees:

elif "leaf" in tree_JSON:
    # Special case where the tree is just a leaf: this happens when the solver performs no split, but the weight still has to be taken into account
    return LeafNode(tree_JSON["leaf"])

And in this case, a leaf must be created.

To solve this problem, I need to reproduce it and obtain additional information. Could you please send me your model, either here or privately ([email protected]), or reproduce the problem on a simple public dataset like Iris? It may even be enough to print all the fields of the sk_raw_tree object.

The difficulty is that this situation is very rare: I have never encountered it on the more than 20 datasets I have tested.

In the meantime, I'll try to reproduce it on my own. I'll keep you informed.

Don't hesitate to contact me if you have any new information or questions.


atakemura commented on June 11, 2024

@szczepanskiNicolas Thank you for the reply.

> On the other hand, it's possible that the tree is just a leaf representing the class.

Yes, this seems to be the case here.

> And in this case, a leaf must be created.

This makes sense to me, so I guess this line should be more like:

root = nodes[0] if 0 in nodes else LeafNode(sk_raw_tree.value[0][0])
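
One detail worth noting: line 140 of the traceback further down wraps leaf values in `numpy.argmax`, so the fallback for a leaf-only tree would probably need to store the class index rather than the raw count array. The following is a minimal sketch of that idea in plain Python. `LeafNode` here is a hypothetical stand-in, `build_root` and `argmax` are helper names invented for this example, and this is not necessarily how the fix in 1.0.8 was implemented.

```python
class LeafNode:
    """Hypothetical stand-in for PyXAI's LeafNode: it only stores a class index."""

    def __init__(self, value):
        self.value = value


def argmax(values):
    """Index of the largest element (plain-Python equivalent of numpy.argmax)."""
    return max(range(len(values)), key=values.__getitem__)


def build_root(nodes, raw_value):
    """Return the root node of a converted tree.

    nodes:     dict mapping node ids to already-built internal nodes
    raw_value: nested list shaped like sk_raw_tree.value
    """
    if 0 in nodes:
        return nodes[0]
    # No internal node exists: the whole tree is a single leaf,
    # so the root becomes a leaf holding the majority class.
    return LeafNode(argmax(raw_value[0][0]))


# Values taken from the sk_raw_tree.value output reported in this thread.
raw_value = [[[49.0, 51.0, 50.0]]]
root = build_root({}, raw_value)
print(type(root).__name__, root.value)  # LeafNode 1
```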

I've managed to reproduce it with the Iris dataset, although the parameters are a bit contrived.
In the original example, I was using the Australian credit dataset (https://archive.ics.uci.edu/dataset/143/statlog+australian+credit+approval) after some hyperparameter tuning.

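from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from pyxai import Learning

# Load the Iris dataset (150 instances, 4 features, 3 classes).
iris = load_iris()
features = iris.data

# Contrived hyperparameters: min_samples_leaf=0.5 is a fraction, so every leaf
# must hold at least half of the training samples, which prevents the trees
# from splitting.
rf = RandomForestClassifier(n_estimators=300, max_depth=8, min_samples_leaf=0.5,
                            min_samples_split=0.3, min_weight_fraction_leaf=0.1, criterion='entropy',
                            random_state=2023)
rf.fit(features, iris.target)

# Importing the fitted forest into PyXAI raises the TypeError shown below.
px_learner, px_model = Learning.import_models(rf, feature_names=['sepal_length', 'sepal_width',
                                                                 'petal_length', 'petal_width'])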


---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-5a023b46d641> in <module>
     12 rf.fit(features, iris.target)
     13
---> 14 px_learner, px_model = Learning.import_models(rf, feature_names=['sepal_length', 'sepal_width',
     15                                                                  'petal_length', 'petal_width'])
     16

/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/Learning.py in import_models(models, feature_names)
    103         for l in learner_information: l.set_feature_names(feature_names)
    104
--> 105     result_output = learner.convert_model(evaluation_output, learner_information)
    106
    107     Tools.verbose("---------------   Explainer   ----------------")

/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/learner.py in convert_model(self, output, learner_information)
    264                 return self.to_DT_CLS(self.learner_information)
    265             elif output == EvaluationOutput.RF:
--> 266                 return self.to_RF_CLS(self.learner_information)
    267             elif output == EvaluationOutput.BT:
    268                 return self.to_BT_CLS(self.learner_information)

/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/scikitlearn.py in to_RF_CLS(self, learner_information)
    103             for sk_tree in random_forest:
    104                 sk_raw_tree = sk_tree.tree_
--> 105                 decision_trees.append(self.classifier_to_DT(sk_tree, sk_raw_tree, id_solver_results))
    106             random_forests.append(
    107                 RandomForest(decision_trees, n_classes=len(sk_tree.classes_), learner_information=self.learner_information[id_solver_results]))

/opt/conda/envs/pyxai/lib/python3.9/site-packages/pyxai/sources/learning/scikitlearn.py in classifier_to_DT(self, sk_tree, sk_raw_tree, id_solver_results)
    140                 nodes[i].right = nodes[id_right] if id_right in nodes else LeafNode(numpy.argmax(sk_raw_tree.value[id_right][0]))
    141
--> 142         root = nodes[0] if 0 in nodes else DecisionNode(1, 0, sk_raw_tree.value[0][0])
    143         return DecisionTree(sk_tree.n_features_in_, root, sk_tree.classes_, id_solver_results=id_solver_results,
    144                             learner_information=self.learner_information[id_solver_results])

TypeError: __init__() takes 2 positional arguments but 4 were given

Package versions:
scikit-learn                       1.3.2
numpy                              1.20.3
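
The error message says that the constructor accepts two positional arguments (`self` plus one more) while the fallback on line 142 passes three values. A minimal sketch that reproduces the same kind of error is shown below. The `DecisionNode` class here is a hypothetical stand-in, and its keyword-only `left`/`right` parameters are an assumption inferred from the traceback and from the `DecisionNode(1, left=None, right=None)` call quoted earlier, not PyXAI's actual signature.

```python
class DecisionNode:
    """Hypothetical stand-in; the keyword-only signature is an assumption."""

    def __init__(self, id_feature, *, left=None, right=None):
        self.id_feature = id_feature
        self.left = left
        self.right = right


try:
    # Three positional values, as in the failing fallback expression.
    DecisionNode(1, 0, [49.0, 51.0, 50.0])
except TypeError as error:
    message = str(error)
    print(message)

# Passing the children by keyword is accepted by this stand-in.
node = DecisionNode(1, left=None, right=None)
```

The exact prefix of the message depends on the Python version, but it ends with "takes 2 positional arguments but 4 were given".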

Fields in the sk_raw_tree object:

sk_raw_tree.feature
Out[1]: array([-2], dtype=int64)
sk_raw_tree.value
Out[2]: array([[[49., 51., 50.]]])
sk_raw_tree.children_left
Out[3]: array([-1], dtype=int64)
sk_raw_tree.children_right
Out[4]: array([-1], dtype=int64)
sk_raw_tree.n_leaves
Out[5]: 1
sk_raw_tree.node_count
Out[6]: 1
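
For readers unfamiliar with these arrays: scikit-learn marks a leaf with -1 in children_left and children_right, and with -2 in feature. The sketch below interprets the fields printed above using plain lists. `describe_nodes` is a hypothetical helper written for this example, and the two constants mirror scikit-learn's values rather than being imported from it.

```python
TREE_LEAF = -1       # value scikit-learn stores in children_left/right for a leaf
TREE_UNDEFINED = -2  # value scikit-learn stores in feature for a leaf


def describe_nodes(feature, children_left, children_right):
    """Label each node of a raw tree as a leaf or as a split on a feature."""
    kinds = []
    for feat, left, right in zip(feature, children_left, children_right):
        if left == TREE_LEAF and right == TREE_LEAF:
            kinds.append("leaf")
        else:
            kinds.append(f"split on feature {feat}")
    return kinds


# Field values copied from the output above.
kinds = describe_nodes(feature=[-2], children_left=[-1], children_right=[-1])
is_single_leaf = kinds == ["leaf"]
print(kinds, is_single_leaf)  # ['leaf'] True
```

Here node 0 is both the root and a leaf, which is exactly the case the conversion code did not handle.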

Thanks!


atakemura commented on June 11, 2024

@szczepanskiNicolas I can confirm that this issue is now solved in 1.0.8. Thank you for the update!

