vidalt / ba-trees

Born-Again Tree Ensembles: Transforms a random forest into a single, minimal-size tree with exactly the same prediction function over the entire feature space (ICML 2020).

Home Page: https://arxiv.org/pdf/2003.11132.pdf

License: MIT License

C++ 66.76% Makefile 1.35% Shell 0.45% Python 31.44%

ba-trees's Issues

Build classifier

Hi. I am very impressed with your paper, so I tried to apply your algorithm to my own data, but I ran into a problem. Could you please explain the arguments of the build_tree function (persistence.py) in your code? I would really appreciate it if you could provide sample code using example data.

Function to save Random Forest model to txt file.

After reading some of the issues, I realized it would be useful to share a function I wrote while working with your project.
It takes a fitted sklearn random forest, reads each tree, and builds a list of strings holding all the information in the .RF.txt layout.

It may not be 100% correct.

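import numpy as np

# `config` is the poster's own settings object and is not defined in this snippet;
# config['DATASET']['NAME'] is assumed to return the dataset name.
def model_to_txt(self, index, show: bool = True, save: bool = False):
    # Traversal adapted from:
    # https://scikit-learn.org/stable/auto_examples/tree/plot_unveil_tree_structure.html#sphx-glr-auto-examples-tree-plot-unveil-tree-structure-py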
    forest = self.estimators_
    model_info = list()
    model_info.append(
        f"DATASET_NAME: {config['DATASET']['NAME']}.train{index}.csv"
        f"\nENSEMBLE: RF"
        f"\nNB_TREES: {len(forest)}"
        f"\nNB_FEATURES: {forest[0].tree_.n_features}"
        f"\nNB_CLASSES: {forest[0].tree_.n_classes[0]}"
        f"\nMAX_TREE_DEPTH: {forest[0].tree_.max_depth}"
        "\nFormat: node / node type (LN - leave node, IN - internal node) "
        "left child / right child / feature / threshold / node_depth / "
        "majority class (starts with index 0)"
    )
    for tree_idx, est in enumerate(forest):
        tree = est.tree_
        n_nodes = tree.node_count
        children_left = tree.children_left
        children_right = tree.children_right

        node_depth = np.zeros(shape=n_nodes, dtype=np.int64)
        is_leaves = np.zeros(shape=n_nodes, dtype=bool)
        stack = [(0, 0)]  # start with the root node id (0) and its depth (0)
        model_info.append(f"\n\n[TREE {tree_idx}]\nNB_NODES: {n_nodes}")
        while len(stack) > 0:
            node_id, depth = stack.pop()
            node_depth[node_id] = depth

            if children_left[node_id] != children_right[node_id]:
                stack.append((children_left[node_id], depth + 1))
                stack.append((children_right[node_id], depth + 1))
            else:
                is_leaves[node_id] = True
        for i in range(n_nodes):
            class_idx = np.argmax(tree.value[i][0])
            if is_leaves[i]:
                model_info.append(f"\n{i} LN -1 -1 -1 -1 {node_depth[i]} {class_idx}")
            else:
                model_info.append(
                    f"\n{i} IN {children_left[i]} {children_right[i]} "
                    f"{tree.feature[i]} {tree.threshold[i]} {node_depth[i]} -1"
                )
    model_info.append("\n\n")
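    if show:
        # Items already carry their own newlines, so join them without a separator
        # (print(*model_info) would insert a space at the start of every line).
        print("".join(model_info))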
    if save:
        with open(
                f"./data/processed/forests/{config['DATASET']['NAME']}.RF{index}.txt",
                "w"
        ) as f:
            for item in model_info:
                f.write(item)
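
One way to sanity-check a file produced by the function above is to read it back and compare the declared counts with what was actually written. The following is a minimal sketch that assumes only the layout written above (header keys, a `Format:` line, `[TREE k]` blocks, and eight whitespace-separated fields per node). The names `parse_forest_text`, `check_forest`, and `SAMPLE` are hypothetical and are not part of the ba-trees repository; whether the C++ solver applies the same rules has not been verified here.

```python
def parse_forest_text(text):
    """Parse the text layout written by model_to_txt into plain dicts."""
    header, trees, current = {}, [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Format:"):
            continue  # skip blank lines and the human-readable format note
        if line.startswith("[TREE"):
            current = {"nodes": []}
            trees.append(current)
            continue
        if ":" in line:
            # "KEY: value" lines belong to the file header or to the current tree
            key, value = line.split(":", 1)
            target = header if current is None else current
            target[key.strip()] = value.strip()
            continue
        fields = line.split()
        if current is None or len(fields) != 8:
            raise ValueError(f"unexpected line: {line!r}")
        node_id, kind, left, right, feature, threshold, depth, label = fields
        current["nodes"].append({
            "id": int(node_id), "type": kind,
            "left": int(left), "right": int(right),
            "feature": int(feature), "threshold": float(threshold),
            "depth": int(depth), "class": int(label),
        })
    return header, trees


def check_forest(header, trees):
    """Return a list of human-readable inconsistencies (empty if none found)."""
    problems = []
    if int(header["NB_TREES"]) != len(trees):
        problems.append("NB_TREES does not match the number of [TREE] blocks")
    for idx, tree in enumerate(trees):
        if int(tree["NB_NODES"]) != len(tree["nodes"]):
            problems.append(f"tree {idx}: NB_NODES does not match the node lines")
        for node in tree["nodes"]:
            has_no_children = node["left"] == -1 and node["right"] == -1
            if (node["type"] == "LN") != has_no_children:
                problems.append(f"tree {idx}, node {node['id']}: type/children mismatch")
    return problems


# A hand-written one-tree forest in the same layout (made-up toy values).
SAMPLE = """DATASET_NAME: toy.train1.csv
ENSEMBLE: RF
NB_TREES: 1
NB_FEATURES: 2
NB_CLASSES: 2
MAX_TREE_DEPTH: 1
Format: node / node type (LN - leave node, IN - internal node) left child / right child / feature / threshold / node_depth / majority class (starts with index 0)

[TREE 0]
NB_NODES: 3
0 IN 1 2 0 0.5 0 -1
1 LN -1 -1 -1 -1 1 0
2 LN -1 -1 -1 -1 1 1
"""

header, trees = parse_forest_text(SAMPLE)
problems = check_forest(header, trees)
```

An empty `problems` list only means the file is internally consistent; it does not prove the thresholds or classes match the original forest.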

How to get .RF.txt files

Hi, great work on turning a random forest into a single decision tree. I can run your example code successfully, but when I try to use my own data, I am not sure how to convert my random forest model into the .RF.txt format used in src/resources/forests. Could you give me some help? Thanks.

Solving possible "Can't set attribute" error in build_classifier from persistence.py

I applied your method to the STULONG atherosclerosis dataset. However, while running illustrative_example.ipynb, I got the following error in the build_classifier function:

Traceback (most recent call last):
  File "...\BA-Trees\venv\lib\site-packages\IPython\core\interactiveshell.py", line 3553, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 68, in
    current_fold, n_trees, return_file=True)
  File "...\BA-Trees\src\random_forests.py", line 85, in load
    clf = pr.classifier_from_file(filename, X, y, pruning=True, num_trees=n_trees)
  File "...\BA-Trees\src\persistence.py", line 320, in classifier_from_file
    return build_classifier(trees)
  File "...\BA-Trees\src\persistence.py", line 289, in build_classifier
    clf.n_features_ = trees[0].n_features
AttributeError: can't set attribute

After a bit of investigation, I found that lines 279 and 289 of persistence.py assign to a deprecated attribute name, "n_features_".
The solution is to change both to "n_features_in_", as shown below.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def build_classifier(trees):
    # Wrap each low-level tree in an estimator by setting the fitted attributes directly.
    def build_decision_tree(t):
        dt = DecisionTreeClassifier(random_state=0)
        dt.n_features_in_ = t.n_features  # was dt.n_features_
        dt.n_outputs_ = t.n_outputs
        dt.n_classes_ = t.n_classes[0]
        dt.classes_ = np.arange(dt.n_classes_)
        dt.tree_ = t
        return dt

    if len(trees) > 1:
        clf = RandomForestClassifier(random_state=0, n_estimators=len(trees))
        clf.estimators_ = [build_decision_tree(t) for t in trees]
        clf.n_features_in_ = trees[0].n_features  # was clf.n_features_
        clf.n_outputs_ = trees[0].n_outputs
        clf.n_classes_ = trees[0].n_classes[0]
        clf.classes_ = np.arange(clf.n_classes_)
    else:
        # A single tree is returned as a plain DecisionTreeClassifier.
        clf = build_decision_tree(trees[0])
    return clf
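
A likely reason for the "can't set attribute" message is that the installed scikit-learn version exposes "n_features_" as a read-only property, so assigning to it fails while assigning to "n_features_in_" works. The sketch below reproduces that behaviour with a toy class; `OldStyleEstimator` is a hypothetical stand-in written for illustration and is not scikit-learn code.

```python
class OldStyleEstimator:
    """Toy stand-in (not scikit-learn code) with a read-only alias attribute."""

    @property
    def n_features_(self):
        # Read-only alias: the value is derived from n_features_in_,
        # and no setter is defined for the property.
        return self.n_features_in_


est = OldStyleEstimator()

try:
    est.n_features_ = 4  # assigning to a property without a setter
    assignment_failed = False
except AttributeError:
    assignment_failed = True

est.n_features_in_ = 4  # plain attribute, so assignment succeeds
alias_value = est.n_features_  # the alias now reads the new attribute
```

The exact wording of the AttributeError depends on the Python version, so the sketch only checks that the exception is raised.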
