Decision Tree Explorer

This single-page Shiny app is intended to help people learn (and teach) the fundamentals of data science, and (hopefully) to make it easier for subject matter experts to participate in the initial rounds of the modeling process. The basic idea is to interactively visualize the effects of simplification (pruning) on the performance of decision trees, so that you can try to discover simple rules that describe or predict the relationships between inputs and outcomes in the data. For more details, see the PowerPoint presentation.

Here is a screenshot of the app in action:

[Image: screen_shot]

To run it yourself, download the repo, open the R Markdown file decision_tree_explorer.Rmd in RStudio, and run it.
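If you prefer the R console to the RStudio button, something like the following sketch should work. It assumes that the working directory is the root of the downloaded repo, that the rmarkdown package is installed, and that the document is a Shiny-enabled R Markdown file (which the description above suggests, but which is not verified here).

```r
# Launch the app from an R console instead of from the RStudio toolbar.
# Assumption: the working directory is the root of the downloaded repo.
rmd_file <- "decision_tree_explorer.Rmd"

if (interactive() && file.exists(rmd_file) &&
    requireNamespace("rmarkdown", quietly = TRUE)) {
  # rmarkdown::run() serves a Shiny-enabled R Markdown document locally.
  rmarkdown::run(rmd_file)
}
```

The guard means the snippet does nothing in a non-interactive session or when the file is not found.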

Lift plots and cumulative gain curves are calculated from pre-computed values stored in the tree, so only the model (not the actual data) is required to visualize the performance of a pruned tree. The rule for reaching each node, along with the number of cases and the proportion of positive cases in the test set for that node, is pre-computed and attached to the tree object, so the app only needs to perform simple calculations to draw the visualizations.
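To illustrate how little is needed, here is a minimal sketch that turns per-leaf counts and positive rates into lift-plot bar coordinates. The data frame, its column names, the rules, and the numbers are all made up for illustration; they are not the structures the app actually stores.

```r
# Hypothetical per-leaf summary, assumed to be already in display order.
# Column names and values are illustrative only.
leaves <- data.frame(
  rule     = c("age < 10", "age >= 10 & sex == 'F'", "age >= 10 & sex == 'M'"),
  n        = c(50, 150, 300),     # test-set cases reaching each leaf
  pos_rate = c(0.80, 0.60, 0.20), # proportion of positive cases in each leaf
  stringsAsFactors = FALSE
)

lift_bars <- function(leaves) {
  # Right edge of each bar as a cumulative fraction of all cases.
  right <- cumsum(leaves$n) / sum(leaves$n)
  data.frame(
    rule   = leaves$rule,
    xleft  = c(0, head(right, -1)), # each bar starts where the previous one ends
    xright = right,
    height = leaves$pos_rate,
    stringsAsFactors = FALSE
  )
}

bars <- lift_bars(leaves)
print(bars)
```

No row-level data is touched: the bar widths come from the counts and the heights from the positive rates.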

Instructions

Choose a model file from the drop-down list. These models are decision trees that have been trained on one subset of data (the training dataset), and their performance has been validated on another subset (the test data). The test set performance indicates how well the model works on data it has not seen before.

Use the "complexity adjustment" control to make the model simpler or more complex by pruning the tree.
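As a rough sketch of what pruning does, the following uses rpart's own prune() function on the kyphosis dataset that ships with rpart. It assumes the app's control corresponds to rpart's complexity parameter (cp), which this README does not state explicitly; count_leaves is a hypothetical helper.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the app's model files.
# cp = 0 and a small minsplit grow a deliberately large tree.
full_tree <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(cp = 0, minsplit = 5))

# Hypothetical helper: leaves are the rows of the frame marked "<leaf>".
count_leaves <- function(tree) sum(tree$frame$var == "<leaf>")

# A larger cp prunes away more splits, giving a simpler tree.
for (cp in c(0, 0.01, 0.05, 0.2)) {
  pruned <- prune(full_tree, cp = cp)
  cat(sprintf("cp = %.2f -> %d leaves\n", cp, count_leaves(pruned)))
}
```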

Display the "training" dataset to see the model performance on the data it was trained on. Differences between training set performance and test set performance usually reflect overfitting.

The lift plot shows a bar for each leaf in the tree. The width of the bar shows how many cases it covers, and the height shows the percent of those cases that are positive. The order of the bars is based on their height in the training set. If bars for the test set are not in order from tallest to shortest, this is a sign of overfitting.
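The ordering check described above can be sketched in a few lines. The leaf labels and rates below are invented for illustration, and the column names are hypothetical.

```r
# Hypothetical leaf summaries; all values are made up.
leaves <- data.frame(
  leaf       = c("A", "B", "C", "D"),
  train_rate = c(0.90, 0.70, 0.40, 0.10), # bar heights on the training set
  test_rate  = c(0.80, 0.45, 0.55, 0.12), # bar heights on the test set
  stringsAsFactors = FALSE
)

# Order the bars by training-set height, as the lift plot does.
leaves <- leaves[order(leaves$train_rate, decreasing = TRUE), ]

# A positive difference means a test bar is taller than its left neighbour.
out_of_order <- which(diff(leaves$test_rate) > 0) + 1
misordered <- leaves$leaf[out_of_order]
print(misordered)
```

Here leaf C is taller on the test set than leaf B to its left, which is the kind of pattern that hints at overfitting.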

The cumulative gain plot is basically the integral of the lift plot, and it has the same x-axis. The dotted gray line shows what the performance of a perfect classifier would look like, and the yellow curve shows the performance of the unpruned tree. An unpruned tree will usually perform better on the training set than on the test set, due to overfitting.
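A small sketch of that relationship: the area of each lift bar (width times height) is the number of positive cases in the leaf, and accumulating those areas gives the gain curve. The leaf values are made up, and the perfect-classifier line is drawn under the usual assumption that a perfect model ranks every positive case ahead of every negative one.

```r
# Hypothetical leaves, already ordered as in the lift plot.
leaves <- data.frame(
  n        = c(50, 150, 300),    # cases per leaf (bar width)
  pos_rate = c(0.80, 0.60, 0.20) # positive rate per leaf (bar height)
)

# Bar area = positive cases in the leaf.
positives <- leaves$n * leaves$pos_rate

# Shared x-axis: cumulative fraction of cases covered.
x    <- c(0, cumsum(leaves$n) / sum(leaves$n))
# Cumulative fraction of all positive cases captured so far.
gain <- c(0, cumsum(positives) / sum(positives))

# Perfect classifier: captures every positive within the first
# base_rate fraction of cases, then stays flat.
base_rate <- sum(positives) / sum(leaves$n)
perfect_x <- c(0, base_rate, 1)
perfect_y <- c(0, 1, 1)

if (interactive()) {
  plot(x, gain, type = "b", xlab = "Fraction of cases",
       ylab = "Fraction of positives captured")
  lines(perfect_x, perfect_y, lty = 3, col = "gray")
}
```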

Training "Validated" Decision Tree Models

The Shiny app uses augmented rpart decision trees created by the train_validated_tree function in the file decision_tree_explorer_lib.R. The function generate_titanic_validated_tree is just a wrapper for example code that illustrates the process.
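The general idea of an augmented tree can be sketched as follows. This is not the code in decision_tree_explorer_lib.R: the kyphosis dataset, the split size, the element name test_leaf_stats, and the node-lookup workaround are all assumptions made for illustration.

```r
library(rpart)

set.seed(1)
# kyphosis (shipped with rpart) stands in for real data.
train_idx <- sample(nrow(kyphosis), size = 60)
train <- kyphosis[train_idx, ]
test  <- kyphosis[-train_idx, ]

tree <- rpart(Kyphosis ~ Age + Number + Start, data = train, method = "class")

# Workaround to find the node each test case lands in: on a copy of the
# tree, store the node numbers in yval, then ask predict() for "vector".
node_finder <- tree
node_finder$frame$yval <- as.numeric(rownames(tree$frame))
test_node <- predict(node_finder, newdata = test, type = "vector")

# Per-leaf test-set case counts and positive rates.
is_pos <- test$Kyphosis == "present"
leaf_stats <- data.frame(
  node     = sort(unique(test_node)),
  n        = as.vector(tapply(is_pos, test_node, length)),
  pos_rate = as.vector(tapply(is_pos, test_node, mean))
)

# Attach the summary to the tree under a hypothetical element name.
tree$test_leaf_stats <- leaf_stats
print(tree$test_leaf_stats)
```

Once the summary travels with the tree object, the test data itself is no longer needed to draw the plots.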

If the MicrosoftML RevoScaleR package is available, it will be used to build the initial trees; otherwise, trees will be trained using the open source rpart package. The RevoScaleR trees will be converted to rpart objects.
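A fallback of this kind is commonly written with requireNamespace(). The sketch below only selects a backend name and does not call either package's training functions; the variable name backend is hypothetical, and the library's actual check may differ.

```r
# Pick a tree-building backend based on which packages are installed.
backend <- if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  "RevoScaleR"
} else if (requireNamespace("rpart", quietly = TRUE)) {
  "rpart"
} else {
  stop("Neither RevoScaleR nor rpart is installed.")
}

message("Trees would be trained with: ", backend)
```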

TO DO:

  • documentation
  • simplified rules
  • report complexity parameter value in info box

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.
