Random-Forest-from-Scratch

Random forest is a supervised learning algorithm that can be used for both classification and regression. It is also one of the most flexible and easy-to-use algorithms. A forest is composed of trees, and, as a rule of thumb, the more trees it has, the more robust it is. A random forest builds decision trees on randomly selected data samples, gets a prediction from each tree, and selects the best solution by majority voting. It also provides a good indicator of feature importance.

Working of the Algorithm

It works in four steps:

  • Select random samples from a given dataset.
  • Construct a decision tree for each sample and get a prediction result from each decision tree.
  • Perform a vote for each predicted result.
  • Select the prediction result with the most votes as the final prediction.
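The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the full from-scratch implementation: only the forest logic (bootstrap sampling and majority voting) is written out here, and the individual trees come from scikit-learn's `DecisionTreeClassifier` to keep the sketch short. The class name `SimpleRandomForest` is made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


class SimpleRandomForest:
    """Bare-bones forest: bootstrap samples + majority vote."""

    def __init__(self, n_trees=10, random_state=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(random_state)
        self.trees = []

    def fit(self, X, y):
        n = len(X)
        self.trees = []
        for _ in range(self.n_trees):
            # step 1: draw a random (bootstrap) sample from the dataset
            idx = self.rng.integers(0, n, size=n)
            # step 2: construct one decision tree per sample
            self.trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        # steps 3-4: collect each tree's vote, return the majority class
        votes = np.stack([tree.predict(X) for tree in self.trees])
        return np.array([np.bincount(col).argmax() for col in votes.T])


X, y = load_iris(return_X_y=True)
forest = SimpleRandomForest(n_trees=25).fit(X, y)
train_acc = (forest.predict(X) == y).mean()
```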


Documentation of Random Forest:

https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html


Parameters of the Algorithm:

  • min_samples_split : int, optional (default=2)

    The minimum number of samples required to split an internal node; a node with fewer samples than this cannot be separated further.

  • criterion : string, optional (default=”gini”)

    The impurity measure for a set of examples; it controls how a decision tree decides where to split the data, and lets us choose between different attribute-selection measures. Supported criteria are “gini” for the Gini index and “entropy” for the information gain.

  • n_estimators : int, optional (default=100)

    The number of trees in the forest.
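As a quick illustration, the three parameters above can be passed directly to scikit-learn's `RandomForestClassifier` (the 70/30 train/test split and `random_state` values here are choices made for this example, not taken from the report):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# 70/30 split chosen for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# the three parameters discussed above, at their default values
clf = RandomForestClassifier(
    n_estimators=100, criterion="gini", min_samples_split=2, random_state=0
)
clf.fit(X_train, y_train)
acc = clf.score(X_test, y_test)
```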

Evaluation of the Algorithm:

a) Without Parameter Tuning:

         precision    recall  f1-score   support

      0       1.00      1.00      1.00        14
      1       0.94      0.94      0.94        17
      2       0.93      0.93      0.93        14
avg / total   0.96      0.96      0.96        45

Accuracy: 0.9555555555555556

b) After Parameter Tuning:

        precision    recall  f1-score   support

      0       1.00      1.00      1.00        14
      1       1.00      0.94      0.97        17
      2       0.93      1.00      0.97        14
avg / total   0.98      0.98      0.98        45

Accuracy: 0.9777777777777777
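The exact tuning procedure behind the improved numbers is not recorded in this report; a common way to tune these parameters is a cross-validated grid search, sketched below. The parameter grid is hypothetical, chosen only to show the mechanics:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

# hypothetical grid over the three parameters described earlier
param_grid = {
    "n_estimators": [50, 100, 200],
    "criterion": ["gini", "entropy"],
    "min_samples_split": [2, 4],
}
search = GridSearchCV(RandomForestClassifier(random_state=1), param_grid, cv=5)
search.fit(X_tr, y_tr)  # refits the best combination on the full training set
test_acc = search.score(X_te, y_te)
```

`GridSearchCV` evaluates every combination with 5-fold cross-validation on the training split, so the held-out test set is touched only once, at the end.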


Finding Important Features using the Seaborn Library:

Finding or selecting the most important features in the Iris dataset.
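A sketch of how this can be done, assuming scikit-learn's built-in `feature_importances_` attribute and a seaborn bar plot (the headless `Agg` backend is used here only so the example runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the plot renders without a display

import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# feature_importances_ sums to 1 across all features
importances = pd.Series(
    clf.feature_importances_, index=iris.feature_names
).sort_values(ascending=False)

ax = sns.barplot(x=importances.values, y=importances.index)
ax.set_xlabel("importance")
```

For the Iris dataset, the petal measurements typically dominate the ranking, so the low-importance sepal features are candidates for removal when selecting features.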
