boston-dataset's Introduction

BOSTON-DATASET

This dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Mass. It was obtained from the StatLib archive (http://lib.stat.cmu.edu/datasets/boston) and has been used extensively throughout the literature to benchmark algorithms. However, these comparisons were primarily done outside of Delve and are thus somewhat suspect. The dataset is small, with only 506 cases. The data was originally published by Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978. The name for this dataset is simply boston. It has two prototasks: nox, in which the nitric oxides concentration is to be predicted; and price, in which the median value of a home is to be predicted.

We start by splitting the boston dataset into a training set and a testing set, and fit a baseline decision tree using rpart() in R. We then check the dataset for missing values and class imbalance. There are no missing values, but the dataset is imbalanced, so we apply the SMOTE algorithm to the imbalanced training set. Next, we apply a different algorithm, XGBoost, and compare its accuracy and quality with the base rpart() model. Finally, we improve the model further through dimensionality reduction with PCA: a decision tree built on the dimensionally reduced data gives the best result.
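
The split, missing-value check, and baseline tree described above can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the project's actual script: it uses the Boston data frame bundled with the MASS package as a stand-in for the project's own file, splits with base R's sample() rather than caTools, and fits a regression tree on medv (the price prototask) because the class label used in the project is not stated here. The seed and the 75/25 ratio are arbitrary.

```r
library(rpart)

# Stand-in data (assumption): the Boston data frame shipped with MASS.
data(Boston, package = "MASS")

# Missing-value check: number of NAs in each column.
na_counts <- colSums(is.na(Boston))
print(na_counts)

# Reproducible 75/25 train/test split with base R (illustrative seed and ratio).
set.seed(42)
train_idx <- sample(nrow(Boston), size = floor(0.75 * nrow(Boston)))
train <- Boston[train_idx, ]
test  <- Boston[-train_idx, ]

# Baseline regression tree predicting median home value from all other columns.
fit <- rpart(medv ~ ., data = train, method = "anova")

# Root-mean-squared error on the held-out rows.
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$medv - pred)^2))
print(rmse)
```

A classification variant would pass a factor response and method = "class" to rpart(); the rest of the flow stays the same.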

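The PCA step can be sketched in a similar way. The block below is only an assumption-laden illustration: it uses base R's prcomp() for the decomposition (factoextra is typically used on top of such a fit for extraction and plotting), the MASS copy of the Boston data as a stand-in, medv as the response, and an arbitrary 90% explained-variance cutoff for choosing the number of components.

```r
library(rpart)

# Stand-in data (assumption): the Boston data frame shipped with MASS.
data(Boston, package = "MASS")
predictors <- Boston[, setdiff(names(Boston), "medv")]

# Same style of 75/25 split as a baseline model would use (illustrative seed).
set.seed(42)
train_idx <- sample(nrow(Boston), size = floor(0.75 * nrow(Boston)))

# Fit PCA on the training predictors only, after centering and scaling.
pca <- prcomp(predictors[train_idx, ], center = TRUE, scale. = TRUE)

# Keep the smallest number of components explaining at least 90% of the variance.
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
k <- which(var_explained >= 0.90)[1]

# Project both sets onto the first k components.
train_pc <- data.frame(pca$x[, 1:k, drop = FALSE],
                       medv = Boston$medv[train_idx])
test_scores <- predict(pca, newdata = predictors[-train_idx, ])
test_pc <- data.frame(test_scores[, 1:k, drop = FALSE],
                      medv = Boston$medv[-train_idx])

# Decision tree on the reduced data, scored by RMSE on the held-out rows.
fit_pc <- rpart(medv ~ ., data = train_pc, method = "anova")
rmse_pc <- sqrt(mean((test_pc$medv - predict(fit_pc, newdata = test_pc))^2))
print(c(components = k, rmse = rmse_pc))
```

Fitting the PCA on the training rows and only projecting the test rows keeps information from the test set out of the transformation.
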
R

As usual, we first download the dataset locally and then load it into a data frame in R.

Source of dataset: https://archive.ics.uci.edu/ml/datasets/Housing

In R, we use read.csv to read CSV files into data.frame variables. Although read.csv can work with URLs, https is a problem for R in many cases, so you may need a package such as RCurl to get around it.

Libraries used:
1) library(readxl) # to read .xlsx files
2) library(caTools) # for sample.split
3) library(rpart) # for fitting decision trees with rpart()
4) library(rpart.plot) # for plotting the fitted tree
5) library(xgboost) # for applying XGBoost
6) library(DMwR) # for applying SMOTE
7) library(factoextra) # for PCA results and visualization

The prediction() and performance() functions used for ROC curves come from the ROCR package, not from rpart or rpart.plot.
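
The download-then-load pattern can be sketched as below. The helper name load_csv_cached, the placeholder URL, and the temporary file path are all hypothetical and introduced only for illustration. To keep the example runnable offline, it first writes the MASS copy of the Boston data to the local path, so the download branch is skipped.

```r
# Hypothetical helper: download the file once, then always read the local copy.
load_csv_cached <- function(url, dest) {
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")
  }
  read.csv(dest, stringsAsFactors = FALSE)
}

# Offline demonstration (assumption): seed the local path with the MASS copy
# of the data so that no network access is needed.
data(Boston, package = "MASS")
local_path <- file.path(tempdir(), "boston.csv")
write.csv(Boston, local_path, row.names = FALSE)

# The URL is a placeholder and is never contacted because the file exists.
housing <- load_csv_cached("https://example.invalid/boston.csv", local_path)
str(housing)
```

For an Excel source, the same structure applies with readxl::read_excel() in place of read.csv().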

boston-dataset's People

Contributors

meet-sapu
