rstuido-housing-prices-analysis-prediction's Introduction

rstuido-housing-prices-analysis-prediction

Explore and Predict Housing Sales Price in Ames, IA

This project explores and predicts housing sale prices in Ames, Iowa. The goal is to identify significant predictors, develop a linear regression model, and determine how well the model fits the data and how accurate its predictions are. The training data set of 1000 observations and 25 variables is used to fit the linear regression model. The testing data set of 460 observations and 25 variables is used to assess how well the predicted prices match the observed prices.

Interpretation

The summary statistics show the maximum and minimum sale prices. The median price is $161,750. Because there are many high outliers, the median is a better indicator of the typical price than the mean: it is less affected by those outliers. The testing set's histogram and boxplot are right-skewed, showing that most housing prices in the testing set are on the lower end, at or below $210,000. The majority are between approximately $100,000 and $200,000. The boxplot shows multiple outliers. After combining the training and testing data sets and replotting, the distribution is still right-skewed with multiple outliers. The boxplot is narrower in the combined graph given the larger number of observations, with most prices falling at or below $200,000. Again, most houses fall between $100,000 and $200,000.
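The effect of high outliers on the mean versus the median can be illustrated with a minimal sketch. The prices below are made-up values, not the Ames data, and the variable names are hypothetical.

```r
# Hypothetical sale prices (NOT the Ames data): mostly $100k-$200k plus two high outliers
prices <- c(105000, 120000, 135000, 150000, 160000, 175000, 190000, 450000, 600000)

mean_price   <- mean(prices)    # pulled upward by the two expensive houses
median_price <- median(prices)  # middle value, barely affected by them

# boxplot.stats() flags points more than 1.5 times the hinge spread beyond the
# hinges, the same rule boxplot() uses to draw outlier points
outliers <- boxplot.stats(prices)$out

print(c(mean = mean_price, median = median_price))
print(outliers)

# Visual check of the right skew
hist(prices, main = "Hypothetical sale prices", xlab = "Price ($)")
boxplot(prices, horizontal = TRUE, xlab = "Price ($)")
```

In this toy sample the mean sits well above the median because two values dominate the sum, which is the same pattern described for the sale prices above.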

A linear regression model uses the characteristics of each sale to predict housing prices. A summary of the linear regression model shows several significant predictors of SalePrice. These predictors include Lot Area, Overall Quality, Overall Condition, Year Built, MasVnrArea (masonry veneer area), TotalBsmtSF (total basement square footage), GrLivArea (above-ground living area), BedroomAbvGr (bedrooms above ground), KitchenAbvGr (kitchens above ground), TotRmsAbvGrd (total rooms above ground), and GarageArea. All except the BedroomAbvGr and KitchenAbvGr variables have positive coefficients. Therefore, with the other variables held constant, the predicted SalePrice decreases as BedroomAbvGr or KitchenAbvGr increases, and it increases as any variable with a positive coefficient increases.
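As a rough sketch of how such a model summary can be read in R, the example below fits `lm()` to simulated data and extracts the significant predictors and the signs of their coefficients. The data are simulated, not the Ames data; only three of the column names are borrowed from the text, and the coefficients used to generate the prices are arbitrary assumptions.

```r
set.seed(42)  # reproducible simulated data; NOT the Ames data
n <- 200
train <- data.frame(
  GrLivArea    = round(runif(n, 800, 3000)),
  OverallQual  = sample(1:10, n, replace = TRUE),
  BedroomAbvGr = sample(1:5, n, replace = TRUE)
)
# Simulated price: generating coefficients are arbitrary, for illustration only
train$SalePrice <- 20000 + 60 * train$GrLivArea + 15000 * train$OverallQual -
  8000 * train$BedroomAbvGr + rnorm(n, sd = 15000)

fit <- lm(SalePrice ~ GrLivArea + OverallQual + BedroomAbvGr, data = train)

# Coefficient table: estimate, standard error, t value, p-value
coefs <- summary(fit)$coefficients
print(coefs)

# Predictors significant at the 0.05 level (intercept excluded)
p_values    <- coefs[-1, "Pr(>|t|)"]
significant <- names(p_values)[p_values < 0.05]
print(significant)

# Sign of each significant coefficient: +1 raises the predicted price, -1 lowers it
print(sign(coef(fit)[significant]))
```

Each coefficient is interpreted with the other predictors held fixed, which is why a variable such as BedroomAbvGr can carry a negative sign once living area is already in the model.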

The significant predictor variables are those with p-values below 0.05. Overfitting can be an issue with regression models; however, the training sample is large and includes all the variables needed for testing. With overfitting, results may be overly optimistic and findings difficult or impossible to replicate on other data sets, but that does not appear to be the case here. The R-squared and adjusted R-squared values indicate how much of the variation in SalePrice the model explains. They equal 0.8473 and 0.8423, respectively, indicating a good fit. The predicted prices in Figure 6 are for the first 20 observations of the testing data after records with missing values were removed. A data frame and a line graph compare the actual values with the predicted values. The actual prices are visually very similar to the predicted prices, another indication of a good fit.
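To make the two fit statistics concrete, the sketch below computes R-squared and adjusted R-squared by hand and checks them against what `summary()` reports. It uses R's built-in `mtcars` data as a stand-in, since the housing data are not reproduced here; the model formula is an arbitrary choice for illustration.

```r
# Built-in mtcars data stands in for the housing data (assumption for illustration)
fit <- lm(mpg ~ wt + hp, data = mtcars)

n <- nrow(mtcars)            # number of observations
p <- length(coef(fit)) - 1   # number of predictors (intercept excluded)

rss <- sum(residuals(fit)^2)                   # residual sum of squares
tss <- sum((mtcars$mpg - mean(mtcars$mpg))^2)  # total sum of squares

r2     <- 1 - rss / tss                         # share of variance explained
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)  # penalized for number of predictors

# Both hand-computed values should agree with summary()
s <- summary(fit)
print(c(r_squared = r2, from_summary = s$r.squared))
print(c(adj_r_squared = adj_r2, from_summary = s$adj.r.squared))
```

Adjusted R-squared is always at or below R-squared for a model with at least one predictor, and a small gap between the two suggests the predictors are not merely inflating the fit. Both statistics are computed on the data used for fitting, so a comparison against held-out test data remains the more direct check for overfitting.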

Summary

Overall, the training set was large enough to create a reliable linear regression model to predict housing sale prices. The median prices in the summary data are the best indicators of typical prices because they are less affected by the significant outliers. Outliers in this set are values over $350,000, as shown on the boxplots for the testing and combined data sets. The training set was used to fit the linear regression model, and the testing set helped determine how well that model fit the data; the fit was judged to be good based on sample size and R-squared values. The linear regression model identified the significant predictors, most of which had positive coefficients. That is, as those variables increased, the SalePrice also increased. The two variables with negative coefficients, BedroomAbvGr and KitchenAbvGr (bedrooms and kitchens above ground level), indicate that as they increased, the SalePrice decreased.

Finally, once missing data were omitted from the test set, the first 20 rows were pulled to predict values using the linear regression model. These were then compared to the actual prices in a table. As that did not tell much of a story, a line graph was used to visualize the comparison. Visually, the prices were very similar, another indication of a good-fit model.
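The comparison step described above might look roughly like the following sketch. The data are simulated rather than taken from the Ames files, `make_data()` is a hypothetical helper, and the two predictors are chosen only to keep the example short.

```r
set.seed(1)  # simulated data, NOT the Ames data

# Hypothetical helper that simulates n sales with two predictors
make_data <- function(n) {
  d <- data.frame(
    GrLivArea   = round(runif(n, 800, 3000)),
    OverallQual = sample(1:10, n, replace = TRUE)
  )
  d$SalePrice <- 20000 + 60 * d$GrLivArea + 15000 * d$OverallQual +
    rnorm(n, sd = 15000)
  d
}

train <- make_data(200)
test  <- make_data(60)
test$GrLivArea[c(3, 10)] <- NA  # inject missing values to mimic incomplete records

fit <- lm(SalePrice ~ GrLivArea + OverallQual, data = train)

test_complete <- na.omit(test)          # drop rows with any missing value
first20       <- head(test_complete, 20)

# Side-by-side table of actual and predicted prices
comparison <- data.frame(
  actual    = first20$SalePrice,
  predicted = predict(fit, newdata = first20)
)
print(comparison)

# Line graph of actual vs. predicted prices
matplot(as.matrix(comparison), type = "l", lty = 1, col = c("black", "red"),
        xlab = "Observation", ylab = "Sale price ($)")
legend("topleft", legend = names(comparison), lty = 1, col = c("black", "red"))
```

A visual match over 20 rows is only an informal check; a summary such as the root mean squared error over the full test set would give a number that can be compared across models.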

rstuido-housing-prices-analysis-prediction's People

Contributors

rachh8283
