Housing-Capstone

Wake County Housing Market Tableau Public Link

Google Slides link

Team Seven - Marla, Yolanda, Robert

Project Concept

Modeling Wake County Real Estate - A model to compare and analyse sale price based on living area, bedrooms, bathrooms, and lot size. Using property type and ZIP code as categorical bounds, we analysed the factors driving the total sale price of a property and how the weight of each feature differs by property type and location.

Project Methodology

Using API data from recent Zillow sales in Wake County, North Carolina, we built a database to integrate the features of recent sales for our machine learning model.

Our multiple linear regression model includes four features:
-living area
-lot size
-bedrooms
-bathrooms
These variables are modeled to determine their impact on sale price.
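
As a rough illustration of what fitting a four-feature multiple linear regression involves, the sketch below uses NumPy's least-squares solver on synthetic data. The data, the generating weights, and the variable names are all assumptions made for this example; the project's actual fitting code and library are not shown in this README.

```python
import numpy as np

# Synthetic data only: these rows are generated for illustration and are
# NOT drawn from the project's Zillow dataset.
rng = np.random.default_rng(0)
n = 200
living_area = rng.uniform(800, 4000, n)          # square feet
lot_size = rng.uniform(0.1, 2.0, n)              # acres
bedrooms = rng.integers(1, 6, n).astype(float)
bathrooms = rng.integers(1, 5, n).astype(float)

# Hypothetical weights used to generate prices (intercept first), plus noise.
true_coefs = np.array([50_000.0, 150.0, 100_000.0, -5_000.0, 10_000.0])
X = np.column_stack([np.ones(n), living_area, lot_size, bedrooms, bathrooms])
price = X @ true_coefs + rng.normal(0, 1_000, n)

# Ordinary least squares: solve for the intercept and the four feature weights.
coefs, *_ = np.linalg.lstsq(X, price, rcond=None)

names = ["constant", "living_area", "lot_size", "bedrooms", "bathrooms"]
for name, value in zip(names, coefs):
    print(f"{name:>12}: {value:,.2f}")
```

The recovered weights should land close to the generating weights because the synthetic features are independent and the noise is small; real sales data is noisier and its features are correlated.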

Incorporating ZIP code as a categorical variable allowed us to determine the geographical impact of the features across the county. Splitting the data by property type also allowed us to model the implied value of the land itself without the building improvements - a useful tool in a location with some aging housing stock that may be candidates for 'knockdown' redevelopment.
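
One simple way to picture this segmentation is to bucket the sales by ZIP code and property type, then fit the model on each bucket. The sketch below shows only the bucketing step; the records, field names, and property type labels are hypothetical and do not come from the project's database.

```python
from collections import defaultdict

# Hypothetical records: field names, labels, and prices are illustrative only.
sales = [
    {"zip": "27601", "property_type": "SINGLE_FAMILY", "price": 500_000},
    {"zip": "27601", "property_type": "LOT", "price": 150_000},
    {"zip": "27513", "property_type": "SINGLE_FAMILY", "price": 620_000},
    {"zip": "27601", "property_type": "SINGLE_FAMILY", "price": 540_000},
]

def segment(records, keys=("zip", "property_type")):
    """Group records by the given categorical fields."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    return dict(groups)

segments = segment(sales)
for key, rows in segments.items():
    print(key, len(rows))
```

Each bucket holds fewer rows than the full sample, which is why a model fitted on one slice has less data to work with.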

Results

County-wide Results

As a baseline, our model was executed against all samples in our dataset regardless of property type or ZIP code.

Summary stats for our model:
image

Our model showed an effective fit with an R-Squared of 81.3%.
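
For readers less familiar with the metric, R-Squared is the share of the variance in sale price that the model's predictions account for. A minimal sketch of the calculation follows; the function name and the three sample prices are hypothetical and unrelated to the project's data.

```python
def r_squared(actual, predicted):
    """Return 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

# Made-up sale prices and predictions, for illustration only.
actual_prices = [300_000, 450_000, 600_000]
predicted_prices = [320_000, 430_000, 610_000]
print(round(r_squared(actual_prices, predicted_prices), 3))  # 0.98
```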

Using Machine Learning Results To Predict Price

Given the results for our total sample and some of our segmented data, we can make well-informed estimates of a sale price range for a property based on the four feature inputs used in our model. If a hypothetical 2,268 square foot single-family home with 3 bedrooms and 3 bathrooms on a 0.71 acre lot were input to the model, we would expect a sale price of $642,045.

Estimation procedure through our housing model:

Reminder of our feature weights and confidence intervals:
image

Our model calculated the price by weighting the coefficients of the model (coef) against the hypothetical values listed above.
To expand our model and show the math, we use the following formula to generate the price estimate:

Estimated Price = constant + (number of bedrooms * -4.487e+04) + (number of bathrooms * -1.415e+04) + (acres of lot * 2.335e+05) + (square feet of living area * 200.1412)

Now the same formula with our hypothetical sample property (and eliminating the pesky scientific notation):
$642,045 = 199,400 + (3 * -44,870) + (3 * -14,150) + (0.71 * 233,500) + (2,268 * 200.1412)
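
The same arithmetic can be written as a small function. The coefficient values are taken from the formula above; the function and parameter names are hypothetical and are not part of the project's code.

```python
# County-wide coefficients as quoted in the formula above.
COEFS = {
    "constant": 199_400,
    "bedrooms": -44_870,
    "bathrooms": -14_150,
    "lot_acres": 233_500,
    "living_area_sqft": 200.1412,
}

def estimate_price(bedrooms, bathrooms, lot_acres, living_area_sqft):
    """Apply the linear formula: constant plus each feature times its weight."""
    return (
        COEFS["constant"]
        + bedrooms * COEFS["bedrooms"]
        + bathrooms * COEFS["bathrooms"]
        + lot_acres * COEFS["lot_acres"]
        + living_area_sqft * COEFS["living_area_sqft"]
    )

# Hypothetical sample property from the text above.
estimate = estimate_price(bedrooms=3, bathrooms=3, lot_acres=0.71, living_area_sqft=2268)
print(f"${estimate:,.0f}")  # $642,045
```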

The model can be adapted to drill down and isolate particular ZIP codes and property types, but because those slices are taken from the total sample, the model loses strength of fit. As indicated by the wide confidence interval on the bathroom feature, our model struggles to narrow the range on this feature. Cleaner input data would help the model, as would being able to avoid some of the data fill operations mentioned below.

Unfortunately the precision of this model is such that no one could use this tool to realistically bid on a property, but as a proof of concept, this model was able to execute the goal of the project.

Model Limitations and Opportunities

Data Integrity

Our machine learning model required no nulls, so we had to choose how to fill null values. Given unlimited time or a pristine dataset, we could have avoided some distortion in our data. Many null values could correctly be filled with zeros, but in certain cases this was less than ideal. Rather than intervening on the data cell by cell, we opted to make the fill in this project and flag it as an area that could be improved in future iterations of the model. The distortion shown below had only a small impact on the model; it was not zero, because this subset of the data contains few entries.

Evidence of the fillNA operation can be seen by the cluster of results sitting on the "0" Living Area below: image
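
To make the trade-off concrete, the sketch below compares filling missing living areas with zero against dropping those rows. The rows and helper names are hypothetical, and plain Python is used instead of whatever fill routine the project actually ran, which this README does not show.

```python
# Hypothetical rows: None marks a missing living area.
rows = [
    {"living_area": 1800, "price": 410_000},
    {"living_area": None, "price": 390_000},
    {"living_area": 2400, "price": 520_000},
    {"living_area": None, "price": 450_000},
    {"living_area": 3000, "price": 640_000},
]

def fill_zero(records, field):
    """Replace missing values with 0, keeping every row."""
    return [{**r, field: 0 if r[field] is None else r[field]} for r in records]

def drop_missing(records, field):
    """Discard rows where the field is missing."""
    return [r for r in records if r[field] is not None]

def mean(records, field):
    return sum(r[field] for r in records) / len(records)

filled = fill_zero(rows, "living_area")
dropped = drop_missing(rows, "living_area")

print(mean(filled, "living_area"))   # 1440.0, pulled down by the zeros
print(mean(dropped, "living_area"))  # 2400.0, but two sales are lost
```

Zero-filling keeps every sale in the sample at the cost of placing priced properties at a living area of 0, while dropping rows avoids that cluster but shrinks an already small subset.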
