
zillow-time-series-modeling's Introduction

Top 5 Tech Cities in the U.S. to Invest In

Forecasting the housing market in major tech cities in the U.S.

May 2022

Authors


Overview

Real estate has always been one of the most dependable markets for investors looking for consistent yet high returns. Even after the housing crash of 2008, it took only a few years for the market to return to its previous highs, and it has maintained steady growth ever since. This has been especially true in markets with high incomes and high density, such as San Francisco or New York. Tech jobs are also gaining prominence in the job market, and acquiring real estate that could serve these migrating employees could give us a competitive edge.

Business Problem

Tech is becoming a larger part of both the U.S. and the global economy every year. As tech grows in a city, it brings not only tech jobs but also other facets of culture. In the main tech hubs of America, you'll find much more than just the industrious culture of modern technology; there is growth in art exhibits, breweries, parks, and many other places where people can share experiences. These traits make tech cities desirable places of residence not only for those in technology, but also for anyone who values living in a place that is culturally engaging.

As time goes on, fewer people are deciding to stay in their small towns and more are moving to larger cities instead. We can see a chart from Business Insider that portrays the shrinking of rural America here. Furthermore, when we look at the growth of cities, we find that the largest cities are growing at the fastest rate. This article from the Brookings Institution mentions this phenomenon.

A large proportion of this growth will most likely be seen in these emerging tech hubs due to their wide cultural and employment appeal. We selected 10 American cities to analyze that we think hold promise as places of high growth. We chose these 10 based on an Indeed article that identified them as prominent in the tech industry. Specifically, these cities were Washington, D.C., New York City, Seattle, San Francisco, Los Angeles, San Jose, Dallas, Boston, Chicago, and Baltimore. Many of these places already have expensive markets, but there is no shortage of demand for housing in any of them. As the tech sector continues to grow, there will be an even greater need to develop housing. The political landscape is starting to warm up to higher-density developments such as multiplexes, which will open new housing development opportunities in markets that have previously been unprofitable. Focusing on these high-growth areas will give us an advantage over competitors that are more cautious about investing in markets with higher upfront investment barriers.

Data

Home Price

Source: Zillow Dataset

Contents: We acquired data for 14,723 different zip codes in America. The dataset provides the monthly median home price for every zip code from April 1996 to April 2018. We selected the data for the 10 prominent tech cities specified earlier and ran a time series analysis on each of them.

2017 Median Income

Source: Kaggle Dataset

Contents: We also found data on the median income for our 10 cities. We used this to look at the home price to income ratio for our 10 cities and compare it to the U.S. average of 5.75.
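
The ratio described above is the median home price divided by the median income. The following is a minimal sketch of that comparison, not the code from our notebook. The city names and dollar figures are hypothetical placeholders; only the 5.75 U.S. average comes from the text above.

```python
US_AVERAGE_RATIO = 5.75  # U.S. average home-price-to-income ratio cited above


def price_to_income_ratio(median_home_price, median_income):
    """Return how many years of median income one median-priced home costs."""
    if median_income <= 0:
        raise ValueError("median_income must be positive")
    return median_home_price / median_income


# Hypothetical placeholder figures, NOT taken from the Zillow or Kaggle data.
example_cities = {
    "city_a": {"median_home_price": 900_000, "median_income": 100_000},
    "city_b": {"median_home_price": 250_000, "median_income": 50_000},
}

ratios = {
    name: price_to_income_ratio(v["median_home_price"], v["median_income"])
    for name, v in example_cities.items()
}

for name, ratio in ratios.items():
    label = "above" if ratio > US_AVERAGE_RATIO else "at or below"
    print(f"{name}: ratio {ratio:.2f} is {label} the U.S. average of {US_AVERAGE_RATIO}")
```

A ratio above 5.75 means homes in that city cost more years of local income than the national average.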

Methods

We can see from this graph that many of these markets have been increasing exponentially, and all have been consistently rising since recovering from the housing crash in 2008.

Baseline Model

For the baseline model, we shifted our time-series data by 3 periods. We chose a 3-year shift so that the baseline could forecast home prices 3 years ahead. We scored the model with root mean squared error (RMSE) and attained a baseline RMSE of $118,000.
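
A shift baseline of this kind can be sketched in a few lines. This is an illustration under stated assumptions, not our notebook code: the function names `shift_baseline` and `rmse` and the price series are hypothetical, and how many rows a shift covers depends on the sampling frequency of the series being used.

```python
import math


def shift_baseline(series, periods):
    """Naive forecast: the prediction for step t is the value observed at t - periods."""
    if periods <= 0 or periods >= len(series):
        raise ValueError("periods must be between 1 and len(series) - 1")
    predictions = series[:-periods]  # values observed `periods` steps earlier
    actuals = series[periods:]       # values those predictions are scored against
    return predictions, actuals


def rmse(predictions, actuals):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predictions, actuals)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical prices, one value per period; NOT taken from the Zillow data.
prices = [100_000, 110_000, 120_000, 130_000, 140_000, 150_000]

preds, actuals = shift_baseline(prices, periods=3)
baseline_rmse = rmse(preds, actuals)
print(f"Baseline RMSE: ${baseline_rmse:,.0f}")
```

Because the baseline simply repeats an old value, its error grows with how much prices moved over the shift, which is what makes it a useful floor for judging a real model.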

ARIMA Model

We used an ARIMA model to forecast the average house price in each of the 10 cities and scored the model's predictions using root mean squared error. The most critical component of an ARIMA model is its (p, d, q) order, which sets the autoregressive, differencing, and moving-average components respectively. Our best model had an order of (1, 2, 3): 1 for the autoregressive component, 2 for differencing, and 3 for the moving-average component.
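
To make the differencing term concrete, the sketch below applies second-order differencing (d = 2) to a toy series. It does not fit an ARIMA model; it only shows the transformation that the d term applies before the autoregressive and moving-average terms are estimated. The `difference` helper and the numbers are hypothetical and chosen so the arithmetic is easy to follow.

```python
def difference(series, order=1):
    """Apply first differencing `order` times and return the resulting list."""
    result = list(series)
    for _ in range(order):
        # Each pass replaces the series with its step-to-step changes.
        result = [later - earlier for earlier, later in zip(result, result[1:])]
    return result


# Hypothetical series with accelerating growth; NOT taken from the Zillow data.
prices = [100, 104, 112, 124, 140, 160]

first_diff = difference(prices, order=1)   # growth per period
second_diff = difference(prices, order=2)  # change in growth per period

print(first_diff)   # [4, 8, 12, 16, 20]
print(second_diff)  # [4, 4, 4, 4]
```

The first difference still trends upward, while the second difference is flat. An accelerating price series like this one is the kind of case where differencing twice is needed to get a roughly stationary series.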

Our RMSE score for our final model was approximately $33,500 compared to our baseline RMSE which was $118,000.
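
As a quick check on what these two figures imply, the relative reduction in error can be computed directly. The variable names are illustrative, and the final RMSE is the approximate value quoted above.

```python
baseline_rmse = 118_000  # baseline RMSE quoted above
final_rmse = 33_500      # approximate final ARIMA RMSE quoted above

# Fraction of the baseline error removed by the final model.
reduction = (baseline_rmse - final_rmse) / baseline_rmse

print(f"RMSE reduction relative to baseline: {reduction:.1%}")  # 71.6%
```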

Conclusion

Recommendations

The forecasts show the largest growth in San Jose, Los Angeles, Boston, Baltimore, and New York. We would suggest focusing on these markets.

San Francisco was a close runner-up. While San Francisco is forecast to provide a return of 20%, we do not recommend investing there because, as you can see in this graph, it has the highest median home price, at nearly $1.8 million.

The risk and return trade-off is better in other cities with significantly lower home prices. We would be able to diversify much more in cities like Baltimore or Los Angeles.
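
One way to rank markets by projected return is sketched below. It assumes that "return" means the percentage change from the latest observed median price to the forecast price, which is our reading here rather than a definition given above. The `projected_return` helper, the city names, and all prices are hypothetical placeholders, not our forecast results.

```python
def projected_return(current_price, forecast_price):
    """Percentage change from the current price to the forecast price, as a fraction."""
    if current_price <= 0:
        raise ValueError("current_price must be positive")
    return (forecast_price - current_price) / current_price


# Hypothetical (current price, forecast price) pairs; NOT our model's output.
forecasts = {
    "city_a": (500_000, 650_000),
    "city_b": (1_000_000, 1_200_000),
    "city_c": (300_000, 330_000),
}

# Sort city names from highest to lowest projected return.
ranked = sorted(
    forecasts,
    key=lambda city: projected_return(*forecasts[city]),
    reverse=True,
)

for city in ranked:
    current, forecast = forecasts[city]
    print(f"{city}: {projected_return(current, forecast):.0%} return on ${current:,}")
```

Printing the entry price next to the return makes the trade-off visible: a market with a strong return can still be a poor fit if its entry price limits how many properties we can hold.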

The 5 cities we suggest are well balanced and should hedge against one another. San Jose, Boston, and New York are well-established cities with little buildable land left. Simply owning real estate in these cities should give consistent returns. Baltimore and Los Angeles, however, have more land that could give huge returns if invested in and renovated properly. The trick would be to do this in a manner that does not undermine the affordability, culture, or diversity of these neighborhoods, as is often the case. These renovations must not feel gaudy, but seamless and at home with the current residents. Another idea, more ethically straightforward but politically challenging, is to build denser housing in the suburbs with multiplexes or townhouses. These projects often run into obstacles, but as NIMBY culture becomes less popular and zoning reform progresses, new investment opportunities should arise.

Information

Check out our notebook for a more thorough discussion of our project, as well as our presentation.

Next Steps

There are several steps that could add even more value if we had the funding. While our model performed quite well, it was evaluated on test data drawn from a consistent bull market. If the market were bearish or particularly volatile, it is unlikely that the model would perform as well. This is particularly visible in the prediction for the Chicago market. Because Chicago was in a short-term slump in our most recent data points, this market shrinkage was projected to continue for the next three years. Designing a model that could distinguish short-term volatility from structural market failures could allow for much better predictions.

Furthermore, our model only looked at factors endogenous to the time series data. While this is useful for understanding how real estate markets change over time, it does nothing to explain the other factors that drive changes in housing prices. If we had more data on factors relevant to housing prices, such as housing density, quality of infrastructure, and cultural engagement, we could explain much more of the variance that our model failed to explain. In particular, we could use more complex models like SARIMAX, or even a Long Short-Term Memory (LSTM) neural network, to pick up on patterns left unexplored by our current model.

Finally, much has changed in the real estate market since this data was recorded. COVID-19 brought the real estate market to an abrupt halt, only for very low interest rates to trigger one of the greatest housing shortages in decades. Now, with interest rates increasing again, the housing market appears to be cooling off. A simple ARIMA model like ours would be inadequate for analyzing the changes seen over the last three years. More recent data could give useful insights into many of the phenomena induced by the pandemic.

Repository Structure

 ├── Data
 ├── gitignore
 ├── Individual Notebooks
 │       ├── Alice's Notebook.ipynb
 │       ├── Hanis-Notebook.ipynb
 │       ├── Jordan's Notebook.ipynb
 │       ├── Kyongmin Stuff.ipynb
 │       ├── Tyler's (Real) Notebook.ipynb
 ├── figures
 ├── Final_Notebook-Top-5-Cities.ipynb
 ├── README.md
 └── Presentation.pdf
  

zillow-time-series-modeling's People

Contributors

hanis-z, jskominsky, aliceagrawal, kyongminso, twood2015
