Git Product home page Git Product logo

house-prices---advanced-regression-techniques---kaggle's Introduction

House Prices - Advanced Regression Techniques

In this notebook, I will try to predict the sales house prices and practice with feature engineering.

1. Loading Dataset

The first stage is loading dataset and we can do it by using pandas library.

We have training dataset (1460 rows and 81 columns) and testing dataset (1459 rows and 80 columns).

2. Data Visualization

The next stage is data visualization, we can draw the histogram and boxplot to check the skew and outliers of each feature.

There are many outliers in the dataset, we will keep them because we do not want to lose the important information. In order to handle outliers, we can use cross-validation technique.

3. Merge Train and Test

Next stage, we will merge training dataset and testing dataset for data preprocessing, then we will split the data before building model.

4. Handling Missing Values

  • Handling missing values of numerical features (we will change missing values by mean values of each feature).

  • Handling missing values of categorical features (we will change missing values of the features that have the keywords 'Qual' OR 'Cond' OR 'Qu' OR 'QC', and categorical features that have NA as a correct value into the string "Missing". The rest of categorical features, we will change missing values into the mode values).

5. Feature Engineering

  • Drop the id column because we do not need it to train the model.

  • Encoding categorical features (we will encode the categorical features where ranking can be applied and the remaining categorical features will be encode by using pandas.get_dummies() function).

6. Train-Test Splitting

Splitting the merged data into training dataset (shape of training data: 1460) and testing dataset (shape of testing data: 1459).

7. Feature Scaling

We will scale the training data by using MinMaxScaler.

8. Training Model

  • we will use Ridge Regression Model which is applied regularization technique to reduce overfitting.

  • In order to choose alpha parameter for Ridge Regression model and apply cross-validation technique, we will use GridSearchCV.

9. Evaluate The Model

we will use R-squared and Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally).

10. Predictions On Testing Data

Finally, we will predict on testing data and export the predicted result into csv file to submit.

house-prices---advanced-regression-techniques---kaggle's People

Contributors

phucvan-coder avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.