Machine-Learning

This project uses a dataset that records house prices in King County. I used the data to explore which parameters the price depends on, and built a model that predicts the price of a house from a set of parameters.

My initial hypothesis was:-
1. The price of a house depends on the area of the house and the year it was built.
2. The price of a house also depends on its grade.
3. The price of a house also depends on the number of bedrooms and bathrooms.

Let us do some EDA by plotting graphs to understand how the price varies with respect to each variable.

EDA:-
In this process we analyze and find out the following:
1. Missing values.
2. All numerical variables.
3. Distribution of numerical variables.
4. All categorical variables.
5. Distribution of categorical variables.
6. Outliers.
7. Relationship between the independent features and the dependent feature (price).

Results:-
1. We do not have any missing values in our dataset, so we go ahead with identifying the numerical and categorical variables.
2. Houses that were recently renovated have a high price.
3. We plot histograms to check the skewness of the continuous numerical variables. Some features are approximately normally distributed and others are skewed. If a variable is skewed, we apply a log transform to bring it closer to a normal distribution.
4. We plot bar plots to study the relationship of the discrete variables with price. The relationship between grade and price is exponential.
5. All the features have a monotonic relationship with price, and they are positively correlated with it. As sqft_living increases, the sale price of the house also increases, and the same holds for the rest of the features.

From the EDA we infer that our initial hypothesis and assumptions were incorrect: the price of a house depends on several factors beyond those considered in the hypothesis.

FEATURE ENGINEERING AND FEATURE SCALING:-
We apply the following feature engineering steps to improve the performance of the model.
1. We transform the variable yr_built into the age of the house in years.
2. The EDA showed that some numerical variables are skewed. We handle this by taking the log transform of these variables.
   Note: Take the log only of columns whose values are all strictly positive. The log of zero is negative infinity and the log of a negative number is undefined, so such columns would produce -inf or NaN values.
3. Feature scaling: The features in this dataset have different units. Hence we scale the features before applying a machine learning algorithm. We use MinMaxScaler, which scales all the values to between 0 and 1.

Note: We need to perform feature engineering on both datasets, i.e. the training data and the test data.

LINEAR REGRESSION:
We fit a linear regression on the engineered training data and test the model on the engineered test data to get the R squared and RMSE scores. The predictors are all the numerical variables in the dataset after feature engineering, and price is the target. We drop zip code, id and date from the predictors.
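
The EDA checks described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the original analysis: the values are made up, the column names `bedrooms` and `grade` are assumed spellings, the `summarize` helper is hypothetical, and so is its rule that a numeric column with at most 20 distinct values counts as discrete.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the house-price data (all values are made up).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "sqft_living": rng.lognormal(mean=7.5, sigma=0.5, size=n),  # right-skewed
    "grade": rng.integers(5, 12, size=n),                        # discrete, 5..11
    "bedrooms": rng.integers(1, 6, size=n),                      # discrete, 1..5
})
# Price rises with living area and exponentially with grade, plus noise.
df["price"] = (100 * df["sqft_living"] * np.exp(0.15 * df["grade"])
               * rng.lognormal(mean=0.0, sigma=0.1, size=n))


def summarize(frame, target="price", discrete_max_unique=20):
    """Return missing counts, variable types, skewness and rank correlations."""
    missing = frame.isna().sum()
    numeric = frame.select_dtypes(include="number").columns.drop(target)
    # Hypothetical rule of thumb: few distinct values means "discrete".
    discrete = [c for c in numeric if frame[c].nunique() <= discrete_max_unique]
    continuous = [c for c in numeric if c not in discrete]
    skewness = frame[continuous + [target]].skew()
    # Pearson correlation of ranks is the Spearman correlation, which
    # measures how monotonic each feature's relationship with price is.
    rank_corr = frame[list(numeric)].rank().corrwith(frame[target].rank())
    return missing, discrete, continuous, skewness, rank_corr


missing, discrete, continuous, skewness, rank_corr = summarize(df)
# Skewness of the same column after a log transform, for comparison.
log_skew = np.log(df["sqft_living"]).skew()

print("missing values:", int(missing.sum()))
print("discrete:", discrete, "continuous:", continuous)
print("skewness before log:", round(float(skewness["sqft_living"]), 2))
print("skewness after log:", round(float(log_skew), 2))
print(rank_corr)
```

On real data, the histograms and bar plots mentioned above would complement these numbers; the skewness comparison only indicates which columns are candidates for a log transform.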

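The feature engineering, scaling and regression steps above could look roughly like the sketch below. It runs on synthetic data and rests on several assumptions that are not in the original write-up: the column names other than `yr_built` and `sqft_living`, the reference year used to compute the age, the choice of which columns to log-transform, the 80/20 split, and the hypothetical `engineer` helper. The target is left untransformed here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

REFERENCE_YEAR = 2015                   # assumption: year used to compute age
LOG_COLS = ["sqft_living", "sqft_lot"]  # assumption: the skewed columns

# Synthetic stand-in for the dataset (all values are made up).
rng = np.random.default_rng(42)
n = 500
data = pd.DataFrame({
    "id": np.arange(n),
    "date": ["20141013T000000"] * n,
    "zipcode": rng.integers(98001, 98200, size=n),
    "sqft_living": rng.lognormal(mean=7.5, sigma=0.4, size=n),
    "sqft_lot": rng.lognormal(mean=9.0, sigma=0.8, size=n),
    "grade": rng.integers(5, 12, size=n),
    "bathrooms": rng.integers(1, 5, size=n),
    "yr_built": rng.integers(1900, 2016, size=n),
})
# Synthetic price, built so that a linear model is a reasonable fit.
data["price"] = (200_000 * np.log(data["sqft_living"]) + 50_000 * data["grade"]
                 - 1_000 * (REFERENCE_YEAR - data["yr_built"])
                 + 20_000 * data["bathrooms"] + rng.normal(0, 30_000, size=n))


def engineer(df, log_cols):
    """Drop identifiers, convert yr_built to age, log-transform skewed columns."""
    out = df.drop(columns=["id", "date", "zipcode"])
    out["age"] = REFERENCE_YEAR - out.pop("yr_built")
    for col in log_cols:
        # log(0) is -inf and log of a negative value is NaN, so refuse both.
        if (out[col] <= 0).any():
            raise ValueError(f"column {col!r} has non-positive values")
        out[col] = np.log(out[col])
    return out


# Apply the same feature engineering to the training and the test data.
train_df, test_df = train_test_split(data, test_size=0.2, random_state=0)
train_fe, test_fe = engineer(train_df, LOG_COLS), engineer(test_df, LOG_COLS)
X_train, y_train = train_fe.drop(columns="price"), train_fe["price"]
X_test, y_test = test_fe.drop(columns="price"), test_fe["price"]

# Fit the scaler on the training data only, then reuse it on the test data.
scaler = MinMaxScaler()
X_train_s = scaler.fit_transform(X_train)
X_test_s = scaler.transform(X_test)

model = LinearRegression().fit(X_train_s, y_train)
pred = model.predict(X_test_s)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"R squared: {r2:.3f}, RMSE: {rmse:.0f}")
```

Fitting the scaler on the training data alone keeps information from the test set out of the model; scaled test values can therefore fall slightly outside the 0 to 1 range.
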
Contributors

manishagupta97
