Bike Price Predictor

A simple Flask web application, powered by XGBoost, that predicts the price of a bike from the given inputs.

Project Highlights

  • Model accuracy of 92.5% has been achieved
  • Generalised model with train accuracy of 96% and test accuracy of 92.5%
  • Detailed Jupyter notebook with every step explained
  • Usage of sklearn pipelines [ for training 11 different models iteratively ]

Technologies Used

  • Python
  • Flask
  • Pandas
  • Numpy
  • Seaborn
  • Scikit-learn
  • HTML
  • CSS
  • Pickle

Dataset Description

The dataset is taken from Kaggle and can be downloaded from here.

It contains information about different bikes and their prices, and has 7857 rows and 8 columns.

Column name | Description
model_name | The name of the bike's model. It also contains additional information such as the model year and engine.
model_year | The year in which the model was built.
kms_driven | Total kilometres the bike has been driven.
owner | The ownership type of the bike. "First owner" means the current owner bought the bike new, "second owner" means the bike was sold to the current owner by the first owner, and so on.
location | The location of the seller.
mileage | Average mileage of the bike, in kilometres per litre of petrol (kmpl).
power | Engine power in bhp. BHP is the rate at which the torque generated by the engine is delivered to the wheels: the faster the delivery, the higher the speed of the motorcycle. Bikes with a greater BHP can therefore reach higher speeds.

Run Locally

Clone the project

git clone https://github.com/RishiBakshii/Bike-Price-Predictor.git

Go to the project directory

cd path/to/the/cloned/repository

Install dependencies

pip install -r requirements.txt

Start the server (py is the Python launcher on Windows; on other platforms use python app.py)

py app.py

Lifecycle of this Project

  • Data Cleaning and Pre-Processing
  • Exploratory Data Analysis
  • Feature Engineering
  • Modelling
  • Deployment

Data Cleaning and Pre-Processing

  • Columns such as model_name, mileage, kms_driven and power were not in a usable form in the raw data

  • Extensive data cleaning was performed to bring them into a consistent format; all the cleaning functions were written from scratch

  • All values given in other units, such as HP and kW, were converted to bhp in the power column
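
The unit conversion in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the project's actual cleaning function: the name power_to_bhp, the regular expression, and the input strings are hypothetical, and treating "hp" as equal to "bhp" is an assumption. The kW factor (1 kW ≈ 1.34102 hp) is the standard conversion to mechanical horsepower.

```python
import re

# Conversion factors to bhp. Treating "hp" and "bhp" as the same unit
# is an assumption made for this sketch.
TO_BHP = {"bhp": 1.0, "hp": 1.0, "kw": 1.34102}

# Matches a number followed by one of the known units, e.g. "14.7 kW".
_POWER_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(bhp|hp|kw)", re.IGNORECASE)


def power_to_bhp(raw):
    """Return the power in bhp parsed from a free-text value, or None."""
    match = _POWER_RE.search(raw)
    if match is None:
        return None
    value = float(match.group(1))
    unit = match.group(2).lower()
    return round(value * TO_BHP[unit], 2)


bhp_from_kw = power_to_bhp("10 kW")       # converted from kilowatts
bhp_unchanged = power_to_bhp("19.8 bhp")  # already in bhp
bhp_missing = power_to_bhp("unknown")     # no recognisable value
```

Returning None for unparseable values keeps the missing entries visible, so they can be handled explicitly later instead of being silently turned into zeros.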

Exploratory Data Analysis

  • Royal Enfield is the highest-selling bike, with a 23.40% share of sales

  • Bajaj Pulsar comes next with a 14.78% share

  • TVS Apache is at 6.48%

  • Bajaj Avenger is at 5.21%

  • Yamaha YZF-R15 is at 3.84%

  • Harley-Davidson Fat is the most expensive bike in the dataset, with an average price of 9.85 lakh (close to 10 lakh)

  • Most of the bikes were manufactured between 2010 and 2020

  • Almost all of the bikes have been driven under 2 lakh kilometres

  • Delhi is the main hub of the bike business (sales), contributing 21.67% of total sales

  • Mumbai contributes 12% and Bangalore 11% of bike sales

  • Ranchi has the highest average bike price, at 1.70 lakh

  • Power and price have a very strong relationship

  • The distribution of the target column, price, is extremely right-skewed
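
Figures like the ones above (sales share, average price per location, skewness) can be computed with a few pandas calls. The sketch below uses a small made-up table, so the values and the brand column name are hypothetical and do not come from the real dataset.

```python
import pandas as pd

# Hypothetical toy listings; the real dataset has 7857 rows.
df = pd.DataFrame({
    "brand": ["Royal Enfield", "Royal Enfield", "Bajaj Pulsar", "TVS Apache"],
    "location": ["Delhi", "Mumbai", "Delhi", "Ranchi"],
    "price": [150000, 130000, 60000, 900000],
})

# Share of listings per brand, in percent.
brand_share = df["brand"].value_counts(normalize=True).mul(100).round(2)

# Average price per location, highest first.
avg_price_by_location = (
    df.groupby("location")["price"].mean().sort_values(ascending=False)
)

# A positive skewness value indicates a right-skewed distribution.
price_skew = df["price"].skew()
```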

Feature Engineering

  • Treated outliers in the target column "Price"

  • Corrected the skewed distribution of Price

  • Created column transformers for encoding (OneHotEncoder and OrdinalEncoder) and scaling (MinMaxScaler) of the data

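from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, OrdinalEncoder

# One-hot encode columns 0 and 4; ordinal-encode column 3 (owner) in the listed order.
# On scikit-learn >= 1.2 the `sparse` argument is named `sparse_output`.
encoder_transformer = ColumnTransformer([
    ('onehotencoding', OneHotEncoder(sparse=False, handle_unknown='ignore', drop='first'), [0, 4]),
    ('ordinalencoder', OrdinalEncoder(categories=[['fourth owner or more', 'third owner', 'second owner', 'first owner']], handle_unknown='error'), [3]),
], remainder='passthrough')

# Min-max scale positions 1, 2, 5 and 6 of this transformer's input
# (inside the pipelines, that input is the encoder's output).
scaler_transformer = ColumnTransformer([
    ('minmaxscaler', MinMaxScaler(), [1, 2, 5, 6]),
], remainder='passthrough')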
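
The README does not say which methods were used for the outlier treatment and skew correction listed at the top of this section. One common combination, shown here purely as an assumption, is capping at the IQR fences followed by a log transform. The helper name cap_outliers_iqr and the sample prices are hypothetical.

```python
import numpy as np


def cap_outliers_iqr(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return np.clip(values, q1 - k * iqr, q3 + k * iqr)


# Hypothetical prices with one extreme value.
prices = np.array([40_000, 55_000, 60_000, 75_000, 90_000, 985_000], dtype=float)

capped = cap_outliers_iqr(prices)   # the extreme value is pulled to the upper fence
log_price = np.log1p(capped)        # compresses the long right tail
restored = np.expm1(log_price)      # inverse transform, for reporting predictions
```

If the model is trained on a log-transformed target, its predictions have to be passed through the inverse (here np.expm1) before they are shown as prices.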

Modelling

  • Created pipelines for iterative training of 11 different models

from sklearn.ensemble import (AdaBoostRegressor, ExtraTreesRegressor,
                              GradientBoostingRegressor, RandomForestRegressor)
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor

# Every pipeline shares the same encoding and scaling steps; only the final estimator differs.
pipeline_lr = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('linear', LinearRegression()),
])

pipeline_las = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('lasso', Lasso()),
])

pipeline_ridge = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('ridge', Ridge()),
])

pipeline_knn = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('knn', KNeighborsRegressor()),
])

pipeline_dt = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('dt', DecisionTreeRegressor()),
])

pipeline_svm = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('svm', SVR()),
])

pipeline_rf = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('rf', RandomForestRegressor()),
])

pipeline_gbr = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('gbr', GradientBoostingRegressor()),
])

pipeline_abr = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('abr', AdaBoostRegressor()),
])

pipeline_etr = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('etr', ExtraTreesRegressor()),
])

pipeline_xgb = Pipeline([
    ('encoder_transformer', encoder_transformer),
    ('scaler_transformer', scaler_transformer),
    ('xgb', XGBRegressor()),
])

  • Training of all the pipelines

  • Model's Performance

    XGBoost was the most generalized model, with the highest accuracy and the smallest gap between train and test accuracy

  • Model Evaluation

    Residual plot

    • Most residuals fall in the range of -100 to 100
    • A few outliers are present at the lower end

    Relationship between actual and predicted values

    • A very strong linear relationship can be seen between the actual and predicted values
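
The training and comparison steps above can be sketched as a loop over named pipelines. This is an illustration under stated assumptions, not the notebook's code: the data is synthetic, only three of the 11 models are included, the pipelines use a plain MinMaxScaler instead of the project's column transformers, and the README's "accuracy" is assumed to be the R² value returned by score.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 3 numeric features and a noisy linear target.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(300, 3))
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(0, 5, size=300)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# A reduced, hypothetical set of pipelines keyed by a short name.
pipelines = {
    "linear": Pipeline([("scaler", MinMaxScaler()), ("model", LinearRegression())]),
    "ridge": Pipeline([("scaler", MinMaxScaler()), ("model", Ridge())]),
    "rf": Pipeline([("scaler", MinMaxScaler()),
                    ("model", RandomForestRegressor(random_state=0))]),
}

# Fit each pipeline and record train score, test score and the gap between them.
results = {}
for name, pipe in pipelines.items():
    pipe.fit(X_train, y_train)
    train_r2 = pipe.score(X_train, y_train)  # R^2 on the training split
    test_r2 = pipe.score(X_test, y_test)     # R^2 on the held-out split
    results[name] = {"train": train_r2, "test": test_r2, "gap": train_r2 - test_r2}

# Residuals (actual minus predicted) of the best model on the test split.
best = max(results, key=lambda n: results[n]["test"])
residuals = y_test - pipelines[best].predict(X_test)
```

A small gap between train and test scores is what the section calls a generalized model; the residuals array is the input for a residual plot or an actual-versus-predicted scatter plot.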

Deployment

Authors

  • rishibakshii
