time-series-prediction-on-bike-sharing-data

Clean-code-Challenge

This repository contains a solution to BlueYonder GmbH's challenge for a data scientist position: implementing a regression model on the bike-sharing dataset to predict the count of future rentals. Code quality matters a lot here, so PEP-8 was followed to make the script more pythonic :)

This includes the following:

  • Correct naming of user-defined functions
  • Clear variable names
  • Docstrings (help text) for all functions
  • Spacing and punctuation as per the PEP-8 standard
  • Raising clear exceptions
  • Etc.

Table of Contents

  • Data-set description
  • Data Summary
  • Feature Engineering
  • Missing Value Analysis
  • Correlation Analysis
  • Visualizing Distribution Of Data
  • Visualizing Count vs. (Month, Season, Hour, Weekday, Usertype)
  • Fitting the model
  • Results

Data-set description

Bike sharing systems are a means of renting bicycles where the process of obtaining membership, rental, and bike return is automated via a network of kiosk locations throughout a city. Using these systems, people are able to rent a bike at one location and return it at a different place on an as-needed basis. Our target is to predict the rental count given the independent variables.


Data Summary

Here you can see what is inside the data: [Figure: overview of the dataset]
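To reproduce this overview in code, a minimal loading sketch follows. It assumes the data is a CSV in the layout of the Kaggle "Bike Sharing Demand" train.csv (a "datetime" timestamp plus the columns discussed in this README); the three rows are placeholder values, not real observations, and `load_bike_data` is a hypothetical helper rather than a function from this repository.

```python
import io

import pandas as pd

# Placeholder rows for illustration only (not real observations).
# Column layout is ASSUMED to match the Kaggle "Bike Sharing Demand" train.csv.
SAMPLE_CSV = """datetime,season,holiday,workingday,weather,temp,atemp,humidity,windspeed,casual,registered,count
2011-01-01 00:00:00,1,0,0,1,10.0,12.0,80,0.0,2,10,12
2011-01-01 01:00:00,1,0,0,2,9.5,11.5,82,6.0,4,20,24
2011-01-01 02:00:00,1,0,0,2,9.0,11.0,85,7.0,1,15,16
"""


def load_bike_data(csv_source):
    """Read the bike-sharing CSV and parse the timestamp column as datetime64."""
    return pd.read_csv(csv_source, parse_dates=["datetime"])


data = load_bike_data(io.StringIO(SAMPLE_CSV))
print(data.shape)   # (rows, columns)
print(data.dtypes)  # season/holiday/workingday/weather arrive as integers
```

With the real file, passing its path (for example a hypothetical "train.csv") to `load_bike_data` would replace the in-memory sample.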


Simple Visualization Of Variable Counts

All 4 seasons seem to have roughly equal counts (1: spring, 2: summer, 3: fall, 4: winter). [Figure: count per season]


It is quite obvious that there are far fewer holidays than normal days over a period of 2 years. [Figure: count per holiday flag]


The working-day statistics follow; remember that in the plots 0 means False and 1 means True.

[Figure: count per working-day flag]


The weather counts are as follows:

  • weather

    1: Clear, Few clouds, Partly cloudy

    2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

    3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

    4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

[Figure: count per weather category]
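The counts in this section boil down to a value count per integer code. A small sketch, assuming the codes are mapped to the labels listed above before counting; `labelled_counts` and the toy series are hypothetical, for illustration only:

```python
import pandas as pd

# Code-to-label mappings, taken from the legends in this section.
SEASON_LABELS = {1: "spring", 2: "summer", 3: "fall", 4: "winter"}
FLAG_LABELS = {0: False, 1: True}  # used for holiday / workingday
WEATHER_LABELS = {
    1: "clear / partly cloudy",
    2: "mist",
    3: "light snow / light rain",
    4: "heavy rain / snow",
}


def labelled_counts(codes, labels):
    """Count rows per integer code and report them under readable labels."""
    return codes.map(labels).value_counts()


# Toy columns for illustration; the real script would pass data["season"], etc.
season = pd.Series([1, 1, 2, 3, 3, 3, 4])
workingday = pd.Series([1, 1, 1, 0, 1])

season_counts = labelled_counts(season, SEASON_LABELS)
workingday_counts = labelled_counts(workingday, FLAG_LABELS)
print(season_counts)
print(workingday_counts)
```

Mapping before counting keeps plot legends readable without changing the stored integer codes.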


Which month had the highest demand? [Figure: rental count per month]


Which was the peak hour for renting a bike? [Figure: rental count per hour]


Which temperatures were preferred for riding? [Figure: rental count vs. temperature]
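Each of these questions is a group-by followed by a mean of "count". A sketch on made-up numbers, assuming an "hour" column already exists and bucketing the continuous temperature with `pd.cut` into ranges chosen arbitrarily here; the same call works for a month column. `mean_count_by` is a hypothetical helper:

```python
import pandas as pd


def mean_count_by(data, key):
    """Average rental count for each group of `key` (a column name or a Series)."""
    return data.groupby(key, observed=True)["count"].mean()


# Toy data for illustration only.
toy = pd.DataFrame({
    "hour": [7, 8, 8, 13, 17, 17],
    "temp": [10.0, 12.5, 14.0, 25.0, 27.5, 30.0],
    "count": [50, 120, 140, 60, 160, 180],
})

by_hour = mean_count_by(toy, "hour")
peak_hour = by_hour.idxmax()

# Bucket the continuous temperature into (arbitrary) ranges before averaging.
temp_bins = pd.cut(toy["temp"], bins=[0, 15, 25, 35])
by_temp = mean_count_by(toy, temp_bins)

print(by_hour)
print("peak hour:", peak_hour)
print(by_temp)
```

`observed=True` only matters for the categorical temperature bins: it keeps empty bins out of the result.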


Feature Engineering

As you can see, the columns "season", "holiday", "workingday" and "weather" should be of "categorical" data type, but they are currently stored as "int". Let us transform the dataset as follows so that we can get started with our EDA (Exploratory Data Analysis).

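# `data` is the bike-sharing DataFrame loaded earlier; "weekday" and
# "month" are assumed to be already derived from the timestamp.
category_variable_list = ["weekday",
                          "month",
                          "season",
                          "weather",
                          "holiday",
                          "workingday"]

# Store each integer-coded column with pandas' "category" dtype
for var in category_variable_list:
    data[var] = data[var].astype("category")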
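The conversion list includes "weekday" and "month" (and the later plots use "hour"), which look like values derived from a timestamp rather than raw integer codes. A minimal sketch of that derivation, assuming the raw data has a timestamp column named "datetime" as in the Kaggle version of this dataset; `add_time_features` is a hypothetical helper, not code from BlueYonder.py:

```python
import pandas as pd


def add_time_features(data, timestamp_col="datetime"):
    """Return a copy of `data` with hour, weekday and month derived from a timestamp."""
    out = data.copy()
    stamps = pd.to_datetime(out[timestamp_col])
    out["hour"] = stamps.dt.hour
    # Day and month names are stored directly as categories
    out["weekday"] = stamps.dt.day_name().astype("category")
    out["month"] = stamps.dt.month_name().astype("category")
    return out


toy = pd.DataFrame({"datetime": ["2011-01-01 07:00:00", "2011-01-03 17:00:00"]})
features = add_time_features(toy)
print(features)
```

Working on a copy leaves the caller's DataFrame untouched, which keeps the function free of side effects.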



Missing Value Analysis

Let's check whether there are any missing or NA values in the entire dataset. So, we don't have any missing values in the dataset. Yeeey...!! [Figure: missing-value overview]
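A sketch of the check, using `DataFrame.isnull().sum()` on a toy frame that deliberately contains one NaN so the report is non-trivial; on the real data every entry would be 0. `missing_value_report` is a hypothetical name:

```python
import numpy as np
import pandas as pd


def missing_value_report(data):
    """Number of missing (NaN/None) entries per column."""
    return data.isnull().sum()


# Toy frame with one deliberately missing temperature.
toy = pd.DataFrame({"temp": [9.8, np.nan, 9.0], "count": [16, 40, 32]})
report = missing_value_report(toy)
print(report)
print("any missing:", bool(report.any()))
```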


Correlation Analysis

One way to understand how the dependent variable is influenced by the numerical features is to compute a correlation matrix between them. Let's plot the correlation between "count" and ["temp", "atemp", "humidity", "windspeed"].

[Figure: correlation matrix of temp, atemp, humidity, windspeed and count]
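The matrix behind such a plot can be computed with `DataFrame.corr()` (Pearson by default). A sketch on made-up values in which "atemp" is an exact linear function of "temp", the extreme case of the multicollinearity discussed in this section; `correlation_with_count` is a hypothetical helper:

```python
import pandas as pd

NUMERIC_FEATURES = ["temp", "atemp", "humidity", "windspeed"]


def correlation_with_count(data):
    """Pearson correlation matrix of the numeric features together with `count`."""
    return data[NUMERIC_FEATURES + ["count"]].corr()


# Toy values for illustration only: atemp = 1.1 * temp + 2, so the two
# columns are perfectly correlated.
toy = pd.DataFrame({
    "temp": [10.0, 15.0, 20.0, 25.0, 30.0],
    "atemp": [13.0, 18.5, 24.0, 29.5, 35.0],
    "humidity": [80, 70, 65, 50, 40],
    "windspeed": [10.0, 5.0, 12.0, 7.0, 9.0],
    "count": [20, 40, 55, 80, 100],
})

corr = correlation_with_count(toy)
print(corr["count"].round(2))
```

The resulting matrix is what a heatmap would visualize; the numeric form makes the temp/atemp redundancy easy to assert.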

temp and humidity have a positive and a negative correlation with count, respectively. Although neither correlation is very strong, the count variable still shows some dependency on "temp" and "humidity".

windspeed is not going to be a really useful numerical feature, as its low correlation with "count" shows.

"atemp" is not taken into account, since "atemp" and "temp" are strongly correlated with each other. During model building one of the two variables has to be dropped, because together they would introduce multicollinearity into the data.

"casual" and "registered" are also not taken into account: they are leakage variables, since "count" is simply their sum, and they need to be dropped during model building.

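Dropping these columns before fitting can be written as a small helper. A sketch, assuming the column names used in this README and that "count" is the target; `split_features_target` and the toy frame are illustrative, not taken from BlueYonder.py:

```python
import pandas as pd

# Columns excluded from the model, following the reasoning in this section.
REDUNDANT = ["atemp"]               # near-duplicate of temp (multicollinearity)
LEAKAGE = ["casual", "registered"]  # together they add up to the target


def split_features_target(data, target="count"):
    """Return (X, y) with redundant and leaking columns removed from X."""
    X = data.drop(columns=REDUNDANT + LEAKAGE + [target])
    y = data[target]
    return X, y


# Toy frame for illustration only.
toy = pd.DataFrame({
    "temp": [10.0, 20.0, 30.0],
    "atemp": [12.0, 23.0, 34.0],
    "humidity": [80, 60, 40],
    "casual": [1, 5, 9],
    "registered": [10, 20, 30],
    "count": [11, 25, 39],
})

X, y = split_features_target(toy)
print(X.columns.tolist(), y.tolist())
```

`DataFrame.drop` raises a `KeyError` if any listed column is missing, so a renamed column fails loudly instead of silently leaking into the features.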

Visualizing Count vs. (Month, Season, Hour, Weekday)

Looking at the following plots, we can get some useful information. [Figure: count vs. month, season, hour and weekday]

It is quite obvious that people tend to rent bikes during the summer season, since the weather is really conducive to riding then. Therefore June, July and August have relatively higher demand for bicycles.

On weekdays, more people rent bicycles around 7AM-8AM and 5PM-6PM. This can be attributed to regular school and office commuters.

The above pattern is not observed on Saturday and Sunday, when more people rent bicycles between 10AM and 4PM.
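The weekday-versus-weekend contrast can be checked numerically by averaging "count" per hour separately for the two day types. A sketch assuming a "weekday" column holding day names (for example as produced by pandas' `dt.day_name()`) and an "hour" column; the numbers are made up and `hourly_profile` is a hypothetical helper:

```python
import pandas as pd

WEEKEND_DAYS = ["Saturday", "Sunday"]


def hourly_profile(data):
    """Mean count per hour (rows) for weekdays vs. weekends (columns)."""
    day_type = data["weekday"].isin(WEEKEND_DAYS).map({True: "weekend", False: "weekday"})
    day_type = day_type.rename("day_type")
    grouped = data.groupby([day_type, "hour"])["count"].mean()
    return grouped.unstack(level="day_type")


# Toy rentals for illustration only.
toy = pd.DataFrame({
    "weekday": ["Monday", "Monday", "Monday", "Saturday", "Saturday", "Sunday", "Sunday"],
    "hour": [8, 13, 17, 8, 13, 13, 17],
    "count": [200, 50, 220, 30, 150, 130, 60],
})

profile = hourly_profile(toy)
print(profile)
print("weekday peak:", profile["weekday"].idxmax(), "weekend peak:", profile["weekend"].idxmax())
```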


After a few more feature selection steps (see BlueYonder.py), we are all set to choose the right machine learning model and evaluate its performance.
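This README does not say which loss Clean_Regression.py reports. As one hedged possibility, the sketch shows the root mean squared logarithmic error (RMSLE), the metric used by the Kaggle "Bike Sharing Demand" competition, together with a chronological train/test split, so the model is never trained on rows that come after the ones it is tested on. Both helpers are hypothetical and assume the rows are sorted by timestamp:

```python
import numpy as np


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error between true and predicted counts."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    if (y_true < 0).any() or (y_pred < 0).any():
        raise ValueError("RMSLE is undefined for negative values")
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))


def chronological_split(n_rows, test_fraction=0.2):
    """Train/test index arrays that keep the most recent rows for testing."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    cut = int(n_rows * (1 - test_fraction))
    return np.arange(cut), np.arange(cut, n_rows)


train_idx, test_idx = chronological_split(10)
print(train_idx, test_idx)
print(rmsle([10, 20, 30], [12, 18, 33]))
```

The log transform makes the error relative rather than absolute, so mispredicting 10 rentals at a quiet hour costs more than the same miss at rush hour.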

How to run

1. Clone this repository.
2. Open a command prompt in the cloned repository.
3. Run: python Clean_Regression.py

You should now see the top hundred predictions and the loss of the model. To see the error plot for a specific regression model, visit the explore_data folder of this repository.

[Figure: results]

Thanks a lot

Contributors

nirajdpandey
