Git Product home page Git Product logo

rideservices's Introduction

RideServices

Project logo

⚡ 🚀 Ride Services analysis


Find insights from the dataset for ride service.

📝 Table of Contents

🧐 About

Understand the dataset, estimate the charged rate, identify patterns, and determine the best locations for drivers to pick up customers to achieve a higher charge rate.

Prerequisites

SQL, Tableau, Python or Jupyter Notebook

🔖 Data Cleaning

Basic information of the dataset (please refer to Appendices on the page 10, 11 and 12)

a.	Original record: 1337759 rows, 20 columns

b.	Data: continuous data (regression models)

c.	Duplicates: No duplicates

d.	Outliners: 17997 rows were removed

    i.	Three rows (RateCodeID=99, which were not explained on the data dictionary)
    ii.	17994 rows (Trip distance was 0, but the fare amount was more than $100)
e.	Null values: 1337759 counts on the “Ehail_fee” column.

    i.	The null values were updated into 0 
    
f.	Final record: 1319765 rows, 20 columns

Tools were used for this project: Python, MySQL, and Tableau

a.	Viewed the dataset and imported it into Mysql: Python 

b.	Data cleaning and queries: Mysql

c.	Data analysis and visualization: Tableau

image image image image

🌱 General Picture

  1. Which vendor had the most trips? How many trips were taken?

    Vendor #2 VeriFone Inc had the most and 1036551 were taken. (Please refer to the below screenshot)

    image

  2. Please pull the records with the highest fare or passenger_count by date (day)’

    Please refer to the below screenshot:

    image

  3. Estimate the charged rate for each RateCodeID. (For this question, assume rates are only charged based on distance. )

    • Forecast model is applied to predict the charged rate

    • It is for the next week from July 1st, 2014, to July 12th, 2014.

    • The orange/red line is the trend of the charged rate. Please refer to the below screenshot:

    image

  4. What was the average difference between the driven distance and the haversine distance of the trip?

    The average difference was 8 miles:

    •	Calculations for two points: Pickup point & Dropoff point
    
          o	Pickup Point=MAKEPOINT([Pickup latitude],[Pickup longitude])
          o	Dropoff point=MAKEPOINT([Dropoff latitude],[Dropoff longitude])
    
    •	Calculation for Haversine distance: 15,185,531 miles.
    
          o	Haversine distance=Distance([Pickup Point],[Dropoff Point],'mi')
    
    •	Average difference calculation: 8 miles
    
          o	AVG([Haversine distance]-[Trip distance])
    
    •	Please refer to the below screenshot
    

image

  1. Are there any patterns with tipping over time? If you find one, please provide a possible explanation!
  These Patterns are based on the trip type #1 Street-hail:
  
       1.	The tipping rate has weekly seasonality:
       
            o	Every week, the tipping rate is slowly going up from Monday to Saturday, and Saturday has the highest tipping rate, then it goes down slowly to the bottom on Monday. 
  
       2.	There is the highest tipping rate from Saturday at 11:00 PM to Sunday at 00:00 AM:
       
            o	The reason of the tipping rate has a significant jump on the Saturday and Sunday:
                	Friends and family gatherings, trips, church, etc. 

image

image

  1. Can you predict the length of the trip based on factors that are known at pick-up? How might you use this information?

    Yes, the length of the trip would be predicted:

      •	Trip time calculation for the data from each trip
    
            o	DATEDIFF ('hour',[Lpep Pickup Datetime],[Lpep_dropoff_datetime])
    
      •	Trip time and trip distance data were applied to the Quantile model
    
            o	MODEL_QUANTILE(0.5, ATTR([trip_time]), ATTR([Trip distance]))
    
                  For example, the drivers will get the information on the trip distance before they pick up customers. If the distance is around 17.9 miles, the length of the trip                              will be more than an hour, which make sense since the traffic is heavy in New York, and the travel speed also needs to be considered.   (Please refer to the below                              screenshot)
    

image

🌽 Other Insights

  1. Get creative! Present any interesting trends, patterns, or predictions that you notice about this dataset.
  The interesting facts are based on:

      Two vendors #1 Creative Mobile Tech (283,214 records) vs #2 VeriFone Inc (1,036,551 records)

      Differences:
      
      •	Vendor #2 VeriFone Inc has more trips than #1 Creative Mobile Tech; however, #1 Creative Mobile Tech has more no charge payments (based on RateCodeID=1 Standard rate charge & 5               Negotiated fare). (Please refer to the below screenshots)

image image

      Similarities:

      •	Both vendors show the main customer charge comes from RateCodeID=1 Standard rate fare & 5 Negotiated fare. 
      
      •	Both data sets show the same pattern:
      
          o	There is a weekly seasonality
          
          	The customer charge rate reached the maximum on Saturday and the minimum on Monday.
           
          o	The peak charge rate occurs from Saturday at 11:00 PM to Sunday at 0:00 AM. 
          
          o	Both maps show the same area for more customers and a higher charge rate.

image image

🎉 Conclusion

The best date and hour for a driver to get more businesses and the peak charge rate from customers is from Saturday at 11:00 PM to Sunday at 0:00 AM, and the best locations are the common black area from both maps. (Please refer to the Q7 screenshots)

rideservices's People

Contributors

yinghu1234 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.