merrillm1 / ultimate_challenge

Predicted rider retention for a taxi service and identified the most significant factors contributing to it. Achieved about 80% accuracy with a CatBoost model, which was chosen for its interpretability.

Python 0.16% Jupyter Notebook 99.84%
retention-analysis random-forest catboost feature-engineering feature-importance shap auroc accuracy

The Ultimate Challenge

This was a three-part challenge aimed at predicting rider retention for a taxi service and identifying the most significant factors contributing to it.

Part 1 - Exploratory data analysis

This part was separate from the prediction phase and explored user login trends over a given time period. Login timestamps were aggregated into 15-minute intervals and examined for daily and weekly patterns. The time-series plots reveal weekday surges around 12 pm and 12 am, and weekend surges around 4 am (late Friday and Saturday nights).
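
A minimal sketch of the 15-minute aggregation with pandas follows. It assumes the login timestamps are available as a list of datetime strings; the function names `logins_per_15min` and `weekly_profile` and the sample timestamps are hypothetical and are not taken from the repository's notebook.

```python
import pandas as pd


def logins_per_15min(login_times):
    """Count logins in consecutive 15-minute bins; empty bins are reported as 0."""
    index = pd.DatetimeIndex(pd.to_datetime(list(login_times)))
    counts = pd.Series(1, index=index).sort_index()
    return counts.resample("15min").sum()


def weekly_profile(counts):
    """Average 15-minute login count for each (hour of day, weekday) pair."""
    frame = counts.to_frame("logins")
    frame["weekday"] = frame.index.day_name()
    frame["hour"] = frame.index.hour
    return frame.pivot_table(index="hour", columns="weekday",
                             values="logins", aggfunc="mean")


# Hypothetical example: four logins on Monday, 2024-01-01.
counts = logins_per_15min(["2024-01-01 00:01", "2024-01-01 00:14",
                           "2024-01-01 00:20", "2024-01-01 01:05"])
profile = weekly_profile(counts)
print(counts)
print(profile)
```

Plotting `counts` as a time series, or `profile` as a heatmap, is one way to surface daily and weekly surges like the ones described above.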

Part 2 - Experiment and metrics design

Problem prompt:

The neighboring cities of Gotham and Metropolis have complementary circadian rhythms: on weekdays, Ultimate Gotham is most active at night, and Ultimate Metropolis is most active during the day. On weekends, there is reasonable activity in both cities. However, a toll bridge, with a two-way toll, between the two cities causes driver partners to tend to be exclusive to each city. The Ultimate managers of city operations for the two cities have proposed an experiment to encourage driver partners to be available in both cities, by reimbursing all toll costs.

1. What would you choose as the key measure of success of this experiment in encouraging driver partners to serve both cities, and why would you choose this metric?
2. Describe a practical experiment you would design to compare the effectiveness of the proposed change in relation to the key measure of success. 

Please provide details on:

    a. how you will implement the experiment
    b. what statistical test(s) you will conduct to verify the significance of the observation
    c. how you would interpret the results and provide recommendations to the city operations team along with any caveats.

Response:

Assuming each driver is registered in either Metropolis or Gotham, we could measure the effectiveness of the toll reimbursement experiment by counting the pickups each driver makes outside their registered city. Our key measure of success would then be the average number of riders picked up outside a driver's registered city. This metric directly captures whether drivers are serving both cities, which is the behavior the experiment is meant to encourage. Assuming drivers currently have a significant bias toward their own city, this average should be low in both cities. The experiment would be deemed a success if the average increases after the incentive is offered.
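
The key metric can be sketched with pandas as below. It assumes a hypothetical trip-level table with one row per pickup and the columns `driver_id`, `home_city` (the driver's registered city) and `pickup_city`; these column names and the sample rows are illustrative, not from the original data.

```python
import pandas as pd


def cross_city_pickups(trips):
    """Number of pickups each driver made outside their registered (home) city."""
    outside = (trips["pickup_city"] != trips["home_city"]).astype(int)
    return outside.groupby(trips["driver_id"]).sum()


trips = pd.DataFrame({
    "driver_id":   [1, 1, 1, 2, 2],
    "home_city":   ["Gotham", "Gotham", "Gotham", "Metropolis", "Metropolis"],
    "pickup_city": ["Gotham", "Metropolis", "Metropolis", "Metropolis", "Metropolis"],
})
per_driver = cross_city_pickups(trips)
print(per_driver.mean())  # average out-of-city pickups per driver
```

Drivers with no trips in the period do not appear in `per_driver`; whether to count them as zeros is a design decision for the experiment.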

To evaluate the incentive, a randomly selected group of drivers would receive toll reimbursement while a randomly selected control group would not. After the experiment period, the mean of the key metric would be compared between the two groups to determine whether any increase in the incentivized group is statistically significant.
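
One way to draw the two groups is sketched below using only the standard library. It randomizes separately within each home city (a stratified design, which is an assumption beyond the text) so that Gotham and Metropolis drivers are both split evenly between treatment and control; the `driver_city` mapping, the function name `assign_groups` and the fixed seed are illustrative.

```python
import random
from collections import defaultdict


def assign_groups(driver_city, seed=42):
    """Split drivers 50/50 into treatment and control within each home city."""
    rng = random.Random(seed)
    by_city = defaultdict(list)
    for driver, city in sorted(driver_city.items()):
        by_city[city].append(driver)
    treatment, control = set(), set()
    for members in by_city.values():
        rng.shuffle(members)
        half = len(members) // 2
        treatment.update(members[:half])   # receives toll reimbursement
        control.update(members[half:])     # no incentive
    return treatment, control


drivers = {1: "Gotham", 2: "Gotham", 3: "Gotham", 4: "Gotham",
           5: "Metropolis", 6: "Metropolis", 7: "Metropolis", 8: "Metropolis"}
treatment, control = assign_groups(drivers)
print(sorted(treatment), sorted(control))
```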

Significance can be assessed with a two-sample t-test, which determines whether the difference between the means of two randomly sampled groups is large enough to reject the null hypothesis. Our null hypothesis is that the incentive has no effect on how often drivers leave their registered city to pick up riders; our alternative hypothesis is that it does.
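
A sketch of the significance test with SciPy's `scipy.stats.ttest_ind` follows. It uses Welch's variant (`equal_var=False`), which does not assume the two groups have equal variance; that choice, the function name `compare_groups`, and the sample counts are assumptions for illustration, not results from the challenge.

```python
from scipy import stats


def compare_groups(treatment, control, alpha=0.05):
    """Two-sided Welch's t-test on per-driver out-of-city pickup counts."""
    result = stats.ttest_ind(treatment, control, equal_var=False)
    return result.statistic, result.pvalue, result.pvalue < alpha


# Hypothetical per-driver counts of out-of-city pickups.
treatment_counts = [3, 4, 5, 4, 5, 6]
control_counts = [1, 0, 2, 1, 1, 0]
t_stat, p_value, significant = compare_groups(treatment_counts, control_counts)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, significant: {significant}")
```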

The results would be interpreted at the conventional 5% significance level. If the difference in means is significant, it can be concluded that the incentive motivates drivers to travel outside their registered city to pick up fares. However, the increase must also be large enough to justify the cost of the incentive, so the outcome should be reviewed by the finance team to determine whether extending the incentive to all drivers in both cities is financially feasible.

Part 3 - Predictive modeling

Results

The best results came from the CatBoost classifier, which predicted whether a rider would be retained with about 80% accuracy. Its AUROC of about 0.86 is also promising. According to this model, the three most influential indicators of rider retention are the percentage of a rider's trips taken on weekdays, the rider's city, and the type of phone the rider uses. Because weekday percentage is the strongest indicator, I will focus specifically on how Ultimate can leverage it.
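
For reference, the two reported metrics can be computed from a model's predicted retention probabilities as sketched below. This is not the author's code: it is a plain-Python illustration, and in practice `sklearn.metrics.accuracy_score` and `sklearn.metrics.roc_auc_score` compute the same quantities. AUROC is implemented through its rank interpretation, the probability that a randomly chosen retained rider receives a higher score than a randomly chosen non-retained rider; the 0.5 threshold and the sample values are assumptions.

```python
def accuracy(y_true, y_prob, threshold=0.5):
    """Share of riders whose thresholded prediction matches the true label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    return sum(pred == true for pred, true in zip(predictions, y_true)) / len(y_true)


def auroc(y_true, y_prob):
    """P(score of a retained rider > score of a non-retained rider); ties count as 1/2."""
    positives = [p for p, t in zip(y_prob, y_true) if t == 1]
    negatives = [p for p, t in zip(y_prob, y_true) if t == 0]
    wins = sum(1.0 if pos > neg else 0.5 if pos == neg else 0.0
               for pos in positives for neg in negatives)
    return wins / (len(positives) * len(negatives))


y_true = [1, 1, 0, 0]          # 1 = retained, 0 = not retained (hypothetical labels)
y_prob = [0.9, 0.4, 0.6, 0.1]  # hypothetical predicted retention probabilities
print(accuracy(y_true, y_prob), auroc(y_true, y_prob))
```

Accuracy depends on the chosen threshold, while AUROC summarizes ranking quality across all thresholds. The pairwise loop is O(positives × negatives), which is fine for illustration; the sort-based scikit-learn implementation is preferable on a full dataset.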

To interpret these features: the exploratory analysis showed that retained riders used the Ultimate service in a balanced way. Their weekday percentages were spread gradually from mostly-weekend to mostly-weekday use, with very few at either extreme. Riders who were not retained tended to use the service almost exclusively on weekends or almost exclusively on weekdays, with over 60% of them falling into one of these two groups.
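
This pattern can be checked by binning riders by the share of their trips taken on weekdays and comparing retention rates across bins. The sketch assumes a pandas DataFrame with a `weekday_pct` column (0 to 100) and a binary `retained` column; those column names, the 20-point bin edges, and treating exactly 0 or 100 as "exclusive" use are assumptions, and the sample rows are hypothetical.

```python
import pandas as pd


def retention_by_weekday_pct(riders, edges=(0, 20, 40, 60, 80, 100)):
    """Mean retention rate within each weekday-percentage bin (NaN for empty bins)."""
    bins = pd.cut(riders["weekday_pct"], bins=list(edges), include_lowest=True)
    return riders.groupby(bins, observed=False)["retained"].mean()


def share_at_extremes(riders):
    """Fraction of riders whose trips were all on weekdays (100) or all on weekends (0)."""
    return riders["weekday_pct"].isin([0, 100]).mean()


riders = pd.DataFrame({
    "weekday_pct": [0, 100, 50, 45, 0, 100],
    "retained":    [0, 0, 1, 1, 0, 1],
})
rates = retention_by_weekday_pct(riders)
churned_extremes = share_at_extremes(riders[riders["retained"] == 0])
print(rates)
print(churned_extremes)
```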

Ultimate can leverage the link between balanced weekday and weekend usage and retention by advertising the service to riders who need it for both work and play, for example urban residents who do not own a car or who want to avoid paying for parking. For these customers, the service is not just a convenience but a necessity. By increasing the number of riders who use the service on both weekdays and weekends, presumably for work and weekend outings, Ultimate should be able to increase the percentage of retained riders.
