Kickstarter is a US-based global crowdfunding platform focused on bringing funding to creative projects. Since the platform's launch in 2009, the site has hosted over 159,000 successfully funded projects with over 15 million unique backers. Kickstarter uses an "all-or-nothing" funding system: funds are disbursed only for projects that meet the original funding goal set by the creator.
Kickstarter earns a 5% commission on projects that are successfully funded. Currently, fewer than 40% of projects on the platform succeed. Our objective is to predict which projects are likely to succeed so that these projects can be highlighted on the site through 'staff picks' or 'featured product' lists.
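The all-or-nothing model described above can be sketched in a few lines. This is an illustrative example only; the function and variable names are our own, not part of Kickstarter's actual system or the project code.

```python
# Illustrative sketch of Kickstarter's all-or-nothing payout model.
COMMISSION_RATE = 0.05  # Kickstarter's 5% fee on successfully funded projects


def disbursement(pledged: float, goal: float) -> float:
    """Return the amount paid out to the creator.

    Funds are disbursed only if the campaign meets its goal;
    Kickstarter keeps a 5% commission on the pledged total.
    """
    if pledged < goal:
        return 0.0  # all-or-nothing: goal not met, nothing is disbursed
    return pledged * (1 - COMMISSION_RATE)


print(disbursement(12_000, 10_000))  # goal met: pledged total minus the 5% fee
print(disbursement(8_000, 10_000))   # goal missed: creator receives nothing
```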
Name | Github Page | Personal Website |
---|---|---|
Nateé Johnson | nateej1 | --- |
Misha Berrien | mishaberrien | www.mishaberrien.com |
- Machine Learning
- Data Visualization
- Predictive Modeling
- Python
- Pandas, Jupyter
In order to increase the number of successful campaigns, we propose two related solutions:
- Predict successful campaigns and promote those with the highest predicted probability of success.
- Contact creators from those campaigns that are just below the “success” margin and give them insights that will help them succeed.
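The first step above (scoring campaigns by predicted probability of success) can be sketched as follows. This is a hedged toy example: the features and synthetic data are placeholders, not the project's actual feature set or model.

```python
# Sketch of the proposed ranking step: train a classifier on labelled
# campaigns, then sort live campaigns by predicted probability of success.
# Feature columns and the toy data below are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy training data: [goal in $1000s, campaign length in days, n_reward_tiers]
X_train = rng.normal(loc=[20, 30, 8], scale=[10, 5, 3], size=(200, 3))
y_train = (X_train[:, 0] < 20).astype(int)  # pretend smaller goals succeed

model = LogisticRegression().fit(X_train, y_train)

# Rank some hypothetical live campaigns from most to least likely to succeed.
X_live = np.array([[5, 30, 10], [50, 25, 4], [18, 35, 7]], dtype=float)
p_success = model.predict_proba(X_live)[:, 1]
ranking = np.argsort(p_success)[::-1]
print(ranking, p_success[ranking])
```

Campaigns at the top of the ranking would be candidates for promotion, while those just below the decision threshold are the ones to contact with targeted insights.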
- Clone this repo.
- A sample of the deduplicated dataset can be found in the data_sample folder here.
- To reproduce the results, first open the "results" file located in the results folder here, then change the two file paths at the beginning of the document
from:

```python
kick_deduped = pd.read_csv('../../data/02_intermediate/kick_deduped.csv.zip')
cluster_features_df = pd.read_csv('../../data/03_processed/KNN_cluster_features_.csv')
```
to:

```python
kick_deduped = pd.read_csv('../../data_sample/kick_deduped_sample.csv.zip')
cluster_features_df = pd.read_csv('../../data_sample/KNN_cluster_features_.csv')
```
Then run the results file.
- The data processing/transformation scripts are kept in the src folder here.
- A data dictionary can be found in the references folder here.
This file structure is based on the DSSG machine learning pipeline.