dsi-hackathon-gfc

Data Science Venn Diagram

Creator: Alexander Combs, NYC


Today we are going to have a team-based competition. The goal is to create the best-performing model on a hold-out sample of data. Simple, right?

Well, there is a catch.

This will be a constrained optimization. To understand what that means, let's take a look at the Project Management Venn Diagram, below.

The idea is that any project can be good, fast, or cheap, but you only get to pick two of the three. You can have good work done cheaply, but it will take a long time. You can have good work done fast, but it won't be cheap. Or you can have work done fast and on the cheap, but it won't be good.

Today we will apply this concept to data science.

You will be given a dataset, and each team will be randomly assigned one constraint: samples, features, or algorithm.


Team 1 - Sample Constraint

  • Your choice of algorithm
  • Your choice of features
  • Must use the cheap train sample

Team 2 - Features Constraint

  • Your choice of algorithm
  • Limited to a maximum of 20 features
  • Your choice of samples
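One way a team working under the 20-feature cap might choose its features is a simple filter: score each candidate column by the absolute Pearson correlation with the 0/1 income label and keep the top 20. This is a minimal plain-Python sketch, assuming the features are already numeric and stored as lists in a dict; the function names are hypothetical. Correlation ranking is only one of many selection methods (it ignores interactions and redundancy between features), and whether each encoded dummy column counts separately toward the cap is not specified here.

```python
import math

MAX_FEATURES = 20  # Team 2's cap on the number of features


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def top_k_features(columns, target, k=MAX_FEATURES):
    """Return at most k column names, ranked by |correlation| with the 0/1 target.

    columns: hypothetical dict mapping feature name -> list of numeric values.
    target:  list of 0/1 labels (1 = income above $50,000).
    """
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Toy usage: "c" has the strongest (negative) correlation, "b" is constant.
columns = {"a": [1, 2, 3, 4], "b": [1, 1, 1, 1], "c": [4, 3, 2, 1.5]}
labels = [0, 0, 1, 1]
print(top_k_features(columns, labels, k=2))  # ['c', 'a']
```

Taking the absolute value keeps strongly negative predictors, which are just as useful to a model as strongly positive ones.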

Team 3 - Algorithm Constraint

  • Must use a Random Forest
  • Your choice of features
  • Your choice of samples
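Since the submission asks teams to generate the labels for the test set, those labels are presumably not available while modeling, so each team needs its own way to compare candidate models. A common approach is to hold out part of the training rows as a validation split (Team 1 would split within the cheap train sample). The sketch below is plain Python; the 20% split fraction, the seed, accuracy as the metric, and the majority-class baseline are all assumptions for illustration, since this page does not say how the hold-out sample will be scored.

```python
import random


def train_validation_split(rows, labels, validation_fraction=0.2, seed=42):
    """Shuffle row positions once, then hold out a fraction of them for validation."""
    positions = list(range(len(rows)))
    random.Random(seed).shuffle(positions)  # fixed seed so the split is reproducible
    n_val = int(len(positions) * validation_fraction)
    val_pos, train_pos = positions[:n_val], positions[n_val:]

    def take(pos, seq):
        return [seq[i] for i in pos]

    return (take(train_pos, rows), take(train_pos, labels),
            take(val_pos, rows), take(val_pos, labels))


def accuracy(y_true, y_pred):
    """Fraction of 0/1 predictions that match the true labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy usage with a hypothetical majority-class baseline standing in for a model.
rows = list(range(10))
labels = [0, 0, 0, 1, 0, 1, 0, 0, 1, 0]
X_train, y_train, X_val, y_val = train_validation_split(rows, labels)
majority = max(set(y_train), key=y_train.count)
print(accuracy(y_val, [majority] * len(y_val)))
```

A majority-class baseline is worth computing first: any real model should beat it on the validation split before it is worth submitting.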

Deliverables

Your team will have until 2:30pm EST (presentation time) to build the best model possible under its constraint!

  • Modeling, predictions csv, and slide deck done by 2:30pm EST
  • Group presentations (5 minutes, semi-technical audience) with slide deck between 2:30pm and 3pm EST
  • Repo with organized notebooks due by 2:30pm EST

Descriptions of the data can be found here.

Submission


The task is to predict whether a person's income exceeds $50,000 given certain profile information; more specifically, to generate an income-above-$50,000 label for each row in the test set. The submission is simply a csv called group#-group-submission.csv (for example, 1-group-submission.csv) with a single column of predictions (0 or 1). One member from each group will Slack the csv with the group's predictions to Rowan by 2:30pm EST. Only one member from each group should submit the link to your GitHub repository to Google Classroom.
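Writing the submission file takes only a few lines. This is a minimal sketch using Python's standard csv module; the function name and its optional parameters are hypothetical. The instructions only say "a single column of the predictions", so whether a header row is expected is an assumption left to the caller, and the row-count check assumes you know how many rows the test set has.

```python
import csv
import os


def write_submission(group_number, predictions, directory=".", header=None,
                     expected_rows=None):
    """Write one 0/1 prediction per line to '<group_number>-group-submission.csv'.

    header:        optional column name (whether one is expected is an assumption).
    expected_rows: optional test-set row count to check before writing.
    """
    # Reject probabilities or other values instead of silently truncating them.
    bad = [p for p in predictions if p not in (0, 1)]
    if bad:
        raise ValueError(f"every prediction must be 0 or 1, got e.g. {bad[:5]}")
    preds = [int(p) for p in predictions]
    if expected_rows is not None and len(preds) != expected_rows:
        raise ValueError(f"expected {expected_rows} predictions, got {len(preds)}")
    path = os.path.join(directory, f"{group_number}-group-submission.csv")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow([header])
        writer.writerows([p] for p in preds)  # one prediction per row
    return path
```

For example, `write_submission(1, [0, 1, 1, 0])` would create 1-group-submission.csv in the current directory. Checking that every value is exactly 0 or 1 catches the common slip of submitting predicted probabilities instead of class labels.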

Good luck!

Contributors

ksylvia16, pwalesdi, kellyslatery, seanhulseman