DAT SF 19 Course Repository

Course materials for General Assembly's Data Science course in San Francisco (11/30/15 - 3/2/16).

Instructor: Rob Hall

TAs:

  • Justin Breucop
  • Dave Yerrington

Office Hours

Who | When
Justin | Sundays 3-6pm at GA
Dave | Fridays 6-8pm at GA
Rob | Slack and by appointment

Setup Info

Installation and Setup Checklist

Git and Github Setup

Project Info

Course Project Info

Course Project Examples

Course Schedule

Monday | Wednesday
11/30: Course Overview, Introduction to Data Science | 12/2: Version Control
12/7: Intro to Python | 12/9: Intro to Machine Learning, KNN
12/14: NumPy, Pandas, Viz, Model Evaluation | 12/16: Regression & Regularization (Project Question & Dataset Due)
12/21: No Class (Holiday Break) | 12/23: No Class (Holiday Break)
12/28: No Class (Holiday Break) | 12/30: No Class (Holiday Break)
1/4: Logistic Regression | 1/6: Naive Bayes
1/11: Clustering | 1/13: APIs & Web Scraping
1/18: No Class (MLK Day) | 1/20: Advanced Model Evaluation (Project First Draft Due)
1/25: Decision Trees | 1/27: Ensembling Techniques
2/1: Dimensionality Reduction | 2/3: Support Vector Machines
2/8: Recommender Systems | 2/10: SQL, Databases (Project Second Draft Due, Optional)
2/15: No Class (President's Day) | 2/17: Advanced Topic or Guest Speaker
2/22: Advanced Topic or Guest Speaker | 2/24: Course Review
2/29: Project Presentations & Project Due | 3/2: Project Presentations & Project Due

Syllabus last updated: 12/12/2015


Class 1: Introduction to Data Science

  • Welcome from General Assembly staff
  • Course overview (slides)
  • Introduction to data science (slides)
  • Command line & exercise (code)
  • Exit tickets

Homework:

Resources:


Class 2: Version Control

Homework:

  • If you haven't already, complete the homework exercise listed in the command line introduction. Create a Markdown document that includes your answers and the code you used to arrive at those answers. Add this file to a GitHub repo that you'll use for all of your coursework, and submit a link to your repo using the homework submission form.

Git and Markdown Resources:

Command Line Resources:

  • If you want to go much deeper into the command line, Data Science at the Command Line is a great book. The companion website provides installation instructions for a "data science toolbox" (a virtual machine with many more command line tools), as well as a long reference guide to popular command line tools.

Class 3: Intro to Python

  • Jupyter Notebook overview (slides)
  • Intro to Python (slides)
  • Linear algebra refresher (slides)

Python Resources:


Class 4: Intro to Machine Learning & Classification with KNN

  • Intro to Machine Learning (slides)
  • Lab: KNN classification with Scikit-learn (notebook)

ML Resources:

  • For a more formal, in-depth introduction to machine learning, read section 2.1 (14 pages) of the excellent book An Introduction to Statistical Learning by James, Witten, Hastie, and Tibshirani. (It's a free PDF download!)

KNN Resources:


Class 5: numpy & pandas, Visualization, Model Evaluation

  • Lab: numpy (notebook)
  • Lab: pandas (notebook)
  • Lab: Visualization with Bokeh (notebook)
  • Model Evaluation, incl. Cross Validation (slides)
  • Lab: Cross validation with Python and Scikit-learn (notebook)

Pandas Resources:

  • To learn more Pandas, review this three-part tutorial.
  • Browsing or searching the Pandas API Reference is an excellent way to locate a function even if you don't know its exact name.
  • If you want to go really deep into Pandas (and NumPy), read the book Python for Data Analysis by Wes McKinney, the creator of Pandas. Ping me on Slack for a discount code.
  • Here are examples of different types of joins in Pandas, for when you need to figure out how to merge two DataFrames.
  • Optional: Read the Teaching Assistant Evaluation dataset into Pandas, create the X and y objects (the response variable is "class attribute"), and go through scikit-learn's 4-step modeling process. (There's no need to submit your code unless you have a question or would like feedback!)
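
The optional exercise above can be sketched roughly as follows. This is a minimal sketch, not the course's solution. The tiny DataFrame and its feature column names are made up to stand in for the Teaching Assistant Evaluation dataset (only the response column name "class attribute" comes from the exercise). The "4-step modeling process" is assumed here to mean import, instantiate, fit, predict, with KNN chosen as the estimator.

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier  # Step 1: import the estimator class

# Hypothetical stand-in data; the real exercise reads the dataset from a file.
df = pd.DataFrame({
    "class_size": [10, 12, 11, 40, 42, 45],
    "summer_term": [1, 1, 1, 0, 0, 0],
    "class attribute": [1, 1, 1, 3, 3, 3],
})

# X holds the features, y holds the response variable.
X = df.drop(columns="class attribute")
y = df["class attribute"]

# Step 2: instantiate the estimator with chosen hyperparameters.
knn = KNeighborsClassifier(n_neighbors=3)

# Step 3: fit the model to the data.
knn.fit(X, y)

# Step 4: predict the response for new observations (also made up).
new_obs = pd.DataFrame({"class_size": [11, 43], "summer_term": [1, 0]})
predictions = knn.predict(new_obs)
print(predictions)
```

With the real dataset, only the data-loading step and the column names should need to change; the four modeling steps stay the same.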

Model Evaluation Resources

  • For another explanation of training error versus testing error, the bias-variance tradeoff, and train/test split (also known as the "validation set approach"), watch Hastie and Tibshirani's video on estimating prediction error (12 minutes, starting at 2:34).
  • Caltech's Learning From Data course includes a fantastic video on visualizing bias and variance (15 minutes).
  • Random Test/Train Split is Not Always Enough explains why random train/test split may not be a suitable model evaluation procedure if your data has a significant time element.
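
The ideas in the resources above (training versus testing error, train/test split, and holding out the most recent data when time matters) can be illustrated with a short sketch. The synthetic data, the choice of KNN with K=1, and the 75/25 split are all assumptions made for illustration. The time-ordered split assumes the rows are already sorted from oldest to newest.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic two-feature classification data with some label noise.
rng = np.random.RandomState(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Random train/test split (the "validation set approach").
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1
)

# K=1 memorizes the training points, so training accuracy is perfect here;
# the testing accuracy is the more honest estimate of performance on new data.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
train_acc = knn.score(X_train, y_train)
test_acc = knn.score(X_test, y_test)
print("training accuracy:", train_acc)
print("testing accuracy:", test_acc)

# Time-ordered alternative: if rows are sorted by time, train on the
# earliest 75% and test on the most recent 25% instead of shuffling.
cutoff = int(len(X) * 0.75)
X_past, X_future = X[:cutoff], X[cutoff:]
y_past, y_future = y[:cutoff], y[cutoff:]
```

Comparing the two accuracy figures shows why training error alone is a poor guide to how a model will perform on unseen data.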

Additional Resources:


Class 6: Regression & Regularization

  • Regression: Linear, Multiple, Polynomial (slides)
  • Regularization (slides)

Resources for Continued Learning over the Holiday Break
