SF-DAT-20

Lecture 1 Summary (Introduction to Data Science Part I)

  • We talked about different roles of Data Scientists
  • T-Shaped Data Scientists
  • Data Science Workflow
  • Continuous, Discrete and Qualitative Data
  • Supervised vs Unsupervised Learning
  • Set up GitHub accounts
  • Set up IPython Notebook
  • Introduced NumPy

Lecture 2 Summary (Introduction to Data Science Part II)

  • Classification vs Clustering and Regression vs Dimensionality Reduction
  • Flexibility vs Interpretability
  • Different types of data (Cross-Sectional, Time-Series, Panel Data)
  • Walkthrough: acquiring and parsing data with Pandas
  • HW 1 assigned - Due date Feb 8th at 6:30PM

Lecture 3 Summary (Basic Statistics - Review Session)

  • Measures of central tendency (Mean, Median, Mode, Quartiles, Percentiles)
  • Measures of Variability (IQR, Standard Deviation, Variance)
  • Skewness Coefficient
  • Kurtosis Coefficient
  • Boxplots
  • Bias vs Variance
  • Central Limit Theorem - Standard Error of the Mean
  • Class/Dummy Variables
  • Walkthrough describing and visualizing data in Pandas
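
The Central Limit Theorem and standard error items above can be illustrated with a small simulation. This is a minimal sketch, not course material: the exponential population, the sample size `n`, and the number of repeated samples are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 50              # observations per sample (assumed)
n_samples = 10_000  # number of repeated samples (assumed)

# A skewed population: exponential with mean 1 and standard deviation 1.
samples = rng.exponential(scale=1.0, size=(n_samples, n))
sample_means = samples.mean(axis=1)

# Spread of the sample means, measured directly from the simulation.
observed_se = sample_means.std(ddof=1)

# Standard error predicted by theory: sigma / sqrt(n), with sigma = 1 here.
theoretical_se = 1.0 / np.sqrt(n)

# In practice only one sample is available, so sigma is replaced by the
# sample standard deviation.
one_sample = samples[0]
estimated_se = one_sample.std(ddof=1) / np.sqrt(n)

print(observed_se, theoretical_se, estimated_se)
```

Even though the population is skewed, the sample means cluster around the population mean with a spread close to sigma / sqrt(n).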

Lecture 4 Summary (Linear Regression Lines - Part I)

  • Linear Regression lines
  • Single Variable and Multi-Variable Regression Lines
  • Capture non-linearity using Linear Regression lines.
  • Interpreting regression coefficients
  • Dealing with dummy variables in regression lines
  • Intro to the sklearn and seaborn libraries
  • HW 2 assigned - Due date Feb 17th 2016 at 6:30PM

Lecture 5 Summary (Linear Regression Lines - Part II)

  • Hypothesis test - test of significance on regression coefficients
  • p-value
  • Capture non-linearity using Linear Regression lines.
  • Different types of errors and R-squared
  • Interaction Effects

Lecture 6 Summary (Model Selection)

  • Bias-Variance Trade-off
  • Validation (Test vs Train set)
  • Cross-Validation
  • Ridge and Lasso Regression
  • (Optional) Backward Selection, Forward Selection, All Subset Selection. (If you want to use these methods you need to use R)

Lecture 7 Summary (Missing Data and Imputation)

  • Types of missing data (MCAR, MAR, NMAR)
  • Single imputation and its limitations
  • Imputation using regression lines and error
  • Hot deck imputation
  • Multiple imputation

Lecture 8 Summary (K-Nearest Neighbors)

  • Classification Problems
  • Misclassification Error
  • KNN algorithm for Classification
  • Cross-Validation for KNN Algorithm
  • Limitations of KNN Algorithm
  • KNN algorithm for Regression

Lecture 9 Summary (Logistic Regression Part I)

  • Intro to Logistic Regression
  • Odds vs Probability
  • Using Logistic Regression to Make predictions
  • Interpreting the coefficients of a Logistic Regression model
  • Strengths and weaknesses of the Logistic Regression model
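
The relationship between odds, probability, and logistic regression coefficients listed above can be sketched in a few lines. This is an illustrative sketch only: the helper names and the `intercept` and `coef` values are hypothetical, not taken from the lecture.

```python
import math

def prob_to_odds(p):
    """Odds in favor of an event with probability p (requires p < 1)."""
    return p / (1.0 - p)

def odds_to_prob(odds):
    """Probability corresponding to the given odds."""
    return odds / (1.0 + odds)

def predict_proba(x, intercept, coef):
    """Single-feature logistic model: the log-odds are linear in x."""
    log_odds = intercept + coef * x
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical fitted coefficients, for illustration only.
intercept, coef = -1.0, 0.5

p_at_2 = predict_proba(2.0, intercept, coef)
p_at_3 = predict_proba(3.0, intercept, coef)

# Raising x by one unit multiplies the odds by exp(coef).
odds_ratio = prob_to_odds(p_at_3) / prob_to_odds(p_at_2)
print(p_at_2, p_at_3, odds_ratio, math.exp(coef))
```

The coefficient is additive on the log-odds scale and multiplicative on the odds scale. Its effect on the probability depends on where on the curve `x` sits.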

Lecture 10 Summary (Logistic Regression Part II)

  • Unbalanced observations and Logistic Regression
  • FP/FN/TP/TN/FPR/TPR
  • The effect of changing the threshold
  • ROC Curves
  • Area Under Curve
  • How to compare classification algorithms
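
The threshold, TPR/FPR, ROC, and AUC items above fit together as in the sketch below. It is a plain-Python illustration on made-up labels and scores, and it assumes both classes are present. In practice a library routine such as scikit-learn's `roc_curve` would normally be used instead.

```python
def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when predicting 1 for scores >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(y_true, scores):
        predicted = 1 if s >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def rates(y_true, scores, threshold):
    """Return (TPR, FPR) at the given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, scores, threshold)
    return tp / (tp + fn), fp / (fp + tn)

def roc_points(y_true, scores):
    """(FPR, TPR) points obtained by lowering the threshold step by step."""
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tpr, fpr = rates(y_true, scores, t)
        points.append((fpr, tpr))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Made-up labels and classifier scores, for illustration only.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

print(rates(y_true, scores, 0.5))  # stricter threshold
print(rates(y_true, scores, 0.3))  # looser threshold
print(auc(roc_points(y_true, scores)))
```

Lowering the threshold can only keep or raise both TPR and FPR, which is why the ROC curve runs from (0, 0) to (1, 1).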

Lecture 11 Summary (Decision Trees Part I)

  • Decision Tree for Regression
  • Greedy Approach
  • Decision Tree for Classification
  • Gini Index and Entropy index
  • Limitations of a simple Decision Tree
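
The Gini index and entropy mentioned above both measure how mixed the class labels in a tree node are. The sketch below uses the standard textbook definitions. The function names and the toy labels are illustrative assumptions.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Fraction of observations in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini(labels):
    """Gini index: 1 - sum(p_k^2). Zero for a pure node."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Entropy in bits: -sum(p_k * log2(p_k)). Zero for a pure node."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

def weighted_impurity(left, right, measure):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return len(left) / n * measure(left) + len(right) / n * measure(right)

# Toy node with two classes, and one candidate split of it.
node = ["a", "a", "b", "b"]
left, right = ["a", "a"], ["b", "b"]

print(gini(node), entropy(node))
print(weighted_impurity(left, right, gini))
```

A greedy tree builder evaluates candidate splits this way and keeps the one with the lowest weighted impurity at each step.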

Lecture 12 Summary (Decision Trees Part II)

  • Bagging
  • Random Forest
  • Boosting
  • Tuning parameters for boosting and Random Forest

Additional Resources

Lecture 13 Summary (Natural Language Processing)

  • Definition of Natural Language Processing
  • NLP applications
  • Basic NLP practice
  • Stop words, bag-of-words, TF-IDF
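
Stop-word removal, bag-of-words counts, and term weighting can be combined as in the sketch below. The stop-word list and the documents are made up, and the weighting shown is one common variant (term frequency times log of inverse document frequency). Libraries such as scikit-learn use smoothed variants that give different numbers.

```python
import math
from collections import Counter

# A tiny illustrative stop-word list, not a standard one.
STOP_WORDS = {"the", "a", "is", "on", "and"}

def tokenize(text):
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def bag_of_words(tokens):
    """Word counts, ignoring word order."""
    return Counter(tokens)

def tf_idf(docs):
    """Per-document term weights: (count / doc length) * log(N / doc freq)."""
    bags = [bag_of_words(tokenize(d)) for d in docs]
    n_docs = len(bags)
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())  # each document counts a term once
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in bag.items()
        })
    return weights

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
weights = tf_idf(docs)
print(weights[0])
```

Terms that appear in few documents, such as "mat", receive a higher weight than terms shared across documents, such as "cat".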

Additional Resources

Lecture 14 Summary (Principal Component Analysis)

  • Principal Component Analysis
  • Computation of PCAs
  • Geometry of PCAs
  • Proportion of Variance Explained
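
The computation of principal components and the proportion of variance explained can be sketched with NumPy's singular value decomposition. This is one standard way to compute PCA, not necessarily the one used in class. The synthetic data and the `pca` helper are assumptions for illustration.

```python
import numpy as np

def pca(X, n_components):
    """PCA via SVD of the centered data matrix.

    Returns the component directions (rows), the projected scores,
    and the proportion of variance explained by every component.
    """
    X_centered = X - X.mean(axis=0)
    _, singular_values, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance = singular_values ** 2 / (X.shape[0] - 1)
    pve = explained_variance / explained_variance.sum()
    components = Vt[:n_components]
    scores = X_centered @ components.T
    return components, scores, pve

# Synthetic data: two strongly correlated columns plus a small noise column.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    t,
    2 * t + 0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])

components, scores, pve = pca(X, n_components=2)
print(pve)
```

Because the first two columns move together, nearly all of the variance lies along a single direction, so the first entry of `pve` dominates.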

Additional Resources

Lecture 15 Summary (Time Series Models)

  • AutoRegressive Models
  • Moving Averages
  • ARMA
  • ARIMA

Additional Resources

Lecture 16 Summary (Databases and SQL)

  • Databases and data warehouse design
  • Introduction to SQL and the Fundamental Growth Query (FGQ)
  • Using the FGQ on product engagement data from a fictional company to compute retention curves
  • Applying convolution to the retention curve to project future active users
  • Building a model to predict the retention likelihood of individual customers
  • Thanks to Michael
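
The idea of convolving a retention curve with new-user counts to project active users can be sketched as follows. The retention values and signup counts are made up, and the sketch assumes that every cohort follows the same curve and that retention is zero beyond the end of the curve. The lecture's actual data and method may differ.

```python
import numpy as np

# Hypothetical retention curve: fraction of a cohort still active
# k periods after signing up (k = 0, 1, 2, 3).
retention = np.array([1.0, 0.5, 0.4, 0.3])

# Hypothetical number of new users signing up in each period.
new_users = np.array([100, 100, 100, 100])

def project_active_users(new_users, retention):
    """Active users per period: sum over cohorts of size * retention at age.

    active[t] = sum_s new_users[s] * retention[t - s], which is a discrete
    convolution truncated to the periods that have signup data.
    """
    full = np.convolve(new_users, retention)
    return full[:len(new_users)]

active = project_active_users(new_users, retention)
print(active)
```

Each period's total is the newest cohort at full strength plus the surviving fractions of every earlier cohort.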

Additional Resources

Lecture 17 Summary (Naive Bayes)
