CSCI145 / MATH166: Data Mining

Important links:

  1. Data Mining vs Machine Learning vs Artificial Intelligence vs Statistics
  2. What do data scientists get paid?

About the Instructor

Name: Mike Izbicki (call me Mike)
Email: [email protected]
Office: Adams 216
Office Hours: MW 3:45-5:00 or by appointment (see my schedule)
Webpage: https://izbicki.me
Research: Machine Learning (see izbicki.me/research.html for some past projects)
Fun Facts:

  1. grew up in San Clemente, CA (1 hr south of Claremont)
  2. 7 years in the Navy, worked on nuclear submarines and at NSA
  3. left the Navy as a conscientious objector
  4. PhD/postdoc at UC Riverside
  5. taught in DPRK

About the Course

General Information:

  1. This is the theory course for CMC's Data Science major
  2. Combines linear algebra, statistics, and computation
  3. Prepares you for industry or graduate school

Learning Objectives:

  1. Exposure to research-level data mining
    1. Understand the latest algorithms
    2. Algorithms go out of date quickly, so data mining practitioners must be able to read the math behind new ones
  2. Major algorithms
    1. Eigen-methods for data mining
    2. Logistic regression
  3. Major concepts
    1. Bias/variance trade-off
    2. Regularization
  4. Major Theorems
    1. The VC Dimension theorem
    2. The SGD convergence theorem
    3. (maybe) The Johnson-Lindenstrauss Lemma
    4. (probably not) The Cramer-Rao bound and Fisher information
  5. Feature generation methods
    1. Text (English, non-English)
    2. Social media
    3. Kernels
  6. Ethical implications of data mining
  7. Apply data mining libraries (PyTorch, scikit-learn, GenSim, spaCy, etc.)
    1. Teaching you how to use these libraries is NOT the primary goal of the course
    2. Approximately 1/3 of the homeworks are programming-related, but these assignments are designed to help you understand the math

Prerequisite knowledge:

  1. linear algebra
    1. eigenvectors
  2. statistics
    1. linear/logistic regression
    2. (no class listed as a prereq in the catalog because there are more than 20 stats classes offered)
  3. computation
    1. big-O analysis
    2. git
    3. using Python libraries
    4. generating plots

Textbook:

All resources are freely available online

  1. Understanding Machine Learning: From Theory to Algorithms (freely available here)
  2. lots of research papers (5-10)

Grades:

Homework: 80%
Project: 20%
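
As a rough illustration of how the two categories combine, the sketch below computes a weighted course percentage from the weights listed above. The helper name `course_percentage` and the assumption that each category is scored from 0 to 100 are illustrative only; the syllabus states the weights but not a formula.

```python
# Category weights from the grade breakdown above (percent of the final grade).
WEIGHTS = {"homework": 80, "project": 20}


def course_percentage(scores):
    """Combine per-category scores (each assumed to be 0-100) into one percentage.

    Hypothetical helper: a plain weighted average, which is an assumption
    about how the categories are combined.
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


print(course_percentage({"homework": 90, "project": 70}))  # 86.0
```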

This will be a hard class, but a low-stress class.

  1. The material is intrinsically hard

    1. Very few people find linear algebra, statistics and computing to ALL be easy subjects
    2. There's a reason people who understand this material get paid $200k+ salaries at FAANG
  2. The course is low-stress because you have full control over what your grade will be:

    1. You will grade all homeworks yourself

      1. I will spot check your homeworks
      2. If you want detailed feedback, ask and I will provide it
      3. You should know when a proof/coding assignment is right/wrong
    2. The project:

      1. To get an A, you must somehow advance the state of human knowledge
      2. You may work individually or in a small team
      3. Options:
        1. Write an analysis of 2-3 research papers
        2. Perform an interesting experiment
      4. Publish your writeup online
        1. Your grade is determined based on how many people read/share your writeup
        2. This will be part of your "portfolio"
        3. No one cares about your grades

Late Work Policy:

You lose 20% on the assignment for each week it is late.
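
One possible reading of this policy is sketched below: each started week of lateness costs 20 percentage points, so an assignment is worth nothing after five weeks. Two details are assumptions rather than stated policy: that a partial week counts as a full week, and that the deduction is in percentage points rather than a fraction of the earned score. The function name `late_score` is hypothetical.

```python
import math

PENALTY_PER_WEEK = 20  # percentage points lost per week late (from the policy above)


def late_score(raw_score, days_late):
    """Return the score after the late penalty.

    raw_score: percentage earned before any penalty (0-100).
    days_late: days past the deadline; 0 or less means on time.
    """
    # Assumption: a partial week is rounded up to a full week.
    weeks_late = math.ceil(max(days_late, 0) / 7)
    penalty = PENALTY_PER_WEEK * weeks_late
    # Assumption: the score cannot drop below zero.
    return max(raw_score - penalty, 0)


print(late_score(95, 0))   # 95
print(late_score(95, 3))   # 75
print(late_score(95, 10))  # 55
```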

Collaboration Policy:

There are no restrictions on collaboration in this class, and collaboration is highly encouraged.

WARNING: All material in this class is cumulative. If you work "too closely" with another student on an assignment, you won't understand how to complete subsequent assignments, and you will quickly fall behind. You should view collaboration as a way to improve your understanding, not as a way to do less work.

You are ultimately responsible for ensuring you learn the material!

Schedule

Week Date Topic
1 Mon, Aug 24 Course intro
1 Wed, Aug 26 Computational Linear Algebra
2 Mon, Aug 31 PageRank
2 Wed, Sep 2 PageRank
3 Mon, Sep 7 Statistical Learning Theory
3 Wed, Sep 9 Statistical Learning Theory
4 Mon, Sep 14 Statistical Learning Theory
4 Wed, Sep 16 Statistical Learning Theory
5 Mon, Sep 21 Logistic Regression
5 Wed, Sep 23 Logistic Regression
6 Mon, Sep 28 Kernels / neural networks / k-nearest neighbor / decision trees
6 Wed, Sep 30 Kernels / neural networks / k-nearest neighbor / decision trees
7 Mon, Oct 5 Stochastic gradient descent
7 Wed, Oct 7 Stochastic gradient descent
8 Mon, Oct 12 Regularization
8 Wed, Oct 14 Regularization
9 Mon, Oct 19 Hashing trick / random projections
9 Wed, Oct 21 Hashing trick / random projections
10 Mon, Oct 26 Word2Vec
10 Wed, Oct 28 Word2Vec
11 Mon, Nov 2 Word2Vec: FastText
11 Wed, Nov 4 Word2Vec: translation
12 Mon, Nov 9 Word2Vec: bias
12 Wed, Nov 11 Word2Vec: history
13 Mon, Nov 16 Other Applications
13 Wed, Nov 18 Other Applications
14 Mon, Nov 23 Other Applications

Accommodations for Disabilities

I've tried to design the course to be as accessible as possible for people with disabilities. If you need any further accommodations, please ask.

I want you to succeed and I'll make every effort to ensure that you can.
