X-Datascience Datacamp

Datacamp class for master's students - 5 days

The aim of this course is to learn data science by doing. It covers every stage of a data science pipeline, from exploratory data analysis (EDA) and feature engineering to parameter optimization and advanced learning algorithms. You will also need to set up your own challenge!

The grade combines your performance on the data challenge offered to the class and on the challenge you set up yourself.

Each day is split evenly between lectures (50%) and work on the competitive challenge using the RAMP website (50%).

The slides used in some of the lectures are available here.

Instructors:

Location

The course takes place in person during the week of Dec 18 to Dec 22.

To join the Discord channel, use this URL.

Some of the teaching materials are available on GitHub at: https://github.com/x-datascience-datacamp

You must have a GitHub account to complete the course.

Setup:

We will be using many Python packages in this course, such as pandas, sklearn, and matplotlib, all of which can be downloaded and installed using a package-management system. We recommend using mamba, but conda works fine if you already have it installed on your computer.

NB: Windows users should closely follow the instructions for installing mamba and conda, since many common problems come from a PATH variable that is not set up properly.
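
Once the environment is installed, a quick way to check it is to ask Python whether the packages can be found. The snippet below is a minimal sketch using only the standard library; the `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of the course material.

```python
from importlib.util import find_spec

# Import names, which can differ from install names:
# scikit-learn is imported as "sklearn".
REQUIRED = ["pandas", "sklearn", "matplotlib"]


def missing_packages(names):
    """Return the import names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If a package is reported missing even though it was installed, the Python interpreter being run is probably not the one from the mamba or conda environment, which is often a PATH issue.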

Day 1: Data wrangling

  • Introduction to the workflow (VSCode, git, GitHub, tests, ...)
  • Advanced course on Pandas
  • GitHub assignments: numpy and pandas

Day 2: ML Pipelines and model evaluation

  • Advanced scikit-learn: Column transformer and pipelines
  • Parallel processing with joblib
  • Generalization and Cross Validation
  • Assignment sklearn
  • Getting started on RAMP & Introduction to the challenges.
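
To give a flavour of the Day 2 topics, here is a minimal sketch that combines a column transformer, a pipeline, and cross-validation in scikit-learn. The toy dataset and its column names (`age`, `income`, `city`) are made up for illustration and are not the course data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns and one categorical column.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(30_000, 5_000, n),
    "city": rng.choice(["Paris", "Lyon", "Nice"], n),
})
y = (X["age"] + rng.normal(0, 5, n) > 40).astype(int)

# Scale the numeric columns and one-hot encode the categorical one.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])
model = make_pipeline(preprocessor, LogisticRegression(max_iter=1000))

# 5-fold cross-validation: the preprocessing is refit on each training fold,
# so no information from the validation fold leaks into it.
# Passing n_jobs=-1 would run the folds in parallel through joblib.
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy: {scores.mean():.3f}")
```

Wrapping the preprocessing and the estimator in a single pipeline is what makes the cross-validated score a fair estimate of generalization.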

Day 3: Metrics and dealing with imbalanced data

  • Presentation of the different ML metrics
  • Pitfalls of metrics with imbalanced data
  • ML approaches to deal with imbalanced data
  • Working on data challenges
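
The metric problem with imbalanced data can be seen on a tiny example. The sketch below uses plain Python and made-up labels (95 negatives, 5 positives) with a classifier that always predicts the majority class. The helper functions are illustrative; scikit-learn provides equivalents such as `accuracy_score` and `balanced_accuracy_score` in `sklearn.metrics`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Average of the per-class recalls."""
    recalls = []
    for label in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Made-up imbalanced labels and a "classifier" that always predicts class 0.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy(y_true, y_pred))           # 0.95
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

Accuracy looks excellent even though no positive sample is ever detected, while balanced accuracy shows the model is no better than chance.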

Day 4: Feature engineering and model inspection

  • Feature engineering and advanced encoding of categorical features
  • Model inspection: Partial dependence plots, Feature importance
  • Working on data challenges
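
As an illustration of model inspection, the following sketch computes permutation feature importance with scikit-learn. The synthetic data is an assumption made for the example: only the first of three features drives the target, so it should come out as the most important.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends on feature 0 only.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0)
forest.fit(X_train, y_train)

# Shuffle each feature in turn on held-out data and measure the score drop.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=5, random_state=0
)
for i, mean in enumerate(result.importances_mean):
    print(f"feature {i}: {mean:.3f}")
```

Computing the importances on held-out data rather than on the training set avoids rewarding features the model has merely memorized.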

Day 5: Ensemble methods and hyperparameter optimization

  • From trees to gradient boosting
  • Profiling with snakeviz
  • Hyperparameter optimization
  • Working on data challenges
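
A minimal sketch of hyperparameter optimization for a gradient boosting model, assuming a small synthetic classification problem and a deliberately tiny grid (both chosen for illustration only):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic problem so that the search runs in a few seconds.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Illustrative grid: 2 x 2 = 4 candidates, each evaluated with 3-fold CV.
param_grid = {"learning_rate": [0.05, 0.2], "max_depth": [2, 3]}
search = GridSearchCV(
    GradientBoostingClassifier(n_estimators=30, random_state=0),
    param_grid,
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated accuracy: {search.best_score_:.3f}")
```

An exhaustive grid grows quickly with the number of hyperparameters; `RandomizedSearchCV` from the same module is a common alternative when the search space is larger.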

Institutional information

This class is taught in the context of the Master Data Science at Institut Polytechnique de Paris.
It receives support from Hi!Paris and DataIA.

Contributors

tommoral, agramfort, mariusaaros, plcrodrigues, gphilippee, bourhano
