Welcome to Data-Centric Deep Learning

Data-Centric Deep Learning (DCDL) is a four-week class taught by Andrew Maas and Mike Wu on the Uplimit platform. This repository contains the open-sourced project material.

Course Description

Build, improve, and repair deep learning applications with a data-centric approach. Data is the key to success in modern machine learning, and this course provides hands-on experience with the impact of data quality, improving models via data, realistic performance evaluation, and human-in-the-loop data improvement methods. Learn best practices for achieving production-quality deep learning results, and how new technologies like pre-trained foundation models can make development faster and simpler. Understand how data-centric principles apply when developing LLM-based applications, agents, and retrieval-augmented generation (RAG) systems.

Instructor's Notes

The course is focused on a practical introduction to deep learning engineering and operations, with an emphasis on algorithmic challenges that practitioners face in the real world. To be "data-centric" means leveraging methods and tools that use data to improve, repair, and test deep learning models.

Students will walk through each step of a deep model's lifecycle: from annotation to training, testing, deployment, and monitoring, then back to annotation. At each step, students will be introduced to new tools as well as the underlying methodology.

This class is an extremely hands-on, project-driven course. Students will work with real data across images, speech audio, and natural language. Students will leverage state-of-the-art methods to achieve high performance, and will also break these models to analyze their shortcomings in practice.

In July '24, we updated this course in light of recent advances in large language models and the new ecosystem of data-centric problems that this class of models presents.

Class layout

This course will have four weekly projects. Each project will build on concepts from the prior week but have its own standalone components.

  • Week 1 will be done entirely in a Colab notebook, so no code in this repository will be used.
  • Weeks 2 through 4 will each have their own folder in course/.
  • In each week's folder, you will find at least one subfolder. Each subfolder is a project component. The weekly course page on Uplimit will guide you through the different subfolders.
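
To get a quick overview of this layout from a local checkout or Codespace, a small script can list each week's project components. The following is a minimal sketch, not part of the course material: the helper name list_project_components is hypothetical, and it assumes only what is stated above, namely that week folders live directly under course/ and that each of their subfolders is one component.

```python
from pathlib import Path


def list_project_components(course_dir):
    """Map each folder directly under `course_dir` to its subfolder names.

    Hypothetical helper: it assumes week folders sit directly under
    `course_dir` and that every subfolder is one project component.
    """
    root = Path(course_dir)
    components = {}
    for week_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        # Plain files are skipped; only subfolders count as components.
        components[week_dir.name] = sorted(
            sub.name for sub in week_dir.iterdir() if sub.is_dir()
        )
    return components


if __name__ == "__main__":
    # Intended to be run from the repository root; does nothing if
    # there is no course/ folder in the current directory.
    if Path("course").is_dir():
        for week, parts in list_project_components("course").items():
            print(week, parts)
```

The function makes no assumption about how the week folders or components are named; it simply reports whatever folders it finds.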

Prerequisites

We expect students to be proficient in Python programming and familiar with deep learning frameworks like PyTorch or TensorFlow. Students should have a basic understanding of machine learning and deep learning concepts. Knowledge of web applications is optional but may be beneficial.

Setup

These projects are best done through GitHub Codespaces.

  1. Fork this repo (leave the box checked for "Copy the main branch only").
  2. Open the fork in GitHub Codespaces. If you click the green "Code" button at the top right of the repository page, you should be able to enter a dev environment for completing the assignments.

Contributors

mhw32, willgannon, blowoffvalve, kevinsbarnard
