
dataeng's Introduction

Data Engineering:

Repository for the Data Engineering Course (LTAT.02.007)

Graph View


Teaching Assistants:

Acknowledgments

Special thanks to Emanuele Della Valle and Marco Brambilla from Politecnico di Milano for letting me "steal" some of their great slides.

Lectures

Date | Title | Material | Mandatory Reads | Extras
01/09 | Course Intro | Slides - pdf (slides 45-109) | |
03/09 | Data Modeling | Slides - pdf (slides 1-44) | Chp 4 p111-127, Chp 5 p151-156, Chp 6 p199-205 of [3] |
10/09 | DM for Relational Databases | Slides - pdf (slides 45-109) | Chp 2, 6, and 7 (Normal Forms) of [1] | Relational Model
10/09 | DM for Data Warehouse | Slides - pdf (slides 109-118), pdf, video | Chp 2 of [2] |
17/09 | DM for Big Data | Slides - pdf | Chp 2 of [3], video | paper
17/09 | Key Value Stores | Slides 1, Slides 2, pdf | | nosql
24/09 | Column Oriented Databases | Slides 1, Slides 2, pdf | | nosql
24/09 | Document Databases | Slides 1, Slides 2, pdf | | nosql
01/10 | Graph Databases | Slides 1, Slides 2, pdf1, pdf2 | Chp 3 and 5 of [5] | book
08/10 | Data Ingestion | Slides 1, Slides 2, Slides 3, Slides 4 | |
15/10 | Part 1 Recap | Slides 1, pdf | |
22/10 | Midterm | | |
29/10 | Data Engineering Pipelines (Part 1) | Slides 1, Slides 2, pdf | |
05/11 | Data Engineering Pipelines (Part 2) | Slides 1, Slides 2, Slides 3 | Chp 10 of [3] | R. Chang Pt 2, R. Chang Pt 3
12/11 | Streaming Data (Part 1) | Slides 1, Slides 2 | Chp 11 of [3] | Streaming 101, Streaming 102
19/11 | Data Journey | Slides | |
26/11 | Streaming Data (Part 2) | Slides 1, Slides 2 | |
03/12 | Data Wrangling (Part 1) | pdf | |
10/12 | Data Wrangling (Part 2) | pdf | |

Practices (videos will be available after the Group 2 session)

Date | Title | Material | Reads | Videos | Branch | Notes
07-08/09 | Docker | Slides | - | Video GP1, Video GP2 | Lab Branch | QA GP2 only
14-15/09 | Modeling and Querying Relational Data with Postgres | Slides | Chp 32 of [1]§ | Video | Homework 1 |
21-22/09 | Modeling and Querying Key Value Data with Redis | Slides | | Video | Homework 2 |
28-29/09 | Modeling and Querying Document Data with MongoDB | Slides | | Video | Homework 3 |
05-06/10 | Modeling and Querying Graph Data with Neo4J | Slides | Cypher Manual | Video | Homework 4 |
19-20-26-27/10 | Data Ingestion with Apache Kafka | Slides | | Video 1, Video 2, Video 3, Video 4 | Homework 5 |
10-11/11 | Apache Airflow Data Pipelines | Slides | | Video 1, Video 2 | Homework 6 |
16-17/11 | Stream Processing with Kafka Streams | Slides | | Video 1, Video 2 | Homework 7 |
23-24/11 | Stream Processing with KSQL | Slides | | Video 1, Video 2 | Homework 7 |
07-08/12 | Data Cleansing | Slides | | Video 1, Video 2 | Homework 8 |
14-15/12 | Data Augmentation | Slides | | Video 1, Video 2 | Homework 8 |

Extras

Contributing

  • Modeling and Querying RDF data: SPARQL
  • Domain Driven Design: a summary
  • Event Sourcing: a summary
  • Data Pipelines with Luigi
  • Data Pipelines with Apache NiFi
  • Data Processing with Apache Flink

Syllabus

  • What is (Big) Data?
  • The Role of Data Engineer
  • Data Modeling
    • Data Replication
    • Data Partitioning
    • Transactions
  • Relational Data
  • NoSQL
    • Document
    • Graph
  • Data Warehousing
    • Star and Snowflake schemas
  • Data Vault
  • (Big) Data Pipelines
    • Big Data Systems Architectures
    • ETL and Data Pipelines
      • Best Practices and Anti-Patterns
    • Batch vs Streaming Processing
  • Data Cleansing
  • Data Augmentation

Books

dataeng's People

Contributors

riccardotommasini, mohamedragabanas, toxp, ktark, maerthaekkinen, kpokk
