

NF26 - Data warehouses and high-volume storage

Credits: 6
Lectures: 2h/week
Projects: 2h/week
Prof. Pierre Morizet-Mahoudeaux and Jean-Benoist Leger
2019

Content summary

This course presents the principles of developing and using tools for data warehouse design and decision making, with specific tools (Business Objects, regression, segmentation). NF26 covers the principles of data warehouse construction, with an introduction to NoSQL.

The first part of the class focuses on the major concepts related to data warehouses. Students learn how to analyze the needs and requests of "clients" in order to design an architecture that answers those needs in an optimal, simple and ready-to-use way: the data marts. This part emphasizes the differences between data warehouse models, along with their pros and cons.

To design a data warehouse, students learn to separate business process data into facts, which hold measurable, quantitative data about a business, and dimensions, which are descriptive attributes related to the fact data. Examples of fact data include sales price, sales quantity, and time, distance, speed, and weight measurements. Related dimension attributes include product models, product colors, product sizes, geographic locations, and salesperson names. The main theory of the first part covers the following keywords: dimension tables - fact tables - hierarchy - normalization - denormalization - star schema - snowflake schema.
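The split between facts and dimensions can be illustrated with a minimal star schema. The sketch below is not part of the course material: it uses Python's built-in `sqlite3` module rather than the course tools, and every table name, column name and data row (`fact_sales`, `dim_product`, `revenue_by_color`, and so on) is a hypothetical choice made for illustration only.

```python
import sqlite3

# In-memory database so the example leaves nothing on disk.
conn = sqlite3.connect(":memory:")

# One fact table surrounded by dimension tables: a star schema.
# All table and column names are hypothetical.
conn.executescript("""
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    model TEXT, color TEXT, size TEXT
);
CREATE TABLE dim_location (
    location_id INTEGER PRIMARY KEY,
    city TEXT
);
CREATE TABLE dim_salesperson (
    salesperson_id INTEGER PRIMARY KEY,
    name TEXT
);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    salesperson_id INTEGER REFERENCES dim_salesperson(salesperson_id),
    sales_price REAL,
    sales_quantity INTEGER
);
""")

# Made-up sample rows.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)",
                 [(1, "A100", "red", "M"), (2, "B200", "blue", "L")])
conn.executemany("INSERT INTO dim_location VALUES (?, ?)",
                 [(1, "Paris"), (2, "Lyon")])
conn.executemany("INSERT INTO dim_salesperson VALUES (?, ?)",
                 [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 10.0, 2), (1, 2, 2, 10.0, 1), (2, 1, 1, 25.0, 3)])


def revenue_by_color(connection):
    """Aggregate a measure (price * quantity) along one dimension attribute."""
    query = """
        SELECT p.color, SUM(f.sales_price * f.sales_quantity)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.color
        ORDER BY p.color
    """
    return connection.execute(query).fetchall()


print(revenue_by_color(conn))
```

Analytical queries follow the same pattern: join the fact table to one or more dimensions, group by a descriptive attribute, and aggregate a measure. A snowflake schema would further normalize a dimension such as `dim_product` into sub-tables.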

In the second part of the class, students learn a new paradigm for data warehouse design and implementation: NoSQL. Using Cassandra, with Python as the main programming language, students implement a complete data warehouse following NoSQL and column-based table logic. In this project, students take advantage of the Cassandra data model to present a dataset in a way that suits analytics and data visualisation, and that lets non-technical people retrieve real business value from the raw data set. The main theory of the second part covers the following keywords: Cassandra - CAP theorem - eventual consistency - NoSQL - consistent hashing - nodes - ACID and BASE - analytics - k-means.
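Among the keywords above, consistent hashing lends itself to a short illustration. The following is a minimal sketch under stated assumptions, not Cassandra's implementation: the `HashRing` class, the node names, the use of MD5 and the number of virtual nodes are all hypothetical choices, and Cassandra's own partitioner works differently in its details.

```python
import bisect
import hashlib


def _token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is an illustrative choice)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring with virtual nodes."""

    def __init__(self, nodes, vnodes=64):
        self._ring = []  # sorted list of (token, node_name)
        for node in nodes:
            self.add_node(node, vnodes)

    def add_node(self, node, vnodes=64):
        # Each physical node owns several points on the ring.
        for i in range(vnodes):
            bisect.insort(self._ring, (_token(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [(t, n) for t, n in self._ring if n != node]

    def node_for(self, key):
        """Return the node owning the first token at or after the key's token."""
        if not self._ring:
            raise ValueError("ring is empty")
        idx = bisect.bisect_left(self._ring, (_token(key), ""))
        if idx == len(self._ring):  # wrap around the ring
            idx = 0
        return self._ring[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"row-{i}" for i in range(200)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The property to observe is that adding a node only relocates keys to the new node; every other key keeps its previous owner, which is what makes this scheme attractive for growing a cluster.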

Project descriptions

● Part 1

Technology: Pentaho, BIRT

● Part 2 and project

Technology: Python, Cassandra, Spark


Contributors

theodorebourgeon
