w205-tracking-user-activity

Project 2: Tracking User Activity

Abstract

In this project, a service that delivers assessments for an education tech firm was created. The resulting data is ready for further queries according to the customers' requirements.

Main tasks of this project

  • Publish and consume messages with Kafka.
  • Use Spark to transform the messages.
  • Use Spark to transform the messages so that they can be landed in HDFS.
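Kafka carries individual messages, so a nested JSON file is usually split into one message per assessment record before it is published. The sketch below shows only that preparation step in plain Python, assuming the records have already been parsed into a list of dicts; the function name to_messages and the sample field exam_name are hypothetical, and the actual publishing to the assessment topic would be done with a Kafka producer that is not shown here.

```python
import json
from typing import Iterable, List


def to_messages(records: Iterable[dict]) -> List[str]:
    """Serialize each record to one compact JSON string.

    Each returned string is intended to become one Kafka message
    (assumption: one message per assessment record).
    """
    # separators=(",", ":") drops optional whitespace to keep messages small
    return [json.dumps(record, separators=(",", ":")) for record in records]


if __name__ == "__main__":
    # hypothetical sample records; "exam_name" is an assumed field name
    sample = [
        {"exam_name": "Learning Git"},
        {"exam_name": "Native Web Apps for Android"},
    ]
    for message in to_messages(sample):
        print(message)
```

Keeping each record intact (nested fields included) leaves all flattening to the Spark step, so the Kafka topic holds an unmodified copy of the source data.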

Problem Statement

The original data was downloaded with:

curl -L -o assessment-attempts-20180128-121051-nested.json https://goo.gl/ME6hjp

The data is contained in the file assessment-attempts-20180128-121051-nested.json in the Data folder, which is included in this repository.
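A minimal sketch for loading the downloaded file locally, assuming it holds a single top-level JSON array of assessment records (the "nested" in the file name refers to nested fields inside each record; the top-level shape is an assumption). The helper names parse_assessments and load_assessments are hypothetical; parsing is kept separate from file reading so it can be checked without the data file.

```python
import json
from pathlib import Path
from typing import List


def parse_assessments(raw: str) -> List[dict]:
    """Parse file contents, assuming one top-level JSON array of records."""
    data = json.loads(raw)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of assessment records")
    return data


def load_assessments(path: Path) -> List[dict]:
    """Read and parse the downloaded assessment file."""
    return parse_assessments(path.read_text(encoding="utf-8"))


if __name__ == "__main__":
    # File name and Data folder as described in this README; read only if present.
    data_file = Path("Data") / "assessment-attempts-20180128-121051-nested.json"
    if data_file.exists():
        print(f"{len(load_assessments(data_file))} assessment records")
```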

The following questions are answered in this project:

  • How many assessments are in the dataset?
  • What's the name of your Kafka topic? How did you come up with that name?
  • How many people took Learning Git?
  • What is the least common course taken? And the most common?
  • Which additional queries would help the data science team?
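The counting questions can be sketched in memory with collections.Counter, assuming the parsed records are dicts that use the hypothetical field names exam_name (course title) and keen_id (attempt identifier). Note that this counts attempts per course; "people who took Learning Git" equals that number only if each person made a single attempt, since no user identifier is assumed here.

```python
from collections import Counter
from typing import Dict, List


def answer_questions(records: List[dict], course: str = "Learning Git") -> Dict[str, int]:
    """Answer the counting questions for a list of assessment records.

    Assumptions (hypothetical field names): "exam_name" holds the course
    title and "keen_id" identifies one assessment attempt.
    """
    course_counts = Counter(r.get("exam_name") for r in records)
    return {
        # every record is counted, duplicates included
        "total_records": len(records),
        # records sharing the same keen_id collapse to one assessment
        "distinct_assessments": len({r.get("keen_id") for r in records}),
        # attempts whose course title matches exactly (0 if the course is absent)
        "course_takers": course_counts[course],
    }


if __name__ == "__main__":
    sample = [
        {"exam_name": "Learning Git", "keen_id": "a1"},
        {"exam_name": "Learning Git", "keen_id": "a2"},
        {"exam_name": "Nulls, Three-valued Logic and Missing Information", "keen_id": "a3"},
    ]
    print(answer_questions(sample))
```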

Tools Used

  • Docker Images:

    • cloudera
    • kafka
    • mids
    • spark
    • zookeeper_1
      (the Docker configuration file docker-compose.yml is included in the Code folder)
  • Google Cloud virtual machine

  • Jupyter Notebook


Project Outcome

  • Published and consumed messages with Kafka on the topic assessment

  • Used Spark to transform the messages (Spark temp table: select_assessment; enforced schema: final_schema) so that they can be landed in HDFS

  • Questions answered

    • 3242 distinct assessments in the dataset
    • 390 people took Learning Git
    • The least common courses: Learning to Visualize Data with D3.js; Native Web Apps for Android; Nulls, Three-valued Logic and Missing Information; and The Closed World Assumption
    • The most common course: Learning Git
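Because several courses tie for least common, the query has to keep every course whose count equals the minimum rather than taking a single row. The sketch below uses Python's built-in sqlite3 as a local stand-in for Spark SQL: it creates an in-memory table named select_assessment (mirroring the Spark temp table) with a single, assumed column exam_name. The same SQL shape should carry over to spark.sql(...) against the registered temp table, but that has not been verified here, and the function name extreme_courses is hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Counts per course, then keep every course whose count equals MIN or MAX.
TIE_QUERY = """
    WITH counts AS (
        SELECT exam_name, COUNT(*) AS n
        FROM select_assessment
        GROUP BY exam_name
    )
    SELECT exam_name FROM counts
    WHERE n = (SELECT {agg}(n) FROM counts)
    ORDER BY exam_name
"""


def extreme_courses(rows: List[Tuple[str]]) -> Tuple[List[str], List[str]]:
    """Return (least common, most common) course names, keeping ties.

    Stand-in for Spark SQL: an in-memory SQLite table with the same name
    as the Spark temp table and one assumed column, exam_name.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE select_assessment (exam_name TEXT)")
        conn.executemany("INSERT INTO select_assessment VALUES (?)", rows)
        # agg is one of two fixed literals, so formatting it in is safe
        least = [name for (name,) in conn.execute(TIE_QUERY.format(agg="MIN"))]
        most = [name for (name,) in conn.execute(TIE_QUERY.format(agg="MAX"))]
        return least, most
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [("Learning Git",), ("Learning Git",), ("Native Web Apps for Android",)]
    print(extreme_courses(sample))
```

Comparing against the aggregate in a subquery, instead of ORDER BY ... LIMIT 1, is what keeps all tied courses in the result.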

Links

Project Report
Console history

Contributors

penpen86, haoyuzhang89