Git Product home page Git Product logo

logs-analysis's Introduction

Udacity Logs Analysis Project

The 3rd projects made as a part of Udacity Full Stack Web Development Nanodegree. Project represents a reporting tool that prints the result of queries in the plain text to console, answering the following 3 questions:

1. What are the most popular three articles of all time?

2. Who are the most popular article authors of all time?

3. On which days did more than 1% of requests lead to errors?

Required software

Vagrant and VirtualBox are needed to be installed before running the program to provide Linux environment. Vagrant can be downloaded from HashiCorp website, and VirtualBox from here. If you face some problems during the installation of Vagrant, you can find answers either at Stack OverFlow or in the GitHub repository of Vagrant itself.

Also, installed Python 2.7 is required, and psycopg2 library, which connects Python code with PostgreSQL DBMS.

How to start

  • After the installation, create your Virtual Machine by running in command line vagrant up
  • Enter your Virtual Machine after the previous command by vagrant ssh
  • Download the data here and unzip the file and put newsdata.sql into your vagrant directory
  • Download the data to your VM running psql -d news -f newsdata.sql
  • Connect to your DB running: psql -d news
  • Run CREATE VIEW statements presented below
  • Run the script python log_analysis.py

Newsdata Database

Database contains three tables: articles, log, authors.

In order to prevent multiple subqueries, views for the 2nd and 3rd questions were used. View for the 2nd question: The view to collect the number of views associated to author ID

CREATE VIEW author_views AS
SELECT a.author,
       COUNT(l.path) AS views
FROM articles AS a
LEFT OUTER JOIN log AS l ON '/article/'||a.slug = l.path
GROUP BY a.author
ORDER BY views DESC;

View for the 3rd question Extracts the date from timestamp and calculates the number of errors and total number of requests

CREATE VIEW error_analysis AS
SELECT log.time::TIMESTAMP::date AS date,
       COUNT(CASE
                 WHEN log.status = '404 NOT FOUND' THEN 'error'
             END) AS errors,
       COUNT(*) AS requests
FROM log
GROUP BY date;

Results of analysis

Answers for the questions are presented in the answers.txt file.

logs-analysis's People

Watchers

James Cloos avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.