An internal reporting tool that will use information from the database to discover what kind of articles the site's readers like.
This tool will answer three question.
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
You should Install python2 to run this script
-
Set up a VirtualBox/Vagrant environment
- Install VirtualBox
- Install Vagrant
- Start the virtual machine
From your terminal, inside the vagrant subdirectory, run the command vagrant up. This will cause Vagrant to download the Linux operating system and install it.
When vagrant up is finished running, you will get your shell prompt back. At this point, you can run vagrant ssh to log in to your newly installed Linux VM!
-
[optional step] Install python module psycopg2 (IF haven't installed)
If your pip version supports wheel packages it should be possible to install a binary version of Psycopg including all the dependencies from PyPI. Just run:
$ pip install -U pip # make sure your pip is up-to-date
$ pip install psycopg2
- Download database Next, download the data here. You will need to unzip this file after downloading it. The file inside is called newsdata.sql.
To load the data, use the command psql -d news -f newsdata.sql.
-
Create View in database You shold create three view in database, log_articles, log_daily, log_error.
-
Run the script
python logs_analysis.py
Outputing a table of successful log record for every article and its author
CREATE VIEW log_articles AS
SELECT title, author, count(*) AS views FROM articles
LEFT JOIN log
ON '/article/' || slug = path
WHERE status = '200 OK'
GROUP BY title, author;
Example:
title | author | views |
---|---|---|
Trouble for troubled troublemakers | 2 | 84810 |
There are a lot of bears | 1 | 84504 |
Candidate is jerk, alleges rival | 2 | 338647 |
Bad things gone, say good people | 3 | 170098 |
Bears love berries, alleges bear | 1 | 253801 |
Media obsessed with bears | 1 | 84383 |
Balloon goons doomed | 4 | 84557 |
Goats eat Google's lawn | 1 | 84906 |
Outputing a table of all requests for each day.
CREATE VIEW log_daily AS
SELECT DATE(time), COUNT(*) AS count_total
FROM log
GROUP BY DATE(time);
Example:
date | count_total |
---|---|
2016-07-01 | 38705 |
2016-07-02 | 55200 |
2016-07-03 | 54866 |
2016-07-04 | 54903 |
Outputing a table of error requests for each day
CREATE VIEW log_error AS
SELECT DATE(time), COUNT(*) AS count_error
FROM log
WHERE status LIKE '404%'
GROUP BY DATE(time);
Example:
date | count_error |
---|---|
2016-07-31 | 329 |
2016-07-06 | 420 |
2016-07-17 | 1265 |
2016-07-19 | 433 |