This project sets up a PostgreSQL database for a news website. Logs Analysis is a command line tool that will use information from the database to discover what kind of articles the site's readers like.
The provided Python script main.py uses the psycopg2 library to query the database and produce a report that answers the following questions:
- What are the most popular three articles of all time?
- Who are the most popular article authors of all time?
- On which days did more than 1% of requests lead to errors?
- Install VirtulBox
- Install Vagrant
- Download VM File (VM includes Python 2.7.12, PostgreSQL), then to setup VM:
cd /vagrant vagrant up vagrant ssh
- Import news database to PostgreSQL
- Download database script
- Unzip it and run:
psql -d news -f newsdata.sql
to import data to news database
The database contains newspaper articles, as well as the web server log for the site. The log has a database row for each time a reader loaded a web page.
News database consists from three tables:
- articles table:
Table "public.articles"
Column | Type | Modifiers
--------+--------------------------+-------------------------------------------------------
author | integer | not null
title | text | not null
slug | text | not null
lead | text |
body | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('articles_id_seq'::regclass)
Indexes:
"articles_pkey" PRIMARY KEY, btree (id)
"articles_slug_key" UNIQUE CONSTRAINT, btree (slug)
Foreign-key constraints:
"articles_author_fkey" FOREIGN KEY (author) REFERENCES authors(id)
- authors table:
Table "public.authors"
Column | Type | Modifiers
--------+---------+------------------------------------------------------
name | text | not null
bio | text |
id | integer | not null default nextval('authors_id_seq'::regclass)
Indexes:
"authors_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "articles" CONSTRAINT "articles_author_fkey" FOREIGN KEY (author) REFERENCES authors(id)
- log table:
Table "public.log"
Column | Type | Modifiers
--------+--------------------------+--------------------------------------------------
path | text |
ip | inet |
method | text |
status | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('log_id_seq'::regclass)
Indexes:
"log_pkey" PRIMARY KEY, btree (id)
pip install -r requirements.txt
python src/main.py
Use question numbers to get answers or anything else to exit application.
This application can give you an answer to following question:
1: What are the most popular three articles of all time?
2: Who are the most popular article authors of all time?
3: On which days did more than 1% of requests lead to errors?
Put the question number or anything else to exit application.
Enter question number:1
Answer:
"Candidate is jerk, alleges rival" - 338647 views
"Bears love berries, alleges bear" - 253801 views
"Bad things gone, say good people" - 170098 views