In this project, I'm building a internal reporting tool that will use information from a newspaper database to discover what kind of articles the site's readers like.
The reporting tool will be a python script program that uses psycopg2 module to connect to the database and to query the database to answer the following questions:
1.What are the most popular three articles of all time?
2.Who are the most popular article authors of all time?
3.On which days did more than 1% of requests lead to errors?
- _First you need to download python 2 or python 3
- Second, it's adviced that you use a virtual machine. So Please download Vagrant and VirtualBox to manage your Virtual machine.
- Download Udacity's preconfigured vagrant file from here
- To bring up your virtual machine use
vagrant up
andvagrantt ssh
to log in from git bash in udacity's folder directory. - Download Udacity's news database from here .
- Use
cd /vagrant
to access your shared files. - Use this command line to connect to the database and run the SQL statements in the file newsdata.sql
psql -d news -f newsdata.sql
. - Create the Views below.
- Exit psql.
- Execute the python file using the command
python log_analysis.py
.
CREATE VIEW art_authors AS
SELECT title, name
FROM articles, authors
WHERE articles.author = authors.id;
CREATE VIEW art_views AS
SELECT title, count(log.id) as views
FROM articles, log
WHERE log.path = CONCAT('/article/', articles.slug)
GROUP BY articles.title
ORDER BY views desc;
CREATE VIEW logs AS
SELECT to_char(time,'DD-MON-YYYY') as Date, count(*) as log_c
FROM log
GROUP BY Date;
CREATE VIEW err_logs AS
SELECT to_char(time,'DD-MON-YYYY') as Date, count(*) as error_c
FROM log
WHERE STATUS = '404 NOT FOUND'
GROUP BY Date;