Skip to content

Lesson_3: Intro to Databases

jhaski edited this page Feb 26, 2014 · 1 revision

W3 School

We'll be playing with the live database available here:

http://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all

SQL Questions

Let's walk through a few examples:

  1. Retrieve all Customers from Madrid

    SELECT * FROM Customers WHERE City='Madrid'

  2. What is the most common city for customers?

    SELECT City, COUNT() FROM Customers GROUP BY City ORDER BY COUNT() DESC

  3. What category has the most products?

    SELECT CategoryName, COUNT() FROM Categories JOIN Products on (Categories.CategoryID = Products.CategoryID) GROUP BY CategoryName ORDER BY COUNT() DESC

Classwork

  1. What customers are from the UK
  2. What is the name of the customer who has the most orders?
  3. What supplier has the highest average product price?
  4. What category has the most orders?
  5. What employee made the most sales (by number of sales)?
  6. What employee made the most sales (by value of sales)?
  7. What Employees have BS degrees? (Hint: Look at LIKE operator)
  8. What supplier has the highest average product price assuming they have at least 2 products (Hint: Look at the HAVING operator)

Submit these SQL queries as a .sql file to Dropbox, using SQL comments to have the question referring to each:

-- What customers are from the UK?
SELECT * FROM Customers WHERE Country = 'UK'

-- What is the name of the customer who has the most orders?

Reading SQL into Python / Pandas

First, install these new packages:

   sudo easy_install pip
   pip install psycopg2

In your Dropbox under the student directory, you will find the file sql.py. Copy this to your own directory. This is a special helper package that will make it easy to read and write SQL from Python.

  import sql
  
  # Read a database into Pandas
  q = 'select * from nytimes'
  df = sql.read_db(q, database_url)
  
  # Create a database datable from Pandas
  # Use head() to select the first 5 rows so that don't create too many copies of the data
  df.head().to_db('nytimes_copy', database_url)

Let's do the same assignment that we did last class, but this time we'll use SQL. Find the click through rate per each age, gender, and signed_in combination (remember that CTR is calculated as clicks/impressions).

  • Bonus - Read the Baseball salaries dataset (input/Salaries.csv) into Python and insert it into the database. Use Pandas / SQL to explore the dataset and to find the highest and second highest paid members of each team.

Resources

SQL Joins Visualized

Comparison of NoSQL systems