-
Notifications
You must be signed in to change notification settings - Fork 8
Lesson_3: Intro to Databases
We'll be playing with the live database available here:
http://www.w3schools.com/sql/trysql.asp?filename=trysql_select_all
Let's walk through a few examples:
-
Retrieve all Customers from Madrid
SELECT * FROM Customers WHERE City='Madrid'
-
What is the most common city for customers?
SELECT City, COUNT() FROM Customers GROUP BY City ORDER BY COUNT() DESC
-
What category has the most products?
SELECT CategoryName, COUNT() FROM Categories JOIN Products on (Categories.CategoryID = Products.CategoryID) GROUP BY CategoryName ORDER BY COUNT() DESC
- What customers are from the UK
- What is the name of the customer who has the most orders?
- What supplier has the highest average product price?
- What category has the most orders?
- What employee made the most sales (by number of sales)?
- What employee made the most sales (by value of sales)?
- What Employees have BS degrees? (Hint: Look at LIKE operator)
- What supplier has the highest average product price assuming they have at least 2 products (Hint: Look at the HAVING operator)
Submit these SQL queries as a .sql file to Dropbox, using SQL comments to have the question referring to each:
-- What customers are from the UK?
SELECT * FROM Customers WHERE Country = 'UK'
-- What is the name of the customer who has the most orders?
First, install these new packages:
sudo easy_install pip
pip install psycopg2
In your Dropbox under the student directory, you will find the file sql.py. Copy this to your own directory. This is a special helper package that will make it easy to read and write SQL from Python.
import sql
# Read a database into Pandas
q = 'select * from nytimes'
df = sql.read_db(q, database_url)
# Create a database datable from Pandas
# Use head() to select the first 5 rows so that don't create too many copies of the data
df.head().to_db('nytimes_copy', database_url)
Let's do the same assignment that we did last class, but this time we'll use SQL. Find the click through rate per each age, gender, and signed_in combination (remember that CTR is calculated as clicks/impressions).
- Bonus - Read the Baseball salaries dataset (input/Salaries.csv) into Python and insert it into the database. Use Pandas / SQL to explore the dataset and to find the highest and second highest paid members of each team.