Skip to content

Latest commit

 

History

History
99 lines (78 loc) · 3.42 KB

D.Data Cleaning and Transformation.md

File metadata and controls

99 lines (78 loc) · 3.42 KB

🍕 Case Study #2 - Pizza Runner

🧼 Data Cleaning & Transformation

🔨 Table: customer_orders

Looking at the customer_orders table below, we can see that there are

  • In the exclusions column, there are missing/ blank spaces ' ' and null values.
  • In the extras column, there are missing/ blank spaces ' ' and null values.

image

Our course of action to clean the table:

  • Create a temporary table with all the columns
  • Remove null values in exlusions and extras columns and replace with blank space ' '.
CREATE TEMP TABLE customer_orders_temp AS
SELECT 
  order_id, 
  customer_id, 
  pizza_id, 
  CASE
	  WHEN exclusions IS null OR exclusions LIKE 'null' THEN ' '
	  ELSE exclusions
	  END AS exclusions,
  CASE
	  WHEN extras IS NULL or extras LIKE 'null' THEN ' '
	  ELSE extras
	  END AS extras,
	order_time
FROM pizza_runner.customer_orders;

This is how the clean customers_orders_temp table looks like and we will use this table to run all our queries.

image


🔨 Table: runner_orders

Looking at the runner_orders table below, we can see that there are

  • In the exclusions column, there are missing/ blank spaces ' ' and null values.
  • In the extras column, there are missing/ blank spaces ' ' and null values

image

Our course of action to clean the table:

  • In pickup_time column, remove nulls and replace with blank space ' '.
  • In distance column, remove "km" and nulls and replace with blank space ' '.
  • In duration column, remove "minutes", "minute" and nulls and replace with blank space ' '.
  • In cancellation column, remove NULL and null and and replace with blank space ' '.
CREATE TEMP TABLE runner_orders_temp AS
SELECT 
  order_id, 
  runner_id,  
  CASE
	  WHEN pickup_time LIKE 'null' THEN ' '
	  ELSE pickup_time
	  END AS pickup_time,
  CASE
	  WHEN distance LIKE 'null' THEN ' '
	  WHEN distance LIKE '%km' THEN TRIM('km' from distance)
	  ELSE distance 
    END AS distance,
  CASE
	  WHEN duration LIKE 'null' THEN ' '
	  WHEN duration LIKE '%mins' THEN TRIM('mins' from duration)
	  WHEN duration LIKE '%minute' THEN TRIM('minute' from duration)
	  WHEN duration LIKE '%minutes' THEN TRIM('minutes' from duration)
	  ELSE duration
	  END AS duration,
  CASE
	  WHEN cancellation IS NULL or cancellation LIKE 'null' THEN ' '
	  ELSE cancellation
	  END AS cancellation
FROM pizza_runner.runner_orders;

Then, we alter the pickup_time, distance and duration columns to the correct data type.

ALTER TABLE runner_orders_temp
ALTER COLUMN pickup_time DATETIME,
ALTER COLUMN distance FLOAT,
ALTER COLUMN duration INT;

This is how the clean runner_orders_temp table looks like and we will use this table to run all our queries.

image


Click here for solution to A. Pizza Metrics!