The primary goal of this project was to build a data warehouse in Snowflake that can be used for a variety of purposes. The main steps were:
- Upload the raw data to an S3 bucket
- Create a database and schema in Snowflake
- Configure access so that Snowflake can connect to AWS
- Create a table for each of the raw Instacart CSV files
- Write an automated script that copies the raw data from S3 into each table
- Identify the data types and characteristics of the data to determine which fact and dimension tables are needed
- Use Python/pandas to build DataFrames for the fact and dimension tables, create those tables in Snowflake, and define primary and foreign keys to establish the relationships among them
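
The database, schema, access and load steps above can be expressed as a short Python generator of Snowflake SQL. This is a minimal sketch, not the project's actual script: the database, schema, integration and stage names (`INSTACART_DB`, `RAW`, `s3_instacart_int`, `instacart_stage`), the bucket URL and the IAM role ARN are hypothetical placeholders, and the file list assumes the CSV names from the Kaggle download.

```python
"""Generate Snowflake SQL that stages and loads the raw Instacart CSVs.

Hypothetical names: INSTACART_DB, RAW, s3_instacart_int and instacart_stage are
placeholders; the bucket URL and role ARN are supplied by the caller.
"""

# Assumed Kaggle file names; each target table is named after the file stem.
RAW_FILES = (
    "aisles.csv",
    "departments.csv",
    "products.csv",
    "orders.csv",
    "order_products__prior.csv",
    "order_products__train.csv",
)


def setup_statements(bucket_url: str, role_arn: str) -> list[str]:
    """Database, schema, storage integration and external stage."""
    return [
        "CREATE DATABASE IF NOT EXISTS INSTACART_DB",
        "CREATE SCHEMA IF NOT EXISTS INSTACART_DB.RAW",
        # The storage integration lets Snowflake assume an AWS IAM role to read the bucket.
        "CREATE STORAGE INTEGRATION IF NOT EXISTS s3_instacart_int "
        "TYPE = EXTERNAL_STAGE STORAGE_PROVIDER = 'S3' ENABLED = TRUE "
        f"STORAGE_AWS_ROLE_ARN = '{role_arn}' "
        f"STORAGE_ALLOWED_LOCATIONS = ('{bucket_url}')",
        # The external stage points at the bucket through that integration.
        "CREATE STAGE IF NOT EXISTS INSTACART_DB.RAW.instacart_stage "
        f"URL = '{bucket_url}' STORAGE_INTEGRATION = s3_instacart_int",
    ]


def copy_statements(files: tuple[str, ...] = RAW_FILES) -> list[str]:
    """One COPY INTO per raw file; assumes each target table already exists."""
    statements = []
    for name in files:
        table = name.removesuffix(".csv")
        statements.append(
            f"COPY INTO INSTACART_DB.RAW.{table} "
            f"FROM @INSTACART_DB.RAW.instacart_stage/{name} "
            # Skip the header row; product names may contain commas, so allow quoting.
            "FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 "
            "FIELD_OPTIONALLY_ENCLOSED_BY = '\"')"
        )
    return statements
```

Each string can be passed to `cursor.execute()` from the snowflake-connector-python package. Creating a storage integration requires the ACCOUNTADMIN role (or a role with the CREATE INTEGRATION privilege), and the IAM role's trust policy has to be updated with the values reported by `DESC INTEGRATION s3_instacart_int`. The `COPY INTO` statements assume the raw tables have already been created.
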
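
Creating the raw tables and identifying data types can be partly automated by sampling each CSV. The sketch below is one possible approach, not necessarily how this project did it: it reads the header plus the first `sample_rows` rows (a hypothetical parameter) and picks the narrowest of `INTEGER`, `FLOAT` or `VARCHAR` that fits every non-empty value. Because only a sample is read, a column whose later values differ could be mistyped, and deciding which integer columns are keys versus measures still needs human judgment.

```python
import csv
from itertools import islice
from typing import Iterable


def infer_snowflake_type(values: list[str]) -> str:
    """Return the narrowest of INTEGER, FLOAT, VARCHAR that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "VARCHAR"  # nothing to go on; fall back to text
    for cast, sql_type in ((int, "INTEGER"), (float, "FLOAT")):
        try:
            for v in non_empty:
                cast(v)
            return sql_type
        except ValueError:
            continue  # at least one value does not parse; try a wider type
    return "VARCHAR"


def create_table_ddl(table: str, lines: Iterable[str], sample_rows: int = 1000) -> str:
    """Build a CREATE TABLE statement from a CSV header and a sample of its rows."""
    reader = csv.reader(lines)
    header = next(reader)
    rows = list(islice(reader, sample_rows))
    columns = [
        f"    {name} {infer_snowflake_type([row[i] for row in rows])}"
        for i, name in enumerate(header)
    ]
    return f"CREATE TABLE IF NOT EXISTS {table} (\n" + ",\n".join(columns) + "\n)"
```

For example, `with open("orders.csv", newline="") as f: print(create_table_ddl("orders", f))` would type `days_since_prior_order` as `FLOAT`, since its values carry a decimal point and it is empty for a user's first order.
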
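
The primary- and foreign-key relationships from the last step can be written out as DDL. The star schema below (`dim_aisle`, `dim_department`, `dim_product`, `dim_order`, `fact_order_product`) is a hypothetical design based on the columns in the Kaggle files, not necessarily the tables this project created, and the `Table` helper assumes each foreign-key column has the same name as the key it references.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    name: str
    columns: dict[str, str]  # column name -> Snowflake type
    primary_key: tuple[str, ...]
    # column -> referenced table; assumes the referenced key column has the same name
    foreign_keys: dict[str, str] = field(default_factory=dict)


def to_ddl(t: Table) -> str:
    """Render a CREATE TABLE statement with out-of-line PK/FK constraints."""
    lines = [f"    {col} {typ}" for col, typ in t.columns.items()]
    lines.append(f"    PRIMARY KEY ({', '.join(t.primary_key)})")
    for col, ref in t.foreign_keys.items():
        lines.append(f"    FOREIGN KEY ({col}) REFERENCES {ref} ({col})")
    return f"CREATE OR REPLACE TABLE {t.name} (\n" + ",\n".join(lines) + "\n)"


# Hypothetical star schema; dimensions come first so referenced tables exist.
STAR_SCHEMA = [
    Table("dim_aisle", {"aisle_id": "INTEGER", "aisle": "VARCHAR"}, ("aisle_id",)),
    Table("dim_department", {"department_id": "INTEGER", "department": "VARCHAR"}, ("department_id",)),
    Table(
        "dim_product",
        {"product_id": "INTEGER", "product_name": "VARCHAR",
         "aisle_id": "INTEGER", "department_id": "INTEGER"},
        ("product_id",),
        {"aisle_id": "dim_aisle", "department_id": "dim_department"},
    ),
    Table(
        "dim_order",
        {"order_id": "INTEGER", "user_id": "INTEGER", "order_number": "INTEGER",
         "order_dow": "INTEGER", "order_hour_of_day": "INTEGER",
         "days_since_prior_order": "FLOAT"},
        ("order_id",),
    ),
    Table(
        "fact_order_product",
        {"order_id": "INTEGER", "product_id": "INTEGER",
         "add_to_cart_order": "INTEGER", "reordered": "INTEGER"},
        ("order_id", "product_id"),
        {"order_id": "dim_order", "product_id": "dim_product"},
    ),
]
```

On standard tables, Snowflake stores `PRIMARY KEY` and `FOREIGN KEY` constraints as metadata but does not enforce them (only `NOT NULL` is enforced), so duplicate or orphaned keys have to be caught in the pandas transformation or with validation queries. Declaring them still documents the relationships for BI tools and other users of the warehouse.
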

This dataset is a relational set of files describing customers' orders over time. It is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, it includes between 4 and 100 of their orders, with the sequence of products purchased in each order, along with the day of the week and hour of day each order was placed and a relative measure of time between orders.

Source: https://www.kaggle.com/c/instacart-market-basket-analysis
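
The dataset properties described above correspond to specific columns in the Kaggle files, assuming the standard column names: `add_to_cart_order` in the `order_products__*.csv` files gives the sequence of products, and `days_since_prior_order` in `orders.csv` is the relative time measure (empty for a user's first order). The sketch below uses only the standard library to rebuild both: a per-user timeline in days since that user's first order, and each order's products in add-to-cart sequence. The function names are illustrative, not part of the project.

```python
import csv
from collections import defaultdict
from itertools import accumulate
from typing import Iterable


def user_timelines(orders: Iterable[str]) -> dict[int, list[float]]:
    """user_id -> days elapsed since the user's first order, one entry per order.

    days_since_prior_order is empty for a user's first order, so it counts as 0.
    """
    gaps = defaultdict(list)
    for row in csv.DictReader(orders):
        gap = float(row["days_since_prior_order"] or 0)
        gaps[int(row["user_id"])].append((int(row["order_number"]), gap))
    # Sort each user's orders by order_number, then take a running sum of the gaps.
    return {
        uid: list(accumulate(g for _, g in sorted(items)))
        for uid, items in gaps.items()
    }


def product_sequences(order_products: Iterable[str]) -> dict[int, list[int]]:
    """order_id -> product_ids in the order they were added to the cart."""
    by_order = defaultdict(list)
    for row in csv.DictReader(order_products):
        by_order[int(row["order_id"])].append(
            (int(row["add_to_cart_order"]), int(row["product_id"]))
        )
    return {oid: [pid for _, pid in sorted(items)] for oid, items in by_order.items()}
```

For example, after `with open("orders.csv", newline="") as f: timelines = user_timelines(f)`, the expression `all(4 <= len(t) <= 100 for t in timelines.values())` checks the order-count range stated above. The prior order-products file has tens of millions of rows, so this grouping is better done in pandas or in Snowflake SQL once the data is loaded; the pure-Python version only makes the logic explicit.
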