claydoers/Instacart-data-pipeline-analysis-project

Instacart Market Basket Analysis | Snowflake & Tableau

Overview

The primary goal of this project was to build a data warehouse in Snowflake from the Instacart market basket dataset, one that can support a variety of downstream uses such as analysis and dashboarding in Tableau.

Technology

  • Python
  • Snowflake
  • Amazon S3
  • Tableau

Architecture

image

Process

1. Upload the raw data to an S3 bucket.
2. Create a database and schema in Snowflake.
3. Configure access so that Snowflake can read from the S3 bucket.
4. Create a table for each of the raw Instacart CSV files.
5. Write an automated script to copy the raw data from S3 into each table.
6. Examine the data types and characteristics of each column to determine which fact and dimension tables to create.
7. Use Python/Pandas data frames to build the fact and dimension tables in Snowflake, and define primary and foreign keys to relate the tables.
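
Steps 2, 3, and 5 above can be sketched as a small Python script that generates the Snowflake SQL to run. This is a minimal sketch, not the project's actual script: the database, schema, stage, storage integration, and bucket names are hypothetical placeholders, the storage integration is assumed to exist already, and the CSV file names are assumed from the Kaggle download. The table DDL for step 4 is left out because the README does not list column types.

```python
# All names below are hypothetical placeholders; the README does not
# state the database, schema, stage, integration, or bucket names.
DATABASE = "INSTACART_DB"
SCHEMA = "RAW"
STAGE = "INSTACART_STAGE"
STORAGE_INTEGRATION = "S3_INT"  # assumed to have been created for step 3
BUCKET_URL = "s3://example-bucket/instacart/"

# Target table -> raw CSV file (file names assumed from the Kaggle download).
RAW_FILES = {
    "AISLES": "aisles.csv",
    "DEPARTMENTS": "departments.csv",
    "PRODUCTS": "products.csv",
    "ORDERS": "orders.csv",
    "ORDER_PRODUCTS_PRIOR": "order_products__prior.csv",
    "ORDER_PRODUCTS_TRAIN": "order_products__train.csv",
}


def setup_statements():
    """Create the database, the schema, and a stage pointing at the bucket."""
    qualified = f"{DATABASE}.{SCHEMA}"
    return [
        f"CREATE DATABASE IF NOT EXISTS {DATABASE}",
        f"CREATE SCHEMA IF NOT EXISTS {qualified}",
        f"CREATE STAGE IF NOT EXISTS {qualified}.{STAGE} "
        f"URL = '{BUCKET_URL}' STORAGE_INTEGRATION = {STORAGE_INTEGRATION}",
    ]


def copy_statement(table, filename):
    """Build one COPY INTO statement that loads a staged CSV into a table."""
    qualified = f"{DATABASE}.{SCHEMA}"
    return (
        f"COPY INTO {qualified}.{table} "
        f"FROM @{qualified}.{STAGE}/{filename} "
        "FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 "
        "FIELD_OPTIONALLY_ENCLOSED_BY = '\"')"
    )


def load_script():
    """Return every statement in execution order: setup first, then loads."""
    return setup_statements() + [
        copy_statement(table, filename) for table, filename in RAW_FILES.items()
    ]


if __name__ == "__main__":
    for statement in load_script():
        print(statement + ";")
```

The generated statements could be pasted into a Snowflake worksheet or passed one at a time to a cursor's `execute` method from the Snowflake Python connector. Generating plain strings keeps the sketch runnable without credentials.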

Dataset

The dataset is a relational set of files describing customers' orders over time. It is anonymized and contains a sample of over 3 million grocery orders from more than 200,000 Instacart users. For each user, it includes between 4 and 100 orders, with the sequence of products purchased in each order, the day of the week and hour of day each order was placed, and a relative measure of time between orders.

https://www.kaggle.com/c/instacart-market-basket-analysis
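
To make the relational layout concrete, here is a small sketch that rebuilds each user's order history from two of the files using only the standard library. The column names are assumed from the Kaggle files (`orders.csv` and `order_products__prior.csv`), and the rows are made-up illustrative values, not real data. The first order for a user has no prior order, so its gap is left empty.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the assumed shape of orders.csv.
ORDERS_CSV = """order_id,user_id,order_number,order_dow,order_hour_of_day,days_since_prior_order
101,1,1,2,8,
102,1,2,3,7,15.0
"""

# Made-up rows in the assumed shape of order_products__prior.csv.
ORDER_PRODUCTS_CSV = """order_id,product_id,add_to_cart_order,reordered
101,14084,2,0
101,196,1,0
102,196,1,1
"""


def product_sequences(order_products_file):
    """Map each order_id to its product_ids, sorted by add-to-cart position."""
    rows = defaultdict(list)
    for row in csv.DictReader(order_products_file):
        rows[int(row["order_id"])].append(
            (int(row["add_to_cart_order"]), int(row["product_id"]))
        )
    return {
        order_id: [product_id for _, product_id in sorted(items)]
        for order_id, items in rows.items()
    }


def user_history(orders_file, sequences):
    """Map each user_id to their orders, sorted by order_number."""
    history = defaultdict(list)
    for row in csv.DictReader(orders_file):
        gap = row["days_since_prior_order"]
        history[int(row["user_id"])].append({
            "order_number": int(row["order_number"]),
            "order_dow": int(row["order_dow"]),
            "order_hour_of_day": int(row["order_hour_of_day"]),
            # Empty for a user's first order.
            "days_since_prior_order": float(gap) if gap else None,
            "products": sequences.get(int(row["order_id"]), []),
        })
    for orders in history.values():
        orders.sort(key=lambda order: order["order_number"])
    return dict(history)


sequences = product_sequences(io.StringIO(ORDER_PRODUCTS_CSV))
history = user_history(io.StringIO(ORDERS_CSV), sequences)
```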

Data Model

image
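
The diagram is not reproduced in text, so the following is only one plausible shape for a fact and dimension model with primary and foreign keys. It is a sketch under stated assumptions: the table and column names are hypothetical, the rows are made up, and it uses an in-memory SQLite database in place of Snowflake and pandas so that it runs locally without credentials.

```python
import sqlite3

# Hypothetical star schema: three dimensions and one fact table.
DDL = """
CREATE TABLE dim_aisle (aisle_id INTEGER PRIMARY KEY, aisle TEXT);
CREATE TABLE dim_department (department_id INTEGER PRIMARY KEY, department TEXT);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    product_name TEXT,
    aisle_id INTEGER REFERENCES dim_aisle (aisle_id),
    department_id INTEGER REFERENCES dim_department (department_id)
);
CREATE TABLE fact_order_products (
    order_id INTEGER,
    product_id INTEGER REFERENCES dim_product (product_id),
    add_to_cart_order INTEGER,
    reordered INTEGER,
    PRIMARY KEY (order_id, product_id)
);
"""

ITEMS_BY_DEPARTMENT = """
SELECT d.department, COUNT(*) AS items, SUM(f.reordered) AS reorders
FROM fact_order_products AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_department AS d ON d.department_id = p.department_id
GROUP BY d.department
ORDER BY d.department
"""


def build_model():
    """Create the schema in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_aisle VALUES (?, ?)",
                     [(77, "soft drinks"), (91, "soy lactosefree")])
    conn.executemany("INSERT INTO dim_department VALUES (?, ?)",
                     [(7, "beverages"), (16, "dairy eggs")])
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)",
                     [(196, "Soda", 77, 7), (14084, "Almond Milk", 91, 16)])
    conn.executemany("INSERT INTO fact_order_products VALUES (?, ?, ?, ?)",
                     [(101, 196, 1, 0), (101, 14084, 2, 0), (102, 196, 1, 1)])
    return conn


conn = build_model()
summary = conn.execute(ITEMS_BY_DEPARTMENT).fetchall()
```

Keeping the descriptive attributes in dimension tables means the fact table stores only keys and measures, and a dashboard query reduces to joins from the fact table outward, as in `ITEMS_BY_DEPARTMENT`.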

Simple Tableau Dashboard Example

image
