nogibjj/ag825_matrix_build

Mini Project: [Matrix Build and Testing] Pandas Descriptive Statistics

Adil Keku Gazder
ag825, adil.gazder@duke.edu
IDS 706: Data Engineering Systems
Duke University, Fall 2024

About the project

This project builds on a previous one and adds matrix testing across several versions of Python (3.7, 3.8, 3.9, 3.11).

The aim of this project was to read a .csv file and generate summary statistics and plots describing the data. The dataset was acquired from Kaggle (Olympic Summer Games - Paris 2024 -> medallists.csv).

Link to the dataset: https://www.kaggle.com/datasets/muhammadehsan02/olympic-summer-games-paris-2024?select=medallists.csv

Repository Structure

The structure of this repository is as follows:

  • .gitignore file
  • .github/workflows directory
    • Defines an automated process that runs the pipeline before publishing
    • Defined in a YAML file
  • Makefile
    • Compilation and maintenance of code
    • Helps manage dependencies
    • Install / Format / Lint / Test
  • Requirements file
    • Text file (.txt) detailing the required packages to be installed for this program to run
  • main.py
    • aboutdata(): Creates descriptive statistics about the data
    • createplots(): Creates two plots describing the dataset
    • createsummary(): Creates a summary of this analysis and writes to a markdown file (summary_report.md)
  • testmain.py
    • Tests that the functions aboutdata() and createplots() in main.py work as expected
  • medallists.csv
    • Source data in .csv format
  • summary_report.md
    • Final output in a markdown file
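
As a rough illustration of how testmain.py could exercise aboutdata(), here is a minimal sketch. The function body, its signature, and the `medal_code` column are assumptions made for the example and are not taken from the repository's code.

```python
import pandas as pd


def aboutdata(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for main.aboutdata(): summary statistics for numeric columns."""
    return df.describe()


def test_aboutdata() -> None:
    # Tiny in-memory frame instead of medallists.csv; the column name is an assumption.
    df = pd.DataFrame({"medal_code": [1, 2, 3, 1]})
    stats = aboutdata(df)
    # describe() returns one row per statistic for numeric columns.
    expected_rows = ["count", "mean", "std", "min", "25%", "50%", "75%", "max"]
    assert list(stats.index) == expected_rows
    assert stats.loc["count", "medal_code"] == 4
    assert stats.loc["max", "medal_code"] == 3


test_aboutdata()
```

Because the test builds its own small DataFrame, it does not depend on the .csv file being present, which keeps it easy to run under each Python version in the matrix.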

Expected Output

The program generates the following on execution:

  • Head of the data (first 5 rows of the dataset)
  • Descriptive statistics about the dataset
    • Count
    • Mean
    • Standard Deviation
    • Minimum value
    • 25th percentile
    • 50th percentile
    • 75th percentile
    • Maximum value
  • Bar graph detailing the number of total medals won per country
  • Line chart detailing the total medals won on each day of the athletics events
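
The outputs listed above can be approximated with a few pandas calls. This is a sketch under stated assumptions: the column names (`medal_date`, `country`, `medal_code`) and the sample rows are made up for illustration and may not match medallists.csv or the code in main.py.

```python
import pandas as pd

# Small in-memory stand-in for medallists.csv; columns and values are assumptions.
df = pd.DataFrame(
    {
        "medal_date": ["2024-08-01", "2024-08-01", "2024-08-02", "2024-08-02", "2024-08-02"],
        "country": ["USA", "Kenya", "USA", "USA", "Jamaica"],
        "medal_code": [1, 2, 3, 1, 2],
    }
)

# Head of the data: first 5 rows.
head = df.head()

# Descriptive statistics: count, mean, std, min, 25%, 50%, 75%, max for numeric columns.
summary = df.describe()

# Data behind the bar graph: number of medals per country.
medals_per_country = df["country"].value_counts()

# Data behind the line chart: number of medals per day.
medals_per_day = df.groupby("medal_date").size()

print(head)
print(summary)
print(medals_per_country)
print(medals_per_day)
```

With matplotlib installed, `medals_per_country.plot(kind="bar")` and `medals_per_day.plot(kind="line")` would draw the two charts from these series.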
