- Jupyter notebook that performs descriptive statistics, tested with the nbval plugin
- Python script that computes statistics and generates one data visualization
- Shared code in a library file
- Summary PDF or Markdown file
- Makefile that installs the required packages, formats code with Black, lints with Ruff, and tests the notebook, script, and library
- requirements.txt
- Test files for the library and script
- Successful CI/CD badges for each step of the workflow
The dataset used in this project is Student Performance Factors, a free synthetic dataset from Kaggle. It contains columns describing factors that may affect students' exam performance, such as hours studied, hours slept, class attendance, tutoring sessions, and family income. The full list of columns can be viewed at the link above.
Prepare the necessary configuration files: the Dockerfile, devcontainer.json, Makefile, requirements.txt, and main.yml for GitHub Actions. Make sure requirements.txt lists every required package (for example, matplotlib for visualization and pandas for data manipulation) and pins each one to a specific version.
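One way to enforce the pinning rule automatically is a small check that flags any requirements.txt line that is not of the form `package==version`. The helper below, `unpinned_requirements`, is hypothetical and not part of the project as described. It only recognizes plain `name==version` lines (optionally with extras such as `pkg[extra]==1.0`), so `-r` includes, URLs, and lines with environment markers would be reported as unpinned.

```python
import re

# Hypothetical check: a pinned line looks like "name==version" or "name[extra]==version".
PINNED = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[A-Za-z0-9,._-]+\])?==[^=\s]+$")


def unpinned_requirements(text: str) -> list[str]:
    """Return the requirement lines in `text` that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line:
            continue  # skip blank and comment-only lines
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned
```

Called from a test on the contents of requirements.txt, an assertion that the returned list is empty would make CI fail whenever an unpinned package is added.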
Create a library.py script with the functions shared by the Jupyter notebook and main.py:
- load_data()
- generate_summary_stats()
- grab_max()
- grab_median()
- grab_min()
- generate_study_hours_viz()
- generate_sleep_viz()
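A minimal sketch of library.py follows. The function names come from the list above; everything else is an assumption: the `(df, column)` and `(df, out_path)` signatures, the column names `Hours_Studied`, `Sleep_Hours`, and `Exam_Score`, and the choice of a study-hours scatterplot and a sleep-hours histogram. Check the column names against the header of the downloaded CSV before relying on them.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumed column names; confirm them against the header row of the downloaded CSV.
STUDY_COL = "Hours_Studied"
SLEEP_COL = "Sleep_Hours"
SCORE_COL = "Exam_Score"


def load_data(path) -> pd.DataFrame:
    """Read the Student Performance Factors CSV into a DataFrame."""
    return pd.read_csv(path)


def generate_summary_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, std, min, quartiles, and max for every numeric column."""
    return df.describe()


def grab_max(df: pd.DataFrame, column: str):
    """Largest value in one column."""
    return df[column].max()


def grab_median(df: pd.DataFrame, column: str):
    """Median of one column."""
    return df[column].median()


def grab_min(df: pd.DataFrame, column: str):
    """Smallest value in one column."""
    return df[column].min()


def generate_study_hours_viz(df: pd.DataFrame, out_path) -> None:
    """Scatterplot of hours studied vs. exam score, written to out_path."""
    fig, ax = plt.subplots()
    ax.scatter(df[STUDY_COL], df[SCORE_COL], alpha=0.4)
    ax.set_xlabel("Hours studied")
    ax.set_ylabel("Exam score")
    ax.set_title("Study hours vs. exam score")
    fig.savefig(out_path)  # writing to a file needs no display, so this also works in CI
    plt.close(fig)  # release the figure so repeated calls don't accumulate memory


def generate_sleep_viz(df: pd.DataFrame, out_path) -> None:
    """Histogram of sleep hours, written to out_path."""
    fig, ax = plt.subplots()
    ax.hist(df[SLEEP_COL], bins=10, edgecolor="black")
    ax.set_xlabel("Hours slept")
    ax.set_ylabel("Number of students")
    ax.set_title("Distribution of sleep hours")
    fig.savefig(out_path)
    plt.close(fig)
```

Taking `df` as an argument instead of reading the CSV inside each function lets the notebook, main.py, and the tests all pass in whatever DataFrame they already have.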
Create a main.py script with two functions:
- summarize(): uses the Student Performance CSV and the summary-statistics functions from library.py to compute summary statistics (mean, median, mode, standard deviation, percentiles, max, and min) for each column of the DataFrame.
- create_visualizations(): generates a scatterplot and a histogram of the CSV data using the corresponding functions from library.py.
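`DataFrame.describe()` covers the mean, standard deviation, percentiles, max, and min, but not the mode, so summarize() needs a little extra work. The sketch below computes every listed statistic directly in pandas so it runs on its own; in the project these calls would go through library.py. Taking a DataFrame instead of a CSV path is a hypothetical signature, and restricting to numeric columns is an assumption made because mean and standard deviation are undefined for any text columns the CSV may contain.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per numeric column with the statistics listed for summarize()."""
    # Mean, standard deviation, etc. are undefined for text columns, so keep numbers only.
    numeric = df.select_dtypes(include="number")
    return pd.DataFrame(
        {
            "mean": numeric.mean(),
            "median": numeric.median(),
            # mode() may return several values per column, sorted; row 0 holds the smallest.
            "mode": numeric.mode().iloc[0],
            "std": numeric.std(),  # sample standard deviation (ddof=1), as in describe()
            "25%": numeric.quantile(0.25),
            "75%": numeric.quantile(0.75),
            "max": numeric.max(),
            "min": numeric.min(),
        }
    )
```

For example, `print(summarize(pd.read_csv("StudentPerformanceFactors.csv")))` would print the table, assuming that (hypothetical) filename for the downloaded CSV.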
Create test_main.py and test_lib.py to test main.py and library.py, respectively.
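A test_lib.py written for Pytest might build a small in-memory DataFrame in a fixture so the tests don't depend on the full Kaggle CSV. To keep the sketch self-contained, it defines stand-ins for `load_data`, `grab_max`, and `grab_min` with hypothetical signatures; in the project those definitions would be replaced by an import from library.py.

```python
import pandas as pd
import pytest


# Stand-ins so this file runs on its own; in the project, import these from library.py.
def load_data(path):
    return pd.read_csv(path)


def grab_max(df, column):
    return df[column].max()


def grab_min(df, column):
    return df[column].min()


@pytest.fixture
def sample_df():
    """Tiny hand-made frame; values are illustrative, not taken from the dataset."""
    return pd.DataFrame({"Hours_Studied": [2, 10, 6], "Exam_Score": [61, 88, 74]})


def test_grab_max(sample_df):
    assert grab_max(sample_df, "Exam_Score") == 88


def test_grab_min(sample_df):
    assert grab_min(sample_df, "Hours_Studied") == 2


def test_missing_column_raises(sample_df):
    # Selecting a column that doesn't exist raises KeyError in pandas.
    with pytest.raises(KeyError):
        grab_max(sample_df, "Not_A_Column")


def test_load_data_round_trip(tmp_path, sample_df):
    # tmp_path is Pytest's built-in per-test temporary directory.
    csv_path = tmp_path / "sample.csv"
    sample_df.to_csv(csv_path, index=False)
    pd.testing.assert_frame_equal(load_data(csv_path), sample_df)
```

Running `pytest` picks up these functions by their `test_` prefix and injects `sample_df` and `tmp_path` automatically.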
Create a Jupyter notebook that runs the same code as main.py, so the outputs of the descriptive statistics and data visualizations can be viewed inline.
Using YAML files, set up GitHub Actions workflows so that every push to the repository runs all of the Makefile commands, ensuring that new code is formatted with Black, linted with Ruff, and tested with Pytest.
The outputs of the Jupyter notebook (tested with the nbval plugin) are captured in this PDF file.