Rishika Randev's Pandas Descriptive Script for IDS706 Week 3

☑️ Requirements (Individual Project 1):

Jupyter notebook that performs descriptive statistics and is tested with the nbval plugin
Python script for statistics and generating one data visualization
Shared code in library file
Summary pdf or markdown file
Makefile that installs required packages, formats with Black, lints with Ruff, and tests notebook, script, and library
requirements.txt
Testing files for library and script
Successful CI/CD badges for each step of the workflow

☑️ Video Demo Here!

☑️ The Dataset

The dataset used in this project is Student Performance Factors, a free, synthetic dataset from Kaggle. Its columns describe factors that could affect student performance on exams, such as hours studied, hours slept, class attendance, tutoring sessions, and family income. The full list of columns can be viewed at the link above.

☑️ Steps

Prepare the necessary configuration files: the Dockerfile, devcontainer.json, Makefile, requirements.txt, and main.yml for GitHub Actions integration. Ensure that requirements.txt lists every required package (for example, matplotlib for visualization and pandas for data manipulation) and pins each one to a specific version.
Create a library.py script with the functions shared by the Jupyter notebook and main.py:
- load_data()
- generate_summary_stats()
- grab_max()
- grab_median()
- grab_min()
- generate_study_hours_viz()
- generate_sleep_viz()
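A minimal sketch of what the statistics helpers in library.py might look like. The README gives only the function names, so the signatures below, the column names (Hours_Studied, Sleep_Hours, Exam_Score), and the in-memory sample data are all assumptions made for illustration; the plotting helpers are left out here.

```python
import io

import pandas as pd


def load_data(source):
    """Read a CSV (file path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


def generate_summary_stats(df):
    """Return count, mean, std, min, quartiles, and max for numeric columns."""
    return df.describe()


def grab_max(df, column):
    """Return the largest value in one column."""
    return df[column].max()


def grab_median(df, column):
    """Return the median of one column."""
    return df[column].median()


def grab_min(df, column):
    """Return the smallest value in one column."""
    return df[column].min()


# Tiny stand-in for the real CSV; the column names are assumed, not confirmed.
SAMPLE_CSV = """Hours_Studied,Sleep_Hours,Exam_Score
10,7,65
20,8,70
30,6,75
"""

df = load_data(io.StringIO(SAMPLE_CSV))
stats = generate_summary_stats(df)
print(stats)
print(grab_max(df, "Hours_Studied"), grab_median(df, "Sleep_Hours"), grab_min(df, "Exam_Score"))
```

Passing the column name as an argument keeps each helper reusable from both the notebook and main.py; the real functions may instead be tied to specific columns.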
Create a main.py script with two functions:
- summarize(): uses the Student Performance CSV and the summary statistics functions from library.py to produce summary statistics (mean, median, mode, standard deviation, percentiles, max, and min) for each column of the dataframe.
- create_visualizations(): generates scatterplot and histogram visualizations of the CSV data using the respective functions from library.py.
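The two main.py functions could be sketched roughly as follows. This is an illustration, not the repository's code: the signatures, the column names (Hours_Studied, Exam_Score), the output file names, and the sample data are assumptions, and the plotting is done inline rather than through library.py so that the block runs on its own.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def summarize(df):
    """Build one table of summary statistics for each numeric column."""
    numeric = df.select_dtypes("number")
    summary = numeric.describe()  # count, mean, std, min, 25%, 50%, 75%, max
    summary.loc["median"] = numeric.median()
    # mode() can return several rows on ties; keep the first (smallest) mode
    summary.loc["mode"] = numeric.mode().iloc[0]
    return summary


def create_visualizations(df, out_dir):
    """Save a scatterplot and a histogram as PNG files; return their paths."""
    scatter_path = os.path.join(out_dir, "hours_studied_performance.png")
    fig, ax = plt.subplots()
    ax.scatter(df["Hours_Studied"], df["Exam_Score"])
    ax.set_xlabel("Hours studied")
    ax.set_ylabel("Exam score")
    fig.savefig(scatter_path)
    plt.close(fig)

    hist_path = os.path.join(out_dir, "performance.png")
    fig, ax = plt.subplots()
    ax.hist(df["Exam_Score"], bins=5)
    ax.set_xlabel("Exam score")
    ax.set_ylabel("Number of students")
    fig.savefig(hist_path)
    plt.close(fig)

    return [scatter_path, hist_path]


# Hypothetical sample data standing in for the real CSV.
df = pd.DataFrame({"Hours_Studied": [10, 20, 20, 30], "Exam_Score": [65, 70, 70, 75]})
summary = summarize(df)
print(summary)

with tempfile.TemporaryDirectory() as tmp_dir:
    paths = create_visualizations(df, tmp_dir)
    saved = [os.path.exists(path) for path in paths]
```

Returning the summary table and the saved file paths, rather than only printing or showing them, makes both functions straightforward to check from a test file.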
Create test_main.py and test_lib.py scripts to test main.py and library.py, respectively.
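As a rough idea of what a test file such as test_lib.py might contain, the sketch below checks two helpers against a small, hand-checkable DataFrame. The helper bodies are stand-ins defined inline so the example runs by itself; in the repository they would be imported from the library module, and their signatures and the column names are assumptions.

```python
import pandas as pd


# Stand-ins for the functions under test (assumed signatures); a real test
# file would import them from the shared library module instead.
def grab_max(df, column):
    return df[column].max()


def generate_summary_stats(df):
    return df.describe()


def make_sample_frame():
    """Small, fixed DataFrame so expected values can be worked out by hand."""
    return pd.DataFrame({"Hours_Studied": [5, 10, 15], "Exam_Score": [60, 70, 80]})


def test_grab_max():
    assert grab_max(make_sample_frame(), "Exam_Score") == 80


def test_generate_summary_stats():
    stats = generate_summary_stats(make_sample_frame())
    assert stats.loc["mean", "Hours_Studied"] == 10
    assert stats.loc["count", "Exam_Score"] == 3


# Pytest discovers functions named test_*; calling them directly also works.
test_grab_max()
test_generate_summary_stats()
```

Testing against a tiny fixed frame rather than the full CSV keeps the expected values obvious and the tests independent of the data file.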
Create a Jupyter Notebook with the same code as the main.py script to easily show the outputs of the descriptive statistics and data visualization.
Using YAML files, set up a GitHub Actions workflow so that every time changes are pushed to the repository, all of the Makefile commands run, ensuring that new code is formatted with Black, linted with Ruff, and tested with Pytest.

make install
make format
make lint
make test

Summary File

The outputs of the Jupyter notebook (tested with the nbval plugin) are captured in this pdf file.


nogibjj/Rishika_Randev_Individual_1
