Academic Success Prediction

Authors: Jenson Chang, Jingyuan Wang, Catherine Meng, Siddarth Subrahmanian

Demo of a data analysis project for DSCI 522 (Data Science Workflows), a course in the Master of Data Science program at the University of British Columbia.

About

Here we attempt to build a classification model using the k-Nearest Neighbors algorithm to predict student dropout and academic success based on information available at enrollment (including academic path, demographics, and socio-economic factors). Our final classifier achieved a cross-validation score of 0.72 on the training data and a similar score on unseen test data. Although the model's accuracy is moderate, its performance is consistent between training and test data. Because the data was collected from a single institution, a larger dataset may be necessary to generalize predictions to other institutions or countries. We believe this model can be a starting point for institutions to identify and support students at risk of dropping out. The model can be developed further by combining academic data with social/economic data to improve predictions and give stakeholders a more comprehensive view of the potential causes of student dropout. We recommend this improvement because it would enable institutions to direct their limited resources toward maximum student support.
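To make the k-Nearest Neighbors and cross-validation ideas above concrete, here is a minimal, dependency-free sketch. It is not the project's code: the analysis scripts rely on scikit-learn (see Reference), whereas this swaps in a hand-written classifier for illustration. The function names (knn_predict, cross_val_accuracy, make_toy_data), the choice of k = 5, the five folds, and the two-class synthetic data are all assumptions made for this example, and the score it prints is unrelated to the 0.72 reported above.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, query, k=5):
    """Return the majority label among the k training points closest to query."""
    # Rank training points by Euclidean distance to the query and keep the k nearest.
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )[:k]
    # Majority vote over the neighbours' labels.
    return Counter(label for _, label in neighbours).most_common(1)[0][0]


def cross_val_accuracy(X, y, k=5, n_folds=5, seed=0):
    """Mean accuracy over n_folds folds built from a shuffled index list."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        # Train on every point outside the held-out fold.
        train_idx = [i for i in indices if i not in held]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct = sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=60, seed=42):
    """Synthetic two-feature blobs with placeholder labels (NOT the student data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in (("Dropout", (0.0, 0.0)), ("Graduate", (3.0, 3.0))):
        for _ in range(n_per_class):
            X.append((rng.gauss(centre[0], 1.0), rng.gauss(centre[1], 1.0)))
            y.append(label)
    return X, y


X, y = make_toy_data()
score = cross_val_accuracy(X, y, k=5, n_folds=5)
print(f"Mean cross-validation accuracy: {score:.2f}")
```

Comparing the mean cross-validation score with the score on a separate held-out test set is what supports a statement like "performs consistently": a large gap between the two would suggest overfitting.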

The data set was created by Mónica Vieira Martins, Jorge Machado, Luís Baptista and Valentim Realinho at the Instituto Politécnico de Portalegre (M.V. Martins, D. Tolledo, J. Machado, L.M.T. Baptista, V. Realinho. 2021). It is sourced from UC Irvine's Machine Learning Repository and can be found here. The data contains demographic, enrollment and academic (1st and 2nd semester) information on the students. Each row in the data set represents a student record, and there are 36 columns in total. We use these data to build a model that predicts each student's academic outcome.
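As a rough illustration of the structure described above (one row per student record, 36 columns), the following sketch checks the shape of a CSV before any modelling. It is an assumption-laden example rather than the project's validation code, which uses Pandera (see Reference): the helper check_shape, the comma delimiter, and the placeholder column names col_1 to col_36 are hypothetical, and the real file's delimiter and column names may differ.

```python
import csv
import io

EXPECTED_COLUMNS = 36  # column count stated in this README


def check_shape(csv_text, delimiter=",", expected_columns=EXPECTED_COLUMNS):
    """Return the number of student records; raise ValueError on an unexpected width."""
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader)
    if len(header) != expected_columns:
        raise ValueError(f"expected {expected_columns} columns, found {len(header)}")
    n_rows = 0
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} fields, found {len(row)}"
            )
        n_rows += 1
    return n_rows


# Tiny hypothetical sample: 36 placeholder columns and 2 student records.
header = ",".join(f"col_{i}" for i in range(1, 37))
rows = [",".join("0" for _ in range(36)) for _ in range(2)]
sample = "\n".join([header, *rows])
print(check_shape(sample))  # prints 2
```

A check like this fails fast when a download is truncated or parsed with the wrong delimiter, before those problems can silently affect the model.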

Report

The final report can be found here.

Dependencies

Docker

Usage

Run Jupyter Notebook

Clone this GitHub repository
Navigate to the root of the project and run the following command from the command line:

docker compose up

This container will run Jupyter Notebook using the default port of 8888. Make sure no other applications are using this port.
Navigate to the root of this project on your computer using the command line and enter the following command to reset the project to a clean state (i.e., remove all files generated by previous runs of the analysis):

make clean

To run the analysis in its entirety, enter the following command in the terminal in the project root:

make all

Clean Up

Press Ctrl + C in the terminal to shut down the Jupyter Notebook.
Use the following command to remove the container.

docker compose rm

Folder Structure

data: Contains both raw and processed data
img: Contains images used in the README
report: Contains .html and .pdf versions of the final report, as well as the .qmd file used to generate the report
results: Contains figures and models exported by the analysis scripts in scripts
scripts: Contains Python scripts used to perform data processing, analysis and model training
src: Contains source code for functions used by analysis scripts in scripts
test: Contains unit tests for functions in src

Adding a new dependency

Add the dependency to the environment.yml file on a new branch.
Run conda-lock -k explicit --file environment.yml -p linux-64 to update the conda-linux-64.lock file.
Re-build the Docker image locally to ensure it builds and runs properly.
Push the changes to GitHub. A new Docker image will be built and pushed to Docker Hub automatically. It will be tagged with the SHA for the commit that changed the file.
Update the docker-compose.yml file on your branch to use the new container image (make sure to update the tag specifically).
Send a pull request to merge the changes into the main branch.

License

The Academic Success Prediction report contained herein is licensed under the Creative Commons Attribution 2.5 Canada License (CC BY 2.5 CA). See the license file for more information. If re-using or re-mixing, please provide attribution and a link to this webpage. The software code contained within this repository is licensed under the MIT license. See the license file for more information.

Reference

Bantilan, Niels. 2020. “Pandera: Statistical Data Validation of Pandas Dataframes.” In SciPy, 116–24.
Kramer, Oliver, and Oliver Kramer. 2016. “Scikit-Learn.” Machine Learning for Evolution Strategies, 45–53.
McKinney, Wes et al. 2011. “Pandas: A Foundational Python Library for Data Analysis and Statistics.” Python for High Performance and Scientific Computing 14 (9): 1–9.
Python, Why. 2021. “Python.” Python Releases for Windows 24.
Realinho, Valentim, Jorge Machado, Luís Baptista, and Mónica V Martins. 2022. “Predicting Student Dropout and Academic Success.” Data 7 (11): 146.
VanderPlas, Jacob, Brian Granger, Jeffrey Heer, Dominik Moritz, Kanit Wongsuphasawat, Arvind Satyanarayan, Eitan Lees, Ilia Timofeev, Ben Welsh, and Scott Sievert. 2018. “Altair: Interactive Statistical Visualizations for Python.” Journal of Open Source Software 3 (32): 1057.
