Temporal and Between-Group Variability in College Dropout Prediction

This analysis corresponds to the full paper with the above title at Learning Analytics & Knowledge 2024. The project aims to investigate college dropout prediction from administrative data. The supplementary material is located here.

Prerequisites

To reproduce the preprocessing and the analyses, you must have R installed. Most scripts will automatically install the required packages. You should always set the working directory to the project's main repository. You also need to request access to the data, which we unfortunately cannot share with persons other than research group members of the UCI Measuring Undergraduate Success Trajectories project.

Project structure

Data preprocessing

For the preprocessing, you need the following:

Locate the CSV files on your machine containing the student background, term, and course data for the students.
Create a copy of config.R.example named config.R and change the variables to match your local directory structure. In case of doubt, check read_data.R for the collection of functions that accesses datasets and contains the required filenames.

Execute the entire preprocessing by executing pipeline.R. The file invokes the scripts in the directory preprocessing, which may be executed individually. They may depend on intermediate results from previous scripts.

Descriptive analyses

The directory analyses contains scripts to create various descriptive analyses.

Modeling

You can use the file install_keras_tensorflow.R to set up the required Python environment with reticulate. The directory models contains scripts that model the statistical structure of student dropout. First, you have to run impute_data.R to predict complete observations for all different observation periods. Then, retrain the models by executing the R scripts that correspond to the research questions. The files with the suffix results contain the code to create the results tables based on the model evaluations. The remaining files are auxiliary scripts or initial isolated tests that we left inside the repository for reference.

Miscellaneous

The folder data contains mappings and dataset information for further analyses. The folder documentation contains plots in the paper and additional plots for reference.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Temporal and Between-Group Variability in College Dropout Prediction

Prerequisites

Project structure

Data preprocessing

Descriptive analyses

Modeling

Miscellaneous

About

Releases

Packages

Contributors 3

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 102 Commits
analyses		analyses
data		data
documentation		documentation
models		models
preprocessing		preprocessing
.gitignore		.gitignore
EWS.Rproj		EWS.Rproj
LICENSE		LICENSE
README.md		README.md
Supplementary_Material.pdf		Supplementary_Material.pdf
config.R.example		config.R.example
inspect_student.R		inspect_student.R
install_keras_tensorflow.R		install_keras_tensorflow.R
pipeline.R		pipeline.R
read_data.R		read_data.R

License

dominikglandorf/college-dropout-prediction

Folders and files

Latest commit

History

Repository files navigation

Temporal and Between-Group Variability in College Dropout Prediction

Prerequisites

Project structure

Data preprocessing

Descriptive analyses

Modeling

Miscellaneous

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages