wessware/missing_marks_prediction_analysis

An Analysis & Prediction of Missing Marks

Problem Statement

Cases of data loss and missing marks have risen sharply over the past decade in most public and private institutions in Kenya. As these institutions migrate from manual file systems to hybrid and digital systems, proper and effective data management remains a problem that most of them have yet to address.

Data

Respondents from ten leading institutions of higher learning contributed to this study by providing information on data storage, management and loss at their respective institutions.

Project Scope

The project focuses on the two major information management systems used by most institutions in Kenya: digital systems, where data is stored and transferred digitally, and hybrid systems, where data storage and management involve both manual and digital effort.

The analysis

The study analyses the circumstances under which marks are most likely to go missing in both hybrid and digital systems, based on a set of hypotheses.

Hypotheses

  1. Hybrid systems are more likely to result in data loss compared to digital systems.
  2. Human errors are the main cause of data loss and missing marks in hybrid systems.
  3. Lack of robustness is the most likely cause of data loss & missing marks in digital systems.
  4. Lack of proper expertise in handling Information Management Systems (IMS) is the most likely cause of data loss and missing marks in both hybrid and digital systems.
  5. Outdated information management systems are a probable cause of data loss and missing marks in hybrid and digital systems.
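
As an illustration of how a hypothesis such as the first one could be checked statistically, here is a minimal sketch of a one-sided two-proportion z-test. The README does not say which test the study used, so the choice of test is an assumption. The function name `two_proportion_z_test` and the counts are hypothetical and are not taken from the study's data.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """One-sided z-test of the alternative: rate in group A > rate in group B."""
    rate_a = events_a / n_a
    rate_b = events_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / std_err
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only (NOT the study's data):
# respondents reporting data loss out of all respondents per system type.
hybrid_losses, hybrid_total = 60, 100
digital_losses, digital_total = 40, 100

z, p = two_proportion_z_test(hybrid_losses, hybrid_total,
                             digital_losses, digital_total)
print(f"z = {z:.3f}, one-sided p = {p:.4f}")
```

A small p-value would count as evidence that the loss rate is higher in hybrid systems. With real survey data, the sample sizes and the independence of responses would need checking before relying on this approximation.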

Prediction

From the available data, the study attempts to predict which of the two systems is more likely to lose student data under the circumstances that the data describes.

Modelling

The study compares several machine learning models to find the best-performing predictor: Linear Regression, XGBoost, Support Vector Machine, Random Forest Regressor, Random Forest Classifier and Artificial Neural Network.
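
A comparison of this kind could be sketched with scikit-learn's cross-validation utilities, as below. The README does not describe the study's features, libraries or evaluation procedure, so the synthetic data from `make_classification`, the two candidate models shown and the choice of 5-fold accuracy are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): the study's survey responses
# are not reproduced here.
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)

# Two of the candidate model families named above; the SVM is wrapped in
# a pipeline so features are standardised inside each fold.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Mean 5-fold cross-validated accuracy per candidate.
cv_accuracy = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

best_model = max(cv_accuracy, key=cv_accuracy.get)
for name, score in cv_accuracy.items():
    print(f"{name}: {score:.3f}")
print("best:", best_model)
```

Scoring every candidate on the same folds keeps the comparison fair. The regressors in the list would need a regression metric rather than accuracy.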

Outcomes

Of the models employed in this study, the Random Forest Classifier and the Support Vector Machine performed best and were tuned further to improve accuracy. The study's hypotheses were supported by a statistical analysis of samples from the available data, with one exception: 'Outdated information management systems are a probable cause of data loss and missing marks in Hybrid & digital systems', which the study deemed not necessarily true.
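
The tuning step could look something like the following grid search over a Random Forest Classifier. This is only a sketch: the parameter grid, the synthetic data and the use of `GridSearchCV` are assumptions, since the README does not state how the tuning was done.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in data (assumption), not the study's data.
X, y = make_classification(n_samples=200, n_features=8, n_informative=4,
                           random_state=0)

# Hypothetical grid; a real study would pick ranges suited to its data.
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_params = search.best_params_
best_score = search.best_score_
print("best parameters:", best_params)
print(f"best cross-validated accuracy: {best_score:.3f}")
```

Because the grid is scored by cross-validation on the same data, a separate held-out test set would still be needed for an unbiased estimate of the tuned model's accuracy.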

Further Study

For further study and analysis, this study recommends the following:

  1. Gender demographics - that future studies include the respondents' gender as a feature to ascertain whether missing marks problems are gender-based in any way.
  2. Departmental demographics - that future studies focus on particular departments in the respective institutions under study, as different departments and faculties handle cases of data loss differently.
  3. Staff and non-student respondents - that, to rule out bias in the data, future studies should involve non-student respondents, as this study focused mainly on student respondents.
