Cases of data loss and missing marks have risen sharply over the past decade in most public and private institutions in Kenya. As these institutions migrate en masse from manual file systems to hybrid and digital systems, proper and effective data management remains a problem that most of them have yet to address.
Respondents from ten leading institutions of higher learning contributed to this study by providing information on data storage, management and loss at their respective institutions.
The project focuses on the two major information management systems used by most institutions in Kenya: digital systems, where data is stored and transferred digitally, and hybrid systems, where data storage and management involve both manual and digital effort.
The study analyses the circumstances under which marks are most likely to go missing in both hybrid and digital systems, based on the following hypotheses:
- Hybrid systems are more likely to result in data loss than digital systems.
- Human error is the main cause of data loss and missing marks in hybrid systems.
- Lack of robustness is the most likely cause of data loss and missing marks in digital systems.
- Lack of proper expertise in handling Information Management Systems (IMS) is the most likely cause of data loss and missing marks in both hybrid and digital systems.
- Outdated information management systems are a probable cause of data loss and missing marks in hybrid and digital systems.
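As an illustration of how a hypothesis such as the first one could be checked, the sketch below runs a one-sided two-proportion z-test using only the Python standard library. The test choice, the function name `two_proportion_z_test` and the counts are assumptions made for this example; the study does not state which statistical test it used, and the numbers are not taken from its data.

```python
import math


def two_proportion_z_test(loss_a, n_a, loss_b, n_b):
    """One-sided z-test for H1: loss rate in group A > loss rate in group B."""
    p_a = loss_a / n_a
    p_b = loss_b / n_b
    # Pooled proportion under the null hypothesis of equal loss rates.
    p_pool = (loss_a + loss_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail p-value from the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Hypothetical counts, NOT from the study: respondents reporting data loss
# out of all respondents using each type of system.
hybrid_loss, hybrid_n = 60, 100
digital_loss, digital_n = 40, 100

z, p_value = two_proportion_z_test(hybrid_loss, hybrid_n, digital_loss, digital_n)
print(f"z = {z:.3f}, one-sided p = {p_value:.4f}")
```

A small p-value would suggest that the higher loss rate in the hybrid group is unlikely to be due to sampling variation alone. The normal approximation is reasonable only when each group has enough respondents with and without data loss.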
Using the available data, the study attempts to predict which of the two systems is more likely to lose student data under the circumstances that the data describes.
The study uses several machine learning models to find the best-performing predictor: Linear Regression, XGBoost, Support Vector Machine, Random Forest Regressor, Random Forest Classifier and Artificial Neural Networks.
Of the models employed, the Random Forest Classifier and the Support Vector Machine performed best and were tuned further to improve accuracy. The study's hypotheses were quantified through a statistical analysis of samples from the available data, with the exception of the hypothesis that outdated information management systems are a probable cause of data loss and missing marks in hybrid and digital systems, which the study deemed not necessarily true.
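The tuning step for the two selected models could look roughly like the sketch below. It assumes scikit-learn is installed and uses synthetic data from `make_classification` in place of the survey responses. The hyperparameter grids, the 3-fold cross-validation and the accuracy metric are illustrative choices, not the settings used in the study.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the survey data (NOT the study's data):
# y = 1 means "data loss reported", y = 0 means "no data loss reported".
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Candidate models with small, illustrative hyperparameter grids.
candidates = {
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
    "svm": (
        # SVMs are sensitive to feature scale, so standardise first.
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10], "svc__kernel": ["linear", "rbf"]},
    ),
}

results = {}
for name, (model, grid) in candidates.items():
    # Exhaustive grid search with 3-fold cross-validation on the training set.
    search = GridSearchCV(model, grid, cv=3, scoring="accuracy")
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_accuracy": search.best_score_,
        # Accuracy of the refitted best model on the held-out test set.
        "test_accuracy": search.score(X_test, y_test),
    }

for name, r in results.items():
    print(name, r)
```

Keeping a held-out test set separate from the cross-validation folds gives an accuracy estimate that is not inflated by the tuning itself.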
For further study and analysis, this study recommends the following:
- Gender demographics - future studies should include the respondents' gender as a feature to ascertain whether missing-marks problems are gender-related in any way.
- Departmental demographics - future studies should focus on particular departments within the institutions under study, as different departments and faculties handle cases of data loss differently.
- Staff and non-student respondents - to rule out bias in the available data, future studies should involve non-student respondents, as this study focused mainly on student respondents.