GitHub - Mayur-vora/Dropout-rate-prediction: Dropout prediction for high school student

Dropout rate prediction for high school students

Project Overview

This project visualizes and analyzes student data to identify patterns related to student enrollment, dropout, and graduation. It uses seaborn count plots to explore the data and scikit-learn classifiers (logistic regression and random forest) to predict student outcomes.

Data Visualization

The project starts with an exploration of the student dataset, using different visualization techniques to understand the data distribution and relationships.

Gender Distribution

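import matplotlib.pyplot as plt
import seaborn as sns

# `student` is the pandas DataFrame holding the student dataset
sns.countplot(data=student, x='Gender')
plt.xlabel('Gender')
plt.ylabel('Number of Students')
plt.show()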

The gender distribution is visualized to understand the balance between male and female students.
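The same balance can also be read off as numbers. The following is a minimal sketch, assuming a small hypothetical `student_demo` frame with string gender labels in place of the real `student` data; the actual column encoding in the dataset may differ.

```python
import pandas as pd

# Hypothetical stand-in for the `student` DataFrame (labels are illustrative only)
student_demo = pd.DataFrame(
    {"Gender": ["Female", "Male", "Female", "Female", "Male", "Female"]}
)

# Absolute counts per category: the same numbers a count plot draws as bars
gender_counts = student_demo["Gender"].value_counts()

# Share of each category as a fraction of all rows
gender_share = student_demo["Gender"].value_counts(normalize=True)

print(gender_counts)
print(gender_share)
```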

Nationality Distribution

sns.countplot(data=student, x='Nationality')
plt.xticks(rotation=90)
plt.xlabel('Nationality')
plt.ylabel('Number of Students')
plt.show()

The nationality distribution helps identify the most common nationalities among the students.

Displaced Students

sns.countplot(data=student, x='Displaced', hue='Target', hue_order=['Dropout', 'Enrolled', 'Graduate'])
plt.xticks(ticks=[0,1], labels=['No','Yes'])
plt.ylabel('Number of Students')
plt.show()

This plot shows the number of displaced students and their corresponding enrollment status.
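Because the two groups can differ a lot in size, raw counts are easier to compare once they are turned into per-group shares. A minimal sketch, assuming a small hypothetical `student_demo` frame that mimics the `Displaced` and `Target` columns used in the plot:

```python
import pandas as pd

# Hypothetical stand-in for the `student` DataFrame (values are illustrative only)
student_demo = pd.DataFrame({
    "Displaced": [0, 0, 0, 0, 1, 1, 1, 1],
    "Target": ["Dropout", "Graduate", "Graduate", "Enrolled",
               "Dropout", "Dropout", "Graduate", "Enrolled"],
})

# Rows: displaced flag; columns: outcome. normalize="index" makes each row sum to 1,
# so every cell is the share of that group ending up with that outcome.
outcome_share = pd.crosstab(
    student_demo["Displaced"], student_demo["Target"], normalize="index"
)

print(outcome_share)
```

The same call with `student_demo["International"]` as the first argument would give the corresponding table for international students.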

International Students

sns.countplot(data=student, x='International', hue='Target', hue_order=['Dropout', 'Enrolled', 'Graduate'])
plt.xticks(ticks=[0,1], labels=['No','Yes'])
plt.ylabel('Number of Students')
plt.show()

The distribution of international students and their enrollment status is analyzed.

Modeling and Prediction

Different machine learning models are implemented to predict student outcomes based on various features.

Logistic Regression

Logistic regression is used to model the probability of different outcomes.

from sklearn.linear_model import LogisticRegression
log_reg = LogisticRegression()
log_reg.fit(X_train, Y_train)
y_pred = log_reg.predict(X_test)
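The snippet above assumes that `X_train`, `X_test`, `Y_train`, and `Y_test` already exist. One way they could be produced, and how the fitted model exposes class probabilities, is sketched below. The synthetic three-class data, the 80/20 split, and `max_iter=1000` are assumptions made for illustration, not settings taken from the project notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for the student features and Target column
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)

# Hold out 20% of the rows for testing; stratify keeps class proportions similar
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# A larger max_iter gives the solver more iterations to converge
log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, Y_train)

# One row per test sample, one column per class; each row sums to 1
proba = log_reg.predict_proba(X_test)
print(proba.shape)
print(proba[:3].round(3))
```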

Random Forest

A random forest classifier is implemented to improve prediction accuracy.

from sklearn.ensemble import RandomForestClassifier
rf = RandomForestClassifier()
rf.fit(X_train, Y_train)
y_pred_rf = rf.predict(X_test)

Model Evaluation

The performance of different models is evaluated using metrics such as accuracy and ROC curves.

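import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score, RocCurveDisplay

# Accuracy of the logistic regression predictions on the test set
accuracy = accuracy_score(Y_test, y_pred)
print("Accuracy:", accuracy)

# ROC curve for the random forest predictions.
# from_predictions expects a binary target; hard class labels give a single operating point.
RocCurveDisplay.from_predictions(Y_test, y_pred_rf)
plt.show()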
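When the target keeps all three classes (Dropout, Enrolled, Graduate), a single ROC curve is not directly defined. One common alternative is a one-vs-rest ROC AUC computed from predicted probabilities, alongside per-class precision and recall. The sketch below runs on synthetic data and is an assumption about how such an evaluation could look, not the project's own evaluation code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for the student dataset
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

rf = RandomForestClassifier(random_state=0)
rf.fit(X_train, Y_train)

y_pred_rf = rf.predict(X_test)          # hard class labels
y_score_rf = rf.predict_proba(X_test)   # per-class probabilities

acc = accuracy_score(Y_test, y_pred_rf)
# One-vs-rest AUC, macro-averaged over the three classes
auc_ovr = roc_auc_score(Y_test, y_score_rf, multi_class="ovr")

print("Accuracy:", acc)
print("ROC AUC (one-vs-rest):", auc_ovr)
print(classification_report(Y_test, y_pred_rf))
```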

Creating a System for Prediction

A prediction system wraps the trained model so that it can predict the outcome for new student records supplied as input.
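A minimal sketch of what such a wrapper could look like follows. The helper `predict_outcome`, the two-column `FEATURES` list, and the toy training rows are all hypothetical and exist only to make the example self-contained; the real system would use the model and the full feature set from the notebook.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature subset; the real model is trained on more columns
FEATURES = ["Displaced", "International"]

# Toy training rows (illustrative only, not taken from the dataset)
train = pd.DataFrame({
    "Displaced": [0, 1, 0, 1, 0, 1],
    "International": [0, 0, 1, 1, 0, 1],
    "Target": ["Graduate", "Dropout", "Enrolled", "Dropout", "Graduate", "Enrolled"],
})
model = RandomForestClassifier(random_state=0).fit(train[FEATURES], train["Target"])


def predict_outcome(model, record):
    """Return the predicted outcome label for one student given as a dict."""
    # Build a one-row frame with columns in the order used for training
    row = pd.DataFrame([record], columns=FEATURES)
    return model.predict(row)[0]


new_student = {"Displaced": 1, "International": 0}
print(predict_outcome(model, new_student))
```

Keeping the column order in one place (`FEATURES`) helps ensure that new inputs are laid out the same way as the training data.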
