The goal is to build a predictive model that identifies the employees most likely to be promoted, based on factors such as training performance and KPI completion.
The workflow covers data collection and preprocessing using Pandas, exploratory data analysis (EDA) using Matplotlib and Seaborn, feature engineering, and model building and evaluation using Scikit-learn.
1. Data collection & cleaning: Using Pandas 'read_csv' to load the training and testing CSV files.
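A minimal sketch of the loading and cleaning step. The CSV content is a made-up stand-in read from memory so the snippet runs on its own; the `employee_id` column, the sample values, and the fill strategy (mode for categorical gaps, median for numeric gaps) are assumptions for illustration, not necessarily what this project does.

```python
import io

import pandas as pd

# Hypothetical stand-in for the training CSV; the real file has more rows and columns.
csv_text = """employee_id,education,age,previous_year_rating,avg_training_score,is_promoted
1,Bachelors,35,5.0,49,0
2,Masters,30,,60,0
3,,34,3.0,50,0
4,Bachelors,39,1.0,85,1
"""

# With a real file this would be pd.read_csv("<path to the training csv>").
train = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column before cleaning.
missing = train.isna().sum()
print(missing)

# Assumed cleaning choice: most frequent value for categorical gaps, median for numeric gaps.
train["education"] = train["education"].fillna(train["education"].mode()[0])
train["previous_year_rating"] = train["previous_year_rating"].fillna(
    train["previous_year_rating"].median()
)
```

The same fill values computed on the training file would normally be reused for the testing file, so both are cleaned consistently.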
2. Descriptive statistics: Using .describe() to get summary statistics such as max, min, and mean.
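As an illustration, `.describe()` on a small hand-made frame looks like this. The values are invented for the example; only the column names `age`, `avg_training_score`, and `is_promoted` come from this project.

```python
import pandas as pd

# Toy data, not the real dataset.
df = pd.DataFrame({
    "age": [35, 30, 34, 39],
    "avg_training_score": [49, 60, 50, 85],
    "is_promoted": [0, 0, 0, 1],
})

# describe() returns count, mean, std, min, quartiles and max for each numeric column.
summary = df.describe()
print(summary.loc[["min", "mean", "max"]])
```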
3. Data exploration: Using countplot, displot, and histograms from the Seaborn and Matplotlib libraries to gain graphical insight into the datasets.
Count of employees who got promoted:
Count of employees who got promoted, by education:
Count of employees who got promoted, by age:
Count of employees who got promoted, by previous_year_rating:
Count of employees who got promoted, by age and length of service:
Scatter plot for dataset exploration:
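The count plots listed above could be produced along these lines. This is a sketch on invented data: the education categories and counts are assumptions, and the `Agg` backend is selected only so the snippet also runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data, not the real dataset.
df = pd.DataFrame({
    "education": ["Bachelors", "Masters", "Bachelors",
                  "Masters", "Bachelors", "Below Secondary"],
    "is_promoted": [0, 1, 0, 0, 1, 0],
})

# The numbers behind the plot: rows are education levels, columns are is_promoted values.
counts = pd.crosstab(df["education"], df["is_promoted"])
print(counts)

# One bar per education level, split by promotion outcome.
fig, ax = plt.subplots(figsize=(6, 4))
sns.countplot(data=df, x="education", hue="is_promoted", ax=ax)
ax.set_title("Promotions by education")
# Call plt.show() or fig.savefig(...) here to view or save the figure.
```

Swapping `x="education"` for `age` or `previous_year_rating` gives the other breakdowns in the same way.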
4. Label conversion for categorical data attributes using LabelEncoder from the Scikit-learn preprocessing module:
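A minimal sketch of this encoding step, assuming two categorical columns. `education` appears in this project; `gender` and all values are hypothetical and only there to show the loop over several columns.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data, not the real dataset.
df = pd.DataFrame({
    "education": ["Bachelors", "Masters", "Bachelors", "Below Secondary"],
    "gender": ["m", "f", "f", "m"],  # hypothetical column
})

encoders = {}
for col in ["education", "gender"]:
    le = LabelEncoder()
    # Classes are sorted alphabetically and mapped to 0, 1, 2, ...
    df[col] = le.fit_transform(df[col])
    # Keep the fitted encoder so the same mapping can be applied to the test set.
    encoders[col] = le

print(df)
print(list(encoders["education"].classes_))
```

LabelEncoder assigns integers in alphabetical order, so the codes carry no meaningful ranking. Scikit-learn documents it as intended for target labels; `OrdinalEncoder` or `OneHotEncoder` are the feature-oriented alternatives if that matters for the model.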
5. Correlation: Analyzing the inter-dependency between attributes. Here the KPI, awards-won, and avg_training_score attributes are positively correlated with the target variable ('is_promoted'), which suggests they have a strong influence on it.
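One way to rank attributes by their correlation with the target is sketched below. The data is invented purely to show the mechanics and does not reproduce this project's results; `kpi_met` and `awards_won` are hypothetical column names standing in for the KPI and awards attributes.

```python
import pandas as pd

# Toy data, not the real dataset.
df = pd.DataFrame({
    "kpi_met": [0, 1, 0, 1, 1, 1],             # hypothetical column name
    "awards_won": [0, 0, 0, 1, 0, 0],          # hypothetical column name
    "avg_training_score": [49, 80, 50, 85, 78, 52],
    "age": [35, 30, 34, 39, 28, 45],
    "is_promoted": [0, 1, 0, 1, 1, 0],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr()

# Correlation of each attribute with the target, strongest positive first.
target_corr = corr["is_promoted"].drop("is_promoted").sort_values(ascending=False)
print(target_corr)
```

Passing `corr` to `seaborn.heatmap` gives the usual visual overview of the full matrix.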
8. RandomForest: Accuracy alone is not a reliable metric for this classification task, since it can look high even when few promoted employees are correctly identified; the focus here is on bringing recall or F1-score close to 1.0.
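The point about accuracy can be illustrated with a small sketch. It uses a synthetic, imbalanced dataset from `make_classification` in place of the real data, and the hyperparameters (`n_estimators=200`, `class_weight="balanced"`) are assumptions rather than this project's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in: roughly 10% positive class, mimicking "few employees are promoted".
X, y = make_classification(
    n_samples=2000, n_features=10, weights=[0.9, 0.1], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# Assumed settings; class_weight="balanced" gives the rare class more weight.
model = RandomForestClassifier(
    n_estimators=200, class_weight="balanced", random_state=42
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"model:    accuracy={accuracy:.3f} recall={recall:.3f} f1={f1:.3f}")

# Baseline that always predicts "not promoted": high accuracy, zero recall.
always_zero = [0] * len(y_test)
baseline_accuracy = accuracy_score(y_test, always_zero)
baseline_recall = recall_score(y_test, always_zero, zero_division=0)
print(f"baseline: accuracy={baseline_accuracy:.3f} recall={baseline_recall:.3f}")
```

The baseline never finds a positive case yet still scores high on accuracy, which is why recall and F1-score are the more informative numbers to compare.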