R-project
San Francisco employee compensation dataset, obtained from Kaggle (https://www.kaggle.com/san-francisco/sf-employee-compensation)
Our application is based on San Francisco employee compensation data, which describes features related to employee department, organization, job profile, salary and benefits. Employees are an integral part of any organization: regardless of company size, they are its most important asset and the backbone of the business. One of the most important aspects of running a business is therefore keeping employees satisfied by offering high-quality benefits and compensation. Employee benefits are all non-wage compensation or bonuses provided to employees in addition to their salaries. The benefits a company offers vary with the organization and job profile.
Many companies do not realize how much time and money ineffective HR processes cost them, and benefits are often granted to job profiles with very low productivity without proper consideration. Companies therefore need a way to know the compensation structure in advance, based on job profile and organization. This gave us the opportunity to develop a model that predicts compensation and benefits from these factors. Employers can use the model to gain insight into the factors that drive compensation, and employees can use it to see which job profiles receive the most benefits.
Since our dataset contains 0.8 million records, we will use hold-out evaluation for our models. We will split the data in an 80-20 ratio, i.e. a training set (80%) and a test set (20%), and will build the models on the training data.
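The 80-20 hold-out split described above could be sketched in R roughly as follows. This is only an illustration: the data frame is a small synthetic stand-in for the Kaggle data, and the names employees, organization, job_profile and salary are assumptions, not the dataset's actual column names.

```r
# Synthetic stand-in for the Kaggle data; all column names are hypothetical.
set.seed(42)
n <- 1000
employees <- data.frame(
  organization = factor(sample(c("OrgA", "OrgB", "OrgC"), n, replace = TRUE)),
  job_profile  = factor(sample(c("Clerk", "Engineer", "Nurse"), n, replace = TRUE)),
  salary       = round(runif(n, 30000, 150000))
)

# Hold-out split: draw 80% of the row indices at random for training,
# and keep the remaining 20% of rows as the test set.
train_idx <- sample(seq_len(nrow(employees)), size = floor(0.8 * nrow(employees)))
train_set <- employees[train_idx, ]
test_set  <- employees[-train_idx, ]

cat("training rows:", nrow(train_set), " test rows:", nrow(test_set), "\n")
```

Fixing the seed with set.seed makes the split reproducible, so that every model is compared on the same training and test rows.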
We will build several models and evaluate each one with the coefficient of determination (R-squared) on the training set to check how well it fits. We will then apply the models to the test set, compare their results, and choose the best one.
Methods used in predicting variables:
We used multiple linear regression in R to predict the compensation and benefits given to an employee based on salary, organization and job profile.
We used multiple linear regression in R to predict the total salary given to an employee based on organization, job profile and other factors.
We used ANOVA to compare the average salaries of employees across job profiles and organizations.
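As a rough sketch of how the three methods above might look in R, the example below fits both regressions with lm, computes the coefficient of determination on the training and test sets, and runs an ANOVA with aov. It runs on simulated data; the column names (organization, job_profile, salary, benefits), the simulated effect sizes and the r_squared helper are all assumptions made for illustration and are not taken from the project code or the Kaggle dataset.

```r
# Simulated data; column names and effect sizes are hypothetical.
set.seed(1)
n <- 500
org_levels <- c("OrgA", "OrgB", "OrgC")
job_levels <- c("Clerk", "Engineer", "Nurse")
employees <- data.frame(
  organization = factor(sample(org_levels, n, replace = TRUE), levels = org_levels),
  job_profile  = factor(sample(job_levels, n, replace = TRUE), levels = job_levels)
)
base_salary <- c(Clerk = 50000, Engineer = 110000, Nurse = 90000)
employees$salary <- unname(base_salary[as.character(employees$job_profile)]) +
  rnorm(n, sd = 8000)
employees$benefits <- 0.3 * employees$salary + rnorm(n, sd = 2000)

# 80-20 hold-out split.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train_set <- employees[train_idx, ]
test_set  <- employees[-train_idx, ]

# Multiple linear regression: benefits from salary, organization and job profile.
fit_benefits <- lm(benefits ~ salary + organization + job_profile, data = train_set)

# Multiple linear regression: salary from organization and job profile.
fit_salary <- lm(salary ~ organization + job_profile, data = train_set)

# Hypothetical helper: coefficient of determination for arbitrary predictions.
r_squared <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

train_r2 <- summary(fit_benefits)$r.squared
test_r2  <- r_squared(test_set$benefits, predict(fit_benefits, newdata = test_set))
cat("benefits model R-squared: train =", train_r2, " test =", test_r2, "\n")

# ANOVA: do mean salaries differ across job profiles and organizations?
anova_fit <- aov(salary ~ job_profile + organization, data = employees)
print(summary(anova_fit))
```

Comparing the training and test R-squared values gives a quick check for overfitting: a test value far below the training value suggests the model does not generalize well.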