The dataset contains features of individuals, such as age, sex, smoking habits, and region, along with their respective insurance charges. The model applies multiple linear regression to predict the insurance charges, and its goodness of fit is measured with the R² score. Before building the model, we performed exploratory data analysis in Tableau to check for correlations between the features and the charges.
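To make the modelling step concrete, here is a dependency-free Python sketch of multiple linear regression fitted by ordinary least squares (solving the normal equations), together with the R² score, defined as R² = 1 - SS_res / SS_tot. The write-up does not say which library was used for the model, so the function names below are hypothetical. In practice a library routine such as scikit-learn's `LinearRegression` and `r2_score` would normally be preferred.

```python
from typing import List, Sequence


def fit_linear_regression(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    An intercept column of ones is prepended, so the result is
    [intercept, coef_1, ..., coef_k].
    """
    rows = [[1.0, *map(float, r)] for r in X]
    k = len(rows[0])
    # Build A = X^T X and c = X^T y.
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    c = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for j in range(col, k):
                A[r][j] -= factor * A[col][j]
            c[r] -= factor * c[col]
    # Back substitution on the upper-triangular system.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


def predict(b: Sequence[float], X: Sequence[Sequence[float]]) -> List[float]:
    """Apply the fitted intercept and coefficients to each feature row."""
    return [b[0] + sum(bj * xj for bj, xj in zip(b[1:], row)) for row in X]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

The normal equations keep the sketch short; for many or highly correlated features, a QR- or SVD-based solver is numerically more robust.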
The Tableau analysis showed that charges tend to increase with age, indicating a positive correlation between the two.
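The age/charges relationship was judged visually in Tableau. A numeric counterpart is the Pearson correlation coefficient, sketched below in plain Python. The function name is hypothetical, and a positive value would correspond to the upward trend described above.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)
```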
We performed a similar analysis for region, sex, and smoking status.
Thus, we concluded that region and sex had little effect on the variation in charges, whereas smoking was one of the leading factors explaining it.
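For categorical features such as region, sex, or smoking status, one way to quantify "how much of the variation in charges a feature explains" is the correlation ratio squared (η²). It is the share of the total sum of squares accounted for by differences between group means. This is a swapped-in numeric check, not the Tableau procedure the write-up used, and the function name is hypothetical. Values near 0 mean the grouping explains little of the variation, and values near 1 mean it explains most of it.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def eta_squared(groups: Sequence[Hashable], values: Sequence[float]) -> float:
    """Between-group sum of squares divided by total sum of squares."""
    buckets = defaultdict(list)
    for g, v in zip(groups, values):
        buckets[g].append(v)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    if ss_total == 0:
        raise ValueError("values have no variation to explain")
    # Each group contributes n_g * (group mean - grand mean)^2.
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in buckets.values()
    )
    return ss_between / ss_total
```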
The final model used four features: age, number of children, BMI, and smoking status. It achieved an R² score of 0.78.
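Because smoking is categorical, it has to be encoded numerically before it can enter a linear regression. The sketch below builds the four-feature matrix from CSV text, encoding smoking as a 0/1 indicator. The column names (`age`, `children`, `bmi`, `smoker`, `charges`) and the `yes`/`no` coding of `smoker` are assumptions, since the write-up does not show the file layout. The resulting `X` and `y` can be passed to any least-squares regression routine.

```python
import csv
import io
from typing import List, Tuple


def load_features(csv_text: str) -> Tuple[List[List[float]], List[float]]:
    """Return (X, y) with X rows = [age, children, bmi, smoker_indicator].

    Assumes a header row containing 'age', 'children', 'bmi', 'smoker'
    and 'charges'; other columns (e.g. sex, region) are ignored.
    """
    X, y = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        X.append([
            float(row["age"]),
            float(row["children"]),
            float(row["bmi"]),
            # Assumed coding: "yes" -> 1.0 (smoker), anything else -> 0.0.
            1.0 if row["smoker"].strip().lower() == "yes" else 0.0,
        ])
        y.append(float(row["charges"]))
    return X, y
```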