Data analysis on Dublin Science Gallery risk lab exhibitions
Completed as part MSG500 linear statistical models (Chalmers University of Technology)
Files (All in python notebook format):
- Scales_converter.ipynb: Conversion for scales (BIS,dospert)
- Risk Taking predict smoking .ipynb: Predicting the number of cigarettes smoked per week from a measure of risk taking in a social context (DOSERT)
- BIS Self Control Verse Car Crashes .ipynb: Predicting the number of car crashes from a measure of self control (BIS)
- Model for Smoking.ipynb: Building a prediction model to predict if a participant in the dataset has smoked during their lifetime.
- Model for drinking.ipynb: Building a prediction model to predict the age that a participant started drinking alcohol
Datasets:
- RISKLAB_10_11_17.csv: Orginal uneditted dataset
- Risklab_2.4.csv Ready to use dataset
Predicting the number of cigarettes smoked per week from a measure of risk taking in a social context (DOSERT)
- We divided cigarettes smoked per week (cspw) into seven levels, we then compared it against social risk taking. ANOVA and Kruskal-Wallis rank sum test suggested there was no statically significant mean difference between the seven groups.
- We performed an regression holding the first level as base line, none of the levels where able to assist in the prediction of cspw.
- We compared social risk taking against recreational risk taking by cspw group level. No statically significant result.
- We grouped car crashes dichotomously (no crash, or at least one crash).
- Group 1, have a slightly lower self control mean then group 0, but neither ANOVA or Kruskal-Wallis rank sum test reported any difference in group means.
- Applied a logistical regression model to the data, residuals plots suggested a number of assumption violations. We played around with transformations but it wasn't look good.
- We concluded that it wasn't an easy prediction, and moved on.
Building a prediction model to predict if a participant in the dataset has smoked during their lifetime.
- Here we looked at building a prediction model for the variable "Ever smoked". We developed a logistical regression model with 12 other variables (Demographic,BIS,DOSERT)
- Backward selection and interaction search reduced our model down to Age ,BIS_Cog_Instability, BIS_self_Control, DOS_Fin_Investment, DOS_Fin_Gambling, DOS_Health Safety.
- Overall model stability looks acceptable.
- Evaluated the model on a test set, 99% prediction this... but this might be due to an issue with our R code. (Maybe should have tried Cross Validation)?? ## Maybe look back here.
- Had to group age started drinking into four different levels. I haven't studied multinomial logistic regression yet so we put this model to bed!