Group Members: Yuan Fangxu, Guo Yuchen, Lin Lirong, Long Yuepeng, Lei Lijun
Total vaccination data for the US can be accessed from https://ourworldindata.org/covid-vaccinations#source-information-country-by-country
Data pre-processing, data description, goal 1 and goal 2 can all be found in the Python code.
Use data_cleaning.py to process 2021VAERVAX.csv, 2021VAERSYMPTOMS.csv and 2021VEARSDATA to produce data21.csv.
data21.csv is the dataset we used in the two assignments.
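As an illustration of the kind of merge this step involves, here is a minimal sketch in pandas. It is not the contents of data_cleaning.py: the function name `merge_vaers`, the tiny demo frames and the choice of join types are assumptions made for the example. The column names `VAERS_ID`, `SYMPTOM1` and so on follow the public VAERS file layout and should be checked against the actual CSV files.

```python
import pandas as pd


def merge_vaers(data: pd.DataFrame, vax: pd.DataFrame, symptoms: pd.DataFrame) -> pd.DataFrame:
    """Join the three VAERS tables into one row per report (hypothetical helper)."""
    # Symptom columns are SYMPTOM1..SYMPTOM5; skip the SYMPTOMVERSION* columns.
    symptom_cols = [
        c for c in symptoms.columns
        if c.startswith("SYMPTOM") and "VERSION" not in c
    ]
    # A report can span several symptom rows, so reshape to long form
    # and collect all non-missing symptoms into one list per report.
    long_form = symptoms.melt(
        id_vars="VAERS_ID", value_vars=symptom_cols, value_name="symptom"
    ).dropna(subset=["symptom"])
    per_report = (
        long_form.groupby("VAERS_ID")["symptom"].agg(list).reset_index(name="SYMPTOMS")
    )
    # Assumption: keep only the first vaccine row when a report lists several.
    vax_one = vax.drop_duplicates(subset="VAERS_ID")
    return (
        data.merge(vax_one, on="VAERS_ID", how="inner")
            .merge(per_report, on="VAERS_ID", how="left")
    )


# Tiny made-up frames standing in for the three input files.
data = pd.DataFrame({"VAERS_ID": [1, 2], "AGE_YRS": [34.0, 61.0]})
vax = pd.DataFrame({"VAERS_ID": [1, 1, 2], "VAX_MANU": ["A", "A", "B"]})
symptoms = pd.DataFrame({
    "VAERS_ID": [1, 1, 2],
    "SYMPTOM1": ["Headache", "Fatigue", "Chills"],
    "SYMPTOMVERSION1": [24.0, 24.0, 24.0],
    "SYMPTOM2": ["Pyrexia", None, None],
})

merged = merge_vaers(data, vax, symptoms)
print(merged)
```

With real files, the three frames would come from `pd.read_csv` and the result would be written out with `merged.to_csv("data21.csv", index=False)`.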
Use SparsePCA1.ipynb to do the Hospitalisation Prediction work.
Use OnsetPrediction.ipynb to do the Onset Time Prediction work.
Use visual_1.ipynb to do the Real Data Set Illness top-15 and Real data set length of stay work.
Use visual_1.ipynb to do the Proportion of three types of vaccination work.
- Background Lei
- Introduction Guo
- Framework Lin
- Hospitalisation Prediction Yuan
- Onset Time Prediction Long
- Evaluation Yuan, Long
- Case Study Yuan, Long
- Related Work Lei
- Discussion Lin
- Conclusion Guo
- Acknowledgement Yuan
336485 cases, dated from 2021.1.1 to 2021.4.20
- Hospitalisation Prediction Yuan
- Sparse Naive Bayes
- Sparse Principal Component Analysis (SPCA) + logistic regression
Evaluation Criteria:
- Optimal probability threshold
- AUC
- Training set sensitivity
- Training set specificity
- Validation set sensitivity
- Validation set specificity
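The Sparse PCA + logistic regression approach and the evaluation criteria above can be sketched with scikit-learn as follows. This is an illustrative sketch, not the code in SparsePCA1.ipynb: the data is synthetic, the hyperparameters are arbitrary, and choosing the probability threshold by maximising Youden's J (sensitivity + specificity - 1) on the training set is an assumption about what "optimal" means here.

```python
import numpy as np
from sklearn.decomposition import SparsePCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 600 reports, 20 binary symptom flags
# driven by two latent factors; the outcome depends on the first factor.
n, p = 600, 20
latent = rng.normal(size=(n, 2))
loadings = np.zeros((2, p))
loadings[0, :6] = 1.0
loadings[1, 6:12] = 1.0
X = ((latent @ loadings + rng.normal(size=(n, p))) > 0.5).astype(float)
y = (latent[:, 0] + 0.5 * rng.normal(size=n) > 0.8).astype(int)

X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Sparse PCA reduces the symptom flags to a few sparse components,
# which then feed a logistic regression classifier.
model = make_pipeline(
    SparsePCA(n_components=5, alpha=0.5, random_state=0),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

p_train = model.predict_proba(X_train)[:, 1]
p_val = model.predict_proba(X_val)[:, 1]

# Assumed criterion: pick the threshold maximising Youden's J on the training set.
fpr, tpr, thresholds = roc_curve(y_train, p_train)
best_threshold = thresholds[np.argmax(tpr - fpr)]


def sensitivity_specificity(y_true, prob, threshold):
    """Return (sensitivity, specificity) for predictions prob >= threshold."""
    pred = prob >= threshold
    positive = y_true == 1
    return float(np.mean(pred[positive])), float(np.mean(~pred[~positive]))


auc_val = roc_auc_score(y_val, p_val)
train_sens, train_spec = sensitivity_specificity(y_train, p_train, best_threshold)
val_sens, val_spec = sensitivity_specificity(y_val, p_val, best_threshold)

print(f"threshold={best_threshold:.3f} AUC={auc_val:.3f}")
print(f"train sensitivity={train_sens:.3f} specificity={train_spec:.3f}")
print(f"validation sensitivity={val_sens:.3f} specificity={val_spec:.3f}")
```

Selecting the threshold on the training set and reporting it unchanged on the validation set keeps the validation figures free of threshold tuning.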
- Onset Time Prediction Long
- Random Forest (RF)
- Regularized regression (LASSO, Ridge)
- Artificial neural network
- OLS
Evaluation Criteria:
- Training MSE
- Test MSE
- Best predictors for shorter duration
- Best predictors for longer duration
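A comparison of the listed regressors on training and test MSE could look like the sketch below. It uses scikit-learn on synthetic data; the hyperparameters, the 75/25 split and the use of LASSO coefficients to rank predictors are assumptions for illustration and are not taken from OnsetPrediction.ipynb.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in: feature 0 lengthens the target, feature 1 shortens it.
X = rng.normal(size=(400, 8))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=400)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "OLS": LinearRegression(),
    "LASSO": Lasso(alpha=0.1),
    "Ridge": Ridge(alpha=1.0),
    "RF": RandomForestRegressor(n_estimators=100, random_state=0),
    "ANN": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
}

# Fit each model and record (training MSE, test MSE).
results = {}
for name, estimator in models.items():
    estimator.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, estimator.predict(X_train))
    test_mse = mean_squared_error(y_test, estimator.predict(X_test))
    results[name] = (train_mse, test_mse)
    print(f"{name:6s} train MSE={train_mse:.3f} test MSE={test_mse:.3f}")

# One simple way to rank predictors: the largest positive LASSO coefficient
# points to a longer predicted value, the most negative to a shorter one.
lasso_coef = models["LASSO"].coef_
longer_idx = int(np.argmax(lasso_coef))
shorter_idx = int(np.argmin(lasso_coef))
print("longest-duration predictor index:", longer_idx)
print("shortest-duration predictor index:", shorter_idx)
```

A large gap between a model's training and test MSE, which tends to appear for the random forest, is a sign of overfitting rather than of a better model.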
- Data visualisation
- Proportion of three types of vaccination Lei
- Real Data Set Illness top-15 Guo
- Real data set length of stay Lin
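The counts behind the vaccination-proportion and top-15 illness charts can be computed with pandas along these lines. The frame below is made up, the manufacturer labels are placeholders, and the column names `VAX_MANU` and `SYMPTOMS` are assumptions about how data21.csv is laid out, so this is only a sketch of the idea and not the code in visual_1.ipynb.

```python
import pandas as pd

# Made-up reports: one vaccine manufacturer and a list of symptoms per report.
reports = pd.DataFrame({
    "VAX_MANU": ["Pfizer", "Moderna", "Moderna", "Janssen", "Pfizer", "Moderna"],
    "SYMPTOMS": [
        ["Headache", "Fever"],
        ["Headache"],
        ["Fatigue", "Fever"],
        ["Headache"],
        ["Chills"],
        ["Fever"],
    ],
})

# Share of reports per vaccine type (fractions that sum to 1).
vaccine_share = reports["VAX_MANU"].value_counts(normalize=True)

# One row per (report, symptom), then the 15 most frequent symptoms.
top_symptoms = reports["SYMPTOMS"].explode().value_counts().head(15)

print(vaccine_share)
print(top_symptoms)
```

Both results are pandas Series, so with matplotlib installed they can be drawn directly, for example with `vaccine_share.plot.pie()` and `top_symptoms.plot.barh()`.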