This project aims to introduce researchers to:
- The use of GitHub for version control, sharing, and collaboration, including collaborative code review
- Our recommendations for project organisation and data documentation
- The use of R for reproducible health data science, including good coding practice, data quality checks, and troubleshooting
This project accompanies the Aberdeen Centre for Health Data Science guidebook for open, reproducible and collaborative research, available at https://github.com/AbdnCHDS/guidebook
Before you start, create a GitHub account, then download and install R, the free version of RStudio Desktop, and GitHub Desktop.
We will use a fake dataset that follows the format of typical electronic health records. It is deliberately messy, so that we can practise cleaning it and preparing it for analysis and visualisation. The dataset includes hospital admissions and demographic information.
Our aim is to summarise the total time spent in hospital in 2020 by age and deprivation, and to present the results for a general audience.
Let's get started!
Step 1: Copy this project to your own account
Step 2: Create an RStudio project
Step 3: Add collaborators and tasks
Step 4: Clean and analyse the data using R