This repository holds all the code MARC uses to keep their COVID-19 Data Hub up to date for the Kansas City and Northern and Southern Health Care Coalition Regions. It exposes the entire process for downloading the most up-to-date datasets from our MARC Data API and transforming it to the forms needed to feed the COVID-19 Data Hub.
These files are updated as part of our process for pushing updated data to our publication server so they should match the data in the MARC Data API.
- Case, Death, and Test Data:
https://gis2.marc2.org/MARCDataAPI/api/covidcasedeathtest- This is the time series of the back-updated Case, Death, Test data that MARC uses
- Newly Reported Case, Death, and Test Data:
https://gis2.marc2.org/MARCDataAPI/api/covidcasedeathtestnewlyreported- This is the time series of Newly Reported Case, Death, Test data that MARC uses to provide an estimate of how many Cases, Deaths, and Tests were reported in the last 24 hours similar to how the media reports these values. This data is not back-updated.
- Hospital Data:
https://gis2.marc2.org/MARCDataAPI/api/covidhospital- This is the time series of the back-updated Hospital data that MARC uses
- Vaccination Data (Local/State Sources):
https://gis2.marc2.org/MARCDataAPI/api/covidvaccination- This is the time series of the vaccination data that MARC uses
- Vaccination Data (CDC Source):
https://gis2.marc2.org/MARCDataAPI/api/covidvaccinationcdc- This is the time series of the vaccination data that MARC uses
- Note The LastUpdated columns are in UTC time if downloading
directly from the API. The time conversion to
‘America/Chicago’/‘Central’ is implemented when downloading from the
R package with the function
downloadMARCCovidData()
This repository is actually an R package made for this purpose. MARC uses this package to make sure the data being displayed through Power BI on the COVID-19 Data Hub stays up to date.
Programs to install:
- R (From CRAN; latest version at time of writing was 4.0.3)
- RStudio (Helpful R IDE)
Once these programs are installed. Open up RStudio and install the {remotes} package by running:
install.packages('remotes')
Then install the Covid19MARCData package from this repository with:
remotes::install_github('MARC-KC/Covid19MARCData')
This will launch a process that may install a bunch of packages that the Covid19MARCData package is dependent upon. Once completed you should be able to load the new package with:
library(Covid19MARCData)
This will attach the package to your environment and allow you to call the package functions. See the section Using the R package Covid19MARCData for more information.
This is similar to calling the API directly, except that it loads the
resulting table into the R session and does the conversion of the
LastUpdated columns from UTC to Central time.
You can download all three datasets using the same function with a
different type
argument:
#Case, Death, and Test Data
cdtData <- downloadMARCCovidData(dataset = "CDT")
#Newly Reported Case, Death, and Test Data
cdtNRData <- downloadMARCCovidData(dataset = "CDT_NewlyReported")
#Hospital Data
hospData <- downloadMARCCovidData(dataset = "Hospital")
#State/Local Vaccination Data
vaccData <- downloadMARCCovidData(dataset = "Vaccination")
#CDC Vaccination Data
vaccCDCData <- downloadMARCCovidData(dataset = "VaccinationCDC")
Or you can download all three of these datasets with a single command as a list of data.frames:
downloadAllCovidAPIData()
Or you can download all three datasets and create the base derived datasets with:
getBaseCovidData()
There are two main products that MARC produces using this data.These
include both the Kansas City Region and HCC Northern and Southern
Regional COVID-19 Data Hubs and the Weekly Data Snapshots. The returns
from createBiDatasets_Hub()
and createBiDatasets_WDS()
will be a
list of the created data.frames.
All of the datasets used to produce the figures for the COVID-19 Data Hubs can be created using the following function:
createBiDatasets_Hub()
All of the datasets used to produce the figures and summaries for the Weekly Data Snapshot can be created using the following function:
createBiDatasets_WDS()
In order to keep our products up to date, we run both of these functions during our morning and nightly updates is a pattern like:
#Download in the most recent data from the API
apiData <- Covid19MARCData::downloadAllCovidAPIData()
#Create the base datasets
baseData <- getBaseCovidData(apiData)
#Create the datasets needed for the COVID-19 Hubs
dfListHub <- createBiDatasets_Hub(baseDataList = baseData, lagDaysCDT = 10, lagDaysHosp = 2)
#Create the datasets needed for the Weekly Data Snapshot
dfListWDS <- createBiDatasets_WDS(baseDataList = baseData, cutoffDay = 'Sunday', lagDaysCDT = 10, lagDaysHosp = 2)
#Output the data as CSV's for consumption by Power Bi
names(apiData) <- glue::glue("bi_base_{names(apiData)}")
list2CSV(c(apiData, dfListHub, dfListWDS))