Truth Data
The COVID-19 Forecast Hub collates daily deaths and confirmed cases from the COVID-19 GitHub repository of the Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE) group as the gold standard reference data for deaths in the US.
We also collate case and death data from NYTimes and USAFacts for comparison to JHU.
Hospitalization data are taken from the HealthData.gov COVID-19 Reported Patient Impact and Hospital Capacity by State Timeseries. More details on how these data are used are available in the technical README. Ideally, the data will be available easily through the EpiData API.
We aggregate and format both Cumulative Death and Incident Death truth data from the JHU CSSE group. Although these CSVs are not explicitly used in the visualization code, they match the "Actual" line in the visualization. This method in the covidHubUtils package creates these truth data CSVs.
There are also corresponding methods in covidHubUtils for NYTimes and USAFacts truth data that download the data and perform the aggregation. The data are stored in data-truth/nytimes/ and data-truth/usafacts/.
Weekly cumulative counts are the reported values as of the Saturday of each week. For example, the weekly cumulative count for the week ending Saturday, August 1, 2020 is equal to the reported daily cumulative count for Saturday, August 1, 2020.
Weekly incident counts are calculated as the difference between consecutive weekly cumulative counts. For example, the weekly incident count for the week ending Saturday, August 1, 2020 is the difference between the weekly cumulative count for Saturday, August 1, 2020 and the weekly cumulative count for Saturday, July 25, 2020.
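The two weekly definitions above can be sketched in a few lines of Python. This is an illustration only, not the covidHubUtils implementation: the function name `weekly_counts` and the daily values are made up, and the input is assumed to be a mapping from calendar date to reported daily cumulative count.

```python
from datetime import date, timedelta


def weekly_counts(daily_cumulative):
    """Return {saturday: (weekly_cumulative, weekly_incident)}.

    daily_cumulative maps datetime.date -> reported cumulative count.
    weekly_incident is None when the previous Saturday is not in the data.
    """
    weekly = {}
    for day, cumulative in sorted(daily_cumulative.items()):
        if day.weekday() != 5:  # Monday is 0, so Saturday is 5
            continue
        previous = daily_cumulative.get(day - timedelta(days=7))
        incident = None if previous is None else cumulative - previous
        weekly[day] = (cumulative, incident)
    return weekly


# Made-up daily cumulative counts from Saturday, July 25 to Saturday, August 1, 2020.
start = date(2020, 7, 25)
daily = {start + timedelta(days=i): 100 + 10 * i for i in range(8)}

result = weekly_counts(daily)
print(result[date(2020, 8, 1)])  # (170, 70)
```

The weekly incident count for August 1 is 170 - 100 = 70, the difference between the two Saturday cumulative values; the days in between do not enter the calculation.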
The cumulative and incident counts at the state level are calculated by summing reported cumulative and incident counts in the JHU data file across all locations with the same value for the Province_State field. This includes some "county-level" records for which we do not request forecasts. These are records with a five-digit FIPS code beginning with 80 or 90, corresponding to "Out of State" or "Unassigned" locations. For this reason, the counts at the state level may in general be larger than the sum of the counts for the counties within a given state.
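As a rough sketch of why the state total can exceed the sum over forecastable counties, the example below groups records by Province_State and compares the result with a sum that skips the 80 and 90 records. The rows, counts, and helper names are invented for illustration and are not taken from the JHU file or from covidHubUtils.

```python
from collections import defaultdict

# Hypothetical rows: (FIPS, Province_State, cumulative count)
records = [
    ("25001", "Massachusetts", 10),
    ("25003", "Massachusetts", 20),
    ("80025", "Massachusetts", 3),  # "Out of State" style record
    ("90025", "Massachusetts", 2),  # "Unassigned" style record
]


def is_pseudo_county(fips):
    """True for five-digit FIPS codes beginning with 80 or 90."""
    return len(fips) == 5 and fips[:2] in ("80", "90")


def state_totals(rows):
    """Sum counts over every record sharing a Province_State value."""
    totals = defaultdict(int)
    for _fips, state, count in rows:
        totals[state] += count
    return dict(totals)


def county_sums(rows):
    """Sum counts over real counties only, skipping 80/90 records."""
    totals = defaultdict(int)
    for fips, state, count in rows:
        if not is_pseudo_county(fips):
            totals[state] += count
    return dict(totals)


by_state = state_totals(records)
by_county = county_sums(records)
print(by_state["Massachusetts"], by_county["Massachusetts"])  # 35 30
```

Here the state-level value (35) is larger than the sum over the two real counties (30) because the "Out of State" and "Unassigned" records contribute to the state total only.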
Special case: DC is recorded in the truth data with both county code 11001 and state code 11. We have made the decision to omit the county level data since it is duplicated by the state level data.
The counts at the national level are calculated as the sum of counts for all locations in the JHU data file. This includes counts for the Diamond Princess cruise ship, and so the counts for the state level again do not sum to the counts for the national level.
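The DC special case and the national sum can be illustrated the same way. This is a sketch under stated assumptions: the records and counts are invented, the FIPS code shown for the Diamond Princess is a placeholder, `STATE_CODES` is a two-entry subset, and using `"US"` as the key for the national total is an assumption of this example.

```python
from collections import defaultdict

# Subset of state name -> two-digit state code, for illustration only.
STATE_CODES = {"Massachusetts": "25", "District of Columbia": "11"}
DC_COUNTY_FIPS = "11001"

# Hypothetical rows: (FIPS, Province_State, cumulative count)
records = [
    ("25001", "Massachusetts", 30),
    ("11001", "District of Columbia", 7),
    ("88888", "Diamond Princess", 1),  # placeholder code, not a US state
]


def build_truth(rows):
    """Return {location code: count} for national, state, and county levels."""
    # National level: every location in the file, cruise ship included.
    truth = {"US": sum(count for _fips, _state, count in rows)}
    state_sums = defaultdict(int)
    for fips, state, count in rows:
        if state not in STATE_CODES:
            continue  # contributes to the national total only
        state_sums[STATE_CODES[state]] += count
        if fips != DC_COUNTY_FIPS:  # DC is kept at the state level only
            truth[fips] = count
    truth.update(state_sums)
    return truth


truth = build_truth(records)
print(truth)  # {'US': 38, '25001': 30, '25': 30, '11': 7}
```

The two state-level values sum to 37 while the national value is 38, and DC appears once, under state code 11 rather than county code 11001.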
The Actual line in the visualization is based on the JHU CSSE group truth data. The visualization uses this Cumulative Death JSON, and this Incident Death JSON. This Python script creates these JSONs.
The actual data the visualization uses (Forecasts + Truth Data) is in this folder. These JSONs are created with the commands in 0-init-vis.sh using the truth data when the visualization is built. The file called "season-latest" is the default view, which is also Cumulative Deaths. For each State key in the JSON, there is an Actual object that contains the truth data in the visualization. More on the JSON structure here.
The Zoltar truth data is created with this method in covidHubUtils and is stored here.
JHU, NYTimes and USAFacts truth data are updated every Sunday at 12 PM through GitHub Actions CI.
This active workflow runs the truth update weekly and can also be triggered manually. The configuration for this workflow is defined here.
This deprecated workflow is the previous version, which did not unit test the truth data before aggregation. It can still be triggered manually if needed. The configuration for this workflow is defined here.
The JHU truth data is unit-tested in two phases:
- Tests in covidData to ensure that the package is faithful to the raw source data at the county level, and that aggregation is done correctly.
- Tests in covidHubUtils to make sure the functions are faithful to the covidData outputs, and that we have all the correct locations, timezeros, etc., as required for the specific file.