Truth Data

Jarad Niemi edited this page Aug 12, 2020 · 17 revisions

Truth Data Source

The COVID-19 Forecast Hub collates daily deaths and confirmed cases from the COVID-19 GitHub repository of the Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE) and uses them as the gold standard reference data for deaths in the US.

We also collate data from NYTimes and USAFacts for comparison with the JHU data.

Daily Truth Data

We aggregate and format both Cumulative Death and Incident Death truth data from the JHU CSSE group. Although these CSVs are not explicitly used in the visualization code, they match the "Actual" line in the visualization. This Python script creates these truth data CSVs.

Weekly Truth Data

Weekly cumulative counts are the reported values as of the Saturday of each week. For example, the weekly cumulative count for the week ending Saturday, August 1, 2020 is equal to the reported daily cumulative count for Saturday, August 1, 2020.

Weekly incident counts are calculated as the difference between consecutive weekly cumulative counts. For example, the weekly incident count for the week ending Saturday, August 1, 2020 is the difference between the weekly cumulative count for Saturday, August 1, 2020 and the weekly cumulative count for Saturday, July 25, 2020.
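
The two weekly definitions above can be illustrated with a minimal sketch. It is not the Hub's own script: the function name `weekly_counts`, the dictionary input, and the counts are made up for illustration, and it assumes daily cumulative counts keyed by date.

```python
from datetime import date, timedelta

def weekly_counts(daily_cumulative):
    """Return {saturday: (weekly_cumulative, weekly_incident)}.

    daily_cumulative: dict mapping datetime.date -> reported cumulative count.
    (Hypothetical input shape, chosen only for this illustration.)
    """
    # Weekly cumulative = the daily cumulative value reported on each Saturday.
    # date.weekday() returns 5 for Saturday (Monday is 0).
    saturdays = sorted(d for d in daily_cumulative if d.weekday() == 5)
    result = {}
    for sat in saturdays:
        cumulative = daily_cumulative[sat]
        previous = daily_cumulative.get(sat - timedelta(days=7))
        # Weekly incident = difference between consecutive Saturday values;
        # left as None when the previous Saturday is not in the data.
        incident = None if previous is None else cumulative - previous
        result[sat] = (cumulative, incident)
    return result

# Made-up counts: 100 on Saturday, July 25, 2020, growing by 10 per day
# through Saturday, August 1, 2020.
start = date(2020, 7, 25)
daily = {start + timedelta(days=i): 100 + 10 * i for i in range(8)}
weekly = weekly_counts(daily)
```

With these made-up numbers, the week ending August 1, 2020 has a weekly cumulative count of 170 (the value reported that Saturday) and a weekly incident count of 70 (170 minus the 100 reported on July 25).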

Aggregation to State and National Level

The cumulative and incident counts at the state level are calculated by summing reported cumulative and incident counts in the JHU data file across all locations with the same value for the Province_State field. This includes some "county-level" records for which we do not request forecasts. These are records with a five-digit FIPS code beginning with 80 or 90, corresponding to "Out of State" or "Unassigned" locations. For this reason, the counts at the state level may in general be larger than the sum of the counts for the counties within a given state.

The counts at the national level are calculated as the sum of counts for all locations in the JHU data file. This includes counts for the Diamond Princess cruise ship, so the state-level counts likewise do not, in general, sum to the national-level count.
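
The aggregation rules in this section can be sketched as follows. This is a simplified illustration, not the Hub's code: the records and counts are made up, the `count` key and the helper names are hypothetical, and only `Province_State` and the five-digit FIPS convention come from the description above.

```python
from collections import defaultdict

# Made-up records for illustration only; "count" is a hypothetical field name.
records = [
    {"FIPS": "25001", "Province_State": "Massachusetts", "count": 40},
    {"FIPS": "25003", "Province_State": "Massachusetts", "count": 60},
    {"FIPS": "80025", "Province_State": "Massachusetts", "count": 5},  # "Out of State"
    {"FIPS": "90025", "Province_State": "Massachusetts", "count": 7},  # "Unassigned"
    {"FIPS": "44001", "Province_State": "Rhode Island", "count": 30},
    {"FIPS": "88888", "Province_State": "Diamond Princess", "count": 3},
]

def is_out_of_state_or_unassigned(fips):
    """True for five-digit FIPS codes beginning with 80 or 90."""
    return len(fips) == 5 and fips.startswith(("80", "90"))

def state_totals(rows):
    """Sum counts over all rows sharing the same Province_State value."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Province_State"]] += row["count"]
    return dict(totals)

def national_total(rows):
    """Sum counts over every row, including the Diamond Princess."""
    return sum(row["count"] for row in rows)

states = state_totals(records)
national = national_total(records)

# Sum over Massachusetts counties only, leaving out the 80xxx/90xxx records.
county_sum_ma = sum(
    row["count"]
    for row in records
    if row["Province_State"] == "Massachusetts"
    and not is_out_of_state_or_unassigned(row["FIPS"])
)
```

In this made-up data the Massachusetts state total (112) exceeds the sum over its counties (100) because of the "Out of State" and "Unassigned" records, and the national total (145) exceeds the sum of the two states (142) because of the Diamond Princess record.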

Visualization Truth Data

The Actual line in the visualization is based on the JHU CSSE group truth data. The visualization uses this Cumulative Death JSON and this Incident Death JSON. This Python script creates these JSON files.

The data the visualization actually uses (forecasts plus truth data) is in this folder. These JSON files are created from the truth data by the commands in 0-init-vis.sh when the visualization is built. The file called "season-latest" is the default view, which also shows Cumulative Deaths. For each State key in the JSON, there is an Actual object that contains the truth data shown in the visualization. More on the JSON structure here.

Zoltar Truth Data

The Zoltar truth data is created with this Python script and can be found here.

Truth Data Update Schedule

JHU truth data are updated every 6 hours through GitHub Actions CI, while NYTimes and USAFacts data are updated manually and sporadically.