Data submission instructions

This page is intended to provide teams with all the information they need to submit forecasts. We note that these instructions have been adapted from the COVID-19 Forecast Hub.

All forecasts should be submitted directly to the data-forecasts/ folder. Data in this directory should be added to the repository through a pull request so that automatic data validation checks are run.

These instructions provide detail about the data format as well as validation that you can do prior to this pull request. In addition, we describe metadata that each model should provide.

Table of Contents

What is a forecast
Ground truth data
Data formatting
Forecast file format
Forecast validation
Weekly ensemble build
Policy on late or updated submissions

What is a forecast

Models are asked to make specific quantitative forecasts about data that will be observed in the future. These forecasts are interpreted as "unconditional" predictions about the future. That is, they are not predictions only for a limited set of possible future scenarios in which a certain set of conditions (e.g. vaccination uptake is strong, or new social-distancing mandates are put in place) hold about the future -- rather, they should characterize uncertainty across all reasonable future scenarios. In practice, all forecasting models make some assumptions about how current trends in data may change and impact the forecasted outcome; some teams select a "most likely" scenario or combine predictions across multiple scenarios that may occur. Forecasts submitted to this repository will be evaluated against observed data.

We note that other modeling efforts, such as the COVID-19 Scenario Modeling Hub, have been launched to collect and aggregate model outputs from "scenario projection" models. These models create longer-term projections under a specific set of assumptions about how the main drivers of the pandemic (such as non-pharmaceutical intervention compliance, or vaccination uptake) may change over time.

Ground truth data

This project treats hospitalization data reported from the HHS Protect system at HealthData.gov as "ground truth" data. We create processed versions of these data that are stored in this repository.

Details on how ground truth data are defined can be found in the data-truth folder README file.

Data formatting

The automatic checks in place for forecast files submitted to this repository validate both the filename and the file contents to ensure the file can be used in the visualization and in ensemble forecasting.

Subdirectory

Each subdirectory within the data-forecasts/ directory has the format

team-model

where

team is the teamname and
model is the name of your model.

Both team and model should be fewer than 15 characters long and must not include hyphens. Each model name should be distinct from every other model name in the project.

Within each subdirectory, there should be a metadata file, a license file (optional), and a set of forecasts.

Metadata

The metadata file should have the following format

metadata-team-model.txt

and here is the structure of the metadata file.

License (optional)

By default, forecasts are released under a CC-BY 4.0 license. If you would like to release your forecasts under a different license, please specify a standard license in the license field of your metadata file. Alternatively, if you wish to use a license that is not in the list of standard licenses, you may include a

LICENSE.txt

file in your model directory.

Forecasts

Each forecast file within the subdirectory should have the following format

YYYY-MM-DD-team-model.csv

where

YYYY is the 4 digit year,
MM is the 2 digit month,
DD is the 2 digit day,
team is the teamname, and
model is the name of your model.

The date YYYY-MM-DD is the forecast_date. For this project, the forecast_date should always be the Monday on which the submission is due.

The team and model in the filename must match the team and model of the directory that contains the file. Both team and model should be fewer than 15 characters long and contain only alphanumeric characters and underscores, with no spaces or hyphens.
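The naming rules above can be checked locally before opening a pull request. The following is a minimal sketch, not the hub's own validation code: the function name `check_forecast_filename` and the regular expression are hypothetical, and the pattern reads "less than 15 characters" strictly as 1 to 14 characters.

```python
import re
from datetime import date

# Assumption: team and model are each 1-14 letters, digits, or underscores.
FILENAME_RE = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})-([A-Za-z0-9_]{1,14})-([A-Za-z0-9_]{1,14})\.csv$"
)


def check_forecast_filename(filename, directory):
    """Return a list of problems; an empty list means the name looks valid."""
    match = FILENAME_RE.match(filename)
    if match is None:
        return ["filename does not match YYYY-MM-DD-team-model.csv"]
    year, month, day, team, model = match.groups()
    try:
        forecast_date = date(int(year), int(month), int(day))
    except ValueError:
        return ["date in filename is not a real calendar date"]
    problems = []
    if forecast_date.weekday() != 0:  # date.weekday() returns 0 for Monday
        problems.append("forecast_date is not a Monday")
    if directory != f"{team}-{model}":
        problems.append("team-model does not match the directory name")
    return problems


# 2022-01-10 was a Monday, so this name passes every check.
print(check_forecast_filename("2022-01-10-Flusight-baseline.csv", "Flusight-baseline"))
```

Returning a list of problems rather than raising on the first one lets a submitter see every naming issue in a single run.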

Forecast file format

The file must be a comma-separated value (csv) file with the following columns (in any order):

forecast_date
target
target_end_date
location
type
quantile
value

No additional columns are allowed.

Each row in the file is either a point or quantile forecast for a location on a particular date for a particular target.
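As an illustration of the column requirement, the sketch below reads only the header row of a CSV and reports missing or extra columns. The helper name `check_columns` is hypothetical, and the hub's automated checks may differ in detail.

```python
import csv
import io

REQUIRED_COLUMNS = {
    "forecast_date", "target", "target_end_date",
    "location", "type", "quantile", "value",
}


def check_columns(csv_text):
    """Return (missing, extra) column names, each as a sorted list."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # empty input yields an empty header
    found = set(header)
    missing = sorted(REQUIRED_COLUMNS - found)
    extra = sorted(found - REQUIRED_COLUMNS)
    return missing, extra


# Column order does not matter, so this header is accepted.
good = (
    "location,target,forecast_date,target_end_date,type,quantile,value\n"
    "US,1 wk ahead inc flu hosp,2022-01-10,2022-01-15,point,NA,100\n"
)
print(check_columns(good))
```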

`forecast_date`

Values in the forecast_date column must be a date in the format

YYYY-MM-DD

This is the date of the Monday on which the forecasts were due to be submitted. The forecast_date value should match the date in the filename. It is redundant with the filename and is included here at the request of some analysts.

`target`

Values in the target column must be a character (string) and be one of the following specific targets:

“N wk ahead inc flu hosp” where N is an integer from 1 to 4

For week-ahead forecasts, we will use the specification of epidemiological weeks (EWs) defined by the US CDC, which run Sunday through Saturday. There are standard software packages to convert from dates to epidemic weeks and vice versa, for example MMWRweek for R, and pymmwr and epiweeks for Python.

For week-ahead forecasts with forecast_date of Monday of EW12, a 1 week ahead forecast corresponds to EW12 and should have target_end_date of the Saturday of EW12.

N week ahead inc flu hosp

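This target is the number of new weekly hospitalizations predicted by the model for the epidemiological week that ends on target_end_date. Following the convention above, the week containing forecast_date counts as week 1, so a 1 week ahead forecast covers the same EW as forecast_date.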

`target_end_date`

Values in the target_end_date column must be a date in the format

YYYY-MM-DD

This is the date for the forecast target. For “N wk ahead” targets, target_end_date will be the Saturday at the end of the target week.
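Because forecast_date is always a Monday and epidemiological weeks end on Saturday, the target_end_date for an N week ahead target can be computed with plain date arithmetic: the Saturday of the same week is 5 days after the Monday, and each further week adds 7 days. The sketch below assumes this reading of the EW12 example given for the target column. The function name is hypothetical.

```python
from datetime import date, timedelta


def target_end_date(forecast_date, n_weeks_ahead):
    """Saturday that ends the week targeted by an N week ahead forecast."""
    if forecast_date.weekday() != 0:  # date.weekday() returns 0 for Monday
        raise ValueError("forecast_date must be a Monday")
    if n_weeks_ahead not in (1, 2, 3, 4):
        raise ValueError("N must be an integer from 1 to 4")
    # 1 wk ahead ends on the Saturday of the same epidemiological week.
    return forecast_date + timedelta(days=5 + 7 * (n_weeks_ahead - 1))


monday = date(2022, 1, 10)
print(target_end_date(monday, 1))  # 2022-01-15
print(target_end_date(monday, 4))  # 2022-02-05
```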

`location`

Values in the location column must be one of the “locations” in this FIPS numeric code file which includes numeric FIPS codes for U.S. states and selected jurisdictions (Washington DC, Puerto Rico, and the US Virgin Islands) as well as “US” for national forecasts.

Please note that FIPS codes should be written as character strings to preserve any leading zeroes.
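Leading zeroes are easy to lose when a code passes through a numeric type, for example 06 becoming 6. A minimal sketch of a normalizing helper follows. The name `format_fips` is hypothetical, and the sketch assumes two-digit state-level codes plus the literal "US".

```python
def format_fips(code):
    """Return a location code as a string, restoring a dropped leading zero."""
    text = str(code).strip()
    if text == "US":  # national forecasts use "US" rather than a numeric code
        return text
    return text.zfill(2)  # assumption: state-level codes are two digits


print(format_fips(6))     # 06
print(format_fips("72"))  # 72
print(format_fips("US"))  # US
```

When reading a forecast file back with a data-frame library, it is worth making sure the location column is parsed as text rather than as integers, otherwise the same problem reappears on load.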

`type`

Values in the type column are either

“point” or
“quantile”.

This value indicates whether that row corresponds to a point forecast or a quantile forecast. Point forecasts are used in visualization while quantile forecasts are used in visualization and in ensemble construction.

When point forecasts are not included, the median for every location-target pair will be interpreted as the point forecast.
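The fallback rule can be sketched as follows: for each location-target pair, use the point row if one exists, otherwise use the 0.5 quantile. This is an illustration of the stated rule only, not the hub's visualization code. The function name and the list-of-dicts row layout are assumptions.

```python
def point_forecasts(rows):
    """Map (location, target) to a point value, falling back to the median."""
    points = {}
    medians = {}
    for row in rows:
        key = (row["location"], row["target"])
        if row["type"] == "point":
            points[key] = float(row["value"])
        elif row["type"] == "quantile" and float(row["quantile"]) == 0.5:
            medians[key] = float(row["value"])
    # Explicit point rows take precedence over medians.
    return {**medians, **points}


rows = [
    {"location": "US", "target": "1 wk ahead inc flu hosp",
     "type": "quantile", "quantile": "0.5", "value": "120"},
    {"location": "06", "target": "1 wk ahead inc flu hosp",
     "type": "quantile", "quantile": "0.500", "value": "30"},
    {"location": "06", "target": "1 wk ahead inc flu hosp",
     "type": "point", "quantile": "NA", "value": "28"},
]
print(point_forecasts(rows))
```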

`quantile`

Values in the quantile column are either “NA” (if type is “point”) or a quantile in the format

0.###

For quantile forecasts, this value indicates the quantile for the value in this row.

Teams must provide the following 23 quantiles:

c(0.01, 0.025, seq(0.05, 0.95, by = 0.05), 0.975, 0.99)

##  [1] 0.010 0.025 0.050 0.100 0.150 0.200 0.250 0.300 0.350 0.400 0.450 0.500
## [13] 0.550 0.600 0.650 0.700 0.750 0.800 0.850 0.900 0.950 0.975 0.990
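For teams working in Python, the same 23 quantile levels can be built and checked as sketched below. The helper `missing_quantiles` is hypothetical. It rounds to three decimals so that values written as 0.1 and 0.100 compare equal.

```python
# The same levels as the R expression above: 0.01, 0.025, 0.05 ... 0.95, 0.975, 0.99
REQUIRED_QUANTILES = (
    [0.01, 0.025]
    + [round(0.05 * k, 2) for k in range(1, 20)]
    + [0.975, 0.99]
)


def missing_quantiles(submitted):
    """Return the required quantile levels absent from `submitted`."""
    have = {round(float(q), 3) for q in submitted}
    return [q for q in REQUIRED_QUANTILES if round(q, 3) not in have]


print(len(REQUIRED_QUANTILES))  # 23
print(missing_quantiles([f"{q:.3f}" for q in REQUIRED_QUANTILES]))  # []
```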

`value`

Values in the value column are non-negative numbers indicating the “point” or “quantile” prediction for this row. For a “point” prediction, value is simply the value of that point prediction for the target and location associated with that row. For a “quantile” prediction, value is the inverse of the predictive cumulative distribution function, evaluated at the quantile level in that row, for the target and location associated with that row.
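To make the inverse-CDF definition concrete, the sketch below evaluates the inverse CDF of a predictive distribution at a few quantile levels. The normal distribution is purely an illustrative assumption, since real models will have their own predictive distributions. Results are clipped at zero because value must be non-negative. The function name is hypothetical.

```python
from statistics import NormalDist


def quantile_values(mean, sd, quantiles):
    """Inverse CDF of an assumed normal predictive distribution, floored at 0."""
    dist = NormalDist(mu=mean, sigma=sd)
    return [max(0.0, dist.inv_cdf(q)) for q in quantiles]


# Hypothetical forecast: about 100 hospitalizations with a standard deviation of 20.
values = quantile_values(100, 20, [0.025, 0.5, 0.975])
print([round(v, 1) for v in values])
```

Because an inverse CDF is non-decreasing in the quantile level, the values for one location-target pair should never decrease as the quantile increases. Sorting the rows by quantile and comparing neighbours is a quick way to check this.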

Forecast validation

To ensure proper data formatting, pull requests that add new data to data-forecasts/ will be automatically validated.

Pull request forecast validation

When a pull request is submitted, the data are validated through GitHub Actions, which runs the tests present in the validations repository. The intent of these tests is to validate the requirements above. Please let us know if you encounter issues while running the tests.

Weekly ensemble build

Every Monday at 11pm ET, we will generate the ensemble forecast using a single valid forecast from each team that submitted in the current week.

Policy on late or updated submissions

In order to ensure that forecasting is done in real-time, all forecasts are required to be submitted to this repository by 11pm ET on Mondays each week. We do not accept late forecasts.
