ERA5 Data on S3 via AWS Public Dataset Program

To provide cloud-based access to ERA5 reanalysis data, Intertrust is working in conjunction with the AWS Public Dataset Program to publish and maintain regular updates of ERA5 data in S3.

This documentation outlines the dataset's details, available parameters, location and structure on S3, and includes examples of how to access and work with the data.

Please refer to the ECMWF website for the official ERA5 data documentation.

For the list of dataset updates and changes, please refer to the Changelog file.

Introduction

The ERA5 climate reanalysis provides a numerical assessment of the modern climate. It is produced by a process similar to regular numerical weather forecasting: a data assimilation and forecast loop that takes into account most of the available meteorological observations and analyses them with a state-of-the-art numerical model, producing a continuous, spatially consistent and homogeneous dataset.

The dataset provides all essential atmospheric meteorological parameters, including but not limited to air temperature, pressure and wind at different altitudes. It also provides surface parameters such as rainfall and soil moisture content, and sea parameters such as sea-surface temperature and wave height. ERA5 provides data at a considerably higher spatial and temporal resolution than its legacy counterpart, ERA-Interim. ERA5 consists of a high-resolution version with 31 km horizontal resolution and a reduced-resolution ensemble version with 10 members.

Data is currently available from 1979 onward and is updated monthly. As ECMWF moves towards more frequent data updates, the Intertrust team will work to match the data refresh with the ECMWF source.

Overview

Source: ECMWF WebAPI
Category: Climate Reanalysis
Format: NetCDF
License: Generated using Copernicus Climate Change Service Information 2018. See http://apps.ecmwf.int/datasets/licences/copernicus/ for additional information.
Storage: Amazon S3
Location:
  Amazon Resource Name (ARN): arn:aws:s3:::era5-pds
  AWS Region: us-east-1
  URL: http://era5-pds.s3.amazonaws.com/
Update Frequency: New data is published monthly. The ERA5 Public Release Plan is available at http://climate.copernicus.eu/products/climate-reanalysis

Variables

The table below lists the 18 ERA5 variables that are available on S3. All variables are surface or single level parameters sourced from the HRES sub-daily forecast stream.

Variable names differ slightly from those used by ECMWF. An explanation of how the variable names are derived is available here: https://github.com/planet-os/notebooks/blob/master/aws/variables_name_derivation.md

Variable Name | File Name | Variable Type (fc/an)
10 metre U wind component | eastward_wind_at_10_metres.nc | an
10 metre V wind component | northward_wind_at_10_metres.nc | an
100 metre U wind component | eastward_wind_at_100_metres.nc | an
100 metre V wind component | northward_wind_at_100_metres.nc | an
2 metre dew point temperature | dew_point_temperature_at_2_metres.nc | an
2 metre temperature | air_temperature_at_2_metres.nc | an
2 metres maximum temperature since previous post-processing | air_temperature_at_2_metres_1hour_Maximum.nc | fc
2 metres minimum temperature since previous post-processing | air_temperature_at_2_metres_1hour_Minimum.nc | fc
Mean sea level pressure | air_pressure_at_mean_sea_level.nc | an
Sea surface temperature | sea_surface_temperature.nc | an
Mean wave period | sea_surface_wave_mean_period.nc |
Mean direction of waves | sea_surface_wave_from_direction.nc |
Significant height of combined wind waves and swell | significant_height_of_wind_and_swell_waves.nc |
Snow density | snow_density.nc | an
Snow depth | lwe_thickness_of_surface_snow_amount.nc | an
Surface pressure | surface_air_pressure.nc | an
Surface solar radiation downwards | integral_wrt_time_of_surface_direct_downwelling_shortwave_flux_in_air_1hour_Accumulation.nc | fc
Total precipitation | precipitation_amount_1hour_Accumulation.nc | fc

The date and time of each data value is its valid time. The mapping from forecast time to valid time follows Table 0 of the ECMWF ERA5 documentation.

ERA5 can provide two different versions of some variables: analysis or forecast. An analysis is a field into which observations from the same time step have been assimilated. A forecast, by contrast, is purely a model calculation. For example, variables such as 2 m temperature and surface pressure are analysed at each time step, because enough near-surface observations are available. Precipitation, on the other hand, is an example of a forecast variable.

A full model analysis cycle is performed every 12 hours, at 06:00 and 18:00 UTC. For forecast fields, the first 12 forecast hours of each forecast run are used; these runs start at 06:00 and 18:00 UTC. A sample highlighting key times of this mapping is included below for reference.

Valid time and the corresponding ERA5 HRES sub-daily forecast:
Valid Date | Valid Time | Forecast Date | Forecast Run | Step
date | 00:00 | date - 1 | 18:00 | 6
date | 06:00 | date - 1 | 18:00 | 12
date | 07:00 | date | 06:00 | 1
date | 18:00 | date | 06:00 | 12
date | 19:00 | date | 18:00 | 1
date | 23:00 | date | 18:00 | 5
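The mapping in the table follows a simple rule, sketched below as a Python helper. The name `forecast_source` is hypothetical and is not part of any published ERA5 tooling. The sketch makes two assumptions: valid times are timezone-naive UTC values that fall exactly on the hour, and steps 1 to 12 of the 06:00 and 18:00 UTC runs are used, as described above.

```python
from datetime import datetime, timedelta


def forecast_source(valid: datetime) -> tuple[datetime, int]:
    """Return (forecast run start, step in hours) for an hourly UTC valid time.

    Hypothetical helper; assumes 06:00/18:00 UTC runs contributing steps 1-12.
    """
    if valid.minute or valid.second or valid.microsecond:
        raise ValueError("valid time must fall exactly on the hour")
    midnight = valid.replace(hour=0, minute=0, second=0, microsecond=0)
    if 7 <= valid.hour <= 18:
        run = midnight + timedelta(hours=6)    # same day's 06:00 run
    elif valid.hour >= 19:
        run = midnight + timedelta(hours=18)   # same day's 18:00 run
    else:
        run = midnight - timedelta(hours=6)    # previous day's 18:00 run
    step = (valid - run) // timedelta(hours=1)  # whole hours since the run started
    return run, step
```

For instance, `forecast_source(datetime(2008, 1, 15, 0))` returns the 18:00 run of 14 January with step 6, matching the first row of the table.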

If there are specific variables you would like to recommend for future inclusion, please contact datahub@intertrust.com.

Data Structure

The ERA5 dataset has been transformed to optimize access by specific variables and temporal ranges. To accommodate this, data is divided into distinct NetCDF granules organized by year, month, and variable name.

The data is structured as follows:

/{year}/{month}/main.nc
               /data/{var1}.nc
                    /{var2}.nc
                    /{....}.nc
                    /{varN}.nc

where year is expressed as four digits (YYYY) and month as two digits (MM). Individual data variables (var1 through varN) use names based on the NetCDF CF standard name convention, plus any applicable additional information such as the vertical coordinate.

Granule variable structure and metadata attributes are stored in main.nc. This file contains coordinate and auxiliary variable data, and is also annotated using NetCDF CF metadata conventions.

A sample path for air temperature would take the following form:

/2008/01/data/air_temperature_at_2_metres.nc
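Building keys from this layout is plain string formatting. The sketch below uses hypothetical helper names. It assumes, consistent with the example URL under Data Access, that the S3 object key is the path shown above without its leading slash.

```python
BASE_URL = "http://era5-pds.s3.amazonaws.com"


def granule_key(year: int, month: int, variable: str) -> str:
    """Key of one variable granule, e.g. 2008/01/data/air_temperature_at_2_metres.nc."""
    return f"{year:04d}/{month:02d}/data/{variable}.nc"


def main_key(year: int, month: int) -> str:
    """Key of the month's main.nc file (coordinates and metadata)."""
    return f"{year:04d}/{month:02d}/main.nc"


def granule_url(year: int, month: int, variable: str) -> str:
    """Public HTTP URL of one granule."""
    return f"{BASE_URL}/{granule_key(year, month, variable)}"


def monthly_keys(variable: str, start: tuple[int, int], end: tuple[int, int]):
    """Yield granule keys for every month from start to end inclusive.

    start and end are (year, month) pairs; tuples compare year first, then month.
    """
    year, month = start
    while (year, month) <= end:
        yield granule_key(year, month, variable)
        month += 1
        if month > 12:
            year, month = year + 1, 1
```

Each file holds one month of one variable, so a multi-month request becomes a list of keys. For example, `list(monthly_keys("snow_density", (2007, 11), (2008, 2)))` yields four keys, from `2007/11/data/snow_density.nc` through `2008/02/data/snow_density.nc`.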

Versioning

To provide a means for correcting potential processing errors in individual granule files, bucket versioning will be used. This solution allows for consistent S3 file paths for end users of the data, and also allows for recovery of previous file versions if necessary. Should an issue occur that requires the rewriting of data granules, we will publish details of the incident as well as the affected files on the ERA5 dataset page.

In the unlikely event that a major update affecting the data structure or its dimensionality is required, such changes will be published as a distinct version of the dataset.
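Because the bucket is versioned, earlier copies of a granule can in principle be retrieved by version ID. The sketch below uses the boto3 S3 client methods `list_object_versions` and `download_file` (with `ExtraArgs={"VersionId": ...}`), where `client` is any boto3 S3 client. The helper names are hypothetical. The sketch also assumes the bucket policy allows your credentials, or anonymous requests, to list object versions, which this document does not confirm.

```python
BUCKET = "era5-pds"


def previous_versions(client, key: str) -> list[dict]:
    """Non-current versions of one object, newest first.

    Assumes version listing is permitted. One call returns up to 1,000
    versions, assumed to be enough for a single key. Delete markers are ignored.
    """
    response = client.list_object_versions(Bucket=BUCKET, Prefix=key)
    older = [
        v for v in response.get("Versions", [])
        if v["Key"] == key and not v["IsLatest"]  # Prefix may also match longer keys
    ]
    return sorted(older, key=lambda v: v["LastModified"], reverse=True)


def fetch_version(client, key: str, version_id: str, dest: str) -> None:
    """Download one specific version of an object to a local file."""
    client.download_file(BUCKET, key, dest, ExtraArgs={"VersionId": version_id})
```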

Data Access

The data is publicly available in the ERA5 S3 bucket (era5-pds) and may be directly accessed there. Please note that the best transfer speeds will be achieved by accessing the data from an EC2 instance located in the same AWS region as the S3 bucket (us-east-1).

Data may be accessed via HTTP using the S3 REST API. To make a GET request, use the bucket name and the full key name of the object. For example, to download 2 metre air temperature for January 2008, submit a GET request to the following URL: http://era5-pds.s3.amazonaws.com/2008/01/data/air_temperature_at_2_metres.nc
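The same REST API can be used with only the Python standard library. This sketch downloads an object with a plain GET and lists the keys under a prefix with a ListObjectsV2 request (`?list-type=2&prefix=...`). The helper names are hypothetical, and the listing call assumes the bucket permits anonymous `ListBucket` requests.

```python
import shutil
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

BASE_URL = "http://era5-pds.s3.amazonaws.com"
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}  # S3 XML namespace


def download(key: str, dest: str) -> None:
    """Stream one object to a local file with a plain HTTP GET."""
    with urllib.request.urlopen(f"{BASE_URL}/{key}") as resp, open(dest, "wb") as out:
        shutil.copyfileobj(resp, out)


def list_url(prefix: str) -> str:
    """URL of a ListObjectsV2 request for keys under prefix."""
    query = urllib.parse.urlencode({"list-type": "2", "prefix": prefix})
    return f"{BASE_URL}/?{query}"


def parse_keys(xml_bytes: bytes) -> list[str]:
    """Extract object keys from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_bytes)
    return [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]


def list_keys(prefix: str) -> list[str]:
    """List keys under prefix; a single request returns at most 1,000 keys."""
    with urllib.request.urlopen(list_url(prefix)) as resp:
        return parse_keys(resp.read())
```

For example, `list_keys("2008/01/")` would list `main.nc` and the data granules for January 2008, and `download("2008/01/data/air_temperature_at_2_metres.nc", "t2m.nc")` fetches the object from the URL above.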

Another option is to use the AWS SDK or CLI. We’ve published a Jupyter notebook on GitHub that provides an example of how to access ERA5 data in Python using boto.
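Here is a minimal boto3 sketch, separate from the notebook mentioned above. It creates an anonymous (unsigned) client, lists the keys for one month, and downloads one variable. The helper names are hypothetical. The SDK calls it relies on (`boto3.client`, `Config(signature_version=UNSIGNED)`, `get_paginator("list_objects_v2")`, `download_file`) are standard boto3/botocore APIs, but anonymous access assumes the bucket remains public.

```python
BUCKET = "era5-pds"


def make_anonymous_client():
    """S3 client that sends unsigned requests, so no AWS credentials are needed."""
    # Imported here so the helpers below only depend on a client-like object.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config
    return boto3.client("s3", region_name="us-east-1",
                        config=Config(signature_version=UNSIGNED))


def list_month(client, year: int, month: int) -> list[str]:
    """All keys under {year}/{month}/, following pagination."""
    prefix = f"{year:04d}/{month:02d}/"
    keys = []
    for page in client.get_paginator("list_objects_v2").paginate(Bucket=BUCKET, Prefix=prefix):
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
    return keys


def download_variable(client, year: int, month: int, variable: str, dest: str) -> str:
    """Download one variable granule to dest and return its key."""
    key = f"{year:04d}/{month:02d}/data/{variable}.nc"
    client.download_file(BUCKET, key, dest)
    return key
```

For example, `download_variable(make_anonymous_client(), 2008, 1, "air_temperature_at_2_metres", "t2m_2008_01.nc")` requests the same object as the example GET request above.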

Use Cases & Examples