To run the experiment used to generate figures in the report:
$ python train.py --cfg=config/report.yaml
In late 2021, 'AI for the study of Environmental Risks' (AI4ER), one of Cambridge's EPSRC-funded Centres for Doctoral Training (CDT), launched a group team challenge (GTC) amongst its 2021 cohort. The cohort was split into two groups of four: one was tasked with building a neural network (NN) capable of classifying ice and open water in the Antarctic's Bellingshausen Sea ('Ice Group'), whilst the other set out to create an NN able to predict wildfire in the eastern European region of Polesia ('Fire Group'). Both projects began in December 2021 and were due to end by March 2022.
The European Space Agency's (ESA) Φ-lab Division (of the Future Systems Department of the EO Programmes Directorate within ESA) launched an initiative named AI for Earth Observation (AI4EO) in 2019. It was in collaboration with this initiative that Cambridge's AI4ER program created the two GTC projects: the 'Ice Group' benefitted from the domain-specific guidance of the British Antarctic Survey (BAS), whilst the 'Fire Group' was aided by the British Trust for Ornithology (BTO).
The aim of this project has been to identify spatial relationships between wildfire distribution and the climatic, topographical, and pedological drivers of wildfire. This was to be achieved by using convolutional neural networks (CNNs) to reduce the dimensionality of various geospatial datasets. The motivation was to provide a new tool for predicting wildfire genesis, which could help mitigate future destructive wildfire events; when too frequent, such events are known to damage local ecosystems.
To narrow the geospatial bounds of the project, a study area was defined encompassing the north of Ukraine and the south of Belarus, a region roughly known as Polesia. This region was selected to build upon previous research commissioned by the BTO: a landcover classification algorithm had been trained specifically on a Polesian sub-region in 2020, and applying this algorithm to the wider study area provided one of the major predictor datasets for this project.
Figure 1 - Polesia's bounds in relation to the rest of Europe. The magnification provides clearer bounds of both the Polesia project area and the land cover training sub-region in relation to the two countries of Belarus and Ukraine. The map is provided by OpenStreetMap under the ODbL [OSM, 2022], whilst the plotting itself was done in QGIS, which is publicly available under the GNU GPL [QGIS, 2022].
As directed by partners of this project, the target dataset used was the 'Fire_cci Burned Area dataset' as generated and served by the ESA. This dataset was originally derived from spectral band data from MODIS alongside thermal data from MODIS active fire products. This dataset was processed into monthly batches and each pixel was assigned a burn/no-burn value (see Figure 2); this target dataset provided a necessary ground truth with which to iteratively improve the NN's performance when processing the predictor datasets. Area 3 was selected as it covers the Polesia region.
Figure 2 - Simplified plot of MODIS-derived Fire_CCI51 dataset, with burnt and unburnt areas apparent for March 2020.
3.2.1 - Landcover Types (Sentinel-2+Classifier)
Based on previous work commissioned by the BTO, a Google Earth Engine based land cover classifier built by Artio Earth Observation LLP was made available to this project via a public GitHub page. This classifier was based on a random forest (RF) classification algorithm which had been pre-trained for a sub-region of Polesia (see Figure 1). By applying this algorithm to satellite imagery gathered by Sentinel-2, a full landcover map of the entire project area was generated (classifying pixels into one of nine simple landcover types) with a maximum spatial resolution of 20m (see Figure 3).
Figure 3 - Land cover map for 2018 generated using the RF algorithm in combination with Sentinel satellite imagery.
3.2.2 - Snow Cover/Depth, Soil Moisture, Surface Temperature (ERA5)
To obtain datasets pertaining to potential cryological, pedological, and climatological predictors of wildfires, ERA5 reanalysis data generated by the European Centre for Medium-Range Weather Forecasts (ECMWF) was used. By selectively downloading and splitting apart ERA5 netCDF files it is possible to obtain four important predictor datasets: snow cover, snow depth (m of water equivalent), soil moisture (m3 H2O m-3 soil), and 2m surface temperature (K). All of the reanalysis datasets retained their 31km spatial resolution. These four predictors are visualised for a single month in Figures 4, 5 and 6 below. In this project, however, the ERA5 indices were provided to us at monthly temporal resolution for the Polesia region by Martin Rogers (BAS).
Figure 4 - Snow depth and cover across Polesia in December 2020 based on ERA5 reanalysis dataset.
Figure 5 - Soil moisture across Polesia in December 2020 based on ERA5 reanalysis dataset.
Figure 6 - Surface air temperature across Polesia in December 2020 based on ERA5 reanalysis dataset.
The data for the four indices mentioned above, and the scripts used to create them, can be downloaded from this GitHub repository.
3.2.3 - Normalised Difference Moisture and Water Indices (Sentinel-2)
Two further predictors considered were the Normalised Difference Moisture Index (NDMI), which estimates the moisture content of vegetation, and the Normalised Difference Water Index (NDWI), which is sensitive to changes in water content in bodies of water.
As both indices can be derived from specific spectral bands in satellite imagery, Sentinel-2 data was deemed a suitable dataset due to its spatial resolution of 20m and its temporal coverage of 2016-2020. In general, NDMI is calculated using the NIR and SWIR bands, (NIR - SWIR) / (NIR + SWIR), whilst NDWI is calculated using the GREEN and NIR bands, (GREEN - NIR) / (GREEN + NIR). In the specific case of Sentinel-2 data, NDMI was estimated using bands 8 & 11 as proxies for NIR & SWIR, (B08 - B11) / (B08 + B11), whilst NDWI required bands 3 & 8, (B03 - B08) / (B03 + B08). For the sake of visualisation, Figures 7 & 8 provide plots of NDMI and NDWI respectively for June 2020. In practice, however, no pre-processing was applied to the bands before they were used in the CNN (i.e. B03, B08, and B11 were fed into the CNN as raw data), as such manual dimensionality reduction would be functionally redundant.
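The two ratios above can be sketched with NumPy as follows. This is an illustrative example only, not code from this repository: the function names and the choice to return NaN where both bands are zero are assumptions, and the CNN itself received the raw bands rather than these indices.

```python
import numpy as np


def normalised_difference(a, b):
    """Return (a - b) / (a + b) element-wise, with NaN where a + b == 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    total = a + b
    # Pre-fill with NaN so pixels with a zero denominator stay undefined.
    out = np.full(total.shape, np.nan)
    np.divide(a - b, total, out=out, where=total != 0)
    return out


def ndmi(b08, b11):
    """NDMI from Sentinel-2 band 8 (NIR) and band 11 (SWIR)."""
    return normalised_difference(b08, b11)


def ndwi(b03, b08):
    """NDWI from Sentinel-2 band 3 (GREEN) and band 8 (NIR)."""
    return normalised_difference(b03, b08)


# Tiny hypothetical reflectance tiles (1 x 2 pixels) for demonstration.
b03 = np.array([[0.1, 0.0]])
b08 = np.array([[0.3, 0.0]])
b11 = np.array([[0.1, 0.0]])
ndmi_tile = ndmi(b08, b11)
ndwi_tile = ndwi(b03, b08)
```

Both indices fall in the range -1 to 1; the second pixel is left as NaN because both input bands are zero there.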
Figure 7 - NDMI data generated from raw Sentinel-2 multispectral band data. A strip of cloud cover is seen on the left-hand side of the image as well as a patch on the right.
Figure 8 - NDWI data generated from raw Sentinel-2 multispectral band data. A strip of cloud cover is seen on the left-hand side of the image as well as a patch on the right.
As noted above, four datasets were used in this repository.
Polesia Landcover mapping was generated using the following open-access mapping repository: https://github.com/tpfd/Polesia-Landcover
This generates both a 'simple' and a 'complex' landcover mapping, with the former splitting out 9 categories of landcover, and the latter 13. Classified tiles can be downloaded at this GitHub repository.
MODIS Burned Area, our ground truth, is provided by ESA, and contains burned area and confidence level information on a per-pixel basis, derived from the MODIS satellite instrument. The dataset comprises separate monthly files. We select Area 3 as it covers the Polesia region. The pre-processing of the dataset is done within our custom TorchGeo 'MODIS' class and consists of the following:
- Binarize - We convert the numeric Julian day of burn into a binary value: '0' for no burn or '1' for burn observed within a given month.
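The binarisation step can be sketched as below. This is a minimal illustration and not the code inside the custom TorchGeo 'MODIS' class: the function name is hypothetical, and it assumes that the Julian-day layer stores 0 for unburned pixels, positive day-of-year values for burned pixels, and negative flags for unobserved or unburnable pixels, all of which are mapped to '0' here.

```python
import numpy as np


def binarise_burn_date(julian_day):
    """Map a Julian-day-of-burn raster to a 0/1 burn mask.

    Assumption: values > 0 are the day of first burn detection; 0 and
    negative flag values mean no burn was recorded for that pixel.
    """
    julian_day = np.asarray(julian_day)
    return (julian_day > 0).astype(np.uint8)


# Hypothetical 3 x 2 tile of Julian-day values for one month.
tile = np.array([[0, 75], [-1, -2], [366, 1]])
mask = binarise_burn_date(tile)
```

If unobserved pixels should be excluded from training rather than treated as unburned, a separate validity mask would be needed alongside the burn mask.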
Sentinel-2 Spectral Reflectance is provided by ESA, and consists of bands 3, 8 and 11, which are the bands that make up the NDWI and NDMI indices.
To download Sentinel-2 and MODIS data on JASMIN HPC, run the download_data.py script in the src/data_loading folder.
- MODIS Burned Area product for the years 2000-2020. This method pulls from JASMIN storage and unzips the files to the location specified in the modis output variable.
- 3 bands of Sentinel-2 data: B3, B8 and B11, rolled up to monthly level and normalised to within 0-1. This method downloads the years 2017-2020. Each band is downloaded to a separate file. The Polesia region is split into 87 tiles to enable download.
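As a rough illustration of what "rolled up to monthly level and normalised to within 0-1" could look like for a single band, the sketch below averages a stack of scenes per pixel and then applies min-max scaling. The aggregation (a NaN-aware mean) and the scaling method are assumptions made for this example; the download script may use a different reducer or scale factor, and the function names are hypothetical.

```python
import numpy as np


def monthly_composite(scenes):
    """Collapse a (time, height, width) stack into one image.

    Assumption: cloudy or missing pixels are stored as NaN and are
    ignored when taking the per-pixel mean.
    """
    return np.nanmean(np.asarray(scenes, dtype=np.float64), axis=0)


def min_max_normalise(band):
    """Rescale a 2D band so its values lie within 0-1."""
    lo = np.nanmin(band)
    hi = np.nanmax(band)
    if hi == lo:
        # A constant image carries no contrast; return zeros.
        return np.zeros_like(band, dtype=np.float64)
    return (band - lo) / (hi - lo)


# Two hypothetical 2 x 2 scenes from the same month, one with a cloudy pixel.
scenes = np.array([
    [[0.0, 2.0], [4.0, np.nan]],
    [[2.0, 4.0], [8.0, 6.0]],
])
composite = monthly_composite(scenes)
normalised = min_max_normalise(composite)
```

Min-max scaling per tile makes values depend on the tile's own range, so a fixed divisor shared across tiles may be preferable when tiles need to be comparable.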
To create and activate the environment necessary for data download and processing:
$ conda env create -f data_envs.yml -n dataenv
$ conda activate dataenv
$ (dataenv)
To create and activate the environment necessary to train the model:
$ conda env create -f environment.yml -n modelenv
$ conda activate modelenv
$ (modelenv)
- Authenticate your Earth Engine account in this environment - https://developers.google.com/earth-engine/guides/python_install
- This script is designed to be run on JASMIN HPC - the Sentinel portion will work locally but the MODIS unzip will not. MODIS data can also be accessed freely via the CEDA archive.
The ERA5 indices at monthly resolution for the Polesia region were provided to us by Martin Rogers (BAS).
(a) Temperature 2m above land surface
(b) Snow Cover
(c) Snow Depth
(d) Volumetric Soil Water
The data for the four indices mentioned above, and the scripts used to create them, can be downloaded at this GitHub repository.
├── LICENSE
├── Makefile           <- Makefile with commands like `make init` or `make lint-requirements`
├── README.md          <- The top-level README for developers using this project.
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│   │                     the creator's initials, and a short `-` delimited description, e.g.
│   │                     `1.0_jqp_initial-data-exploration`.
│   ├── exploratory    <- Notebooks for initial exploration.
│   └── reports        <- Polished notebooks for presentations or intermediate results.
│
├── report             <- Generated analysis as HTML, PDF, LaTeX, etc.
│   ├── figures        <- Generated graphics and figures to be used in reporting
│   └── sections       <- LaTeX sections. The report folder can be linked to your overleaf
│                         report with github submodules.
│
├── requirements       <- Directory containing the requirement files.
│
├── setup.py           <- makes project pip installable (pip install -e .) so src can be imported
├── src                <- Source code for use in this project.
│   ├── __init__.py    <- Makes src a Python module
│   │
│   ├── data_loading   <- Scripts to download or generate data
│   │
│   ├── preprocessing  <- Scripts to turn raw data into clean data and features for modeling
│   │
│   ├── models         <- Scripts to train models and then use trained models to make
│   │                     predictions
│   │
│   └── tests          <- Scripts for unit tests of your functions
│
└── setup.cfg          <- setup configuration file for linting rules
Project Core Members:
- Campbell, Hamish. (AI4ER Cohort-2021, University of Cambridge)
- Colverd, Grace. (AI4ER Cohort-2021, University of Cambridge)
- Højlund-Dodd, Thomas. (AI4ER Cohort-2021, University of Cambridge)
- Stefanović, Sofija. (AI4ER Cohort-2021, University of Cambridge)
Technical Support Members:
Domain Support Members:
University of Cambridge:
AI4EO Initiative of ESA's Φ-Lab
- AI for Earth Observation (AI4EO) Initiative, Φ-lab Division of the Future Systems Department of the EO Programmes Directorate, European Space Agency (ESA).
AI4EO Partner:
OSM. (2022) 'Map of Europe' Available at: https://www.openstreetmap.org/about (Accessed: 14 March 2022)
QGIS. (2022) 'QGIS Version 3.22.3-Białowieża' Available at: https://www.qgis.org/en/site/forusers/download.html (Accessed: 14 March 2022)
Project template created by the Cambridge AI4ER Cookiecutter.