A fast way to retrieve water surface area time-series from Sentinel-2 and Landsat in GEE

ShiruiH/EOWater

EOWater: efficient cloud computing of water surface area time-series from Sentinel-2 and Landsat in GEE

This repository presents an efficient GEE-based solution for mapping water surface area time-series in waterbodies from Landsat and Sentinel-2 imagery.

While many solutions exist to map waterbodies, this toolkit presents the following advantages:

  • flexibility: allows users to input their own set of polygons
  • cost-effective: is based solely on raster operations, significantly reducing the cost of cloud computations (avoiding the high computational cost of processing vectors with many vertices)
  • cloud masking: uses state-of-the-art cloud masking algorithms (s2cloudless) to maximise the temporal depth of the extracted time-series
  • live monitoring: can be set up as a near-real-time monitoring workflow at almost no cost in a GEE/GCP environment (with a Cloud Run function and Cloud Scheduler)

drawing

The development of this tool was supported through funding from the Australian Government (Murray-Darling Basin Authority).

The tool was developed jointly by the NSW Department of Climate Change, Energy, the Environment and Water and NGIS.

The toolkit is also catalogued in the SEED Portal as the remote-sensing-earth-observation-water-toolkit.

The tool is currently maintained by @ShiruiH and @kvos.

Installation

To use this tool you will need access to a Google Earth Engine (GEE) project. You can create one at https://signup.earthengine.google.com/. Then install the gcloud CLI by following the instructions at https://cloud.google.com/sdk/docs/install. Once installed, it launches automatically and lets you authenticate with your GEE account (or personal Gmail account).

To install the Python environment, install Anaconda (https://www.anaconda.com/download/). Then open the Anaconda Prompt and type in the following commands:

conda create -n eowater
conda activate eowater
conda install -c conda-forge geopandas -y
conda install -c conda-forge earthengine-api scikit-image rasterio matplotlib notebook folium -y

Then type jupyter lab and navigate to the notebooks in this repository.
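
Before opening the notebooks, it can help to confirm that the packages installed above are importable from the active environment. The sketch below is not part of the repository: the helper name missing_modules is hypothetical, and the list uses the usual import names for these packages (ee for earthengine-api, skimage for scikit-image).

```python
import importlib.util

# Import names corresponding to the conda packages installed above.
REQUIRED = ["geopandas", "ee", "skimage", "rasterio", "matplotlib", "notebook", "folium"]


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

If anything is reported as missing, re-run the corresponding conda install command with the eowater environment activated.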

Usage

1. Download Sentinel-2 and Landsat original tiles

Two downloading options are available:

  • Option #1: Download scene from USGS Earth Explorer

    Visit USGS Earth Explorer and download a single scene for the required Sentinel-2 and/or Landsat tiles to cover your area of interest.

  • Option #2: Download scene from the GEE Code Editor

    Use the EE Code Editor to download a single scene for the required Sentinel-2 and/or Landsat tiles to cover your area of interest. Use the scripts Download_original_tiles_S2.js and Download_original_tiles_Landsat.js, and set the variable tile_list to the tiles you need.

2. Create polygon masks

01_Create_polygon_mask.ipynb: notebook that generates the polygon masks for the Landsat and Sentinel-2 tiles from a vector layer of waterbody boundaries. An example set of input polygon boundaries for the NSW Northern Basin is provided in the repo in waterbodies_boundaries.geojson, courtesy of the Natural Resources Access Regulator (NRAR).

The inputs for this script are the Sentinel-2 and Landsat tiles downloaded in Step 1. The script creates a .tif file for each tile containing a mask in which each individual polygon is assigned a different value, which allows the process to distinguish the polygons at the raster level.

drawing
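
To make the idea of a label mask concrete, here is a minimal pure-Python sketch that burns two toy polygons into a small grid, giving each one its own integer value. It is not the notebook's implementation: the function names, the polygon ids and the grid are hypothetical, and the sketch ignores georeferencing (row 0 is simply the row with the smallest y, whereas GeoTIFFs usually store the northernmost row first).

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: True if (x, y) lies inside the ring of (x, y) vertices."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize_labels(polygons, width, height, pixel_size=1.0):
    """Burn each polygon into a grid using its integer label (0 = background).

    `polygons` maps a polygon id to a list of (x, y) vertices. Returns the
    label grid and the id -> label lookup (the analogue of a labels table).
    """
    labels = {poly_id: value for value, poly_id in enumerate(sorted(polygons), start=1)}
    grid = [[0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            # A pixel belongs to a polygon if its centre falls inside it.
            cx = (col + 0.5) * pixel_size
            cy = (row + 0.5) * pixel_size
            for poly_id, ring in polygons.items():
                if point_in_polygon(cx, cy, ring):
                    grid[row][col] = labels[poly_id]
                    break
    return grid, labels


# Hypothetical polygons on a 6 x 4 grid of unit pixels.
polygons = {
    "dam_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "dam_B": [(3, 1), (5, 1), (5, 3), (3, 3)],
}
grid, labels = rasterize_labels(polygons, width=6, height=4)
for row in grid:
    print(row)
print(labels)
```

Because every pixel carries the value of the polygon it belongs to, later steps can group pixels by value without touching the vector geometries again.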

3. Upload masks to GEE Assets

Once the polygon masks have been generated in Python, they need to be uploaded as cloud assets into GEE. You can follow the instructions below to perform this step. Two options are available, suited to different usage scenarios.

If only a few images need to be uploaded (e.g., fewer than 3), the Option #1 manual process is recommended. This method avoids the need to set up Cloud Storage access authentication.

For uploading a large number of images, the Option #2 automated process is more efficient.

Option #1 manual process
  1. Go to https://code.earthengine.google.com/, sign in and select your cloud project (in this example nsw-dpe-gee-tst).

  2. Click on NEW > GeoTIFF Image Upload. Select your file in /outputs (e.g., outputs/Sentinel2_tiles_mask/T55JGH_20231213T001111_B02.tif).

drawing

  3. Once uploaded, click on the asset; it should appear as in the screenshot below:

drawing

  4. Click on Edit, then on the PROPERTIES tab, then on Add property. Add a property called Tile with value 55JGH (or the name of your tile). This property is needed later on.

drawing

  5. Repeat for the Landsat tiles, but add two properties, PATH and ROW, with their respective values (example below for tile 090081).

drawing

  6. Once all the individual tiles have been uploaded, click on NEW > Image Collection and create an image collection for Sentinel-2 (name it Base_Sentinel2_tiles) and one for Landsat (name it Base_Landsat_tiles).

drawing

  7. Then drag and drop all the individual tiles into their respective image collection (Sentinel-2 or Landsat). The image collection should look as below (17 tiles in this example):

drawing

  8. Finally, upload the image labels which were saved in /outputs. Click on NEW > CSV file and select the file outputs/labels.csv (or the Landsat one; they are identical). Call the asset Base_labels.

drawing

You should get a table that relates each unique polygon id to an integer value, like shown below:

drawing
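
When many tiles are involved, the Tile, PATH and ROW property values used above can be derived from the names rather than typed by hand. The helpers below are a hypothetical sketch: they assume Sentinel-2 mask file names start with T followed by the five-character tile name, as in T55JGH_20231213T001111_B02.tif, and that Landsat tile names are the six-digit WRS-2 path and row, as in 090081. The values are returned as zero-padded strings; whether the GEE scripts expect strings or numbers is not stated here, so adjust this to match your assets.

```python
import re


def sentinel2_properties(filename):
    """Extract the Tile property (e.g. '55JGH') from a Sentinel-2 mask file name."""
    match = re.match(r"T(\d{2}[A-Z]{3})_", filename)
    if match is None:
        raise ValueError(f"Unrecognised Sentinel-2 file name: {filename}")
    return {"Tile": match.group(1)}


def landsat_properties(tile_name):
    """Split a six-digit WRS-2 tile name such as '090081' into PATH and ROW."""
    if not re.fullmatch(r"\d{6}", tile_name):
        raise ValueError(f"Unrecognised Landsat tile name: {tile_name}")
    # Assumption: keep the zero padding; convert with int() if numbers are needed.
    return {"PATH": tile_name[:3], "ROW": tile_name[3:]}


print(sentinel2_properties("T55JGH_20231213T001111_B02.tif"))
print(landsat_properties("090081"))
```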

Option #2 automated process
  1. Upload polygon masks to Google Cloud Storage (GCS) Buckets.

    (1) Install the gcloud CLI accordingly.

    (2) The easiest way is to use 02_Upload_polygon_mask_to_bucket.ipynb to upload the polygon masks to a Google Cloud Bucket.

    Or, if you are familiar with it, use gcloud storage directly:

    # Log in to gcloud; make sure your account has permission to write to the GCS bucket
    gcloud auth login
    # -r copies directories recursively, -n skips files that already exist at the destination
    gcloud storage cp -r -n [LOCAL_PATH] gs://[BUCKET_NAME]/[DESTINATION_PATH]
  2. Ingest polygon masks from Buckets into GEE Assets using Image Manifest Upload.

    (1) Install the Earth Engine Python client.

    (2) 03_Upload_bucket_to_EE_asset.ipynb: create an ImageCollection, Base_Sentinel2_tiles and/or Base_Landsat_tiles. Ingest the polygon masks into the ImageCollection with the specified properties for each polygon mask.

  3. Upload the image labels which were saved in /outputs. Click on NEW > CSV file and select the file outputs/labels.csv (or the Landsat one; they are identical). Call the asset Base_labels.

drawing
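
For readers unfamiliar with Image Manifest Upload, the sketch below builds one manifest as a plain Python dictionary, following the Earth Engine manifest layout (name, tilesets with source URIs, and properties). It is not taken from 03_Upload_bucket_to_EE_asset.ipynb: the helper name build_manifest, the project id, the bucket path and the asset name are all hypothetical placeholders.

```python
import json


def build_manifest(project, collection, asset_name, gcs_uri, properties):
    """Return an Earth Engine image manifest for one GeoTIFF stored in GCS."""
    return {
        # Full asset id of the image inside the target ImageCollection.
        "name": f"projects/{project}/assets/{collection}/{asset_name}",
        # One tileset with one source file.
        "tilesets": [{"sources": [{"uris": [gcs_uri]}]}],
        # Properties attached to the ingested image (e.g. Tile, or PATH and ROW).
        "properties": dict(properties),
    }


manifest = build_manifest(
    project="my-gee-project",  # hypothetical project id
    collection="Base_Sentinel2_tiles",
    asset_name="T55JGH",  # hypothetical asset name
    gcs_uri="gs://my-bucket/Sentinel2_tiles_mask/T55JGH_20231213T001111_B02.tif",  # hypothetical
    properties={"Tile": "55JGH"},
)
print(json.dumps(manifest, indent=2))
```

A manifest saved to a JSON file can then be submitted with the Earth Engine command line tool, for example earthengine upload image --manifest manifest.json.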


(Optional) If the polygon masks in GEE Assets need to be removed and re-uploaded, use 04_Reset_EE_asset_collection.ipynb to batch remove all tiles.

⚠️ Check that you have these 3 assets uploaded:

  • Base_Sentinel2_tiles: image collection of polygon masks for each tile of interest for Sentinel-2.
  • Base_Landsat_tiles: image collection of polygon masks for each tile of interest for Landsat.
  • Base_labels: table relating each polygon id to its unique label value in the masks.

Now you are all set up to map water surface area time-series in GEE!

4. Run GEE scripts in Code Editor

The scripts are found in GEE_scripts and can be copied into the Code Editor and run there. They will output a set of CSV files with the time-series of water surface area for each polygon. The following scripts are available:

  1. WSA_monitoring_S2.js: map water surface area on Sentinel-2 images.
  2. WSA_monitoring_L9.js: map water surface area on Landsat 9 images.
  3. WSA_monitoring_L8.js: map water surface area on Landsat 8 images.
  4. WSA_monitoring_L7.js: map water surface area on Landsat 7 images.
  5. WSA_monitoring_L5.js: map water surface area on Landsat 5 images.

The tileList in the scripts needs to include only the available tiles in Base_Sentinel2_tiles or Base_Landsat_tiles.
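
The per-polygon output of these scripts can be understood as a count of water pixels grouped by mask value, multiplied by the pixel area. The following pure-Python sketch illustrates that raster-only logic on toy grids. It is an illustration rather than a port of the GEE scripts: the function name and the grids are hypothetical, and the water grid is simply given, since how the scripts classify water is not described here. The pixel area of 100 m² corresponds to a 10 m Sentinel-2 pixel; a 30 m Landsat pixel would be 900 m².

```python
from collections import Counter


def water_area_per_label(label_grid, water_grid, pixel_area_m2):
    """Sum the area of water pixels for each polygon label (0 = outside polygons)."""
    counts = Counter()
    for label_row, water_row in zip(label_grid, water_grid):
        for label, is_water in zip(label_row, water_row):
            if label != 0 and is_water:
                counts[label] += 1
    return {label: n * pixel_area_m2 for label, n in sorted(counts.items())}


# Hypothetical 3 x 4 label mask (two polygons) and water classification.
label_grid = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 2],
]
water_grid = [
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
areas = water_area_per_label(label_grid, water_grid, pixel_area_m2=100.0)
print(areas)  # {1: 300.0, 2: 100.0}
```

Note that the water pixel outside both polygons is ignored, and no vector geometry is needed at any point.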

(Optional) Additionally, there is a Python script WSA_scheduled_cloud_function.js that can be set up as a Cloud Function to process Sentinel-2, Landsat 9 and Landsat 8 imagery as a cron job.

5. Postprocess water surface areas

05_Postprocess_timeseries.ipynb: notebook that postprocesses the water surface area time-series generated in GEE. It includes the following steps:

  • remove outliers using an ad hoc despiking algorithm
  • clip time-series to the total area of the polygon (max area of water)

The postprocessed time-series are then saved in individual CSV files named with each polygon identifier. A plot is also created for each polygon.
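
The two postprocessing steps can be sketched as follows. The notebook's despiking algorithm is not described here, so this example swaps in a simple stand-in: a point is dropped when it deviates from the median of its neighbours by more than a fixed amount. The function names, the window size, the threshold and the sample values are all hypothetical, and dates are omitted for brevity.

```python
from statistics import median


def despike(areas, half_window=2, max_dev=1000.0):
    """Drop values that differ from the median of their neighbours by more than max_dev.

    Stand-in technique (neighbour-median filter), not the notebook's algorithm.
    """
    kept = []
    for i, value in enumerate(areas):
        # Up to `half_window` values on each side, excluding the point itself.
        neighbours = areas[max(0, i - half_window):i] + areas[i + 1:i + 1 + half_window]
        if neighbours and abs(value - median(neighbours)) > max_dev:
            continue
        kept.append(value)
    return kept


def clip_to_polygon_area(areas, polygon_area):
    """Cap each value at the total polygon area (water cannot exceed the polygon)."""
    return [min(value, polygon_area) for value in areas]


# Hypothetical water surface areas in m², with a low and a high spike.
raw = [5000.0, 5200.0, 5100.0, 400.0, 5150.0, 5300.0, 9000.0, 5250.0, 5200.0, 5150.0]
clean = clip_to_polygon_area(despike(raw), polygon_area=5200.0)
print(clean)
```

With these inputs the two spikes (400 and 9000) are removed, and the remaining values above 5200 are capped at the polygon area. A filter of this kind is sensitive to spikes near the start or end of a series, where few neighbours are available.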

drawing

Finally, the notebook also creates an interactive map where users can visualise the polygons and time-series at the same time (click on a polygon to visualise the time-series plot).

drawing

This can be a useful tool to monitor water resources in a catchment.

Contributing and Issues

Having a problem? Post an issue on the Issues page (please do not email).

If you are willing to contribute, check out our todo list in the Projects page.

  1. Fork the repository (./fork). A fork is a copy on which you can make your changes.
  2. Create a new branch on your fork
  3. Commit your changes and push them to your branch
  4. When the branch is ready to be merged, create a Pull Request (how to make a clean pull request explained here)

References and Datasets

This section provides a list of references on this topic.

  1. Public talk presenting the water monitoring tool by Mustak Shaikh and Kilian Vos: VIMEO recording (from minute 19:10)
  2. Methodology intro by Shirui Hao: EOWater methodology on YouTube