Object Detector

This project provides a suite of Python scripts allowing the end-user to use Deep Learning to detect objects in geo-referenced raster images.



A CUDA-enabled GPU is required.


  • CUDA driver. This code was developed and tested with CUDA 11.3 on Ubuntu 20.04.

  • Although we recommend using Docker (see here), this code can also be run without it, provided that Python 3.8 is available. Python dependencies can be installed with either pip or conda, using the provided requirements.txt file. We advise using a Python virtual environment.


Without Docker

The object detector can be installed by issuing the following command (see this page for more information on "editable installs"):

$ pip install --editable .

In case of a successful installation, the command

$ stdl-objdet -h

should display some basic usage information.

With Docker

A Docker image can be built by issuing the following command:

$ docker compose build

In case of a successful build, the command

$ docker compose run --rm stdl-objdet stdl-objdet -h

should display some basic usage information. Note that, for the code to run properly,

  1. the version of the CUDA driver installed on the host machine must match the version used in the Dockerfile, namely version 11.3. Users may need to adapt the Dockerfile to their environment;
  2. the NVIDIA Container Toolkit must be installed on the host machine (see this guide).


This project implements the workflow described here, which includes four stages:

Stage no. Stage name CLI command Implementation
1 Tileset generation generate_tilesets here
2 Model training train_model here
3 Detection make_detections here
4 Assessment assess_detections here

These stages/scripts can be run one after the other, by issuing the following command from a terminal:

  • w/o Docker:

    $ stdl-objdet <CLI command> <configuration_file>
  • w/ Docker:

    $ docker compose run --rm -it stdl-objdet stdl-objdet <CLI command> <configuration_file>

    Alternatively, open an interactive session in the container:

    $ docker compose run --rm -it stdl-objdet

    then run the command from the container prompt:

    nobody@<container ID>:/app# stdl-objdet <CLI command> <configuration_file>

    For those who are less familiar with Docker, note that output files created inside a container are not persistent unless "volumes" or "bind mounts" are used (see this).

The same configuration file can be used for all the commands, as each command only reads the section found under the key named after it. More detailed information about each stage and the related configuration is provided below. The following terminology is used:

  • ground truth (GT): data used to train the Deep Learning-based detection model; such data is expected to be 100% true

  • other data: data that is not ground-truth-grade

  • labels: geo-referenced polygons surrounding the objects targeted by a given analysis

  • FP labels: geo-referenced polygons surrounding the false positive (FP) objects detected by a previously trained model. They are used to select tiles that are not annotated (FP tiles) but are still included in the training dataset, in order to confront the model with potentially problematic images. The aim is to improve model performance by reducing FP detections.

  • AoI, abbreviation of "area of interest": geographical area over which the user intends to carry out the analysis. This area encompasses

    • regions for which ground-truth data is available, as well as
    • regions over which the user intends to detect potentially unknown objects
  • tiles, or - more explicitly - "geographical map tiles": see this link. More precisely, "Slippy Map Tiles" are used within this project, see this link.

  • empty tiles: tiles that do not intersect ground truth (and are therefore not annotated), added to the training dataset to provide context and improve model performance. A number of empty tiles, expressed as a percentage of the number of tiles intersecting GT, can be added to the dataset and distributed among the trn, tst and val datasets. The remaining tiles can either be deleted or included in the oth dataset.

  • COCO data format: see this link

  • trn, val, tst, oth: abbreviations of "training", "validation", "test" and "other", respectively
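
As an illustration of the "Slippy Map Tiles" scheme mentioned above, the following sketch computes the indices of the tile containing a given longitude/latitude at a given zoom level, using the standard Web Mercator formula. The function name is hypothetical and is not part of this project's code.

```python
import math
from typing import Tuple


def lonlat_to_tile(lon_deg: float, lat_deg: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) indices of the Slippy Map tile containing a WGS84 point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that points on the eastern/southern edge stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

At zoom level z the world is covered by 2^z × 2^z tiles, so each additional zoom level splits every tile into four.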

Stage 1: tileset generation

The generate_tilesets command generates the various tilesets concerned by a given study. Each generated tileset is made up of:

  • a collection of geo-referenced raster images (in GeoTIFF format)
  • a JSON file compliant with the COCO data format
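
For readers unfamiliar with the COCO data format, here is a minimal sketch of the structure of such a JSON file, built as a Python dictionary. All values (file name, sizes, identifiers) are made up for illustration; only the top-level keys and field names follow the COCO object detection format.

```python
import json

# Minimal COCO-style dataset: one image, one annotation, one category.
# All values below are hypothetical and only serve as an illustration.
coco = {
    "info": {"year": 2020, "version": "1.0", "description": "illustrative dataset"},
    "licenses": [{"id": 1, "name": "illustrative license", "url": ""}],
    "categories": [{"id": 1, "name": "swimming pool", "supercategory": "facility"}],
    "images": [{"id": 1, "file_name": "tile_0001.tif", "width": 256, "height": 256}],
    "annotations": [
        {
            "id": 1,
            "image_id": 1,      # refers to images[].id
            "category_id": 1,   # refers to categories[].id
            "bbox": [10.0, 20.0, 30.0, 40.0],  # [x, y, width, height], in pixels
            "area": 1200.0,
            "iscrowd": 0,
            # Polygon given as a flat list of pixel coordinates x1, y1, x2, y2, ...
            "segmentation": [[10.0, 20.0, 40.0, 20.0, 40.0, 60.0, 10.0, 60.0]],
        }
    ],
}

coco_json = json.dumps(coco, indent=2)
```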

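The following relations apply:

    AoI tiles = (GT tiles) ∪ (oth tiles)
    GT tiles = (trn tiles) ∪ (val tiles) ∪ (tst tiles)

where "GT tiles" are AoI tiles including GT labels and

    A ≠ B ⇒ A ∩ B = ∅, for all A, B in {trn tiles, val tiles, tst tiles, oth tiles}
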
In case no GT labels are provided by the user, the script will only generate oth tiles, covering the entire AoI.
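
The sketch below shows one way a user could sanity-check the generated tilesets, under the assumption that the trn, val, tst and oth tilesets are meant to be pairwise disjoint and to cover the AoI tiles together. This assumption may not hold if some tiles are deliberately discarded. The function name and the input layout (one set of tile ids per tileset) are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set


def check_partition(aoi_ids: Set[str], subsets: Dict[str, Set[str]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # 1. No tile should belong to two different tilesets.
    for (name_a, ids_a), (name_b, ids_b) in combinations(subsets.items(), 2):
        overlap = ids_a & ids_b
        if overlap:
            problems.append(f"{name_a} and {name_b} share {len(overlap)} tile(s)")
    # 2. Taken together, the tilesets should cover the AoI tiles exactly.
    union = set().union(*subsets.values())
    if union != aoi_ids:
        problems.append(
            f"{len(aoi_ids - union)} AoI tile(s) unassigned, "
            f"{len(union - aoi_ids)} tile(s) outside the AoI"
        )
    return problems
```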

If the option is supported by the connector, tiles from a given year (e.g. 2020) or from several years ('multi-year') can be fetched.

When training the model, the user can choose to add empty tiles and/or empty tiles including FP detections to improve the model performance. Empty tiles can be manually defined or selected randomly within a given AoI.
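
To make the random selection of empty tiles more concrete, here is a minimal sketch of how a given fraction of empty tiles could be drawn and distributed among the trn, val and tst datasets. It mirrors the meaning of the tiles_frac and frac_trn configuration parameters documented below, but the function itself, its rounding rules and its seed argument are assumptions, not the project's actual implementation.

```python
import random
from typing import Dict, List, Sequence


def split_empty_tiles(
    empty_ids: Sequence[str],
    nb_gt_tiles: int,
    tiles_frac: float,
    frac_trn: float,
    seed: int = 42,
) -> Dict[str, List[str]]:
    """Randomly pick empty tiles and distribute them among trn, val and tst."""
    rng = random.Random(seed)  # fixed seed, so that the selection is reproducible
    # The number of empty tiles to add is relative to the number of tiles intersecting GT.
    nb_to_add = min(len(empty_ids), round(tiles_frac * nb_gt_tiles))
    selected = rng.sample(sorted(empty_ids), nb_to_add)
    # frac_trn of the selected tiles go to trn; the rest is split in two halves.
    nb_trn = round(frac_trn * nb_to_add)
    nb_val = (nb_to_add - nb_trn) // 2
    return {
        "trn": selected[:nb_trn],
        "val": selected[nb_trn:nb_trn + nb_val],
        "tst": selected[nb_trn + nb_val:],
        # Tiles that were not selected can be deleted or moved to the oth dataset.
        "remaining": sorted(set(empty_ids) - set(selected)),
    }
```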

In order to speed up some of the subsequent computations, each output image is accompanied by a small sidecar file in JSON format, carrying the following information about the image:

  • width and height in pixels;
  • bounding box;
  • spatial reference system.
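
The following sketch illustrates what writing and reading back such a sidecar file could look like. The key names (width, height, extent, srs) and the naming convention (same file name as the image, with a .json extension) are assumptions made for this example; the files actually produced by the script may be laid out differently.

```python
import json
import tempfile
from pathlib import Path
from typing import Tuple


def write_sidecar(
    image_path: Path,
    width: int,
    height: int,
    bbox: Tuple[float, float, float, float],
    srs: str,
) -> Path:
    """Write a small JSON file next to the image and return its path."""
    xmin, ymin, xmax, ymax = bbox
    metadata = {
        "width": width,    # in pixels
        "height": height,  # in pixels
        "extent": {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax},
        "srs": srs,        # e.g. "EPSG:3857"
    }
    sidecar_path = image_path.with_suffix(".json")  # hypothetical naming convention
    sidecar_path.write_text(json.dumps(metadata))
    return sidecar_path


# Round trip in a temporary folder; the image file itself is not needed here.
with tempfile.TemporaryDirectory() as tmp_dir:
    image_path = Path(tmp_dir) / "tile.tif"
    sidecar_path = write_sidecar(image_path, 256, 256, (0.0, 0.0, 100.0, 100.0), "EPSG:3857")
    loaded = json.loads(sidecar_path.read_text())
```

Reading these few values from a JSON file avoids opening every GeoTIFF image again in later stages.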

Here's the excerpt of the configuration file relevant to this script, with values replaced by some documentation:
    enable: <True or False (without quotes); if True, only a small subset of tiles is processed>
    nb_tiles_max: <number of tiles to use if the debug mode is enabled>
  working_directory: <the script will chdir into this folder>
    aoi_tiles: <the path to the file including polygons of the Slippy Map Tiles covering the AoI>
    ground_truth_labels: <the path to the file including ground-truth labels (optional)>
    other_labels: <the path to the file including other (non ground-truth) labels (optional)>
    FP_labels: <the path to the file including false positive labels (optional)>
      type: <"WMS" as Web Map Service or "MIL" as ESRI's Map Image Layer or "XYZ" for xyz link or "FOLDER" for tiles from an existing folder>
      location: <the URL of the web service or the path to the initial folder>
      layers: <only applies to WMS endpoints>
      year: <numeric year if no 'year' field is provided in tiles.geojson or "multi-year" if a 'year' field is provided in tiles.geojson (optional). Use only with "XYZ" and "FOLDER" connectors>
      srs: <e.g. "EPSG:3857">
    tiles_frac: <fraction (relative to the number of tiles intersecting labels) of empty tiles to add>
    frac_trn: <fraction of empty tiles to add to the trn dataset, then the remaining tiles will be split in 2 and added to tst and val datasets>
    keep_oth_tiles: <True or False, if True keep tiles in oth dataset not intersecting oth labels>  
  output_folder: <the folder where output files will be written>
  tile_size: <the tile/image width and height, in pixels>
  overwrite: <True or False (without quotes); if True, the script is allowed to overwrite already existing images>
  n_jobs: <the no. of parallel jobs the script is allowed to launch, e.g. 1>
    year: <see>
    version: <see>
    description: <see>
    contributor: <see>
    url: <see>
      name: <see>
      url: <see>
    category:     # Only for the mono-class case, otherwise classes are deduced from labels.
        name: <the name of the category target objects belong to, e.g. "swimming pool">
        supercategory: <the supercategory target objects belong to, e.g. "facility">

Note that:

  • the ground_truth_labels, FP_labels and other_labels datasets are optional. Users who do not intend to provide these datasets should either delete or comment out the corresponding YAML keys. This feature was developed to support, e.g., inference-only scenarios;

  • except for the XYZ connector, which requires EPSG:3857, the framework is agnostic with respect to the tiling scheme, which the user has to provide as an input file compliant with the following requirements:

    1. a field named id must exist;
    2. the id field must not contain any duplicate value;
    3. values of the id field must match the pattern (<integer 1>, <integer 2>, <integer 3>), e.g. (135571, 92877, 18), or, if a 'year' field was specified during data preparation, the pattern (<integer 1>, <integer 2>, <integer 3>, <integer 4>), e.g. (2020, 135571, 92877, 18).
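
The requirements on the id values can be checked before running the script, for instance with the sketch below. It assumes the ids have already been read from the tiles file as strings (which also implies that the id field exists); the function name and the regular expression are illustrative and are not taken from the project's code.

```python
import re
from collections import Counter
from typing import Iterable, List

# Three or four non-negative integers, comma-separated, between parentheses.
ID_PATTERN = re.compile(r"\(\s*\d+(?:\s*,\s*\d+){2,3}\s*\)")


def validate_tile_ids(ids: Iterable[str]) -> List[str]:
    """Return a list of error messages; an empty list means that the ids are valid."""
    ids = list(ids)
    errors = []
    for tile_id in ids:
        if not ID_PATTERN.fullmatch(str(tile_id)):
            errors.append(f"malformed id: {tile_id!r}")
    for tile_id, count in Counter(ids).items():
        if count > 1:
            errors.append(f"duplicate id: {tile_id!r} ({count} occurrences)")
    return errors
```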

Stage 2: model training

Note: this stage can be skipped if the user wishes to perform inference only, using a pre-trained model.

The train_model command allows one to train a detection model based on a Convolutional Deep Neural Network, leveraging FAIR's Detectron2. For further information, we refer the user to the official documentation.

Here's the excerpt of the configuration file relevant to this script, with values replaced by textual documentation:
  debug_mode: <True or False (without quotes); if True, a short training will be performed without taking the configuration for detectron2 into account.>
  working_directory: <the script will chdir into this folder>
  log_subfolder: <the subfolder of the working folder where we allow Detectron2 writing some logs>
  sample_tagged_img_subfolder: <the subfolder where some sample images will be output>
  COCO_files: # relative paths, w/ respect to the working_folder
    trn: <the COCO JSON file related to the training dataset (mandatory)>
    val: <the COCO JSON file related to the validation dataset (mandatory)>
    tst: <the COCO JSON file related to the test dataset (mandatory)>
  detectron2_config_file: <the Detectron2 configuration file (relative path w/ respect to the working_folder)>
    model_zoo_checkpoint_url: <e.g. "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml">

Detectron2's configuration files are provided in the example folders mentioned below. Note that, for the time being, no hyperparameter tuning is performed automatically.

The evolution of the loss function over the training and validation datasets can be monitored on a local server with the following command:

$ tensorboard --logdir <path to the logs folder>

Stage 3: detection

The make_detections command allows one to use the object detection model trained at the previous step to make detections over various input datasets:

  • detections over the trn, val, tst datasets can be used to assess the reliability of this approach on ground-truth data;
  • detections over the oth dataset are, in principle, the main goal of this kind of analysis.

Here's the excerpt of the configuration file relevant to this script, with values replaced by textual documentation:
  working_directory: <the script will chdir into this folder>
  log_subfolder: <the subfolder of the working folder where we allow Detectron2 writing some logs>
  sample_tagged_img_subfolder: <the subfolder where some sample images will be output>
  COCO_files: # relative paths, w/ respect to the working_folder
    trn: <the COCO JSON file related to the training dataset (optional)>
    val: <the COCO JSON file related to the validation dataset (optional)>
    tst: <the COCO JSON file related to the test dataset (optional)>
    oth: <the COCO JSON file related to the "other" dataset (optional)>
  detectron2_config_file: <the Detectron2 configuration file (relative path w/ respect to the working_folder)>
    pth_file: <e.g. "./logs/model_final.pth">
  image_metadata_json: <the path to the image metadata JSON file, generated by the `generate_tilesets` command>
  # the following section concerns the Ramer-Douglas-Peucker algorithm, which can be optionally applied to detections before they are exported
    enabled: <true/false>
    epsilon: <see>
  score_lower_threshold: <choose a value between 0 and 1, e.g. 0.05 - detections with a score less than this threshold are discarded>
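
The two post-processing options above, score filtering and Ramer-Douglas-Peucker simplification, can be illustrated with the following pure-Python sketch. It is not the project's implementation: detections are assumed to be dictionaries with a "score" key, geometries are plain lists of (x, y) points, and epsilon is assumed to be expressed in the same units as the coordinates.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the segment [a, b]."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:  # degenerate segment, e.g. a closed ring
        return math.hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # stay within the segment
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def rdp(points: Sequence[Point], epsilon: float) -> List[Point]:
    """Simplify a polyline with the Ramer-Douglas-Peucker algorithm."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    distances = [_distance_to_segment(p, first, last) for p in points[1:-1]]
    max_distance = max(distances)
    if max_distance <= epsilon:
        # All intermediate points are close enough to the chord: drop them.
        return [first, last]
    # Otherwise keep the farthest point and simplify both halves recursively.
    split = distances.index(max_distance) + 1
    left = rdp(points[: split + 1], epsilon)
    right = rdp(points[split:], epsilon)
    return left[:-1] + right


def filter_by_score(detections: Sequence[Dict], score_lower_threshold: float) -> List[Dict]:
    """Discard detections whose score is lower than the threshold."""
    return [d for d in detections if d["score"] >= score_lower_threshold]
```

A larger epsilon removes more vertices and yields lighter but coarser geometries.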

Stage 4: assessment

The assess_detections command allows one to assess the reliability of detections, comparing detections with ground-truth data. The assessment goes through the following steps:

  1. Labels (GT + oth) geometries are clipped to the boundaries of the various AoI tiles, scaled by a factor of 0.999 in order to prevent any "crosstalk" between neighboring tiles.

  2. Spatial joins and intersection over union are computed between the detections and the clipped labels, in order to identify

    • True positives (TP), i.e. objects that are found in both datasets, labels and detections;
    • False positives (FP), i.e. objects that are only found in the detection dataset;
    • False negatives (FN), i.e. objects that are only found in the label dataset;
    • Wrong class, i.e. objects that are found in both datasets, but with different classes.

    If the detection is performed over several years, the spatial comparison is made between labels and detections of the same year.
  3. Finally, TPs, FPs and FNs are counted in order to compute the following metrics (see this page):

    • precision
    • recall
    • f1-score
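
As a worked illustration of the last two steps, the sketch below computes the intersection over union of two geometries and derives precision, recall and f1-score from TP, FP and FN counts. For simplicity, geometries are reduced to axis-aligned boxes (xmin, ymin, xmax, ymax), whereas the actual labels and detections are polygons; function names are hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = min(a[2], b[2]) - max(a[0], b[0])
    inter_h = min(a[3], b[3]) - max(a[1], b[1])
    if inter_w <= 0 or inter_h <= 0:
        return 0.0  # the boxes do not overlap
    intersection = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return intersection / (area_a + area_b - intersection)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and f1-score; undefined ratios are returned as 0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For example, 8 TPs, 2 FPs and 2 FNs yield a precision, a recall and an f1-score of 0.8 each.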

Here's the excerpt of the configuration file relevant to this command, with values replaced by textual documentation:
  working_directory: <the script will chdir into this folder>
    ground_truth_labels: <the path to GT labels in format>
    other_labels: <the path to "other labels" in format>
    split_aoi_tiles: <the path to the file including split (trn, val, tst, oth) AoI tiles>
      trn: <the path to the Pickle file including detections over the trn dataset (optional)>
      val: <the path to the Pickle file including detections over the val dataset (mandatory)>
      tst: <the path to the Pickle file including detections over the tst dataset (optional)>
      oth: <the path to the Pickle file including detections over the oth dataset (optional)>
  output_folder: <the folder where we allow this command to write output files>


A few examples are provided within the examples folder. For further details, we refer the user to the various use-case specific readme files.

Note that the examples are provided with a debug parameter that can be set to True for quick tests.


The STDL Object Detector is released under the MIT license.