License: Apache 2.0
SARFish [1] is an imagery dataset for the purpose of training, validating and testing supervised machine learning models on the task of ship detection and classification. SARFish builds on the excellent work of the xView3-SAR dataset by expanding the imagery data to include Single Look Complex (SLC) as well as Ground Range Detected (GRD) imagery data taken directly from the European Space Agency (ESA) Copernicus Programme Open Access Hub Website.
Links:
- Data:
- Labels:
- Challenge:
- GitHub repo:
- Mailbox:
- DAIRNet:
Read the terms and conditions for the:
- use of the SARFish dataset
- use of this repo
- participation in the SARFish challenge.
$ <package-manager> install g++ python-devel python3-devel gdal gdal-devel
Create the virtual environment with necessary packages.
Note: requires Python 3.8 or later.
$ ./venv_setup/venv_create.sh -v venv -r ./venv_setup/venv_requirement.txt
$ source ./venv/bin/activate
Edit the file reference/environment.yaml to set the path to the root directory of the SARFish dataset:
SARFish_root_directory: /path/to/SARFish/root/
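A minimal sketch of reading this setting back in Python, assuming PyYAML is available in the virtual environment (the function name `load_sarfish_root` is illustrative, not part of the repo):

```python
# Read the SARFish root directory from reference/environment.yaml.
# Assumes PyYAML is installed; the helper name is illustrative only.
from pathlib import Path

import yaml


def load_sarfish_root(config_path: str = "reference/environment.yaml") -> Path:
    with open(config_path) as f:
        config = yaml.safe_load(f)
    return Path(config["SARFish_root_directory"])
```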
The SARFish dataset is available for download at:
- full SARFish dataset
- sample SARFish dataset

| dataset | coincident GRD, SLC products | compressed (GB) | uncompressed (GB) |
|---|---|---|---|
| SARFishSample | 1 | 4.3 | 8.2 |
| SARFish | 753 | 3293 | 6468 |
Make sure you have at least enough storage space for the uncompressed dataset.
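A quick way to check this before downloading is a standard-library free-space query; the helper below is a sketch (the threshold is taken from the uncompressed size in the table above):

```python
# Check free disk space before downloading/extracting; stdlib only.
# The helper name and usage are illustrative, not part of the repo.
import shutil


def enough_space(path: str, required_gb: float) -> bool:
    """Return True if `path` has at least `required_gb` gigabytes free."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= required_gb


# e.g. the full SARFish dataset needs ~6468 GB uncompressed:
# enough_space("/path/to/large/storage/location", 6468)
```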
cd /path/to/large/storage/location
Create a huggingface account or log in to an existing one. Then, in your python3 virtual environment, log in to the huggingface command line interface:
huggingface-cli login
Install git lfs
<package-manager> install git-lfs
git lfs install
Copy the access token in settings/Access Tokens from your huggingface account. Clone the dataset
git clone https://huggingface.co/datasets/ConnorLuckettDSTG/SARFish
The command above downloads the full dataset. For the sample dataset, substitute the final command with the following:
git clone https://huggingface.co/datasets/ConnorLuckettDSTG/SARFishSample
Check the md5 sums of the downloaded SARFish products
./check_SARFish_md5sum.py
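The core of such a check is hashing each downloaded product and comparing against the published checksum. A minimal stdlib sketch of that step (not the script's actual implementation):

```python
# Compute the MD5 checksum of a downloaded product, reading in chunks so
# multi-gigabyte .SAFE.zip files do not have to fit in memory.
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```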
Unzip SARFish data products.
cd /path/to/SARFish/directory/GRD
unzip_batch.sh -p $(find './' -type f -name "*.SAFE.zip")
cd /path/to/SARFish/directory/SLC
unzip_batch.sh -p $(find './' -type f -name "*.SAFE.zip")
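If you prefer a cross-platform alternative to the shell script, the same batch extraction can be sketched with the standard library (the function name is illustrative):

```python
# Find every *.SAFE.zip under a directory and extract each archive in place,
# mirroring what the unzip_batch.sh invocation above does. Stdlib only.
import zipfile
from pathlib import Path


def unzip_batch(root: str) -> None:
    for archive in Path(root).rglob("*.SAFE.zip"):
        with zipfile.ZipFile(archive) as z:
            z.extractall(archive.parent)
```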
Download the training and validation label files for both the GRD and SLC products from the xView3 website
Add the label files to their respective partitions in the dataset file structure:
SARFish/
├── GRD
│ ├── public
│ ├── train
│ │ └── GRD_train.csv
│ └── validation
│ └── GRD_validation.csv
└── SLC
├── public
├── train
│ └── SLC_train.csv
└── validation
└── SLC_validation.csv
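A small sanity check that the label files landed in the right partitions can be sketched as follows (helper names are illustrative, not part of the repo):

```python
# Verify the label files sit in the expected partitions of the SARFish tree.
from pathlib import Path

EXPECTED_LABELS = [
    "GRD/train/GRD_train.csv",
    "GRD/validation/GRD_validation.csv",
    "SLC/train/SLC_train.csv",
    "SLC/validation/SLC_validation.csv",
]


def missing_labels(sarfish_root: str) -> list:
    """Return the expected label paths that are not present under the root."""
    root = Path(sarfish_root)
    return [rel for rel in EXPECTED_LABELS if not (root / rel).is_file()]
```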
python3 -m jupyter notebook reference/SARFish_demo.ipynb
The SARFish demo is a Jupyter notebook that helps users understand:
- What is the SARFish Challenge?
- What is the SARFish dataset?
- How to access the SARFish dataset
- Dataset structure
- How to load and visualise the SARFish imagery data
- How to load and visualise the SARFish groundtruth labels
- How to train, validate and test the reference/baseline model
- SARFish challenge prediction submission format
- How to evaluate model performance using the SARFish metric
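As a toy illustration of the label-loading step the notebook covers, the groundtruth CSVs can be read with the standard library; the column names depend on the label files themselves, so none are assumed here:

```python
# Load a groundtruth label CSV into a list of row dictionaries for inspection.
# Column names come from the CSV header; none are hard-coded here.
import csv


def load_labels(csv_path: str) -> list:
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))
```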
A baseline reference implementation of a real-valued deep learning model is provided to introduce new users to training, validating, and testing models on the SARFish SLC data, and to illustrate the use of the SARFish metrics. The reference model demonstrates how to use the SARFish metrics during training, testing and evaluation to help inform the development of better-performing models.
The baseline uses the predefined PyTorch implementation of FCOS; chosen because it uses the concept of “centre-ness”, which we believe is applicable to the maritime objects in this dataset.
SARModel.py
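For intuition, the "centre-ness" target from the FCOS paper (Tian et al., 2019) can be sketched in a few lines; this is the published formula, not the baseline's actual implementation in SARModel.py:

```python
# FCOS centre-ness: locations near a target's centre score near 1, locations
# near its edge score near 0. l, t, r, b are the distances from a location to
# the left, top, right, and bottom sides of its groundtruth box.
import math


def centreness(l: float, t: float, r: float, b: float) -> float:
    return math.sqrt((min(l, r) / max(l, r)) * (min(t, b) / max(t, b)))
```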
The baseline can be trained and evaluated by sequentially running the following scripts:
1_create_tile.py generates the tiles used for training the baseline. Approximately 300 GB of storage is required.
./1_create_tile.py
The following trains, validates, and tests the baseline model on a small subset of the SARFish dataset detailed in fold.csv.
./2_train.py
./3_test.py
4_evaluate.py calls the SARFish_metric.py script on the testing scenes to determine model performance on the SARFish challenge tasks.
./4_evaluate.py
The following scripts call the model over the entire public partition of the SARFish dataset to generate the submission/predictions uploaded to the Kaggle competition as the benchmark.
./5_inference.py
./6_concatenate_scene_predictions.py
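The concatenation step amounts to merging per-scene prediction CSVs into one submission file. A stdlib sketch of that idea (file naming and layout here are illustrative, not the script's actual conventions):

```python
# Merge per-scene prediction CSVs into a single submission file, keeping one
# header row. Assumes every input CSV shares the same header.
import csv
from pathlib import Path


def concatenate_predictions(prediction_dir: str, output_path: str) -> None:
    rows, header = [], None
    for csv_file in sorted(Path(prediction_dir).glob("*.csv")):
        with open(csv_file, newline="") as f:
            reader = csv.reader(f)
            file_header = next(reader)
            if header is None:
                header = file_header
            rows.extend(reader)
    with open(output_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
```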
Evaluate the baseline model's performance on a scene from the validation partition using the metrics for the SARFish dataset.
./SARFish_metric.py \
-p labels/reference_model/reference_predictions_SLC_validation_S1B_IW_SLC__1SDV_20200803T075720_20200803T075748_022756_02B2FF_E5D2.csv \
-g /path/to/SARFish/root/SLC/validation/SLC_validation.csv \
--sarfish_root_directory /path/to/SARFish/root/ \
--product_type SLC \
--xview3_slc_grd_correspondences labels/xView3_SLC_GRD_correspondences.csv \
--shore_type xView3_shoreline \
--no-evaluation-mode
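For intuition about what a detection metric like this must do, the matching step can be sketched as greedy nearest-neighbour assignment within a distance threshold; this is a generic illustration, not SARFish_metric.py's actual matching rule or parameters:

```python
# Greedily pair each predicted location with the nearest unmatched groundtruth
# location within a distance threshold, then count true positives, false
# positives, and false negatives. Illustrative only.
import math


def match_detections(predictions, groundtruth, threshold):
    """predictions, groundtruth: lists of (row, col); returns (tp, fp, fn)."""
    unmatched = list(groundtruth)
    tp = 0
    for p in predictions:
        best, best_dist = None, threshold
        for g in unmatched:
            d = math.dist(p, g)
            if d <= best_dist:
                best, best_dist = g, d
        if best is not None:
            unmatched.remove(best)
            tp += 1
    fp = len(predictions) - tp
    fn = len(unmatched)
    return tp, fp, fn
```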
[1] T.-T. Cao et al., “SARFish: Space-Based Maritime Surveillance Using Complex Synthetic Aperture Radar Imagery,” in 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), 2022, pp. 1–8.