DRAFT 2017-05-01
Michael Clark (michael.clark via wassname dot org), Craig Baldacchino
This project explores the idea that we can find leaks using satellite images and machine learning. The primary hypothesis is that, given the time and place of leak repairs and satellite images of 10-15 m resolution, we can predict leaks better than a baseline defined by random guessing. We use a random forest model and tune the data filters and model parameters with hyper-parameter optimization. Our results give an f1-score of around 0.6 compared to a dummy f1-score of 0.5. This small improvement supports our hypothesis, but it represents a predictive capability that is too poor to be used for pipe maintenance. We anticipate that repeating the experiment with higher resolution imagery may give useful levels of predictive capability.
Water leakage in Western Australia accounts for the loss of more than 10 billion liters of water and costs over a billion dollars each year. The Water Corporation of Western Australia can only manually survey 10-12 percent of the pipe network each year for non-visible leaks [1]. A solution that can detect leaks without additional on-site inspections would be valuable to water utilities around the world.
One such solution is to use satellite images to detect leaks. Satellite-derived vegetation indices can be used to detect water pipeline leakages in semiarid areas [2]. Spectral signature profiles can also be used to detect leaks, but the signatures can be complicated [3].
Deep learning has been successful at detecting objects in satellite images even when the images give complicated response profiles [4]. This project attempts to find leaks in satellite images by using machine learning to overcome the difficulties of complex data.
We have a dataset of leak repairs, obtained by a freedom of information request to the very helpful people at Austin Water. The dataset includes the time of each repair and its location, which is accurate to within one street address. The data includes some obvious false call-outs, duplicates, and multi-stage repairs, but these are a minority and should only decrease our accuracy by a few percent.
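For instance, duplicates and multi-stage repairs can be collapsed with a simple pandas pass. This is only a sketch: the file name, column names, and the 30-day window are assumptions, not the dataset's actual schema.

```python
import pandas as pd

# Hypothetical schema: one row per repair call-out.
repairs = pd.read_csv("austin_water_repairs.csv", parse_dates=["repair_date"])

# Drop exact duplicates, then treat repairs at the same address within
# 30 days of each other as stages of one repair (threshold is an assumption).
repairs = repairs.drop_duplicates()
repairs = repairs.sort_values(["address", "repair_date"])
gap = repairs.groupby("address")["repair_date"].diff()
repairs = repairs[gap.isna() | (gap > pd.Timedelta(days=30))]
```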
Leak locations have been combined with satellite images taken before or after a leak repair. Images taken t days before a leak repair were classified as positive examples of leaks, where t is a parameter determined by experimentation to be 1.5 days (see the appendix). Images taken after a repair are classified as negative examples. Images were downloaded from Google Earth Engine, but many satellites only pass over once a week, so not every leak has corresponding images. For more see the scraping notebooks and the readme files for each dataset. An example of this data is below, followed by the dataset table and a sketch of the labelling rule:
An example of a Sentinel-2 image 1 day before a leak repair near 608 Morrow St, Austin TX
An example of a Sentinel-2 image 100 days after a leak repair near 608 Morrow St, Austin TX
Dataset | Satellite | Leak dataset | Resolution * |
---|---|---|---|
s1-AUTX_COPERNICUS-S1_GRD | Sentinel 1 | ATX | >10 m |
s2-AUTX_v6_COPERNICUS-S2 | Sentinel 2 | ATX | >10 m |
l7-AUTX_v2_LANDSAT-LE7_L1T | Landsat 7 | ATX | >15 m |
l8-AUTX_v2_LANDSAT-LC8_L1T | Landsat 8 | ATX | >15 m |
* This gives source resolution. All data was interpolated.
A list of datasets and resolutions
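As a minimal sketch of the labelling rule described above (the function, the `days_to_repair` column, and the DataFrame layout are illustrative assumptions):

```python
import pandas as pd

def label_images(images: pd.DataFrame, timespan_before: float = 1.5) -> pd.DataFrame:
    """Label each image relative to its matched leak repair.

    Assumes a `days_to_repair` column: positive values mean the image
    was taken that many days *before* the repair.
    """
    days_before = images["days_to_repair"]
    # Positive example: taken within `timespan_before` days before the
    # repair, when the leak was presumably still active.
    is_leak = (days_before > 0) & (days_before <= timespan_before)
    # Negative example: taken after the repair, when the pipe was fixed.
    is_fixed = days_before < 0
    labelled = images[is_leak | is_fixed].copy()
    labelled["label"] = is_leak[is_leak | is_fixed].astype(int)
    return labelled
```

Images taken earlier than `timespan_before` days before the repair are discarded rather than labelled, since the pipe may not yet have been leaking.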
After combining the datasets we must consider filtering the data. We might filter out cloudy images or images taken too many days before a leak repair. For example, a leak might be repaired two days after it is first observed, making an image taken three days before the repair a poor example of a leak.
Which data filters yield the best results? To answer this question we used hyper-parameter optimization to find the data filters with the most predictive power for each dataset. For more see the hyperopt notebook and the data tables in the appendix.
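A minimal sketch of that search using the hyperopt library; the parameter names follow the appendix tables, while `train_and_score` is a hypothetical stand-in for the notebook's real pipeline.

```python
from hyperopt import fmin, tpe, hp

# Search space over the data filters (names follow the appendix tables).
space = {
    "timespan_before": hp.uniform("timespan_before", 0, 7),    # days
    "max_cloud_cover": hp.uniform("max_cloud_cover", 0, 1),    # fraction
    "balanced_classes": hp.choice("balanced_classes", [True, False]),
    "rotation_range": hp.uniform("rotation_range", 0, 45),     # degrees
}

def train_and_score(**filters):
    """Hypothetical stand-in: filter the images with `filters`, fit a
    random forest, and return the f1-score for the leak class."""
    return 0.5  # replace with the real pipeline

def objective(params):
    # hyperopt minimizes, so return the negated f1-score.
    return -train_and_score(**params)

best = fmin(objective, space, algo=tpe.suggest, max_evals=500)
print(best)
```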
Modeling and prediction is done in the 3_model notebook, where outputs include a map of false positives, predictions, and metrics. We tried many models and architectures, including U-Net [5], pre-trained VGG-16 [6], decision-tree regressors, and multi-layer perceptrons. No model gave a significant improvement over random forest methods, so random forest was our primary model.
We tested 500 random forest models on each of our satellite datasets, then took the mean of the best 50 models. Dummy scores were generated by scikit-learn's dummy classifier with a uniform strategy. Since we used balanced data we can use f1-scores as our primary benchmark. If a model's max score doesn't exceed the dummy score, the dataset has many misleading features. If the max score does exceed the dummy score, it may be an artifact of a small sample size or a genuine improvement over random guessing. The table below shows the results, followed by a sketch of the benchmark comparison; for more see the 2_qc_scraped_data notebook.
Dataset | support | dummy f1-score | f1-score_leak_max | f1-score_leak_mean |
---|---|---|---|---|
s1-AUTX_COPERNICUS-S1_GRD | 281 | 0.58 | 0.68 | 0.511 |
s2-AUTX_COPERNICUS-S2 | 224 | 0.57 | 0.66 | 0.521 |
l7-AUTX_LANDSAT-LE7_L1T | 1082 | 0.54 | 0.66 | 0.426 |
l8-AUTX_LANDSAT-LC8_L1T | 434 | 0.55 | 0.69 | 0.55 |
Results of the top 50 random forest models compared to dummy scores for each dataset.
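A minimal sketch of the benchmark comparison, assuming a feature matrix `X` and leak labels `y` have already been extracted from the image patches (random placeholders are used here):

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Placeholder features and labels; the real X comes from the image patches.
rng = np.random.RandomState(0)
X, y = rng.rand(500, 64), rng.randint(0, 2, 500)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Train many random forests with different seeds, as in the report.
scores = []
for seed in range(100):  # the report used 500 models per dataset
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    scores.append(f1_score(y_test, model.fit(X_train, y_train).predict(X_test)))

# Dummy baseline: guess uniformly at random.
dummy = DummyClassifier(strategy="uniform", random_state=0).fit(X_train, y_train)
dummy_f1 = f1_score(y_test, dummy.predict(X_test))

top50 = sorted(scores, reverse=True)[:50]
print(f"best-50 mean f1 {np.mean(top50):.3f} vs dummy f1 {dummy_f1:.3f}")
```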
Our results show that random forest models can improve on dummy benchmarks. However, the difference is small and the f1-scores are low, implying many false positives.
The results support our hypothesis that machine learning can use satellite images (of 10 m resolution) to predict water leaks. However, the f1-scores indicate that the predictions are too unreliable to be used in the field.
The largest limitation in this experiment was the resolution of the satellite images. We expect better results with higher resolution images; unfortunately, only a limited amount of high resolution imagery is available in 2017. Using the entire DigitalGlobe catalog would yield around 20 overlaps with our leak dataset, and the same goes for the high resolution US National Agriculture Imagery Program (NAIP). We expect that repeating this experiment when higher resolution imagery is available will yield improved predictions.
Landgate's SPUR team awarded us a grant and supported this project. Thanks to the people at Austin Water for their help in releasing the water repair data. Thanks to Ali Jeney for her assistance.
1: Water Corporation. "2015 Annual Report." 2015. Online.
2: Agapiou, Athos, et al. "Use of satellite derived vegetation indices for the detection of water pipeline leakages in semiarid areas." First International Conference on Remote Sensing and Geoinformation of Environment. International Society for Optics and Photonics, 2013.
3: Agapiou, Athos, et al. "Variations of spectral signature profiles of wet and dry targets for supporting the detection of water-leakages using satellite data." SPIE Remote Sensing. International Society for Optics and Photonics, 2013.
4: Mnih, Volodymyr, and Geoffrey Hinton. "Learning to detect roads in high-resolution aerial images." Computer Vision–ECCV 2010 (2010): 210-223.
5: Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer International Publishing, 2015.
6: Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
The main files are:
- notebooks/0_process_data, where the leak datasets are converted
- notebooks/scraped_satellite_images, where images are downloaded for each leak
- notebooks/2_qc_scraped_data.ipynb, where we test each dataset for consistency and predictive power
- notebooks/2a_hyperopt.ipynb, where we try varied filters to see which has the most predictive power
- notebooks/2_model.ipynb, a jupyter-notebook script for python 3
You can download the raw satellite data used at this link. If the link is dead please email data at wassname followed by org.
We have many options for filtering the data, including balancing datasets, jittering values, augmenting data, and filtering out cloudy images. The most important parameter is "timespan_before": here we decide how long we will allow between a satellite image and a leak repair. If we choose too long a time we will get an image of a pipe before it started leaking, and if we choose too short a time we will reduce our dataset.
To solve this we ran hyperparameter optimization, trying all variations with a random forest model. The results are below. The results for Landsat-7 show that the best 50 models used a timespan_before of 1.13 days, before which images are discarded.
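The summary tables below can be reproduced from the saved trials with a few pandas operations. A sketch, assuming the trials were flattened to a `trials.csv` with one row per trial (the file name is an assumption):

```python
import pandas as pd

metric = "result_metrics_report_f1-score_leak"

# One row per hyperopt trial: the filter settings plus resulting scores.
trials = pd.read_csv("trials.csv")

# Keep the 50 best trials by leak f1-score, then summarise each column.
top50 = trials.nlargest(50, metric).select_dtypes("number")
summary = pd.DataFrame({
    "corr": top50.corrwith(top50[metric]),  # correlation with the metric
    "mean": top50.mean(),
    "std": top50.std(),
})
print(summary.round(3))
```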
top 50 scores (l7 dataset),
- by metric="result_metrics_report_f1-score_leak"
- number of trials=564
parameter | corr | mean | std | units |
---|---|---|---|---|
misc_vals_balanced_classes | 0.118 | 0.98 | 0.141 | bool |
misc_vals_channel_shift_range | -0.137 | 0.083 | 0.06 | frac |
misc_vals_height_shift_range | -0.138 | 0.165 | 0.094 | frac |
misc_vals_max_cloud_cover | -0.128 | 0.109 | 0.085 | frac |
misc_vals_normalized | nan | 0 | 0 | bool |
misc_vals_rescale | -0.011 | 0.33 | 0.089 | frac |
misc_vals_rotation_range | -0.041 | 9.903 | 7.671 | deg |
misc_vals_timespan_before | -0.24 | 1.134 | 1.181 | days |
misc_vals_width_shift_range | 0.027 | 0.271 | 0.091 | frac |
result_metrics_report_f1-score_leak | 1 | 0.679 | 0.01 | |
result_dummy_metrics_report_dummy_f1-score_leak | 0.218 | 0.508 | 0.039 | |
result_dummy_metrics_report_dummy_support_leak | -0.259 | 217.64 | 207.412 | int |
Here's an example of how we interpret the table. For the l7 dataset, should we use balanced classes? We have a decent positive correlation, so the parameter is important. The mean is 0.98, but it's a bool, so we should set it to True.
If we instead look at channel_shift_range, we see it has a decent-sized negative correlation, which means it's important and that setting it too high gives a low f1-score. The mean is 0.08±0.06, so let's set it to 0.08.
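Putting that interpretation together, the chosen l7 filter settings might look like the following (an illustrative dict, with values read off the table above):

```python
# Filter settings read off the l7 table above (illustrative).
l7_filters = {
    "balanced_classes": True,     # bool with mean 0.98 and positive corr
    "channel_shift_range": 0.08,  # negative corr: high values hurt, use the mean
    "timespan_before": 1.13,      # days; the strongest negative correlation
}
```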
top 50 scores,
- by metric="result_metrics_report_f1-score_leak"
- number of trials=564
parameter | corr | mean | std | units |
---|---|---|---|---|
misc_vals_balanced_classes | 0.033 | 0.6 | 0.495 | bool |
misc_vals_channel_shift_range | 0.055 | 0.107 | 0.051 | frac |
misc_vals_height_shift_range | -0.04 | 0.22 | 0.147 | frac |
misc_vals_max_cloud_cover | -0.116 | 0.25 | 0.257 | frac |
misc_vals_normalized | -0.784 | 0.42 | 0.499 | bool |
misc_vals_rescale | 0.292 | 0.15 | 0.107 | frac |
misc_vals_rotation_range | -0.072 | 21.022 | 9.381 | deg |
misc_vals_timespan_before | -0.243 | 5.069 | 1.431 | days |
misc_vals_width_shift_range | 0.082 | 0.163 | 0.097 | frac |
result_metrics_report_f1-score_leak | 1 | 0.633 | 0.027 | |
result_dummy_metrics_report_dummy_f1-score_leak | 0.006 | 0.495 | 0.026 | |
result_dummy_metrics_report_dummy_support_leak | -0.235 | 411.2 | 173.273 | int |
top 50 scores,
- by metric="result_metrics_report_f1-score_leak"
- number of trials=1000
parameter | corr | mean | std | units |
---|---|---|---|---|
misc_vals_balanced_classes | 0.027 | 0.98 | 0.141 | bool |
misc_vals_channel_shift_range | -0.087 | 0.129 | 0.071 | frac |
misc_vals_height_shift_range | 0.008 | 0.35 | 0.134 | frac |
misc_vals_max_cloud_cover | -0.222 | 0.171 | 0.168 | frac |
misc_vals_normalized | 0.2 | 0.28 | 0.454 | bool |
misc_vals_rescale | -0.063 | 0.142 | 0.106 | frac |
misc_vals_rotation_range | -0.092 | 21.399 | 13.651 | deg |
misc_vals_timespan_before | -0.238 | 1.984 | 1.769 | days |
misc_vals_width_shift_range | -0.302 | 0.418 | 0.099 | frac |
result_metrics_report_f1-score_leak | 1 | 0.684 | 0.021 | |
result_dummy_metrics_report_dummy_f1-score_leak | 0.105 | 0.505 | 0.061 | |
result_dummy_metrics_report_dummy_support_leak | -0.35 | 113.68 | 88.334 | int |
top 50 scores,
- by metric="result_metrics_report_f1-score_leak"
- number of trials=1000
parameter | corr | mean | std | units |
---|---|---|---|---|
misc_vals_balanced_classes | nan | 1 | 0 | bool |
misc_vals_channel_shift_range | -0.063 | 0.169 | 0.06 | frac |
misc_vals_height_shift_range | 0.146 | 0.167 | 0.115 | frac |
misc_vals_max_cloud_cover | -0.121 | 0.485 | 0.286 | frac |
misc_vals_normalized | -0.028 | 0.06 | 0.24 | bool |
misc_vals_rescale | -0.105 | 0.266 | 0.125 | frac |
misc_vals_rotation_range | 0.232 | 19.203 | 9.334 | deg |
misc_vals_timespan_before | -0.285 | 2.099 | 1.21 | days |
misc_vals_width_shift_range | 0.075 | 0.261 | 0.111 | frac |
result_metrics_report_f1-score_leak | 1 | 0.668 | 0.014 | |
result_dummy_metrics_report_dummy_f1-score_leak | 0.093 | 0.508 | 0.04 | |
result_dummy_metrics_report_dummy_support_leak | -0.237 | 165.88 | 100.096 | int |