sky-eye

Thoughts

  • The problem can be decomposed into (a two-stage sketch follows this list):

    1. Segmenting the 'before' image, and
    2. Classifying each pixel of identified buildings based on the 'after' image.
  • Strategy: get an MVP, break the problem into sub-components (building classification etc.), and see which approach works best.

  • Experiments that can (probably) be run at reduced resolution: broad architecture choices, loss comparisons, polygonisation experiments.
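
A minimal sketch of the two-stage decomposition above, assuming numpy arrays for the images and two hypothetical callables (`localisation_model`, `damage_model`) that return per-pixel scores – none of these names come from the actual training code.

```python
import numpy as np

def two_stage_inference(pre_img, post_img, localisation_model, damage_model,
                        building_threshold=0.5):
    """Sketch of the decomposition: segment the 'before' image, then
    classify damage per pixel of the 'after' image inside that footprint."""
    # Stage 1: building footprint probabilities from the pre-disaster image, shape (H, W).
    building_prob = localisation_model(pre_img)
    building_mask = building_prob > building_threshold

    # Stage 2: per-pixel damage scores from the post-disaster image,
    # shape (n_damage_classes, H, W).
    damage_scores = damage_model(post_img)
    damage_class = damage_scores.argmax(axis=0)

    # Only keep damage predictions where a building was localised;
    # 0 is used here as the implicit 'no building' label.
    return np.where(building_mask, damage_class + 1, 0)
```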

Set-up

Questions

  • Does polygonisation help? Specifically, at the 'before' layer, we can polygonise to smooth the predictions and then use each polygon to predict the majority damage class. This should also help address some of the pixel drift. The polygonisation isn't differentiable, so we probably can't do this end-to-end.

    • Current theory: the best approach is to polygonise the 'pre' image and then take the majority class of the pixels within each polygon – this also addresses satellite drift (a sketch follows this list).
  • What's the best set-up for the building-damage loss? There are four ordinal damage classes (no damage/slightly damaged/major damage/destroyed), as well as an implicit no-building class in the "post" heatmap.

    • How do we handle the 'no-building' case? Do we explicitly model it as an option, or just model the damage classes and mask out the loss (a masked-loss sketch follows this list)?
  • How do we handle images with no polygons? These are penalised heavily in the loss.
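
One way to test the polygonisation idea: label connected components in the predicted 'pre' footprint and give every pixel of a component the majority damage class from the 'post' prediction. A sketch using scipy's connected-component labelling as a stand-in for true polygon extraction; the array names are assumptions.

```python
import numpy as np
from scipy import ndimage

def majority_damage_per_building(building_mask, damage_class_map):
    """building_mask: (H, W) bool footprint predicted from the 'pre' image.
    damage_class_map: (H, W) int per-pixel damage class from the 'post' image.
    Every pixel of a building takes that building's majority damage class,
    which also smooths out small pixel drift between the two frames."""
    labelled, n_buildings = ndimage.label(building_mask)
    out = np.zeros_like(damage_class_map)
    for building_id in range(1, n_buildings + 1):
        pixels = labelled == building_id
        # Majority vote over the damage classes inside this building.
        out[pixels] = np.bincount(damage_class_map[pixels]).argmax()
    return out
```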
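
For the 'no-building' question, one option is to keep only the four damage classes and mask the classification loss to building pixels, rather than adding an explicit background class. A sketch assuming PyTorch training; the function and tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

def masked_damage_loss(damage_logits, damage_target, building_mask):
    """damage_logits: (B, 4, H, W) raw scores for the four damage classes.
    damage_target: (B, H, W) long tensor of damage labels.
    building_mask: (B, H, W) bool tensor, True where a building exists.
    The loss is computed only over building pixels, so the model never has
    to predict a 'no building' class."""
    per_pixel = F.cross_entropy(damage_logits, damage_target, reduction="none")
    masked = per_pixel * building_mask.float()
    # Clamp avoids dividing by zero for images with no buildings at all.
    return masked.sum() / building_mask.float().sum().clamp(min=1.0)
```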

Architecture

Joint models vs. dual models

  • Is a single model for the before/after images better, or two separate specialised models?
    • Pros: we can concatenate/combine the feature maps of before/after (maybe adding deformable convolutions or attention to account for pixel drift). Intuitively, seeing the 'before' picture helps you evaluate the extent of the damage more than just seeing the 'after' photo.
    • Cons: harder to tune the combined model.
  • What's the best way of combining? (A concatenation sketch follows this list.)
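
A sketch of the simplest 'combine' option: a shared encoder over the before/after images with the feature maps concatenated before a small head. Deformable convolutions or attention could be dropped in at the fusion point; the module and channel names here are illustrative, not the actual model.

```python
import torch
import torch.nn as nn

class SiameseCombiner(nn.Module):
    """Shared encoder over before/after images, features fused by concatenation."""

    def __init__(self, encoder: nn.Module, encoder_channels: int, n_classes: int):
        super().__init__()
        self.encoder = encoder  # same weights applied to both images
        self.head = nn.Sequential(
            nn.Conv2d(2 * encoder_channels, encoder_channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(encoder_channels, n_classes, kernel_size=1),
        )

    def forward(self, pre_img, post_img):
        pre_feats = self.encoder(pre_img)
        post_feats = self.encoder(post_img)
        # Concatenate along the channel dimension; an attention or deformable
        # block could be inserted here to handle pixel drift between frames.
        fused = torch.cat([pre_feats, post_feats], dim=1)
        return self.head(fused)
```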

Experiments

  • UNet vs. LinkNet – LinkNet looks like it performs well.
    • Using an efficientnet-b7 encoder, performance is pretty similar, but UNet consumes slightly more GPU memory (90% vs. LinkNet's 82%), although UNet trains 30% faster.
  • FPN uses 83% memory.
  • PSPNet uses 78% memory.

Conclusion: LinkNet has the best memory/performance profile, with UNet close behind (faster, but more memory). A sketch of the comparison set-up follows.
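
The encoder/decoder names above look like segmentation_models_pytorch; if that package was indeed used, the architecture sweep reduces to swapping the model class, roughly as below (the exact arguments are an assumption).

```python
import segmentation_models_pytorch as smp

# Hypothetical reconstruction of the sweep: same encoder, different decoders.
common = dict(encoder_name="efficientnet-b7", encoder_weights="imagenet",
              in_channels=3, classes=1)

models = {
    "unet": smp.Unet(**common),
    "linknet": smp.Linknet(**common),
    "fpn": smp.FPN(**common),
    "pspnet": smp.PSPNet(**common),
}
```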

Pretraining

  • Is it better to use models pretrained for building segmentation, or to roll my own using a (potentially) nicer/more specialised architecture?
    • Pretrained models.
    • Most of the data that the models were pretrained on is also publicly available – so the only other advantage is that the architectures demonstrably work.
  • Not a lot of difference between initialising from scratch and using pretrained weights.
  • The xdxd pretrained model has stability issues; selimsef_spacenet4_densenet121unet trains okay (after removing the first encoder layer and the head due to an n_channels mismatch – a partial-loading sketch follows below).
  • The biggest DenseNet works best – selimsef_spacenet4_densenet121unet and selimsef_spacenet4_resnet... don't seem to work as well.
  • EfficientNet outperforms the pretrained models.

Conclusion: train from scratch.
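
For the n_channels mismatch noted in the list above, the usual workaround is to drop the incompatible keys from the checkpoint and load the rest non-strictly. A sketch assuming the checkpoint file is a plain state_dict; the key prefixes are placeholders, not the real selimsef layout.

```python
import torch

def load_partial_checkpoint(model, checkpoint_path,
                            skip_prefixes=("encoder.conv1", "final")):
    """Load a pretrained state_dict, skipping layers whose shapes can't match
    (e.g. the first encoder conv and the head when n_channels differs)."""
    state = torch.load(checkpoint_path, map_location="cpu")
    filtered = {k: v for k, v in state.items()
                if not any(k.startswith(p) for p in skip_prefixes)}
    missing, unexpected = model.load_state_dict(filtered, strict=False)
    return missing, unexpected
```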

Loss and training

Questions

Half precision training

Eval hacking

  • Evaluation is 30% building localisation + 70% classification.
  • However, in order to score a pixel correctly for classification, we first need to have localised it correctly.
  • This potentially implies we should be more recall-oriented for localisation.
  • F-scores are harmonic-meaned across classes (a sketch of the combined score follows).
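
A rough reconstruction of the score from the bullets above: 30% localisation F1 plus 70% of the harmonic mean of the per-class damage F1s. Everything beyond those two facts (function names, inputs) is an assumption.

```python
def harmonic_mean(values):
    """Harmonic mean; zero if any value is zero, mirroring how a single
    missed damage class collapses the classification score."""
    if any(v == 0 for v in values):
        return 0.0
    return len(values) / sum(1.0 / v for v in values)

def competition_score(localisation_f1, damage_f1s):
    """30% building localisation F1 + 70% damage classification score,
    where the damage score is the harmonic mean of the per-class F1s."""
    return 0.3 * localisation_f1 + 0.7 * harmonic_mean(damage_f1s)
```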

Other