This is a codebase for image captioning research.
It supports:
- Self-critical training, from "Self-Critical Sequence Training for Image Captioning" (Rennie et al.)
- Bottom-up features, from "Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering" (Anderson et al.)
- Test-time ensembling of multiple models
- Multi-GPU training (DistributedDataParallel is now supported via pytorch-lightning; see ADVANCED.md for details)
- Transformer captioning model.
A simple demo Colab notebook is available here.
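The self-critical objective listed above can be sketched as REINFORCE with a greedy-decoding baseline. This is a minimal illustration, not this repository's implementation; the function name `scst_loss` and the tensor shapes are assumptions:

```python
import torch

def scst_loss(sample_logprobs, sample_reward, greedy_reward, mask):
    """Self-critical sequence training loss (a sketch).

    sample_logprobs: (B, T) log-probs of the sampled captions
    sample_reward:   (B,)   sentence-level reward (e.g. CIDEr) of sampled captions
    greedy_reward:   (B,)   reward of greedy-decoded captions, used as the baseline
    mask:            (B, T) 1 for real tokens, 0 for padding
    """
    # Advantage: how much better the sampled caption scored than greedy decoding.
    advantage = (sample_reward - greedy_reward).unsqueeze(1)
    # Minimize the negative advantage-weighted log-likelihood, ignoring padding.
    loss = -(sample_logprobs * advantage * mask)
    return loss.sum() / mask.sum()
```

Captions that beat the greedy baseline get their log-probability pushed up, and worse ones pushed down, so no learned value network is needed.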