Remote Sensing Image Captioning Based on Multi-layer Aggregated Transformer

Here, we provide the PyTorch implementation of the paper: "Remote Sensing Image Captioning Based on Multi-Layer Aggregated Transformer".

For more information, please see our published paper at [IEEE | Lab Server] (accepted by GRSL, 2022).

Train

Firstly, download the RSICD dataset. Then preprocess the data as follows:

python create_input_files.py --karpathy_json_path ./RSICD_captions/dataset_rsicd.json --image_folder ./RSICD_captions/images/
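Before running the preprocessing command above, it can help to sanity-check the caption file. The sketch below is not part of this repository: it assumes dataset_rsicd.json follows the usual Karpathy-style layout (a top-level "images" list whose entries carry "split" and "sentences" fields), and the `summarize_splits` helper and the sample file names are hypothetical.

```python
import json
from collections import Counter


def summarize_splits(dataset: dict) -> dict:
    """Return {split: (num_images, num_captions)} for a Karpathy-style dict.

    Assumes each entry of dataset["images"] has a "split" string and a
    "sentences" list; these field names are an assumption, not taken
    from this repository.
    """
    image_counts = Counter()
    caption_counts = Counter()
    for img in dataset["images"]:
        split = img["split"]
        image_counts[split] += 1
        caption_counts[split] += len(img["sentences"])
    return {s: (image_counts[s], caption_counts[s]) for s in image_counts}


# Tiny in-memory sample with made-up file names, so the sketch runs
# without the real dataset.
sample = {
    "images": [
        {"filename": "example_1.jpg", "split": "train",
         "sentences": [{"raw": "many buildings ."}, {"raw": "a dense area ."}]},
        {"filename": "example_2.jpg", "split": "train",
         "sentences": [{"raw": "a river ."}]},
        {"filename": "example_3.jpg", "split": "test",
         "sentences": [{"raw": "an airport ."}]},
    ]
}
summary = summarize_splits(sample)

# For the real file (path as used in the preprocessing command):
# with open("./RSICD_captions/dataset_rsicd.json") as f:
#     print(summarize_splits(json.load(f)))
```

If the printed splits or counts look wrong, the preprocessing output in ./data/ will likely be wrong as well.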

After preprocessing, the resulting metadata files can be found in ./data/.

Secondly, train the model:

python train5.py

Note: During training, validation scores are computed without beam search. To get evaluation scores with the beam search strategy, run the command under Test below, which computes the scores on the test set.

Test

python eval.py
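To illustrate why scores with beam search can differ from the validation scores reported during training, here is a toy sketch of beam search decoding. It is not the code used by eval.py: the `beam_search` function, the `toy_step` model, and its probability table are hypothetical and only show the general technique.

```python
import math


def beam_search(step_fn, start, end, beam_size=3, max_len=10):
    """Generic beam search over a next-token distribution.

    step_fn(seq) must return a dict {token: probability} for the next
    token given the sequence so far. Returns the best token sequence.
    """
    beams = [(0.0, [start])]  # (cumulative log-probability, tokens)
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            for tok, p in step_fn(seq).items():
                if p > 0:
                    candidates.append((score + math.log(p), seq + [tok]))
        if not candidates:
            break
        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = []
        for score, seq in candidates[:beam_size]:
            if seq[-1] == end:
                finished.append((score, seq))
            else:
                beams.append((score, seq))
        if not beams:
            break
    # Fall back to unfinished hypotheses only if nothing reached `end`.
    pool = finished or beams
    return max(pool, key=lambda c: c[0])[1]


# Toy "model": the next-token distribution depends only on the last token.
TABLE = {
    "<start>": {"many": 0.6, "a": 0.4},
    "many": {"buildings": 0.55, "trees": 0.45},
    "a": {"river": 0.9, "<end>": 0.1},
    "buildings": {"<end>": 1.0},
    "trees": {"<end>": 1.0},
    "river": {"<end>": 1.0},
}


def toy_step(seq):
    return TABLE[seq[-1]]


greedy = beam_search(toy_step, "<start>", "<end>", beam_size=1)
beam = beam_search(toy_step, "<start>", "<end>", beam_size=2)
```

In this toy table, a beam size of 1 (greedy decoding) commits to "many" early and ends with a total probability of 0.33, while a beam size of 2 keeps "a" alive and finds "a river" with probability 0.36.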

Citation:

@ARTICLE{9709791,
  author={Liu, Chenyang and Zhao, Rui and Shi, Zhenwei},
  journal={IEEE Geoscience and Remote Sensing Letters}, 
  title={Remote-Sensing Image Captioning Based on Multilayer Aggregated Transformer}, 
  year={2022},
  volume={19},
  number={},
  pages={1-5},
  doi={10.1109/LGRS.2022.3150957}}

Reference:

Thanks to the following repository: a-PyTorch-Tutorial-to-Image-Captioning
