Official repository for EarthMarker.
Authors: Wei Zhang*, Miaoxin Cai*, Tong Zhang, Yin Zhuang, and Xuerui Mao
- *These authors contributed equally to this work.
- [2024.01.06]: We have released the dataset RSVP! 🔥 🔥🔥
- [2024.12.22]: EarthMarker has been accepted to IEEE TGRS. 🎉
- [2024.07.19]: The paper for EarthMarker is released on arXiv. 🚀
EarthMarker is the first visual prompting MLLM proposed in the remote sensing (RS) domain. It can comprehend RS imagery under joint visual and text prompts and flexibly switch among interpretation levels, including the image, region, and point levels. More importantly, EarthMarker fills the gap in visual prompting MLLMs for RS, catering to the fine-grained interpretation needs of RS imagery in real-world applications. EarthMarker is capable of various RS visual tasks, including scene classification, referring object classification, captioning, and relationship analysis, which support informed decision-making in real-world applications.
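As a rough illustration of what "visual prompting" means here, the sketch below overlays a region-level box and a point-level marker on an image before it is handed to a model. This is a minimal sketch, not code from the EarthMarker codebase; the function name and prompt format are assumptions for illustration.

```python
# Minimal sketch (NOT from the EarthMarker codebase): drawing visual
# prompts -- a bounding box for a region-level query and a dot for a
# point-level query -- onto a remote sensing image.
from PIL import Image, ImageDraw

def add_visual_prompts(image, box=None, point=None, color=(255, 0, 0)):
    """Return a copy of `image` with the given prompts drawn on it."""
    out = image.copy()
    draw = ImageDraw.Draw(out)
    if box is not None:   # region-level prompt: (x0, y0, x1, y1)
        draw.rectangle(box, outline=color, width=3)
    if point is not None: # point-level prompt: (x, y)
        x, y = point
        r = 5
        draw.ellipse((x - r, y - r, x + r, y + r), fill=color)
    return out

# Synthetic 256x256 stand-in for an RS image, for demonstration only.
img = Image.new("RGB", (256, 256), (30, 60, 30))
prompted = add_visual_prompts(img, box=(40, 40, 120, 120), point=(180, 200))
```

The original image is left untouched; the prompted copy is what would be paired with a text instruction at the chosen interpretation level.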
The entire RSVP dataset is released! 🚀 RSVP contains roughly 3.65M image-point-text and image-region-text pairings.
link1: https://pan.baidu.com/s/1_kMO5bBje7JXTNpxDiCvqg?pwd=gqdb pwd: gqdb
link2: The OneDrive version is being uploaded.
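For orientation, a single image-region-text pairing might look like the record below. The actual RSVP schema is not specified in this README, so every field name and value here is a hypothetical stand-in, not the real annotation format.

```python
# Hypothetical sketch of one RSVP image-region-text record; the real
# schema is not given in this README, so all field names are assumptions.
record = {
    "image": "images/example_0001.png",  # assumed relative image path
    "region": [48, 122, 310, 287],       # assumed [x0, y0, x1, y1] box
    "text": "A parked airplane near the terminal building.",
}

def is_valid_region(rec, width, height):
    """Check that the box lies inside an image of the given size."""
    x0, y0, x1, y1 = rec["region"]
    return 0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height
```

A point-level record would carry an `(x, y)` coordinate in place of the box; consult the released dataset files for the authoritative format.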
@article{zhang2024earthmarker,
title={EarthMarker: A Visual Prompting Multi-modal Large Language Model for Remote Sensing},
author={Zhang, Wei and Cai, Miaoxin and Zhang, Tong and Zhuang, Yin and Li, Jun and Mao, Xuerui},
journal={IEEE Transactions on Geoscience and Remote Sensing},
year={2024},
publisher={IEEE}
}
This project benefits from LLaMA. Thanks for their wonderful work.