CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models

Wuhan University
*Equal Contribution

Corresponding Author

Abstract

The rapid advancement of Large Vision-Language Models (VLMs), both general-domain models and those specifically tailored for remote sensing, has demonstrated exceptional perception and reasoning capabilities in Earth observation tasks. However, a benchmark for systematically evaluating these capabilities in this domain is still lacking. To bridge this gap, we propose CHOICE, an extensive benchmark designed to objectively evaluate the hierarchical remote sensing capabilities of VLMs. Focusing on 2 primary capability dimensions essential to remote sensing, perception and reasoning, we further categorize 6 secondary dimensions and 23 leaf tasks to ensure well-rounded coverage. CHOICE guarantees the quality of its 10,507 problems through a rigorous process of data collection from 50 globally distributed cities, question construction, and quality control. The newly curated data and the multiple-choice format with definitive answers allow for an objective and straightforward performance assessment. Our evaluation of 3 proprietary and 21 open-source VLMs highlights their critical limitations within this specialized context. We hope that CHOICE will serve as a valuable resource and offer deeper insights into the challenges and potential of VLMs in the field of remote sensing.
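This page does not include the evaluation code itself. As a rough illustration of how accuracy over multiple-choice problems with definitive answers might be computed, the Python sketch below formats a problem, parses the option letter from a model's reply, and tallies accuracy. The field names, prompt wording, and letter-extraction rule are assumptions made for illustration, not the benchmark's actual schema or protocol.

```python
# Minimal sketch of a multiple-choice evaluation loop (illustrative only).
# Field names ("question", "options", "answer") and the prompt format are
# assumptions, not the benchmark's actual data schema or prompting protocol.
import re
from typing import Callable, Optional

def build_prompt(problem: dict) -> str:
    """Render one problem as a single-answer multiple-choice prompt."""
    letters = "ABCDEFGH"
    lines = [problem["question"]]
    lines += [f"{letters[i]}. {opt}" for i, opt in enumerate(problem["options"])]
    lines.append("Answer with the letter of the correct option only.")
    return "\n".join(lines)

def extract_choice(response: str) -> Optional[str]:
    """Pull the first standalone option letter out of a model response."""
    match = re.search(r"\b([A-H])\b", response.strip().upper())
    return match.group(1) if match else None

def accuracy(problems: list[dict], model: Callable[[str], str]) -> float:
    """Fraction of problems where the parsed letter matches the ground truth."""
    correct = 0
    for problem in problems:
        pred = extract_choice(model(build_prompt(problem)))
        correct += int(pred == problem["answer"])
    return correct / len(problems)
```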

CHOICE Leaderboard

Secondary (L-2) capability dimensions. ILC, SII, and CID are the three perception dimensions (Image-Level Comprehension, Single-Instance Identification, and Cross-Instance Discernment); AttR, AssR, and CSR are the three reasoning dimensions reported in the Reasoning table below. A sketch of how leaf-task scores might roll up into these dimension scores follows the table.

Model ILC SII CID AttR AssR CSR
Proprietary Large Vision-Language Models
GPT-4o-mini 0.800 0.588 0.448 0.494 0.474 0.876
GPT-4o-2024-11-20 0.845 0.616 0.591 0.536 0.277 0.900
Gemini-1.5-Pro 0.867 0.585 0.636 0.590 0.611 0.876
General-domain Large Vision-Language Models
Qwen2-VL-7B 0.806 0.638 0.675 0.600 0.550 0.884
Qwen2-VL-72B 0.855 0.702 0.742 0.634 0.580 0.914
Ovis1.6-Gemma2-9B 0.828 0.606 0.632 0.598 0.645 0.890
InternVL2-8B 0.789 0.566 0.648 0.664 0.580 0.816
InternVL2-26B 0.820 0.578 0.595 0.594 0.638 0.890
InternVL2-40B 0.838 0.629 0.721 0.694 0.530 0.946
LLaVA-1.6-7B 0.686 0.463 0.574 0.450 0.433 0.749
LLaVA-1.6-13B 0.719 0.522 0.626 0.474 0.470 0.733
Llama3.2-11B 0.746 0.502 0.504 0.460 0.388 0.803
GLM-4V-9B 0.785 0.555 0.644 0.564 0.450 0.863
DeepSeek-VL-7B 0.764 0.561 0.595 0.528 0.373 0.840
MiniCPM-V-2.5 0.757 0.546 0.631 0.414 0.493 0.857
Phi3-Vision 0.710 0.503 0.582 0.428 0.253 0.700
mPLUG-Owl3-7B 0.766 0.524 0.487 0.422 0.340 0.850
Molmo-7B-D 0.720 0.553 0.648 0.536 0.548 0.724
Remote Sensing Large Vision-Language Models
GeoChat 0.642 0.469 0.480 0.384 0.368 0.697
LHRS-Bot 0.633 0.290 0.171 0.366 0.426 0.610
LHRS-Bot-nova 0.688 0.530 0.526 0.450 0.120 0.644
VHM 0.751 0.703 0.436 0.392 0.348 0.744
RemoteCLIP 0.657 0.283 0.552 0.326 0.364 0.739
GeoRSCLIP 0.745 0.198 0.285 0.210 0.397 0.884
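The relationship between the dimension-level scores above and the per-leaf-task tables below is presumably an aggregation of leaf-task scores within each secondary dimension. The sketch below assumes an unweighted mean per dimension; the benchmark's actual aggregation (for instance, weighting by the number of questions per task) may differ, and the grouping passed in is purely illustrative.

```python
# Sketch of rolling leaf-task scores up into secondary (L-2) dimension scores.
# Assumes an unweighted mean per dimension; the benchmark's actual weighting
# may differ, and the task grouping used here is hypothetical.
from statistics import mean

def dimension_scores(leaf_scores: dict[str, float],
                     grouping: dict[str, list[str]]) -> dict[str, float]:
    """Average the available leaf-task scores within each L-2 dimension."""
    result = {}
    for dim, tasks in grouping.items():
        present = [leaf_scores[t] for t in tasks if t in leaf_scores]
        result[dim] = mean(present) if present else float("nan")
    return result

# Example with dummy leaf scores and a hypothetical grouping:
print(dimension_scores({"AC": 0.50, "SR": 0.40, "CD": 0.60},
                       {"CID": ["AC", "SR", "CD"]}))
# -> {'CID': 0.5} (the mean of the three dummy scores)
```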

Perception

Model IM IQ MR SC IC LR OC OL OP AR VG HD AC SR CD
(leaf-task columns are grouped, left to right, under Image-Level Comprehension, Single-Instance Identification, and Cross-Instance Discernment)
Proprietary Large Vision-Language Models
GPT-4o-mini 0.781 0.500 0.490 0.937 0.940 0.940 0.542 0.612 0.914 0.604 0.000 0.906 0.426 0.340 0.650
GPT-4o-2024-11-20 0.868 0.564 0.567 0.954 1.000 0.940 0.594 0.768 0.918 0.610 0.028 0.830 0.509 0.567 0.770
Gemini-1.5-Pro 0.874 0.678 0.790 0.937 0.955 0.960 0.634 0.578 0.952 0.548 0.027 0.810 0.797 0.413 0.690
General-domain Large Vision-Language Models
Qwen2-VL-7B 0.720 0.540 0.800 0.930 0.980 0.980 0.560 0.790 0.950 0.570 0.372 0.570 0.750 0.510 0.790
Qwen2-VL-72B 0.860 0.560 0.800 0.960 1.000 0.970 0.620 0.840 0.950 0.580 0.530 0.670 0.800 0.610 0.840
Ovis1.6-Gemma2-9B 0.800 0.570 0.660 0.940 0.990 0.880 0.500 0.720 0.940 0.570 0.060 0.900 0.680 0.490 0.760
InternVL2-8B 0.730 0.520 0.430 0.930 0.970 0.770 0.400 0.590 0.890 0.590 0.173 0.790 0.850 0.370 0.710
InternVL2-26B 0.780 0.540 0.610 0.950 0.960 0.980 0.460 0.680 0.950 0.620 0.052 0.730 0.660 0.410 0.760
InternVL2-40B 0.820 0.570 0.550 0.960 0.990 0.970 0.530 0.770 0.980 0.610 0.228 0.670 0.870 0.500 0.790
LLaVA-1.6-7B 0.340 0.460 0.380 0.920 0.880 0.740 0.220 0.690 0.970 0.590 0.237 0.060 0.730 0.340 0.650
LLaVA-1.6-13B 0.480 0.490 0.400 0.910 0.940 0.700 0.400 0.670 0.860 0.610 0.302 0.300 0.790 0.400 0.680
Llama3.2-11B 0.650 0.400 0.650 0.910 0.940 0.890 0.510 0.650 0.920 0.590 0.002 0.360 0.580 0.310 0.660
GLM-4V-9B 0.660 0.500 0.680 0.940 0.950 0.980 0.530 0.670 0.930 0.600 0.003 0.620 0.840 0.370 0.710
DeepSeek-VL-7B 0.570 0.540 0.670 0.920 0.940 0.890 0.460 0.740 0.950 0.530 0.253 0.430 0.770 0.340 0.670
MiniCPM-V-2.5 0.610 0.430 0.690 0.930 0.980 0.900 0.460 0.600 0.940 0.610 0.055 0.640 0.790 0.400 0.700
Phi3-Vision 0.510 0.480 0.480 0.880 0.910 0.660 0.350 0.760 0.910 0.560 0.105 0.380 0.770 0.370 0.570
mPLUG-Owl3-7B 0.770 0.420 0.670 0.890 0.960 0.920 0.500 0.660 0.950 0.590 0.073 0.380 0.520 0.300 0.710
Molmo-7B-D 0.560 0.530 0.350 0.870 0.920 0.740 0.390 0.630 0.920 0.670 0.015 0.760 0.800 0.490 0.620
Remote Sensing Large Vision-Language Models
GeoChat 0.313 0.299 0.433 0.922 0.710 0.790 0.218 0.762 0.934 0.510 0.297 0.064 0.671 0.300 0.415
LHRS-Bot 0.288 0.306 0.318 0.915 0.755 0.500 0.252 0.164 0.936 0.330 0.076 0.267 0.325
LHRS-Bot-nova 0.479 0.271 0.350 0.950 0.840 0.690 0.412 0.642 0.972 0.528 0.271 0.372 0.737 0.257 0.560
VHM 0.621 0.428 0.299 0.966 0.765 0.760 0.342 0.872 0.932 0.564 0.598 0.922 0.431 0.347 0.580
RemoteCLIP 0.510 0.303 0.369 0.891 0.775 0.700 0.435 0.295 0.715 0.500 0.050 0.688 0.248 0.545
GeoRSCLIP 0.705 0.303 0.471 0.944 0.860 0.980 0.230 0.360 0.750 0.250 0.050 0.361 0.248 0.570

Reasoning

Model TP PP EA RA DD GD SI
(leaf-task columns are grouped, left to right, under AttR, AssR, and CSR)
Proprietary Large Vision-Language Models
GPT-4o-mini 0.420 0.543 0.420 0.492 0.870 0.925 0.847
GPT-4o-2024-11-20 0.565 0.517 0.470 0.213 0.890 0.950 0.873
Gemini-1.5-Pro 0.520 0.637 0.460 0.662 0.835 0.985 0.830
General-domain Large Vision-Language Models
Qwen2-VL-7B 0.510 0.660 0.580 0.540 0.880 0.970 0.830
Qwen2-VL-72B 0.580 0.670 0.490 0.610 0.900 0.980 0.880
Ovis1.6-Gemma2-9B 0.580 0.610 0.600 0.660 0.910 0.930 0.850
InternVL2-8B 0.520 0.760 0.400 0.640 0.880 0.790 0.790
InternVL2-26B 0.510 0.650 0.690 0.620 0.850 0.960 0.870
InternVL2-40B 0.610 0.750 0.560 0.520 0.950 0.980 0.920
LLaVA-1.6-7B 0.300 0.550 0.320 0.470 0.740 0.710 0.780
LLaVA-1.6-13B 0.330 0.570 0.560 0.440 0.740 0.640 0.790
Llama3.2-11B 0.430 0.480 0.440 0.370 0.810 0.890 0.740
GLM-4V-9B 0.420 0.660 0.360 0.480 0.860 0.990 0.780
DeepSeek-VL-7B 0.390 0.620 0.170 0.440 0.860 0.880 0.800
MiniCPM-V-2.5 0.390 0.430 0.620 0.450 0.920 0.880 0.800
Phi3-Vision 0.380 0.460 0.290 0.240 0.850 0.580 0.680
mPLUG-Owl3-7B 0.350 0.470 0.520 0.280 0.900 0.860 0.810
Molmo-7B-D 0.500 0.560 0.600 0.530 0.940 0.530 0.710
Remote Sensing Large Vision-Language Models
GeoChat 0.255 0.470 0.105 0.455 0.660 0.810 0.647
LHRS-Bot 0.260 0.437 0.480 0.408 0.550 0.435 0.767
LHRS-Bot-nova 0.315 0.540 0.185 0.098 0.525 0.520 0.807
VHM 0.405 0.383 0.340 0.350 0.740 0.760 0.737
RemoteCLIP 0.335 0.210 0.035 0.245 0.820 0.650 0.740
GeoRSCLIP 0.400 0.310 0.310 0.260 0.945 0.935 0.880

BibTeX


@misc{an2025choicebenchmarkingremotesensing,
  title={CHOICE: Benchmarking the Remote Sensing Capabilities of Large Vision-Language Models},
  author={Xiao An and Jiaxing Sun and Zihan Gui and Wei He},
  year={2025},
  eprint={2411.18145},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2411.18145},
}