OmniParser: Screen Parsing tool for Pure Vision Based GUI Agent


📢 [Project Page] [V2 Blog Post] [Models V2] [Models V1.5] [huggingface space (to be updated)]

OmniParser is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
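
To make this concrete, below is a minimal, illustrative sketch of the kind of structured element list a screen parser like OmniParser can hand to a vision-language model. The field names are our own illustrative assumptions, not the repository's actual output schema:

# Illustrative only: these field names are assumptions, not OmniParser's
# actual schema. They show the kind of structured element list a pure-vision
# screen parser can hand to a model like GPT-4V.
parsed_elements = [
    {
        "bbox": [0.12, 0.05, 0.30, 0.09],  # normalized [x1, y1, x2, y2]
        "interactable": True,              # predicted interactability (V1.5+)
        "caption": "search box for entering a query",
    },
    {
        "bbox": [0.85, 0.02, 0.95, 0.06],
        "interactable": True,
        "caption": "settings gear icon",
    },
]

# A downstream agent can then refer to elements by index instead of raw pixels:
prompt = "Click the element that opens settings.\n" + "\n".join(
    f"[{i}] {e['caption']}" for i, e in enumerate(parsed_elements)
)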

News

  • [2025/2] We release OmniParser V2 checkpoints. Watch Video
  • [2025/2] We introduce OmniTool: control a Windows 11 VM with OmniParser + your vision model of choice. Out of the box, OmniTool supports the following large language models: OpenAI (4o/o1/o3-mini), DeepSeek (R1), Qwen (2.5VL), and Anthropic Computer Use. Watch Video
  • [2025/1] V2 is coming. We achieve a new state-of-the-art result of 39.5% on the new grounding benchmark ScreenSpot Pro with OmniParser V2 (to be released soon)! Read more details here.
  • [2024/11] We release an updated version, OmniParser V1.5, which features 1) finer-grained detection of small icons and 2) prediction of whether each screen element is interactable. See the examples in demo.ipynb.
  • [2024/10] OmniParser was the #1 trending model on the Hugging Face model hub (starting 10/29/2024).
  • [2024/10] Feel free to check out our demo on Hugging Face Space! (Stay tuned for OmniParser + Claude Computer Use.)
  • [2024/10] Both the interactable region detection model and the icon functional description model are released! Hugging Face models
  • [2024/09] OmniParser achieves the best performance on Windows Agent Arena!

Install

Install environment:

conda create -n "omni" python==3.12
conda activate omni
pip install -r requirements.txt

Ensure you have the V2 weights downloaded in the weights folder (make sure the caption weights folder is named icon_caption_florence). If not, download them with:

   rm -rf weights/icon_detect weights/icon_caption weights/icon_caption_florence 
   for f in icon_detect/{train_args.yaml,model.pt,model.yaml} icon_caption/{config.json,generation_config.json,model.safetensors}; do huggingface-cli download microsoft/OmniParser-v2.0 "$f" --local-dir weights; done
   mv weights/icon_caption weights/icon_caption_florence
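
As an optional sanity check, the short Python snippet below (a minimal sketch assuming the directory layout produced by the commands above) verifies that the expected weight files are in place:

from pathlib import Path

# Expected files after the download commands above: the icon_detect files
# from huggingface-cli, and icon_caption renamed to icon_caption_florence.
expected = [
    "weights/icon_detect/train_args.yaml",
    "weights/icon_detect/model.pt",
    "weights/icon_detect/model.yaml",
    "weights/icon_caption_florence/config.json",
    "weights/icon_caption_florence/generation_config.json",
    "weights/icon_caption_florence/model.safetensors",
]
missing = [p for p in expected if not Path(p).exists()]
print("all weights present" if not missing else f"missing: {missing}")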

Examples:

We put together a few simple examples in the demo.ipynb.
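
If you want to experiment with the checkpoints outside the notebook, the sketch below is one plausible way to load them directly, assuming the layout from the install step: icon_detect is a YOLO checkpoint (loadable with the ultralytics package) and icon_caption_florence is a Florence-2-based captioner (loadable with transformers; we assume the processor comes from the base microsoft/Florence-2-base repo, since the local folder contains only model weights and configs). This is a rough sketch, not the repository's own utility code, and it omits the pre/post-processing (OCR merging, box filtering, batching) that demo.ipynb performs.

from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor
from ultralytics import YOLO

image = Image.open("screenshot.png").convert("RGB")

# Interactable-region detection: icon_detect is a YOLO checkpoint.
detector = YOLO("weights/icon_detect/model.pt")
boxes = detector(image)[0].boxes.xyxy  # (N, 4) tensor of element boxes

# Functional description: icon_caption_florence is a Florence-2-based model.
# Assumption: the processor is loaded from the base Florence-2 repo.
processor = AutoProcessor.from_pretrained(
    "microsoft/Florence-2-base", trust_remote_code=True)
captioner = AutoModelForCausalLM.from_pretrained(
    "weights/icon_caption_florence", trust_remote_code=True)

# Caption the first detected element (illustrative; demo.ipynb batches this).
x1, y1, x2, y2 = map(int, boxes[0].tolist())
crop = image.crop((x1, y1, x2, y2))
inputs = processor(text="<CAPTION>", images=crop, return_tensors="pt")
ids = captioner.generate(**inputs, max_new_tokens=32)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])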

Gradio Demo

To run the Gradio demo, simply run:

python gradio_demo.py

Model Weights License

For the model checkpoints on the Hugging Face model hub, please note that the icon_detect model is under the AGPL license, inherited from the original YOLO model, while the icon_caption_blip2 and icon_caption_florence models are under the MIT license. Please refer to the LICENSE file in each model's folder: https://huggingface.co/microsoft/OmniParser.

📚 Citation

Our technical report can be found here. If you find our work useful, please consider citing it:

@misc{lu2024omniparserpurevisionbased,
      title={OmniParser for Pure Vision Based GUI Agent}, 
      author={Yadong Lu and Jianwei Yang and Yelong Shen and Ahmed Awadallah},
      year={2024},
      eprint={2408.00203},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.00203}, 
}