From dcca70c364c08f93104070c6811a6d13cd443774 Mon Sep 17 00:00:00 2001
From: yadonglu
Date: Mon, 27 Jan 2025 13:50:38 -0800
Subject: [PATCH] minor readme

---
 README.md                   | 2 +-
 docs/Evaluation.md          | 2 +-
 eval/ss_pro_gpt4o_omniv2.py | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 129bd68b..68616539 100644
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@
 **OmniParser** is a comprehensive method for parsing user interface screenshots into structured and easy-to-understand elements, which significantly enhances the ability of GPT-4V to generate actions that can be accurately grounded in the corresponding regions of the interface.
 
 ## News
-- [2025/1] V2 is coming. We achieve new state of the art results (39.5%) on [Screen Spot Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/tree/main) with OmniParser v2 (will be released soon)! Read more details [here](https://github.com/microsoft/OmniParser/tree/master/docs).
+- [2025/1] V2 is coming. We achieve new state of the art results 39.5% on the new grounding benchmark [Screen Spot Pro](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/tree/main) with OmniParser v2 (will be released soon)! Read more details [here](https://github.com/microsoft/OmniParser/tree/master/docs/Evaluation.md).
 - [2024/11] We release an updated version, OmniParser V1.5 which features 1) more fine grained/small icon detection, 2) prediction of whether each screen element is interactable or not. Examples in the demo.ipynb.
 - [2024/10] OmniParser was the #1 trending model on huggingface model hub (starting 10/29/2024).
 - [2024/10] Feel free to checkout our demo on [huggingface space](https://huggingface.co/spaces/microsoft/OmniParser)! (stay tuned for OmniParser + Claude Computer Use)
diff --git a/docs/Evaluation.md b/docs/Evaluation.md
index 5f43fa15..cc75ad1c 100644
--- a/docs/Evaluation.md
+++ b/docs/Evaluation.md
@@ -1,4 +1,4 @@
 # Eval setup for ScreenSpot Pro
-We adapt the eval code from ScreenSpot Pro (ss pro) official repo: https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/tree/main. This folder contains the inference script/results on this benchmark. We are under legal review proces to release omniparser v2. Once it is done, we will update the file so that it can load the v2 model.
+We adapt the eval code from ScreenSpot Pro (ss pro) official [repo](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/tree/main). This folder contains the inference script/results on this benchmark. We going through legal review proces to release omniparser v2. Once it is done, we will update the file so that it can load the v2 model.
 1. eval/ss_pro_gpt4o_omniv2.py: contains the prompt we use, it can be dropped in replacement for this [file](https://github.com/likaixin2000/ScreenSpot-Pro-GUI-Grounding/blob/main/models/gpt4x.py) in the original ss pro repo.
 2. eval/logs_sspro_omniv2.json: contains the inferenced results for ss pro using GPT4o+OmniParserv2.
diff --git a/eval/ss_pro_gpt4o_omniv2.py b/eval/ss_pro_gpt4o_omniv2.py
index f8acb6e1..2b64d25c 100755
--- a/eval/ss_pro_gpt4o_omniv2.py
+++ b/eval/ss_pro_gpt4o_omniv2.py
@@ -10,7 +10,7 @@
 from openai import BadRequestError
 
 model_name = "gpt-4o-2024-05-13"
-OPENAI_KEY = os.environ.get("OPENAI_API_KEY") # "nEvErGoNnAgIvEyOuUp"
+OPENAI_KEY = os.environ.get("OPENAI_API_KEY")
 
 def convert_pil_image_to_base64(image):
     buffered = BytesIO()