Output garbled text when running "3. Evaluate your deployment" in GUI Model Deployment Guide #21

Open
GianMeng opened this issue Jan 23, 2025 · 1 comment

Comments

@GianMeng

After deploying the ui-tars-2B model locally with Ollama according to the GUI Model Deployment Guide, I ran the Python script and got garbled text like:
and,: , and orc Uran is, c Australia? C : ? True ,否则.
The same problem also appears to cause the output 0 problem when using ui-tars-desktop, judging by the app log (the final line reports parsed_length 0):

[2025-01-23 16:02:56.695] [info] (main) [vlmParams_images_len]: 1
[2025-01-23 16:02:56.697] [info] (main) [resizeFactor] maxPixels 1058400 currentPixels 1821369 resizeFactor 0.7623000448470658
[2025-01-23 16:02:56.846] [info] (main) [preprocessResizeImage] width: 1301 height: 813 size: 62.60KB
[2025-01-23 16:02:56.847] [info] (main) vlmBaseUrl http://localhost:11434/v1 vlmApiKey ollama
[2025-01-23 16:03:12.460] [info] (main) [vlm_invoke_time_cost]: 15612ms
[2025-01-23 16:03:12.460] [info] (main) [ui_tars_vlm_response_content] 懒陷入了, on right onceberry, ...(Omission)
[2025-01-23 16:03:12.460] [info] (main) [nl2Command] body {"prediction":" 懒陷入了, on right onceberry, ...(Omission)
[2025-01-23 16:03:12.461] [info] (main) [nl2Command] parsed []
[2025-01-23 16:03:12.461] [info] (main) [emitData] status running
[2025-01-23 16:03:12.461] [info] (main) ======data======
[] { size: { width: 1707, height: 1067 } } {
from: 'gpt',
value: '懒陷入了, on right onceberry, this you right right him right,, right,趁更何况,, right when,你耻, ...(Omission)
timing: { start: 1737619376694, end: 1737619392461, cost: 15767 },
reflections: []
}
========
[2025-01-23 16:03:12.462] [info] (main) [parsed] [] [parsed_length] 0
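
The resize numbers in the log above are consistent with scaling both sides by sqrt(maxPixels / currentPixels). The following is a minimal sketch under that assumption; `resize_factor` is a hypothetical helper written for illustration, not the ui-tars-desktop implementation, and truncating the scaled sides to integers is also an assumption that happens to match the logged 1301 x 813.

```python
import math


def resize_factor(width, height, max_pixels):
    """Return the factor that scales width*height down to at most max_pixels.

    Hypothetical helper: assumes the factor is sqrt(max_pixels / current_pixels),
    which matches the values printed in the log.
    """
    current_pixels = width * height
    if current_pixels <= max_pixels:
        return 1.0  # image is already small enough, no resize needed
    return math.sqrt(max_pixels / current_pixels)


# Values taken from the log: screen size 1707 x 1067, maxPixels 1058400.
factor = resize_factor(1707, 1067, 1058400)
new_width = int(1707 * factor)   # truncation is an assumption
new_height = int(1067 * factor)
print(factor, new_width, new_height)
```

Since the resized screenshot matches these numbers, the image preprocessing looks fine and the garbled text most likely comes from the model itself.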

And this is the Python script I used:

import base64
from openai import OpenAI

deployment = "ollama"
instruction = "click the start menu"
screenshot_path = r"C:\Gianmeng\Code\Mass\UI-TARS\Screenshots\screenshot.jpg"
assert deployment in ["ollama", "hf"]

if deployment == "ollama":
    client = OpenAI(
        base_url="http://127.0.0.1:11434/v1/",
        api_key="ollama",  # not used
    )
    # the model name created via ollama CLI, you can check it via command: `ollama list`
    model = "ui-tars"
else:
    client = OpenAI(base_url="<endpoint url>", api_key="<huggingface access tokens>")
    model = "tgi"

prompt = "You are a GUI agent. You are given a task and your action history, with screenshots. You need to perform the next action to complete the task. \n\n    ## Output Format\n    ```\n    Action_Summary: ...\n    Action: ...\n    ```\n\n    ## Action Space\n    click(start_box=‘<|box_start|>(x1,y1)<|box_end|>’)\nlong_press(start_box=‘<|box_start|>(x1,y1)<|box_end|>’, time=‘’)\ntype(content=‘’)\nscroll(direction=‘down or up or right or left’)\nopen_app(app_name=‘’)\nnavigate_back()\nnavigate_home()\nWAIT()\nfinished() # Submit the task regardless of whether it succeeds or fails.\n\n    ## Note\n    - Use English in `Action_Summary` part.\n    \n\n    ## User Instruction\n"
with open(screenshot_path, "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read()).decode("utf-8")

response = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt + instruction},
                {"type": "image_url", "image_url": {"url": f"data:image/jpg;base64,{encoded_string}"}},
            ],
        },
    ],
)
print(response.choices[0].message.content)
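
To tell a healthy response from a garbled one without reading it by eye, the printed text can be checked against the "Action: ..." line that the prompt asks for. This is a minimal sketch: `extract_action` and its regular expression are hypothetical and are not the parser used by ui-tars-desktop.

```python
import re

# Hypothetical pattern: a line of the form "Action: name(args)".
# "Action_Summary:" does not match because "Action" must be followed by ":".
ACTION_RE = re.compile(r"^\s*Action:\s*(\w+\(.*\))\s*$", re.MULTILINE)


def extract_action(text):
    """Return the action call from a model response, or None if absent."""
    match = ACTION_RE.search(text)
    return match.group(1) if match else None


# A well-formed response (made up for illustration) yields the action call.
good = "Action_Summary: Click the start menu.\nAction: click(start_box='<|box_start|>(10,20)<|box_end|>')"
# The garbled text reported in this issue yields None.
bad = "and,: , and orc Uran is, c Australia? C : ? True ,否则."

print(extract_action(good))
print(extract_action(bad))
```

A result of None for every response would be in line with the empty parsed list seen in the desktop app log.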

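Following the GUI Model Deployment Guide, I deployed ui-tars-2B locally with Ollama, but when I ran step 3 to verify the deployment, the Python script's output was all garbled. Judging from the ui-tars-desktop log, this problem also seems to cause ui-tars-desktop to output 0 when it runs.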

@JjjFangg

The GGUF model has been quantized, and unfortunately its performance cannot be guaranteed. As a result, we have decided to downgrade it.
