[Bug]: Agent not recognizing working directory #6612

BradKML · 2025-02-05T04:52:36Z

Is there an existing issue for the same bug?

I have checked the existing issues.

Describe the bug and reproduction steps

Logs here (DeepSeek v3 which should be very capable in code/reasoning) https://pastebin.com/9XThHQHh

Conversation led to possible need for repomap #2185 (comment)

OpenHands Installation

Docker command in README

OpenHands Version

0.23

Operating System

WSL on Windows

Logs, Errors, Screenshots, and Additional Context

See above for the full log

enyst · 2025-02-05T05:55:18Z

Brad, I read the entire log, and I don't think this is a bug. What happens is this: the LLM is given the working directory automatically, and it corrects itself. That's what an agent should do, as far as possible, because an LLM will always make mistakes.

Please look at this:

the environment told the LLM the working directory MANY TIMES
the LLM still made a couple of mistakes
the environment still told the LLM the working directory
the LLM got it right.

That's not a bug, you know how they say, it's a feature. 😅

Note: the OBSERVATION is part of the context, it's sent to the LLM.

(slightly simplified below to make it more easily readable)

04:35:14 - ACTION
[Agent Controller]
FileReadAction(path='launch_scientist.py', start=0, end=-1, thought='', action='read')

04:35:14 - OBSERVATION
[Agent Controller] ErrorObservation
File not found: /workspace/launch_scientist.py. Your current working directory is /workspace.

==============
[Agent Controller] LEVEL 1 LOCAL STEP 3 GLOBAL STEP 26

04:35:33 - ACTION
[Agent Controller] MessageAction
CONTENT: The file launch_scientist.py was not found in the current working directory. This suggests that the AI-Scientist repo may not have been cloned or set up correctly. I need to verify if the repository has been cloned and if the necessary files are present. I will check the directory structure and look for the repository.

==============
[Agent Controller] LEVEL 1 LOCAL STEP 4 GLOBAL STEP 27

04:35:38 - ACTION
[Agent Controller] CmdRunAction
COMMAND:
ls

04:35:39 - OBSERVATION
[Agent Controller] CmdOutputObservation
metadata={
"exit_code": 0,
"pid": -1,
"username": "root",
"hostname": "5cbac231740b",
"working_dir": "/workspace",
"py_interpreter_path": "/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python",
"prefix": "",
"suffix": "\n[The command completed with exit code 0.]"
}

--BEGIN AGENT OBSERVATION--
AI-Scientist
[The command completed with exit code 0.]
[Current working directory: /workspace]
[Python interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 0]
--END AGENT OBSERVATION--

==============
[Agent Controller] LEVEL 1 LOCAL STEP 5 GLOBAL STEP 28

04:35:58 - ACTION
[Agent Controller 3919e77b7fe1460cbe85fc6130e79a73-delegate] MessageAction
CONTENT: The AI-Scientist repository is present in the directory, but the launch_scientist.py file is missing. This indicates that the repository may not have been set up correctly or the file has been moved. I need to verify if the repository has been cloned properly and if the necessary files are present. I will check the contents of the AI-Scientist directory.

==============
[Agent Controller] LEVEL 1 LOCAL STEP 6 GLOBAL STEP 29

04:36:06 - ACTION
[Agent Controller] CmdRunAction
COMMAND:
ls AI-Scientist

04:36:07 - OBSERVATION
[Agent Controller] CmdOutputObservation
metadata={
"exit_code": 0,
"pid": -1,
"username": "root",
"hostname": "5cbac231740b",
"working_dir": "/workspace",
"py_interpreter_path": "/openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python",
"prefix": "",
"suffix": "\n[The command completed with exit code 0.]"
}

--BEGIN AGENT OBSERVATION--
LICENSE README.md ai-scientist-env ai_scientist data docs example_papers experimental launch_scientist.py requirements.txt review_ai_scientist review_iclr_bench templates
[The command completed with exit code 0.]

[Current working directory: /workspace]
[Python interpreter: /openhands/poetry/openhands-ai-5O4_aCHf-py3.12/bin/python]
[Command finished with exit code 0]
--END AGENT OBSERVATION--

==============
[Agent Controller] LEVEL 1 LOCAL STEP 7 GLOBAL STEP 30

04:36:12 - ACTION
[Agent Controller] FileReadAction(path='AI-Scientist/launch_scientist.py')
04:36:12 - OBSERVATION
[Agent Controller] [Read from /workspace/AI-Scientist/launch_scientist.py is successful.]
import argparse
import json......
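The way each command observation in the log above repeats the working directory can be sketched roughly as follows. This is only an illustration of the idea, not the actual OpenHands code: `format_observation` and its parameters are hypothetical names, and the interpreter path in the example call is a placeholder.

```python
def format_observation(output: str, exit_code: int,
                       working_dir: str, py_interpreter: str) -> str:
    """Build the text the LLM sees after a shell command (hypothetical helper)."""
    lines = [
        output.rstrip("\n"),
        f"[The command completed with exit code {exit_code}.]",
        # Repeating the cwd on every observation lets the LLM re-orient itself
        f"[Current working directory: {working_dir}]",
        f"[Python interpreter: {py_interpreter}]",
    ]
    return "\n".join(lines)


# Example values modeled on the `ls` step in the log; the interpreter path is made up
text = format_observation("AI-Scientist\n", 0, "/workspace", "/usr/bin/python")
print(text)
```

Because this suffix is part of the observation, it lands in the context on every step, which is why the LLM can recover after a wrong path.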

enyst · 2025-02-05T06:07:21Z

Please note that your logs also include logs from the Kevin fork. Sorry, we can't help with that one; it differs from OpenHands in many ways.

In general, it would help to isolate the OpenHands logs you find problematic. That would speed up our responses and make it easier for us to look into the issue and see what's wrong.

In this case, I don't think it's a bug, and it's unclear to me what change to the prompt could help. Give it the full contents of the directory every time, not only the working directory? Maybe, but that's not obviously better in general. In the general case, the file wouldn't be inside the first subdirectory; it could be anywhere, at any depth. On large repos, the output of ls -R is larger than the context window. We rely on the LLM to find its way to the file, and I would say that, despite some weird bumps, it did succeed in this case. 🤔
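To get a feel for the ls -R concern, here is a rough sketch that estimates how many characters a full recursive listing would add to the prompt and compares that with a context budget. The 4-characters-per-token ratio and the 64,000-token budget are assumptions for illustration only, and the function names are hypothetical, not part of OpenHands.

```python
import os
import tempfile

CHARS_PER_TOKEN = 4.0  # rough rule of thumb (assumption); varies by tokenizer


def recursive_listing_chars(root: str) -> int:
    """Approximate the size of an `ls -R`-style listing of `root`, in characters."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        total += len(dirpath) + 2  # "<dirpath>:" header plus newline
        for name in dirnames + filenames:
            total += len(name) + 1  # entry plus newline
    return total


def fits_in_context(root: str, context_tokens: int) -> bool:
    """True if the estimated listing size stays within the token budget."""
    return recursive_listing_chars(root) / CHARS_PER_TOKEN <= context_tokens


# Demo on a tiny throwaway tree that mimics the layout in the log
with tempfile.TemporaryDirectory() as demo_root:
    os.makedirs(os.path.join(demo_root, "AI-Scientist"))
    open(os.path.join(demo_root, "AI-Scientist", "launch_scientist.py"), "w").close()
    demo_chars = recursive_listing_chars(demo_root)
    demo_fits = fits_in_context(demo_root, context_tokens=64_000)
    demo_too_small = fits_in_context(demo_root, context_tokens=1)
    print(demo_chars, demo_fits, demo_too_small)
```

A two-entry tree fits easily, but the estimate grows with every file, so a repository with vendored dependencies or datasets can exceed the budget before the agent has done any work.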

enyst · 2025-02-05T06:18:02Z

What happens afterwards is a bit strange, in how it gets stuck: the LLM sends a MessageAction over and over again. That's weird because it's unwarranted. I need to look into it a bit.
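One simple way to flag this kind of loop is to check whether the most recent events are all message actions with no command or file action in between. The sketch below is an assumption about how such a check could look; `Event`, `is_stuck_on_messages`, and the window size of 3 are hypothetical and do not describe how OpenHands detects stuck agents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # e.g. "MessageAction", "CmdRunAction" (names taken from the log)
    content: str


def is_stuck_on_messages(history: list[Event], window: int = 3) -> bool:
    """True if the last `window` events are all MessageActions (hypothetical rule)."""
    if len(history) < window:
        return False
    return all(e.kind == "MessageAction" for e in history[-window:])


healthy = [
    Event("MessageAction", "I will check the directory."),
    Event("CmdRunAction", "ls"),
    Event("MessageAction", "The repo is present."),
]
looping = healthy + [
    Event("MessageAction", "I need to verify the repo."),
    Event("MessageAction", "I need to verify the repo."),
]
print(is_stuck_on_messages(healthy), is_stuck_on_messages(looping))
```

A controller could use a signal like this to interrupt the run or nudge the agent toward a concrete action instead of another message.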

A thought: you were using DelegatorAgent. That should work for simple tasks, but the setup with micro-agents is really experimental and doesn't perform well, so I can only recommend trying the same task with CodeAct. Or use the existing Delegator as an example and develop some smarter agents. ;)

BradKML · 2025-02-05T07:01:59Z

@enyst I think I started OpenHands after Kevin broke, just to see how well it can handle the stuff in the new version, some of the problem is my fault ngl

> Brad, I read the entire log, and I don't think this is a bug. What happens is this: the LLM is given the working directory automatically, and it corrects itself. That's what an agent should do, as far as possible, because an LLM will always make mistakes... That's not a bug, you know how they say, it's a feature. 😅

This kinda suffers from a "Roomba hits a wall" type of problem that I can't quite put my finger on, where it keeps bumping into the same issue without "turning left". In this case, it kept running ls, then decided that maybe it should download the file, and then looped.

> Give it the full contents of the directory every time, not only the working directory? Maybe, but that's not obviously better in general. In the general case, the file wouldn't be inside the first subdirectory; it could be anywhere, at any depth. On large repos, the output of ls -R is larger than the context window. We rely on the LLM to find its way to the file, and I would say that, despite some weird bumps, it did succeed in this case. 🤔

The thing is that it kept getting sucked into this over and over and couldn't really break out of it in this case. I would love to reproduce it just to send the screenshots, because it is the same head-scratching behavior.

> A thought: you were using DelegatorAgent. That should work for simple tasks, but the setup with micro-agents is really experimental and doesn't perform well, so I can only recommend trying the same task with CodeAct. Or use the existing Delegator as an example and develop some smarter agents. ;)

Yeah, that is the problem there. I need an agent that can handle large steps, and Delegator is generally recommended. I think @SmartManoj found a way to hack CodeAct into stepwise multi-task juggling, but that might as well be magic. I will definitely try with CodeAct and pass on the results.

I think at the very least we need pwd && ls -la && ls -la .. to be invoked more often, to handle things like multiple folders, and possibly the issue of "already cd'd into a subfolder and needs to climb out for context". No need to do a recursive file listing. Or pray for eza adoption.
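As a sketch of what that non-recursive snapshot could look like if it were gathered programmatically, the helper below lists the current directory and its parent only. `directory_snapshot` is a hypothetical name and the output format is made up for illustration; it is not an existing OpenHands feature.

```python
import os
import tempfile
from typing import Optional


def _listing(path: str) -> str:
    """One entry per line, with a trailing slash on directories."""
    entries = sorted(os.listdir(path))
    return "\n".join(
        e + "/" if os.path.isdir(os.path.join(path, e)) else e for e in entries
    )


def directory_snapshot(cwd: Optional[str] = None) -> str:
    """Roughly `pwd && ls && ls ..`: the cwd, its entries, and the parent's entries."""
    cwd = os.path.abspath(cwd or os.getcwd())
    parent = os.path.dirname(cwd)
    return f"[cwd] {cwd}\n{_listing(cwd)}\n[parent] {parent}\n{_listing(parent)}"


# Demo: the agent has already cd'd into the repo and needs the parent for context
with tempfile.TemporaryDirectory() as workspace:
    repo = os.path.join(workspace, "AI-Scientist")
    os.makedirs(repo)
    open(os.path.join(repo, "launch_scientist.py"), "w").close()
    snapshot = directory_snapshot(repo)
    print(snapshot)
```

Since it only looks one level down and one level up, the output stays small regardless of how deep the repository is.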

enyst · 2025-02-05T08:15:57Z

Just to clarify, Delegator is not recommended. It's an experimental agent, pretty simple, which hasn't even been evaluated on benchmarks because we are sure it won't make it. (so it doesn't make much sense at this stage)

mamoodi · 2025-02-05T14:38:53Z

I changed the label from bug based on the comments. Please close the issue if it is resolved.

BradKML · 2025-02-06T02:49:25Z

@mamoodi I might need to defer to @enyst observing this specific quirk in behavior (not sure if it counts as a bug)

> What happens afterwards is a bit strange, in how it gets stuck: the LLM sends a MessageAction over and over again. That's weird because it's unwarranted. I need to look into it a bit.

BradKML added the "bug" (Something isn't working) label · Feb 5, 2025

mamoodi added the "troubleshooting/help" (User requires help) label and removed the "bug" (Something isn't working) label · Feb 5, 2025

kevin-support-bot bot mentioned this issue Feb 6, 2025

[Bug]: Agent not recognizing working directory SmartManoj/Kevin#254

Open

BradKML mentioned this issue Feb 6, 2025

[Bug]: Infinite Loop and Timeout Issue in SWE-Bench Evaluation Due to Context Overflow Handling in OpenHands Framework #6357

Open
