
Clarification on long-term memory #28

Open
korbinian-hoermann opened this issue Jan 29, 2025 · 1 comment

@korbinian-hoermann

Hi,

First of all, congratulations on the great work!
I was wondering if you could clarify a few points about "long-term memory" for me.

Q1: As I understand it, you do not have an explicit long-term memory module (as, e.g., in Agent Workflow Memory); instead, it is distributed across the model's neural network parameters. Is that correct?

As a follow-up: in a related issue, you mention that you use 'history 5' for multi-step tasks and give the following example:

```python
# To predict third action
messages.append({
    "role": "user",
    "content": [
        {
            "type": "text",
            "text": PROMPT_FOR_COMPUTER + f"{instruction}"
        },
        {
            "type": "image_url",
            "image_url": screenshot_from_init
        },
        {
            "type": "text",
            "text": previous_actions[0],
        },
        {
            "type": "image_url",
            "image_url": screenshot_from_state_0
        },
        {
            "type": "text",
            "text": previous_actions[1],
        },
        {
            "type": "image_url",
            "image_url": screenshot_from_state_1
        }
    ],
})
```
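As a sketch of how this example extends to longer episodes, the Python function below interleaves screenshots and previous actions in the same order and keeps only the most recent `max_images` screenshots; older screenshots and the actions taken from them are dropped. The name `build_history_messages`, the assumption that the cap of 5 counts the current screenshot, and the plain-string `image_url` values (mirroring the example rather than any particular API) are illustrative assumptions, not the authors' implementation.

```python
from typing import Any, Dict, List


def build_history_messages(
    prompt: str,
    instruction: str,
    screenshots: List[str],
    previous_actions: List[str],
    max_images: int = 5,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: interleave screenshots and prior actions, capped at max_images.

    screenshots[i] is the state the model saw before emitting previous_actions[i];
    screenshots[-1] is the current state, so there is one more screenshot than actions.
    Assumption: the cap counts the current screenshot.
    """
    if len(screenshots) != len(previous_actions) + 1:
        raise ValueError("expected exactly one more screenshot than previous actions")
    if max_images < 1:
        raise ValueError("max_images must be at least 1")

    # Index of the oldest screenshot that still fits in the window.
    start = max(0, len(screenshots) - max_images)

    content: List[Dict[str, Any]] = [{"type": "text", "text": prompt + instruction}]
    for i in range(start, len(screenshots)):
        content.append({"type": "image_url", "image_url": screenshots[i]})
        # Every screenshot except the current one is followed by the action taken from it.
        if i < len(previous_actions):
            content.append({"type": "text", "text": previous_actions[i]})
    return [{"role": "user", "content": content}]
```

With three screenshots and two previous actions this reproduces the ordering of the example above; with eight screenshots, only the last five (and the four actions between them) remain.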

Q2: Does this mean that the agent never sees the full action history (not even its textual representation), but at most the last 5 time steps?
Q3: If so, do you think the agent's "long-term memory" would benefit from seeing whole workflows and thereby connecting them with task execution?
Q4: In the given example, does

```python
{
    "type": "text",
    "text": previous_actions[1],
},
```

contain the full prediction (thought + action)?

@JjjFangg
Collaborator

We truly appreciate your interest in our work. Here are the answers to your questions.

A1: Yes, you're correct. It's distributed across its neural network parameters.

A2: Yes, at most 5 history images are given.

A3: Yes, we believe that seeing all historical images is beneficial. However, the history 5 approach is designed to balance computational efficiency and performance.
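To make the trade-off in A3 concrete, here is a rough count of how many screenshots the model receives over a whole episode with and without a cap. It assumes that prediction step t (1-indexed) has t screenshots available, as in the example where the third action sees three, and it treats image count as a proxy for compute, which is an assumption. `images_per_episode` is a hypothetical helper.

```python
from typing import Optional


def images_per_episode(num_steps: int, max_images: Optional[int] = None) -> int:
    """Total screenshots sent to the model over num_steps predictions.

    At step t (1-indexed) there are t screenshots; max_images (e.g. 5 for
    "history 5") caps how many of them are sent. None means full history.
    """
    total = 0
    for t in range(1, num_steps + 1):
        total += t if max_images is None else min(t, max_images)
    return total


print(images_per_episode(20))                # full history: 210
print(images_per_episode(20, max_images=5))  # history 5: 90
```

Under these assumptions the uncapped total grows quadratically with episode length, while the capped total grows linearly once the window is full.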

A4: Yes, your understanding is correct.
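A4 implies that the text stored for each previous step is the model's raw output, not only the parsed action. A minimal sketch of that bookkeeping follows, assuming a hypothetical output format with a `Thought:` section followed by an `Action:` section and a hypothetical `click(x=..., y=...)` action syntax. Only the action would be executed, while the full string is appended to `previous_actions`.

```python
from typing import List, Tuple


def split_prediction(raw: str) -> Tuple[str, str]:
    """Split raw output of the assumed form 'Thought: ...' then 'Action: ...'."""
    # rpartition splits at the last 'Action:', so a thought mentioning it stays intact.
    thought_part, sep, action_part = raw.rpartition("Action:")
    if not sep:
        raise ValueError("prediction has no 'Action:' section")
    thought = thought_part.strip()
    if thought.startswith("Thought:"):
        thought = thought[len("Thought:"):].strip()
    return thought, action_part.strip()


def record_step(raw: str, previous_actions: List[str]) -> str:
    """Store the full prediction (thought + action) and return only the action to execute."""
    _thought, action = split_prediction(raw)
    previous_actions.append(raw)  # full text goes into the history, per A4
    return action


history: List[str] = []
action = record_step("Thought: Open the menu.\nAction: click(x=950, y=40)", history)
print(action)   # click(x=950, y=40)
print(history)  # ['Thought: Open the menu.\nAction: click(x=950, y=40)']
```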
