Commit

cr

hwchase17 committed Oct 3, 2024
1 parent 0460aee commit ff54dcc
Showing 2 changed files with 72 additions and 86 deletions.
139 changes: 58 additions & 81 deletions docs/docs/concepts/memory.md
@@ -7,11 +7,11 @@ Memory in the context of LLMs and AI applications refers to the ability to proce
- Managing what messages (e.g., from a long message history) are sent to a chat model to limit token usage
- Summarizing past conversations to give a chat model context from prior interactions
- Selecting few shot examples (e.g., from a dataset) to guide model responses
- Maintaining persistent data (e.g., user preferences) across multiple chat sessions
- Allowing an LLM to update its own prompt using past information (e.g., meta-prompting)
- Retrieving information relevant to a conversation or question from a long-term storage system
- "Long term memory" - e.g. memory that persists across a thread
- Allowing an LLM to update its own prompt using past information
- Extract specific information from previous interactions

Below, we'll discuss each of these examples in some detail.

## Managing Messages

@@ -142,102 +142,79 @@ If few-shot examples are stored in a [LangSmith Dataset](https://docs.smith.lang

See this how-to [video](https://www.youtube.com/watch?v=37VaU7e7t5o) for example usage of dynamic few-shot example selection in LangSmith. Also, see this [blog post](https://blog.langchain.dev/few-shot-prompting-to-improve-tool-calling-performance/) showcasing few-shot prompting to improve tool-calling performance, and this [blog post](https://blog.langchain.dev/aligning-llm-as-a-judge-with-human-preferences/) using few-shot examples to align an LLM to human preferences.

## Maintaining Data Across Chat Sessions
## Long term memory

LangGraph's [persistence layer](https://langchain-ai.github.io/langgraph/concepts/persistence/#persistence) has checkpointers that utilize various storage systems, including an in-memory key-value store or different databases. These checkpoints capture the graph state at each execution step and accumulate in a thread, which can be accessed at a later time using a thread ID to resume a previous graph execution. We add persistence to our graph by passing a checkpointer to the `compile` method, as shown here.
LangGraph's [persistence layer](persistence.md#persistence) has checkpointers that enable [thread](persistence.md#threads)-level memory. There are several situations in which you want to enable memory *between* threads. For this we can use LangGraph's [Store](persistence.md#memory-store). This enables "long term memory".

```python
from langgraph.checkpoint.memory import MemorySaver

# Compile the graph with a checkpointer
checkpointer = MemorySaver()
graph = workflow.compile(checkpointer=checkpointer)

# Invoke the graph with a thread ID
config = {"configurable": {"thread_id": "1"}}
graph.invoke(input_state, config)

# Get the latest state snapshot at a later time
config = {"configurable": {"thread_id": "1"}}
graph.get_state(config)
```

The `Store` interface is a very low-level interface on top of well-known data structures. The interesting part of "long term memory" in LangGraph is not any particular novel data structure, but how you populate and use these data structures. We've seen that both **what** you may want to save (the shape of the data) and **how** you use this data are often very application specific. We highlight a few interesting use cases below. In general, we believe more in giving developers the tools and resources to build out memory pipelines themselves, rather than an opinionated memory service.

We've noticed two ways that users implement memory. One is to update memory "in the hot path" of the application. The other is to update it as a background job (or by using a separate service).

### Updating memory in the hot path

Persistence is critical for sustaining long-running chat sessions. For example, a chat between a user and an AI assistant may have interruptions. Persistence ensures that a user can continue that particular chat session at any later point in time. However, what happens if a user initiates a new chat session with an assistant? This spawns a new thread, and the information from the previous session (thread) is not retained. This motivates the need for memory that can maintain data across chat sessions (threads).
This involves updating memory while the application is running. A concrete example of this is the way ChatGPT handles memory: it can call tools to update or save a new memory, and it does so before responding to the user.

For this, we can use LangGraph's `Store` interface to save and retrieve information across threads. Shared information can be namespaced by, for example, `user_id` to retain user-specific information across threads. Let's show how to use the `Store` interface to save and retrieve information.
This has several benefits. First of all, it happens in real time, so if the user starts a new thread right away, that memory will be present. It also makes it possible to show the user that memory has been updated, which makes the system a bit more transparent.

```python
import uuid

from langgraph.store.memory import InMemoryStore
in_memory_store = InMemoryStore()

# Namespace for memories
user_id = "1"
namespace_for_memory = (user_id, "memories")

# Save memories
memory_id = str(uuid.uuid4())
memory = {"food_preference" : "I like pizza"}
in_memory_store.put(namespace_for_memory, memory_id, memory)

# Retrieve memories
memories = in_memory_store.search(namespace_for_memory)
memories[-1].dict()
{'value': {'food_preference': 'I like pizza'},
'key': '07e0caf4-1631-47b7-b15f-65515d4c1843',
'namespace': ['1', 'memories'],
'created_at': '2024-10-02T17:22:31.590602+00:00',
'updated_at': '2024-10-02T17:22:31.590605+00:00'}
```
This also has several downsides. It may slow down the final response, since the model needs to decide what to remember. It also means your application not only needs to handle the application logic, but also needs to decide what to remember (which could result in more complicated instructions to the LLM).
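
As a rough sketch of what the "hot path" approach can look like (assuming the `InMemoryStore` API shown elsewhere on this page, and a hypothetical `save_memory` function exposed to the model as a tool):

```python
import uuid

from langgraph.store.memory import InMemoryStore

in_memory_store = InMemoryStore()

def save_memory(user_id: str, content: str) -> str:
    """Hypothetical tool the model can call mid-conversation to persist a memory."""
    namespace = (user_id, "memories")
    memory_id = str(uuid.uuid4())
    in_memory_store.put(namespace, memory_id, {"memory": content})
    return f"Saved memory {memory_id}"

# In the hot path, the model decides something is worth remembering, calls the
# tool, and only then generates its final response to the user.
save_memory("1", "User prefers short, bullet-point answers")
```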

### Updating memory as a background job

This involves updating memory as a completely separate process from your application. This can either be done as a background job that you write, or by utilizing a separate memory service. It involves triggering a run over a conversation after it has finished in order to update memory.

The `store` can be used in LangGraph to save or retrieve memories in any graph node. Then compile the graph with a checkpointer and store.
This has some benefits. It's a completely separate process from your application, so it generally incurs no additional latency. It also splits the application logic from the memory logic, making things more modular and easier to manage.

This also has several downsides. It may not happen in real time, so users will not immediately see memory updates. You also have to think more about when to trigger this job - how do you know a conversation is finished?
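
For illustration, a minimal sketch of the background-job approach, assuming a hypothetical `extract_memories` helper that would prompt an LLM of your choice to distill the finished conversation:

```python
import uuid

from langgraph.store.memory import InMemoryStore

in_memory_store = InMemoryStore()

def extract_memories(messages: list[dict]) -> list[dict]:
    # Hypothetical helper: in practice, prompt an LLM of your choice to distill
    # the conversation into memory dicts. Hardcoded here to keep the sketch runnable.
    return [{"memory": "User is planning a trip to Japan in March"}]

def update_memory_in_background(user_id: str, messages: list[dict]) -> None:
    # Runs outside the request path, e.g. on a schedule or once a
    # conversation is judged to be finished.
    namespace = (user_id, "memories")
    for memory in extract_memories(messages):
        in_memory_store.put(namespace, str(uuid.uuid4()), memory)
```
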
## Update own instructions

This is an example of long term memory.

"Reflection" or "Meta-prompting" steps can use an LLM to generate or refine its own prompts or instructions. This approach allows the system to dynamically update and improve its own behavior, potentially leading to better performance on various tasks. This is particularly useful for tasks where the instructions are challenging to specify a priori.

Meta-prompting can use past information to update the prompt. As an example, this [Tweet generator](https://www.youtube.com/watch?v=Vn8A3BxfplE) uses meta-prompting to iteratively improve the summarization prompt used to generate high quality paper summaries for Twitter. In this case, we used a LangSmith dataset to house several papers that we wanted to summarize, generated summaries using a naive summarization prompt, manually reviewed the summaries, captured feedback from human review using the LangSmith Annotation Queue, and passed this feedback to a chat model to re-generate the summarization prompt. The process was repeated in a loop until the summaries met our criteria in human review.

This will utilize the memory store concept above to store the updated instructions in a shared namespace. This namespace will have only a single item (unless you want to update instructions specific to each user, but that's a separate issue). This will look something like:

```python
# Compile the graph with the checkpointer and store
graph = graph.compile(checkpointer=checkpointer, store=in_memory_store)

# Invoke the graph
user_id = "1"
config = {"configurable": {"thread_id": "1", "user_id": user_id}}

# First let's just say hi to the AI
for update in graph.stream(
    {"messages": [{"role": "user", "content": "hi"}]}, config, stream_mode="updates"
):
    print(update)
# Node that *uses* the instructions
def call_model(state: State, store: BaseStore):
    instructions = store.search(("instructions",))[0]
    # Application logic
    prompt = prompt_template.format(instructions=instructions.value["instructions"])
    ...


# Node that updates instructions
def update_instructions(state: State, store: BaseStore):
    current_instructions = store.search(("instructions",))[0]
    # Memory logic
    prompt = prompt_template.format(
        instructions=current_instructions.value["instructions"], conversation=state["messages"]
    )
    output = llm.invoke(prompt)
    new_instructions = output["new_instructions"]
    store.put(("instructions",), current_instructions.key, {"instructions": new_instructions})
    ...
```

Then, we can access the store in any node of the graph by passing `store: BaseStore` as a node argument.

```python
def update_memory(state: MessagesState, config: RunnableConfig, *, store: BaseStore):

    # Get the user id from the config
    user_id = config["configurable"]["user_id"]

    # Namespace the memory
    namespace = (user_id, "memories")

    # ... Analyze conversation and create a new memory

    # Create a new memory ID
    memory_id = str(uuid.uuid4())

    # We create a new memory
    store.put(namespace, memory_id, {"memory": memory})
```

## Remember specific information

A key part of long term memory is remembering specific information. We often see that the information that is best to remember is application specific.

A common pattern for remembering this kind of information is to extract it from the conversation history. The exact structure of what you extract is up to your application. For example, a coding assistant may want to remember what languages you are comfortable with, whether you like spaces or tabs, etc. A travel app may want to remember restaurants you like.

Anything saved to the store persists across graph executions (threads), allowing information, such as user preferences, to be retained across threads.
One choice to make here is whether you extract a **list** of information or continuously update a **single profile**. In the examples above, the coding assistant is more of a "profile", while the travel app remembers a "list" of information.

The store is also built into the LangGraph API, making it accessible when using LangGraph Studio locally or when deploying to the LangGraph Cloud.

See more detail in the [persistence conceptual guide](https://langchain-ai.github.io/langgraph/concepts/persistence/#persistence) and this [how-to guide on shared state](../how-tos/memory/shared-state.ipynb).
### Profile

## Update own instructions
The profile is generally just a JSON blob with various key-value pairs. When remembering a profile, you will want to make sure that you are **updating** the profile each time. As a result, you will want to pass in the previous profile and ask the LLM to generate a new profile (or some JSON patch to apply to the old profile).

Meta-prompting uses an LLM to generate or refine its own prompts or instructions. This approach allows the system to dynamically update and improve its own behavior, potentially leading to better performance on various tasks. This is particularly useful for tasks where the instructions are challenging to specify a priori.
If the profile is large, this can get tricky. You may need to split the profile into subsections and update each one individually. You also may need to fix errors if the LLM generates incorrect JSON.
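
A minimal sketch of this update loop, assuming `llm` is a chat model whose response exposes `.content`, and using the store API shown earlier (the prompt text, namespace, and key names are illustrative only):

```python
import json

# Illustrative prompt; the exact wording is application specific.
PROFILE_PROMPT = """Current user profile:
{profile}

Latest conversation:
{conversation}

Return the complete, updated profile as JSON."""

def update_profile(store, llm, user_id: str, messages: list) -> dict:
    namespace = (user_id, "profile")
    existing = store.search(namespace)
    profile = existing[0].value if existing else {}

    # Pass the previous profile in so the LLM updates it rather than starting over.
    prompt = PROFILE_PROMPT.format(profile=json.dumps(profile), conversation=messages)
    # Real code should validate the output and handle invalid JSON.
    new_profile = json.loads(llm.invoke(prompt).content)

    # Keep a single profile item per user: reuse the existing key if present.
    key = existing[0].key if existing else "profile"
    store.put(namespace, key, new_profile)
    return new_profile
```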

Meta-prompting can use past information to update the prompt. As an example, this [Tweet generator](https://www.youtube.com/watch?v=Vn8A3BxfplE) uses meta-prompting to iteratively improve the summarization prompt used to generate high quality paper summaries for Twitter. In this case, we used a LangSmith dataset to house several papers that we wanted to summarize, generated summaries using a naive summarization prompt, manually reviewed the summaries, captured feedback from human review using the LangSmith Annotation Queue, and passed this feedback to a chat model to re-generate the summarization prompt. The process was repeated in a loop until the summaries met our criteria in human review.
### Lists

Remembering lists of information is easier in some ways, as the individual structure of each item is generally simpler and easier to generate.

## Retrieving relevant information from long-term storage
It is more complex overall, as you now have to enable the LLM to *delete* or *update* existing items in the list. This can be tricky to prompt the LLM to do.

A central challenge that spans many different memory use cases can be summarized simply: how can we retrieve *relevant information* from a long-term storage system and pass it to a chat model? As an example, assume we have a system that stores a large number of specific details about a user, but the user asks a specific question related to restaurant recommendations. It would be costly to trivially extract *all* personal user information and pass it to a chat model. Instead, we want to extract only the information that is most relevant to the user's current chat interaction (e.g., food preferences, location, etc.) and pass it to the chat model.
You can choose to circumvent this problem entirely by making the list of items append-only and not allowing updates.

There is a large body of work on retrieval that aims to address this challenge. See our tutorials focused on [RAG, or Retrieval Augmented Generation](https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag/), our conceptual docs on [retrieval](https://python.langchain.com/docs/concepts/#retrieval), and our [open source repository](https://github.com/langchain-ai/rag-from-scratch) along with [videos](https://www.youtube.com/playlist?list=PLfaIDFEXuae2LXbO1_PKyVJiQ23ZztA0x) on this topic.
Another thing you will have to take into account when working with lists is how to choose the relevant items to use. Right now we support filtering by metadata. We will be adding semantic search shortly.
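
For example, a minimal append-only version might look like the following, using the `InMemoryStore` API shown above (the `remember`/`recall` names and the metadata fields are illustrative):

```python
import uuid

from langgraph.store.memory import InMemoryStore

in_memory_store = InMemoryStore()

def remember(user_id: str, item: dict) -> None:
    # Append-only: every extracted item becomes a new entry in the namespace,
    # so the LLM never has to update or delete existing memories.
    in_memory_store.put((user_id, "memories"), str(uuid.uuid4()), item)

def recall(user_id: str) -> list[dict]:
    # Fetch the items for this user; picking the most relevant ones is left to
    # the application (e.g. metadata filters or, eventually, semantic search).
    return [m.value for m in in_memory_store.search((user_id, "memories"))]

remember("1", {"type": "restaurant", "name": "Luigi's", "city": "NYC"})
print(recall("1"))
```
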
19 changes: 14 additions & 5 deletions docs/docs/concepts/persistence.md
@@ -220,19 +220,19 @@ The final thing you can optionally specify when calling `update_state` is `as_no

![Update](img/persistence/shared_state.png)

A state schema specifies a set of keys / channels that are populated as a graph is executed. As discussed above, state can be written by a checkpointer to a thread at each graph step, enabling state persistence.
A [state schema](low_level.md#schema) specifies a set of keys that are populated as a graph is executed. As discussed above, state can be written by a checkpointer to a thread at each graph step, enabling state persistence.

But, what if we want to retain some information *across threads*? Consider the case of a chatbot where we want to retain specific information about the user across *all* chat conversations (e.g., threads) with that user!

With checkpointers alone, we cannot share information across threads. This motivates the need for the `Store` interface. As an illustration, we can define an `InMemoryStore` to store information about a user across threads. We simply compile our graph with a checkpointer, as before, and with our new `in_memory_store`.
First, let's showcase this in isolation without using LangGraph.

```python
from langgraph.store.memory import InMemoryStore
in_memory_store = InMemoryStore()
```

Memories are namespaced by a `tuple`, which in our case will be `(<user_id>, "memories")`. We can think about this namespace as a directory, where each `user_id` can have various sub-directories of things that we want to store (e.g., `memories`, `preferences`, etc.).
Memories are namespaced by a `tuple`, which in this specific example will be `(<user_id>, "memories")`. The namespace can be any length and represent anything; it does not have to be user-specific.

```python
user_id = "1"
@@ -259,6 +259,15 @@ memories[-1].dict()
'updated_at': '2024-10-02T17:22:31.590605+00:00'}
```

Each returned memory is an instance of a Python class with certain attributes. We can access it as a dictionary by converting it via `.dict()`, as above.
The attributes it has are:

- `value`: The value (itself a dictionary) of this memory
- `key`: The UUID for this memory in this namespace
- `namespace`: A list of strings, the namespace of this memory type
- `created_at`: Timestamp for when this memory was created
- `updated_at`: Timestamp for when this memory was updated
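
For example, with the `in_memory_store` above and a `namespace_for_memory` tuple such as `("1", "memories")` (the variable name is assumed here), these attributes can also be read directly instead of converting to a dictionary:

```python
memory = in_memory_store.search(namespace_for_memory)[-1]

print(memory.value)       # {'food_preference': 'I like pizza'}
print(memory.key)         # UUID identifying this memory within the namespace
print(memory.namespace)   # the namespace this memory was stored under
print(memory.created_at)  # creation timestamp
print(memory.updated_at)  # last-update timestamp
```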

With this all in place, we use the `in_memory_store` in LangGraph. The `in_memory_store` works hand-in-hand with the checkpointer: the checkpointer saves state to threads, as discussed above, and the `in_memory_store` allows us to store arbitrary information for access *across* threads. We compile the graph with both the checkpointer and the `in_memory_store` as follows.

```python
@@ -273,7 +273,7 @@ checkpointer = MemorySaver()
graph = graph.compile(checkpointer=checkpointer, store=in_memory_store)
```

We invoke the graph with a `thread_id`, as before, and also with a `user_id`, which we'll use to namespace our memories to this particular user, as we showed above.

```python
# Invoke the graph
@@ -308,7 +317,7 @@ def update_memory(state: MessagesState, config: RunnableConfig, *, store: BaseSt

```

As we showed above, we can also access the store in any node and use `search` to get memories. Recall that the memories are returned as a list, with each object being a dictionary with the `key` (memory_id) and `value` (the memory itself) along with some metadata.
As we showed above, we can also access the store in any node and use `search` to get memories. Recall that the memories are returned as a list of objects that can be converted to a dictionary.

```python
memories[-1].dict()
