Context-window Compression #1

Open
iansinnott opened this issue May 13, 2023 · 7 comments
@iansinnott
Owner

iansinnott commented May 13, 2023

It would be nice (and cost-effective) to do something other than send the entire chat history with each message as context. I've never hit a context limit in my own usage (8k, GPT-4), but it's also not cheap.

Essentially we want the same infinite chat thread experience as the official ChatGPT UI.

Current thinking:

  • Run chat summaries via cheap GPT-3.5 and include the summary.
  • Do something with a vector store.
  • Include context via the full-text search feature.
    • This is already built, which is a big plus, though I'm skeptical of the result quality from traditional FTS methods.
@iansinnott
Owner Author

Truncation may be the simpler approach: specify a truncation window and just use that.
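
A rough sketch of what that could look like, assuming tiktoken for token counting (illustrative only, nothing here is wired into the app yet):

import tiktoken

def truncate_history(messages, max_tokens=4000, model="gpt-4"):
    """Keep only the most recent messages that fit within max_tokens."""
    enc = tiktoken.encoding_for_model(model)
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest-first
        n = len(enc.encode(msg["content"]))
        if total + n > max_tokens:
            break
        kept.append(msg)
        total += n
    return list(reversed(kept))  # restore chronological order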

@samrahimi

Run the summaries on gpt-3.5-turbo-instruct and you'll get better results.
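
For reference, a minimal sketch of that call using the OpenAI Python SDK (prompt wording and token limits are placeholders, not a tested recipe):

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_history(history_text):
    """Summarize prior chat turns with the cheaper instruct model."""
    resp = client.completions.create(
        model="gpt-3.5-turbo-instruct",
        prompt="Summarize this conversation concisely:\n\n"
               + history_text + "\n\nSummary:",
        max_tokens=256,
        temperature=0,
    )
    return resp.choices[0].text.strip()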

@bet0x

bet0x commented Jan 4, 2024

What about removing the stop words? If properly implemented, the meaning could stay the same and it should reduce the context size:

import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

nltk.download('punkt')      # word_tokenize needs the punkt tokenizer data
nltk.download('stopwords')

def remove_stopwords(text):
    """
    Remove stopwords from the text.

    Parameters:
    text (str): The input text.

    Returns:
    str: The text without stopwords.
    """

    # Tokenize the text
    tokens = word_tokenize(text)

    # Load stopwords from NLTK
    stop_words = set(stopwords.words('english'))

    # Remove stopwords from the text
    filtered_text = ' '.join([word for word in tokens if word.lower() not in stop_words])

    return filtered_text

# Example text
example_text = "The cat sat on the mat. The cat is fluffy. Fluffy cats are cute. Cats like to sit on mats."

# Compress the context by removing stopwords
compressed_text_advanced = remove_stopwords(example_text)

print("ORG: " + example_text)
print("COM: " + compressed_text_advanced)

Output:

ORG: The cat sat on the mat. The cat is fluffy. Fluffy cats are cute. Cats like to sit on mats.
COM: cat sat mat . cat fluffy . Fluffy cats cute . Cats like sit mats .

@samrahimi

I wonder if removing the stop words will affect the quality of the output. If you have a long prompt where much of the context is written in what looks like broken English, I'd worry that the output will follow whatever style is prevalent in the prompt. Have you noticed an impact?

@iansinnott
Owner Author

> What about removing the stop words?

Hm, I wonder if that can be done in the browser. Despite having a desktop build, this is entirely a frontend project; everything runs in a browser window. The database is SQLite via WASM. There may be other libs to tackle this, but I think nltk would be a non-starter since it's meant for a Python environment.

Could create a lambda for this, but ideally it all runs locally for low-latency interactions.
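
For what it's worth, the idea itself doesn't need nltk. A dependency-free sketch with a small hard-coded stopword set (illustrative, far from exhaustive) would port to JavaScript almost line for line:

import re

# Tiny illustrative stopword set; a real list would be much larger.
STOP_WORDS = {"the", "is", "are", "a", "an", "on", "to", "of", "and", "in"}

def remove_stopwords_plain(text):
    """Drop stopwords using only the standard library."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return " ".join(t for t in tokens if t.lower() not in STOP_WORDS)

print(remove_stopwords_plain("The cat sat on the mat."))
# -> cat sat mat .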

@iansinnott
Copy link
Owner Author

I've been experimenting with adding a vector DB in the hopes that having access to similarity search would allow some creative context compression via selecting only relevant messages to include in context. However, it doesn't run in Safari, so that effort has stalled.
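
The shape of that idea is simple enough to sketch with plain NumPy cosine similarity (the embedding source is left abstract here, and all names are hypothetical):

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def select_relevant(messages, embeddings, query_embedding, k=5):
    """Return the k stored messages most similar to the incoming one.
    `embeddings` holds one vector per message, e.g. from an embedding API."""
    scored = sorted(
        zip(messages, embeddings),
        key=lambda pair: cosine(pair[1], query_embedding),
        reverse=True,
    )
    return [msg for msg, _ in scored[:k]]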

The next move here is likely to add a rolling context window, probably customizable by number of tokens.

Open to any suggestions though.

@iansinnott
Owner Author

Having explored in-browser vector storage and come up short with Victor [1], I think the initial move will probably be a sliding window of chat history plus a summary of whatever else is there. This is what LangChain does with its ConversationSummaryBufferMemory, which seems like it will be good enough for infinite chat threads that don't require more and more tokens. A rough sketch of the pattern follows the considerations below.

Some considerations:

  • How long is the chat message window? 1k tokens? 200 tokens? No idea what works best, but since tokens are priced by the thousand, I'm thinking of starting with 1k.
  • What model to use for summarization? Whatever model the user already has selected would be the simple answer, but I'd generally want a cheaper model for summaries while using the best model for chatting.
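
A rough sketch of the sliding-window-plus-summary pattern (tiktoken for counting and the summarize() helper are both assumptions, not code from this repo):

import tiktoken

enc = tiktoken.encoding_for_model("gpt-4")

def split_for_summary(messages, max_window_tokens=1000):
    """Newest messages that fit in the token window stay verbatim;
    everything older is returned for summarization."""
    recent, total = [], 0
    cut = len(messages)
    for msg in reversed(messages):
        n = len(enc.encode(msg["content"]))
        if total + n > max_window_tokens:
            break
        recent.append(msg)
        total += n
        cut -= 1
    return messages[:cut], list(reversed(recent))

def build_context(messages, summarize):
    """summarize() stands in for a call to a cheap summarization model."""
    older, recent = split_for_summary(messages)
    if not older:
        return recent
    summary_msg = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(older),
    }
    return [summary_msg] + recent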
