Experimental project to solve the NYT Connection puzzles using an agentic workflow based on the `langchain` ecosystem. In particular, it uses:
- `langchain`'s OpenAI LLM abstraction layer to interact with OpenAI's `gpt-4o` model
- `langgraph`'s stateful orchestration framework to manage the agent's workflow
Historical NYT Connection Puzzles were used in testing the agent. Past puzzles can be found here.
Connections is a word game that challenges players to find themes between words. The user is presented with 16 words and must create groups of four items that share something in common. For example: Tropical fruit: banana, mango, pineapple, guava.
The agent uses the `PuzzleState` class to manage its state and control its workflow.
    from typing import List, TypedDict

    # Puzzle phase enums
    PUZZLE_PHASE_UNINITIALIZED = "PUZZLE_PHASE_UNINITIALIZED"
    PUZZLE_PHASE_SETUP = "PUZZLE_PHASE_SETUP"
    PUZZLE_PHASE_SETUP_COMPLETE = "PUZZLE_PHASE_SETUP_COMPLETE"
    PUZZLE_PHASE_SOLVING = "PUZZLE_PHASE_SOLVING"
    PUZZLE_PHASE_COMPLETE = "PUZZLE_PHASE_COMPLETE"

    # The values below document the intended initial state;
    # TypedDict itself does not apply defaults at runtime.
    class PuzzleState(TypedDict):
        puzzle_phase: str = PUZZLE_PHASE_UNINITIALIZED  # phase constants are strings
        words_remaining: List[str] = []
        invalid_connections: List[List[str]] = []
        recommended_words: List[str] = []
        recommended_connection: str = ""
        recommended_correct: bool = False
        found_yellow: bool = False
        found_greeen: bool = False
        found_blue: bool = False
        found_purple: bool = False
        mistake_count: int = 0
        recommendation_count: int = 0
        llm_temperature: float = 1.0
        input_source_type: str = ""
The attributes `words_remaining` and `mistake_count` determine when the agent terminates. When a correct group of four words is found, those words are removed from `words_remaining`. If a mistake is made, `mistake_count` is incremented. The agent terminates when either `words_remaining` becomes empty or `mistake_count` exceeds a threshold.
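The termination rule can be sketched as follows. This is a minimal illustration rather than the repo's code: the helper names `should_terminate` and `apply_result` are hypothetical, and `MAX_MISTAKES = 4` is an assumption based on the example runs in this README, which stop after four mistakes.

```python
from typing import List, Tuple

# Assumed threshold: the failed example runs in this README end at 4 mistakes.
MAX_MISTAKES = 4


def should_terminate(words_remaining: List[str], mistake_count: int) -> bool:
    """Return True when the puzzle is solved or the mistake limit is reached."""
    solved = len(words_remaining) == 0
    failed = mistake_count >= MAX_MISTAKES
    return solved or failed


def apply_result(
    words_remaining: List[str],
    mistake_count: int,
    group: List[str],
    correct: bool,
) -> Tuple[List[str], int]:
    """Hypothetical helper: update the two termination attributes after a guess."""
    if correct:
        # A correct group is removed from the remaining words.
        remaining = [w for w in words_remaining if w not in group]
        return remaining, mistake_count
    # An incorrect group leaves the words in place and counts as a mistake.
    return list(words_remaining), mistake_count + 1
```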
Overall control is performed by the `run_planner()` function, which determines the next step in the agent's workflow based on the agent's `puzzle_phase`. The workflow itself is defined with the `StateGraph` class from `langgraph` as a series of nodes and edges: the nodes are the agent's processing steps, and the edges are the transitions between them.
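As a rough sketch of phase-based routing, the function below maps `puzzle_phase` to the name of the next node. The node names and phase constants come from this README, but the mapping itself is an assumption, not the repo's actual logic. `END` is defined locally as a stand-in for `langgraph.graph.END` so the snippet runs without `langgraph` installed.

```python
PUZZLE_PHASE_UNINITIALIZED = "PUZZLE_PHASE_UNINITIALIZED"
PUZZLE_PHASE_SETUP_COMPLETE = "PUZZLE_PHASE_SETUP_COMPLETE"
PUZZLE_PHASE_SOLVING = "PUZZLE_PHASE_SOLVING"
PUZZLE_PHASE_COMPLETE = "PUZZLE_PHASE_COMPLETE"

# Stand-in for langgraph.graph.END (assumption, to keep the sketch self-contained).
END = "__end__"

# Assumed phase -> next-node mapping; the real planner may differ.
NEXT_ACTION_BY_PHASE = {
    PUZZLE_PHASE_UNINITIALIZED: "get_input_source",
    PUZZLE_PHASE_SETUP_COMPLETE: "get_recommendation",
    PUZZLE_PHASE_SOLVING: "get_recommendation",
    PUZZLE_PHASE_COMPLETE: END,
}


def determine_next_action(state: dict) -> str:
    """Return the name of the next graph node for the current puzzle phase."""
    phase = state["puzzle_phase"]
    try:
        return NEXT_ACTION_BY_PHASE[phase]
    except KeyError:
        raise ValueError(f"Unexpected puzzle phase: {phase!r}") from None
```

A lookup table keeps the routing rules in one place, which makes it easy to see which phases lead to which node.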
Agent's workflow definition:
    from langgraph.graph import END, StateGraph

    workflow = StateGraph(PuzzleState)

    # Processing steps (graph nodes)
    workflow.add_node("run_planner", run_planner)
    workflow.add_node("get_input_source", get_input_source)
    workflow.add_node("read_words_from_file", read_words_from_file)
    workflow.add_node("read_words_from_image", read_words_from_image)
    workflow.add_node("get_recommendation", get_recommendation)
    workflow.add_node("regenerate_recommendation", regenerate_recommendation)
    workflow.add_node("apply_recommendation", apply_recommendation)
    workflow.add_node("clear_recommendation", clear_recommendation)

    # The planner decides whether to set up, solve, or stop
    workflow.add_conditional_edges(
        "run_planner",
        determine_next_action,
        {
            "get_input_source": "get_input_source",
            "get_recommendation": "get_recommendation",
            END: END,
        },
    )

    # Choose the reader that matches the input source
    workflow.add_conditional_edges(
        "get_input_source",
        route_input_source,
        {
            "read_words_from_file": "read_words_from_file",
            "read_words_from_image": "read_words_from_image",
        },
    )

    workflow.add_edge("read_words_from_file", "run_planner")
    workflow.add_edge("read_words_from_image", "run_planner")
    workflow.add_edge("get_recommendation", "apply_recommendation")
    workflow.add_edge("clear_recommendation", "run_planner")
    workflow.add_edge("regenerate_recommendation", "apply_recommendation")

    # After applying a recommendation, return to the planner, clear, or retry
    workflow.add_conditional_edges(
        "apply_recommendation",
        is_end,
        {
            "run_planner": "run_planner",
            "clear_recommendation": "clear_recommendation",
            "regenerate_recommendation": "regenerate_recommendation",
        },
    )

    workflow.set_entry_point("run_planner")

    app = workflow.compile()
    app.get_graph().draw_png("images/connection_solver_graph.png")
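A conditional edge takes a routing function that inspects the state and returns the key of the next node. The sketch below shows what `route_input_source` might look like. It assumes `input_source_type` holds `"file"` or `"image"`, matching the prompt in the example runs; the body is illustrative, not the repo's implementation.

```python
def route_input_source(state: dict) -> str:
    """Return the reader node that matches the chosen input source."""
    source = state.get("input_source_type", "")
    if source == "file":
        return "read_words_from_file"
    if source == "image":
        return "read_words_from_image"
    # Assumed behavior: fail loudly on an unrecognized source type.
    raise ValueError(f"Unknown input source type: {source!r}")
```

The returned strings must match the keys of the mapping passed to `add_conditional_edges` for the `get_input_source` node.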
Diagram of the agent's workflow:
Major contents of the repo:
File/Folder | Description |
---|---|
`src/agent/app.py` | Main entry point for the agent. Defines workflow processing steps (aka graph nodes), workflow transitions (aka graph edges) and the `PuzzleState` data structure. |
`src/agent/tools.py` | Tools used by the agent: retrieve puzzle setup, interact with user and interface to OpenAI LLM. |
`src/agent/utils.py` | Utilities to be used by the agent. |
`src/agent/tests/` | Unit tests for the agent. |
`src/agent_testbed/` | Directory containing technical proof-of-concept code. |
`data/` | Directory containing past NYT Connection Puzzles for testing. |
`prompt_testbed/` | Directory containing sample prompts used in testing with the OpenAI Playground. |
While prompt engineering is critical to the agent's success, an equally critical function is setting up the right data structures for the LLM. Specifically, randomizing the order of the words in `words_remaining` seemed to help the LLM get unstuck from invalid groupings.
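Randomizing the word order before each LLM call can be sketched as below. The helper name `shuffled_words` and the optional `seed` parameter are hypothetical, added here only to make the example reproducible.

```python
import random
from typing import List, Optional


def shuffled_words(words_remaining: List[str], seed: Optional[int] = None) -> List[str]:
    """Return a shuffled copy of the remaining words, leaving the input untouched."""
    rng = random.Random(seed)  # a local RNG avoids disturbing global random state
    shuffled = list(words_remaining)
    rng.shuffle(shuffled)
    return shuffled


words = ["uphold", "discard", "honor", "energy", "state", "play", "justice", "labor"]
print(shuffled_words(words))
```

Returning a copy keeps the state's own `words_remaining` list in a stable order, so only the prompt sees the randomized ordering.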
Automated testing is needed. Right now the agent is tested manually, which becomes tedious as more test cases are needed. Automated testing would allow for more rapid development and testing of the agent.
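One way to automate the manual `y/g/b/p/n` response is to grade each recommendation against a known solution. The sketch below is an assumption about how that could work, not existing repo code. The function `grade_recommendation` and the `SOLUTION` layout are hypothetical, and the data is the expected solution for `data/word_list5.txt` shown later in this README.

```python
from typing import Dict, List

# Expected solution keyed by the response letters used in the example runs.
SOLUTION: Dict[str, List[str]] = {
    "y": ["fulfill", "honor", "keep", "uphold"],
    "g": ["blanket", "sham", "sheet", "throw"],
    "b": ["discard", "draw", "pass", "play"],
    "p": ["energy", "justice", "labor", "state"],
}


def grade_recommendation(recommended_words: List[str], solution: Dict[str, List[str]]) -> str:
    """Return the color letter of the matching group, or 'n' if no group matches."""
    guess = {word.lower() for word in recommended_words}
    for color, group in solution.items():
        if guess == set(group):  # word order does not matter
            return color
    return "n"


print(grade_recommendation(["blanket", "sheet", "sham", "throw"], SOLUTION))  # prints "g"
```

With a grader like this in place of the interactive prompt, a test harness could replay every puzzle in `data/` without user input.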
Experiment tracking is needed. As the workflow design and the functionality of individual steps change, the results from testing should be recorded automatically. For this body of work, results were tracked either in hand-written notes or from memory.
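A lightweight form of experiment tracking is to append one JSON record per run to a log file. The sketch below is only one possible design. The function `log_run`, its parameters and the record fields are hypothetical; the state keys it reads (`words_remaining`, `mistake_count`, `recommendation_count`, `llm_temperature`) are taken from the final puzzle state printed in the example runs.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_run(log_path: str, puzzle_id: str, final_state: dict, notes: str = "") -> dict:
    """Append a summary of one agent run to a JSON Lines file and return the record."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "puzzle_id": puzzle_id,
        "solved": len(final_state.get("words_remaining", [])) == 0,
        "mistake_count": final_state.get("mistake_count", 0),
        "recommendation_count": final_state.get("recommendation_count", 0),
        "llm_temperature": final_state.get("llm_temperature"),
        "notes": notes,  # e.g. which workflow design or prompt variant was used
    }
    with Path(log_path).open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record
```

Because each line is an independent JSON object, the log can later be loaded into a table to compare solve rates across workflow designs.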
From a virtual coding assistant perspective, perplexity.ai seemed to generate more useful code for `langchain` and `langgraph`. GitHub Copilot generated code for these libraries that was not compatible with their current versions. This is probably because GH Copilot is trained on code in public repos, whereas perplexity.ai uses a RAG-based approach over current web content. perplexity.ai appears to be better at code generation for new and quickly evolving packages. However, once I had some code in the Visual Studio Code IDE, GH Copilot reduced the effort to refactor and revise it. For long-standing packages, e.g., `pandas`, `numpy`, `matplotlib`, GH Copilot generates useful code snippets.
Note: Due to the random nature of the LLM, the results vary from run to run. For example, running the same puzzle multiple times may result in different recommendations from the LLM. As a result, the puzzle may get solved in one run and not in another.
Expected Solution
🟡 MAKE GOOD ON, AS A PROMISE: FULFILL, HONOR, KEEP, UPHOLD
🟢 BEDDING: BLANKET, SHAM, SHEET, THROW
🔵 ACTIONS IN CARD GAMES: DISCARD, DRAW, PASS, PLAY
🟣 CABINET DEPARTMENTS: ENERGY, JUSTICE, LABOR, STATE
Example Run
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list5.txt
Words read from file: ['uphold', 'discard', 'honor', 'energy', 'state', 'play', 'justice', 'labor', 'pass', 'fulfill', 'draw', 'keep', 'blanket', 'sham', 'sheet', 'throw']
RECOMMENDED WORDS ['blanket', 'sheet', 'sham', 'throw'] with connection bedding items
Is the recommendation accepted? (y/g/b/p/n): g
Recommendation ['blanket', 'sheet', 'sham', 'throw'] is correct
RECOMMENDED WORDS ['play', 'discard', 'draw', 'pass'] with connection Card game actions
Is the recommendation accepted? (y/g/b/p/n): b
Recommendation ['play', 'discard', 'draw', 'pass'] is correct
RECOMMENDED WORDS ['honor', 'uphold', 'keep', 'fulfill'] with connection ways to maintain or adhere to something (e.g., a promise, duty)
Is the recommendation accepted? (y/g/b/p/n): y
Recommendation ['honor', 'uphold', 'keep', 'fulfill'] is correct
RECOMMENDED WORDS ['energy', 'state', 'justice', 'labor'] with connection Departments of the US Government
Is the recommendation accepted? (y/g/b/p/n): p
Recommendation ['energy', 'state', 'justice', 'labor'] is correct
SOLVED THE CONNECTION PUZZLE!!!
FINAL PUZZLE STATE:
{ 'found_blue': True,
'found_purple': True,
'found_yellow': True,
'invalid_connections': [],
'llm_temperature': 0.7,
'mistake_count': 0,
'recommendation_count': 4,
'recommended_connection': 'Departments of the US Government',
'recommended_correct': True,
'recommended_words': ['energy', 'state', 'justice', 'labor'],
'words_remaining': []}
Expected Solution
🟡 BRING ABOUT: GENERATE, INSPIRE, PROMPT, PROVOKE
🟢 THINGS THAT ARE OFTEN SCENTED: CANDLE, INCENSE, LOTION, SOAP
🔵 THINGS THAT MIGHT STING: INSULT, JELLYFISH, NETTLE, WASP
🟣 VIDEO GAME FRANCHISES: CIVILIZATION, HALO, MADDEN, METROID
Example Run
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list2.txt
Words read from file: ['inspire', 'madden', 'jellyfish', 'metroid', 'insult', 'candle', 'halo', 'provoke', 'soap', 'generate', 'incense', 'civilization', 'lotion', 'wasp', 'prompt', 'nettle']
RECOMMENDED WORDS ['madden', 'provoke', 'insult', 'incense'] with connection To Anger or Annoy
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['madden', 'provoke', 'insult', 'incense'] is incorrect
RECOMMENDED WORDS ['soap', 'lotion', 'candle', 'incense'] with connection Items that can produce a pleasant scent or are used in personal care
Is the recommendation accepted? (y/g/b/p/n): g
Recommendation ['soap', 'lotion', 'candle', 'incense'] is correct
RECOMMENDED WORDS ['provoke', 'insult', 'nettle', 'incense'] with connection To Annoy or Irritate
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['provoke', 'insult', 'nettle', 'incense'] is incorrect
RECOMMENDED WORDS ['provoke', 'inspire', 'prompt', 'generate'] with connection words related to causing or inducing action or emotion
Is the recommendation accepted? (y/g/b/p/n): y
Recommendation ['provoke', 'inspire', 'prompt', 'generate'] is correct
RECOMMENDED WORDS ['wasp', 'halo', 'metroid', 'jellyfish'] with connection video game titles
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['wasp', 'halo', 'metroid', 'jellyfish'] is incorrect
RECOMMENDED WORDS ['madden', 'civilization', 'metroid', 'halo'] with connection Video Game Titles
Is the recommendation accepted? (y/g/b/p/n): p
Recommendation ['madden', 'civilization', 'metroid', 'halo'] is correct
RECOMMENDED WORDS ['jellyfish', 'nettle', 'insult', 'wasp'] with connection things that sting
Is the recommendation accepted? (y/g/b/p/n): b
Recommendation ['jellyfish', 'nettle', 'insult', 'wasp'] is correct
SOLVED THE CONNECTION PUZZLE!!!
FINAL PUZZLE STATE:
{ 'found_blue': True,
'found_purple': True,
'found_yellow': True,
'invalid_connections': [ ['madden', 'provoke', 'insult', 'incense'],
['provoke', 'insult', 'nettle', 'incense'],
['wasp', 'halo', 'metroid', 'jellyfish']],
'llm_temperature': 0.7,
'mistake_count': 3,
'recommendation_count': 7,
'recommended_connection': 'things that sting',
'recommended_correct': True,
'recommended_words': ['jellyfish', 'nettle', 'insult', 'wasp'],
'words_remaining': []}
Expected Solution
🟡 RUMMAGE: COMB, DIG, ROOT, SIFT
🟢 SOUNDS OF THUNDER: CLAP, PEAL, ROLL, RUMBLE
🔵 WAYS TO WEAR YOUR HAIR UP: BUN, BRAID, PONY, TWIST
🟣 THINGS THAT CAN HAVE LEAVES: BOOK, SALAD, TABLE, TREE
Example Run
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list4.txt
Words read from file: ['rumble', 'table', 'pony', 'sift', 'roll', 'bun', 'tree', 'twist', 'salad', 'clap', 'comb', 'peal', 'dig', 'braid', 'root', 'book']
RECOMMENDED WORDS ['bun', 'pony', 'braid', 'comb'] with connection Related to hairstyles
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['bun', 'pony', 'braid', 'comb'] is incorrect
RECOMMENDED WORDS ['twist', 'braid', 'roll', 'bun'] with connection Hairstyles
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['twist', 'braid', 'roll', 'bun'] is incorrect
RECOMMENDED WORDS ['comb', 'dig', 'sift', 'root'] with connection Actions related to gardening or soil preparation
Is the recommendation accepted? (y/g/b/p/n): y
Recommendation ['comb', 'dig', 'sift', 'root'] is correct
RECOMMENDED WORDS ['clap', 'rumble', 'peal', 'roll'] with connection Types of sounds
Is the recommendation accepted? (y/g/b/p/n): g
Recommendation ['clap', 'rumble', 'peal', 'roll'] is correct
RECOMMENDED WORDS ['table', 'book', 'salad', 'tree'] with connection Types of leaves
Is the recommendation accepted? (y/g/b/p/n): p
Recommendation ['table', 'book', 'salad', 'tree'] is correct
RECOMMENDED WORDS ['twist', 'bun', 'pony', 'braid'] with connection types of hairstyles
Is the recommendation accepted? (y/g/b/p/n): b
Recommendation ['twist', 'bun', 'pony', 'braid'] is correct
SOLVED THE CONNECTION PUZZLE!!!
FINAL PUZZLE STATE:
{ 'found_blue': True,
'found_purple': True,
'found_yellow': True,
'invalid_connections': [ ['bun', 'pony', 'braid', 'comb'],
['twist', 'braid', 'roll', 'bun']],
'llm_temperature': 0.7,
'mistake_count': 2,
'recommendation_count': 6,
'recommended_connection': 'types of hairstyles',
'recommended_correct': True,
'recommended_words': ['twist', 'bun', 'pony', 'braid'],
'words_remaining': []}
This puzzle is defined by the image from the NYT Connection Puzzle grid for October 20, 2024. A screenshot of the NYT online Connection Puzzle is saved to disk. The agent reads the words from the image and solves the puzzle.
Puzzle Grid Screenshot
Expected Solution
Example Run
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Enter 'file' to read words from a file or 'image' to read words from an image: image
Please enter the image file location: src/agent_testbed/connection_puzzle_image.png
Words read from image: ['paddle', 'sew', 'row', 'story', 'oar', 'fore', 'column', 'racket', 'net', 'butt', 'feature', 'ball', 'clatter', 'table', 'ruckus', 'article']
RECOMMENDED WORDS ['oar', 'paddle', 'fore', 'row'] with connection Rowing-related terms
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['oar', 'paddle', 'fore', 'row'] is incorrect
RECOMMENDED WORDS ['oar', 'paddle', 'butt', 'ball'] with connection Parts of a Rowing Boat
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['oar', 'paddle', 'butt', 'ball'] is incorrect
RECOMMENDED WORDS ['story', 'feature', 'article', 'column'] with connection Parts of a newspaper or magazine
Is the recommendation accepted? (y/g/b/p/n): y
Recommendation ['story', 'feature', 'article', 'column'] is correct
RECOMMENDED WORDS ['racket', 'ruckus', 'clatter', 'row'] with connection Noise or commotion
Is the recommendation accepted? (y/g/b/p/n): g
Recommendation ['racket', 'ruckus', 'clatter', 'row'] is correct
RECOMMENDED WORDS ['net', 'table', 'ball', 'paddle'] with connection Table Tennis Terms
Is the recommendation accepted? (y/g/b/p/n): b
Recommendation ['net', 'table', 'ball', 'paddle'] is correct
RECOMMENDED WORDS ['fore', 'sew', 'butt', 'oar'] with connection Homophones of numbers (four, so, but, or)
Is the recommendation accepted? (y/g/b/p/n): p
Recommendation ['fore', 'sew', 'butt', 'oar'] is correct
SOLVED THE CONNECTION PUZZLE!!!
FINAL PUZZLE STATE:
{ 'found_blue': True,
'found_purple': True,
'found_yellow': True,
'input_source_type': 'image',
'invalid_connections': [ ['oar', 'paddle', 'fore', 'row'],
['oar', 'paddle', 'butt', 'ball']],
'llm_temperature': 0.7,
'mistake_count': 2,
'recommendation_count': 6,
'recommended_connection': 'Homophones of numbers (four, so, but, or)',
'recommended_correct': True,
'recommended_words': ['fore', 'sew', 'butt', 'oar'],
'words_remaining': []}
Expected Solution
🟡 FOOTBALL POSITIONS: CENTER, GUARD, QUARTERBACK, SAFETY
🟢 CABLE CHANNELS: DISCOVERY, HISTORY, NICKELODEON, OXYGEN
🔵 FICTIONAL CLOWNS: HOMEY, JOKER, PENNYWISE, RONALD
🟣 WHAT “D” MIGHT STAND FOR: DEFENSE, DEMOCRAT, DIMENSIONAL, DRIVE
Example Run
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list3.txt
Words read from file: ['center', 'pennywise', 'democrat', 'safety', 'oxygen', 'history', 'guard', 'homey', 'joker', 'quarterback', 'ronald', 'defense', 'discovery', 'drive', 'nickelodeon', 'dimensional']
RECOMMENDED WORDS ['quarterback', 'safety', 'defense', 'guard'] with connection Football positions
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['quarterback', 'safety', 'defense', 'guard'] is incorrect
RECOMMENDED WORDS ['nickelodeon', 'joker', 'pennywise', 'ronald'] with connection Famous Clowns
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['nickelodeon', 'joker', 'pennywise', 'ronald'] is incorrect
RECOMMENDED WORDS ['quarterback', 'defense', 'guard', 'safety'] with connection Football Positions
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['quarterback', 'defense', 'guard', 'safety'] is incorrect
RECOMMENDED WORDS ['quarterback', 'center', 'dimensional', 'drive'] with connection Positions or terms related to football
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['quarterback', 'center', 'dimensional', 'drive'] is incorrect
FAILED TO SOLVE THE CONNECTION PUZZLE TOO MANY MISTAKES!!!
FINAL PUZZLE STATE:
{ 'found_blue': False,
'found_purple': False,
'found_yellow': False,
'invalid_connections': [ ['quarterback', 'safety', 'defense', 'guard'],
['nickelodeon', 'joker', 'pennywise', 'ronald'],
['quarterback', 'defense', 'guard', 'safety'],
[ 'quarterback',
'center',
'dimensional',
'drive']],
'llm_temperature': 0.7,
'mistake_count': 4,
'recommendation_count': 4,
'recommended_connection': 'Positions or terms related to football',
'recommended_correct': False,
'recommended_words': ['quarterback', 'center', 'dimensional', 'drive'],
'words_remaining': [ 'drive',
'safety',
'discovery',
'homey',
'joker',
'defense',
'dimensional',
'democrat',
'history',
'center',
'quarterback',
'pennywise',
'ronald',
'oxygen',
'guard',
'nickelodeon']}
Expected Solution
🟡 GRASSY AREA: GREEN, LAWN, PARK, YARD
🟢 DEAL WITH: ADDRESS, ANSWER, FIELD, HANDLE
🔵 MOVIES WITH “S” REMOVED: CAR, GOODFELLA, JAW, SWINGER
🟣 ___ LAW: CRIMINAL, HARVARD, LEMON, NATURAL
Example Run
/usr/local/bin/python /workspaces/connection_solver/src/agent/app.py
Please enter the file location: data/word_list1.txt
Words read from file: ['goodfella', 'jaw', 'answer', 'handle', 'park', 'lemon', 'yard', 'field', 'natural', 'car', 'harvard', 'swinger', 'green', 'criminal', 'address', 'lawn']
RECOMMENDED WORDS ['park', 'lawn', 'field', 'yard'] with connection Outdoor spaces
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['park', 'lawn', 'field', 'yard'] is incorrect
RECOMMENDED WORDS ['lawn', 'yard', 'handle', 'jaw'] with connection Parts of a Tool
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['lawn', 'yard', 'handle', 'jaw'] is incorrect
RECOMMENDED WORDS ['answer', 'address', 'field', 'park'] with connection Things related to location or response
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['answer', 'address', 'field', 'park'] is incorrect
RECOMMENDED WORDS ['lawn', 'green', 'lemon', 'natural'] with connection Things that are green
Is the recommendation accepted? (y/g/b/p/n): n
Recommendation ['lawn', 'green', 'lemon', 'natural'] is incorrect
FAILED TO SOLVE THE CONNECTION PUZZLE TOO MANY MISTAKES!!!
FINAL PUZZLE STATE:
{ 'found_blue': False,
'found_purple': False,
'found_yellow': False,
'invalid_connections': [ ['park', 'lawn', 'field', 'yard'],
['lawn', 'yard', 'handle', 'jaw'],
['answer', 'address', 'field', 'park'],
['lawn', 'green', 'lemon', 'natural']],
'llm_temperature': 0.7,
'mistake_count': 4,
'recommendation_count': 4,
'recommended_connection': 'Things that are green',
'recommended_correct': False,
'recommended_words': ['lawn', 'green', 'lemon', 'natural'],
'words_remaining': [ 'lawn',
'park',
'address',
'swinger',
'answer',
'field',
'lemon',
'yard',
'jaw',
'handle',
'goodfella',
'car',
'criminal',
'green',
'harvard',
'natural']}