lechmazur

CEO, Advameg, Inc.

confabulations (Public)

Hallucinations (Confabulations) Document-Based Benchmark for RAG

HTML, 69 stars, 3 forks

writing (Public)

This benchmark tests how well LLMs incorporate a set of 10 mandatory story elements (characters, objects, core concepts, attributes, motivations, etc.) into a short creative story.

64 stars, 2 forks

step_game (Public)

Multi-Agent Step Race Benchmark: Assessing LLM Collaboration and Deception Under Pressure. A multi-player “step-race” that challenges LLMs to engage in public conversation before secretly picking a…

28 stars, 3 forks

nyt-connections (Public)

Benchmark that evaluates LLMs using 436 NYT Connections puzzles

Python, 27 stars, 3 forks

divergent (Public)

LLM Divergent Thinking Creativity Benchmark. LLMs generate 25 unique words that start with a given letter and have no connection to each other or to 50 initial random words.

27 stars, 1 fork

deception (Public)

Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation …

21 stars, 2 forks