ThreadPoolExecutor breaks dspy.Evaluate config in parallel execution #1766

Open
glesperance opened this issue Nov 6, 2024 · 1 comment

@glesperance
Contributor

Using ThreadPoolExecutor to parallelize dspy calls breaks internal config management for threaded dspy.Evaluate.

Specifically, doing this:

import dspy
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

class QuestionAnswer(dspy.Signature):
    question: str = dspy.InputField(description="The question")
    answer: int = dspy.OutputField(description="The answer to the question")

solver = dspy.ChainOfThought(QuestionAnswer)

# trainset is a list of dspy.Example objects (see the linked notebook for its construction).
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(tqdm(executor.map(lambda x: solver(**x.inputs()), trainset), total=len(trainset)))

breaks the threaded dspy.Evaluate calls that follow:

# This breaks: both evaluations report the same score. All threads run on gpt4o_mini, which was
# the last model configured before the ThreadPoolExecutor was created.
evaluator = dspy.Evaluate(devset=devset, metric=is_correct, num_threads=10, display_progress=True)

dspy.configure(lm=gpt4o)
evaluator(solver)

dspy.configure(lm=gpt4o_mini)
evaluator(solver)

# ###### Output ######
# >>>>> gpt4o
# Average Metric: 47 / 50  (94.0): 100%|██████████| 50/50 [00:00<00:00, 1608.90it/s]
# 2024/11/06 11:49:23 INFO dspy.evaluate.evaluate: Average Metric: 47 / 50 (94.0%)
# >>>>> gpt4o_mini
# Average Metric: 47 / 50  (94.0): 100%|██████████| 50/50 [00:00<00:00, 1795.91it/s]
# 2024/11/06 11:49:23 INFO dspy.evaluate.evaluate: Average Metric: 47 / 50 (94.0%)

That is, all subsequent calls to dspy.configure are ignored by the threaded evaluator.

Turning off threading on dspy.Evaluate works properly:

# Turning off threading works as expected: both models have different scores again.

evaluator = dspy.Evaluate(devset=devset, metric=is_correct, num_threads=1, display_progress=True)

dspy.configure(lm=gpt4o)
evaluator(solver)

dspy.configure(lm=gpt4o_mini)
evaluator(solver)

# ###### Output ######
# >>>>> gpt4o
# Average Metric: 48 / 50  (96.0): 100%|██████████| 50/50 [00:00<00:00, 892.83it/s] 
# 2024/11/06 11:49:24 INFO dspy.evaluate.evaluate: Average Metric: 48 / 50 (96.0%)

# >>>>> gpt4o_mini
# Average Metric: 47 / 50  (94.0): 100%|██████████| 50/50 [00:00<00:00, 976.47it/s] 
# 2024/11/06 11:49:24 INFO dspy.evaluate.evaluate: Average Metric: 47 / 50 (94.0%)

See this notebook for full repro.
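In the meantime, one way to stop depending on the global configuration for the manual ThreadPoolExecutor step is to pin the LM per call. This is only a sketch; it assumes the dspy.context context manager (dspy.settings.context in some versions) is available in the installed release:

import dspy
from concurrent.futures import ThreadPoolExecutor

# Each worker overrides the LM for its own call instead of reading the
# process-wide value set by dspy.configure. solve_with is a hypothetical
# helper name, not part of DSPy.
def solve_with(lm, example):
    with dspy.context(lm=lm):
        return solver(**example.inputs())

with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(lambda x: solve_with(gpt4o, x), trainset))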

@isaacbmiller
Collaborator

This is unfortunately a known issue, and the best ways to address it are:

  1. As of the time of writing: run everything inside an Evaluate call with a dummy metric (lambda x, y, z=None, w=None: 1) and collect your outputs using the corresponding kwarg in Evaluate (see the sketch after this list).
  2. Once #1690 (Add support for native parallel execution in DSPy) merges, use that for parallel execution.
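A minimal sketch of option 1, assuming the Evaluate kwarg in question is return_outputs and that, when set, the call returns a (score, results) pair where results holds (example, prediction, score) tuples; the exact kwarg name and return shape may differ between DSPy versions:

# Dummy metric so Evaluate only drives the (correctly threaded) forward passes.
dummy_metric = lambda x, y, z=None, w=None: 1

runner = dspy.Evaluate(
    devset=trainset,          # the examples you want to map over in parallel
    metric=dummy_metric,
    num_threads=10,
    display_progress=True,
    return_outputs=True,      # assumed kwarg for collecting the raw outputs
)

dspy.configure(lm=gpt4o)
_, results = runner(solver)
predictions = [pred for _, pred, _ in results]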

We should catch/warn in this situation @okhat @krypticmouse

@isaacbmiller added the bug label on Nov 6, 2024