Using ThreadPoolExecutor to parallelize dspy calls breaks internal config management for threaded dspy.Evaluate.
Specifically, doing this:
import dspy
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

class QuestionAnswer(dspy.Signature):
    question: str = dspy.InputField(description="The question")
    answer: int = dspy.OutputField(description="The answer to the question")

solver = dspy.ChainOfThought(QuestionAnswer)

# Parallelize the solver over the training set with a plain ThreadPoolExecutor.
with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(tqdm(executor.map(lambda x: solver(**x.inputs()), trainset), total=len(trainset)))
breaks the threaded dspy.Evaluate calls that follow:
# This breaks, both models have the same score. The threads all run on gpt4o_mini which was the last configured
# model before the ThreadPoolExecutor was created.
evaluator = dspy.Evaluate(devset=devset, metric=is_correct, num_threads=10, display_progress=True)
dspy.configure(lm=gpt4o)
evaluator(solver)
dspy.configure(lm=gpt4o_mini)
evaluator(solver)
# >>>>> gpt4o
# Average Metric: 47 / 50 (94.0): 100%|██████████| 50/50 [00:00<00:00, 1608.90it/s]
# 2024/11/06 11:49:23 INFO dspy.evaluate.evaluate: Average Metric: 47 / 50 (94.0%)
# >>>>> gpt4o_mini
# Average Metric: 47 / 50 (94.0): 100%|██████████| 50/50 [00:00<00:00, 1795.91it/s]
# 2024/11/06 11:49:23 INFO dspy.evaluate.evaluate: Average Metric: 47 / 50 (94.0%)
That is, all calls to dspy.configure are ignored.
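For the manual ThreadPoolExecutor pattern above, here is a minimal sketch of a possible per-worker override, assuming dspy.settings.context(...) is available in your dspy version and applies the LM locally for the duration of the with block:

def run_with_lm(example, lm):
    # Assumed: dspy.settings.context overrides the LM for this call only,
    # instead of relying on the global dspy.configure call made in the main thread.
    with dspy.settings.context(lm=lm):
        return solver(**example.inputs())

with ThreadPoolExecutor(max_workers=10) as executor:
    results = list(executor.map(lambda x: run_with_lm(x, gpt4o), trainset))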
Turning off threading on dspy.Evaluate works properly:
# Turning off threading works as expected: both models have different scores again.
evaluator = dspy.Evaluate(devset=devset, metric=is_correct, num_threads=1, display_progress=True)
dspy.configure(lm=gpt4o)
evaluator(solver)
dspy.configure(lm=gpt4o_mini)
evaluator(solver)
# ###### Output ######
# >>>>> gpt4o
# Average Metric: 48 / 50 (96.0): 100%|██████████| 50/50 [00:00<00:00, 892.83it/s]
# 2024/11/06 11:49:24 INFO dspy.evaluate.evaluate: Average Metric: 48 / 50 (96.0%)
# >>>>> gpt4o_mini
# Average Metric: 47 / 50 (94.0): 100%|██████████| 50/50 [00:00<00:00, 976.47it/s]
# 2024/11/06 11:49:24 INFO dspy.evaluate.evaluate: Average Metric: 47 / 50 (94.0%)
See this notebook for full repro.

This is unfortunately a known issue. As of the time of writing, the best way to address it is to run everything inside an Evaluate call first, with a dummy metric (lambda x, y, z=None, w=None: 1), and return your outputs using the corresponding kwarg on Evaluate.
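A minimal sketch of that workaround, assuming your dspy version's Evaluate exposes a return_outputs flag (the kwarg name and the shape of what it returns are assumptions here):

# Dummy metric that always "passes", so Evaluate is only used to drive the threaded calls.
dummy_metric = lambda x, y, z=None, w=None: 1

runner = dspy.Evaluate(
    devset=trainset,        # the examples previously mapped over with ThreadPoolExecutor
    metric=dummy_metric,
    num_threads=10,
    display_progress=True,
    return_outputs=True,    # assumed kwarg asking Evaluate to hand back per-example outputs
)

dspy.configure(lm=gpt4o)
score, outputs = runner(solver)   # outputs: assumed list of (example, prediction, score) tuples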