I created a simple module and a set of 10 question-and-answer pairs to evaluate a single PDF loaded into ChromaDB. Evaluating with DSPy version 2.5.16 like this:
I get a semantic F1 score of 69; then, after running the optimization (which takes about 15 minutes) and evaluating again, I get a score of about 79.
tp = dspy.MIPROv2(
    metric=metric, auto="medium", num_threads=24
)  # use fewer threads if your rate limit is small

optimized_rag = tp.compile(
    RAG(),
    trainset=data[:7],
    valset=data[7:],
    max_bootstrapped_demos=2,
    max_labeled_demos=2,
    requires_permission_to_run=False,
    seed=0,
)

evaluate(optimized_rag)
However, when I run this with version 2.5.20, I first get a score of 61, and after optimization I get a score of 69. These scores differ noticeably from the earlier ones and are significantly lower. Everything is the same except that I upgraded the DSPy library. Interestingly, the optimization now finishes in about 2 minutes, which is significantly faster. Any thoughts on these differences?
Perhaps we should save the adapter logic as part of the saved program, actually, so when you load it in the future, it's exactly identical in behavior to your older runs.
What do you think?
(Separately, I wouldn't think too much about the 69 vs 79 scores, since you're working with a valset with 3 examples, so noise is going to have a lot of room.)
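The point about noise can be made concrete with a quick simulation. The 0.70 mean and 0.20 per-example spread below are illustrative assumptions, not measured values; the takeaway is only how much a 3-example average can swing:

```python
import random
import statistics

# Simulate repeatedly evaluating on a 3-example valset: each example's
# semantic F1 is drawn around a "true" program quality of 0.70 with a
# per-example spread of 0.20, clipped to [0, 1].
random.seed(0)
means = []
for _ in range(10_000):
    scores = [min(1.0, max(0.0, random.gauss(0.70, 0.20))) for _ in range(3)]
    means.append(statistics.mean(scores))

# The standard deviation of the 3-example average is roughly
# 0.20 / sqrt(3), i.e. on the order of 10 F1 points either way --
# enough to account for a 69 vs 79 difference on its own.
print(round(statistics.stdev(means), 2))
```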
@okhat We can save the adapter code with cloudpickle, so it's technically doable. But I feel our adapter change shouldn't cause a performance downgrade in the first place: conceptually it just parses the input and output, so a true performance downgrade would indicate that we are doing something wrong. Instead of officially supporting serializing a DSPy model together with its adapter code, we may want to ensure that the newer-version adapter has no negative effect on performance.
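For reference, a minimal sketch of the cloudpickle approach. The `ToyAdapter` class here is a hypothetical stand-in for an adapter, not DSPy's actual adapter API; the point is that cloudpickle serializes classes defined in `__main__` by value, so the restored object keeps the exact parsing behavior it was saved with:

```python
import cloudpickle

# Hypothetical stand-in for an adapter: its only job is to parse
# model output into a normalized form.
class ToyAdapter:
    def parse(self, text):
        # trivial parsing logic, for illustration only
        return text.strip().lower()

# cloudpickle captures the class definition itself in the payload,
# so loading it later reproduces the old behavior even if the
# installed library's adapter has since changed.
payload = cloudpickle.dumps(ToyAdapter())
restored = cloudpickle.loads(payload)
print(restored.parse("  Answer: 42  "))  # -> "answer: 42"
```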