
Different optimization results between 2.5.16 -> 2.5.20 #1722

Open
nielsgl opened this issue Oct 30, 2024 · 2 comments
Comments


nielsgl commented Oct 30, 2024

Hi!

I created a simple module and a set of 10 questions and answers to evaluate it against a single PDF loaded into ChromaDB. When evaluating with DSPy version 2.5.16 like this:

evaluate = dspy.Evaluate(
    devset=data, metric=metric, num_threads=24, display_progress=True, display_table=3
)
evaluate(rag)

I get a semantic F1 score of 69; then, when I run the optimization (which takes about 15 minutes) and evaluate the optimized program, I get a score of about 79.
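For context, the metric is DSPy's semantic F1 and the data is just the 10 question/answer pairs; a rough sketch of the setup (the field names and the qa_pairs list are simplified placeholders):

import dspy
from dspy.evaluate import SemanticF1

# Semantic F1 metric used for both evaluation and optimization.
metric = SemanticF1()

# 10 question/answer pairs about the PDF (qa_pairs is a placeholder here).
data = [
    dspy.Example(question=q, response=a).with_inputs("question")
    for q, a in qa_pairs
]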

tp = dspy.MIPROv2(
    metric=metric, auto="medium", num_threads=24
)  # use fewer threads if your rate limit is small

optimized_rag = tp.compile(
    RAG(),
    trainset=data[:7],
    valset=data[7:],
    max_bootstrapped_demos=2,
    max_labeled_demos=2,
    requires_permission_to_run=False,
    seed=0
)

evaluate(optimized_rag)

However, when I run this with version 2.5.20, I first get a score of 61, and after optimization I get a score of 69. These results seem quite different from each other and significantly lower. Everything is the same except that I upgraded the DSPy library. Interestingly, the optimization now finishes in about 2 minutes, which is significantly faster. Any thoughts on these differences?
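For reference, the module being compiled is essentially the standard DSPy RAG pattern; roughly like this (the search helper stands in for the actual ChromaDB similarity query, and the signature fields are simplified):

import dspy

class RAG(dspy.Module):
    def __init__(self, num_docs=5):
        super().__init__()
        self.num_docs = num_docs
        self.respond = dspy.ChainOfThought("context, question -> response")

    def forward(self, question):
        # search() stands in for the ChromaDB similarity query over the PDF chunks.
        context = search(question, k=self.num_docs)
        return self.respond(context=context, question=question)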

okhat (Collaborator) commented Oct 30, 2024

Hey @nielsgl! We adjusted the adapters layer (which sits between signatures and LMs) in DSPy 2.5.19; you can find the details on the releases page: https://github.com/stanfordnlp/dspy/releases
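In the meantime, you can make the adapter explicit in your script so it's clear which formatting/parsing logic each run is using; a rough sketch, assuming the default ChatAdapter and a placeholder model name:

import dspy

# The adapter sits between signatures and the LM; configuring it explicitly
# makes it visible which formatting/parsing logic a given run used.
lm = dspy.LM("openai/gpt-4o-mini")  # placeholder model
dspy.configure(lm=lm, adapter=dspy.ChatAdapter())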

Perhaps we should save the adapter logic as part of the saved program, actually, so when you load it in the future, it's exactly identical in behavior to your older runs.

What do you think?

(Separately, I wouldn't read too much into the 69 vs. 79 scores: you're working with a valset of only 3 examples, so there's a lot of room for noise.)
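A quick back-of-the-envelope with hypothetical per-example scores shows how much a single example can move the average:

# With 3 validation examples, each one carries ~1/3 of the overall score.
per_example = [90, 80, 37]      # hypothetical per-example semantic F1 values
print(sum(per_example) / 3)     # = 69
per_example[2] = 67             # one example answered somewhat better
print(sum(per_example) / 3)     # = 79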

@okhat okhat closed this as completed Nov 18, 2024
@okhat okhat reopened this Nov 18, 2024
chenmoneygithub (Collaborator) commented
@okhat We can save the adapter code with cloudpickle, so it's technically doable. But I don't think our adapter change should cause a performance regression: conceptually it just parses the input and output, and if there is a true regression, that could indicate we are doing something wrong. So instead of officially supporting serializing a DSPy program together with the adapter code, maybe we should ensure that the newer-version adapter has no negative performance effect?
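For completeness, serializing the program together with its adapter could look roughly like this (not an existing DSPy API, just cloudpickle over the compiled program plus the adapter it was optimized under):

import cloudpickle
import dspy

# Bundle the compiled program with the adapter that was active when it was optimized.
bundle = {"program": optimized_rag, "adapter": dspy.ChatAdapter()}

with open("optimized_rag.pkl", "wb") as f:
    cloudpickle.dump(bundle, f)

# Later: restore both, so the loaded program behaves like the original run.
with open("optimized_rag.pkl", "rb") as f:
    bundle = cloudpickle.load(f)

dspy.configure(adapter=bundle["adapter"])
optimized_rag = bundle["program"]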
