Non-deterministic failure for test_pdf_to_opensearch #20

bsowell · 2023-09-13T18:42:27Z

Describe the bug
I have seen some non-deterministic failures in the test_pdf_to_opensearch test. The issue appears to be related to how guidance is returning results. In some cases, the returned map does not contain an answer key:

E                         File "/home/runner/work/sycamore/sycamore/sycamore/execution/transforms/entity/entity_extractor.py", line 41, in extract_entity
E                           document.properties.update({f"{self._entity_to_extract}": entities["answer"]})
E                                                                                     ~~~~~~~~^^^^^^^^^^
E                         File "/home/runner/.cache/pypoetry/virtualenvs/sycamore-QdKr-uPZ-py3.11/lib/python3.11/site-packages/guidance/_program.py", line 470, in __getitem__
E                           return self._variables[key]
E                                  ~~~~~~~~~~~~~~~^^^^^
E                       KeyError: 'answer'

From https://github.com/aryn-ai/sycamore/actions/runs/6176295926/job/16765129778#step:10:312

This looks to be non-deterministic, because the very next run of the exact same workflow succeeded:

Failed run: https://github.com/aryn-ai/sycamore/actions/runs/6176295926/attempts/1
Successful run: https://github.com/aryn-ai/sycamore/actions/runs/6176295926/attempts/2

The code is failing from here:

sycamore/sycamore/execution/transforms/entity/entity_extractor.py

Line 41 in a076d86

document.properties.update({f"{self._entity_to_extract}": entities["answer"]})

We will likely need to add some additional logging to get to the bottom of this.

The text was updated successfully, but these errors were encountered:

bsowell · 2023-09-25T01:53:03Z

Still happening: https://github.com/aryn-ai/sycamore/actions/runs/6293759018/job/17084907000. It is infrequent enough, I suspect it may be dependent on the result result returned by OpenAI.

bsowell · 2023-09-26T18:14:22Z

We traced this to throttling from OpenAI. Looks like we are already doing basic retries, but we should see if there is anything better we can do and at the very least surface the error more clearly

ChillOrb · 2023-10-01T05:48:26Z

Hello @bsowell,

I've been investigating the non-deterministic failures you've mentioned. I noticed that test_pdf_to_opensearch does not pass llm_kwargs here, and OpenAI is ultimately set here. I believe you're referring to the retry mechanism here?

So, my queries are:

Would it be beneficial to add a similar mechanism to _generate_using_guidance?

To enhance our retry mechanism, I'm considering using This guide

I would appreciate your thoughts on this. Please let me know if I am heading in the right direction.

ChillOrb · 2023-10-01T06:05:55Z

Also, i think this is related to #36

bsowell · 2023-10-01T22:18:44Z

Hey @ChillOrb! Thanks so much for taking a look.

So guidance, which is another library for interacting with LLMs, does have some retries enabled by default (e.g. see the default parameter here). Even with that, we still sometimes see timeouts. I do think eventually we want to move off of guidance and just use openai directly -- we aren't getting too much value, and I think we will want to customize the behavior, for example by using some of the techniques you linked. One place to start might be to see if we are doing the right thing in terms of retries for the non-guidance based access.

I do think this is slightly different than #36. That issue is about OpenSearch. When writing to OpenSearch we can sometimes overwhelm the cluster and need to add mechanisms to backoff. Here we are hitting issues access OpenAI. Kind of unfortunate in this project that everything starts with Open :).

ChillOrb · 2023-10-09T03:08:47Z

Hey @bsowell ,

Thanks for clarifying,

Code here looks fine to me.
Still a few things i think could be addressed and would love to get your opinion,
1. Deciding on number of max retries (currently 3)
2. Using exponential backoff which I think can be useful since I predict this error can be more frequent as one scales up.
3. Also will it be a good idea to check for content length here
4. What are your views on creating a custom retry backoff mechanism as mentioned here and calculate delay and retries based on headers here ?
5. Custom retry can also help in surfacing the error more clearly.
6. Is there a need for batching request parallelly? Ref: here .

Let me know if this is something that needs to be worked on right now.

Add example questions under sort-all

Need to use logging to get error messages out so the show up. Improve docker build speed by caching the npm stuff. I was just getting failures to install without this, npm install would hang indefinitely.

bsowell added the bug Something isn't working label Sep 15, 2023

eric-anderson pushed a commit that referenced this issue Feb 2, 2024

Update README.md (#20)

061ec37

Add example questions under sort-all

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Non-deterministic failure for test_pdf_to_opensearch #20

Non-deterministic failure for test_pdf_to_opensearch #20

bsowell commented Sep 13, 2023

bsowell commented Sep 25, 2023

bsowell commented Sep 26, 2023

ChillOrb commented Oct 1, 2023 •

edited

Loading

ChillOrb commented Oct 1, 2023

bsowell commented Oct 1, 2023

ChillOrb commented Oct 9, 2023 •

edited

Loading

Non-deterministic failure for test_pdf_to_opensearch #20

Non-deterministic failure for test_pdf_to_opensearch #20

Comments

bsowell commented Sep 13, 2023

bsowell commented Sep 25, 2023

bsowell commented Sep 26, 2023

ChillOrb commented Oct 1, 2023 • edited Loading

ChillOrb commented Oct 1, 2023

bsowell commented Oct 1, 2023

ChillOrb commented Oct 9, 2023 • edited Loading

ChillOrb commented Oct 1, 2023 •

edited

Loading

ChillOrb commented Oct 9, 2023 •

edited

Loading