
Completed traces locked/frozen in pending status. Option to delete individual traces? #453

Closed
tgram-3D opened this issue Feb 16, 2024 · 24 comments
Labels
bug Something isn't working

Comments

@tgram-3D

I just started using LangSmith, and it's awesome, but certain traces are locking up in "pending" status even after they complete successfully. It's happening on runs with higher concurrency parameters/more nested runs. Is this something you've run into before?

Also, there doesn't seem to be an option to delete individual traces from a project, so to clear these pending "lock ups" I have to delete and recreate the whole project. Could deleting traces maybe be a future feature request?

@hinthornw
Collaborator

What version of langsmith are you using?

(and langchain, if you're using this)

@tgram-3D
Author

@hinthornw I haven't updated anything since last week, I think. Current versions are langchain 0.1.5 and langsmith 0.0.87. Another thing I noticed: if I sent a keyboard interrupt during a run, it also froze in pending status. I will update on Monday.

@hinthornw
Collaborator

We fixed a couple of issues with pending runs in more recent versions. Could you let me know if the issue persists after upgrading?

@tgram-3D
Author

tgram-3D commented Feb 19, 2024

@hinthornw I'm getting pip dependency errors with langchain when trying to update langsmith to 0.1.2, or any version above 0.0.87.

I updated langchain to 0.1.7, then had to downgrade to 0.1.6 and downgrade langchain-community to 0.0.19 due to this current pwd import issue with the PebbloSafeLoader in langchain_community.document_loaders:

(langchain-ai/langchain#17514)

Then when I try to update langsmith I get this:

langchain 0.1.6 requires langsmith<0.1,>=0.0.83, but you have langsmith 0.1.2 which is incompatible.
langchain-community 0.0.19 requires langsmith<0.1,>=0.0.83, but you have langsmith 0.1.2 which is incompatible.
langchain-core 0.1.23 requires langsmith<0.0.88,>=0.0.87, but you have langsmith 0.1.2 which is incompatible.

My code still works fine with langsmith 0.1.2 though, and I have not had any issues with pending runs yet. I just ran the same script that locked up in pending on Friday with the same concurrency parameters and the run completed successfully.

I did get a new error though while running that script:

LangSmithConnectionError('Connection error caused failure to post https://api.smith.langchain.com/runs/batch in LangSmith API. Please confirm your LANGCHAIN_ENDPOINT. SSLError(MaxRetryError("HTTPSConnectionPool(host=\'api.smith.langchain.com\', port=443): Max retries exceeded with url: /runs/batch (Caused by SSLError(SSLEOFError(8, \'EOF occurred in violation of protocol (_ssl.c:2406)\')))"))')

This error popped up twice during a processing step that iterates through a ton of text and makes ~30 concurrent calls to OpenAI each iteration. The traces from the nested runs were logged successfully though, so maybe everything's all good?

Thank you for the help.

@vishal-git

> What version of langsmith are you using?
>
> (and langchain, if you're using this)

I am having the same issue. Several runs and traces are frozen in "pending" status. The status hasn't changed even after a couple of days.

Here's the version info:

 name         : langsmith
 version      : 0.1.5
 description  : Client library to connect to the LangSmith LLM Tracing and Evaluation Platform.

dependencies
 - pydantic >=1,<3
 - requests >=2,<3

required by
 - langchain >=0.1.0,<0.2.0
 - langchain-community >=0.1.0,<0.2.0
 - langchain-core >=0.1.0,<0.2.0

And,

 name         : langchain
 version      : 0.1.9
 description  : Building applications with LLMs through composability

dependencies
 - aiohttp >=3.8.3,<4.0.0
 - dataclasses-json >=0.5.7,<0.7
 - jsonpatch >=1.33,<2.0
 - langchain-community >=0.0.21,<0.1
 - langchain-core >=0.1.26,<0.2
 - langsmith >=0.1.0,<0.2.0
 - numpy >=1,<2
 - pydantic >=1,<3
 - PyYAML >=5.3
 - requests >=2,<3
 - SQLAlchemy >=1.4,<3
 - tenacity >=8.1.0,<9.0.0

required by
 - ragas *

@hinthornw
Collaborator

@vishal-git This happens because the span's end event never made it to the server. Runs will never be marked finished unless they are patched with an end time.

@vishal-git

Okay, that makes sense. Thank you for replying.

Can you please advise how to set an end time to avoid this situation? This is happening way too often and we are stuck with so many traces (and runs) in 'pending' mode.
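
Since the question is how to supply an end time for a stuck run: below is a minimal sketch, assuming the Python langsmith client's `update_run` method accepts an `end_time` argument (verify against your installed version; the run ID is a placeholder you'd replace with the ID of your stuck run).

```python
from datetime import datetime, timezone

def iso_end_time() -> str:
    """Build a timezone-aware ISO 8601 timestamp, the format the API expects."""
    return datetime.now(timezone.utc).isoformat()

# Hypothetical usage -- requires LANGCHAIN_API_KEY to be set and a real run ID;
# `update_run(..., end_time=...)` is assumed here, not confirmed by this thread:
#
#     from langsmith import Client
#     Client().update_run("<stuck-run-id>", end_time=iso_end_time())
```

This only patches runs one at a time; it does not address why the original end events were dropped.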

@snlamm
Contributor

snlamm commented Feb 29, 2024

I'm running into a similar issue as well, even using the latest version of the Node.js langsmith package (0.1.8).
Chains get stuck in a pending state (even as they complete successfully). For longer chains, child runs stop being logged once the chain reaches a larger size.
The issue seems to have started relatively recently (at least as far as I've noticed).

@snlamm
Contributor

snlamm commented Feb 29, 2024

if it's helpful, I'm also seeing some of these logs (I'm not sure if they can be safely ignored or not):

Error in handler LangChainTracer, handleChainEnd: Error: Failed to update run: 409 Conflict {"detail":"Payloads already received"}

@hinthornw
Collaborator

hinthornw commented Mar 19, 2024

Hmm, I'll forward this to our JS folks; I'm less familiar with some of the corner cases in Node.

@snlamm
Contributor

snlamm commented Mar 20, 2024

@hinthornw thanks for checking back! for me, this has since resolved 👌

@snlamm
Contributor

snlamm commented Mar 20, 2024

though, if it's helpful, I'm still seeing the Error in handler LangChainTracer, handleChainEnd: Error: Failed to update run: 409 Conflict {"detail":"Payloads already received"} when using node.js langgraph (v0.0.10) with langsmith (v0.1.8)

@athmedata

athmedata commented Apr 1, 2024

I am getting a similar pending RunnableSequence issue with langsmith 0.1.38 and langchain 0.1.13, but this time with this error:

Failed to batch ingest runs: LangSmithError('Failed to post https://api.smith.langchain.com/runs/batch in LangSmith API. HTTPError(\'422 Client Error: unknown for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"start_time must be an ISO 8601 timestamp"}\')')

Also, the way langsmith.Client() calculates start_time looks deprecated: it uses datetime.datetime.utcnow().
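
For context on that deprecation: `datetime.utcnow()` returns a naive timestamp (no tzinfo) and is deprecated as of Python 3.12, while `datetime.now(timezone.utc)` returns an aware one whose `isoformat()` output carries an explicit UTC offset. A small standard-library illustration:

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # deprecated in Python 3.12+; no tzinfo attached
aware = datetime.now(timezone.utc)  # recommended replacement; carries tzinfo

# Only the aware value serializes with an explicit offset:
stamp = aware.isoformat()           # e.g. "2024-04-01T12:00:00.000000+00:00"
```

A naive timestamp serialized with `isoformat()` has no offset suffix, which is one plausible way a server strict about ISO 8601 timestamps could reject it.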

@AyatKhraisat

I have a similar issue:

Failed to batch ingest runs: LangSmithConnectionError('Connection error caused failure to post https://api.smith.langchain.com/runs/batch in LangSmith API. Please confirm your LANGCHAIN_ENDPOINT. ConnectionError(MaxRetryError("HTTPSConnectionPool(host='api.smith.langchain.com', port=443): Max retries exceeded with url: /runs/batch (Caused by ProtocolError('Connection aborted.', timeout('The write operation timed out')))"))')

@LuciferLuther

LuciferLuther commented Apr 3, 2024

I have the same issue as well, and I have not been able to solve it yet.
I'm using the latest version of both langsmith and langchain.

@hinthornw hinthornw added the bug Something isn't working label Apr 10, 2024
@sergiovadyen

I have a similar issue:

Failed to batch ingest runs: LangSmithError('Failed to post https://api.smith.langchain.com/runs/batch in LangSmith API. HTTPError(\'422 Client Error: unknown for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"start_time must be an ISO 8601 timestamp"}\')')

@hinthornw
Collaborator

Thank you all for your patience. @sergiovadyen and @athmedata, do you have a code snippet I can use to reproduce your 422s?

A 422 is a bit different from a connection error, which is potentially different again from the 409 errors. The latter two are possibly related, though.

@athmedata

athmedata commented Apr 17, 2024

@hinthornw
I use langsmith with langchain, by just importing langsmith and calling:

langsmith_client = langsmith.Client()

The error appeared after I upgraded the langsmith package.

@hinthornw
Collaborator

hinthornw commented May 8, 2024

@athmedata that sounds like an unrelated error, probably related to API keys?

[edited] OK interesting. So the same code; different langsmith versions; now it's getting a 422.

@athmedata

@hinthornw
I don't think it is related to API keys, because I work with two environments running two versions of langsmith, and only the updated version has the start_time timestamp issue.

@RobertCorwin-AustinAI

This is still happening for us, but only after upgrading to 0.2.6 from 0.0.333:

Failed to batch ingest runs: LangSmithError('Failed to POST https://api.smith.langchain.com/runs/batch in LangSmith API. HTTPError(\'422 Client Error: unknown for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"start_time must be an ISO 8601 timestamp"}\')')

Same API keys, etc. langsmith version 0.1.82.

@hinthornw
Collaborator

hinthornw commented Jun 30, 2024

@RobertCorwin-AustinAI could you share a code snippet to help us debug?

Pending runs occur when the run patch event doesn't make it to the server. The reason for that is usually connectivity or rate limiting related, but in this case it seems like some issue with how the timestamp is being passed to the client.

There's clearly something I need to fix or clarify; I'm having a hard time reproducing this particular case though, since all the timestamps we create in the lib use datetime.now(timezone.utc).isoformat()

@RobertCorwin-AustinAI

I know you'll hate hearing this, but it appears to happen at random. We're using an agent and also a chain inside a tool; it doesn't seem to happen for the AgentExecutor, but it does happen in the RunnableSequence, which calls parallel search functions via RunnableParallel. Maybe the parallelism has something to do with it? Is there some way we can monitor the actual calls to the LangSmith API? That seems like it would help a lot. I tried to do that with a packet sniffer on my network card, but it was taking too much time. Is there some way to log the actual calls in a verbose debug mode or something? Thanks.
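
One stdlib-only way to monitor the outbound calls (an assumption on my part, not an official langsmith debug flag): raise urllib3's log level. The Python client posts over requests/urllib3, and urllib3's connection-pool logger prints one line per HTTP request, which covers the POSTs to /runs/batch.

```python
import logging

# Route log records to stderr and lower the threshold to DEBUG.
logging.basicConfig(level=logging.DEBUG)

# urllib3 logs each outbound request (method, host, path, status) at DEBUG,
# so the client's calls to api.smith.langchain.com become visible.
logging.getLogger("urllib3").setLevel(logging.DEBUG)
```

langchain also has a global debug switch (`from langchain.globals import set_debug; set_debug(True)`), though that traces chain events rather than the raw HTTP traffic.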

@hinthornw
Collaborator

Going to close as stale and likely overlapping with #808
