
Completed traces locked/frozen in pending status. Option to delete individual traces? #453

Closed
tgram-3D opened this issue Feb 16, 2024 · 24 comments
Labels
bug Something isn't working

Comments

@tgram-3D

I just started using LangSmith, and it's awesome, but certain traces are locking up in "pending" status even after they complete successfully. It's happening on runs with higher concurrency parameters/more nested runs. Is this something you've run into before?

Also, there doesn't seem to be an option to delete individual traces from a project, so to clear these pending "lock ups" I have to delete and recreate the whole project. Could deleting traces maybe be a future feature request?

@hinthornw
Collaborator

What version of langsmith are you using?

(and langchain, if you're using this)

@tgram-3D
Author

@hinthornw I haven't updated anything since last week, I think. Current versions are langchain 0.1.5 and langsmith 0.0.87. Another thing I noticed: if I sent a keyboard interrupt during a run, it also froze in pending status. I will update on Monday.

@hinthornw
Collaborator

We fixed a couple of issues with pending runs in more recent versions. Could you let me know if the issue persists after upgrading?

@tgram-3D
Author

tgram-3D commented Feb 19, 2024

@hinthornw I'm getting pip dependency errors with langchain when trying to update langsmith to 0.1.2, or any version above 0.0.87.

I updated langchain to 0.1.7, then had to downgrade to 0.1.6 and downgrade langchain-community to 0.0.19 due to this current pwd import issue with the PebbloSafeLoader in langchain_community.document_loaders:

(langchain-ai/langchain#17514)

Then when I try to update langsmith I get this:

langchain 0.1.6 requires langsmith<0.1,>=0.0.83, but you have langsmith 0.1.2 which is incompatible.
langchain-community 0.0.19 requires langsmith<0.1,>=0.0.83, but you have langsmith 0.1.2 which is incompatible.
langchain-core 0.1.23 requires langsmith<0.0.88,>=0.0.87, but you have langsmith 0.1.2 which is incompatible.

My code still works fine with langsmith 0.1.2 though, and I have not had any issues with pending runs yet. I just ran the same script that locked up in pending on Friday with the same concurrency parameters and the run completed successfully.

I did get a new error though while running that script:

LangSmithConnectionError('Connection error caused failure to post https://api.smith.langchain.com/runs/batch in LangSmith API. Please confirm your LANGCHAIN_ENDPOINT. SSLError(MaxRetryError("HTTPSConnectionPool(host=\'api.smith.langchain.com\', port=443): Max retries exceeded with url: /runs/batch (Caused by SSLError(SSLEOFError(8, \'EOF occurred in violation of protocol (_ssl.c:2406)\')))"))')

This error popped up twice during a processing step that iterates through a ton of text and makes ~30 concurrent calls to OpenAI each iteration. The traces from the nested runs were logged successfully though, so maybe everything's all good?

Thank you for the help.

@vishal-git

> What version of langsmith are you using?
>
> (and langchain, if you're using this)

I am having the same issue. Several runs and traces are frozen in "pending" status. The status hasn't changed even after a couple of days.

Here's the version info:

 name         : langsmith
 version      : 0.1.5
 description  : Client library to connect to the LangSmith LLM Tracing and Evaluation Platform.

dependencies
 - pydantic >=1,<3
 - requests >=2,<3

required by
 - langchain >=0.1.0,<0.2.0
 - langchain-community >=0.1.0,<0.2.0
 - langchain-core >=0.1.0,<0.2.0

And,

 name         : langchain
 version      : 0.1.9
 description  : Building applications with LLMs through composability

dependencies
 - aiohttp >=3.8.3,<4.0.0
 - dataclasses-json >=0.5.7,<0.7
 - jsonpatch >=1.33,<2.0
 - langchain-community >=0.0.21,<0.1
 - langchain-core >=0.1.26,<0.2
 - langsmith >=0.1.0,<0.2.0
 - numpy >=1,<2
 - pydantic >=1,<3
 - PyYAML >=5.3
 - requests >=2,<3
 - SQLAlchemy >=1.4,<3
 - tenacity >=8.1.0,<9.0.0

required by
 - ragas *

@hinthornw
Collaborator

@vishal-git This happens because the span's end event never made it to the server. Runs will never be marked finished unless they are patched with an end time.

@vishal-git

Okay, that makes sense. Thank you for replying.

Can you please advise how to set an end time to avoid this situation? This is happening way too often and we are stuck with so many traces (and runs) in 'pending' mode.
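
Since the question is how to supply an end time for a stuck run: below is a minimal sketch, assuming the Python langsmith client's `update_run` method accepts an `end_time` argument (verify against your installed version; the run ID is a placeholder you'd replace with the ID of your stuck run).

```python
from datetime import datetime, timezone

def iso_end_time() -> str:
    """Build a timezone-aware ISO 8601 timestamp, the format the API expects."""
    return datetime.now(timezone.utc).isoformat()

# Hypothetical usage -- requires LANGCHAIN_API_KEY to be set and a real run ID;
# `update_run(..., end_time=...)` is assumed here, not confirmed by this thread:
#
#     from langsmith import Client
#     Client().update_run("<stuck-run-id>", end_time=iso_end_time())
```

This only patches runs one at a time; it does not address why the original end events were dropped.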

@snlamm
Contributor

snlamm commented Feb 29, 2024

I'm running into a similar issue as well, even using the latest version of the Node.js langsmith package (0.1.8).
Chains get stuck in a pending state (even as they complete successfully). For longer chains, child runs stop being logged once the chain reaches a larger size.
The issue seems to have started relatively recently (at least as far as I've noticed).

@snlamm
Contributor

snlamm commented Feb 29, 2024

if it's helpful, I'm also seeing some of these logs (I'm not sure if they can be safely ignored or not):

Error in handler LangChainTracer, handleChainEnd: Error: Failed to update run: 409 Conflict {"detail":"Payloads already received"}

@hinthornw
Collaborator

hinthornw commented Mar 19, 2024

Hmm, I'll forward this to our JS folks; I'm less familiar with some of the corner cases in Node.

@snlamm
Contributor

snlamm commented Mar 20, 2024

@hinthornw thanks for checking back! for me, this has since resolved 👌

@snlamm
Contributor

snlamm commented Mar 20, 2024

though, if it's helpful, I'm still seeing the Error in handler LangChainTracer, handleChainEnd: Error: Failed to update run: 409 Conflict {"detail":"Payloads already received"} when using node.js langgraph (v0.0.10) with langsmith (v0.1.8)

@athmedata

athmedata commented Apr 1, 2024

I am getting a similar pending RunnableSequence issue with langsmith 0.1.38 and langchain 0.1.13, but this time with this error:

Failed to batch ingest runs: LangSmithError('Failed to post https://api.smith.langchain.com/runs/batch in LangSmith API. HTTPError(\'422 Client Error: unknown for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"start_time must be an ISO 8601 timestamp"}\')')

Also, the way langsmith.Client() calculates start_time looks deprecated: it uses datetime.datetime.utcnow().
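
For context on that deprecation: `datetime.utcnow()` returns a naive timestamp (no tzinfo) and is deprecated as of Python 3.12, while `datetime.now(timezone.utc)` returns an aware one whose `isoformat()` output carries an explicit UTC offset. A small standard-library illustration:

```python
from datetime import datetime, timezone

naive = datetime.utcnow()           # deprecated in Python 3.12+; no tzinfo attached
aware = datetime.now(timezone.utc)  # recommended replacement; carries tzinfo

# Only the aware value serializes with an explicit offset:
stamp = aware.isoformat()           # e.g. "2024-04-01T12:00:00.000000+00:00"
```

A naive timestamp serialized with `isoformat()` has no offset suffix, which is one plausible way a server strict about ISO 8601 timestamps could reject it.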

@AyatKhraisat

I have a similar issue:

Failed to batch ingest runs: LangSmithConnectionError('Connection error caused failure to post https://api.smith.langchain.com/runs/batch in LangSmith API. Please confirm your LANGCHAIN_ENDPOINT. ConnectionError(MaxRetryError("HTTPSConnectionPool(host='api.smith.langchain.com', port=443): Max retries exceeded with url: /runs/batch (Caused by ProtocolError('Connection aborted.', timeout('The write operation timed out')))"))')

@LuciferLuther

LuciferLuther commented Apr 3, 2024

I have the same issue as well, and I have not been able to solve it yet.
I'm using the latest version of both langsmith and langchain.

@hinthornw hinthornw added the bug Something isn't working label Apr 10, 2024
@sergiovadyen

I have a similar issue:

Failed to batch ingest runs: LangSmithError('Failed to post https://api.smith.langchain.com/runs/batch in LangSmith API. HTTPError(\'422 Client Error: unknown for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"start_time must be an ISO 8601 timestamp"}\')')

@hinthornw
Collaborator

Thank you all for your patience. @sergiovadyen and @athmedata, do you have a code snippet I can use to reproduce your 422s?

A 422 is a bit different from a connection error, which is potentially different again from the 409 errors. The latter two are possibly related, though.

@athmedata

athmedata commented Apr 17, 2024

@hinthornw
I use langsmith with langchain, by just importing langsmith and calling:

langsmith_client = langsmith.Client()

The error appeared after I upgraded the langsmith package.

@hinthornw
Collaborator

hinthornw commented May 8, 2024

@athmedata that sounds like an unrelated error, probably related to API keys?

[edited] OK interesting. So the same code; different langsmith versions; now it's getting a 422.

@athmedata

@hinthornw
I don't think it is related to API keys, because I work with two environments running two versions of langsmith, and only the updated version has the start_time timestamp issue.

@RobertCorwin-AustinAI

This is still happening for us, but only after upgrading to 0.2.6 from 0.0.333:

Failed to batch ingest runs: LangSmithError('Failed to POST https://api.smith.langchain.com/runs/batch in LangSmith API. HTTPError(\'422 Client Error: unknown for url: https://api.smith.langchain.com/runs/batch\', \'{"detail":"start_time must be an ISO 8601 timestamp"}\')')

Same API keys, etc. langsmith version 0.1.82.

@hinthornw
Collaborator

hinthornw commented Jun 30, 2024

@RobertCorwin-AustinAI could you share a code snippet to help us debug?

Pending runs occur when the run patch event doesn't make it to the server. The reason for that is usually connectivity or rate limiting related, but in this case it seems like some issue with how the timestamp is being passed to the client.

There's clearly something I need to fix or clarify; I'm having a hard time reproducing this particular case though, since all the timestamps we create in the lib use datetime.now(timezone.utc).isoformat()

@RobertCorwin-AustinAI

I know you'll hate hearing this, but it appears to happen at random. We're using an agent and also a chain inside a tool; it doesn't seem to happen for the AgentExecutor, but it does happen in the RunnableSequence, which calls parallel search functions via RunnableParallel. Maybe the parallelism has something to do with it? Is there some way we can monitor the actual calls to the LangSmith API? That seems like it would help a lot. I tried to do that with a packet sniffer on my network card, but it was taking too much time. Is there some way to log the actual calls in a verbose debug mode or something? Thanks.
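
One stdlib-only way to monitor the outbound calls (an assumption on my part, not an official langsmith debug flag): raise urllib3's log level. The Python client posts over requests/urllib3, and urllib3's connection-pool logger prints one line per HTTP request, which covers the POSTs to /runs/batch.

```python
import logging

# Route log records to stderr and lower the threshold to DEBUG.
logging.basicConfig(level=logging.DEBUG)

# urllib3 logs each outbound request (method, host, path, status) at DEBUG,
# so the client's calls to api.smith.langchain.com become visible.
logging.getLogger("urllib3").setLevel(logging.DEBUG)
```

langchain also has a global debug switch (`from langchain.globals import set_debug; set_debug(True)`), though that traces chain events rather than the raw HTTP traffic.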

@hinthornw
Collaborator

Going to close as stale and likely overlapping with #808
