You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
#336 exposes some interesting behaviour that isn't fully resolved with #338 and I think warrants further investigation.
Running the failed retrieval as suggested in the issue (i.e. with a curl that refuses to write the binary output so terminates the connection) ends with this set of logs on the daemon:
2023-07-03T11:11:52.497+1000 DEBUG lassie/retriever retriever/retriever.go:290 retrieval-event {"code": "finished", "payloadCid": "bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y", "storageProviderId": ""}
2023-07-03T11:11:52.497+1000 DEBUG lassie/retriever retriever/retriever.go:290 retrieval-event {"code": "failed", "payloadCid": "bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y", "storageProviderId": "", "errorMessage": "context canceled"}
2023-07-03T11:11:52.497+1000 DEBUG lassie/retriever retriever/retriever.go:290 retrieval-event {"code": "failed-retrieval", "payloadCid": "bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y", "storageProviderId": ""}
2023-07-03T11:11:52.497+1000 WARN lassie/retriever retriever/retriever.go:233 Failed to retrieve from miner for bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y: context canceled
2023-07-03T11:11:52.497+1000 DEBUG lassie/httpserver http/ipfs.go:199 unclean close {"cid": "bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y", "retrievalID": "d3f163d7-7186-4656-9096-4ecfb647ea23"}
2023-07-03T11:11:52.497+1000 INFO lassie/httpserver http/ipfs.go:201 unable to send early termination {"err": "flushing buff: write tcp 127.0.0.1:8989->127.0.0.1:60042: write: broken pipe"}
2023-07-03T11:11:52.598+1000 WARN lassie/retriever retriever/parallelpeerretriever.go:196 Unable to successfully cancel all retrieval attempts withing 100ms
2023-07-03T11:11:52.598+1000 WARN lassie/retriever retriever/parallelpeerretriever.go:196 Unable to successfully cancel all retrieval attempts withing 100ms
Those last two lines are the most interesting, the rest looks reasonable. We expect two of these if we're running graphsync and http retrievals for this request, that seems fine. But even if you expand retriever/parallelpeerretriever.go:196's timeout value to 1s, it still fails to properly clean up.
This is post context-cancellation, waiting for the active goroutines to terminate but giving them a bit of grace time.
It's possible we have something in our call stack that doesn't pay proper attention to a context cancellation and is stuck in a wait condition; likely leading to a goroutine leak, which is what this WARN output is supposed to highlight.
The text was updated successfully, but these errors were encountered:
#336 exposes some interesting behaviour that isn't fully resolved with #338 and I think warrants further investigation.
Running the failed retrieval as suggested in the issue (i.e. with a
curl
that refuses to write the binary output so terminates the connection) ends with this set of logs on the daemon:Those last two lines are the most interesting, the rest looks reasonable. We expect two of these if we're running graphsync and http retrievals for this request, that seems fine. But even if you expand retriever/parallelpeerretriever.go:196's timeout value to 1s, it still fails to properly clean up.
This is post context-cancellation, waiting for the active goroutines to terminate but giving them a bit of grace time.
It's possible we have something in our call stack that doesn't pay proper attention to a context cancellation and is stuck in a wait condition; likely leading to a goroutine leak, which is what this WARN output is supposed to highlight.
The text was updated successfully, but these errors were encountered: