Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible goroutine leak from bad-write daemon connection endings #343

Closed
rvagg opened this issue Jul 3, 2023 · 0 comments · Fixed by #346
Closed

Possible goroutine leak from bad-write daemon connection endings #343

rvagg opened this issue Jul 3, 2023 · 0 comments · Fixed by #346
Labels
bug Something isn't working tech debt Non-critical change that can wait to be addressed

Comments

@rvagg
Copy link
Member

rvagg commented Jul 3, 2023

#336 exposes some interesting behaviour that isn't fully resolved with #338 and I think warrants further investigation.

Running the failed retrieval as suggested in the issue (i.e. with a curl that refuses to write the binary output so terminates the connection) ends with this set of logs on the daemon:

2023-07-03T11:11:52.497+1000    DEBUG   lassie/retriever        retriever/retriever.go:290      retrieval-event {"code": "finished", "payloadCid": "bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y", "storageProviderId": ""}
2023-07-03T11:11:52.497+1000    DEBUG   lassie/retriever        retriever/retriever.go:290      retrieval-event {"code": "failed", "payloadCid": "bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y", "storageProviderId": "", "errorMessage": "context canceled"}
2023-07-03T11:11:52.497+1000    DEBUG   lassie/retriever        retriever/retriever.go:290      retrieval-event {"code": "failed-retrieval", "payloadCid": "bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y", "storageProviderId": ""}
2023-07-03T11:11:52.497+1000    WARN    lassie/retriever        retriever/retriever.go:233      Failed to retrieve from miner  for bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y: context canceled
2023-07-03T11:11:52.497+1000    DEBUG   lassie/httpserver       http/ipfs.go:199        unclean close   {"cid": "bafkreifyxdnkixhcqz4agfx6jekd27ib7kwxvhaddbvh4ycj4n22jj5d7y", "retrievalID": "d3f163d7-7186-4656-9096-4ecfb647ea23"}
2023-07-03T11:11:52.497+1000    INFO    lassie/httpserver       http/ipfs.go:201        unable to send early termination        {"err": "flushing buff: write tcp 127.0.0.1:8989->127.0.0.1:60042: write: broken pipe"}
2023-07-03T11:11:52.598+1000    WARN    lassie/retriever        retriever/parallelpeerretriever.go:196  Unable to successfully cancel all retrieval attempts withing 100ms
2023-07-03T11:11:52.598+1000    WARN    lassie/retriever        retriever/parallelpeerretriever.go:196  Unable to successfully cancel all retrieval attempts withing 100ms

Those last two lines are the most interesting, the rest looks reasonable. We expect two of these if we're running graphsync and http retrievals for this request, that seems fine. But even if you expand retriever/parallelpeerretriever.go:196's timeout value to 1s, it still fails to properly clean up.

This is post context-cancellation, waiting for the active goroutines to terminate but giving them a bit of grace time.

It's possible we have something in our call stack that doesn't pay proper attention to a context cancellation and is stuck in a wait condition; likely leading to a goroutine leak, which is what this WARN output is supposed to highlight.

@rvagg rvagg added bug Something isn't working tech debt Non-critical change that can wait to be addressed labels Jul 3, 2023
rvagg added a commit that referenced this issue Jul 5, 2023
rvagg added a commit that referenced this issue Jul 5, 2023
hannahhoward pushed a commit that referenced this issue Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working tech debt Non-critical change that can wait to be addressed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant