S3 crt client HeadObject call freezes. #3098

Open
Manjunathagopi opened this issue Aug 30, 2024 · 8 comments
Labels
bug This issue is a bug. p2 This is a standard priority issue

Comments

@Manjunathagopi

Describe the bug

We run a service that reads data from S3 in parallel (multithreaded). One day we saw that all the threads were still running because the HeadObject response from S3 never arrived. All new threads got stuck in the same way, so a large number of threads accumulated. Restarting the service cleared the problem.

Unfortunately, we have not been able to reproduce this issue since. While investigating, we found a similar issue in the AWS SDK for JavaScript here. Could you please confirm if this is indeed a similar issue? The suggested solution in that case was to configure an HTTP timeout.

We considered doing the same, but we discovered that the S3 CRT client does not honor timeout configurations, as mentioned in this issue. Could you provide information on when the AWS S3 CRT client will support timeout configurations? This support is crucial to ensure that we do not encounter S3 API call freezes in the future.

Expected Behavior

The HeadObject call should not freeze.

Current Behavior

The HeadObject call freezes.

Reproduction Steps

Unable to reproduce, but it would be better to configure an HTTP timeout.

Possible Solution

No response

Additional Information/Context

No response

AWS CPP SDK version used

1.11.269

Compiler and Version used

gcc (GCC) 4.8.5

Operating System and version

CentOS Linux 7

@Manjunathagopi Manjunathagopi added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Aug 30, 2024
@jmklix
Member

jmklix commented Aug 30, 2024

Thanks for taking the time to look for older similar issues, but it's hard to tell if your situation is similar to the 10 year old js-v1 issue. I don't have any timeline for when timeout configurations might be supported by the CRT client. I would recommend that you 👍 the feature request, because that helps us when prioritizing new feature requests.

We can also look more into why you are seeing the HeadObject call freeze if you give us more info:

@jmklix jmklix added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. p2 This is a standard priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Aug 30, 2024
@Manjunathagopi
Author

Manjunathagopi commented Sep 4, 2024

Hello,
@jmklix Unfortunately, we cannot reproduce this issue reliably, so we planned some overnight testing with AWS C++ SDK trace-level logging enabled. Luckily, the same issue showed up again.
You can find the trace-level logs at this link (expires in 7 days).
NOTE: The issue starts at the log line below:
2024-09-04T13:09:31+05:30 2024-09-04T07:39:31.129857333Z stdout F Sep 4 13:09:31.129 668_001 app: ,INFO, com.amagi.darti.s3_reader.buffered_reader, Opening s3://amagicloud-onecp-sigma8/Media/S3/668/+240211472001XA+.mxf
Note: the log line above is our own custom log for the HeadObject call; it is included only for reference.
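
For readers who want to capture similar logs, the sketch below shows one way to turn on trace-level logging when initializing the AWS SDK for C++. It is a minimal illustration rather than the reporter's actual setup. It assumes the SDK's default file logger, which, as far as I know, writes files prefixed with `aws_sdk_` into the working directory.

```cpp
#include <aws/core/Aws.h>
#include <aws/core/utils/logging/LogLevel.h>

int main() {
    Aws::SDKOptions options;
    // Trace is the most verbose level. Expect large log files, so enable it
    // only while investigating a problem.
    options.loggingOptions.logLevel = Aws::Utils::Logging::LogLevel::Trace;

    Aws::InitAPI(options);
    {
        // Create S3 clients and issue requests here. Clients should go out of
        // scope before ShutdownAPI is called.
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```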

@DmitriyMusatkin
Contributor

What version of curl are you building against, and can you try building against a newer curl?
From the logs, it looks like curl tries to establish a new connection, sees an existing one in the pool but determines it is dead, then tries to obtain new IPs from DNS and gets stuck in a loop there. We don't have much custom code around DNS resolution in the C++ SDK, so it might be due to some sort of bug in curl.

Note: for the S3 CRT client, CRT is only used in the put/get APIs. All the other APIs still go through the regular curl-based implementation (and timeout settings apply to those as usual).

@jmklix jmklix added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. and removed response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. labels Sep 4, 2024
@Manjunathagopi
Author

Manjunathagopi commented Sep 4, 2024

@DmitriyMusatkin

  1. We are using curl 7.29.0.
  2. So configuring timeout settings will cancel curl-based operations if they get stuck, and the HeadObject call will eventually return with an error? And does this apply even for the CRT client?

@DmitriyMusatkin
Contributor

curl 7.29.0 is over a decade old at this point. I would not be surprised if it has some issues with MVA DNS.
Yes, the timeout should apply in this case. It looks like it is having trouble obtaining IPs to establish a connection, so the connection timeout should stop the attempt.
As I mentioned, HeadObject in the S3 CRT C++ client does not actually use CRT under the covers and goes through the regular S3 client path.
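
A minimal sketch of the timeout settings discussed above, assuming the `connectTimeoutMs` and `requestTimeoutMs` fields that `Aws::S3Crt::ClientConfiguration` inherits from the SDK's common client configuration. The region, bucket, key, and timeout values are placeholders chosen for illustration, not values from this thread, and whether these settings are enough on a given curl version is a separate question.

```cpp
#include <aws/core/Aws.h>
#include <aws/s3-crt/S3CrtClient.h>
#include <aws/s3-crt/model/HeadObjectRequest.h>
#include <iostream>

int main() {
    Aws::SDKOptions options;
    Aws::InitAPI(options);
    {
        Aws::S3Crt::ClientConfiguration config;
        config.region = "us-east-1";      // placeholder region
        config.connectTimeoutMs = 3000;   // limit on establishing a connection
        config.requestTimeoutMs = 10000;  // limit on a stalled request; with the
                                          // curl client this is, as I understand
                                          // it, applied as a stall limit

        Aws::S3Crt::S3CrtClient client(config);

        Aws::S3Crt::Model::HeadObjectRequest request;
        request.SetBucket("example-bucket");    // placeholder bucket
        request.SetKey("example/object.mxf");   // placeholder key

        // Per the comment above, HeadObject goes through the regular
        // curl-based path, so these timeouts are expected to apply to it.
        auto outcome = client.HeadObject(request);
        if (!outcome.IsSuccess()) {
            std::cerr << "HeadObject failed: "
                      << outcome.GetError().GetMessage() << std::endl;
        }
    }
    Aws::ShutdownAPI(options);
    return 0;
}
```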

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Sep 5, 2024
@jmklix jmklix added the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Sep 5, 2024

Greetings! It looks like this issue hasn’t been active in longer than a week. We encourage you to check if this is still an issue in the latest release. Because it has been longer than a week since the last update on this, and in the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or add an upvote to prevent automatic closure, or if the issue is already closed, please feel free to open a new one.

@github-actions github-actions bot added closing-soon This issue will automatically close in 4 days unless further comments are made. closed-for-staleness and removed closing-soon This issue will automatically close in 4 days unless further comments are made. labels Sep 15, 2024
@Manjunathagopi
Author

Manjunathagopi commented Sep 24, 2024

Hi @DmitriyMusatkin, we are on CentOS 7, for which curl 7.29.0 is the latest available version.
I also configured the timeout settings, but the HeadObject freeze still occurs.
Can you please suggest a way to fix this?

@DmitriyMusatkin
Contributor

I'm not sure there is an easy fix for this. From the logs, it looks like curl acquires a connection from the pool, realizes that it has been closed (probably because it hit the idle timeout on the S3 side), tries to acquire a new connection, and then stops after DNS resolution. We haven't observed this behavior with newer versions of curl, and we don't automatically test versions of curl that old.
So unfortunately, it sounds like the only path forward would be for someone to reproduce it with that version of curl, figure out what is going on on the curl side, and see whether the SDK can do anything to mitigate that behavior.
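
Until the curl-side cause is understood, one caller-side stopgap is to bound how long the application waits for a blocking call. The sketch below is not something proposed in this thread and is not part of the SDK. `CallWithDeadline` is a hypothetical helper built only on the C++11 standard library, and the callable passed to it stands in for a wrapper around the HeadObject call. It only unblocks the caller: a call that is truly stuck keeps its helper thread alive, so this limits the damage but does not fix the underlying hang.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <thread>
#include <utility>

// Hypothetical helper: runs `call` on a helper thread and waits at most
// `deadline` for it. Returns true and stores the result in `out` if the call
// finished in time. Returns false and leaves `out` untouched otherwise.
template <typename T>
bool CallWithDeadline(std::function<T()> call,
                      std::chrono::milliseconds deadline,
                      T& out) {
    std::packaged_task<T()> task(std::move(call));
    std::future<T> result = task.get_future();

    // Detach so a stuck call cannot block the caller. The helper thread
    // keeps running (and is leaked) if the call never returns.
    std::thread(std::move(task)).detach();

    if (result.wait_for(deadline) != std::future_status::ready) {
        return false;  // deadline passed; the caller can log, alert, or retry
    }
    out = result.get();
    return true;
}
```

A caller would wrap the blocking request in a lambda, for example `CallWithDeadline<int>(doHead, std::chrono::milliseconds(5000), status)`, where `doHead` is a hypothetical function returning a status code. Anything the lambda captures must outlive the helper thread, since that thread may still be running after the deadline passes.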

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Sep 25, 2024