Bug: Amazon flaky results for unknown URLs #316

Open
ascheman opened this issue Jan 20, 2024 · 1 comment

ascheman commented Jan 20, 2024

Amazon seems to respond differently to unknown URLs depending on various request parameters.
Currently the test case BrokenHttpLinksCheckerSpec:bad amazon link is identified as problem fails for me.
It seems to work in GitHub Actions but fails on my local machine, both when running the single test from the IDE (IntelliJ) and in a full gradlew test run.

I could track it down to the following behaviour:

  • When executed locally, Amazon returns status 200 together with a captcha challenge. The test case expects a 503 return code, which the HSC checker would report as a finding.
  • When executed in GitHub Actions, it seems to work as expected and returns a 503 (unfortunately we do not yet log these results).

Locally, I could further change Amazon's behaviour by setting the User-Agent header of the request.
This can even be reproduced with curl:

  • curl -X HEAD -v https://www.amazon.com/dp/4242424242 uses curl's default User-Agent (curl/8.4.0 in my case) and returns a 503 (the same holds for GET requests).
  • With the default HSC User-Agent header "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0", curl -H "User-Agent: Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0" -X GET -v https://www.amazon.com/dp/4242424242 returns status 200 and a captcha request.

Cf. bug-316.zip
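
The curl comparison above could also be scripted from the JVM. The following is only a sketch, assuming Java 11+ and the standard java.net.http client; the class name UserAgentProbe and the helper requestWith are hypothetical and not part of HSC. Amazon's answer may also depend on factors other than the User-Agent (IP address, HTTP version, rate limiting), so the status codes printed may differ from the curl results.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;

// Hypothetical helper class, not part of HSC.
public class UserAgentProbe {

    // Builds a GET request for the URL that carries the given User-Agent header.
    public static HttpRequest requestWith(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .header("User-Agent", userAgent)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        String url = "https://www.amazon.com/dp/4242424242";
        // The two User-Agent values mentioned in this issue.
        List<String> userAgents = List.of(
                "curl/8.4.0",
                "Mozilla/5.0 (X11; Linux i686; rv:10.0) Gecko/20100101 Firefox/10.0");

        HttpClient client = HttpClient.newHttpClient();
        for (String userAgent : userAgents) {
            try {
                // Only the status code is of interest, so the body is discarded.
                HttpResponse<Void> response = client.send(
                        requestWith(url, userAgent),
                        HttpResponse.BodyHandlers.discarding());
                System.out.println(response.statusCode() + " <- " + userAgent);
            } catch (IOException e) {
                System.out.println("request failed (" + e.getMessage() + ") <- " + userAgent);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

Printing one status code per User-Agent makes it easy to compare a local run with a CI run once some logging is available.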

Perhaps this is similar to the behaviour we see in #219?

I suggest setting the User-Agent header to something HSC-specific (e.g., hsc/version).
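
A minimal sketch of how such a value could be assembled, assuming the version is taken from the JAR manifest (Implementation-Version) and falls back to "unknown" when it is missing. The class name HscUserAgent, the fallback value, and the manifest lookup are assumptions for illustration, not the actual HSC implementation.

```java
// Hypothetical helper class, not part of HSC.
public class HscUserAgent {

    // Returns "hsc/<version>"; the "unknown" fallback is an assumption.
    public static String userAgent(String version) {
        String v = (version == null || version.isBlank()) ? "unknown" : version.trim();
        return "hsc/" + v;
    }

    // Reads Implementation-Version from the manifest, or null if it is not available
    // (for example when the class is run outside a packaged JAR).
    public static String detectVersion() {
        Package p = HscUserAgent.class.getPackage();
        return p == null ? null : p.getImplementationVersion();
    }

    public static void main(String[] args) {
        System.out.println(userAgent(detectVersion()));
    }
}
```

The resulting string would then be passed as the User-Agent header of every link check, in place of the browser-like default quoted above.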

ascheman added the bug label Jan 20, 2024
ascheman self-assigned this Jan 20, 2024
ascheman added a commit that referenced this issue Jan 20, 2024
ascheman added a commit that referenced this issue Jan 21, 2024
#316 Use product version for user agent
@ascheman
Member Author

For whatever reason, the problem mostly occurs locally, but occasionally also during the GitHub Actions build.

ascheman added a commit that referenced this issue Mar 26, 2024
and fix other minor issues