-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
429 errors with concurrent mode disabled #152
Comments
@wjdp would you be open to accepting a PR that added a self-imposed rate limit for certain URLs? e.g. something like this (what do you think of the general concept if someone else was to open the change?) RateLimit:
- URL: "github.com"
MaxRequests: 10
Within: 1s |
Hi @ChrisChinchilla 👋From "works locally, fails on CI" I suspect your local broadband bottlenecks you enough that rate limiting isn't a problem there but the fat pipe at your CI provider allows for a much more effective DoS directed at github 😉 hence the 429s. This is a problem with htmltest, right now we just scale to whatever bandwidth is available even if that hits external rate limits. Hey @robskillington 👋Yes, this sounds like the solution we need! I think we probably want a generic option first rather than a specific option but am open to either really. Very happy for someone else to pick this up! Generic optionRate limit connections per hostname without needing to specify it per host in config. You'd set a limit in config, or perhaps we have a sensible default, which is applied to all hostnames. Specific optionAs you suggest. Or perhaps both? A default and custom overrides. Config namingLikely the most contentious part of a PR for this would be the config names.
Perhaps something like RateLimit:
- Default: true
MaxRequests: 20
Within: 1s [optional]
- URL: "github.com"
MaxRequests: 10
Within: 1s [optional] What do you think? Other solutionsA thought I had half way through writing this reply: If we see a 429 we could do an exponential backoff to attempt to recover. This would solve the problem without having to resort to hard-coded rate limits which could still break by hitting unpredictable rate limits. Arguably this could be a better solution as it will fix the 429 problem for everyone without them having to set up rate limits nor htmltest having to impose a default rate limit. We may find ourselves in a position of needing to do both as some servers likely aren't as nice as GitHub with their status codes though. I'm sure there's a Go package that could handle this for us somewhat easily. A cursory Google throws up https://github.com/cenkalti/backoff |
OK thanks @wjdp we (@robskillington and I work together) ended up getting it working on CI by reducing the links checked to a bare minimum, but hopefully Rob's suggestion will solve the issue for others in the long term :) |
I wonder if HEAD requests are not affected by rate limiting? I find that using Get with Range=0 is not great for link checking... |
Hi @robskillington |
I'm getting this too. Doesn't just apply to GitHub but to lots of other endpoints too. |
I have just run into this in a major way trying to check links on https://afterglow-guide.deepsymmetry.org/ since it has many dozens of links to source code lines in the GitHub repository. For now, I am going to have to disable checking them, but it would be very nice if 429s could lead to per-host exponential backoff (or tailored backoff if an interval could be parsed from the 429 response, but that is a nice-to-have compared to the necessity of just having a queue to retry these links at progressively longer intervals). |
Describe the bug
I can see there are similar reports on this issue (#105), but I have the same result even without concurrent mode enabled. Interestingly, the checks work beautifully when run locally, but we get the error when it runs on CI jobs.
etc. And yes, it's just GitHub links.
To Reproduce
htmltest is running in CI over a Hugo site like this.
.htmltest.yml
Please copy in your config file
Versions
The text was updated successfully, but these errors were encountered: