Define unique user-agent #107
Comments
So far, we've tried to encourage configurations that did not depend on any feature of the client, especially things like the user agent or source IP. The HSTS preload list website makes no promises (only requirements), and there are no guarantees any particular part of the system will remain the same in the future.
It is safest if every (modern) browser and everyone connecting using other libraries (e.g. via the hstspreload command-line tool) gets the same responses with the same headers. In particular, a site will not end up on Mozilla's HSTS preload list unless their scanner is able to observe the header. Another issue is that people copy-paste recommendations from others on the internet. If these "recommended" configurations have a specific user agent they can start sniffing, preloading issues could become very difficult to debug. (I try to track bad recommendations on the web and ask their owners to improve them, but this is a manual, imperfect process.) For these reasons, I'm against tweaking the user agent. However, I'll let @nharper have final say about it.
In my case, having a custom user agent would have prevented the bot from being blacklisted. Being the default
@nharper, do you have an opinion about this either way?
I don't think the hstspreload tool needs to specify its own UA. In general, I don't like servers sniffing UA strings.
Reopening this issue as we now have a more compelling reason to reconsider custom UA strings. The outbound scans used by hstspreload.org and the bulk updates to check for preloading eligibility have started to be blocked by several CDNs' spam/fraud detection. These CDNs only offer allowlisting by (User-Agent, ASN) tuples, and they are understandably not fans of allowlisting the default Go UA string. While I agree that we do not want to encourage UA sniffing generally speaking, I don't think we have many other options here. Once we get a custom string, we still need to reach out to the affected CDNs to start the process of unblocking. I'd not seen this discussion before filing #118 and tweaking the UA string to "user-agent: hsts-preload-bot" on a new branch, but I'm happy to settle on a more amenable custom string, if folks have a strong opinion on what it should be.
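For readers unfamiliar with how the default string gets replaced, here is a minimal sketch of overriding the User-Agent on an outgoing Go request. It is not the code from the branch mentioned above: `newScanRequest` and `scannerUserAgent` are hypothetical names, and the UA value is just the one floated in this comment.

```go
package main

import (
	"fmt"
	"net/http"
)

// scannerUserAgent is the value floated in this thread; the final string is undecided.
const scannerUserAgent = "hsts-preload-bot"

// newScanRequest is a hypothetical helper, not part of hstspreload. It builds a GET
// request whose User-Agent header replaces Go's default "Go-http-client/..." value.
func newScanRequest(url string) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("User-Agent", scannerUserAgent)
	return req, nil
}

func main() {
	req, err := newScanRequest("https://example.com/")
	if err != nil {
		fmt.Println("building request failed:", err)
		return
	}
	// Only inspects the request; nothing is sent over the network.
	fmt.Println(req.Header.Get("User-Agent"))
}
```

A request built this way would then be passed to an `http.Client` as usual; the header travels with the request, so no client-wide configuration is needed.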
I think now's a good time! This has turned into something people ask for more regularly. We could still discourage UA detection by scanning using the default user agent first, and e.g. redoing the whole scan using the custom UA if there is a relevant failure.
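The "default first, custom UA on failure" idea could look roughly like the sketch below. Everything here is an assumption for illustration: `fetchFunc` stands in for one full scan, an empty UA means "use the library default", and "relevant failure" is simplified to "error or no header observed".

```go
package main

import (
	"errors"
	"fmt"
)

// fetchFunc is a hypothetical abstraction over one scan: it fetches the site with the
// given User-Agent ("" meaning the library default) and returns the
// Strict-Transport-Security header value it observed.
type fetchFunc func(userAgent string) (string, error)

// scanWithFallback tries the default UA first and repeats the scan with customUA
// only if the first attempt fails or observes no header. usedUA reports which UA
// produced the final result.
func scanWithFallback(fetch fetchFunc, customUA string) (header, usedUA string, err error) {
	header, err = fetch("")
	if err == nil && header != "" {
		return header, "", nil
	}
	header, err = fetch(customUA)
	if err != nil {
		return "", customUA, err
	}
	if header == "" {
		return "", customUA, errors.New("no Strict-Transport-Security header observed")
	}
	return header, customUA, nil
}

func main() {
	// Fake fetcher standing in for a CDN that blocks the default UA.
	blocked := func(ua string) (string, error) {
		if ua == "" {
			return "", errors.New("blocked by CDN bot detection")
		}
		return "max-age=63072000; includeSubDomains; preload", nil
	}
	header, usedUA, err := scanWithFallback(blocked, "hsts-preload-bot")
	fmt.Printf("header=%q usedUA=%q err=%v\n", header, usedUA, err)
}
```

Returning the UA that succeeded lets the caller warn the site operator when the header was only visible to the custom UA.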
If we want to change the UA string from the default golang string, we could also consider scanning using a few common UA strings from browsers. This way the behavior observed by the probe would more closely match what browsers would see in the real world.
Sounds like a good idea! Maybe also include things like
From my perspective, scanning multiple times with multiple UAs feels a little silly. The number one reason to scan at all is to ensure that the site has authorized preloading. As long as that works with any UA, I'm comfortable saying that we're authorized. Conversely, scanning n times adds a bunch of additional complexity. Besides the obvious code complexity, there's also more legwork for maintainers when the emails with harder-to-debug failures start rolling in. That's not to say that I'm 100% opposed to this, but I'm not totally clear on what problem scanning with multiple UAs would actually solve.
@jdeblasio hstspreload.org has always scanned for more than the super-basic requirements, and issued errors or warnings for practices that could leave users unprotected. If a site is dynamically calculating whether to send an HSTS header, then users with a client that doesn't have preloaded HSTS are more likely to be unprotected because they're not getting dynamic HSTS. Also, dynamic HSTS configuration means that the header may change or get dropped by accident. We had to specifically add a guard for the removal criteria because this was happening too often, and I think it would be good to encourage sending the header as unconditionally as possible.
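To make the "unconditional header" concern concrete, here is a small sketch of what a multi-UA consistency check might compare. It is purely illustrative: `inconsistentUAs` is a hypothetical function, the UA labels are placeholders rather than real browser strings, and header values are compared as exact strings.

```go
package main

import (
	"fmt"
	"sort"
)

// inconsistentUAs takes the Strict-Transport-Security value observed for each
// User-Agent (an empty string meaning no header was seen) and returns, in sorted
// order, the UAs whose value differs from the one observed for baselineUA.
func inconsistentUAs(observed map[string]string, baselineUA string) []string {
	want := observed[baselineUA]
	var differing []string
	for ua, got := range observed {
		if got != want {
			differing = append(differing, ua)
		}
	}
	sort.Strings(differing)
	return differing
}

func main() {
	// Placeholder labels, not real UA strings.
	observed := map[string]string{
		"default-go-ua":   "max-age=63072000; includeSubDomains; preload",
		"browser-like-ua": "max-age=63072000; includeSubDomains; preload",
		"other-client-ua": "", // header dropped for this client
	}
	fmt.Println(inconsistentUAs(observed, "default-go-ua"))
}
```

A non-empty result would indicate that the site computes the header dynamically, which is the situation this comment argues a scan should warn about.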
I'm assuming (perhaps incorrectly) that CDNs are adding the STS header based on a configuration option. Could you work with CDNs so that the HTTP response they send in spam/fraud cases still includes the STS header, i.e. apply the STS header before the spam/fraud check? (This is also assuming that the CDN's response to such a request is an HTTP response vs closing a connection or similar.) That would be more in line with the philosophy that an STS header should be set unconditionally on a domain.
I think I'd like to argue that we should spin off the "fetch multiple times with multiple UAs" idea into a separate feature request. I 100% agree that scanning for more than super-basic requirements is great, and that this could help solve a real issue that occurs in some cases. There are just also some additional risks. One thing I'm worried about, for instance, is that we'll run into CDNs who aren't enthusiastic about allowlisting fetches that look like they're from a bot but are using a browser-like UA string. If we encounter that, then we've obligated ourselves to either bake in ways to account for those CDNs (more complexity), or remove the check (wasted effort). Separate from that improvement is the present buggy reality that some folks behind CDNs can't preload their domains without manual intervention because those requests are getting blocked. The former is a cool nice-to-have. The latter needs addressing pretty urgently.
Could I ask what makes it urgent? I think it's worth looking at solutions, but we've successfully asked sites to handle this on their end for over half a decade. Do we know what CDNs are causing most of the issues? Is it e.g. mostly Cloudflare? We could consider asking them if they would apply the domain's HSTS setting to their interception page.
In any case, I offer this strawperson:
There's another reality here: we don't have a ton of cycles for HSTS preload stuff right now. We (the Chromium-based maintainers) are definitely committed to supporting the list for as long as it's valuable, and we might be able to give it more cycles in the future, but presently we're looking to get the most value per (very little) time spent. Setting a single UA header is a trivial change that meets the present need. We'd also be delighted to receive PRs for more comprehensive solutions.
From my perspective, setting an hstspreload-specific UA string gets us an immediate win with virtually no downside, and whether sites selectively serve headers based on UA string is a bit of an orthogonal issue to what hstspreload uses. We can consider fancier approaches later if we can articulate benefits that are worth the implementation effort. Site operators are already responsible for the consequences of "bad" HSTS behavior like ignoring the deployment recommendations when submitting their domain for preloading, regardless of whether they selectively serve headers based on UA. The immediate need we have now is for hstspreload.org and our bulk update infrastructure to be identifiable so its header checks can be unblocked at the CDN level. We've so far identified 2 CDNs (including Cloudflare) that are known to be blocking requests, and after discussing it with them, the established way to circumvent this for bots is to allowlist based on ASN and UA string. If there are no objections to this immediate path forward, I suggest we:
The user agent for hstspreload requests is generic
user-agent: Go-http-client/2.0
Can this be set to something specific to identify the bot? This would enable server admins to whitelist the bot if necessary and distinguish it from any other bot using the Go http library.
Can I suggest
hstspreload client/2.0
?