sum.golang.org: set a useful User-Agent string #44468
Comments
That is an example program that checks a go.sum file against a checksum database such as sum.golang.org, not against the git host. #35699 has prior discussion about adding a User-Agent string to go commands.
I ultimately blocked these IPs because they appeared to be wantonly crawling the server. This is poor behavior for a crawler. It caused service issues for Go users as a result. sum.golang.org is being a poor citizen of the web. Another aside: it really frustrates me that all discussions are locked after they're "decided". This isn't the first time that I've had a new perspective or new information to contribute and couldn't because the decision was already made.
A solution which would respect the privacy of the user is adding a GOUSERAGENT environment variable or something similar, then setting it appropriately on the proxy servers and to a generic value for end users.
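A minimal sketch of what that proposal could look like, for illustration only. GOUSERAGENT is the variable suggested in the comment above, not an existing go command setting, and the `userAgent` helper and `defaultUserAgent` value are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"os"
)

// defaultUserAgent is a hypothetical generic value for end users. It carries
// no version or host details, so it reveals nothing about the caller.
const defaultUserAgent = "Go-http-client"

// userAgent returns the value of the proposed (not existing) GOUSERAGENT
// variable, falling back to the generic default when it is unset or empty.
// The lookup function is a parameter so the logic can be exercised without
// touching the real process environment.
func userAgent(getenv func(string) string) string {
	if ua := getenv("GOUSERAGENT"); ua != "" {
		return ua
	}
	return defaultUserAgent
}

func main() {
	// A proxy operator would export GOUSERAGENT; end users would leave it unset.
	fmt.Println(userAgent(os.Getenv))
}
```

Under this sketch, an operator of a proxy would export GOUSERAGENT with a string that identifies the service, while an ordinary user who sets nothing would send only the generic value.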
If
By the way, blocking all the traffic from sum.golang.org or proxy.golang.org will prevent all Go users of your package from getting the sum data.
I suspect the traffic to refresh the
cc @katiehockman @heschik @bcmills @jayconrod @matloob @rsc
This issue isn't locked or closed. We appreciate your input and new information.
Right. And we would not have blocked it if we had any idea what the clients were, i.e. if they set their User-Agent properly. As far as we could tell, it was just some skiddie on GCP running a scraper written in Go. The aggressiveness is fine (though you should obey robots.txt!), given the utility - supporting the Go ecosystem - but without knowing what it's being used for, we have no context and have to consider it a violation of our terms of service, which prohibit scraping outside of a few specific purposes.
I'll update this if I see it again later, but I didn't save the logs and we get a lot of traffic, and because it's not easily distinguished from any other kind of traffic, it's hard to find the activity again.
Setting a user agent string for the go command seems reasonable, as long as it doesn't contain the version: we want to avoid different content being served to different versions of the go command.
Note however that the fetch service backing
I'm not at all sure where the scraping is coming from though. The go command doesn't scrape. If you share logs we can look into whether there's a bug here.
@jayconrod I am guessing that's part of the traffic for resolving import paths: https://golang.org/cmd/go/#hdr-Remote_import_paths
Could be... those requests will all have the query string
Here's an example:
These IPs appear to come from Google. At the moment I'm getting several requests per second of this shape from Google IP blocks.
This crawling is actually starting to get out of hand. Is there someone on the infrastructure team I can escalate to?
@ddevault Thanks for sharing the example. They look like requests triggered by git remote calls from a
If packages and modules hosted on your site are actively used by Go users, or the site hosts many packages, the aggregate volume of traffic originating from us may be significant. If it is causing an issue in your service, can you please file a separate issue with specific details of the requests you are seeing and what problem the traffic has caused? The problem doesn't seem to be about a missing User-Agent string anymore.
I wonder if the latest Go release, with its changes to modules, is causing a larger burden on hosting services. In any case, I may follow up in a second ticket later on, but for infrastructure issues I would prefer to be contacted directly by the sysadmins responsible: sir@cmpwn.com
@ddevault Can you please file a separate issue to discuss this? Other code hosting service owners may be interested in the topic and we'd like to keep our conversation in public.
See #44577
The change that sets
@ddevault can you please verify git http requests coming from proxy.golang.org now have the User-Agent string? Thank you!
I can verify that I'm receiving requests with that User-Agent now. Thanks!
I was looking through some suspicious traffic on my git hosting service and it took me a while to understand that it was coming from sum.golang.org.
The code here:
https://github.com/golang/mod/blob/master/gosumcheck/main.go#L187
Should be updated to set a meaningful User-Agent so that admins like me are less confused when reading our access logs.
Aside: the page at sum.golang.org should include a link to the source code; it was not easy to find.
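To illustrate the request above, here is a rough sketch of one common way a Go HTTP client can attach an identifying User-Agent: wrapping the transport. This is not the gosumcheck code and not the change that was eventually made. The `userAgentTransport` type, the `newClient` and `observedUserAgent` helpers, and the example agent string are all hypothetical, and the agent string deliberately omits any version.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// userAgentTransport is a hypothetical wrapper that stamps every outgoing
// request with a fixed User-Agent before handing it to the base transport.
type userAgentTransport struct {
	agent string
	base  http.RoundTripper
}

func (t userAgentTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// A RoundTripper must not modify the caller's request, so work on a clone.
	r := req.Clone(req.Context())
	r.Header.Set("User-Agent", t.agent)
	return t.base.RoundTrip(r)
}

// newClient returns a client whose requests all carry the given agent string.
func newClient(agent string) *http.Client {
	return &http.Client{
		Transport: userAgentTransport{agent: agent, base: http.DefaultTransport},
	}
}

// observedUserAgent starts a throwaway local server, sends one request with
// the client, and reports the User-Agent the server saw, which is what would
// end up in an admin's access log.
func observedUserAgent(client *http.Client) (string, error) {
	seen := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.UserAgent()
	}))
	defer srv.Close()

	resp, err := client.Get(srv.URL)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	// The agent string is a placeholder; a real service would name itself
	// and link to a page describing its traffic.
	ua, err := observedUserAgent(newClient("example-sumdb-checker (+https://example.org/about)"))
	if err != nil {
		panic(err)
	}
	fmt.Println(ua)
}
```

Wrapping the transport keeps the identification in one place, so every request made through the client is labeled without each call site having to set the header.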