Scaling registry updates #2452
Comments
Thanks for the issue @SimonSapin! I've been thinking about this as well after seeing that post. I believe your TL;DR is correct in that we're fine here. We've already implemented almost all of the mitigation strategies pointed out by the GitHub staff, namely:
Using a special API to detect whether the repository has no new commits seems useful regardless, though, since it's faster. That's probably best discussed in a separate issue (as you've done)! I'm going to close this for now, as there's not really anything for us to do. We're already employing basically all of the mitigation strategies outlined in that thread, and we have other mitigation strategies in place in case operations on the registry become a pain in the future.
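A minimal sketch of such a check, assuming GitHub's REST "get a commit" endpoint with the `application/vnd.github.sha` media type plus a conditional `If-None-Match` header carrying the SHA we already have. The assumption (not stated in this thread) is that GitHub answers `304 Not Modified` when the branch tip is unchanged, so no fetch is needed. `build_up_to_date_request` and `index_has_new_commits` are hypothetical helpers for illustration only.

```python
import urllib.error
import urllib.request


def build_up_to_date_request(owner: str, repo: str, branch: str,
                             known_sha: str) -> urllib.request.Request:
    """Build a conditional request for the tip commit of `branch` (hypothetical helper)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/commits/{branch}"
    return urllib.request.Request(
        url,
        headers={
            # Assumed media type: ask for just the commit SHA, not the full JSON body.
            "Accept": "application/vnd.github.sha",
            # Conditional request: assumed to yield 304 if the tip still equals known_sha.
            "If-None-Match": f'"{known_sha}"',
        },
    )


def index_has_new_commits(owner: str, repo: str, branch: str, known_sha: str) -> bool:
    """Return True if the branch tip differs from known_sha (performs a network call)."""
    req = build_up_to_date_request(owner, repo, branch, known_sha)
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            # A 200 response carries the current tip SHA as plain text.
            return resp.read().decode().strip() != known_sha
    except urllib.error.HTTPError as err:
        # urllib raises on 304; treat it as "nothing new, skip the git fetch".
        if err.code == 304:
            return False
        raise
```

Only when `index_has_new_commits("rust-lang", "crates.io-index", "master", last_sha)` returns `True` would a client go on to do a full git fetch.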
You may want to make sure that Cargo doesn’t freak out if you do that. Also, it seems like removing the git history would mean losing some data. For example, in #2326 (comment) I relied on commit dates; I couldn’t have done that analysis without git history. Does the PostgreSQL database behind crates.io have more data than what’s in the index? Could that data be made more readily available?
Yes, support for rolling the history into one commit has been in Cargo since day 1. And yes, it would break scripts that rely on git history, but that's not really something we can work around. And no, I don't think the crates.io database can be used to rebuild the index.
I’m getting off-topic here, but I mentioned the database not to rebuild the index but to make it available, so that anyone could do all kinds of unforeseen analysis like "which crates/versions were uploaded in this date range and might be in GNU tarball format" (#2326 (comment)) or "make a distribution graph of crates by download count".
Yes, the database should contain enough information to do something like that. To make it more accessible, we'd likely want to just enhance the JSON API.
@alexcrichton Hopefully not too many people decide to start their project names with "rust" :). Specifically, as of 8551e70, /ru/st/ contains 118 entries (and /go/og/ contains 113).
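Counts like these can be reproduced from a local checkout of the index by walking the tree and tallying files per two-level directory. A minimal sketch, assuming names of four or more characters live under `<first two>/<next two>/` while shorter names live under `1/`, `2/` and `3/<char>/`; `bucket_counts` and the checkout path are hypothetical.

```python
import os
from collections import Counter


def bucket_counts(index_root: str) -> Counter:
    """Count crate files per two-level directory such as "ru/st" or "go/og"."""
    counts: Counter = Counter()
    for dirpath, dirnames, filenames in os.walk(index_root):
        # Don't descend into hidden directories such as .git.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        parts = os.path.relpath(dirpath, index_root).split(os.sep)
        # Only the leaf directories for 4+ character names have two 2-char components;
        # "2", "3/s" and the root itself are skipped.
        if len(parts) == 2 and all(len(p) == 2 for p in parts):
            counts["/".join(parts)] += len(filenames)
    return counts
```

Calling `bucket_counts("crates.io-index").most_common(2)` on a checkout would list the most crowded directories.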
TL;DR: This is a problem we don’t have yet. I mostly want to record some information in case we do in the long term.
This comment: CocoaPods/CocoaPods#4989 (comment) explains how the CocoaPods/Specs repository gets so much traffic that GitHub rate-limits it severely, causing fetches to take a very long time or fail.
This sounds exactly like rust-lang/crates.io-index.
Rate-limiting from GitHub has not been a problem for us as far as I know, but there may be some precautions we can take to avoid it.
I think we’re OK here since Cargo uses libgit2 which does not support shallow clones anyway.
Here as well we’re doing pretty well, since rust-lang/crates.io-index already has two levels of directory nesting, each named (roughly) after two characters from the start of the crate’s name. 26^4 is 456,976; npm has 249,825 packages right now.
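For reference, here is a sketch of how a crate name maps to its file in the index, following the layout the Cargo book documents today: 1- and 2-character names under `1/` and `2/`, 3-character names under `3/<first char>/`, everything else under two 2-character levels, all lowercased. The details beyond "two levels of two characters" come from current Cargo documentation rather than this thread, and `index_path` is a hypothetical helper.

```python
def index_path(crate_name: str) -> str:
    """Return the relative path of a crate's file in a crates.io-style index."""
    name = crate_name.lower()
    n = len(name)
    if n == 0:
        raise ValueError("crate name must not be empty")
    if n <= 2:
        return f"{n}/{name}"              # e.g. "2/cc"
    if n == 3:
        return f"3/{name[0]}/{name}"      # e.g. "3/s/syn"
    return f"{name[0:2]}/{name[2:4]}/{name}"  # e.g. "se/rd/serde"


# Leaf directories for 4+ character names if only the letters a-z were used.
LEAF_DIRS = 26 ** 4  # 456,976

# Average packages per leaf directory for npm's count, if spread uniformly.
npm_per_dir = 249_825 / LEAF_DIRS  # about 0.55
```

The uniform average is optimistic: real names also use digits, `-` and `_`, but they cluster around common prefixes, which is exactly why a directory like `ru/st/` fills up faster than the rest.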
Another comment CocoaPods/CocoaPods#4989 (comment) suggests:
This sounds beneficial even if we don’t hit rate-limiting. I’ve filed #2451 separately.