Increase download buffer size and improve tarball import logging #11171
Conversation
We are piping curl downloads into `unpackTarfileToSink()`, but the latter is typically slower than the former if you're on a fast connection. So the download could appear unnecessarily slow. (There is even a risk that if the Git import is *really* slow for whatever reason, the TCP connection could time out.) So let's make the download buffer bigger by default - 64 MiB is big enough for the Nixpkgs tarball. Perhaps in the future, we could have an unlimited buffer that spills data to disk beyond a certain threshold, but that's probably overkill.
This happens in parallel with the download (which starts later), so you only see this message when the download has finished but the import hasn't.
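For illustration, here is a minimal sketch of the producer/consumer pattern described above. The names (DownloadBuffer, push, drain) and the standalone structure are hypothetical, not Nix's actual FileTransfer code; the sketch just shows why the download thread has to sleep once the buffer reaches capacity, and how the consumer can take the accumulated data cheaply.

#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <string_view>

// Hypothetical sketch, not the actual Nix implementation.
struct DownloadBuffer {
    std::mutex mutex;
    std::condition_variable notFull, notEmpty;
    std::string data;
    size_t capacity = 64 * 1024 * 1024; // 64 MiB, the new default in this PR

    // Called by the download (curl) thread for each received chunk.
    void push(std::string_view chunk) {
        std::unique_lock lock(mutex);
        while (data.size() + chunk.size() > capacity)
            // Buffer is full: block the download until the consumer catches
            // up, with a bounded wait as a backstop (cf. the 10 s below).
            notFull.wait_for(lock, std::chrono::seconds(10));
        data.append(chunk);
        notEmpty.notify_one();
    }

    // Called by the consumer (e.g. the tarball importer): takes everything
    // accumulated so far by moving the string out, which copies no bytes.
    std::string drain() {
        std::unique_lock lock(mutex);
        notEmpty.wait(lock, [&] { return !data.empty(); });
        std::string chunk = std::move(data);
        data.clear();
        notFull.notify_one();
        return chunk;
    }
};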
Maybe related to / fixes these issues:
Instead of the buffer being at capacity, could the cause be an ever-increasing chunk size of the data transferred to the consuming thread?
That way, consumption events become spaced further and further apart in time, creating a spiking download rate at best and dropped connections at worst. A larger buffer would only let those intervals of blocking on consumer processing grow longer, because a larger buffer fits larger chunks.
debug("download buffer is full; going to sleep"); | ||
static bool haveWarned = false; | ||
warnOnce(haveWarned, "download buffer is full; consider increasing the 'download-buffer-size' setting"); | ||
state.wait_for(state->request, std::chrono::seconds(10)); |
10 seconds seems a little excessive, or is this a placeholder for infinity because we're getting interrupted by a signal?
Usually the download thread will be woken up long before the 10-second limit by the FileTransfer::download main loop.
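A small self-contained demonstration of that point (assumed setup, not Nix code): a condition_variable::wait_for with a 10-second timeout returns as soon as another thread notifies it, so the constant is only a backstop, not an actual sleep duration.

#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

int main() {
    std::mutex m;
    std::condition_variable cv;

    // Stand-in for the FileTransfer::download main loop waking the sleeper.
    std::thread waker([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        cv.notify_all();
    });

    auto start = std::chrono::steady_clock::now();
    {
        std::unique_lock lock(m);
        cv.wait_for(lock, std::chrono::seconds(10)); // nominal 10 s limit
    }
    auto waited = std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
    std::cout << "woken after " << waited.count() << " ms\n"; // ~50 ms, not 10 s
    waker.join();
}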
debug("download buffer is full; going to sleep"); | ||
static bool haveWarned = false; | ||
warnOnce(haveWarned, "download buffer is full; consider increasing the 'download-buffer-size' setting"); | ||
state.wait_for(state->request, std::chrono::seconds(10)); | ||
} | ||
|
An std::string is a contiguous region of memory, so resizing it to fit an ever larger amount of data could cause O(n²) copying to new regions.
Pretty sure it's O(n lg n) amortized.
Right, I guess it isn't quite that, but it's a lot of unnecessary copying nonetheless.
Forgot about the increments being proportional.
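For reference, with proportional (geometric) growth the total bytes copied across all reallocations is O(n), i.e. amortized O(1) per appended byte. A quick experiment makes the growth pattern visible (the exact capacities depend on the standard library implementation):

#include <iostream>
#include <string>

int main() {
    std::string s;
    size_t lastCapacity = s.capacity();
    for (size_t i = 0; i < (1u << 20); ++i) {
        s.push_back('x');
        if (s.capacity() != lastCapacity) {
            // Each reallocation grows capacity by a constant factor, so the
            // number of reallocations for n bytes is only O(log n).
            std::cout << "size " << s.size()
                      << " -> capacity " << s.capacity() << '\n';
            lastCapacity = s.capacity();
        }
    }
}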
No, it's the buffer being full, as you can see when running it with debug output enabled. Not sure how the chunk size increases? Also, the client thread moves the current contents of the buffer, so that's O(1).
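To illustrate that last point, a sketch (illustrative, not the PR's code) of why taking the buffer contents by move is O(1): the consumer steals the heap allocation instead of copying it.

#include <cassert>
#include <string>
#include <utility>

int main() {
    std::string buffer(64 * 1024 * 1024, 'x'); // pretend: 64 MiB of downloaded data
    const char * before = buffer.data();

    std::string chunk = std::move(buffer); // O(1): steals the allocation
    // On mainstream implementations a moved large string keeps its heap
    // block, so no bytes were copied.
    assert(chunk.data() == before);
    buffer.clear(); // moved-from string is reset, ready for the next chunks
}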
Ohh, you're trying to download faster for no other reason than the progress bar?
It increases because more data gets appended to the buffer.
It's mostly cosmetic, but it's possible that if the download thread "starves", TCP connections start dropping or get throttled.
Successfully created backport PR for
Successfully created backport PR for
Successfully created backport PR for
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/2024-07-29-nix-team-meeting-minutes-165/49970/1
Motivation
Downloads of tarball and github flakes could appear quite slow. This was because we are piping curl downloads into unpackTarfileToSink(), but the latter is typically slower than the former if you're on a fast connection. This would cause the curl download to be put to sleep. (There is even a risk that if the Git import is really slow for whatever reason, the TCP connection could time out.)
So let's make the download buffer bigger by default - 64 MiB is big enough for the Nixpkgs tarball. Perhaps in the future, we could have an unlimited buffer that spills data to disk beyond a certain threshold, but that's probably overkill.
This also adds a warning if the download buffer is full, and shows when we are unpacking an archive into the Git cache.
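If you do hit the new warning, the setting named in it can be raised in nix.conf; a hypothetical example, assuming the setting takes a size in bytes like other Nix size settings:

# nix.conf -- raise the limit if downloads still stall
# (assumption: value is a byte count; 268435456 = 256 MiB)
download-buffer-size = 268435456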