
Increase download buffer size and improve tarball import logging #11171

Merged
4 commits merged into NixOS:master on Jul 29, 2024

Conversation

edolstra (Member)

Motivation

Downloads of tarball and GitHub flakes could appear quite slow. This is because we pipe curl downloads into `unpackTarfileToSink()`, but the latter is typically slower than the former if you're on a fast connection, so the curl download gets put to sleep. (There is even a risk that if the Git import is really slow for whatever reason, the TCP connection could time out.)

So let's make the download buffer bigger by default: 64 MiB is big enough for the Nixpkgs tarball. Perhaps in the future we could have an unlimited buffer that spills data to disk beyond a certain threshold, but that's probably overkill.

This also adds a warning if the download buffer is full, and shows when we are unpacking an archive into the Git cache.
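
If you do hit that new warning, the setting it mentions can be raised in `nix.conf`. A minimal sketch (illustrative only; the value is given in bytes, and 64 MiB is the new default set by this PR):

```
# nix.conf (illustrative): raise the in-memory download buffer
# from the 64 MiB default to 256 MiB. The value is in bytes.
download-buffer-size = 268435456
```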


The unpacking into the Git cache happens in parallel with the download (it starts later), so you only see that message when the download has finished but the import hasn't.
@github-actions bot added the label `fetching` (Networking with the outside (non-Nix) world, input locking) on Jul 24, 2024
@edolstra added the label `backport 2.23-maintenance` (Automatically creates a PR against the branch) on Jul 24, 2024
@cole-h (Member) commented Jul 24, 2024

@edolstra added the labels `backport 2.21-maintenance` and `backport 2.22-maintenance` (Automatically creates a PR against the branch) on Jul 24, 2024
@roberth (Member) left a comment


Instead of the buffer being at capacity, could the cause be an ever-increasing chunk size of the data transferred to the consuming thread?
That way the intervals between consumptions grow longer and longer, creating a spiking download rate at best and dropped connections at worst. A larger buffer would only allow those intervals of blocking on consumer processing to grow further, because a larger buffer fits larger chunks.

debug("download buffer is full; going to sleep");
static bool haveWarned = false;
warnOnce(haveWarned, "download buffer is full; consider increasing the 'download-buffer-size' setting");
state.wait_for(state->request, std::chrono::seconds(10));
Member


10 seconds seems a little excessive, or is this a placeholder for infinity because we're getting interrupted by a signal?

Member Author


Usually the download thread will be woken up long before the 10 second limit by the FileTransfer::download main loop.
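
To illustrate the point, a generic sketch of the standard wait/notify pattern (not the Nix code itself, which goes through its own synchronisation wrapper as seen in the snippet above): the 10-second timeout is only a backstop, because a notify from the other side wakes the waiter immediately.

```cpp
// Generic illustration (not Nix code): wait_for's timeout is a backstop;
// a notify_all() from the other thread wakes the waiter right away.
#include <chrono>
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <thread>

std::mutex m;
std::condition_variable cv;
bool bufferHasRoom = false;

void downloadThread()
{
    std::unique_lock<std::mutex> lock(m);
    // Sleeps at most 10 seconds, but normally returns as soon as the
    // consumer signals that it has drained the buffer.
    cv.wait_for(lock, std::chrono::seconds(10), [] { return bufferHasRoom; });
    std::puts("woken up, resuming the transfer");
}

int main()
{
    std::thread t(downloadThread);
    {
        std::lock_guard<std::mutex> lock(m);
        bufferHasRoom = true;   // the consumer made room
    }
    cv.notify_all();            // wakes the waiter well before the timeout
    t.join();
}
```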

Member


An std::string is a contiguous region of memory, so resizing it to fit an ever larger amount of data could cause O(n²) copying to new regions.

@edolstra (Member Author) Jul 25, 2024


Pretty sure it's O(n lg n) amortized.

Member


Right, I guess it isn't quite that, but it's still a lot of unnecessary copying.
I forgot that the increments are proportional.
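
For anyone curious, the growth pattern is easy to observe with a standalone snippet (not part of this PR): typical implementations grow the capacity geometrically, so the total bytes copied across all reallocations stay proportional to the final size rather than quadratic.

```cpp
// Standalone illustration (not from this PR): print how std::string's
// capacity jumps as data is appended. The geometric jumps are what keep
// total copying roughly linear in the final size rather than O(n^2).
#include <cstdio>
#include <string>

int main()
{
    std::string buf;
    std::size_t lastCapacity = buf.capacity();
    // Append 64 MiB in 64-byte chunks, reporting each reallocation.
    for (std::size_t i = 0; i < (64 * 1024 * 1024) / 64; ++i) {
        buf.append(64, 'x');
        if (buf.capacity() != lastCapacity) {
            lastCapacity = buf.capacity();
            std::printf("size = %zu, capacity -> %zu\n", buf.size(), lastCapacity);
        }
    }
}
```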

@edolstra (Member Author)

> Instead of the buffer being at capacity, could the cause be an ever-increasing chunk size of the data transferred to the consuming thread?

No, it's the buffer being full, as you can see by running it with -vvvv.

Not sure how the chunk size increases? Also, the client thread moves the current contents of the buffer, so that's O(1).
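
Roughly the hand-off being described, as a self-contained sketch (invented names and structure, not the actual FileTransfer code): the download side appends chunks to a shared string and sleeps once a size cap is reached, while the consumer moves the whole accumulated buffer out in O(1) and processes it outside the lock.

```cpp
// Simplified sketch of the producer/consumer hand-off under discussion;
// names and structure are invented, not the actual FileTransfer code.
#include <condition_variable>
#include <cstdio>
#include <functional>
#include <mutex>
#include <string>
#include <thread>

struct Buffer {
    std::mutex m;
    std::condition_variable roomAvailable, dataAvailable;
    std::string data;
    bool done = false;
    std::size_t maxSize = 64 * 1024 * 1024; // cap, analogous to download-buffer-size
};

// The download side: appends chunks, sleeps while the buffer is at capacity.
void producer(Buffer & buf)
{
    for (int i = 0; i < 1024; ++i) {
        std::unique_lock<std::mutex> lock(buf.m);
        buf.roomAvailable.wait(lock, [&] { return buf.data.size() < buf.maxSize; });
        buf.data.append(64 * 1024, 'x');    // the next 64 KiB chunk arrives
        buf.dataAvailable.notify_one();
    }
    std::lock_guard<std::mutex> lock(buf.m);
    buf.done = true;
    buf.dataAvailable.notify_one();
}

// The consuming side (think unpackTarfileToSink()): moves the *whole*
// accumulated buffer out, which is O(1) no matter how much piled up,
// then does the slow work outside the lock.
void consumer(Buffer & buf)
{
    while (true) {
        std::string chunk;
        {
            std::unique_lock<std::mutex> lock(buf.m);
            buf.dataAvailable.wait(lock, [&] { return !buf.data.empty() || buf.done; });
            if (buf.data.empty() && buf.done) return;
            chunk = std::move(buf.data);
            buf.data.clear();               // leave a valid, empty buffer behind
            buf.roomAvailable.notify_one(); // wake the producer if it went to sleep
        }
        // ... slow processing of `chunk` happens here, outside the lock ...
    }
}

int main()
{
    Buffer buf;
    std::thread p(producer, std::ref(buf));
    std::thread c(consumer, std::ref(buf));
    p.join();
    c.join();
    std::puts("transferred and consumed 64 MiB");
}
```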

@roberth (Member) commented Jul 25, 2024

> No, it's the buffer being full, as you can see by running it with -vvvv.

Ohh, you're trying to download faster for no other reason than the progress bar?
I thought this was about reliability, because I couldn't fathom performance theater as a goal. (No disrespect to theater.)
My comments are about downloading at a steady pace, to make the backpressure work without massive waits that could make it unreliable.
It's also jittery for other network users if you do manage to max out the internet connection during this burst. (One big burst after the limit increase.)

> Not sure how the chunk size increases?

It increases because more data gets appended to `data` while the download thread is doing a burst and the consumer thread is slowly eating away at the previous chunk of `data` that was indeed moved out.
This causes the consumer thread to work on an even larger piece of data the next time, causing more delay in the consumer, causing the download thread's `data` to grow larger again, and so on.

@edolstra (Member Author)

> Ohh, you're trying to download faster for no other reason than the progress bar?

It's mostly cosmetic, but it's possible that if the download thread "starves", TCP connections start dropping or get throttled.

@roberth merged commit 6e3bba5 into NixOS:master on Jul 29, 2024
15 checks passed

Successfully created backport PR for 2.21-maintenance:


Successfully created backport PR for 2.22-maintenance:


Successfully created backport PR for 2.23-maintenance:

@nixos-discourse

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2024-07-29-nix-team-meeting-minutes-165/49970/1
