nix copy to ssh is sometimes very slow #2246

Open
johanot opened this issue Jun 22, 2018 · 6 comments
johanot commented Jun 22, 2018

I'm using nix copy to copy the closure and paths from my local machine to my servers.
nix copy to remote host is sometimes very slow.

It doesn't happen every time I deploy, but it usually happens when I copy large paths to the remote. Say I copy 2 GB of data: the upload usually reaches about 1.99 GB within a few seconds (plenty of bandwidth), then hangs for up to 5 whole minutes before completing the last couple of MB.

I tried strace'ing the nix-daemon threads on my local machine (which were sleeping or waiting for mutexes), and subsequently strace'd the SSH-subthread, which was hanging in: select(8, [3 5], [], NULL, NULL) = 1 (in [3]).

Looks like an SQLite statement running slow on the remote host?

My network bandwidth is fine, my local host is idle, and there is no significant load on the remote host. No swap, plenty of free memory.

I don't know how to debug this further.
Please tell me what additional info you need. I'll be happy to help.

Local machine: Ubuntu 17 with Nix 2.0.4
Remote host: NixOS 18.03

johanot (Author) commented Jun 25, 2018

Possibly a duplicate of #1988.

lheckemann (Member) commented

Since 2825e05 claims it might fix #1988, maybe give it another shot with current master (or with nixUnstable the next time it's updated in nixpkgs).

knedlsepp (Member) commented

I still experience huge differences in speed between scp and the SSH copies done by nix-build, even on the current 2.3.1.

stale bot commented Feb 16, 2021

I marked this as stale due to inactivity.

stale bot added the stale label Feb 16, 2021

stale bot commented Apr 29, 2022

I closed this issue due to inactivity.

stale bot closed this as completed Apr 29, 2022
thufschmitt reopened this Feb 24, 2023
eliasp (Member) commented Jun 11, 2024

I often have massive performance issues when copying paths to remote hosts. I first noticed this when I started using deploy to manage my remote hosts.

While performance is just fine in ~95% of cases, the slowdown seems to happen especially with /nix/store/*-source paths, possibly due to many small files?
For those "copying path …" operations I get something like 4-6MB/hour.
Restarting deploy usually causes it to get stuck at the same item.

Aborting deploy, then using nix-copy-closure --to root@targethost /nix/store/foobar... usually copies the path immediately and restarting deploy afterwards makes it progress as expected.
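
To compare the two copy paths side by side, something like the following could be used. This is only a sketch: it assumes both nix and nix-copy-closure are on PATH, that `nix copy --to ssh://...` is a fair stand-in for the slow copy (the thread does not say exactly what deploy invokes), and the function names build_commands and time_command are made up for illustration.

```python
import subprocess
import time


def build_commands(target, store_path):
    """Return argv lists for both copy methods; nothing is executed here."""
    return {
        # Assumption: the slow path is comparable to a plain `nix copy` over SSH.
        "nix copy": ["nix", "copy", "--to", f"ssh://{target}", store_path],
        # The workaround from the comment above.
        "nix-copy-closure": ["nix-copy-closure", "--to", target, store_path],
    }


def time_command(argv, runner=subprocess.run):
    """Run argv and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    result = runner(argv)
    return result.returncode, time.monotonic() - start
```

Calling time_command on each entry of build_commands("root@targethost", path) gives wall-clock times for both methods. Note that whichever command runs second will find the path already present on the target, so each method needs its own path that has not been copied yet for the comparison to mean anything.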

strace hints at thread syncing issues/a race condition:

1877182 futex(0x5625b95f58f8, FUTEX_WAIT_BITSET_PRIVATE, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
1877181 futex(0x5625b95f6330, FUTEX_WAIT_BITSET_PRIVATE, 12, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>
1877185 <... sched_yield resumed>)      = 0
1877185 sched_yield()                   = 0
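
For a longer trace, tallying syscalls per thread makes it easier to see which thread is spinning in sched_yield and which ones are parked in futex waits. A rough sketch, assuming the PID-prefixed line format shown above; the function name summarize is hypothetical and the sample lines are abbreviated from the excerpt:

```python
import re
from collections import Counter

# Thread ID, optional "<... " prefix of a resumed call, then the syscall name.
LINE_RE = re.compile(r"^(\d+)\s+(?:<\.\.\.\s+)?(\w+)")


def summarize(lines):
    """Count occurrences of each (thread_id, syscall) pair in strace output."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


SAMPLE = [
    "1877182 futex(0x5625b95f58f8, FUTEX_WAIT_BITSET_PRIVATE, 0, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>",
    "1877181 futex(0x5625b95f6330, FUTEX_WAIT_BITSET_PRIVATE, 12, NULL, FUTEX_BITSET_MATCH_ANY <unfinished ...>",
    "1877185 <... sched_yield resumed>)      = 0",
    "1877185 sched_yield()                   = 0",
]

for (tid, syscall), n in sorted(summarize(SAMPLE).items()):
    print(tid, syscall, n)
```

A thread with a very high sched_yield count next to others blocked in futex would fit the busy-wait pattern suspected here, though the counts alone do not prove a race.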

I also created backtraces from the corresponding Nix processes that hint at something like this, but I lack the knowledge of Nix internals to make an informed statement about those results. Maybe they give someone else a little more insight and a clue as to what's going wrong here:
https://gist.github.com/eliasp/779ae1b2760541c1727326505019d953
