
ipfs add hangs and stalls with large folders #3885

Closed
entr0p1 opened this issue Apr 29, 2017 · 28 comments
Labels
topic/perf Performance

Comments

@entr0p1

entr0p1 commented Apr 29, 2017

Version information:

go-ipfs version: 0.4.8-
Repo version: 5
System version: amd64/linux
Golang version: go1.8

Type:

Bug

Severity:

High

Description:

When adding files to the repository, it gets to a seemingly arbitrary point (around 8-10GB?) into the transfer, then the progress bar disappears and the job stalls. I usually pipe the output into a text file to get a file and hash list, but I've tried without that as well and still had no luck, I'm afraid.

The directory being added is 24.91GB in size, and contains 101143 objects.

The actual content being stored can be found with an explanation on this page: climate-mirror/datasets#333

The command I'm using is:

ipfs add -r -p -w (FOLDER NAME) > (FOLDER NAME)_filelist.txt

I was originally running golang installed from the Yum repositories, which turned out to be an old version (v1.6.3). I've since removed it and installed straight from their site (running v1.8.1 now), which has significantly improved the overall performance of ipfs add.

# go version
go version go1.8.1 linux/amd64

The machine is a virtual machine on Citrix XenServer running Oracle Linux 7, and the datastore is on an NFS volume directly attached to the VM, mounted via /etc/fstab on boot. The VM has 2 vCPU cores pinned to it and 8GB of memory. I can provide further details on the physical hardware if it helps.

I'm running SELinux, but as far as I can tell it hasn't had any effect on functionality since the start.

I installed IPFS using ipfs-update and am currently on the latest version (v0.4.8).

I've just tried stopping the ipfs daemon and will see if that improves things; so far it's looking like I've gotten further than before. I'll provide an update with the result when it completes.

@Kubuxu
Member

Kubuxu commented Apr 29, 2017

Try enabling sharding with ipfs config --json Experimental.ShardingEnabled true, which allows adding big directories.
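A minimal sketch of that suggestion (the restart step is an assumption, since the daemon reads its config at startup):

# enable the experimental directory sharding feature
ipfs config --json Experimental.ShardingEnabled true
# confirm the value was written
ipfs config Experimental.ShardingEnabled
# then restart the daemon so the setting takes effect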

@Kubuxu
Member

Kubuxu commented Apr 29, 2017

Also see: ipfs/notes#212

@entr0p1
Author

entr0p1 commented Apr 29, 2017

Update: It failed at 99.6%

Thanks for the super fast response @Kubuxu. I'll give that a crack now and let you know the results. Fingers crossed!

@entr0p1
Author

entr0p1 commented Apr 29, 2017

Got all the way through and failed with this error:

ERROR commands/h: unexpected EOF client.go:247

Tried a second time and it made it 1% in before throwing the same error. I've restarted the IPFS daemon and still get the same result.

Stopped the daemon and it seems to be running now, will report back soon.

@entr0p1
Author

entr0p1 commented Apr 29, 2017

It bombed out again, this time saying "Killed". Trying again.

I've noticed that since turning on sharding it really uses the RAM, to the point where it's filling swap as well. Unsure if that's related, but I thought it might be worth mentioning. Will post the results of the second attempt.

@Kubuxu
Member

Kubuxu commented Apr 29, 2017

Interesting observation regarding sharding.

cc @whyrusleeping

@entr0p1
Author

entr0p1 commented Apr 29, 2017

Yeah, no good I'm afraid:

 8.95 GB / 24.91 GB [======================>----------------------------------------]  
Killed

It seems that it fills the RAM, then swap (I noticed other processes on the system crashed), and then finally bombs out itself.

Let me know if you guys need any info from me; I'll do my best to provide logs and such.

@entr0p1
Author

entr0p1 commented Apr 29, 2017

[screenshot: memory and CPU usage graphs, 2017-04-30 01:08]

Here's some visual data on the memory usage and how CPU looks around the time it peaks.
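For reference, a rough way to capture the same numbers from a shell (a minimal sketch; the sampling interval and log file name are arbitrary):

# sample CPU and memory for all ipfs processes every 10 seconds
while true; do
    ps -o %cpu,%mem,rss,cmd -C ipfs --no-headers >> ipfs_usage.log
    sleep 10
done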

@whyrusleeping
Member

Hrm... I'll take a look at this after breakfast...

@whyrusleeping
Member

@dojobel If you don't mind, could you try running this branch: #3888?
I set it to flush the cached directories.

@entr0p1
Author

entr0p1 commented Apr 30, 2017

Not at all. I've had a go (heh), but I must be noobing something up...

# make install
can't load package: package github.com/ipfs/go-ipfs/cmd/ipfs: cannot find package "github.com/ipfs/go-ipfs/cmd/ipfs" in any of:
	/usr/local/go/src/github.com/ipfs/go-ipfs/cmd/ipfs (from $GOROOT)
	/opt/go/src/github.com/ipfs/go-ipfs/cmd/ipfs (from $GOPATH)
bin/check_go_version 1.7
bin/check_go_path /opt/go/src/github.com/go-ipfs
go-ipfs must be built from within your $GOPATH directory.
expected within '/opt/go' but got '/opt/go/src/github.com/go-ipfs'
make: *** [check_go_path] Error 1

I've stashed my original executable with ipfs-update stash and what I've done to get the branch is:

git clone https://github.com/ipfs/go-ipfs.git
cd go-ipfs
git checkout fix/add-mem-growth-hack

I've tried a make install straight in that folder and it told me it needs to run from under $GOPATH. I moved the cloned folder into /opt/go/src/github.com/go-ipfs and ended up receiving the above error. Have I missed something?

@entr0p1
Author

entr0p1 commented Apr 30, 2017

Yep, it was me being a noob indeed - I had to move the folder from "/opt/go/src/github.com/go-ipfs" to "/opt/go/src/github.com/ipfs/go-ipfs".

I've gotten make install to run successfully. I'm running an ipfs add now and will report back when it completes to let you know if that fix worked.
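For anyone hitting the same wall, the full sequence that worked, as a sketch (assumes $GOPATH is /opt/go, as above):

mkdir -p /opt/go/src/github.com/ipfs
# clone directly into the import path the build expects
git clone https://github.com/ipfs/go-ipfs.git /opt/go/src/github.com/ipfs/go-ipfs
cd /opt/go/src/github.com/ipfs/go-ipfs
git checkout fix/add-mem-growth-hack
make install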

@entr0p1
Author

entr0p1 commented Apr 30, 2017

It's stalled at 2.98%, 764.11MB / 24.91GB done.

The IPFS daemon has been using a lot of memory since the import started, and is also using a fair bit of CPU (probably normal).

# ps aux | grep ipfs
root     23647 78.1 77.2 6646036 6107444 ?     Ssl  15:55  48:31 /opt/go/bin/ipfs daemon --enable-gc=true --manage-fdlimit=true
root     32524  1.1  0.7 340280 60736 pts/8    Sl+  16:20   0:25 ipfs add -r -p -w ftp.ncdc.noaa.gov

[screenshot: daemon memory usage climbing during the add]

I've had to restart it as it was on the cusp of using all the RAM and crashing other processes. It's leveled out after the restart:

# ps aux | grep ipfs
root     14158  5.4  1.0 362152 81700 ?        Ssl  16:59   0:06 /opt/go/bin/ipfs daemon --enable-gc=true --manage-fdlimit=true

I'm going to try importing without the daemon running again and will report back.

@entr0p1
Author

entr0p1 commented Apr 30, 2017

[screenshot: memory usage during the add]
Doesn't look much better; it's currently 41.45% (10.33GB / 24.91GB) into the ipfs add. Going to have to cancel again to protect other processes, I'm afraid.

Edit: adding graphs
[screenshot: memory and CPU graphs]

@entr0p1
Author

entr0p1 commented May 3, 2017

Hi guys,

Just a quick update: I managed to get these problem folders to import by turning sharding off again. I'm unsure if the patch you did had anything to do with it, but something has definitely improved. I'll be trying some much larger folders over the coming days/weeks.

@entr0p1
Author

entr0p1 commented May 6, 2017

Okay, so I've managed to import things ranging between 20-40GB comfortably. It is a little slow, but I suspect that's just the nature of how IPFS works (dedupe has never been fast in my experience, but that's not why you use it).

I have had to actually stop the IPFS daemon to get an ipfs add to work, though; I'm unsure if that's due to turning sharding on and off, or if something has been stuffed up in the background. If I have the daemon running, the transfer simply hangs on the CLI and never gets as far as even showing the progress bar.

I'm adding something of about 270GB in size now and will report back if it works with that or bombs out.

@whyrusleeping
Member

@dojobel Thanks for all these updates! For now, when adding very large sets of data, it's advisable to do so without the daemon running, or by using ipfs add --local. The issue is that you will clog up the DHT when adding such huge volumes of objects, and that slows down everything else your node is doing. It also ends up consuming lots of extra memory.
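A sketch of the two workarounds mentioned above (the folder name is a placeholder):

# option 1: stop the daemon entirely, then add offline
ipfs add -r -p -w <folder>

# option 2: keep the daemon up, but run the add locally instead of through it
ipfs add --local -r -p -w <folder>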

Let me know how things go on latest master; there have been several improvements since your last post (May 6th) that should help, as well as some very exciting upcoming changes around providing and datastore performance, which should land in 0.4.12.

@rngkll

rngkll commented Feb 9, 2018

While trying to build an Ubuntu repository with @ElOpio using IPFS, the add fails with the following error:

added QmUbpNwof3P2hNqf9pkNVNtRwrxBcgPxA5LfGgzfzKC2Er mirror/pool/universe/w/widelands/widelands_17-2~ubuntu12.04.1_i386.deb
 830.21 GB / 843.55 GB [==============================================================================================================================>--]  98.42% 2m8s
21:58:17.480 ERROR commands/h: unexpected EOF client.go:247
Error: unexpected EOF

We always get the same error, at a different point in the process each time.

@whyrusleeping
Member

@rngkll what version of ipfs are you using? And are you doing the add with a running daemon? If so, does the daemon die when you get that EOF? It could be that it's running out of memory.

@rngkll

rngkll commented Feb 9, 2018

ipfs version 0.4.13

The daemon is running, but we are using --local; the daemon showed a message at the same time:

21:58:17.481 ERROR commands/h: err: write tcp4 127.0.0.1:5001->127.0.0.1:38934: write: connection timed out handler.go:285
IPFS_PATH=/data/ipfsrepo/ ipfs add --progress --local=true --nocopy --fscache --recursive /data/repos/ubuntu/mirror/
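For context, the --nocopy and --fscache flags in that command depend on the filestore experiment being switched on; a sketch of that prerequisite, going by the go-ipfs experimental-features docs of this era:

# --nocopy/--fscache require the filestore experiment to be enabled
ipfs config --json Experimental.FilestoreEnabled true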

@whyrusleeping
Member

Do you mind trying out latest master? We have some fixes that should improve that process a bit; it might help. Note: there is a known bug with the gateway on latest master. A fix is in progress, but if you're using this node as a public ipfs gateway, I recommend waiting until the fix is in.

@rngkll

rngkll commented Feb 9, 2018

NP, we will build the master branch and test with that one.

@rngkll

rngkll commented Feb 11, 2018

Version
./ipfs version --all
go-ipfs version: 0.4.14-dev-eca0486e1
Repo version: 6
System version: amd64/linux
Golang version: go1.9.4

Output

 
 849.23 GB / 849.35 GB [==========================================================================================================================================]  99.99% 0s
panic: interface conversion: interface {} is cmdkit.Error, not *coreunix.AddedObject

goroutine 70 [running]:
github.com/ipfs/go-ipfs/core/commands.glob..func7.2(0xc42001a3c0)
        /home/alvaro/gocode/src/github.com/ipfs/go-ipfs/core/commands/add.go:405 +0xa60
created by github.com/ipfs/go-ipfs/core/commands.glob..func7.3
        /home/alvaro/gocode/src/github.com/ipfs/go-ipfs/core/commands/add.go:467 +0xc7

@whyrusleeping
Member

@rngkll That's... a new bug. You shouldn't be seeing that. Thanks a bunch for reporting.

@rngkll

rngkll commented Mar 1, 2018

@whyrusleeping Thanks for your help.

Now we are getting something different, with not a lot of output.

Version
repoadmin@ipfsapt:/data/repos/ubuntu$ ./ipfs version --all
go-ipfs version: 0.4.14-dev-199a52d
Repo version: 6
System version: amd64/linux
Golang version: go1.10

Output

repoadmin@ipfsapt:/data/repos/ubuntu$ IPFS_PATH="/data/ipfsrepo" IPFS_FD_MAX="4096" ./ipfs add --progress --local --nocopy --fscache --quieter --recursive mirror/
 862.24 GB / 862.36 GB [=========================================================================================]  99.99% 1s
Error: Failed to get block for zb2rhZxJmJ44gKDyCN8776PgLakmT3Kjo4gfZp8XU5bPMUYs9: data in file did not match.
 repos/ubuntu/mirror/dists/artful-backports/universe/dep11/Components-i386.yml.gz offset 0
Qmb7pkxDyj22XCQGPpMverBkZ5R5UsiwDpSregrivxro9t

Is there a way I can get more information on the error?

Thanks

@whyrusleeping
Member

@rngkll That one looks like the file in question changed on disk. Would you mind filing a new issue for that? This thread is getting a bit long.

@whyrusleeping
Member

Though I think it's a result of the --fscache option and the file you are adding being changed on disk.
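If that's what happened, the filestore has a built-in check that should surface it; a minimal sketch (assumes the filestore experiment is enabled, as noted above):

# list filestore blocks whose backing files no longer match what was added
ipfs filestore verify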

@Stebalien
Member

Many things have changed since this issue was reported, including flushing some intermediate data structures during add.
