
Daemon crashes after requesting specific listing of links #3912

Closed
Zaijo opened this issue May 10, 2017 · 20 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@Zaijo

Zaijo commented May 10, 2017

Version information:

go-ipfs version: 0.4.8-
Repo version: 5
System version: amd64/darwin
Golang version: go1.8

Type:

Bug

Severity:

High

Description:

The IPFS daemon crashes when requesting the listing of links for the Turkish Wikipedia "wiki" folder.

> ipfs ls /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC
Error: Post http://127.0.0.1:5001/api/v0/ls?D=true&arg=%2Fipfs%2FQmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC&encoding=json&stream-channels=true: EOF

Daemon log: https://pastebin.com/Gks7tqWE

@Kubuxu
Member

Kubuxu commented May 10, 2017

This isn't the complete log, and I can't reproduce it. Could you capture the log straight to a file and then run the ls?
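
For example, something along these lines should capture everything, including the crash dump on stderr (just one way to do it; the file name is arbitrary):

ipfs daemon > daemon.log 2>&1 &
ipfs ls /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC
# after the crash, attach daemon.log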

@Zaijo
Author

Zaijo commented May 10, 2017

Here is the complete log. I can reliably reproduce this bug every time.
ipfs-daemon.crash-report.zip

@Kubuxu
Member

Kubuxu commented May 10, 2017

Looks like a goroutine explosion (the error is pthreads being unable to create a thread).
It would not be a problem if we had #3762

gx/ipfs/QmQvbWzZPGpoppaAvBtj6QmyBZPw4ivFD7ryyHesxuYYDa/yamux.(*Session).keepalive                                   286
gx/ipfs/QmQvbWzZPGpoppaAvBtj6QmyBZPw4ivFD7ryyHesxuYYDa/yamux.(*Session).send                                        286
gx/ipfs/QmTU8NWsDYNShMA3hjPfEZTg3pD7YgX62sFmZdEgbjtWq2/go-libp2p-swarm.(*Swarm).dialAddrs                           414
gx/ipfs/QmQvbWzZPGpoppaAvBtj6QmyBZPw4ivFD7ryyHesxuYYDa/yamux.(*Stream).Read                                         432
sync.runtime_SemacquireMutex                                                                                        484
net.runtime_pollWait                                                                                                547
gx/ipfs/QmW832cCfBWbTV2vRPzMyQuZAaUuEEWveVsVJm7U7h7HhT/go-libp2p-conn.(*Dialer).Dial                                1794
syscall.Syscall6                                                                                                    2033
gx/ipfs/QmTU8NWsDYNShMA3hjPfEZTg3pD7YgX62sFmZdEgbjtWq2/go-libp2p-swarm.(*activeDial).wait                           3239

But I am not sure why there are 2k active dials going on.
^^ @whyrusleeping
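
If it happens again, a full goroutine dump can also be grabbed from the daemon's debug endpoint while it is still alive, e.g. (assuming the default API address):

curl -o goroutines 'http://127.0.0.1:5001/debug/pprof/goroutine?debug=2'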

@Kubuxu Kubuxu added the kind/bug A bug in existing code (including security flaws) label May 10, 2017
@whyrusleeping
Member

@Zaijo how much RAM does your machine have? It's a MacBook, right?

@Kubuxu
Member

Kubuxu commented May 10, 2017

It isn't RAM, it is the thread count. The crash is due to pthreads being unable to spawn a thread for a goroutine.

The interesting thing here is the 2k dials in progress.

@whyrusleeping
Member

@Kubuxu right, but I've seen thread death occur once ipfs starts swapping. Things start happening really slowly, and then Go decides to create more threads.

@matthewrobertbell

matthewrobertbell commented May 30, 2017

I did the same:

ipfs ls /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC

Output from the daemon: https://gist.github.com/anonymous/9b40eec6552d63ef253fdb531fc73c6d

No results were ever returned, but the daemon didn't crash. I hope that helps.

MacBook Pro, 8 GB RAM, 0.4.9 official OS X build, poor internet connection.

@whyrusleeping
Member

I think this will be resolved in the latest master; there was an issue in dial rate limiting. @mattseh could you try again using a build from the latest master and let us know if things are still broken?

@Zaijo
Author

Zaijo commented Sep 2, 2017

@whyrusleeping It's a MacBook Pro with 8 GB RAM. Meanwhile, I upgraded to macOS Sierra.

@Kubuxu
Member

Kubuxu commented Sep 2, 2017

It was because of the 2k concurrent dials, which can't happen anymore due to a fix in go-libp2p-swarm.

@Kubuxu Kubuxu closed this as completed Sep 2, 2017
@matthewrobertbell

matthewrobertbell commented Sep 20, 2017

I ran this again with 0.4.11-rc2, on both my MacBook Pro with 8 GB RAM and a Linux server with 16 cores and 128 GB RAM.

After 20 minutes, the ls command has failed to return, so it still seems broken to me.

On my MacBook the CPU is still maxed out, and on the server it is using 12 cores (run at low priority via nice, so it uses as much CPU as it can without starving more important things).

On both machines, IPFS is using 500–600 MB of RAM.

Cheers

Edit: After two hours, the server IPFS was still using 10–12 cores with no result, so I killed it.

@Stebalien Stebalien reopened this Sep 20, 2017
@Kubuxu
Member

Kubuxu commented Sep 20, 2017

@mattseh how did you create that object, or where is it from?

EDIT: disregard that, it is the Turkish Wikipedia snapshot

@Kubuxu
Member

Kubuxu commented Sep 20, 2017

So the problem probably is that ipfs ls buffers the output and returns everything as one JSON object, and that --resolve-type is true by default.

The Turkish snapshot has 512k objects in it. The fact that ls resolves types by default doesn't help.

@mattseh can you try running it with --resolve-type=false? The default is true, so unfortunately you would otherwise end up downloading most of the snapshot.

(We really need a new, better format for files and directories.)
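
Roughly, the difference is (illustrative, same CID as above):

ipfs ls /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC                       # resolves the type of every entry, fetching each child block
ipfs ls --resolve-type=false /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC  # only reads the directory itself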

@matthewrobertbell

time ipfs ls /ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC --resolve-type=false is still running 5 minutes after being started, using more than 10 cores, on the big server previously mentioned. I will let this run overnight and see if it completes.

@Stebalien
Member

If possible, could you take a CPU profile for us?

curl -o profile 'http://127.0.0.1:5001/debug/pprof/profile'

@whyrusleeping
Member

(along with a copy of the ipfs binary you're using; more details here: https://github.com/ipfs/go-ipfs/blob/master/docs/debug-guide.md)
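
For reference, the resulting profile can then be inspected with the standard Go tooling, something like this (assuming the ipfs binary and the profile file are in the current directory):

go tool pprof ipfs profile
# at the (pprof) prompt, "top10" lists the functions using the most CPU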

@matthewrobertbell

matthewrobertbell commented Sep 21, 2017

It finally completed successfully:

real 232m17.259s
user 0m4.616s
sys 0m1.088s

Will run again and gather the above requested info.

@matthewrobertbell

matthewrobertbell commented Sep 21, 2017

The IPFS binary is the official 0.4.11-rc2 for Linux 64-bit.

ipfs debug data.zip

@foxcool

foxcool commented Nov 18, 2017

Tried to repeat this: started the ipfs daemon and tried to get http://127.0.0.1:8080/ipfs/QmRNXpMRzsTHdRrKvwmWisgaojGKLPqHxzQfrXdfNkettC/Anasayfa.html

In the log:

Daemon is ready
23:13:45.843 ERROR core/serve: invalid ipfs path: cid too short gateway_handler.go:584
23:13:45.844 ERROR core/serve: invalid ipfs path: cid too short gateway_handler.go:584
23:13:45.844 ERROR core/serve: invalid ipfs path: cid too short gateway_handler.go:584
...

@lidel
Member

lidel commented Apr 25, 2022

Closing as I was unable to reproduce the original crash.

(A lot has changed since 0.4.x, including performance improvements; please update to the latest version.)

@lidel lidel closed this as completed Apr 25, 2022