LevelDB is using a lot of FDs during --nocopy add #3763
I see you merged the pull request. I pulled and re-built from master, but still get the same error.
@jefft0 interesting... can you help debug this for us? There's a program I use for this called lsof. Start up your daemon and get its pid. Then do whatever it is that you do to reproduce this problem, and when it's about to happen, grab the output of lsof for that pid. It might help to grab it a few times: before, during (as close as you can get to the error) and after. With this info we should be able to figure out where those FDs are coming from and fix the bug.
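A minimal sketch (assumption: Linux) of the same kind of per-process FD inspection done directly against /proc/<pid>/fd, in case lsof isn't handy; the program below is illustrative and not something posted in this thread:

```go
// List a process's open file descriptors by reading /proc/<pid>/fd.
// Each entry is a symlink to the file (or socket/pipe) it refers to,
// which is roughly the per-FD information lsof reports.
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: listfds <pid>")
		os.Exit(1)
	}
	fdDir := filepath.Join("/proc", os.Args[1], "fd")
	entries, err := os.ReadDir(fdDir)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%d open file descriptors\n", len(entries))
	for _, e := range entries {
		target, err := os.Readlink(filepath.Join(fdDir, e.Name()))
		if err != nil {
			continue // the fd may have closed between ReadDir and Readlink
		}
		fmt.Printf("fd %s -> %s\n", e.Name(), target)
	}
}
```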
See the attached output files from lsof.
LevelDB is using 500 FDs; this shouldn't happen.
what the hell...
Any ideas on why so many file descriptors?
@jefft0 I've tried to reproduce the issue several times now with no luck. What sort of disks do you have? And is there anything weird about your setup? (like different weird filesystems, mountpoints, or OS tweaks that might affect this)
My files are in an 8-drive JBOD housing over USB. I'm using the default mount point for Ubuntu, but I access the files through symbolic links.
@jefft0 And your .ipfs directory (the datastore and blocks directories) is stored on a normal disk? (SSD or spinner?)
Yes, they're on my SSD system drive.
(My default home directory)
@jefft0 Okay, I'll run some experiments with the data being on a mounted drive. In the meantime, if you don't mind, could you grab a stack dump from your daemon while the file descriptor usage is really high? I'm curious to see if there are tons of concurrent processes hitting leveldb for some reason.
Attached.
@whyrusleeping Here's another data point. I tried the same setup on macOS 10.12. I'm not running the daemon as I add the files. Anyway, attached is the lsof output.
Is
I don't think it is.
@jefft0 I have no idea why leveldb is behaving that way; it doesn't do so on my machines. Just seeing your lsof output now.
@whyrusleeping On the Ubuntu machine you tested, what version of LevelDB? How was it installed?
I am able to reproduce the error in a fresh Ubuntu virtual machine (both VirtualBox and Parallels) with a fresh installation of the latest go-ipfs. I transfer a 700MB gzip file and expand it as the new installation's .ipfs directory. So, if someone wants to reproduce and debug this issue, I can get you the 700MB gzip file. (You don't need the original files that were added.) Would that be useful?
... or better yet, make a virtual machine for you to log into?
@jefft0 Yeah, getting that gzip file would be great!
See ipfs-too-many-fds.tar.gz.
@jefft0 the archive you sent was an ipfs repo. Was your issue with adding an ipfs repo into ipfs? Or did you mean to zip up something else?
That's what I meant to send. (Don't worry about the private key. It was generated just for this test.) Steps to reproduce: expand the archive as your .ipfs directory, then try an ipfs add --nocopy of another file.
For me, it gives the error "too many open files".
@whyrusleeping, okay I will look into upgrading the leveldb process and see how low I can get the ulimit.
@kevina just wanted to test my theory about batching, so I disabled it by making the batch code just write directly through to the db. It made the add take slightly longer to fail, but still got a too-many-FDs error nonetheless. It's all tons of leveldb files. This feels like a leveldb bug to me...
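A minimal sketch of the "write directly through to the db" idea, using a hypothetical store interface rather than the real go-datastore types:

```go
// Hypothetical write-through "batch": instead of buffering Puts and
// flushing them on Commit, every Put goes straight to the underlying
// store, so Commit has nothing left to do.
package main

import "fmt"

// store is a stand-in interface, not the actual go-datastore API.
type store interface {
	Put(key string, value []byte) error
}

type writeThroughBatch struct {
	db store
}

func (b *writeThroughBatch) Put(key string, value []byte) error {
	return b.db.Put(key, value) // no buffering; write straight through
}

func (b *writeThroughBatch) Commit() error {
	return nil // everything was already written
}

// mapStore is a toy in-memory store that makes the sketch runnable.
type mapStore map[string][]byte

func (m mapStore) Put(key string, value []byte) error {
	m[key] = value
	return nil
}

func main() {
	db := mapStore{}
	b := &writeThroughBatch{db: db}
	_ = b.Put("block1", []byte("data"))
	_ = b.Commit() // no-op
	fmt.Println(len(db), "entries written")
}
```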
Okay, I found the problem. The leveldb was not being properly compacted and there were too many "*.ldb" files (I counted over 4000), each of them open. I was able to fix the problem by forcing a full compaction right after the database is opened. There are a bunch of parameters related to compaction and I'm not sure why it is not being run automatically in our case. On a possibly related note, we disable compression; I am not sure how much speed benefit this gives us, and it may also relate to the fact that the leveldb is not automatically compacting (but this is just a wild guess).
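For reference, a full manual compaction with goleveldb (the LevelDB library go-ipfs uses) looks roughly like the sketch below; the standalone program, the datastore path, and the option values are illustrative, not the actual go-ipfs code:

```go
// Open a leveldb database and force a compaction over the whole key
// range, merging the many small *.ldb files into fewer, larger ones.
package main

import (
	"log"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
	"github.com/syndtr/goleveldb/leveldb/util"
)

func main() {
	// Compression disabled here only to mirror the setup described above.
	db, err := leveldb.OpenFile("/home/user/.ipfs/datastore", &opt.Options{
		Compression: opt.NoCompression,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// An empty Range means "compact everything".
	if err := db.CompactRange(util.Range{}); err != nil {
		log.Fatal(err)
	}
}
```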
I remember we disabled compression back when we were writing all the block data into leveldb; we can probably try turning it back on and seeing how that works. Though that has nothing to do with compaction as far as I can tell. It's worrying that compaction was never run... I think that happens as a background process. @jefft0 Do you run the ipfs daemon for long periods? Or do you mainly do adds without the daemon running? @kevina do you think we should just run compaction at daemon startup all the time?
In my use case, I am doing the initial add --nocopy of all the files and am not running the daemon.
@whyrusleeping we could run compaction when we open the database (as opposed to just starting the daemon). But I see that as a bit of a hack. Note also that the first compaction could take a very long time (over 10 minutes in this case).
Hrm... How about we find a nice way to add it to 'ipfs repo fsck'?
@jefft0 Ah, that makes sense then. In your use case, leveldb was never able to successfully complete a compaction run.
@whyrusleeping Are you saying that if I start the daemon, it will do a compaction?
... I'll try it. But if memory serves, I was also getting the error when I was running the daemon, trying to serve the files that I had added.
@jefft0 well, in your rather special case, it will likely start a compaction after a few minutes, then fail because of too many open file descriptors. What I would try is raising your ulimit to something above 5000 and then starting the daemon. It will likely take half an hour or so; unlike the compaction call @kevina used, which effectively says "do nothing else, just compact", the default background compaction tries not to get in the way too much.
Do I need to add this line of code (the compaction call) myself?
@jefft0 no, you shouldn't have to. leveldb will do the compaction automatically in the background. Just make sure you run your daemon with a very high ulimit.
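Raising the limit from the shell (ulimit -n) before starting the daemon is the simplest route; purely as an illustration, a process can also raise its own soft limit up to the hard limit, roughly like this (the 8192 value is arbitrary, just above the ~5000 suggested earlier):

```go
// Raise this process's soft RLIMIT_NOFILE toward the hard limit.
package main

import (
	"fmt"
	"syscall"
)

func main() {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		panic(err)
	}
	lim.Cur = 8192
	if lim.Cur > lim.Max {
		lim.Cur = lim.Max // can't exceed the hard limit without privileges
	}
	if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		panic(err)
	}
	fmt.Println("soft FD limit now", lim.Cur)
}
```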
@whyrusleeping we could add it to ipfs repo fsck.
@kevina Yeah, I like that idea.
really, we just need to move away from leveldb. It has many issues. I have high hopes for badger, but I'm waiting on a few issues to be resolved: dgraph-io/badger#28
@whyrusleeping: We should be able to resolve them fairly quickly. ETA: a week or so.
@manishrjain That's great news, thanks!
dgraph-io/badger#28 is now resolved.
@jefft0 has this been resolved? I'm fairly certain the compaction issue can be resolved by running the daemon with a high enough ulimit.
I would say yes, it's been resolved. Maybe I passed a threshold in having so many files that it always compacts. I dunno. But the simple fact is that it keeps adding files without error. Also, when I run
I'm adding the
@schomatis so I thought the move is to make badger the default, not the only option. Has this changed? Once badger is working, will we no longer support the original configuration with leveldb/flatfs? Or is the intent to move to badger/flatfs? I can see many advantages to still supporting flatfs and hope we won't abandon support for it completely.
@kevina Sorry for the confusion. leveldb/flatfs will still be supported after the transition (AFAIK). What I meant to say is that, as this issue was raised for the scenario of a default installation, this error won't happen once Badger is the default datastore (yes, it could still happen if leveldb is explicitly configured).
Version information:
Type: bug
Priority: P1
Description:
With a script, I used ipfs add --nocopy --raw-leaves to individually add 9110 webcam video files, each about 200 MB. (I did not use recursive add.) This is a fresh installation of the latest code in master. I did not start the daemon. Now, even after restarting my computer, when I try to add --nocopy another file, I get the error "too many open files". The .ipfs/blocks folder has 73689 .data files, and the .ipfs/datastore folder has 4314 .ldb files. Maybe ipfs add --nocopy is trying to open all the .ldb files at once?