Improve ipfs add performance for large trees of files & directories #6523
@Stebalien & @magik6k this analysis came from spelunking through the source code. Please let me know if there's anything there that doesn't sound right. In particular I'd like a second opinion on the theory (untested) that branches of the directory tree may be missed when writing the directories to the blockstore.
This is mostly correct.
This should be slightly mitigated by the fact that we check if we already have a block before writing it (in the blockservice itself).
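The check-before-write behavior mentioned here can be sketched as a toy store. The types and method names below (`Store`, `AddBlock`) are illustrations, not the actual go-blockservice API; the sketch also includes a write-through toggle like the one discussed later in the thread:

```go
package main

import "fmt"

// Block is a stand-in for an IPFS block: a CID string plus raw data.
type Block struct {
	CID  string
	Data []byte
}

// Store mimics the blockservice's add path: unless writeThrough is
// set, it calls Has on the backing map before writing, so re-added
// blocks cost a lookup instead of a write.
type Store struct {
	blocks       map[string][]byte
	writeThrough bool
	puts         int // number of actual writes, for illustration
}

func NewStore(writeThrough bool) *Store {
	return &Store{blocks: map[string][]byte{}, writeThrough: writeThrough}
}

func (s *Store) Has(cid string) bool {
	_, ok := s.blocks[cid]
	return ok
}

func (s *Store) AddBlock(b Block) {
	if !s.writeThrough && s.Has(b.CID) {
		return // duplicate: skip the write
	}
	s.blocks[b.CID] = b.Data
	s.puts++
}

func main() {
	s := NewStore(false)
	b := Block{CID: "QmExample", Data: []byte("hello")}
	s.AddBlock(b)
	s.AddBlock(b) // second add is deduplicated
	fmt.Println("writes:", s.puts)
}
```

Whether the Has lookup is cheaper than the write depends on the datastore, which is part of what the benchmarks below try to measure.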
Not quite. We flush everything first. Unfortunately, this does mean that we end up storing a bunch of intermediate directories that we don't need. For some context, we primarily use MFS to make it easy to pause for GC. However, honestly, that probably isn't worth it given the performance hit. I agree we should explore using unixfs directly.
Do you mean when we call
It will recursively flush. Flush calls Finally, it calls
(unless I'm missing something)
Ah yes, you're right, thanks 👍 I'll remove that section from the original issue above.
@Stebalien given this analysis, my uneducated guess is that changing
We should measure how often the datastore is called when adding stuff with different layouts.
My bet would be that for large directories / lots of directories,
Another note on using unixfs directly: it would make it easier to add small files in parallel, since each call to
I was thinking along those lines - maybe a queue for incoming files and a queue for outgoing, with backpressure in each.
@Stebalien can we just add the current root for any in-progress
Probably? We really do need some kind of internal in-memory pinning system (which shouldn't be too hard). The primary issue is that the importer would need to be GC aware so it can pause.
We can just do something similar to how
Yeah, you're right.
If there is a significant
In most cases where we call (hack / poc idea: check if
Yeah, that happens in blockservice, and it's easy to disable that check there (or just override it: https://github.com/ipfs/go-blockservice/blob/master/blockservice.go#L141, https://github.com/ipfs/go-blockservice/blob/master/blockservice.go#L173).
These data were generated with this script running on my laptop. It creates
Edit: I added the timings for Badger DB
We really need tracing in those code paths; that would let us know the precise cause. It might be worth comparing that with badger.
Also please let me know if there are other particular things you'd like to see stats about; you guys are more familiar with where the bottlenecks are most likely to be.
You might also want to get stats from
I added the timings for Badger DB to the table in my comment above.
Very interesting.
I wanted to test the impact of adding many files in a directory tree with a varying branch out factor.
It doesn't seem to have a big performance impact. In these graphs the branch-out factor (items / directory) is on the x-axis and duration is on the y-axis, where each line represents a
These graphs show Has / Get / Put / Object counts for
(Spreadsheet / Code)
I ran some simulations for varying numbers of files of size 1k, with a few different branch-out factors. Again the branch-out factor doesn't appear to make a difference, and it appears that time complexity increases polynomially (Spreadsheet - see Sheet 2).
Are there any good comparisons we could make with things that are non-ipfs, to get some other baselines to contrast things with? I'm loving these graphs, and they're super awesome for helping us see some things about our own code paths, but after ogling for a while, I realized the number of zeros on most of the axes is hard for my brain to parse. Are these comparable to how fast a plain filesystem would flush? If there's a constant factor difference, that's useful to know; the rough order of magnitude of that constant factor would also be useful to know. (A plain filesystem flush might not be a fair comparison, since we're hashing, etc., And That's Different; maybe 'git add' or operations like that would be better? There's also a pure golang git implementation out there, which could be amusing if we want to avoid comparing directly to one of the most hyper-optimized C things on the globe.)
Did some analysis here and I'm having trouble reproducing a slow ipfs add.

Definition: A slow ipfs add is one where recursively adding a directory

Test setup: I've been running the tests on an 8TB HDD with the benchmark below, on Windows 10:

Test 1: Add arch repo to go-ipfs 0.4.22 and also copy paste the repo

I used WSL to rsync the repo, but because of WSL weirdness the directory symlinks are annoying to work with, so I only interacted with the pool directory (which holds 80% of the data, as shown in ipfs-inactive/package-managers#79). Note: robocopy is generally considered a pretty fast Windows copy-paste utility, but if someone has a better alternative I'm open to it.

Folder size: 38.6 GB

Powershell:

```
ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc
Measure-Command{ipfs22 add -r --offline --silent .\addtest\arch\arch\pool\};Measure-Command {robocopy /MIR /NFL /NDL .\addtest\arch\arch\pool\ .\testarch\}
```

Results (minutes:seconds):

Test 2: Add extrapolated arch repo to go-ipfs 0.4.22 and also copy paste the repo

I tried running @dirkmc's tests above and got similar results to what he did on both the add and copy-paste. I then modified the tests to more precisely follow the distribution from ipfs-inactive/package-managers#79. I unfortunately neglected to include the functionality allowing file sizes to be non-exponents of 2. I doubt that will really change the results, but I will post an update after I re-run the code.

Folder parameters:

Powershell:

```
ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc
Measure-Command{ipfs22 add -r --offline --silent .\addtest\256k\};Measure-Command {robocopy /MIR /NFL /NDL .\addtest\256k\ .\test256k\}
```

Results (hours:minutes:seconds):

Update: Ran the code again with some variance of file sizes (for each file of size
@aschmahmann Windows is quite well known for its slow filesystem. I would suggest trying on Linux.
@Kubuxu while it's possible that Windows is not experiencing issues that OS X/Linux do, either because the file system is slow or because there's some bug in OS X/Linux not present in Windows, I have yet to see any evidence that this is a problem on OS X/Linux either. Additionally, the arch test is basically the same as the one @dirkmc and @andrew ran, and runs in a similar amount of time. So at least at the 40 GB range the speed differences don't appear to be super noticeable.
@aschmahmann could you try replicating this walk-through that Andrew wrote up: ipfs-inactive/package-managers#18
@dirkmc I can try, but it's going to be a little rough depending on how accurate I want the replication to be. As I alluded to above, I rsynced the Ubuntu repo (1.2 TB) in WSL, which took quite a long time, but the WSL links won't resolve in Windows. I'm going to see if there's a way I can do one of: A) Turn the WSL links to Windows links and use IPFS on Windows
Test 3: Add entire ubuntu repo to go-ipfs 0.4.22 and also copy paste the repo

I deleted the WSL links and rsync'd as admin, and got normal Windows links! (Windows normally requires admin to create symlinks; this shouldn't be necessary in Windows 10 developer mode, but it was required nonetheless.) So with my now Windows-compatible WSL-rsync'd ubuntu repo...

Folder parameters:

Powershell:

```
ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc
Measure-Command{ipfs22 add -r --offline --silent .\ubuntu\};Measure-Command {robocopy /MIR /NFL /NDL .\ubuntu\ .\testubuntu}
```

Results (hours:minutes:seconds):

Looks like

Something I noticed was that towards the very end of the test I came back to my machine and noticed that my RAM was maxed out (I have only 16GB of RAM), I was hitting many hard faults, and disk read speeds were very low. I did some brief looking at the RAM with VMMap and it looks like a lot of it is from Badger (1.3 TB of mapped files, but also 10GB of RAM allocated). It looks like we're probably using more RAM than we should, but not necessarily by a huge margin. For those of you more experienced than me at looking through these RAM allocation things, I've attached a dump I took towards the end of the program running: ipfs22.txt.

Given that this issue was filed because of the ubuntu add experiment being slow, and that I'm having trouble replicating it on reasonable hardware, I recommend we pause this issue for now until a user comes back to us with a data set that they're having trouble importing. Note, I haven't tested this on Linux or OS X yet, but as @Kubuxu mentioned Windows is supposed to be the slow one and it's doing just fine 😄. If you feel like testing on another OS is important here, please post away and we can discuss. @Stebalien @dirkmc thoughts?
Two possible reasons:
Note: WRT linux/windows filesystem operations, badger shouldn't hit the windows slowness issues as it doesn't perform tons of filesystem operations. Instead, it just writes to mmapped files.
It would be great if it's performing well enough that we can reprioritize other projects above this one; I'm glad you did this research. It would be good to understand why the use case Andrew was looking at was so slow. It looks like he was doing this on Ubuntu, so maybe there's an issue on Linux that doesn't manifest on Windows? It's also possible that the hardware he was using wasn't up to the task, perhaps because it was severely memory limited or had a very slow disk.
It looks from ipfs-inactive/package-managers#18 like his machine had a plentiful 32GB of RAM (my initial thought too). No idea what the drive performance was, though; it was a cloud box, but that data wasn't in my face when I went to the website. If I have time I will try and run this on Ubuntu over the weekend (just got to get my machine set up). I'll probably use ntfs drivers instead of ext4 to avoid another 6hr file copy though (unless we think ext4 is potentially the problem).
Note that NTFS on Linux has a way different performance characteristic than ext4. From the datastore benchmarking dataset (done on a c5d instance on AWS, NVMe SSD; not the greatest / cleanest dataset, but it gives the idea): https://ipfs.io/ipfs/QmPjUTbqYAsHfeuSsqZtPLZueH4r9oPnyj1kKckca9XANJ Also some notes:
If the problem only exists on SSDs then according to @Stebalien it's fine to deprioritize this issue, since if we're really focusing on 1TB+ package mirrors they will likely be stored on HDDs (the issue that spun this off, ipfs-inactive/package-managers#18, also used HDDs). Unless I'm misreading your first graph (add performance on badger) the results are confusing. It looks like NTFS is by far the fastest and for some reason ext4 performance is by far the worst. Is this accurate? If so then perhaps we're dealing with badger+ext4 issues.
That's the correct observation, though it may indicate some issues with how the ntfs implementation on Linux handles sync writes (given that it's faster than everything else too).
That's fair, and I agree with this decision, but we should keep in mind that there are scenarios where we can still get better.
Test 1-ext4: Add arch repo to go-ipfs 0.4.22 and also copy paste the repo

I rsync'd the data from the NTFS partition to a blank ext4 partition on the same drive. Then tried Powershell:

```
ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc
Measure-Command {ipfs22 add -r --offline --silent ./arch/}
Measure-Command {cp -a ./arch/ ./testarch/}
```

Results (hours:minutes:seconds):

This seems to match with our expectations of ext4+badger being very slow, and not with @dirkmc and @andrew's results. @dirkmc any idea if the partitions were ext4 formatted and if they were running on HDDs or SSDs?

Note: Given how poorly ext4+badger performed compared to NTFS in @magik6k's benchmark, we might have expected an even larger gap than just 3x. @magik6k were the NTFS results collected on Linux or Windows? Also, of course, those benchmarks were done on SSDs (which badger is optimized for) instead of HDDs, so they may not translate super well. Do we have HDD benchmarks or the code to generate them?

Addendum: I've tried a number of different tests to poke at this (flatfs+ext4 is even slower than badger despite the benchmarks; copy-paste to NTFS is fast, but adding to NTFS from Linux is slow; etc.). It looks like this may have to do with Windows vs Linux, where ipfs+badger+Windows is fine but ipfs+badger+Linux is slow. Not yet sure if it's us using badger improperly or badger issues. A big 👍 for developers using multiple OSes, or we probably don't learn things like this.
Agreed, these are really useful findings, thanks Adin. My tests were all done on my mac with an SSD and plenty of RAM. I'm not sure about the characteristics of the machine Andrew was using originally.
That was on one of the Debian versions AWS provided at the time, not sure exactly which one.
@dirkmc @magik6k thanks for the info. I ran more tests and Ubuntu 19.10+badger+ext4 performs much much better if you turn off

Test 3-ext4: Add entire ubuntu repo to go-ipfs 0.4.22

I then went and ran the ubuntu repo test after turning off sync writes and ext4 journaling. For those interested in turning off journaling, I followed the slightly wrong instructions at https://foxutech.com/how-to-disable-enable-journaling/. Instructions for posterity:

Powershell:

```
ipfs init --profile=badgerds --empty-repo
ipfs pin rm QmUNLLsPACCz1vLxQVkXqqLX5R1X345qqfHbsf67hvA3Nn #removes only pin
ipfs repo gc
#manually edited IPFS config file to use "syncWrites" : false, could probably have used the cli
Measure-Command{ipfs22 add -r --offline --silent .\ubuntu\}
```

Results (hours:minutes:seconds):

This super snazzy result (plus some profiling I did with pprof - thanks @dirkmc for the easy setup) indicates that the problem is that linux+ext4+badger is really bad with

Two main camps going forward are probably:
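For reference, the sync-writes toggle mentioned in the comment above lives in the badger child spec of the datastore config. The shape below reflects the badgerds profile as of go-ipfs 0.4.x; verify against your own `~/.ipfs/config` before editing, as field names and nesting may differ between versions:

```json
{
  "Datastore": {
    "Spec": {
      "type": "measure",
      "prefix": "badger.datastore",
      "child": {
        "type": "badgerds",
        "path": "badgerds",
        "syncWrites": false,
        "truncate": true
      }
    }
  }
}
```

Note that turning off sync writes trades durability on crash for throughput, which is part of the trade-off being weighed in this thread.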
My plan is to go for option 2, as it is the fastest way to get results. I have also posted on the badger issue and maybe they'll have some advice and/or a fix.
Amazing ❤️
@aschmahmann - I created a placeholder graph here for us to organize this information. Can you help add in the right data for 0.5 vs 0.4.23, using badger vs flatfs, on Linux?
Just my 2¢: is it possible to include benchmarks for ext4, btrfs, and optionally ntfs underlying filesystems? I'm in the position of deploying some VMs dedicated purely to ipfs-related tasks, and having a recommended file system type for maximum performance tuning seems appropriate and useful imho.
Unfortunately, running these benchmarks is a manual process, so we're not going to be able to re-run them on all filesystems. However, if you want to try a multi-filesystem benchmark, I'd be interested in the results. (In general, I'd recommend against NTFS.)
Has this been abandoned?
Calling `ipfs add` on a directory with a large tree of sub-directories and files is slow. This use case is particularly important for file-system based package managers.

Background
IPFS deals with immutable blocks. Blocks are stored in the blockstore.
The UnixFS package breaks files up into chunks, and converts them to IPLD objects.
The DAG Service stores IPLD objects in the blockstore.
The Mutable File System (MFS) is an abstraction that presents IPFS as a file system. For example consider the following directory structure with associated hashes:
If the contents of `fish.txt` change, the CID for `fish.txt` will also change. The link from `ocean → fish` will change, so the CID for `ocean` will change. The link from `animals → ocean` will change, so the CID for `animals` will change. MFS manages those links and the propagation of changes up the tree.

Algorithm
`ipfs add` uses the MFS package to add files and directories to the IPFS blockstore. To add a directory with a large tree of sub-directories and files:

- Create the root directory (`animals` in the example above)
- For each directory (e.g. `animals/ocean`): create it in MFS. This adds the empty directory to the blockstore.
- For each file (e.g. `animals/ocean/fish.txt`): ensure its parent directory (`animals/ocean`) exists (†), then add the file (`animals/ocean/fish.txt`). Note: This again adds the IPLD Node root to the blockstore.
- Call `directory.GetNode()` on each directory.

Note that at this stage, the links to files in the directories have been created, so the directory created here will have a different CID than the empty directory created before the files were added. Calling `directory.GetNode()` (confusingly) writes the directory with links to files to the blockstore.

(†) Although we've already created the directory, it's necessary to again ensure it exists before adding the file, because after processing every 256k files, the MFS internal directory cache structure is dereferenced to allow for golang garbage collection.
Areas for Improvement
Proposed Improvements
The above issues would be mitigated if we interact directly with the UnixFS API instead of with the MFS API:
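A toy model of the write amplification being discussed: an MFS-style flow that persists the directory node after every file added, versus a unixfs-style flow that writes it once at the end. The counts are purely illustrative (the real importer's flush behavior is more nuanced):

```go
package main

import "fmt"

// dirNode is a toy directory: its "block" is just the list of links.
type dirNode struct{ links []string }

// addFiles adds n files to one directory and returns how many times
// the directory node was written ("put") to the blockstore.
func addFiles(n int, flushEveryFile bool) (puts int) {
	d := dirNode{}
	for i := 0; i < n; i++ {
		d.links = append(d.links, fmt.Sprintf("file-%d", i))
		if flushEveryFile {
			puts++ // MFS-style: persist the updated directory node now
		}
	}
	puts++ // final flush of the completed directory node
	return puts
}

func main() {
	fmt.Println("per-file flush puts:", addFiles(1000, true))
	fmt.Println("single flush puts:  ", addFiles(1000, false))
}
```

Each intermediate directory node written in the per-file case is garbage once the next file lands, which matches the earlier observation that we "end up storing a bunch of intermediate directories that we don't need."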
Future Work
It has been noted that disk throughput and CPU usage are not close to maximum while adding large numbers of files. Future work should focus on analyzing these findings.