Feat: ipfs repo size command #4550
ipfs repo size displays only the repository size, without counting the blocks, which accounts for a significant share of the time repo stat takes.

ipfs-cluster queries the repository size using repo/stat in order to allocate pins to the peers with the most free space. When done too often on a busy node, this has degraded performance. The block count is information that cluster does not use.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>
This is still relatively slow, so I'm not 100% sure it fixes the problem, but responses do show a ~40% improvement. Can any of you think of a better way to keep track of the repo size? Would it be doable to calculate it once and then update the value on every write/GC?
Maybe we should just add this as a flag to repo stat?
I think we should leave it up to the datastore. That is, ask the datastore its size and let it do whatever it needs. For flatfs, we should keep a running count. If you think the performance sucks now, it's going to absolutely tank when we upgrade to a kernel with meltdown protections enabled (syscall performance will tank, we're currently doing a
Would it be an option at all to keep track of repo size separately? I.e., start at zero, then increment for every added block and decrement for every deleted block? That would make repo size an accounting function rather than an IO operation. I don't know how Badger or LevelDB keep track of size; they're probably smarter than flatfs. In that case, accounting would still be an option for flatfs?
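The accounting idea in the comment above can be sketched roughly as follows. This is a minimal illustration, not flatfs code: the type `sizeTrackingStore` and its methods are hypothetical names, the store is in-memory, and a real datastore would also need to persist the counter and rebuild it after a crash.

```go
package main

import (
	"fmt"
	"sync"
)

// sizeTrackingStore is a hypothetical in-memory block store that keeps a
// running total of stored bytes, so reporting the size needs no scan.
type sizeTrackingStore struct {
	mu     sync.Mutex
	blocks map[string][]byte
	size   uint64 // running total of value bytes
}

func newSizeTrackingStore() *sizeTrackingStore {
	return &sizeTrackingStore{blocks: make(map[string][]byte)}
}

// Put stores a block and adjusts the running total. Overwriting an existing
// key first subtracts the old value's length so the total stays accurate.
func (s *sizeTrackingStore) Put(key string, value []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.blocks[key]; ok {
		s.size -= uint64(len(old))
	}
	s.blocks[key] = value
	s.size += uint64(len(value))
}

// Delete removes a block (for example during GC) and decrements the total.
// Deleting a missing key leaves the total unchanged.
func (s *sizeTrackingStore) Delete(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.blocks[key]; ok {
		s.size -= uint64(len(old))
		delete(s.blocks, key)
	}
}

// Size returns the running total without touching the stored blocks.
func (s *sizeTrackingStore) Size() uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.size
}

func main() {
	s := newSizeTrackingStore()
	s.Put("a", []byte("hello"))
	s.Put("b", []byte("world!"))
	s.Delete("a")
	fmt.Println(s.Size()) // prints 6
}
```

The counter tracks payload bytes only; actual disk usage would differ because of filesystem block sizes and metadata, so a real implementation would have to decide which number it reports.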
This sounds like a good way to do it. I can try to do this. Do you have any pointers? (There's an interface, and datastores are implemented in different repos, right?) What datastores are available now?
Sure, it's cosmetic so I can do it at the end (along with tests)
Although this might break backwards compatibility, I would lean towards having the stat command only return what is already computed (much like how the stat system call works on Unix) and have this command do the actual computation.
Each datastore implementation should decide how its size is measured and how accurate the result is. For example, some datastores may consider Size to be the number of entries, others the disk space they use, etc. Some implementations may opt to improve the performance of the operation by caching or by reducing the accuracy of the calculation. Related: ipfs/kubo#4550
This adds a PersistentDatastore interface which allows datastores to report DiskUsage(). It implements the interface on all wrapper types, which return 0 if the wrapped datastore does not provide this method. Related: ipfs/kubo#4550
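A rough sketch of the wrapper behaviour described in that commit message might look like the following. Only the names PersistentDatastore and DiskUsage come from the thread; the method signature, the pared-down Datastore interface, and the types memStore, diskStore and passthrough are assumptions made for illustration and are not the actual go-datastore definitions.

```go
package main

import "fmt"

// Datastore is a pared-down stand-in for the real datastore interface.
type Datastore interface {
	Put(key string, value []byte) error
}

// PersistentDatastore is named in the thread; this signature is an assumption.
type PersistentDatastore interface {
	Datastore
	DiskUsage() (uint64, error)
}

// memStore is a hypothetical datastore that does not report disk usage.
type memStore struct{ data map[string][]byte }

func (m *memStore) Put(key string, value []byte) error {
	m.data[key] = value
	return nil
}

// diskStore is a hypothetical datastore that reports a fixed usage value.
type diskStore struct {
	memStore
	usage uint64
}

func (d *diskStore) DiskUsage() (uint64, error) { return d.usage, nil }

// passthrough is a hypothetical wrapper type around any Datastore.
type passthrough struct{ child Datastore }

func (w *passthrough) Put(key string, value []byte) error {
	return w.child.Put(key, value)
}

// DiskUsage delegates to the wrapped datastore when it implements
// PersistentDatastore, and returns 0 otherwise.
func (w *passthrough) DiskUsage() (uint64, error) {
	if p, ok := w.child.(PersistentDatastore); ok {
		return p.DiskUsage()
	}
	return 0, nil
}

func main() {
	onDisk := &passthrough{child: &diskStore{
		memStore: memStore{data: map[string][]byte{}},
		usage:    4096,
	}}
	inMem := &passthrough{child: &memStore{data: map[string][]byte{}}}

	a, _ := onDisk.DiskUsage()
	b, _ := inMem.DiskUsage()
	fmt.Println(a, b) // prints 4096 0
}
```

Returning 0 keeps every wrapper usable as a PersistentDatastore, at the cost that callers cannot tell "empty" apart from "not supported".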
See #5010