Feat: ipfs repo size command #4550

hsanjuan · 2018-01-05T17:58:44Z

ipfs repo size just displays the repository size without needing to
count the blocks, which takes a significant part of the repo stat
time.

ipfs-cluster queries the repository size using repo/stat in order to
allocate pins to the peers with the most free space. When done too often
on a busy node, this has resulted in degraded performance. The block
count is information that cluster does not use.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>


hsanjuan · 2018-01-05T18:02:35Z

This is still relatively slow, so I'm not 100% sure it fixes the problem, though responses do show a ~40% improvement.

Can any of you think of a better way to keep track of the repo size? Would it be doable to calculate it once and then update the value on every write/gc?

whyrusleeping · 2018-01-05T18:19:11Z

Maybe we should just add this as a flag to repo stat?

Stebalien · 2018-01-05T19:55:43Z

> Can any of you think of a better way to keep track of the repo size? Would it be doable to calculate it once and then update the value on every write/gc?

I think we should leave it up to the datastore. That is, ask the datastore its size and let it do whatever it needs. For flatfs, we should keep a running count. If you think the performance sucks now, it's going to absolutely tank when we upgrade to a kernel with meltdown protections enabled (syscall performance will tank, we're currently doing a du (effectively)). For badger, we can just ask badger (it provides a Size function).

ghost · 2018-01-06T00:20:19Z

Would it be an option at all to keep track of repo size separately? i.e. start at zero, then increment for every added block, decrement for every deleted block? That'd make repo size an accounting function and not an IO operation anymore.

I don't know how Badger or LevelDB keep track of size, they're probably smarter than flatfs. In that case accounting would still be an option for flatfs?
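
The accounting idea above could be sketched roughly as follows. This is only an illustration: `sizeTracker` and its methods are hypothetical names, not part of go-ipfs or flatfs, and the sketch assumes the per-key sizes can be kept in memory (a real datastore would more likely stat the file before deleting it and would need to persist the total across restarts).

```go
package main

import (
	"fmt"
	"sync"
)

// sizeTracker is a hypothetical helper (not a go-ipfs or flatfs type).
// It keeps a running total of stored bytes so that reading the size
// is a memory lookup rather than a walk over the disk.
type sizeTracker struct {
	mu    sync.Mutex
	sizes map[string]int64 // bytes currently stored under each key
	total int64            // sum of all values in sizes
}

func newSizeTracker() *sizeTracker {
	return &sizeTracker{sizes: make(map[string]int64)}
}

// Put records n bytes under key, replacing any previous entry for key.
func (t *sizeTracker) Put(key string, n int64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.total += n - t.sizes[key] // a missing key contributes 0
	t.sizes[key] = n
}

// Delete subtracts the bytes recorded under key; unknown keys are ignored.
func (t *sizeTracker) Delete(key string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.total -= t.sizes[key]
	delete(t.sizes, key)
}

// Size returns the running total without touching the disk.
func (t *sizeTracker) Size() int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.total
}

func main() {
	t := newSizeTracker()
	t.Put("block-a", 100)
	t.Put("block-b", 50)
	t.Delete("block-a")
	fmt.Println(t.Size())
}
```

The trade-off is that the counter can drift from the real on-disk usage if files change behind the datastore's back, so an occasional full recount may still be needed.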

hsanjuan · 2018-01-08T09:34:59Z

> I think we should leave it up to the datastore. That is, ask the datastore its size and let it do whatever it needs. For flatfs, we should keep a running count.

This sounds like a good way to do it. I can try to do this. Do you have any pointers? (There's an interface, and datastores are implemented in different repos, right?) Which datastores are available now?

> Maybe we should just add this as a flag to repo stat?

Sure, it's cosmetic so I can do it at the end (along with tests)

kevina · 2018-01-08T18:40:25Z

Although this might break backwards compatibility, I would lean towards having the stat command only return what is already computed (much like how the stat system call works on unix) and have this command do the actual computation.

Each implementation of a datastore should decide how its size is measured and how accurate the result is, i.e. some datastores may consider Size to be the number of entries, others the disk space they use, etc. Some implementations may opt to improve the performance of the operation by caching or by reducing the accuracy of the calculation. Related: ipfs/kubo#4550

This adds a PersistentDatastore interface which allows datastores to report DiskUsage(). It implements the interface on all wrapper types, which return 0 if the wrapped datastore does not provide this method. Related: ipfs/kubo#4550
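
A minimal sketch of the pattern described in that commit message, under stated assumptions: the `Datastore` interface here is reduced to a single `Put` method for illustration, the `(uint64, error)` return type of `DiskUsage()` is assumed rather than taken from the thread, and `memStore`, `countingStore` and `wrapper` are hypothetical types. It is not the go-datastore code itself.

```go
package main

import "fmt"

// Datastore is a stand-in for the real interface, cut down to one method.
type Datastore interface {
	Put(key string, value []byte) error
}

// PersistentDatastore is a Datastore that can report its disk usage.
// The signature is an assumption made for this sketch.
type PersistentDatastore interface {
	Datastore
	DiskUsage() (uint64, error)
}

// DiskUsage asks d for its usage and falls back to 0 when d does not
// implement PersistentDatastore.
func DiskUsage(d Datastore) (uint64, error) {
	if p, ok := d.(PersistentDatastore); ok {
		return p.DiskUsage()
	}
	return 0, nil
}

// memStore is a hypothetical datastore that does not report usage.
type memStore struct{ data map[string][]byte }

func (m *memStore) Put(key string, value []byte) error {
	m.data[key] = value
	return nil
}

// countingStore is a hypothetical datastore that keeps a running count.
type countingStore struct {
	memStore
	used uint64
}

func (c *countingStore) Put(key string, value []byte) error {
	c.used -= uint64(len(c.data[key])) // drop the size of any old value
	c.used += uint64(len(value))
	return c.memStore.Put(key, value)
}

func (c *countingStore) DiskUsage() (uint64, error) { return c.used, nil }

// wrapper forwards calls to a child and reports the child's usage,
// which is 0 when the child cannot report it.
type wrapper struct{ child Datastore }

func (w *wrapper) Put(key string, value []byte) error { return w.child.Put(key, value) }
func (w *wrapper) DiskUsage() (uint64, error)         { return DiskUsage(w.child) }

func main() {
	plain := &wrapper{child: &memStore{data: map[string][]byte{}}}
	counted := &wrapper{child: &countingStore{memStore: memStore{data: map[string][]byte{}}}}
	for _, ds := range []Datastore{plain, counted} {
		_ = ds.Put("k", []byte("hello"))
		n, _ := DiskUsage(ds)
		fmt.Println(n)
	}
}
```

Because the wrapper always satisfies the interface, callers cannot tell a genuinely empty datastore apart from one that simply does not report usage; that ambiguity is inherent in the "return 0" convention described above.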

hsanjuan · 2018-05-09T19:50:15Z

See #5010

ghost assigned hsanjuan Jan 5, 2018

ghost added the status/in-progress In progress label Jan 5, 2018

hsanjuan mentioned this pull request Jan 8, 2018

"received alert for freespace in" ipfs-cluster/ipfs-cluster#280

Closed

hsanjuan mentioned this pull request Jan 18, 2018

Add Size() method to datastores ipfs/go-datastore#74

Merged

magik6k mentioned this pull request Mar 7, 2018

Feat: Implement a PersistentDatastore by adding DiskUsage method ipfs/go-ds-flatfs#27

Merged

hsanjuan mentioned this pull request May 9, 2018

Efficient "repo stat" (DiskUsage) and "--size-only" flag #5010

Merged

hsanjuan closed this May 9, 2018

ghost removed the status/in-progress In progress label May 9, 2018

Stebalien deleted the feat/repo-size branch February 28, 2019 22:01
