Feat: ipfs repo size command #4550

Closed
wants to merge 1 commit into from
hsanjuan
Contributor

@hsanjuan hsanjuan commented Jan 5, 2018

ipfs repo size just displays the repository size without needing to
count the blocks, which takes a significant share of the repo stat
time.

ipfs-cluster queries the repository size using repo/stat in order to
allocate pins to the peers with the most free space. When done too often
on a busy node, this has resulted in degraded performance. The block
count is information that cluster does not use.

License: MIT
Signed-off-by: Hector Sanjuan <hector@protocol.ai>

@ghost ghost assigned hsanjuan Jan 5, 2018
@ghost ghost added the status/in-progress In progress label Jan 5, 2018
@hsanjuan
Contributor Author

hsanjuan commented Jan 5, 2018

This is still relatively slow, so I'm not 100% sure it fixes the problem, though responses do show a ~40% improvement.

Can any of you think of a better way to keep track of the repo size? Would it be doable to calculate it once and then update the value on every write/gc?

@whyrusleeping
Member

Maybe we should just add this as a flag to repo stat?

@Stebalien
Member

> Can any of you think of a better way to keep track of the repo size? Would it be doable to calculate it once and then update the value on every write/gc?

I think we should leave it up to the datastore. That is, ask the datastore its size and let it do whatever it needs. For flatfs, we should keep a running count. If you think the performance sucks now, it's going to absolutely tank when we upgrade to a kernel with Meltdown protections enabled (syscall performance will drop, and we're currently doing what is effectively a du). For badger, we can just ask badger (it provides a Size function).
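A minimal sketch of "ask the datastore, otherwise fall back to a du-style walk", to make the proposal concrete. The `sizeReporter` interface, `fixedSize` type, and `repoSize`/`walkSize` helpers are hypothetical names made up for illustration; they are not the actual go-ipfs or go-datastore API.

```go
package main

import (
	"fmt"
	"io/fs"
	"path/filepath"
)

// sizeReporter is a hypothetical optional interface. A datastore that can
// report its own size cheaply would implement it.
type sizeReporter interface {
	Size() (uint64, error)
}

// fixedSize is a stand-in datastore that already knows its size.
type fixedSize uint64

func (f fixedSize) Size() (uint64, error) { return uint64(f), nil }

// walkSize sums the sizes of all regular files under root, which is roughly
// what du does. It costs at least one syscall per directory entry.
func walkSize(root string) (uint64, error) {
	var total uint64
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.Type().IsRegular() {
			info, err := d.Info()
			if err != nil {
				return err
			}
			total += uint64(info.Size())
		}
		return nil
	})
	return total, err
}

// repoSize asks the store for its size when it can answer and only walks
// the directory tree when it cannot.
func repoSize(store interface{}, root string) (uint64, error) {
	if s, ok := store.(sizeReporter); ok {
		return s.Size()
	}
	return walkSize(root)
}

func main() {
	// The store answers directly; no filesystem access is needed.
	n, err := repoSize(fixedSize(1024), ".")
	fmt.Println(n, err)

	// The store cannot answer, so the current directory is walked instead.
	n, err = repoSize(struct{}{}, ".")
	fmt.Println(n, err)
}
```

The type assertion keeps the fast path optional: datastores that cannot report a size still work, just at the cost of the walk.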

@ghost

ghost commented Jan 6, 2018

Would it be an option at all to keep track of repo size separately? i.e. start at zero, then increment for every added block, decrement for every deleted block? That'd make repo size an accounting function and not an IO operation anymore.

I don't know how Badger or LevelDB keep track of size, they're probably smarter than flatfs. In that case accounting would still be an option for flatfs?
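The accounting idea could look roughly like the sketch below: an in-memory store that adjusts a running byte total on every put and delete, so reading the size involves no I/O. `countingStore` and its methods are hypothetical and only illustrate the bookkeeping; they are not flatfs code.

```go
package main

import (
	"fmt"
	"sync"
)

// countingStore is a hypothetical block store that keeps a running total
// of the bytes it holds.
type countingStore struct {
	mu     sync.Mutex
	blocks map[string][]byte
	size   uint64
}

func newCountingStore() *countingStore {
	return &countingStore{blocks: make(map[string][]byte)}
}

// Put stores a copy of data under key and adjusts the running total.
// Overwriting an existing key first subtracts the old block's size.
func (s *countingStore) Put(key string, data []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.blocks[key]; ok {
		s.size -= uint64(len(old))
	}
	s.blocks[key] = append([]byte(nil), data...)
	s.size += uint64(len(data))
}

// Delete removes key, if present, and subtracts its size from the total.
func (s *countingStore) Delete(key string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if old, ok := s.blocks[key]; ok {
		s.size -= uint64(len(old))
		delete(s.blocks, key)
	}
}

// Size returns the running total without touching the stored blocks.
func (s *countingStore) Size() uint64 {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.size
}

func main() {
	s := newCountingStore()
	s.Put("a", []byte("hello"))
	s.Put("b", []byte("abc"))
	s.Delete("b")
	fmt.Println(s.Size())
}
```

An on-disk version would presumably also need to persist the counter and reconcile it after a crash or an out-of-band change to the directory, which this sketch leaves out.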

@hsanjuan
Contributor Author

hsanjuan commented Jan 8, 2018

> I think we should leave it up to the datastore. That is, ask the datastore its size and let it do whatever it needs. For flatfs, we should keep a running count.

This sounds like a good way to do it. I can try to do this. Do you have any pointers? (There's an interface, and datastores are implemented in different repos, right?) What datastores are available now?

> Maybe we should just add this as a flag to repo stat?

Sure, it's cosmetic, so I can do it at the end (along with tests).

@kevina
Contributor

kevina commented Jan 8, 2018

Although this might break backwards compatibility, I would lean towards having the stat command only return what is already computed (much like how the stat system call works on Unix) and have this command do the actual computation.

hsanjuan added a commit to ipfs/go-datastore that referenced this pull request Jan 18, 2018
Each datastore implementation should decide how its size
is measured and how accurate the result is.

For example, some datastores may consider Size as the number of entries,
others as the disk space they use, etc. Some implementations may
opt to improve the performance of the operation by caching or
by reducing the accuracy of the calculation.

Related: ipfs/kubo#4550
hsanjuan added a commit to ipfs/go-datastore that referenced this pull request Jan 22, 2018
This adds a PersistentDatastore interface which allows datastores
to report DiskUsage().

It implements the interface on all wrapper types, which return
0 if the wrapped datastore does not provide this method.

Related: ipfs/kubo#4550
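A rough sketch of the shape this commit message describes: an optional interface with DiskUsage(), and a wrapper that reports 0 when the wrapped datastore does not provide the method. The simplified `Datastore` interface, the `(uint64, error)` return type, and the `memStore`, `diskStore`, and `logWrapper` types are assumptions made for illustration, not the actual go-datastore definitions.

```go
package main

import "fmt"

// Datastore stands in for the real key-value interface (heavily simplified).
type Datastore interface {
	Put(key string, value []byte) error
}

// PersistentDatastore is a Datastore that can also report its disk usage.
// The return type is an assumption for this sketch.
type PersistentDatastore interface {
	Datastore
	DiskUsage() (uint64, error)
}

// DiskUsage returns d's disk usage, or 0 if d does not provide the method.
func DiskUsage(d Datastore) (uint64, error) {
	if p, ok := d.(PersistentDatastore); ok {
		return p.DiskUsage()
	}
	return 0, nil
}

// memStore is a hypothetical in-memory datastore with no DiskUsage method.
type memStore struct{ data map[string][]byte }

func (m *memStore) Put(key string, value []byte) error {
	if m.data == nil {
		m.data = make(map[string][]byte)
	}
	m.data[key] = value
	return nil
}

// diskStore is a hypothetical persistent datastore reporting a fixed usage.
type diskStore struct {
	memStore
	used uint64
}

func (d *diskStore) DiskUsage() (uint64, error) { return d.used, nil }

// logWrapper is a hypothetical wrapper type. It always has a DiskUsage
// method and forwards to the child, yielding 0 when the child has none.
type logWrapper struct{ child Datastore }

func (w *logWrapper) Put(key string, value []byte) error {
	return w.child.Put(key, value)
}

func (w *logWrapper) DiskUsage() (uint64, error) { return DiskUsage(w.child) }

func main() {
	fmt.Println(DiskUsage(&logWrapper{child: &memStore{}}))
	fmt.Println(DiskUsage(&logWrapper{child: &diskStore{used: 4096}}))
}
```

Returning 0 rather than an error lets callers sum usage across nested or mounted datastores without special-casing the ones that cannot report it.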
hsanjuan added a commit to ipfs/go-datastore that referenced this pull request Jan 24, 2018
This adds a PersistentDatastore interface which allows datastores
to report DiskUsage().

It implements the interface on all wrapper types, which return
0 if the wrapped datastore does not provide this method.

Related: ipfs/kubo#4550
@hsanjuan
Contributor Author

hsanjuan commented May 9, 2018

See #5010

@hsanjuan hsanjuan closed this May 9, 2018
@ghost ghost removed the status/in-progress In progress label May 9, 2018
@Stebalien Stebalien deleted the feat/repo-size branch February 28, 2019 22:01