Splitstore: proposal for MaxHotBytesTarget and space adaptive GC #10388

Closed
5 of 15 tasks
ZenGround0 opened this issue Mar 3, 2023 · 0 comments

Labels: kind/enhancement

ZenGround0 (Contributor) commented Mar 3, 2023

Checklist

  • This is not a new feature or an enhancement to the Filecoin protocol. If it is, please open an FIP issue.
  • This is not a new feature request. If it is, please file a feature request instead.
  • This is not brainstorming ideas. If you have an idea you'd like to discuss, please open a new discussion on the lotus forum and select the category as Ideas.
  • I have a specific, actionable, and well motivated improvement to propose.

Lotus component

  • lotus daemon - chain sync
  • lotus miner - mining and block production
  • lotus miner/worker - sealing
  • lotus miner - proving(WindowPoSt)
  • lotus miner/market - storage deal
  • lotus miner/market - retrieval deal
  • lotus miner/market - data transfer
  • lotus client
  • lotus JSON-RPC API
  • lotus message management (mpool)
  • Other

Improvement Suggestion

Breakdown of splitstore problems and needs

Here is a useful breakdown from @f8-ptrk categorizing splitstore problems:

three things are important for us:
a) we know its upper boundaries in resource usage
b) we can rely on it not causing the node to fall out of sync
c) we can rely on it to have everything available for sealing/deals

Point c) is about soundness of compaction. There is no evidence of problems with this today.

Point b) is partially addressed in #10387 by limiting the worker goroutines running flatten operations. It's possible (likely?) that problems remain here; to the extent they do, the bottleneck is probably contention between GC and block sync reads/writes at the badger level. We need to keep monitoring this issue, and if it persists we can revisit designing something that prioritizes block sync over GC. The easier solutions involve coarsely shutting off badger GC / compaction entirely after somehow detecting that the node is out of sync (sketched below). Harder solutions might involve structurally performing compaction's badger access at a lower priority than chain sync access; it's unclear whether that is possible.
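As a rough illustration of the coarse approach only, here is a minimal Go sketch that skips GC while the node is out of sync; `gcController`, `isOutOfSync`, and `runOnlineGC` are hypothetical names, not existing Lotus APIs.

```go
package splitstore

import (
	"context"
	"log"
	"time"
)

// gcController models just the pieces of splitstore GC relevant to this
// sketch. Both function fields are hypothetical hooks, not existing Lotus APIs.
type gcController struct {
	isOutOfSync func() bool                     // e.g. chain head older than some cutoff
	runOnlineGC func(ctx context.Context) error // badger online GC at the configured threshold
}

// gcLoop periodically runs online GC, but skips it entirely while the node is
// out of sync so GC cannot compete with block sync for badger reads/writes.
func (c *gcController) gcLoop(ctx context.Context) {
	ticker := time.NewTicker(10 * time.Minute)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if c.isOutOfSync() {
				continue // coarse shutoff: chain sync gets priority over GC
			}
			if err := c.runOnlineGC(ctx); err != nil {
				log.Printf("online GC failed: %v", err)
			}
		}
	}
}
```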

Failures related to point a) present as disk usage exceeding the space on the device and crashing the daemon. This proposal is about making that failure mode much harder to reach.

Fixing disk overflows

Measuring splitstore needs

The first thing we can do is expose the size of the splitstore's last marked hotstore set, so that users know the absolute minimum space requirement. We can also keep a record of the size of the purged set so users can get a sense of the high-water mark. Together with the gc command in #10387, this information tells users roughly how much garbage is in the store.
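A minimal sketch of what recording those measurements could look like; the type and field names are hypothetical, not existing Lotus code.

```go
package splitstore

import "sync/atomic"

// compactionStats records byte sizes measured during the last compaction so
// they can be exposed to operators (e.g. via a status or gc command). The
// names here are illustrative only.
type compactionStats struct {
	markedHotBytes int64 // live bytes reachable in the last marking walk: the minimum hotstore requirement
	purgedBytes    int64 // bytes removed by the last purge
}

// HighWaterBytes approximates the hotstore size just before the last GC:
// everything that was live plus everything that was about to be purged.
func (s *compactionStats) HighWaterBytes() int64 {
	return atomic.LoadInt64(&s.markedHotBytes) + atomic.LoadInt64(&s.purgedBytes)
}
```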

We can measure the marked set size without increasing compaction load by tweaking the walk functions to sum the bytes of all loaded blocks and return the count. To get the full picture we would also need to load the sizes of the blocks in the purged graph, which would bring discard post-walk processing close to universal post-walk processing in cost.
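A sketch of the walk-side measurement, assuming a simplified visitor signature rather than the real splitstore walk code:

```go
package splitstore

import (
	"sync/atomic"

	blocks "github.com/ipfs/go-block-format"
	cid "github.com/ipfs/go-cid"
)

// sizeCountingVisitor wraps an existing walk visitor and sums the raw byte
// size of every block it sees. The visitor signature here is a simplification,
// used only to illustrate the idea.
type sizeCountingVisitor struct {
	inner func(c cid.Cid, b blocks.Block) error
	bytes int64
}

func (v *sizeCountingVisitor) visit(c cid.Cid, b blocks.Block) error {
	atomic.AddInt64(&v.bytes, int64(len(b.RawData())))
	return v.inner(c, b)
}

// MarkedBytes returns the total size of all blocks visited so far, i.e. the
// measured size of the marked hotstore set.
func (v *sizeCountingVisitor) MarkedBytes() int64 {
	return atomic.LoadInt64(&v.bytes)
}
```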

Automating avoidance of a max value

The next thing we can do is use the measurement of the marked graph together with the existing total badger size to determine whether a moving GC (whose peak overhead is roughly the current hotstore size + the marked hotstore size) would overload the disk. We can then modify hotstore GC behavior to never cross that boundary by 1) pre-emptively using moving GC when we are getting close to the point where a move would overload the disk, and 2) once we are past that point, forcing a more aggressive online GC with a lower threshold. A decision sketch follows below.
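A sketch of that decision, assuming we have the marked-set and current hotstore byte measurements plus the MaxHotBytesTarget proposed below; the 20% margin is an illustrative guess:

```go
package splitstore

// gcPlan decides which hotstore GC strategy to run. A moving GC temporarily
// needs roughly the current hotstore size plus the marked (live) set, since
// live blocks are copied into a fresh store before the old one is deleted.
// The margin below is illustrative, not existing Lotus behaviour.
func gcPlan(currentHotBytes, markedHotBytes, maxHotBytesTarget int64) string {
	movingGCPeak := currentHotBytes + markedHotBytes

	if movingGCPeak > maxHotBytesTarget {
		// Past the point where a move fits under the target: force a more
		// aggressive online GC with a lower vlog discard threshold.
		return "online-aggressive"
	}
	if movingGCPeak > maxHotBytesTarget*8/10 {
		// Getting close to the boundary (within ~20%): run a moving GC
		// pre-emptively while it still fits.
		return "moving"
	}
	// Plenty of headroom: normal online GC is fine.
	return "online"
}
```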

While we could use the filesystem's remaining space as our target, it will likely be friendlier for operators to let them specify a new config value, MaxHotBytesTarget, in the splitstore config.
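For illustration only, the option could sit next to the existing splitstore settings roughly like this; the surrounding fields are a loose paraphrase of the splitstore config, not a verbatim copy of the Lotus config package:

```go
package config

// SplitstoreConfigSketch shows where the proposed MaxHotBytesTarget could
// live. Only MaxHotBytesTarget is part of this proposal; the other fields
// loosely mirror existing splitstore settings.
type SplitstoreConfigSketch struct {
	ColdStoreType string // e.g. "universal" or "discard"
	HotStoreType  string // currently badger

	// MaxHotBytesTarget (proposed) is the total on-disk size, in bytes, that
	// the hotstore should never exceed; 0 disables the limit. GC strategy is
	// chosen so that even a moving GC stays under this target.
	MaxHotBytesTarget uint64
}
```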

More ideas along these lines

Other ideas we could pursue later include 1) ramping down the online GC threshold if we are not collecting enough garbage during online GC, and 2) estimating the online GC threshold from the measurement total live block size / total badger datastore size = 1 - g, where g is the average garbage fraction of badger vlogs. This last idea would benefit from a deeper investigation of how garbage fractions are distributed across badger vlogs; they are probably normally distributed, but the story may be more complicated. With a good understanding of this distribution we could estimate what threshold value to set in order to delete x percent of garbage, what overhead in time / bytes loaded each threshold value would cause, and the expected size of a GC walk before we hit a vlog with a lower fraction and terminate.
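A sketch of idea 2), assuming we can measure live and total bytes; the 0.75 scaling and the clamping bounds are illustrative guesses, not measured values:

```go
package splitstore

// estimateGCThreshold derives an online GC discard threshold from measured
// sizes. If liveBytes/totalBytes = 1 - g, then g = 1 - liveBytes/totalBytes is
// the average garbage fraction of badger's vlogs; setting the threshold
// somewhat below g should let GC reclaim most vlogs.
func estimateGCThreshold(liveBytes, totalBytes int64) float64 {
	if totalBytes <= 0 || liveBytes >= totalBytes {
		return 0.5 // nothing measured or no garbage: fall back to a middling default
	}
	g := 1 - float64(liveBytes)/float64(totalBytes) // average garbage fraction

	threshold := 0.75 * g // aim below the mean so most vlogs qualify for GC
	if threshold < 0.01 {
		threshold = 0.01
	}
	if threshold > 0.9 {
		threshold = 0.9
	}
	return threshold
}
```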
