Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Garbage collect pages allocated after minimum txid #3

Merged
merged 1 commit into from
Jun 24, 2017

Conversation

heyitsanthony
Copy link
Contributor

Read txns would lock pages allocated after the txn, keeping those pages
off the free list until closing the read txn. Instead, track allocating
txid to compute page lifetime, freeing pages if all txns between
page allocation and page free are closed.

Read txns would lock pages allocated after the txn, keeping those pages
off the free list until closing the read txn. Instead, track allocating
txid to compute page lifetime, freeing pages if all txns between
page allocation and page free are closed.
@xiang90
Copy link
Contributor

xiang90 commented Jun 23, 2017

LGTM.

@xiang90 xiang90 merged commit ad39960 into etcd-io:master Jun 24, 2017
@jpbetz
Copy link
Contributor

jpbetz commented Jul 17, 2017

@heyitsanthony @xiang90 Just wanted to double check my understanding of this improvement-

Bolt tracks all the pages freed by each update transaction, and put them in a "pending" state until all ongoing read transactions started at a transaction id lower than the update transaction's id are closed. And only after those read transactions are closed will they be marked as "freed" and become eligible for allocation.

But some pages have lifecycles ("starting" with allocate and ending with "free") that do not overlap with any of the ongoing read transactions, and so cannot be "seen" by any of those read transactions, and don't need for those reads, even if they started on earlier transaction ids, to be closed before they can be freed. So this fix frees them at the earliest opportunity.

Is that roughly accurate? Are there any other improvements or "pending" page leaks that this also fixes that I'm not seeing?

Thanks for working on this.

@heyitsanthony
Copy link
Contributor Author

@jpbetz pretty much. Boltdb uses the lowest open txn id as the watermark for GC so a long-standing read txn will lock pages allocated+freed after that txn was opened, even if the pages aren't reachable from any open txn. The fix involves tracking the allocation revisions for pending pages to compute the page lifetime so they can be safely freed early.

There's still an opportunity for more GC here, but I couldn't figure out the allocation path to get it to work. Namely, some pages will be passed to free without being marked with a revision on allocation (see the code around https://github.com/coreos/bbolt/pull/3/files#diff-238a035220b760d4181155d23414dc1fR136), so there's an allocation somewhere that's not being tracked. Right now it falls back to the old txn id watermarking.

@jpbetz
Copy link
Contributor

jpbetz commented Jul 18, 2017

Thanks @heyitsanthony. This sounds like it might resolve some "mvcc: database space exceeded" issues we've seen.

Do you happen to know of are specific etcd operations that would run long read transactions? Does etcd already have any instrumentation to track the # of boltdb pages in pending state or the lifespans of read transactions? I'm considering adding some instrumentation to help track this sort of thing.

@heyitsanthony
Copy link
Contributor Author

@jpbetz backup snapshots and snapshots sent over raft for recovering/new members will keep read-only txns open for a while. There's a metric in mvcc/backend/metrics.go that tracks the snapshot duration already, but nothing that tracks any internal boltdb allocation information (aside from db size) since it's hidden from other packages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants