Garbage collect pages allocated after minimum txid #3

heyitsanthony · 2017-06-19T22:36:09Z

Read txns would lock pages allocated after the txn, keeping those pages
off the free list until closing the read txn. Instead, track allocating
txid to compute page lifetime, freeing pages if all txns between
page allocation and page free are closed.

Read txns would lock pages allocated after the txn, keeping those pages off the free list until closing the read txn. Instead, track allocating txid to compute page lifetime, freeing pages if all txns between page allocation and page free are closed.

xiang90 · 2017-06-23T18:32:24Z

LGTM.

jpbetz · 2017-07-17T22:58:24Z

@heyitsanthony @xiang90 Just wanted to double check my understanding of this improvement-

Bolt tracks all the pages freed by each update transaction, and put them in a "pending" state until all ongoing read transactions started at a transaction id lower than the update transaction's id are closed. And only after those read transactions are closed will they be marked as "freed" and become eligible for allocation.

But some pages have lifecycles ("starting" with allocate and ending with "free") that do not overlap with any of the ongoing read transactions, and so cannot be "seen" by any of those read transactions, and don't need for those reads, even if they started on earlier transaction ids, to be closed before they can be freed. So this fix frees them at the earliest opportunity.

Is that roughly accurate? Are there any other improvements or "pending" page leaks that this also fixes that I'm not seeing?

Thanks for working on this.

heyitsanthony · 2017-07-17T23:15:32Z

@jpbetz pretty much. Boltdb uses the lowest open txn id as the watermark for GC so a long-standing read txn will lock pages allocated+freed after that txn was opened, even if the pages aren't reachable from any open txn. The fix involves tracking the allocation revisions for pending pages to compute the page lifetime so they can be safely freed early.

There's still an opportunity for more GC here, but I couldn't figure out the allocation path to get it to work. Namely, some pages will be passed to free without being marked with a revision on allocation (see the code around https://github.com/coreos/bbolt/pull/3/files#diff-238a035220b760d4181155d23414dc1fR136), so there's an allocation somewhere that's not being tracked. Right now it falls back to the old txn id watermarking.

jpbetz · 2017-07-18T17:09:56Z

Thanks @heyitsanthony. This sounds like it might resolve some "mvcc: database space exceeded" issues we've seen.

Do you happen to know of are specific etcd operations that would run long read transactions? Does etcd already have any instrumentation to track the # of boltdb pages in pending state or the lifespans of read transactions? I'm considering adding some instrumentation to help track this sort of thing.

heyitsanthony · 2017-07-18T17:17:36Z

@jpbetz backup snapshots and snapshots sent over raft for recovering/new members will keep read-only txns open for a while. There's a metric in mvcc/backend/metrics.go that tracks the snapshot duration already, but nothing that tracks any internal boltdb allocation information (aside from db size) since it's hidden from other packages.

xiang90 merged commit ad39960 into etcd-io:master Jun 24, 2017

jpbetz mentioned this pull request Jul 18, 2017

Backport #8009 into 3.0? etcd-io/etcd#8253

Closed

jpbetz mentioned this pull request Nov 2, 2017

vendor: Backport bbolt freelist corruption and fragmentation fixes to 3.2 branch etcd-io/etcd#8813

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Garbage collect pages allocated after minimum txid #3

Garbage collect pages allocated after minimum txid #3

heyitsanthony commented Jun 19, 2017

xiang90 commented Jun 23, 2017

jpbetz commented Jul 17, 2017 •

edited

Loading

heyitsanthony commented Jul 17, 2017

jpbetz commented Jul 18, 2017 •

edited

Loading

heyitsanthony commented Jul 18, 2017

Garbage collect pages allocated after minimum txid #3

Garbage collect pages allocated after minimum txid #3

Conversation

heyitsanthony commented Jun 19, 2017

xiang90 commented Jun 23, 2017

jpbetz commented Jul 17, 2017 • edited Loading

heyitsanthony commented Jul 17, 2017

jpbetz commented Jul 18, 2017 • edited Loading

heyitsanthony commented Jul 18, 2017

jpbetz commented Jul 17, 2017 •

edited

Loading

jpbetz commented Jul 18, 2017 •

edited

Loading