Fix zettacache allocation cursor on startup (openzfs#17)
On startup, the allocation cursor for each of the buckets within the
zettacache is set incorrectly. Rather than setting the cursor to the
most free slab in each bucket, we inadvertently set it to the last slab
loaded from disk for that bucket.

This means that on startup, if the last slab loaded for a bucket
happens to be full and at the end of the sorted slab list, the next
allocation will convert a new slab rather than being satisfied from a
partially allocated slab already in the bucket.

Further, it means that if that new slab is filled with new allocations
within a single checkpoint, allocation failures will result, even if the
bucket has lots of free space available.

This error is easy to make, because a SlabAllocationBuckets can be in a
half-constructed state, where its SortedSlabs don't yet contain all the
slabs and their last_allocated cursors are bogus.

The fix implemented here is to no longer allow a half-constructed
SlabAllocationBuckets; instead, its new() method now takes sufficient
parameters to put the object into a valid state that accurately
represents what's on disk, before the object is returned to the caller.
Prakash Surya authored Dec 1, 2021
1 parent 660bda7 commit 7906583
Showing 1 changed file with 21 additions and 12 deletions.
cmd/zfs_object_agent/zettacache/src/block_allocator.rs
@@ -13,7 +13,7 @@ use std::collections::{BTreeMap, BTreeSet, HashSet};
 use std::ops::Bound::*;
 use std::sync::Arc;
 use std::time::Instant;
-use std::{fmt, iter, mem};
+use std::{fmt, mem};
 use util::BitmapRangeIterator;
 use util::RangeTree;
 use util::{get_tunable, TerseVec};
@@ -770,19 +770,22 @@ impl SortedSlabs {
 struct SlabAllocationBuckets(BTreeMap<u32, SortedSlabs>);
 
 impl SlabAllocationBuckets {
-    fn new(phys: SlabAllocationBucketsPhys) -> Self {
+    fn new(
+        phys: SlabAllocationBucketsPhys,
+        mut slabs: BTreeMap<u32, Vec<SortedSlabEntry>>,
+    ) -> Self {
         let mut buckets = BTreeMap::new();
         for (max_size, is_extent_based) in phys.buckets {
-            buckets.insert(max_size, SortedSlabs::new(is_extent_based, iter::empty()));
+            buckets.insert(
+                max_size,
+                SortedSlabs::new(is_extent_based, slabs.remove(&max_size).unwrap_or_default()),
+            );
         }
-        SlabAllocationBuckets(buckets)
-    }
 
-    fn add_slab_to_bucket(&mut self, bucket: u32, slab: &Slab) {
-        self.0
-            .get_mut(&bucket)
-            .unwrap()
-            .insert(slab.to_sorted_slab_entry());
+        // We expect for all slabs passed in to be consumed and added to a bucket.
+        assert!(slabs.is_empty());
+
+        SlabAllocationBuckets(buckets)
     }
 
     fn get_bucket_for_allocation_size(&mut self, request_size: u32) -> (&u32, &mut SortedSlabs) {
@@ -986,11 +989,15 @@ impl BlockAllocator {
 
         let mut available_space = 0u64;
         let mut free_slabs = Vec::new();
-        let mut slab_buckets = SlabAllocationBuckets::new(phys.slab_buckets);
+        let mut slabs_by_bucket: BTreeMap<u32, Vec<SortedSlabEntry>> = BTreeMap::new();
+
         for slab in slabs.0.iter() {
             match &slab.info {
                 SlabType::BitmapBased(_) | SlabType::ExtentBased(_) => {
-                    slab_buckets.add_slab_to_bucket(slab.get_max_size(), slab);
+                    slabs_by_bucket
+                        .entry(slab.get_max_size())
+                        .or_default()
+                        .push(slab.to_sorted_slab_entry());
                     available_space += slab.get_free_space();
                 }
                 SlabType::Free(_) => {
@@ -1000,6 +1007,8 @@ impl BlockAllocator {
             }
         }
 
+        let slab_buckets = SlabAllocationBuckets::new(phys.slab_buckets, slabs_by_bucket);
+
         info!(
             "loaded BlockAllocator metadata in {}ms",
             begin.elapsed().as_millis()
