pageserver: clean up stray VM pages #9927

Closed
erikgrinaker opened this issue Nov 28, 2024 · 3 comments
Labels: c/storage/pageserver (Component: storage: pageserver)

erikgrinaker (Contributor) commented Nov 28, 2024

In #9855 and #9914, we saw that ClearVmBits updates may be applied on shards that aren't responsible for VM pages. This leaves stray writes to these keys, and the resulting values may be incomplete, since those shards will not have seen the WAL records for the pages. We should clean up these keys to avoid them cropping up again later and causing problems.
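For context, a minimal sketch of the ingest-side situation (hypothetical: vm_page_key and apply_clear_vm_bits are illustrative placeholders, and only is_key_local corresponds to the ShardIdentity method quoted further down). Every shard sees the ClearVmBits delta, but only the shard that owns the VM page key also ingests the page images needed to reconstruct it:

// Hypothetical sketch, not the actual pageserver ingest path.
fn ingest_clear_vm_bits(shard: &ShardIdentity, rel: RelTag, heap_blkno: u32) {
    // Key of the visibility-map page covering this heap block (placeholder helper).
    let vm_key = vm_page_key(rel, heap_blkno);
    if shard.is_key_local(&vm_key) {
        // The owning shard also ingests the VM page images, so this delta can
        // later be reconstructed against a base image.
        apply_clear_vm_bits(&vm_key, heap_blkno);
    }
    // If a stray shard applies the delta anyway (the current behaviour this
    // issue is about), it writes a key it can never fully reconstruct.
}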

Bonus question: if we only apply ClearVmBits but not the page images on stray shards, why don't these error out during compaction when we presumably try to reconstruct the VM pages?

erikgrinaker (Contributor, Author) commented Nov 29, 2024

This is already done during compaction, which removes relation blocks that don't belong to the shard. This also explains why these stray keys didn't cause compaction failures.

/// Return true if the key should be discarded if found in this shard's
/// data store, e.g. during compaction after a split.
///
/// Shards _may_ drop keys which return false here, but are not obliged to.
pub fn is_key_disposable(&self, key: &Key) -> bool {
    if key_is_shard0(key) {
        // Q: Why can't we dispose of shard0 content if we're not shard 0?
        // A1: because the WAL ingestion logic currently ingests some shard 0
        //     content on all shards, even though it's only read on shard 0. If we
        //     dropped it, then subsequent WAL ingest to these keys would encounter
        //     an error.
        // A2: because key_is_shard0 also covers relation size keys, which are written
        //     on all shards even though they're only maintained accurately on shard 0.
        false
    } else {
        !self.is_key_local(key)
    }
}
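For illustration, a rough sketch of how a compaction pass can use this for best-effort cleanup (hypothetical helper: keys stands in for whatever key stream compaction walks, and only is_key_disposable is the method quoted above):

// Hypothetical sketch of best-effort cleanup while rewriting layers during compaction.
fn retained_keys<'a>(
    shard: &'a ShardIdentity,
    keys: impl Iterator<Item = Key> + 'a,
) -> impl Iterator<Item = Key> + 'a {
    keys.filter(move |key| {
        // Stray VM page keys on a non-owning shard are not shard-0 content,
        // so is_key_disposable returns true for them and they are dropped here.
        !shard.is_key_disposable(key)
    })
}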

erikgrinaker (Contributor, Author) commented:

@jcsp indicated in #9786 (comment) that it's probably too expensive to force a recompaction of all tenants, since it requires downloading all layers from S3. So we'll just do this best-effort during compactions.

Let me know if I misinterpreted you @jcsp.

jcsp (Collaborator) commented Dec 3, 2024

Bonus question: if we only apply ClearVmBits but not the page images on stray shards, why don't these error out during compaction when we presumably try to reconstruct the VM pages?

Speaking from memory: the ShardedRange that we use to build the list of pages to include in compaction excludes them.
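Roughly the idea, as a hypothetical sketch (blocks_in_range is a placeholder rather than the actual ShardedRange API):

// Hypothetical sketch: when planning which pages to reconstruct/image during
// compaction, only locally-owned keys are enumerated, so stray VM pages on a
// non-owning shard are never reconstructed and therefore never fail.
fn pages_to_reconstruct(shard: &ShardIdentity, range: std::ops::Range<Key>) -> Vec<Key> {
    blocks_in_range(range) // placeholder for ShardedRange-style enumeration
        .filter(|key| shard.is_key_local(key))
        .collect()
}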
