
kvserver: consistency check should only checkpoint relevant range #90543

Closed
erikgrinaker opened this issue Oct 24, 2022 · 10 comments · Fixed by #95963
Labels
  • C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)
  • E-quick-win: Likely to be a quick win for someone experienced.
  • N-followup: Needs followup.
  • O-postmortem: Originated from a Postmortem action item.

Comments

erikgrinaker (Contributor) commented Oct 24, 2022

When the consistency checker detects a range inconsistency, it takes a storage checkpoint on all nodes with replicas of the range. The checkpoint is made of hardlinks, so it is initially a cheap copy of the entire database. However, as data is written over time, the checkpoint stops sharing space with the main database and eventually consumes as much space as it, which can run the node out of disk rather rapidly.

We should instead consider checkpointing only the SSTs that are relevant to the replica, to avoid running the node out of disk. The checkpoint should still be exposed as a Pebble database containing those SSTs and the relevant manifest history, so that the usual debug tooling can be used to investigate it.

Note that we specifically don't want to export only the KV pairs of the replica, since we often need the LSM structure for debugging, e.g. for Pebble compaction bugs.
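For context, this is roughly what a checkpoint amounts to at the Pebble level today: a full hardlink-based copy of the store. A minimal standalone sketch (the helper name and paths are illustrative only; CockroachDB goes through its storage engine wrapper rather than opening Pebble directly):

```go
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

// takeFullCheckpoint takes a full, hardlink-based checkpoint of a store.
// Creating it is cheap, but it pins every SST in the store: as compactions
// rewrite the live data, the checkpoint stops sharing space with the main
// database and its disk usage grows toward a full copy.
func takeFullCheckpoint(storeDir, checkpointDir string) error {
	db, err := pebble.Open(storeDir, &pebble.Options{})
	if err != nil {
		return err
	}
	defer db.Close()
	return db.Checkpoint(checkpointDir)
}

func main() {
	// Paths are placeholders; the real consistency checker checkpoints the
	// already-open store engine rather than opening the store separately.
	if err := takeFullCheckpoint("cockroach-data", "auxiliary/checkpoints/r123"); err != nil {
		log.Fatal(err)
	}
}
```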

Jira issue: CRDB-20829

erikgrinaker added the C-enhancement and T-kv-replication labels Oct 24, 2022
blathers-crl bot commented Oct 24, 2022

cc @cockroachdb/replication

erikgrinaker (Contributor, Author) commented:

@jbowens The majority of the work here will be in Pebble; should I write up a separate issue there?

erikgrinaker (Contributor, Author) commented:

@sumeerbhola pointed out that we'll often also want to keep the neighbouring ranges for debugging, even if they aren't immediately adjacent in the keyspace.

blathers-crl bot added the T-storage (Storage Team) label Oct 24, 2022
nicktrav added the O-postmortem label Oct 26, 2022
lunevalex added the N-followup label Nov 21, 2022
RaduBerinde added the E-quick-win label Jan 13, 2023
RaduBerinde (Member) commented:

The pebble side of this is done (see cockroachdb/pebble#2237).
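For reference, a minimal sketch of how a span-restricted checkpoint might be taken with that Pebble change. The option and type names here (WithRestrictToSpans, CheckpointSpan) are my reading of cockroachdb/pebble#2237 and may differ slightly; the key bounds and paths are placeholders:

```go
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

// takeRestrictedCheckpoint takes a checkpoint containing only the SSTs that
// overlap the given key spans. The result is still a normal Pebble database
// with the relevant manifest history, so the usual debug tooling can open it,
// but it no longer pins every SST in the store.
func takeRestrictedCheckpoint(db *pebble.DB, dir string, spans []pebble.CheckpointSpan) error {
	return db.Checkpoint(dir, pebble.WithRestrictToSpans(spans))
}

func main() {
	db, err := pebble.Open("cockroach-data", &pebble.Options{})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
	// Placeholder bounds; in CockroachDB they would be derived from the
	// inconsistent range's descriptor.
	spans := []pebble.CheckpointSpan{{Start: []byte("a"), End: []byte("z")}}
	if err := takeRestrictedCheckpoint(db, "checkpoint-r123", spans); err != nil {
		log.Fatal(err)
	}
}
```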

erikgrinaker (Contributor, Author) commented:

Fantastic, thanks! Will storage be doing the CRDB integration as well, or would you like us (repl) to take it from here?

RaduBerinde (Member) commented:

I think your team might know better how to choose the spans to restrict the checkpoint to (sounds like we want the relevant range plus some ranges around it, plus some internal key space?)

erikgrinaker (Contributor, Author) commented:

Yep, we'll pick it up. Thanks again Radu!

erikgrinaker removed the T-storage (Storage Team) label Jan 16, 2023
pav-kv (Collaborator) commented Jan 25, 2023

@erikgrinaker @sumeerbhola In the checkpoint failure investigation a few months ago, which ranges in the checkpoint (besides the inconsistent one) ended up being useful for finding the bug?

Was it only the checkpoint from the killed node that was useful? For the killed nodes we don't need to worry much about narrowing the checkpoint and can take a full one. For the nodes that survive (and are likely correct), reducing the cost of the checkpoint matters more, so we could take a partial checkpoint of only the range itself. The neighbouring ranges might be completely different across the 3 nodes, so I'm not sure whether including them in the checkpoint is useful. Also, only the raftMu of the range under consideration is held while the checkpoint is taken, so neighbouring ranges (even if they match) will be out of sync across these checkpoints.

So I'm inclined to propose this (sketched below):

  • Take a full checkpoint on the nodes that will be killed.
  • Take a single-range checkpoint on the surviving nodes.
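A rough sketch of that branching, reusing the span-restriction option mentioned above (whose name may differ); the function and the willBeKilled flag are hypothetical, not existing CockroachDB code:

```go
package kvsketch

import "github.com/cockroachdb/pebble"

// takeConsistencyCheckpoint sketches the proposed policy: a node that is
// about to be terminated for the inconsistency keeps a full checkpoint,
// while a surviving (likely correct) node restricts its checkpoint to the
// inconsistent range only.
func takeConsistencyCheckpoint(
	db *pebble.DB, dir string, willBeKilled bool, rangeStart, rangeEnd []byte,
) error {
	if willBeKilled {
		// Full hardlink-based checkpoint: costly over time, but the node is
		// going down anyway and we want maximum debuggability.
		return db.Checkpoint(dir)
	}
	// Surviving node: only pin the SSTs that overlap the inconsistent range.
	spans := []pebble.CheckpointSpan{{Start: rangeStart, End: rangeEnd}}
	return db.Checkpoint(dir, pebble.WithRestrictToSpans(spans))
}
```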

pav-kv (Collaborator) commented Jan 25, 2023

That said, the neighbouring ranges can be useful for debugging split- and merge-related issues. @erikgrinaker @sumeerbhola @tbg Can you think of other reasons or examples why we would extend the checkpoint beyond the single range?

erikgrinaker (Contributor, Author) commented:

> In the checkpoint failure investigation a few months ago, which ranges in the checkpoint (besides the inconsistent one) ended up being useful for finding the bug?

The neighbouring ranges. There was a Pebble range tombstone in a neighbouring range that wasn't properly truncated to the range bounds and therefore extended into the inconsistent range, deleting a bunch of keys that it shouldn't have. We want to preserve the neighbouring ranges so that we can look at interactions across ranges, e.g. with range keys and range tombstones.

> Also, only the raftMu of the range under consideration is held while the checkpoint is taken, so neighbouring ranges (even if they match) will be out of sync across these checkpoints.

The neighbouring ranges are mostly useful for looking at cross-range LSM interactions on a single node, so it's ok if they aren't completely in sync across nodes, as long as they're consistent on each node (which Pebble ensures).

> For the killed nodes we don't need to worry much about narrowing the checkpoint and can take a full one. For the nodes that survive (and are likely correct), reducing the cost of the checkpoint matters more, so we could take a partial checkpoint of only the range itself.

That's true, but I figure we may as well keep it simple and do the same checkpointing across all nodes?
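To make that concrete, a minimal sketch that computes one set of bounds, covering the inconsistent range plus its immediate neighbours, and applies it identically on every node. Everything here except db.Checkpoint and the span-restriction option from pebble#2237 (whose name may differ) is hypothetical plumbing:

```go
package kvsketch

import "github.com/cockroachdb/pebble"

// checkpointSpan widens the inconsistent range's bounds to also cover its
// immediate left and right neighbours, so that cross-range LSM artifacts
// such as range tombstones and range keys that straddle a range boundary
// end up in the checkpoint. A nil neighbour bound means there is no
// neighbour on that side.
func checkpointSpan(leftStart, rangeStart, rangeEnd, rightEnd []byte) pebble.CheckpointSpan {
	start, end := rangeStart, rangeEnd
	if leftStart != nil {
		start = leftStart
	}
	if rightEnd != nil {
		end = rightEnd
	}
	return pebble.CheckpointSpan{Start: start, End: end}
}

// takeCheckpoint applies the same span-restricted checkpoint on every node,
// killed or surviving, keeping the policy uniform and simple.
func takeCheckpoint(db *pebble.DB, dir string, span pebble.CheckpointSpan) error {
	return db.Checkpoint(dir, pebble.WithRestrictToSpans([]pebble.CheckpointSpan{span}))
}
```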

craig bot closed this as completed in b031933 Feb 6, 2023