backfiller: Index/Column backfiller ResumeSpans need to be reworked #36601
This is not going to be straightforward to implement, because we read the checkpoint at the gateway before each backfill chunk to determine the next chunk to run the evaluation over (1-2 minutes). It probably makes the most sense to extend the chunk time to something like 15 minutes and set up a timer to ping the schema lease from the gateway?
Yeah, that's what I'm looking at doing: moving the coordinator to run the lease extender and distsql flows in parallel goroutines so the flow can run as long as it wants, then making the checkpoint interval control how long we go between writing checkpoints without forcing the processor to exit after each one. Even better might be to have the processors emit rows with finished spans and let the gateway coalesce those / choose when to checkpoint, since that would make it easy to frequency-cap, but I think just making the existing flow run longer between checkpoints would be a good first step.
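For context, here is a minimal sketch of that arrangement, assuming hypothetical function names (not the actual CockroachDB code): the distsql flow and a lease-extending ticker run in parallel goroutines, and the extender stops once the flow completes.

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/sync/errgroup"
)

// extendLease and runBackfillFlow are hypothetical stand-ins for extending the
// schema lease and running the distsql backfill flow; they are not the real
// CockroachDB functions.
func extendLease(ctx context.Context) error { fmt.Println("lease extended"); return nil }

func runBackfillFlow(ctx context.Context) error {
	time.Sleep(3 * time.Second) // pretend the flow runs for a while
	return nil
}

func main() {
	g, ctx := errgroup.WithContext(context.Background())
	flowDone := make(chan struct{})

	// The distsql flow runs for as long as it needs.
	g.Go(func() error {
		defer close(flowDone)
		return runBackfillFlow(ctx)
	})

	// In parallel, keep the schema lease alive until the flow finishes.
	g.Go(func() error {
		ticker := time.NewTicker(time.Second) // would be minutes in practice
		defer ticker.Stop()
		for {
			select {
			case <-flowDone:
				return nil
			case <-ctx.Done():
				return ctx.Err()
			case <-ticker.C:
				if err := extendLease(ctx); err != nil {
					return err
				}
			}
		}
	})

	if err := g.Wait(); err != nil {
		fmt.Println("backfill failed:", err)
	}
}
```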
You'll also have to update the logic on the gateway that updates the progress, since we don't want % progress to be refreshed only once every 15 minutes (or whatever interval you choose).
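One way to picture that decoupling, sketched with made-up intervals (scaled down so the example terminates quickly; the real cadences would be seconds vs. minutes): the gateway writes the cheap fraction-completed update often and the expensive resume-span checkpoint rarely.

```go
package main

import (
	"fmt"
	"time"
)

// Two cadences on the gateway: a small, frequent %-progress write and a large,
// infrequent resume-span checkpoint. Intervals here are illustrative only.
func main() {
	progressTick := time.NewTicker(200 * time.Millisecond)   // stands in for ~30s
	checkpointTick := time.NewTicker(600 * time.Millisecond) // stands in for ~15m
	defer progressTick.Stop()
	defer checkpointTick.Stop()

	deadline := time.After(1300 * time.Millisecond)
	for {
		select {
		case <-progressTick.C:
			fmt.Println("write fraction completed only (small, cheap)")
		case <-checkpointTick.C:
			fmt.Println("rewrite resume spans (large, rare)")
		case <-deadline:
			return
		}
	}
}
```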
@spaskob via email: @danhhz pointed me towards the stats Sampler processor, which sends back progress info like we discussed, but via distsql flow metadata as opposed to actual rows: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/distsqlrun/sampler.go#L214 This looks like a nice, clean way to do it compared to sticking encoded bytes in the rows themselves. I'm not quite sure how to get at this metadata from outside the flow, so doing it this way might require inserting an extra processor into the flow itself (or tweaking the API on the receiver), so it might not be quite drop-in for the current backfiller flow. I'm looking at this for IMPORT and I'll let you know what I find.
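Sketching the shape of that idea with stand-in types (these are not the real distsqlrun types or APIs; the actual Sampler code linked above differs in detail): a processor pushes the spans it has finished as metadata alongside its row stream, rather than encoding them into result rows.

```go
package main

import "fmt"

// ProgressMetadata is a hypothetical stand-in for the metadata record a
// processor could emit alongside (not inside) its row stream.
type ProgressMetadata struct {
	CompletedSpans []string // key spans finished by this processor
	RowsProcessed  int64
}

// metadataSink abstracts "push metadata back to the gateway".
type metadataSink interface {
	PushMetadata(ProgressMetadata)
}

// backfillChunk pretends to process one chunk and then reports the finished
// span via metadata instead of encoding it into result rows.
func backfillChunk(span string, rows int64, sink metadataSink) {
	// ... the actual backfill work for `span` would happen here ...
	sink.PushMetadata(ProgressMetadata{
		CompletedSpans: []string{span},
		RowsProcessed:  rows,
	})
}

type printSink struct{}

func (printSink) PushMetadata(m ProgressMetadata) {
	fmt.Printf("gateway saw progress: %+v\n", m)
}

func main() {
	backfillChunk("/Table/53/2/{a-m}", 1000, printSink{})
}
```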
Before this change, index backfill workers would write their updated spans concurrently to the job's row, which can lead to bloat in the jobs table. This PR is a first step in improving this: instead of updating the finished spans in each worker, we propagate this info to the gateway node and do bulk updates there. The next step is to let the workers run longer by not canceling them at 2 minutes, which is currently done for historical reasons. Fixes cockroachdb#36601. Release note: None
39773: jobs: improve the status reporting of index backfill jobs r=spaskob a=spaskob Before this change, index backfill workers would write their updated spans concurrently to the job's row, which can lead to bloat in the jobs table. This PR is a first step in improving this: instead of updating the finished spans in each worker, we propagate this info to the gateway node and do bulk updates there. The next step is to let the workers run longer by not canceling them at 2 minutes, which is currently done for historical reasons. Fixes #36601. Release note: None Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
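A minimal sketch of the gateway-side coalescing this PR describes, using hypothetical names (the real coordinator lives in the schema-change backfill code): workers report finished spans, and the gateway folds them into one bulk jobs-table update per interval.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// spanCoalescer collects finished spans reported by backfill workers and
// flushes them to the job record in one bulk update per interval.
type spanCoalescer struct {
	mu       sync.Mutex
	finished []string
	flush    func(spans []string) // e.g. a single jobs-table update per call
}

func (c *spanCoalescer) add(span string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.finished = append(c.finished, span)
}

func (c *spanCoalescer) run(stop <-chan struct{}, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			c.flushNow() // final flush on shutdown
			return
		case <-ticker.C:
			c.flushNow()
		}
	}
}

func (c *spanCoalescer) flushNow() {
	c.mu.Lock()
	spans := c.finished
	c.finished = nil
	c.mu.Unlock()
	if len(spans) > 0 {
		c.flush(spans)
	}
}

func main() {
	c := &spanCoalescer{flush: func(s []string) { fmt.Println("bulk update:", s) }}
	stop := make(chan struct{})
	go c.run(stop, 100*time.Millisecond)
	c.add("/Table/53/2/{a-m}")
	c.add("/Table/53/2/{m-z}")
	time.Sleep(200 * time.Millisecond)
	close(stop)
	time.Sleep(50 * time.Millisecond)
}
```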
Currently, if a new node participates in a cluster with old-version nodes, the index backfill will busy-wait, since the different versions use different ways of reporting status. We detect this situation and revert to the old status-reporting style if necessary. Touches cockroachdb#36601. Release note: None
40288: jobs: make new index backfill status tracking bwd compatible r=spaskob a=spaskob Currently, if a new node participates in a cluster with old-version nodes, the index backfill will busy-wait, since the different versions use different ways of reporting status. We detect this situation and revert to the old status-reporting style if necessary. Touches #36601. Release note: None Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
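The fallback can be sketched as a simple gate on the active cluster version. The version numbers and function names below are illustrative, not the real cluster-version keys CockroachDB uses.

```go
package main

import "fmt"

// clusterSupportsMetadataProgress is a hypothetical stand-in for checking the
// active cluster version; mixed-version clusters must keep working, so new
// behavior is gated on the version all nodes have agreed to.
func clusterSupportsMetadataProgress(activeVersion, requiredVersion int) bool {
	return activeVersion >= requiredVersion
}

func reportProgress(activeVersion int) {
	const requiredVersion = 20 // illustrative version number, not a real one
	if clusterSupportsMetadataProgress(activeVersion, requiredVersion) {
		fmt.Println("new path: workers stream finished spans to the gateway")
		return
	}
	// Old-version nodes may be participating: fall back to the pre-change
	// behavior of each worker writing its own resume spans.
	fmt.Println("fallback: per-worker resume-span updates")
}

func main() {
	reportProgress(19)
	reportProgress(21)
}
```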
I’ve been looking at the chunk backfiller stuff a bit this morning/afternoon and I’m a bit worried about it / resume spans.
ResumeSpans are currently the to-do spans for the backfiller, and are stored in the job details. They start as a single span covering the whole table, but the partitioned processors each update them, splitting them into many subspans as they complete individual subspans.
This is similar to the pattern that previously got us in trouble with RESTORE and IMPORT: repeatedly rewriting the job details to mark spans done/not done. In addition to the normal load/contention concerns, the particular issue was that when keys included large fields (e.g. a longer string), these spans could be large, and repeatedly rewriting all of them on every update resulted in a large amount of MVCC revision data for a single job record -- indeed, a single job row could easily exceed the max range size and, since ranges cannot split within a row, cause oversize ranges.

The conclusion from that was that job details and job progress need to be separate: the job details can be large and document the definition of the job, what work it involves, etc. Execution of the job should not need to rewrite the details as it makes progress. The job progress is what can be rewritten often, but must not be large.
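To make the separation concrete, a toy sketch of the two records (field names are illustrative, not the actual jobs-package protobufs): details are large and effectively write-once, progress is small and rewritten often.

```go
package main

import "fmt"

// jobDetails is large and documents what the job is; it should not be
// rewritten as the job makes progress.
type jobDetails struct {
	Statement   string // the schema change that defines the job
	TargetTable string
	TotalSpan   string // the whole span the backfill must cover
}

// jobProgress is rewritten frequently, so it must stay small.
type jobProgress struct {
	FractionCompleted float32  // cheap to rewrite often
	ResumeSpans       []string // must stay small; rewritten only at checkpoints
}

func main() {
	d := jobDetails{Statement: "CREATE INDEX ...", TargetTable: "t", TotalSpan: "/Table/53/2"}
	p := jobProgress{FractionCompleted: 0.25, ResumeSpans: []string{"/Table/53/2/{m-z}"}}
	fmt.Printf("details (written once): %+v\nprogress (rewritten often): %+v\n", d, p)
}
```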
We should eventually move the frequently rewritten done/not done spans into the job Progress field where they belong, and consider whether we can find a cheaper representation (maybe track more in-memory, with only a lossy representation persisted, where the loss only matters if you need to resume from that persisted state).
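One possible lossy scheme, sketched under the assumption that redoing a little work after a restart is acceptable (the backfill is idempotent): keep the exact set of completed chunks in memory, but persist only a contiguous low-water mark. This is an illustration of the idea, not CockroachDB's implementation.

```go
package main

import "fmt"

// frontier tracks completed chunks precisely in memory but persists only a
// contiguous low-water mark. Chunks finished ahead of the mark are "lost" on
// restart and simply redone.
type frontier struct {
	completed map[int]bool
	persisted int // highest index i such that every chunk <= i is done
}

func newFrontier() *frontier {
	return &frontier{completed: map[int]bool{}, persisted: -1}
}

func (f *frontier) markDone(chunk int) { f.completed[chunk] = true }

// checkpoint advances and returns the durable low-water mark.
func (f *frontier) checkpoint() int {
	for f.completed[f.persisted+1] {
		delete(f.completed, f.persisted+1)
		f.persisted++
	}
	return f.persisted
}

func main() {
	f := newFrontier()
	for _, c := range []int{0, 1, 3, 4} { // chunk 2 is still in flight
		f.markDone(c)
	}
	fmt.Println("persisted mark:", f.checkpoint()) // 1: chunks 3 and 4 not durable yet
	f.markDone(2)
	fmt.Println("persisted mark:", f.checkpoint()) // 4
}
```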
In the short term, though, we should rewrite ResumeSpans less often: a given processor should be able to complete all the spans assigned to it before it is forced to rewrite the whole set of resume spans, and it should do time-based checkpoints less often too.
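A sketch of that short-term behavior, with hypothetical names: a processor defers the resume-span rewrite until it finishes everything assigned to it, or until a long minimum interval has elapsed.

```go
package main

import (
	"fmt"
	"time"
)

// chunkCheckpointer defers resume-span rewrites: the processor checkpoints
// only when it has finished everything assigned to it, or when a (long)
// minimum interval has elapsed. Illustrative sketch only.
type chunkCheckpointer struct {
	assigned       int
	done           int
	lastCheckpoint time.Time
	minInterval    time.Duration
	writeResume    func(remaining int) // rewrites the job's resume spans
}

func (c *chunkCheckpointer) finishChunk() {
	c.done++
	allDone := c.done == c.assigned
	if !allDone && time.Since(c.lastCheckpoint) < c.minInterval {
		return // keep working; don't rewrite the resume spans yet.
	}
	c.writeResume(c.assigned - c.done)
	c.lastCheckpoint = time.Now()
}

func main() {
	c := &chunkCheckpointer{
		assigned:       3,
		lastCheckpoint: time.Now(),
		minInterval:    10 * time.Minute,
		writeResume:    func(n int) { fmt.Println("checkpoint; spans remaining:", n) },
	}
	c.finishChunk() // skipped: too soon, and work remains
	c.finishChunk() // skipped
	c.finishChunk() // written: all assigned spans are done
}
```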