
backfiller: Index/Column backfiller ResumeSpans need to be reworked #36601

Closed
dt opened this issue Apr 7, 2019 · 4 comments · Fixed by #39773
Assignees
Labels
A-schema-changes C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)

Comments

@dt
Member

dt commented Apr 7, 2019

I’ve been looking at the chunk backfiller stuff a bit this morning/afternoon and I’m a bit worried about it / resume spans.

ResumeSpans are currently the to-do spans for the backfiller, and are stored in the job details. While the list starts as a single span covering the whole table, partitioned processors each update it, splitting it into many subspans as they complete individual chunks of work.

This is similar to the pattern that previously got us in trouble with RESTORE and IMPORT: repeatedly rewriting the job details to mark spans done/not-done. Beyond the usual load/contention concerns, the particular issue was that when keys included large fields (e.g. a longer string), these spans could be large, and repeatedly rewriting all of them on every update resulted in a large amount of MVCC revision data for a single job record -- indeed, a single job row could easily exceed the max range size, and since ranges cannot split within a row, this caused oversized ranges.

The conclusion from that was that job details and job progress need to be separate: the job details can be large and document the definition of the job, what work it involves, etc. Execution of the job should not need to rewrite the details as it makes progress. The job progress is what can be rewritten often, but must not be large.
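A minimal sketch of this split (hypothetical Go types for illustration, not CockroachDB's actual jobs protos):

```go
package main

import "fmt"

// JobDetails is written once at job creation and describes the work.
// It may be large, but is never rewritten during execution.
type JobDetails struct {
	TableID    int
	IndexID    int
	TotalSpans []string // can be large: span keys may embed long values
}

// JobProgress is rewritten frequently during execution, so it must
// stay small.
type JobProgress struct {
	FractionCompleted float32 // tiny, cheap to rewrite on every update
}

func main() {
	details := JobDetails{TableID: 52, IndexID: 2, TotalSpans: []string{"/Table/52/2", "/Table/52/3"}}
	progress := JobProgress{FractionCompleted: 0.25}
	fmt.Printf("job over %d spans, %.0f%% done\n", len(details.TotalSpans), progress.FractionCompleted*100)
}
```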

We should eventually move the frequently rewritten done/not-done spans into the job Progress field where they belong, and consider whether we can find a cheaper representation (maybe track more in-mem, with only a lossy representation persisted; the loss only matters if you need to resume from that persisted state).
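One way the lossy persisted representation could look, as a sketch (simplified string spans instead of roachpb.Span, and assuming redoing a backfill chunk on resume is safe): merge completed spans, and if the result is still too big, forget some completed work rather than growing the record.

```go
package main

import (
	"fmt"
	"sort"
)

// span is a simplified [start, end) key pair.
type span struct{ start, end string }

// coalesce merges adjacent/overlapping completed spans so the persisted
// set stays small, then applies a lossy cap by dropping trailing spans.
// Dropping a completed span only means a resume redoes that work, which
// is wasteful but safe; claiming *more* work done would not be.
func coalesce(done []span, maxSpans int) []span {
	if len(done) == 0 {
		return nil
	}
	sort.Slice(done, func(i, j int) bool { return done[i].start < done[j].start })
	out := []span{done[0]}
	for _, s := range done[1:] {
		last := &out[len(out)-1]
		if s.start <= last.end { // adjacent or overlapping: merge
			if s.end > last.end {
				last.end = s.end
			}
		} else {
			out = append(out, s)
		}
	}
	if len(out) > maxSpans {
		out = out[:maxSpans] // lossy cap: forget trailing completed work
	}
	return out
}

func main() {
	done := []span{{"a", "c"}, {"h", "k"}, {"c", "f"}}
	fmt.Println(coalesce(done, 10)) // [{a f} {h k}]
}
```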

In the short term, though, we should rewrite ResumeSpans less often: a given processor should be able to complete all the spans assigned to it before it is forced to rewrite the whole resume-span set, and it should make its time-based checkpoints less often too.

@dt dt self-assigned this Apr 7, 2019
@vivekmenezes
Contributor

This is not going to be straightforward to implement, because we read the checkpoint at the gateway before each backfill chunk to decide what the next backfill evaluation chunk runs over (1-2 minutes per chunk).

It probably makes the most sense to extend the chunk time to something like 15 minutes and set up a timer to ping the schema lease from the gateway?

@dt
Member Author

dt commented Apr 8, 2019

Yeah, that's what I'm looking at doing: moving the coordinator to run the lease extender and the distsql flow in parallel goroutines so the flow can run as long as it wants, then making the checkpoint interval control only how long we wait between writing checkpoints, without forcing the processor to exit after doing so.

Even better might be to have the processors emit rows with finished spans and let the gateway coalesce those / choose when to checkpoint, as that would make it easy to frequency-cap; but I think just making the existing flow run longer between checkpoints would be a good first step.
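That gateway-side coalescing could be sketched like this (batching by count for simplicity; a real frequency cap would likely be time-based, and the channel stands in for the distsql row/metadata stream):

```go
package main

import "fmt"

// aggregate drains finished spans emitted by processors and invokes
// checkpoint in batches, so the job record is rewritten far less often
// than once per finished span.
func aggregate(finished <-chan string, flushEvery int, checkpoint func([]string)) {
	var pending []string
	for s := range finished {
		pending = append(pending, s)
		if len(pending) >= flushEvery {
			checkpoint(pending)
			pending = nil
		}
	}
	if len(pending) > 0 {
		checkpoint(pending) // final flush when the flow drains
	}
}

func main() {
	ch := make(chan string, 5)
	for _, s := range []string{"s1", "s2", "s3", "s4", "s5"} {
		ch <- s
	}
	close(ch)
	flushes := 0
	aggregate(ch, 2, func(batch []string) { flushes++ })
	fmt.Println("checkpoint writes:", flushes) // 3 instead of 5
}
```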

@vivekmenezes
Contributor

You'll also have to update the logic on the gateway that updates the progress, since we don't want to update the % progress only every 15 minutes (or whatever interval you choose).

@awoods187 awoods187 added A-schema-changes C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) labels Apr 22, 2019
@rolandcrosby rolandcrosby assigned spaskob and unassigned dt Aug 12, 2019
@dt
Member Author

dt commented Aug 14, 2019

@spaskob: via email, @danhhz pointed me toward the stats Sampler processor, which sends back progress info like we discussed, but via distsql flow metadata as opposed to actual rows: https://github.com/cockroachdb/cockroach/blob/master/pkg/sql/distsqlrun/sampler.go#L214

This looks like a nice, clean way to do it compared to sticking encoded bytes in the rows themselves. I'm not quite sure how to get at this metadata from outside the flow, so doing it this way might require inserting an extra processor into the flow itself (or tweaking the API on the receiver); it might not be quite drop-in for the current backfiller flow. I'm looking at this for IMPORT and I'll let you know what I find.
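The shape of the sampler-style idea, as a rough sketch (these types are hypothetical stand-ins; the real plumbing lives in pkg/sql/distsqlrun's metadata and receiver machinery):

```go
package main

import "fmt"

// progressMeta models a metadata record flowing alongside rows: it
// carries completed spans instead of encoding them into result rows.
type progressMeta struct {
	completedSpans []string
}

// metaReceiver plays the role of the gateway-side receiver: it absorbs
// progress metadata and would forward everything else downstream.
type metaReceiver struct {
	completed []string
}

func (r *metaReceiver) pushMeta(m progressMeta) {
	r.completed = append(r.completed, m.completedSpans...)
}

func main() {
	var r metaReceiver
	r.pushMeta(progressMeta{completedSpans: []string{"/Table/52/2/{a-f}"}})
	r.pushMeta(progressMeta{completedSpans: []string{"/Table/52/2/{f-m}"}})
	fmt.Println("spans reported via metadata:", len(r.completed)) // 2
}
```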

spaskob pushed a commit to spaskob/cockroach that referenced this issue Aug 20, 2019
Before this change, index backfill workers would write their updated
spans concurrently to the job's row, which can potentially lead to a
blowup in the jobs table. This PR is a first step in improving this.
Instead of updating the finished spans in each worker, we propagate this
info to the gateway node and do bulk updates there. The next step is to
let the workers run longer by not canceling them at 2 minutes, which is
currently done for historical reasons.

Fixes cockroachdb#36601.

Release note: None
craig bot pushed a commit that referenced this issue Aug 23, 2019
39773: jobs: improve the status reporting of index backfill jobs r=spaskob a=spaskob


Fixes #36601.

Release note: None

Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
@craig craig bot closed this as completed in 9bce82e Aug 23, 2019
spaskob pushed a commit to spaskob/cockroach that referenced this issue Aug 28, 2019
Currently, if a new node participates in a cluster with old-version
nodes, the index backfill will busy-wait, since the different versions
use different ways of reporting status. We detect this situation and
revert to the old status-reporting style if necessary.

Touches cockroachdb#36601.

Release note: None
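The mixed-version guard that commit describes amounts to a cluster-wide version check before enabling the new reporting path. A sketch with illustrative version numbers (not real cluster-version gates):

```go
package main

import "fmt"

// version is a simplified stand-in for a cluster version.
type version struct{ major, minor int }

func (v version) less(o version) bool {
	return v.major < o.major || (v.major == o.major && v.minor < o.minor)
}

// useNewStatusReporting returns false if any node is below the version
// that understands metadata-based progress reporting, in which case the
// backfill falls back to the old per-worker checkpoint style.
func useNewStatusReporting(nodes []version, required version) bool {
	for _, v := range nodes {
		if v.less(required) {
			return false // old node present: use the old reporting style
		}
	}
	return true
}

func main() {
	required := version{19, 2}
	mixed := []version{{19, 2}, {19, 1}}
	fmt.Println(useNewStatusReporting(mixed, required)) // false
}
```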
craig bot pushed a commit that referenced this issue Aug 29, 2019
40288: jobs: make new index backfill status tracking bwd compatible r=spaskob a=spaskob


Touches #36601.

Release note: None

Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>