-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
jobs: improve the status reporting of index backfill jobs #39773
Conversation
Before this change index backfill workers would write their updated spans concurrently to the job's row which can potentially lead to blow up in the jobs table. This PR is a first step in improving this. Instead of updating the finished spans in each worker, we propagate this info to the gateway node and do bulk updates there. Next step is to let the workers run longer by not canceling them at 2 mins which is currently done for historical reasons. Fixes cockroachdb#36601. Release note: None
8f4be12
to
9bce82e
Compare
@@ -843,10 +841,16 @@ func (sc *SchemaChanger) distBackfill( | |||
otherTableDescs = append(otherTableDescs, *table.TableDesc()) | |||
} | |||
} | |||
rw := &errOnlyResultWriter{} | |||
metaFn := func(_ context.Context, meta *distsqlpb.ProducerMetadata) { | |||
if meta.BulkProcessorProgress != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we avoid wrapping todoSpans access in a mutex because all the access from the callback is from the same goroutine (the distsql inbound goroutine) and all happens while this goroutine is blocked on Run
.
If that is correct, that's a slightly brittle assumption but it works for me for now.
bors r+ |
39773: jobs: improve the status reporting of index backfill jobs r=spaskob a=spaskob Before this change index backfill workers would write their updated spans concurrently to the job's row which can potentially lead to blow up in the jobs table. This PR is a first step in improving this. Instead of updating the finished spans in each worker, we propagate this info to the gateway node and do bulk updates there. Next step is to let the workers run longer by not canceling them at 2 mins which is currently done for historical reasons. Fixes #36601. Release note: None Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
Build succeeded |
Before this change index backfill workers would write their updated
spans concurrently to the job's row which can potentially lead to blow
up in the jobs table. This PR is a first step in improving this. Instead
of updating the finished spans in each worker, we propagate this info to
the gateway node and do bulk updates there. Next step is to let the
workers run longer by not canceling them at 2 mins which is currently
done for historical reasons.
Fixes #36601.
Release note: None