batcheval: ExportRequest does not return to the client when over the ElasticCPU limit if TargetBytes is unset #96684

Closed
adityamaru opened this issue Feb 6, 2023 · 1 comment
Labels: A-disaster-recovery, C-bug, T-disaster-recovery


adityamaru commented Feb 6, 2023

Recently, we added an elastic CPU limiter to mvccExportToWriter. This limiter checks whether an ExportRequest has exceeded its allotted CPU tokens and, if so, returns from the method with a resume key. This return from mvccExportToWriter is expected to result in an ExportResponse (along with the resume key) being sent back to the client. Returning from the RPC gives the scheduler a chance to take the goroutine off the CPU, allowing other work waiting on the CPU to be admitted.
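
For context, the interaction looks roughly like the sketch below. The cpuLimiter and kv types are illustrative stand-ins, not the exact admission-control API; only the control flow mirrors mvccExportToWriter's behavior.

// Illustrative sketch of the preemption described above.
type kv struct{ key, value []byte }

type cpuLimiter interface {
	// OverLimit reports whether this goroutine has exhausted its allotted
	// elastic CPU tokens.
	OverLimit() bool
}

// exportSpan writes keys into an SST and stops early with a resume key once
// the limiter says we are over our CPU allotment. The caller is expected to
// wrap the resume key in an ExportResponse and return to the client, which
// lets the scheduler take the goroutine off the CPU.
func exportSpan(kvs []kv, limiter cpuLimiter) (resumeKey []byte) {
	for _, p := range kvs {
		if limiter.OverLimit() {
			return p.key // resume from this key in the next request
		}
		// ... append p.key/p.value to the in-progress SST ...
	}
	return nil // span fully exported
}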

Today, this return to the client is conditional on our breaking out of the following loop:

for start := args.Key; start != nil; {
	destFile := &storage.MemFile{}
	opts := storage.MVCCExportOptions{
		StartKey:           storage.MVCCKey{Key: start, Timestamp: resumeKeyTS},
		EndKey:             args.EndKey,
		StartTS:            args.StartTime,
		EndTS:              h.Timestamp,
		ExportAllRevisions: exportAllRevisions,
		TargetSize:         targetSize,
		MaxSize:            maxSize,
		MaxIntents:         maxIntents,
		StopMidKey:         args.SplitMidKey,
	}
	var summary roachpb.BulkOpSummary
	var resume storage.MVCCKey
	var fingerprint uint64
	var err error
	if args.ExportFingerprint {
		// Default to stripping the tenant prefix from keys, and checksum from
		// values before fingerprinting so that the fingerprint is tenant
		// agnostic.
		opts.FingerprintOptions = storage.MVCCExportFingerprintOptions{
			StripTenantPrefix:  true,
			StripValueChecksum: true,
		}
		var hasRangeKeys bool
		summary, resume, fingerprint, hasRangeKeys, err = storage.MVCCExportFingerprint(ctx,
			cArgs.EvalCtx.ClusterSettings(), reader, opts, destFile)
		if err != nil {
			return result.Result{}, maybeAnnotateExceedMaxSizeError(err)
		}
		// If no range keys were encountered during fingerprinting then we zero
		// out the underlying SST file as there is no use in sending an empty
		// file as part of the ExportResponse. This frees up the memory used by
		// the empty SST file.
		if !hasRangeKeys {
			destFile = &storage.MemFile{}
		}
	} else {
		summary, resume, err = storage.MVCCExportToSST(ctx, cArgs.EvalCtx.ClusterSettings(), reader,
			opts, destFile)
		if err != nil {
			return result.Result{}, maybeAnnotateExceedMaxSizeError(err)
		}
	}
	data := destFile.Data()
	// NB: This should only happen in two cases:
	//
	// 1. There was nothing to export for this span.
	//
	// 2. We hit a resource constraint that led to an
	//    early exit and thus have a resume key despite
	//    not having data.
	if summary.DataSize == 0 {
		if resume.Key != nil {
			start = resume.Key
			resumeKeyTS = resume.Timestamp
			continue
		} else {
			break
		}
	}
	span := roachpb.Span{Key: start}
	if resume.Key != nil {
		span.EndKey = resume.Key
	} else {
		span.EndKey = args.EndKey
	}
	var exported roachpb.ExportResponse_File
	if args.ExportFingerprint {
		// A fingerprinting ExportRequest does not need to return the
		// BulkOpSummary or the exported Span. This is because we do not expect
		// the sender of a fingerprint ExportRequest to use anything but the
		// `Fingerprint` for point-keys and the SST file that contains the
		// rangekeys we encountered during ExportRequest evaluation.
		exported = roachpb.ExportResponse_File{
			EndKeyTS:    resume.Timestamp,
			SST:         data,
			Fingerprint: fingerprint,
		}
	} else {
		exported = roachpb.ExportResponse_File{
			Span:     span,
			EndKeyTS: resume.Timestamp,
			Exported: summary,
			SST:      data,
		}
	}
	reply.Files = append(reply.Files, exported)
	start = resume.Key
	resumeKeyTS = resume.Timestamp
	if h.TargetBytes > 0 {
		curSizeOfExportedSSTs += summary.DataSize
		// There could be a situation where the size of exported SSTs is larger
		// than the TargetBytes. In such a scenario, we want to report back
		// TargetBytes as the size of the processed SSTs otherwise the DistSender
		// will error out with an "exceeded limit". In every other case we want to
		// report back the actual size so that the DistSender can shrink the limit
		// for subsequent range requests.
		//
		// This is semantically OK for two reasons:
		//
		// - DistSender does not parallelize requests with TargetBytes > 0.
		//
		// - DistSender uses NumBytes to shrink the limit for subsequent requests.
		//   By returning TargetBytes, no more requests will be processed (and there
		//   are no parallel running requests) which is what we expect.
		//
		// The ResumeSpan is what is used as the source of truth by the caller
		// issuing the request, and that contains accurate information about what
		// is left to be exported.
		targetSize := h.TargetBytes
		if curSizeOfExportedSSTs < targetSize {
			targetSize = curSizeOfExportedSSTs
		}
		reply.NumBytes = targetSize
		// NB: This condition means that we will allow another SST to be created
		// even if we have less room in our TargetBytes than the target size of
		// the next SST. In the worst case this could lead to us exceeding our
		// TargetBytes by SST target size + overage.
		if reply.NumBytes == h.TargetBytes {
			if resume.Key != nil {
				reply.ResumeSpan = &roachpb.Span{
					Key:    resume.Key,
					EndKey: args.EndKey,
				}
				reply.ResumeReason = roachpb.RESUME_BYTE_LIMIT
			}
			break
		}
	}
}
The only `break` in the loop, however, sits inside the `if h.TargetBytes > 0 {` conditional, which checks whether the request was sent with TargetBytes set. ExportRequests are not always sent with TargetBytes set, so even when the elastic limiter indicates we are over the CPU limit, we never break out of this loop and instead immediately retry the same export. This likely leads to excessive thrashing, since the scheduler is never given a chance to take the goroutine off the CPU before the retry. There is no correlation between TargetBytes being set and pagination caused by the CPU limit, so we should not rely on the break inside that conditional.

Jira issue: CRDB-24277

adityamaru added the C-bug and A-disaster-recovery labels Feb 6, 2023

blathers-crl bot commented Feb 6, 2023

cc @cockroachdb/disaster-recovery

adityamaru changed the title from "batcheval: ExportRequest does not always return to the client when over the ElasticCPU limit" to "batcheval: ExportRequest does not return to the client when over the ElasticCPU limit if TargetBytes is unset" Feb 6, 2023
adityamaru self-assigned this Feb 9, 2023
adityamaru added a commit to adityamaru/cockroach that referenced this issue Feb 9, 2023
Previously, there was a strange coupling between the elastic CPU
limiter and the `header.TargetBytes` DistSender limit set on each
ExportRequest. Even if a request was preempted on exhausting its
allotted CPU tokens, it would only return from kvserver by virtue
of its `header.TargetBytes` being set to a non-zero value. Out of the
four users of ExportRequest, only backup set this field to a sentinel
value of 1 to limit the number of SSTs we send back in an ExportResponse.
The remaining callers of ExportRequest would not return from the kvserver.
Instead they would evaluate the request from the resume key immediately,
not giving the scheduler a chance to take the goroutine off CPU.

This change breaks this coupling by introducing a `resumeInfo` object that
indicates whether the resumption was because we were over our CPU limit. If
it was, we return an ExportResponse with our progress so far. This change
shifts the burden of handling pagination to the client. This seems better than
having the server sleep or wait around until its CPU tokens are replenished
as the client would be left wondering why a request is taking so long.

To that effect this change adds pagination support to the other callers of
ExportRequest. Note, we do not set `SplitMidKey` at these other callsites
yet. Thus, all pagination will happen at key boundaries in the ExportRequest.
A follow-up will add support for `SplitMidKey` to these callers.

Informs: cockroachdb#96684

Release note: None
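
The client-side pagination this shifts onto callers looks roughly like the driver loop below. sendExport and exportResp are hypothetical stand-ins, not the code from the PR; only the resume-key handling is the point here.

// Sketch of a paginating caller: re-issue the ExportRequest from the
// returned resume key until the server reports the span is exhausted.
type exportResp struct {
	files     [][]byte // SSTs returned so far
	resumeKey []byte   // non-nil if the server returned early (e.g. CPU limit)
}

func exportAll(start, end []byte, sendExport func(start, end []byte) exportResp) (all [][]byte) {
	for start != nil {
		resp := sendExport(start, end)
		all = append(all, resp.files...)
		// A nil resume key means the whole span was exported; otherwise the
		// server was preempted and we pick up where it left off.
		start = resp.resumeKey
	}
	return all
}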
adityamaru added a commit to adityamaru/cockroach that referenced this issue Feb 24, 2023
craig bot pushed a commit that referenced this issue Feb 25, 2023
96691: *: enables elastic CPU limiter for all users of ExportRequest  r=stevendanna a=adityamaru


Co-authored-by: adityamaru <adityamaru@gmail.com>