[processor/routing] Add error_mode configuration option #19147

Merged: 7 commits, Mar 13, 2023

TylerHelmuth
Member

Description:
Adds a new error_mode configuration option that lets routing processor users specify how errors from OTTL statements should be handled.

@kovrus the routingprocessor's use of OTTL is unique in Contrib and therefore doesn't fall nicely into the pattern set by filterprocessor and transformprocessor, which is to use Statements. It does not allow defining statements by signal type, the list of functions is very small, and there are no signal-specific functions.

I'll leave it up to you whether ErrorMode is an option you want to expose as configuration as proposed in this PR, whether you'd like to ignore OTTL errors, or whether you would like to continue handling errors from statements as you do today, which is to return them up the pipeline (this results in the payload being dropped from the collector). Be aware that due to ErrorMode, functions are going to be made "stricter" and will intentionally return more errors, moving the concept of "open erroring" to the Statements struct or, in the case of this processor, to the processor itself.

Link to tracking Issue:

Related to #16519

Documentation:

Updated docs

@TylerHelmuth TylerHelmuth requested a review from a team February 28, 2023 22:59
@github-actions github-actions bot added the processor/routing Routing processor label Feb 28, 2023

@@ -105,7 +106,7 @@ func (p *tracesProcessor) route(ctx context.Context, t ptrace.Traces) error {
 		matchCount := len(p.router.routes)
 		for key, route := range p.router.routes {
 			_, isMatch, err := route.statement.Execute(ctx, stx)
-			if err != nil {
+			if err != nil && p.config.ErrorMode == ottl.PropagateError {
Member Author

@kovrus note, for each signal, if the error happens when executing the statement's function, isMatch will be true and err will be non-nil. If using ignore, p.group(key, groups, route.exporters, rspans) will be called, but whatever the goal with the statement's function was will not have happened, since the function errored. Is that acceptable?

Member

I assumed that isMatch is false when the error happens. I think we need to adjust the routing logic then. So now we send resource metrics to the default exporter when there is no match across all route definitions or when there is at least one failure under the condition that error mode is ignore. Something like that should do the job, we can refine this logic later:

		matchCount := len(p.router.routes)
		for key, route := range p.router.routes {
			_, isMatch, err := route.statement.Execute(ctx, stx)
			if err != nil && p.config.ErrorMode == ottl.PropagateError {
				return err
			}

			if err != nil && p.config.ErrorMode == ottl.IgnoreError {
				p.group("", groups, p.router.defaultExporters, rspans)
				continue
			}

			if !isMatch {
				matchCount--
				continue
			}
			p.group(key, groups, route.exporters, rspans)
		}

		if matchCount == 0 {
			// no route conditions are matched, add resource spans to default exporters group
			p.group("", groups, p.router.defaultExporters, rspans)
		}

cc: @jpkrohling

Member Author

IsMatch always reflects the output of the condition; if the condition errored then it is false, but if the condition passed it will be true, even if the Invocation errors.

I will update the functions to match your desired logic.

Member Author

Updated

Member

Perfect, thanks for adding that.

Member

@kovrus kovrus left a comment

lgtm

@TylerHelmuth TylerHelmuth merged commit c41e487 into open-telemetry:main Mar 13, 2023
@TylerHelmuth TylerHelmuth deleted the rp-error-mode branch March 13, 2023 19:14