Azure Data Explorer Exporter - Missing Exception Fields from OTelLogs and OTelTraces #26496

Closed
akakarikos opened this issue Sep 7, 2023 · 12 comments
Labels: bug (Something isn't working), exporter/azuredataexplorer

Comments

@akakarikos

Component(s)

exporter/azuredataexplorer

What happened?

Description

When an unhandled exception occurs in the application and is captured by an instrumentation library, the exception details (such as the exception message and stack trace) are not populated in the Azure Data Explorer logs and traces tables.

To give you an example:

  1. See below a comparison between a trace record from the OTel Collector console and the ADX Traces table. You will notice that the Status Message field is missing from the ADX trace.

OTel Collector Console

ScopeSpans #1
ScopeSpans SchemaURL:
InstrumentationScope OpenTelemetry.Instrumentation.AspNetCore 1.0.0.0
Span #0
    Trace ID       : 96eb74d68c214c468798fd3462bc46aa
    Parent ID      : 102608693b89e619
    ID             : 20af91f77a0fb7a5
    Name           : endpoint_name
    Kind           : Server
    Start time     : 2023-09-06 16:14:17.0062245 +0000 UTC
    End time       : 2023-09-06 16:14:18.9010324 +0000 UTC
    Status code    : Error
    Status message : error message displayed here
Attributes:
     -> net.host.name: Str(127.0.0.1)
     -> net.host.port: Int(8080)
     -> http.method: Str(POST)
     -> http.scheme: Str(http)
     -> http.target: Str(/endpoint_name)
     -> http.url: Str(http://127.0.0.1:8080/endpoint_name)
     -> http.flavor: Str(1.1)
     -> http.user_agent: Str(dapr-sdk-dotnet/v1.11.0+787cba6f4f4d28bd8ec09fcef9c854ef64d4361b)
     -> http.route: Str(endpoint_name)
     -> http.status_code: Int(500)
     -> ThreadId: Str(17)
     -> ReferenceId: Str(9a74cefd-62ab-4a4f-8572-b53090df083f)
     -> TransactionId: Str()

ADX Traces Table

"TraceID": 96eb74d68c214c468798fd3462bc46aa,
"SpanID": 20af91f77a0fb7a5,
"ParentID": 102608693b89e619,
"SpanName": endpoint_name,
"SpanStatus": STATUS_CODE_ERROR,
"SpanKind": SPAN_KIND_SERVER,
"StartTime": 2023-09-06T16:14:17.0062245Z,
"EndTime": 2023-09-06T16:14:18.9010324Z,
"ResourceAttributes": {
	"service.name": "service_name",
	"service.version": "9.0.0.0",
	"service.instance.id": "4978bae1-a67c-4b76-9a60-80af647cc2c4",
	"telemetry.sdk.name": "opentelemetry",
	"telemetry.sdk.language": "dotnet",
	"telemetry.sdk.version": "1.6.0-rc.1"
},
"TraceAttributes": {
	"http.url": "http://127.0.0.1:8080/endpoint_name",
	"http.target": "endpoint_name",
	"net.host.port": 8080,
	"ThreadId": "17",
	"http.user_agent": "dapr-sdk-dotnet/v1.11.0+787cba6f4f4d28bd8ec09fcef9c854ef64d4361b",
	"scope.name": "OpenTelemetry.Instrumentation.AspNetCore",
	"http.route": "endpoint_name",
	"ReferenceId": "9a74cefd-62ab-4a4f-8572-b53090df083f",
	"http.scheme": "http",
	"http.flavor": "1.1",
	"scope.version": "1.0.0.0",
	"TransactionId": "",
	"http.status_code": 500,
	"net.host.name": "127.0.0.1",
	"http.method": "POST"
},
"Events": [],
"Links": []

  2. See below a comparison between a log record from the Application Console and the ADX Logs table. You will notice that the LogRecord.Exception field is missing from the ADX log record, so we cannot retrieve the exact exception details and stack trace.

Application Console

LogRecord.Timestamp:               2023-09-06T16:14:18.8893862Z
LogRecord.TraceId:                 96eb74d68c214c468798fd3462bc46aa
LogRecord.SpanId:                  20af91f77a0fb7a5
LogRecord.TraceFlags:              Recorded
LogRecord.CategoryName:            Microsoft.AspNetCore.Server.Kestrel
LogRecord.Severity:                Error
LogRecord.SeverityText:            Error
LogRecord.Body:                    exception message displayed here
LogRecord.Attributes (Key:Value):
    ConnectionId: 0HMTEUH40C96J
    TraceIdentifier: 0HMTEUH40C96J:00000004
    OriginalFormat exception message displayed here
    ThreadId: 17
    ReferenceId: 9a74cefd-62ab-4a4f-8572-b53090df083f
    TransactionId:
LogRecord.EventId:                 13
LogRecord.EventName:               ApplicationError
LogRecord.Exception:               specific exception message and stacktrace displayed here
LogRecord.ScopeValues (Key:Value):
[Scope.0]:SpanId: 20af91f77a0fb7a5
[Scope.0]:TraceId: 96eb74d68c214c468798fd3462bc46aa
[Scope.0]:ParentId: 102608693b89e619
[Scope.1]:ConnectionId: 0HMTEUH40C96J
[Scope.2]:RequestId: 0HMTEUH40C96J:00000004
[Scope.2]:RequestPath: endpoint_name

ADX Logs Table

"Timestamp": 2023-09-06T16:14:18.8893862Z,
"ObservedTimestamp": 2023-09-06T16:14:18.8893862Z,
"TraceID": 96eb74d68c214c468798fd3462bc46aa,
"SpanID": 20af91f77a0fb7a5,
"SeverityText": Error,
"SeverityNumber": 17,
"Body": exception message displayed here,
"ResourceAttributes": {
	"telemetry.sdk.name": "opentelemetry",
	"telemetry.sdk.language": "dotnet",
	"telemetry.sdk.version": "1.6.0-rc.1",
	"service.name": "service_name",
	"service.version": "9.0.0.0",
	"service.instance.id": "c91fc583-cbcf-4845-aac4-fe5eb9024d77"
},
"LogsAttributes": {
	"SpanId": "20af91f77a0fb7a5",
	"ParentId": "102608693b89e619",
	"RequestId": "0HMTEUH40C96J:00000004",
	"ReferenceId": "9a74cefd-62ab-4a4f-8572-b53090df083f",
	"ThreadId": 17,
	"TraceId": "96eb74d68c214c468798fd3462bc46aa",
	"RequestPath": "",
	"ConnectionId": "0HMTEUH40C96J",
	"TransactionId": "",
	"TraceIdentifier": "0HMTEUH40C96J:00000004"
}

Steps to Reproduce

Force an application to throw an unhandled exception that is captured by the instrumentation, sent to the OTel Collector, and exported to Azure Data Explorer.

Expected Result

The exception details such as exception message, exception stack trace, and exception source are successfully exported in the OTelLogs and OTelTraces ADX tables.

Actual Result

The exception details such as exception message, stack trace, and source are not included in the OTelLogs and OTelTraces ADX table records.

Collector version

0.83.0

Environment information

Environment

Docker image deployed in AKS.
OS: linux amd64

OpenTelemetry Collector configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: otel-collector-config
data:
  config.yaml: |-
    receivers:
      zipkin:
        endpoint: localhost:9411
      otlp:
        protocols:
          grpc:
            endpoint: localhost:4317
          http:
            endpoint: localhost:4318
            
    processors:
      memory_limiter:
        check_interval: 1s
        limit_percentage: 80
        spike_limit_percentage: 30

      batch:
        send_batch_size: 50
    
    exporters:
      logging:
        verbosity: detailed
    
      azuremonitor:
        instrumentation_key: 
        spaneventsenabled: true

      azuredataexplorer:
        cluster_uri: 
        application_id: 
        application_key: 
        tenant_id:
        db_name: 
        metrics_table_name: 
        logs_table_name: 
        traces_table_name: 
        ingestion_type: 'managed'
    
    service:
      pipelines:
        traces:
          receivers: [zipkin,otlp]
          processors: [memory_limiter,batch]
          exporters: [azuredataexplorer,azuremonitor,logging]
        logs:
          receivers: [otlp]
          processors: [memory_limiter,batch]
          exporters: [azuredataexplorer,azuremonitor,logging]

Log output

No response

Additional context

No response

akakarikos added the bug and needs triage labels on Sep 7, 2023

@ag-ramachandran
Contributor

Hello @akakarikos, Thanks for reporting. Will have a look and provide details about the missing attributes.

@cijothomas
Member

If you are missing Exception, then it's likely because the OTLP Exporter itself is not exporting it:
https://github.com/open-telemetry/opentelemetry-dotnet/blob/main/src/OpenTelemetry.Exporter.OpenTelemetryProtocol/CHANGELOG.md#160-rc1

You can verify that by using "OTel Collector Console" output, which would also be missing the Exception from Logs.
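
One way to run that check is sketched below. It assumes the exception details would arrive as the `exception.type`, `exception.message`, and `exception.stacktrace` attributes from the OpenTelemetry semantic conventions, and that the collector's detailed console output lists attributes as `-> key: Type(value)` lines, as in the trace dump in the issue description. The function names are made up for illustration.

```python
import re

# Matches attribute lines such as "     -> http.method: Str(POST)".
ATTR_LINE = re.compile(
    r"^\s*->\s*(?P<key>[^:]+):\s*(?P<type>\w+)\((?P<value>.*)\)\s*$"
)

# Attribute keys from the OpenTelemetry exception semantic conventions.
# Assumption: the exception would be exported under these keys.
EXCEPTION_KEYS = ("exception.type", "exception.message", "exception.stacktrace")


def parse_attributes(console_output: str) -> dict:
    """Collect "-> key: Type(value)" lines from the console output."""
    attributes = {}
    for line in console_output.splitlines():
        match = ATTR_LINE.match(line)
        if match:
            attributes[match.group("key").strip()] = match.group("value")
    return attributes


def missing_exception_keys(console_output: str) -> list:
    """Return the exception attribute keys that do not appear in the output."""
    attributes = parse_attributes(console_output)
    return [key for key in EXCEPTION_KEYS if key not in attributes]


sample = """
Attributes:
     -> http.method: Str(POST)
     -> http.url: Str(http://127.0.0.1:8080/endpoint_name)
     -> http.status_code: Int(500)
     -> TransactionId: Str()
"""
print(missing_exception_keys(sample))
```

If every key comes back as missing in the collector output, the data was lost before it reached the collector, so the ADX exporter cannot be the cause.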

@akakarikos
Author

Thanks a lot for the info @cijothomas, that makes sense! You are right, the LogRecord.Exception is missing from the OTel collector console.

However, the LogRecord.Exception is there at the application console. I don't know if this is related or not.

Also there is the issue with the traces missing the Status Message.

Apologies in advance if I am missing something.

@cijothomas
Member

> Thanks a lot for the info @cijothomas, that makes sense! You are right, the LogRecord.Exception is missing from the OTel collector console.
>
> However, the LogRecord.Exception is there at the application console. I don't know if this is related or not.

The ConsoleExporter still exports Exception, but the OTLP Exporter stopped doing so in the 1.6 stable release. (Hopefully it'll be back in 1.7.*.)

@cijothomas
Member

> Also there is the issue with the traces missing the Status Message.

That looks like a bug in the ADX exporter; it does not seem to do anything with StatusMessage. Will let the owners triage that.
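
The symptom can be illustrated with a small sketch. This is not the exporter's actual code (which is written in Go). It only mimics the reported behavior with a hypothetical `Span` type and hypothetical mapping functions, using the column names visible in the ADX Traces table above plus the `SpanStatusMessage` name floated later in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """Only the span fields relevant to this issue (hypothetical type)."""
    trace_id: str
    span_id: str
    name: str
    status_code: str
    status_message: str = ""
    attributes: dict = field(default_factory=dict)


def to_row_status_code_only(span: Span) -> dict:
    """Mimics the reported behavior: the status message is never written."""
    return {
        "TraceID": span.trace_id,
        "SpanID": span.span_id,
        "SpanName": span.name,
        "SpanStatus": span.status_code,
        "TraceAttributes": dict(span.attributes),
    }


def to_row_with_message(span: Span) -> dict:
    """Same mapping plus a string column for the message (assumed name)."""
    row = to_row_status_code_only(span)
    row["SpanStatusMessage"] = span.status_message
    return row


def dropped_fields(span: Span, row: dict) -> list:
    """List non-empty string fields of the span that appear nowhere in the row."""
    written = [value for value in row.values() if isinstance(value, str)]
    return [
        name
        for name, value in vars(span).items()
        if isinstance(value, str) and value and value not in written
    ]


span = Span(
    trace_id="96eb74d68c214c468798fd3462bc46aa",
    span_id="20af91f77a0fb7a5",
    name="endpoint_name",
    status_code="STATUS_CODE_ERROR",
    status_message="error message displayed here",
    attributes={"http.status_code": 500},
)
print(dropped_fields(span, to_row_status_code_only(span)))
print(dropped_fields(span, to_row_with_message(span)))
```

With the first mapping the status message is the only field reported as dropped, which matches the ADX Traces record in the issue description.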

@ag-ramachandran
Contributor

ag-ramachandran commented Sep 8, 2023

Hello @akakarikos, here is an update.

We're in the process of adding Status Code and Status Message attributes. The PR is here; we will perform some tests and track this for closure.

https://github.com/ag-ramachandran/opentelemetry-collector-contrib/tree/bugfix/26496105-StatusMessage

@cijothomas
Member

@ag-ramachandran Thanks. The span's statusmessage is a single string message, not attributes.
https://github.com/open-telemetry/opentelemetry-proto/blob/main/opentelemetry/proto/trace/v1/trace.proto#L266

atoulme removed the needs triage label on Sep 9, 2023
@ag-ramachandran
Contributor

Hello @cijothomas, you're right. Unfortunately, we already had SpanStatus (which housed the StatusCode). Ideally, this field should house the code and message as a KV or JSON value (in Kusto we could use a dynamic field, like the ones we have for Link or Event attributes).
The challenge is the following: if we change the type of SpanStatus and existing customers use it, the field will likely become incompatible (the data type changes from string to dynamic/JSON).

Alternatively, we could add a field called SpanStatusMessage as a string column in the table. Our only concern was that, if a new attribute is added later, the table would have to grow wider again, whereas with a KV/JSON field we could simply add an attribute to it.

While some of this is Kusto/ADX specific, this was really the rationale. What are your thoughts? :)

cc: @akakarikos, @asaharn, thoughts?

BR,Ram
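
The trade-off between the two layouts can be sketched as follows. This is an illustration only, not the exporter's code: `SpanStatusMessage` is the column name proposed above, while `SpanStatusDetails` and the `Reason` field are hypothetical names invented for the example. In both variants `SpanStatus` stays a string so existing queries keep working.

```python
def flat_row(code: str, message: str, **future_fields: str) -> dict:
    """Option 1: one string column per status field."""
    row = {"SpanStatus": code, "SpanStatusMessage": message}
    for name, value in future_fields.items():
        row[name] = value  # each new field becomes another top-level column
    return row


def dynamic_row(code: str, message: str, **future_fields: str) -> dict:
    """Option 2: a single dynamic (JSON-like) column holding all status fields."""
    details = {"code": code, "message": message, **future_fields}
    # "SpanStatusDetails" is a hypothetical column name.
    return {"SpanStatus": code, "SpanStatusDetails": details}


def added_columns(before: dict, after: dict) -> set:
    """Columns present in `after` that the table did not have in `before`."""
    return set(after) - set(before)


code, message = "STATUS_CODE_ERROR", "error message displayed here"

# "Reason" stands in for some attribute added in the future.
print(added_columns(flat_row(code, message),
                    flat_row(code, message, Reason="timeout")))
print(added_columns(dynamic_row(code, message),
                    dynamic_row(code, message, Reason="timeout")))
```

In this sketch the flat layout needs a schema change for every new field, while the dynamic layout absorbs it inside the existing column. The cost of the dynamic layout is that readers must reach into the nested value.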

@cijothomas
Member

> Alternatively, we could add a field called SpanStatusMessage as a string column in the table. Our only concern was that, if a new attribute is added later, the table would have to grow wider again

I'd recommend this option. OpenTelemetry Span hasn't added new top-level fields to Span in years, so this should not be a concern.

Or you can choose to make a breaking change and fix it. I agree it'll break users, but I am not sure this component is declared Stable, so it's generally okay to make breaking changes.

jpkrohling pushed a commit that referenced this issue Sep 20, 2023
…es (#26682)

**Description:**
Added an optional column in the exported trace data to store the status code and message as a dynamic field.

**Link to tracking Issue:** [#26496](#26496)

**Testing:**
Performed E2E ingestion tests and added Test Cases for new fields.

---------

Co-authored-by: Ramachandran A G <ramacg@microsoft.com>
Co-authored-by: Ramachandran A G <106139410+ag-ramachandran@users.noreply.github.com>
@ag-ramachandran
Contributor

Hello @akakarikos, is this something you can test and close?

@akakarikos
Author

Hello @ag-ramachandran, and thanks for all the remediation actions! Looks good for now. If anything comes up, I will reopen the case.
