[receiver/awsxray] X-Ray Receiver appears to be dropping or rejecting segments from X-Ray #36128

Open
Mjb141 opened this issue Nov 1, 2024 · 2 comments
Labels: bug, needs triage, receiver/awsxray

Mjb141 commented Nov 1, 2024

Component(s)

No response

What happened?

Description

We are forwarding X-Ray trace data (retrieved with batch-get-traces) to the awsxrayreceiver. After extensive debugging we have concluded that the receiver is either not receiving all segments or is rejecting some of them. Root spans are missing in multiple backends (tested with Grafana and Honeycomb), and we have verified with the debug and file exporters that many spans never reach the collector's otlphttp exporter.

The fileexporter returns:

{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "testdeploy"
            }
          },
          {
            "key": "cloud.provider",
            "value": {
              "stringValue": "aws"
            }
          },
          {
            "key": "container.name",
            "value": {
              "stringValue": "testdeploy"
            }
          },
          {
            "key": "container.id",
            "value": {
              "stringValue": "3116351ee7534c67bd3a502e59d44709-2627975326"
            }
          },
          {
            "key": "telemetry.sdk.version",
            "value": {
              "stringValue": "1.34.1"
            }
          },
          {
            "key": "telemetry.sdk.name",
            "value": {
              "stringValue": "opentelemetry for java"
            }
          },
          {
            "key": "telemetry.sdk.language",
            "value": {
              "stringValue": "java"
            }
          },
          {
            "key": "service.version",
            "value": {
              "stringValue": "9c98685a"
            }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": {},
          "spans": [
            {
              "traceId": "6723911c7a0f665bb85239302a49d134",
              "spanId": "fdc038fcd2cc8f45",
              "parentSpanId": "b93f1e256a7a9226",
              "name": "testdeploy",
              "kind": 1,
              "startTimeUnixNano": "1730384156650004736",
              "endTimeUnixNano": "1730384156657166592",
              "attributes": [
                {
                  "key": "http.method",
                  "value": {
                    "stringValue": "GET"
                  }
                },
                {
                  "key": "http.user_agent",
                  "value": {
                    "stringValue": "universal (client2)"
                  }
                },
                {
                  "key": "http.url",
                  "value": {
                    "stringValue": "http://testdeploy/version"
                  }
                },
                {
                  "key": "http.status_code",
                  "value": {
                    "intValue": "200"
                  }
                },
                {
                  "key": "http.response_content_length",
                  "value": {
                    "intValue": "0"
                  }
                },
                {
                  "key": "aws.xray.metadata.default",
                  "value": {
                    "stringValue": "{\"http.route\":\"/version\",\"net.protocol.name\":\"http\",\"net.protocol.version\":\"1.1\",\"net.sock.host.addr\":\"127.0.0.1\",\"net.sock.host.port\":80,\"net.sock.peer.addr\":\"127.0.0.1\",\"net.sock.peer.port\":54818,\"otel.resource.aws.ecs.container.arn\":\"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\",\"otel.resource.aws.ecs.container.image.id\":\"sha256:71f45761957420a821428f526c698c3963cb380e2c4b963d263e6b2347b515e9\",\"otel.resource.aws.ecs.launchtype\":\"fargate\",\"otel.resource.aws.ecs.task.arn\":\"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\",\"otel.resource.aws.ecs.task.family\":\"testdeploy\",\"otel.resource.aws.ecs.task.revision\":\"110\",\"otel.resource.cloud.platform\":\"aws_ecs\",\"otel.resource.cloud.provider\":\"aws\",\"otel.resource.container.id\":\"3116351ee7534c67bd3a502e59d44709-2627975326\",\"otel.resource.container.image.name\":\"123456789012.dkr.ecr.eu-central-1.amazonaws.com/services/laas/testdeploy\",\"otel.resource.container.image.tag\":\"9c98685a\",\"otel.resource.container.name\":\"testdeploy\",\"otel.resource.host.arch\":\"aarch64\",\"otel.resource.host.name\":\"ip-10-123-28-140.eu-central-1.compute.internal\",\"otel.resource.os.description\":\"Linux 
5.10.226-214.880.amzn2.aarch64\",\"otel.resource.os.type\":\"linux\",\"otel.resource.process.command_args\":[\"/usr/lib/jvm/java-21-amazon-corretto.aarch64/bin/java\",\"-jar\",\"--enable-preview\",\"-javaagent:/opt/aws-opentelemetry-agent.jar\",\"-Djavax.net.ssl.trustStore=/maven/cacerts.jks\",\"-Djavax.net.ssl.trustStorePassword=randompassword\",\"-Denvironment=dv\",\"-Xmx512M\",\"/maven/laas-microservice.jar\",\"server\",\"/maven/config/external.yml\"],\"otel.resource.process.executable.path\":\"/usr/lib/jvm/java-21-amazon-corretto.aarch64/bin/java\",\"otel.resource.process.pid\":87,\"otel.resource.process.runtime.description\":\"Amazon.com Inc. OpenJDK 64-Bit Server VM 21.0.4+7-LTS\",\"otel.resource.process.runtime.name\":\"OpenJDK Runtime Environment\",\"otel.resource.process.runtime.version\":\"21.0.4+7-LTS\",\"otel.resource.service.name\":\"testdeploy\",\"otel.resource.telemetry.auto.version\":\"1.32.3-aws\",\"otel.resource.telemetry.sdk.language\":\"java\",\"otel.resource.telemetry.sdk.name\":\"opentelemetry\",\"otel.resource.telemetry.sdk.version\":\"1.34.1\",\"thread.id\":39,\"thread.name\":\"dw-39\"}"
                  }
                }
              ],
              "status": {}
            },
            {
              "traceId": "6723911c7a0f665bb85239302a49d134",
              "spanId": "d89508538345ff04",
              "parentSpanId": "fdc038fcd2cc8f45",
              "name": "VersionResource.getVersion",
              "kind": 1,
              "startTimeUnixNano": "1730384156654128640",
              "endTimeUnixNano": "1730384156654829056",
              "attributes": [
                {
                  "key": "aws.xray.metadata.default",
                  "value": {
                    "stringValue": "{\"code.function\":\"getVersion\",\"code.namespace\":\"io.org.tech.service.util.resources.VersionResource\",\"thread.id\":39,\"thread.name\":\"dw-39\"}"
                  }
                }
              ],
              "status": {}
            }
          ]
        }
      ]
    },
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "event-service"
            }
          },
          {
            "key": "cloud.provider",
            "value": {
              "stringValue": "aws"
            }
          },
          {
            "key": "container.name",
            "value": {
              "stringValue": "event-service"
            }
          },
          {
            "key": "container.id",
            "value": {
              "stringValue": "eebf0aa33fc04319aa6c3d0532e9f3eb-0211033773"
            }
          },
          {
            "key": "telemetry.sdk.version",
            "value": {
              "stringValue": "1.34.1"
            }
          },
          {
            "key": "telemetry.sdk.name",
            "value": {
              "stringValue": "opentelemetry for java"
            }
          },
          {
            "key": "telemetry.sdk.language",
            "value": {
              "stringValue": "java"
            }
          },
          {
            "key": "service.version",
            "value": {
              "stringValue": "a40caa5f"
            }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": {},
          "spans": [
            {
              "traceId": "6723911c7a0f665bb85239302a49d134",
              "spanId": "12039a9641ff3e8a",
              "parentSpanId": "ce1532207a0210c6",
              "name": "event-service",
              "kind": 1,
              "startTimeUnixNano": "1730384156593957632",
              "endTimeUnixNano": "1730384156598795264",
              "attributes": [
                {
                  "key": "http.method",
                  "value": {
                    "stringValue": "GET"
                  }
                },
                {
                  "key": "http.user_agent",
                  "value": {
                    "stringValue": "universal (client1)"
                  }
                },
                {
                  "key": "http.url",
                  "value": {
                    "stringValue": "http://event/version"
                  }
                },
                {
                  "key": "http.status_code",
                  "value": {
                    "intValue": "200"
                  }
                },
                {
                  "key": "http.response_content_length",
                  "value": {
                    "intValue": "0"
                  }
                },
                {
                  "key": "aws.xray.metadata.default",
                  "value": {
                    "stringValue": "{\"http.route\":\"/version\",\"net.protocol.name\":\"http\",\"net.protocol.version\":\"1.1\",\"net.sock.host.addr\":\"127.0.0.1\",\"net.sock.host.port\":80,\"net.sock.peer.addr\":\"127.0.0.1\",\"net.sock.peer.port\":58168,\"otel.resource.aws.ecs.container.arn\":\"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/eebf0aa33fc04319aa6c3d0532e9f3eb/45290e90-05e8-4fdb-9ddc-0c47adb8594c\",\"otel.resource.aws.ecs.container.image.id\":\"sha256:52d3b192a5abc1d60ecd594d0dd37126d9a04c3b6500080a4b00bec9a9be4397\",\"otel.resource.aws.ecs.launchtype\":\"fargate\",\"otel.resource.aws.ecs.task.arn\":\"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/eebf0aa33fc04319aa6c3d0532e9f3eb\",\"otel.resource.aws.ecs.task.family\":\"event-service\",\"otel.resource.aws.ecs.task.revision\":\"24\",\"otel.resource.cloud.platform\":\"aws_ecs\",\"otel.resource.cloud.provider\":\"aws\",\"otel.resource.container.id\":\"eebf0aa33fc04319aa6c3d0532e9f3eb-0211033773\",\"otel.resource.container.image.name\":\"123456789012.dkr.ecr.eu-central-1.amazonaws.com/services/laas/event-service\",\"otel.resource.container.image.tag\":\"a40caa5f\",\"otel.resource.container.name\":\"event-service\",\"otel.resource.host.arch\":\"aarch64\",\"otel.resource.host.name\":\"ip-10-123-30-141.eu-central-1.compute.internal\",\"otel.resource.os.description\":\"Linux 
5.10.226-214.880.amzn2.aarch64\",\"otel.resource.os.type\":\"linux\",\"otel.resource.process.command_args\":[\"/usr/lib/jvm/java-21-amazon-corretto.aarch64/bin/java\",\"-jar\",\"--enable-preview\",\"-javaagent:/opt/aws-opentelemetry-agent.jar\",\"-Djavax.net.ssl.trustStore=/maven/cacerts.jks\",\"-Djavax.net.ssl.trustStorePassword=randompassword\",\"-Denvironment=dv\",\"-Xmx512M\",\"/maven/laas-microservice.jar\",\"server\",\"/maven/config/external.yml\"],\"otel.resource.process.executable.path\":\"/usr/lib/jvm/java-21-amazon-corretto.aarch64/bin/java\",\"otel.resource.process.pid\":98,\"otel.resource.process.runtime.description\":\"Amazon.com Inc. OpenJDK 64-Bit Server VM 21.0.4+7-LTS\",\"otel.resource.process.runtime.name\":\"OpenJDK Runtime Environment\",\"otel.resource.process.runtime.version\":\"21.0.4+7-LTS\",\"otel.resource.service.name\":\"event-service\",\"otel.resource.telemetry.auto.version\":\"1.32.3-aws\",\"otel.resource.telemetry.sdk.language\":\"java\",\"otel.resource.telemetry.sdk.name\":\"opentelemetry\",\"otel.resource.telemetry.sdk.version\":\"1.34.1\",\"thread.id\":31,\"thread.name\":\"dw-31\"}"
                  }
                }
              ],
              "status": {}
            },
            {
              "traceId": "6723911c7a0f665bb85239302a49d134",
              "spanId": "03dead531ba7fced",
              "parentSpanId": "12039a9641ff3e8a",
              "name": "VersionResource.getVersion",
              "kind": 1,
              "startTimeUnixNano": "1730384156596189184",
              "endTimeUnixNano": "1730384156596675072",
              "attributes": [
                {
                  "key": "aws.xray.metadata.default",
                  "value": {
                    "stringValue": "{\"code.function\":\"getVersion\",\"code.namespace\":\"se.organisation.microservice.resources.VersionResource\",\"thread.id\":31,\"thread.name\":\"dw-31\"}"
                  }
                }
              ],
              "status": {}
            }
          ]
        }
      ]
    },
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "Events"
            }
          },
          {
            "key": "cloud.provider",
            "value": {
              "stringValue": "aws"
            }
          },
          {
            "key": "container.name",
            "value": {
              "stringValue": "testdeploy"
            }
          },
          {
            "key": "container.id",
            "value": {
              "stringValue": "3116351ee7534c67bd3a502e59d44709-2627975326"
            }
          },
          {
            "key": "telemetry.sdk.version",
            "value": {
              "stringValue": "1.34.1"
            }
          },
          {
            "key": "telemetry.sdk.name",
            "value": {
              "stringValue": "opentelemetry for java"
            }
          },
          {
            "key": "telemetry.sdk.language",
            "value": {
              "stringValue": "java"
            }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": {},
          "spans": [
            {
              "traceId": "6723911c7a0f665bb85239302a49d134",
              "spanId": "1cb212333d516ee4",
              "parentSpanId": "ab914a4f53856c92",
              "name": "Events",
              "kind": 1,
              "startTimeUnixNano": "1730384156369367040",
              "endTimeUnixNano": "1730384156550637824",
              "attributes": [
                {
                  "key": "http.method",
                  "value": {
                    "stringValue": "POST"
                  }
                },
                {
                  "key": "http.url",
                  "value": {
                    "stringValue": "https://events.eu-central-1.amazonaws.com/"
                  }
                },
                {
                  "key": "http.status_code",
                  "value": {
                    "intValue": "200"
                  }
                },
                {
                  "key": "http.response_content_length",
                  "value": {
                    "intValue": "0"
                  }
                },
                {
                  "key": "aws.operation",
                  "value": {
                    "stringValue": "PutEvents"
                  }
                },
                {
                  "key": "aws.request_id",
                  "value": {
                    "stringValue": "c395d79c-5731-4e66-a307-48a56658f9d9"
                  }
                }
              ],
              "status": {}
            }
          ]
        }
      ]
    },
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "SQS"
            }
          },
          {
            "key": "cloud.provider",
            "value": {
              "stringValue": "aws"
            }
          },
          {
            "key": "container.name",
            "value": {
              "stringValue": "universal"
            }
          },
          {
            "key": "container.id",
            "value": {
              "stringValue": "636687abf5f848ab96a0faf975311f47-3577057898"
            }
          },
          {
            "key": "telemetry.sdk.version",
            "value": {
              "stringValue": "1.34.1"
            }
          },
          {
            "key": "telemetry.sdk.name",
            "value": {
              "stringValue": "opentelemetry for java"
            }
          },
          {
            "key": "telemetry.sdk.language",
            "value": {
              "stringValue": "java"
            }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": {},
          "spans": [
            {
              "traceId": "6723911c7a0f665bb85239302a49d134",
              "spanId": "1e7bc16c1b751521",
              "parentSpanId": "d27bc9b3fb38702c",
              "name": "SQS",
              "kind": 1,
              "startTimeUnixNano": "1730384156659136768",
              "endTimeUnixNano": "1730384156665035520",
              "attributes": [
                {
                  "key": "http.method",
                  "value": {
                    "stringValue": "POST"
                  }
                },
                {
                  "key": "http.url",
                  "value": {
                    "stringValue": "https://sqs.eu-central-1.amazonaws.com/"
                  }
                },
                {
                  "key": "http.status_code",
                  "value": {
                    "intValue": "200"
                  }
                },
                {
                  "key": "http.response_content_length",
                  "value": {
                    "intValue": "0"
                  }
                },
                {
                  "key": "aws.operation",
                  "value": {
                    "stringValue": "DeleteMessage"
                  }
                },
                {
                  "key": "aws.request_id",
                  "value": {
                    "stringValue": "7201df5c-b4e3-53e9-a10d-80c127bf28e9"
                  }
                },
                {
                  "key": "aws.queue_url",
                  "value": {
                    "stringValue": "https://sqs.eu-central-1.amazonaws.com/123456789012/org-universal-EventBridgeTestEvent"
                  }
                }
              ],
              "status": {}
            }
          ]
        }
      ]
    }
  ]
}

This output is missing a number of spans, including the root span (5f40bc419f287db1, which has no parent_id).
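
A quick way to see which parents are absent from a file exporter dump is to list every span whose parentSpanId does not match any spanId in the same dump. The sketch below is illustrative only: it assumes the OTLP JSON layout shown above, the helper names are hypothetical, and `sample` is a trimmed copy of the first resource's two spans rather than the full output.

```python
def iter_spans(otlp: dict):
    """Yield every span in an OTLP JSON document (resourceSpans -> scopeSpans -> spans)."""
    for resource_spans in otlp.get("resourceSpans", []):
        for scope_spans in resource_spans.get("scopeSpans", []):
            yield from scope_spans.get("spans", [])


def find_roots_and_orphans(otlp: dict):
    """Return (root span ids, ids of spans whose parent is not in the dump)."""
    spans = list(iter_spans(otlp))
    known_ids = {span["spanId"] for span in spans}
    roots = [s["spanId"] for s in spans if not s.get("parentSpanId")]
    orphans = [
        s["spanId"]
        for s in spans
        if s.get("parentSpanId") and s["parentSpanId"] not in known_ids
    ]
    return roots, orphans


# Trimmed sample: only the ids of the two "testdeploy" spans from the dump above.
sample = {
    "resourceSpans": [
        {
            "scopeSpans": [
                {
                    "spans": [
                        {"spanId": "fdc038fcd2cc8f45", "parentSpanId": "b93f1e256a7a9226"},
                        {"spanId": "d89508538345ff04", "parentSpanId": "fdc038fcd2cc8f45"},
                    ]
                }
            ]
        }
    ]
}

roots, orphans = find_roots_and_orphans(sample)
print("roots:", roots)      # spans with no parentSpanId
print("orphans:", orphans)  # spans whose parent never arrived
```

For a real dump, each line the file exporter writes can be passed through `json.loads` and then to `find_roots_and_orphans`, assuming the exporter is left on its JSON output format.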

We run this with a batch processor, and have tested with no processors.

Our Lambda, which sends X-Ray trace data to the OTel collector, has been extensively tested and debugged, and its logs show that it is sending the root span:

lambda-1  | {
lambda-1  |     "level":"DEBUG",
lambda-1  |     "location":"process_trace_into_messages:89",
lambda-1  |     "message":"Trace: 1-672398f2-319790d89a186aaaa8291f8a | Seg 4/11 (nested): 5f40bc419f287db1",
lambda-1  |     "timestamp":"2024-10-31 14:51:31,738+0000",
lambda-1  |     "service":"x-ray-forwarder"
lambda-1  | }

Note that the 4th of the 11 segments forwarded is 5f40bc419f287db1, the root span. This span is not in the trace output above: somewhere in the collector, between awsxrayreceiver and otlphttp, it is being lost, dropped, or rejected.

And the relevant message being sent:

lambda-1  | {
lambda-1  |     "level":"DEBUG",
lambda-1  |     "location":"send_message:111",
lambda-1  |     "message":"Trace ID: 1-672398f2-319790d89a186aaaa8291f8a | Segment ID: 5f40bc419f287db1 | Segment Index: 4",
lambda-1  |     "timestamp":"2024-10-31 14:51:31,748+0000",
lambda-1  |     "service":"x-ray-forwarder"
lambda-1  | }
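
When searching exporter output for this segment, the X-Ray trace ID has to be matched against the 32-hex-character OTLP traceId. As far as we understand, the OTLP form is the X-Ray ID with the `1-` version prefix and the dashes removed. A minimal sketch of that conversion (the function name is hypothetical, not part of any library):

```python
import re

# X-Ray trace ids look like 1-<8 hex epoch seconds>-<24 hex random part>.
_XRAY_TRACE_ID = re.compile(r"^1-([0-9a-fA-F]{8})-([0-9a-fA-F]{24})$")


def xray_to_otlp_trace_id(xray_trace_id: str) -> str:
    """Concatenate the epoch and random parts into a 32-character hex trace id."""
    match = _XRAY_TRACE_ID.match(xray_trace_id)
    if match is None:
        raise ValueError(f"not an X-Ray trace id: {xray_trace_id!r}")
    return (match.group(1) + match.group(2)).lower()


print(xray_to_otlp_trace_id("1-672398f2-319790d89a186aaaa8291f8a"))
# → 672398f2319790d89a186aaaa8291f8a
```

The resulting string is what to grep for in the `traceId` fields of the file exporter output.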

The message is, as far as we can tell, a valid trace message. There are no errors/warnings in the collector logs.

Example message:

"{\"format\":\"json\",\"version\":1}\n{\"id\": \"5f40bc419f287db1\", \"name\": \"testdeploy\", \"start_time\": 1730386162.6986518, \"trace_id\": \"1-672398f2-319790d89a186aaaa8291f8a\", \"end_time\": 1730386162.7810762, \"fault\": false, \"error\": false, \"throttle\": false, \"http\": {\"request\": {\"url\": \"<http://internal-servicegroup-services-294977727.eu-central-1.elb.amazonaws.com/testdeploy/send-message\>", \"method\": \"GET\", \"user_agent\": \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:131.0) Gecko/20100101 Firefox/131.0\", \"client_ip\": \"172.26.248.79\", \"x_forwarded_for\": true}, \"response\": {\"status\": 200, \"content_length\": 0}}, \"aws\": {\"ecs\": {\"container\": \"testdeploy\", \"task_arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\", \"task_family\": \"testdeploy\", \"container_id\": \"3116351ee7534c67bd3a502e59d44709-2627975326\", \"container_arn\": \"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\", \"launch_type\": \"fargate\"}, \"xray\": {\"auto_instrumentation\": true, \"sdk_version\": \"1.34.1\", \"sdk\": \"opentelemetry for java\"}}, \"metadata\": {\"default\": {\"otel.resource.container.image.tag\": \"9c98685a\", \"net.sock.peer.addr\": \"127.0.0.1\", \"otel.resource.container.image.name\": \"123456789012.dkr.ecr.eu-central-1.amazonaws.com/services/servicegroup/testdeploy\", \"otel.resource.process.command_args\": [\"/usr/lib/jvm/java-21-amazon-corretto.aarch64/bin/java\", \"-jar\", \"--enable-preview\", \"-javaagent:/opt/aws-opentelemetry-agent.jar\", \"-Djavax.net.ssl.trustStore=/maven/cacerts.jks\", \"-Djavax.net.ssl.trustStorePassword=randompassword\", \"-Denvironment=dv\", \"-Xmx512M\", \"/maven/servicegroup-microservice.jar\", \"server\", \"/maven/config/external.yml\"], \"otel.resource.host.arch\": \"aarch64\", \"otel.resource.host.name\": 
\"ip-10-123-28-140.eu-central-1.compute.internal\", \"otel.resource.aws.ecs.launchtype\": \"fargate\", \"thread.name\": \"dw-39\", \"otel.resource.aws.ecs.container.image.id\": \"sha256:71f45761957420a821428f526c698c3963cb380e2c4b963d263e6b2347b515e9\", \"otel.resource.aws.ecs.task.revision\": \"110\", \"otel.resource.service.name\": \"testdeploy\", \"otel.resource.telemetry.auto.version\": \"1.32.3-aws\", \"net.sock.host.addr\": \"127.0.0.1\", \"otel.resource.process.pid\": 87, \"otel.resource.os.description\": \"Linux 5.10.226-214.880.amzn2.aarch64\", \"net.protocol.name\": \"http\", \"otel.resource.cloud.platform\": \"aws_ecs\", \"otel.resource.os.type\": \"linux\", \"net.sock.peer.port\": 32848, \"thread.id\": 39, \"otel.resource.container.name\": \"testdeploy\", \"otel.resource.aws.ecs.task.arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\", \"otel.resource.telemetry.sdk.name\": \"opentelemetry\", \"otel.resource.process.runtime.description\": \"Amazon.com Inc. 
OpenJDK 64-Bit Server VM 21.0.4+7-LTS\", \"otel.resource.process.runtime.version\": \"21.0.4+7-LTS\", \"otel.resource.aws.ecs.task.family\": \"testdeploy\", \"otel.resource.process.executable.path\": \"/usr/lib/jvm/java-21-amazon-corretto.aarch64/bin/java\", \"otel.resource.telemetry.sdk.version\": \"1.34.1\", \"otel.resource.container.id\": \"3116351ee7534c67bd3a502e59d44709-2627975326\", \"otel.resource.aws.ecs.container.arn\": \"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\", \"http.route\": \"/testdeploy/send-message\", \"otel.resource.process.runtime.name\": \"OpenJDK Runtime Environment\", \"otel.resource.telemetry.sdk.language\": \"java\", \"otel.resource.cloud.provider\": \"aws\", \"net.protocol.version\": \"1.1\", \"net.sock.host.port\": 80}}, \"service\": {\"version\": \"9c98685a\"}, \"origin\": \"AWS::ECS::Fargate\", \"subsegments\": [{\"id\": \"522d5b2bd04b20af\", \"name\": \"TestDeployResource.sendMessage\", \"start_time\": 1730386162.7004426, \"end_time\": 1730386162.779862, \"fault\": false, \"error\": false, \"throttle\": false, \"aws\": {\"ecs\": {\"container\": \"testdeploy\", \"task_arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\", \"task_family\": \"testdeploy\", \"container_id\": \"3116351ee7534c67bd3a502e59d44709-2627975326\", \"container_arn\": \"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\", \"launch_type\": \"fargate\"}, \"xray\": {\"auto_instrumentation\": true, \"sdk_version\": \"1.34.1\", \"sdk\": \"opentelemetry for java\"}}, \"metadata\": {\"default\": {\"code.namespace\": \"no.organisation.service.testdeploy.resources.TestDeployResource\", \"code.function\": \"sendMessage\", \"thread.name\": \"dw-39 - GET /testdeploy/send-message\", \"thread.id\": 39}}, \"subsegments\": [{\"id\": 
\"942b30b17da48533\", \"name\": \"Events\", \"start_time\": 1730386162.7133555, \"end_time\": 1730386162.776195, \"fault\": false, \"error\": false, \"throttle\": false, \"http\": {\"request\": {\"url\": \"<https://events.eu-central-1.amazonaws.com/\>", \"method\": \"POST\"}, \"response\": {\"status\": 200, \"content_length\": 0}}, \"aws\": {\"ecs\": {\"container\": \"testdeploy\", \"task_arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\", \"task_family\": \"testdeploy\", \"container_id\": \"3116351ee7534c67bd3a502e59d44709-2627975326\", \"container_arn\": \"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\", \"launch_type\": \"fargate\"}, \"xray\": {\"auto_instrumentation\": true, \"sdk_version\": \"1.34.1\", \"sdk\": \"opentelemetry for java\"}, \"operation\": \"PutEvents\", \"request_id\": \"ebb29cd4-35a4-47b6-ba42-85960c673224\"}, \"metadata\": {\"default\": {\"http.response_content_length\": 85, \"rpc.service\": \"EventBridge\", \"aws.agent\": \"java-aws-sdk\", \"rpc.system\": \"aws-api\", \"thread.name\": \"dw-39 - GET /testdeploy/send-message\", \"http.request_content_length\": 349, \"thread.id\": 39}}, \"namespace\": \"aws\", \"subsegments\": [{\"id\": \"a712bc821226ff31\", \"name\": \"SQS\", \"start_time\": 1730386163.0638723, \"end_time\": 1730386166.2725995, \"fault\": false, \"error\": false, \"throttle\": false, \"http\": {\"request\": {\"url\": \"<https://sqs.eu-central-1.amazonaws.com/\>", \"method\": \"POST\"}, \"response\": {\"content_length\": 0}}, \"aws\": {\"ecs\": {\"container\": \"universal\", \"task_arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/ae5ef46bb63343e0852b4c4fd32657a2\", \"task_family\": \"universal\", \"container_id\": \"ae5ef46bb63343e0852b4c4fd32657a2-3577057898\", \"container_arn\": 
\"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/ae5ef46bb63343e0852b4c4fd32657a2/a686e87d-d973-4260-94d8-176df43a8777\", \"launch_type\": \"fargate\"}, \"xray\": {\"auto_instrumentation\": true, \"sdk_version\": \"1.34.1\", \"sdk\": \"opentelemetry for java\"}, \"operation\": \"ReceiveMessage\"}, \"metadata\": {\"default\": {\"messaging.system\": \"AmazonSQS\", \"rpc.service\": \"Sqs\", \"aws.agent\": \"java-aws-sdk\", \"messaging.message.id\": \"73df1316-db66-4fb7-98ab-1ca20ef03582\", \"rpc.system\": \"aws-api\", \"thread.name\": \"sqs-poller-org-universal-EventBridgeTestEvent-0\", \"http.request_content_length\": 233, \"thread.id\": 45, \"messaging.destination.name\": \"org-universal-EventBridgeTestEvent\", \"messaging.operation\": \"process\"}}, \"namespace\": \"aws\"}]}, {\"id\": \"b7c9b3d40704c482\", \"name\": \"testdeploy_db@org-dv-servicegroup-pe.cluster-c2na9gerh6mg.eu-central-1.rds.amazonaws.com\", \"start_time\": 1730386162.702915, \"end_time\": 1730386162.7050912, \"fault\": false, \"error\": false, \"throttle\": false, \"aws\": {\"ecs\": {\"container\": \"testdeploy\", \"task_arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\", \"task_family\": \"testdeploy\", \"container_id\": \"3116351ee7534c67bd3a502e59d44709-2627975326\", \"container_arn\": \"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\", \"launch_type\": \"fargate\"}, \"xray\": {\"auto_instrumentation\": true, \"sdk_version\": \"1.34.1\", \"sdk\": \"opentelemetry for java\"}}, \"sql\": {\"connection_string\": \"postgresql://org-dv-servicegroup-pe.cluster-c2na9gerh6mg.eu-central-1.rds.amazonaws.com:5432/testdeploy_db\", \"url\": \"INSERT testdeploy_db.pending_message\", \"sanitized_query\": \"/* PendingMessageDao.insertIntoPendingMessage */ INSERT INTO pending_message (event_type, event_data, source, detail_type, created_at) VALUES 
(?, ?::jsonb, ?, ?, ?) RETURNING *\", \"database_type\": \"postgresql\", \"user\": \"testdeploy\"}, \"metadata\": {\"default\": {\"db.sql.table\": \"pending_message\", \"db.operation\": \"INSERT\", \"thread.name\": \"dw-39 - GET /testdeploy/send-message\", \"thread.id\": 39}}, \"namespace\": \"remote\"}, {\"id\": \"42b15385e0250e07\", \"name\": \"testdeploy_db@org-dv-servicegroup-pe.cluster-c2na9gerh6mg.eu-central-1.rds.amazonaws.com\", \"start_time\": 1730386162.7104318, \"end_time\": 1730386162.7114258, \"fault\": false, \"error\": false, \"throttle\": false, \"aws\": {\"ecs\": {\"container\": \"testdeploy\", \"task_arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\", \"task_family\": \"testdeploy\", \"container_id\": \"3116351ee7534c67bd3a502e59d44709-2627975326\", \"container_arn\": \"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\", \"launch_type\": \"fargate\"}, \"xray\": {\"auto_instrumentation\": true, \"sdk_version\": \"1.34.1\", \"sdk\": \"opentelemetry for java\"}}, \"sql\": {\"connection_string\": \"postgresql://org-dv-servicegroup-pe.cluster-c2na9gerh6mg.eu-central-1.rds.amazonaws.com:5432/testdeploy_db\", \"url\": \"testdeploy_db\", \"sanitized_query\": \"SAVEPOINT \\\"before-publish\\\"\", \"database_type\": \"postgresql\", \"user\": \"testdeploy\"}, \"metadata\": {\"default\": {\"thread.name\": \"dw-39 - GET /testdeploy/send-message\", \"thread.id\": 39}}, \"namespace\": \"remote\"}, {\"id\": \"ff3345cf30dae57d\", \"name\": \"testdeploy_db@org-dv-servicegroup-pe.cluster-c2na9gerh6mg.eu-central-1.rds.amazonaws.com\", \"start_time\": 1730386162.7117033, \"end_time\": 1730386162.7128386, \"fault\": false, \"error\": false, \"throttle\": false, \"aws\": {\"ecs\": {\"container\": \"testdeploy\", \"task_arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\", 
\"task_family\": \"testdeploy\", \"container_id\": \"3116351ee7534c67bd3a502e59d44709-2627975326\", \"container_arn\": \"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\", \"launch_type\": \"fargate\"}, \"xray\": {\"auto_instrumentation\": true, \"sdk_version\": \"1.34.1\", \"sdk\": \"opentelemetry for java\"}}, \"sql\": {\"connection_string\": \"postgresql://org-dv-servicegroup-pe.cluster-c2na9gerh6mg.eu-central-1.rds.amazonaws.com:5432/testdeploy_db\", \"url\": \"DELETE testdeploy_db.pending_message\", \"sanitized_query\": \"/* PendingMessageDao.delete */ DELETE FROM pending_message WHERE id = ?\", \"database_type\": \"postgresql\", \"user\": \"testdeploy\"}, \"metadata\": {\"default\": {\"db.sql.table\": \"pending_message\", \"db.operation\": \"DELETE\", \"thread.name\": \"dw-39 - GET /testdeploy/send-message\", \"thread.id\": 39}}, \"namespace\": \"remote\"}, {\"id\": \"598f5d9e14ee7e44\", \"name\": \"testdeploy_db@org-dv-servicegroup-pe.cluster-c2na9gerh6mg.eu-central-1.rds.amazonaws.com\", \"start_time\": 1730386162.7087028, \"end_time\": 1730386162.7098439, \"fault\": false, \"error\": false, \"throttle\": false, \"aws\": {\"ecs\": {\"container\": \"testdeploy\", \"task_arn\": \"arn:aws:ecs:eu-central-1:123456789012:task/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709\", \"task_family\": \"testdeploy\", \"container_id\": \"3116351ee7534c67bd3a502e59d44709-2627975326\", \"container_arn\": \"arn:aws:ecs:eu-central-1:123456789012:container/org-servicegroup-pe/3116351ee7534c67bd3a502e59d44709/faa9e54e-9828-48aa-9016-b6a301e3ae66\", \"launch_type\": \"fargate\"}, \"xray\": {\"auto_instrumentation\": true, \"sdk_version\": \"1.34.1\", \"sdk\": \"opentelemetry for java\"}}, \"sql\": {\"connection_string\": \"postgresql://org-dv-servicegroup-pe.cluster-c2na9gerh6mg.eu-central-1.rds.amazonaws.com:5432/testdeploy_db\", \"url\": \"SELECT testdeploy_db.pending_message\", 
\"sanitized_query\": \"/* PendingMessageDao.getAndLockForFirstPublish */ SELECT id, event_type, event_data #>> ? AS event_data, source, detail_type, created_at, last_publish_attempt_at, number_of_publish_attempts FROM pending_message WHERE last_publish_attempt_at IS NULL FOR UPDATE SKIP LOCKED LIMIT ? \", \"database_type\": \"postgresql\", \"user\": \"testdeploy\"}, \"metadata\": {\"default\": {\"db.sql.table\": \"pending_message\", \"db.operation\": \"SELECT\", \"thread.name\": \"dw-39 - GET /testdeploy/send-message\", \"thread.id\": 39}}, \"namespace\": \"remote\"}]}]}"

Steps to Reproduce

Send a large trace with nested subsegments to the awsxrayreceiver.

Fewer spans are exported via otlphttp than there are segments and subsegments in the trace, and the all-important root span is among those missing.
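A minimal sketch of the forwarding step, for anyone trying to reproduce this. It assumes the input is the JSON printed by `aws xray batch-get-traces` (a `Traces` list whose `Segments` each carry a `Document` string) and that each segment is sent as one UDP datagram prefixed with the X-Ray daemon header line. The function names, the host and the 65507-byte limit check are illustrative assumptions, not part of the collector or of our actual forwarder. The size check is there only to rule out segments that cannot fit in a single datagram; it is not a confirmed cause of the missing spans.

```python
import socket

# Header line the X-Ray daemon UDP protocol puts in front of each segment document.
HEADER = b'{"format": "json", "version": 1}\n'
# Largest payload that fits in one IPv4 UDP datagram (65535 - 20 IP - 8 UDP bytes).
MAX_UDP_PAYLOAD = 65507


def build_datagrams(batch_output):
    """Yield (segment_id, datagram) for every segment in batch-get-traces output."""
    for trace in batch_output.get("Traces", []):
        for segment in trace.get("Segments", []):
            # "Document" is already a JSON string, so it is sent unchanged.
            yield segment["Id"], HEADER + segment["Document"].encode("utf-8")


def oversized_segments(batch_output):
    """Return ids of segments whose datagram would not fit in one UDP packet."""
    return [
        segment_id
        for segment_id, datagram in build_datagrams(batch_output)
        if len(datagram) > MAX_UDP_PAYLOAD
    ]


def send_segments(batch_output, host="127.0.0.1", port=2000):
    """Send every segment that fits in one datagram; return the ids that were skipped."""
    skipped = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for segment_id, datagram in build_datagrams(batch_output):
            if len(datagram) > MAX_UDP_PAYLOAD:
                skipped.append(segment_id)
                continue
            sock.sendto(datagram, (host, port))
    return skipped
```

Calling `send_segments` on the parsed batch-get-traces output returns the ids of any segments it had to skip, which can then be compared with the spans that never show up downstream.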

Expected Result

The root span is present in whichever backend the collector's otlphttp exporter sends spans to. Our trace contains a span without a parent_id, and that span is the correct root span.
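To put a number on "expected", the source trace can be summarised before it is sent. The sketch below assumes batch-get-traces output as input and assumes the receiver emits one span per segment and one per nested subsegment; `walk` and `summarize_trace` are hypothetical helper names. A top-level segment document with no `parent_id` is treated as a root.

```python
import json


def walk(doc):
    """Yield a segment document and every subsegment nested beneath it."""
    yield doc
    for sub in doc.get("subsegments", []):
        yield from walk(sub)


def summarize_trace(trace):
    """Return (expected_span_count, root_segment_ids) for one trace."""
    total = 0
    roots = []
    for segment in trace["Segments"]:
        # Each "Document" is a JSON-encoded string, not a nested object.
        doc = json.loads(segment["Document"])
        if "parent_id" not in doc:
            roots.append(doc["id"])
        total += sum(1 for _ in walk(doc))
    return total, roots
```

If the count returned here is higher than the number of spans the exporters report for the same trace ID, the difference is the number of spans lost somewhere between the sender and the exporter.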

Actual Result

Root span is missing.

Collector version

0.112.0

Environment information

Environment

OS: amazonlinux:2023
Compiler (if manually compiled): golang:1.22-alpine, go.opentelemetry.io/collector/cmd/builder@v0.112.0

Binary:

dist:
  name: otelcol-custom
  description: Central OTel Collector binary
  output_path: /dist

receivers:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awsxrayreceiver v0.112.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/receiver/awscloudwatchreceiver v0.112.0

processors:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/attributesprocessor v0.112.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/resourceprocessor v0.112.0
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.112.0

exporters:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/fileexporter v0.112.0
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.112.0
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.112.0

extensions:
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/healthcheckv2extension v0.112.0
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/extension/headerssetterextension v0.112.0
  - gomod: go.opentelemetry.io/collector/extension/zpagesextension v0.112.0

providers:
  - gomod: go.opentelemetry.io/collector/confmap/provider/envprovider v1.18.0
  - gomod: go.opentelemetry.io/collector/confmap/provider/fileprovider v1.18.0

OpenTelemetry Collector configuration

extensions:
  headers_setter/basic_auth:
    headers:
      - action: insert
        key: x-honeycomb-team
        value: ${env:OTEL_EXPORTER_OTLP_API_KEY}
  zpages:
    endpoint: :55679
  healthcheckv2:
    use_v2: true
    component_health:
      include_permanent_errors: false
      include_recoverable_errors: true
      recovery_duration: 5m
    http:
      endpoint: "0.0.0.0:13133"
      status:
        enabled: true
        path: "/health/check"
      config:
        enabled: true
        path: "/health/config"

receivers:
  awsxray:
    transport: udp
    endpoint: 0.0.0.0:2000

  awscloudwatch/ecs:
    region: ${env:AWS_REGION}
    logs:
      poll_interval: 1m
      max_events_per_request: 10000
      groups:
        autodiscover:
          limit: 100
          prefix: /aws/ecs/microservice/
          streams:
            prefixes: [main]

processors:
  resource/set-labels:
    attributes:
      - action: insert
        key: Entity
        value: ${env:ENTITY}
      - action: insert
        key: Environment
        value: ${env:ENVIRONMENT}

  resource/set-service-name:
    attributes:
      - action: extract
        key: cloudwatch.log.group.name
        pattern: ^\/aws\/ecs\/microservice\/(?<extractedservicename>[a-z-]+)
      - action: extract
        key: cloudwatch.log.group.name
        pattern: ^\/aws\/lambda\/(?<extractedservicename>[a-z-]+)
      - action: insert
        key: service.name
        from_attribute: extractedservicename
      - action: delete
        key: extractedservicename

  resource/set-account-attributes:
    attributes:
      - action: insert
        key: deployment.environment
        value: ${env:ENVIRONMENT}
      - action: insert
        key: cloud.region
        from_attribute: aws.region
      - action: delete
        key: aws.region

  batch:

exporters:
  file:
    path: ./trace_output
  otlphttp:
    auth:
      authenticator: headers_setter/basic_auth
    endpoint: ${env:OTEL_EXPORTER_OTLP_ENDPOINT}

service:
  extensions:
    - headers_setter/basic_auth
    - healthcheckv2
    - zpages
  telemetry:
    logs:
      level: ${env:TELEMETRY_LOG_LEVEL}
      encoding: ${env:TELEMETRY_LOG_ENCODING:-json} # options: json, console
      output_paths: [stdout]
      error_output_paths: [stderr]
      disable_caller: false
      disable_stacktrace: false
  pipelines:
    traces:
      receivers:
        - awsxray
      exporters:
        - file
        - otlphttp

Log output

Debug exporter logs from the collector have the following format:


otelcollector-1  | 2024-11-01T11:11:59.878Z	info	Traces	{"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 4}
otelcollector-1  | 2024-11-01T11:11:59.878Z	info	ResourceSpans #0
otelcollector-1  | Resource SchemaURL:
otelcollector-1  | Resource attributes:
otelcollector-1  |      -> service.name: Str(universal)
otelcollector-1  |      -> cloud.provider: Str(aws)
otelcollector-1  |      -> container.name: Str(universal)
otelcollector-1  |      -> container.id: Str(ae5ef46bb63343e0852b4c4fd32657a2-3577057898)
otelcollector-1  |      -> telemetry.sdk.version: Str(1.34.1)
otelcollector-1  |      -> telemetry.sdk.name: Str(opentelemetry for java)
otelcollector-1  |      -> telemetry.sdk.language: Str(java)
otelcollector-1  |      -> service.version: Str(b6ccc491)
otelcollector-1  | ScopeSpans #0
otelcollector-1  | ScopeSpans SchemaURL:
otelcollector-1  | InstrumentationScope
otelcollector-1  | Span #0
otelcollector-1  |     Trace ID       : 672398f2319790d89a186aaaa8291f8a
otelcollector-1  |     Parent ID      : 35ed7e0914764641
otelcollector-1  |     ID             : 629361ce9b60af30
otelcollector-1  |     Name           : universal
otelcollector-1  |     Kind           : Internal
otelcollector-1  |     Start time     : 2024-10-31 14:49:23.1754368 +0000 UTC
otelcollector-1  |     End time       : 2024-10-31 14:49:26.272440576 +0000 UTC
otelcollector-1  |     Status code    : Unset
otelcollector-1  |     Status message :

Nothing in the logs that do appear looks incorrect, but there are no log entries at all for the missing spans. Given that these are exporter debug logs, the only remaining explanation we can see is that the awsxrayreceiver is at fault.
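The same check can be made against the fileexporter output instead of reading debug logs by eye. This sketch assumes the file holds one OTLP JSON object per line with `resourceSpans`, `scopeSpans` and `spans` keys, and that span IDs appear as `spanId` and `parentSpanId`; `load_spans` and `find_orphans` are hypothetical helper names. It lists root spans and spans whose parent was never exported.

```python
import json


def load_spans(lines):
    """Collect every span from an iterable of OTLP JSON lines."""
    spans = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        payload = json.loads(line)
        for resource_spans in payload.get("resourceSpans", []):
            for scope_spans in resource_spans.get("scopeSpans", []):
                spans.extend(scope_spans.get("spans", []))
    return spans


def find_orphans(spans):
    """Return (roots, orphans): spans with no parent, and spans whose parent is absent."""
    exported_ids = {span["spanId"] for span in spans}
    roots = [span for span in spans if not span.get("parentSpanId")]
    orphans = [
        span
        for span in spans
        if span.get("parentSpanId") and span["parentSpanId"] not in exported_ids
    ]
    return roots, orphans
```

With `./trace_output` opened as a text file and passed to `load_spans`, an empty `roots` list together with a non-empty `orphans` list would match the symptom described above: child spans arrive, but the span they point to does not.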



Additional context

No response
@Mjb141 Mjb141 added bug Something isn't working needs triage New item requiring triage labels Nov 1, 2024

github-actions bot commented Nov 1, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

Mjb141 commented Nov 1, 2024

Example (anonymised) trace attached.
example_trace_anon.json
