Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

breaking: Use sha256 to hash lambda traceId that are triggered by Step Functions and set _dd.p.tid #534

Merged
merged 30 commits into from
Jun 5, 2024

Conversation

kimi-p
Copy link
Contributor

@kimi-p kimi-p commented May 2, 2024

What does this PR do?

  • using the most significant 128 bits of sha256 (instead of current md5) to hash trace_id and the first 64 for span_id.
  • the 1st and 65th bit should be 0.
  • manually take:
    • the first 64 bits from the sha256 hash for _dd.p.tid tag
    • the 2nd 64 bits from the sha256 hash for traceId
    • if it were for spanId, use sha256 hash for parentId (which is step function task's spanId)
  • ts-md5 package is remove.
  • @node/type in DevDependencies needs to be upgraded by running yarn add @types/node -D as suggested in this the issue Issue with TS4.9 - 'AbortSignal' was also declared here. microsoft/TypeScript#51567
  • note that for the root span it has span_id in number but trace_id in hex
  • relevant PR for 64 bits using md5
{
    "trace": {
        "root_id": "3124668925291824462",
        "spans": {
            "3124668925291824462": {
                "trace_id": "41919d29ba4149cd40fcae25a0e4e323",
                "span_id": "3124668925291824462",
                "parent_id": "0",
                "start": 1716393574.422,
                "end": 1716393576.138,
                "duration": 1.716,
                "type": "stepfunctions",
                "service": "kimi-test-nodejs-linking",
                "name": "aws.stepfunctions",
                "resource": "kimi-test-nodejs-linking",
                "resource_hash": "c76d3199bc4b077",
                "host_id": 3884291,
                "hostname": "none",
                "env": "dev",
                "host_groups": [
                    "env:dev"
                ],
...

Motivation

To support 128 bits trace IDs and to avoid showing up in security vulnerability scans, we are upgrading the hashing method (md5) to sha256.

Testing Guidelines

  • trace link
  • example traceId 41919d29ba4149cd40fcae25a0e4e323 is a hash, not a number anymore.
  • verified that the parent_id of the aws.lambda span matches the aws.stepfunctions.lambda task span.
image

Additional Notes

This is a breaking change that will affect all current Step Functions public beta users. And what will break is the Step Functions trace and Lambda traces linking.

Since this has to be done at some point, we want to put this change out before GA. Serverless-Integrations team will work with Sumedha and have some sort of announcement to let customers know the exact cut off time so that they can deploy their Lambda around the same time.

Referencing PR for implementing MD5

Types of Changes

  • Bug fix
  • New feature
  • Breaking change
  • Misc (docs, refactoring, dependency upgrade, etc.)

Check all that apply

  • This PR's description is comprehensive
  • This PR contains breaking changes that are documented in the description
  • This PR introduces new APIs or parameters that are documented and unlikely to change in the foreseeable future
  • This PR impacts documentation, and it has been updated (or a ticket has been logged)
  • This PR's changes are covered by the automated tests
  • This PR collects user input/sensitive content into Datadog
  • This PR passes the integration tests (ask a Datadog member to run the tests)

@kimi-p kimi-p marked this pull request as ready for review May 7, 2024 18:50
@kimi-p kimi-p requested a review from a team as a code owner May 7, 2024 18:50
const res = "0" + binary.substring(1, 64);
if (res === "0".repeat(64)) {
const res = "0" + binary.substring(1, 128);
if (res === "0".repeat(128)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we need to ensure that the 0th and 64th bits of the 128-bit data are both set to 0.

Copy link

@nine5two7 nine5two7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kimi-p kimi-p changed the title Update Step Function hashing to use sha256 and return the most significant 128 bit Update Step Function hashing to use sha256 and return the most significant 128 bit for trace_id May 16, 2024
"arn:aws:states:sa-east-1:425362996713:stateMachine:MyStateMachine-b276uka1j#lambda#2",
_64_BITS,
);
expect(actual).toEqual("5759173372325510050");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"arn:aws:states:sa-east-1:425362996713:stateMachine:MyStateMachine-b276uka1j#lambda#1",
_64_BITS,
);
expect(actual).toEqual("3711631873188331089");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 173 to 178
if (type === TRACE_ID) {
intArray = uint8Array.subarray(8, 16);
} else {
// type === SPAN_ID || type === DD_P_TID
intArray = uint8Array.subarray(0, 8);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section matches the logic that we put on logs-backend implementation.

@kimi-p kimi-p changed the title Update Step Function hashing to use sha256 and return the most significant 128 bit for trace_id Use sha256 to hash lambda traceId that are triggered by Step Functions and set _dd.p.tid May 23, 2024
"@smithy/util-middleware" "^2.2.0"
"@smithy/util-retry" "^2.2.0"
"@smithy/util-utf8" "^2.3.0"
"@aws-sdk/client-sso-oidc" "3.577.0"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it normal to have so many changes on the yarn.lock file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's just that these are the result of not pinned packages. When I remove the lock file and regenerated, some packages have been upgraded all at once.

});

const ptid = this.deterministicSha256HashToBigIntString(this.context["step_function.execution_id"], DD_P_TID);
if (ptid === "0".repeat(16)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some explanations for the condition check in Line 15?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I think this was coming from similar logic implemented somewhere the dd-trace-js. But since deterministicSha256HashToBigIntString will never return all zeros, we can remove the check here.

Copy link

@nine5two7 nine5two7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 140 to 152
const _DatadogSpanContext = require("dd-trace/packages/dd-trace/src/opentracing/span_context");
const id = require("dd-trace/packages/dd-trace/src/id");

const spanContext = SpanContextWrapper.fromTraceContext({
traceId,
parentId,
sampleMode,
source: TraceSource.Event,
const ddTraceContext = new _DatadogSpanContext({
traceId: id(traceId, 10),
spanId: id(parentId, 10),
sampling: { priority: sampleMode.toString(2) },
});

const ptid = this.deterministicSha256HashToBigIntString(this.context["step_function.execution_id"], DD_P_TID);
ddTraceContext._trace.tags["_dd.p.tid"] = id(ptid, 10).toString(16);
const spanContext = new SpanContextWrapper(ddTraceContext, TraceSource.Event);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we not using the current SpanContextWrapper code? Is it because we need to add the ptid?

If so, is there another way to not require dd-trace code here? Or at least, we need to lazy load it, since the tracer could be not available

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes.
  2. Lemme try to lazy load it.

Copy link
Contributor

@duncanista duncanista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a comment

kimi-p and others added 3 commits June 4, 2024 16:17
Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
Co-authored-by: jordan gonzález <30836115+duncanista@users.noreply.github.com>
@kimi-p
Copy link
Contributor Author

kimi-p commented Jun 5, 2024

I have verified after these change based on the comments that the spans linking still works in staging.

image

Copy link
Contributor

@duncanista duncanista left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@duncanista duncanista changed the title Use sha256 to hash lambda traceId that are triggered by Step Functions and set _dd.p.tid breaking: Use sha256 to hash lambda traceId that are triggered by Step Functions and set _dd.p.tid Jun 5, 2024
@kimi-p kimi-p merged commit 7753de1 into main Jun 5, 2024
25 checks passed
@kimi-p kimi-p deleted the kimi/update-sha256 branch June 5, 2024 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants