Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sdk): enable xray #5025

Closed
wants to merge 3 commits into from
Closed

feat(sdk): enable xray #5025

wants to merge 3 commits into from

Conversation

skorfmann
Copy link
Contributor

This aims to do the following things:

  • wrap the aws sdk clients with xray
  • instrument outgoing http

For http, I haven't found the right way to inject the necessary code. Tests are failing with something like this:

 FAIL  test/target-tf-aws/api.test.ts > api with GET routes with different prefix
Error: Failed to bundle function: Error: Build failed with 1 error:
api-onrequest-c5395e41_c8f26cae.js:2:24: ERROR: Could not resolve "aws-xray-sdk"
 ❯ createBundle src/shared/bundling.ts:54:11
 ❯ new Function src/target-tf-aws/function.ts:84:32
 ❯ Api.addInflightHandler src/target-tf-aws/api.ts:258:12
 ❯ Api.getExistingOrAddInflightHandler src/target-tf-aws/api.ts:219:17
 ❯ Api.addHandler src/target-tf-aws/api.ts:201:19
 ❯ Api.httpRequests src/target-tf-aws/api.ts:78:21
 ❯ Api.get src/target-tf-aws/api.ts:100:10
 ❯ test/target-tf-aws/api.test.ts:83:7
     81|   const inflight2 = Testing.makeHandler(app, "Handler2", INFLIGHT_CODE);
     82|
     83|   api.get("/hello/foo", inflight);
       |       ^
     84|   api.get("/foo/bar", inflight2);

Also, I'm not quite sure if the current approach for http instrumentation is the right way. I think in another project I brought in the xray sdk via a Lambda layer and excluded it from bundling. But I don't think this would help with the failing tests. Thoughts / ideas?

Related to #4830

Checklist

  • Title matches Winglang's style guide
  • Description explains motivation and solution
  • Tests added (always)
  • Docs updated (only required for features)
  • Added pr/e2e-full label if this feature requires end-to-end testing

By submitting this pull request, I confirm that my contribution is made under the terms of the Wing Cloud Contribution License.

```
 FAIL  test/target-tf-aws/api.test.ts > api with GET routes with different prefix
Error: Failed to bundle function: Error: Build failed with 1 error:
api-onrequest-c5395e41_c8f26cae.js:2:24: ERROR: Could not resolve "aws-xray-sdk"
 ❯ createBundle src/shared/bundling.ts:54:11
 ❯ new Function src/target-tf-aws/function.ts:84:32
 ❯ Api.addInflightHandler src/target-tf-aws/api.ts:258:12
 ❯ Api.getExistingOrAddInflightHandler src/target-tf-aws/api.ts:219:17
 ❯ Api.addHandler src/target-tf-aws/api.ts:201:19
 ❯ Api.httpRequests src/target-tf-aws/api.ts:78:21
 ❯ Api.get src/target-tf-aws/api.ts:100:10
 ❯ test/target-tf-aws/api.test.ts:83:7
     81|   const inflight2 = Testing.makeHandler(app, "Handler2", INFLIGHT_CODE);
     82|
     83|   api.get("/hello/foo", inflight);
       |       ^
     84|   api.get("/foo/bar", inflight2);
```
Copy link
Contributor

@eladb eladb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have any idea what's the issue you are seeing with bundling


lines.push('"use strict";');
lines.push(`const AWSXRay = require('aws-xray-sdk');`);
lines.push(`AWSXRay.captureHTTPsGlobal(require('http'));`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this also capture fetch which we are using to implement the wing http module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, good point - aws/aws-xray-sdk-node#531 - no, it's not. They apparently recommend to use some open telemetry sdk. Will check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks interesting https://aws-otel.github.io/

Also, seems to be less intrusive and more generic. Will see if that's really just pluggable via ENV, layer and permissions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes! is it possible to use OT for everything instead of directly xray? that would be a much better and general solution.

lines.push(`const AWSXRay = require('aws-xray-sdk');`);
lines.push(`AWSXRay.captureHTTPsGlobal(require('http'));`);
lines.push("exports.handler = async function(event) {");
lines.push(` return await (${inflightClient}).handle(event);`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
lines.push(` return await (${inflightClient}).handle(event);`);
lines.push(` return (${inflightClient}).handle(event);`);

Not needed

* @param handler IFunctionHandler
* @returns the function code lines as strings
*/
protected _getCodeLines(handler: IFunctionHandler): string[] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this called from?

@monadabot
Copy link
Contributor

Console preview environment is available at https://wing-console-pr-5025.fly.dev 🚀

Last Updated (UTC) 2023-11-22 08:55

@monadabot
Copy link
Contributor

Benchmarks

Comparison to Baseline 🟥⬜⬜⬜⬜⬜⬜⬜⬜⬜
Benchmark Before After Change
version 67ms±0.35 69ms±0.5 +2ms (+2.41%)🟥
functions_10.test.w -t sim 624ms±20.41 614ms±5.37 -10ms (-1.64%)⬜
functions_10.test.w -t tf-aws 3304ms±27.67 ... ...
empty.test.w -t sim 512ms±3.52 512ms±7.16 0ms (-0.04%)⬜
empty.test.w -t tf-aws 634ms±7.09 637ms±5.99 +3ms (+0.53%)⬜
hello_world.test.w -t sim 542ms±2.6 556ms±18.05 +14ms (+2.66%)⬜
hello_world.test.w -t tf-aws 4575ms±29.13 ... ...
functions_1.test.w -t sim 543ms±3.36 548ms±4.92 +5ms (+0.99%)⬜
functions_1.test.w -t tf-aws 1527ms±28.77 ... ...
jsii_big.test.w -t sim 3333ms±10.03 3343ms±11.18 +10ms (+0.3%)⬜
jsii_big.test.w -t tf-aws 3442ms±9.45 3456ms±9.97 +14ms (+0.4%)⬜
jsii_small.test.w -t sim 515ms±4.46 523ms±6.41 +8ms (+1.46%)⬜
jsii_small.test.w -t tf-aws 637ms±4.78 642ms±5.63 +4ms (+0.68%)⬜

⬜ Within 1.5 standard deviations
🟩 Faster, Above 1.5 standard deviations
🟥 Slower, Above 1.5 standard deviations

Benchmarks may vary outside of normal expectations, especially when running in GitHub Actions CI.

Results
name mean min max moe sd
version 69ms 68ms 70ms 1ms 1ms
functions_10.test.w -t sim 614ms 603ms 625ms 5ms 8ms
functions_10.test.w -t tf-aws ... ... ... ... ...
empty.test.w -t sim 512ms 499ms 533ms 7ms 10ms
empty.test.w -t tf-aws 637ms 621ms 653ms 6ms 8ms
hello_world.test.w -t sim 556ms 538ms 623ms 18ms 25ms
hello_world.test.w -t tf-aws ... ... ... ... ...
functions_1.test.w -t sim 548ms 535ms 557ms 5ms 7ms
functions_1.test.w -t tf-aws ... ... ... ... ...
jsii_big.test.w -t sim 3343ms 3322ms 3364ms 11ms 16ms
jsii_big.test.w -t tf-aws 3456ms 3431ms 3480ms 10ms 14ms
jsii_small.test.w -t sim 523ms 511ms 540ms 6ms 9ms
jsii_small.test.w -t tf-aws 642ms 629ms 653ms 6ms 8ms
Last Updated (UTC) 2023-11-22 09:02

@skorfmann
Copy link
Contributor Author

did some research yesterday, it's possible to add open telemetry without changes and get some basic instrumentation going.

Screenshot 2023-12-04 at 23 23 10

However, the segments for AWS SDK calls are missing. It looks like we're bundling the aws sdk v3, which apparently gets in the way of auto-instrumenting sdk calls to services like s3 or ddb.

There were attempts to exclude it from bundling already #4152 which led to drastically increased cold starts #4125 - Perhaps there's a way to get around the performance issues. Will give it a quick try.

btw: that's a case where #3410 would be super helpful to quickly iterate and upstream once it's working.

@skorfmann
Copy link
Contributor Author

oh, and there's a tax on the cold start times for adding the otel agent. Roughly in the range of 0.5 - 1s

@skorfmann
Copy link
Contributor Author

for reference, here's a recent issue in the aws skd v3 repo talking about the cold start issues with unbundled sdk. ~ 1s added at 1024mb aws/aws-sdk-js-v3#5382 - so I'm assuming it gets way worse with less memory. And they recommend to bundle 🤡

@Chriscbr
Copy link
Contributor

Chriscbr commented Dec 5, 2023

oh, and there's a tax on the cold start times for adding the otel agent. Roughly in the range of 0.5 - 1s

Oof... any way to avoid this?

@skorfmann
Copy link
Contributor Author

oh, and there's a tax on the cold start times for adding the otel agent. Roughly in the range of 0.5 - 1s

Oof... any way to avoid this?

just double checked to be sure:

with open telemetry ~ 1.3 seconds init

Screenshot 2023-12-06 at 14 09 38

same but plain lambda ~ 350 ms init
Screenshot 2023-12-06 at 14 11 18

also, this is just a single sample but the execution time of the function itself appears to be slower with otel. But this could be just a coincident.

perhaps bumping up the memory could speed things up a bit. But doesn't seem to be a good strategy.

@skorfmann
Copy link
Contributor Author

also, this is just a single sample but the execution time of the function itself appears to be slower with otel. But this could be just a coincident.

no, there's no difference in actual runtime. Only in terms of cold start / init.

perhaps bumping up the memory could speed things up a bit. But doesn't seem to be a good strategy.

No, doubling to 2048 didn't make a difference.

However, the segments for AWS SDK calls are missing. It looks like we're bundling the aws sdk v3, which apparently gets in the way of auto-instrumenting sdk calls to services like s3 or ddb.

Deployed a little test application and customized it with a platform which uses the default aws otel layer with config, no code changes. While it picks up the general flow, it looks a bit confusing.

Screenshot 2023-12-06 at 16 34 20

And that's the related trace

Screenshot 2023-12-06 at 16 31 50

@skorfmann
Copy link
Contributor Author

for reference, here's a recent issue in the aws skd v3 repo talking about the cold start issues with unbundled sdk. ~ 1s added at 1024mb aws/aws-sdk-js-v3#5382 - so I'm assuming it gets way worse with less memory.

There's a pretty good blog post about this by https://twitter.com/astuyve

https://aaronstuyvenberg.com/posts/aws-sdk-comparison

@skorfmann
Copy link
Contributor Author

some more links around the aws open telemetry stuff

@skorfmann
Copy link
Contributor Author

re: fetch for xray - there's a pretty nice workaround posted here by @RaphaelManke aws-powertools/powertools-lambda-typescript#1619

Copy link

Hi,

This PR has not seen activity in 20 days. Therefore, we are marking the PR as stale for now. It will be closed after 7 days.
If you need help with the PR, do not hesitate to reach out in the winglang community slack at winglang.slack.com.
Feel free to re-open this PR when it is still relevant and ready to be worked on again.
Thanks!

@github-actions github-actions bot added the Stale label Dec 27, 2023
@github-actions github-actions bot closed this Jan 4, 2024
@eladb
Copy link
Contributor

eladb commented Jan 4, 2024

😞

@skorfmann
Copy link
Contributor Author

for wing cloud, we're using a custom platform to define the open telemetry config, which works ok. As said above, it adds substantial cold start times though.

So, perhaps an opt-in option for functions would work?

@skorfmann skorfmann reopened this Jan 4, 2024
@github-actions github-actions bot removed the Stale label Jan 5, 2024
Copy link

Hi,

This PR has not seen activity in 20 days. Therefore, we are marking the PR as stale for now. It will be closed after 7 days.
If you need help with the PR, do not hesitate to reach out in the winglang community slack at winglang.slack.com.
Feel free to re-open this PR when it is still relevant and ready to be worked on again.
Thanks!

@github-actions github-actions bot added the Stale label Jan 25, 2024
@github-actions github-actions bot closed this Feb 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants