Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Request Performance Regression on Amazon Linux 2023 #1621

Closed
chrisoverzero opened this issue Nov 16, 2023 · 5 comments
Closed

Initial Request Performance Regression on Amazon Linux 2023 #1621

chrisoverzero opened this issue Nov 16, 2023 · 5 comments
Assignees
Labels
bug This issue is a bug. module/custom-runtime service-api This issue is due to a problem in a service API, not the SDK implementation.

Comments

@chrisoverzero
Copy link
Contributor

chrisoverzero commented Nov 16, 2023

Describe the bug

When using the AWS SDK for .NET on Lambda with a runtime of "provided.al2023" the first request to an AWS service is significantly slower than it is under "provided.al2".

I've checked this with the Step Functions and S3 SDKs in particular.

Expected Behavior

Let's look under Amazon Linux 2. Here's an example initial request metric from the S3 SDK:

Request metrics: AsyncCall = True; CanonicalRequest = [REDACTED]; StringToSign = AWS4-HMAC-SHA256\n20231116T162723Z\n20231116/eu-west-1/s3/aws4_request\n0eadbea89d6ac50207072872ff834e0c56ecb2d325b4b8e74b20322a13e955a1; ServiceName = AmazonS3; ServiceEndpoint = https://fen-object-share.s3.eu-west-1.amazonaws.com/; MethodName = ListObjectsV2Request; AmzId2 = j9+NJQJ+0MGzhi2RWbHJowscyJVqh8kd7rJ8xidbA2CZWfQdnzEyhI880oYmqMF0MudxZdEi/Yc=; StatusCode = OK; BytesProcessed = 0; AWSRequestID = QN3QHSD7PNK16JE2; CredentialsRequestTime = 00:00:00.0000659; RequestSigningTime = 00:00:00.0141405; HttpRequestTime = 00:00:00.1191039; ResponseUnmarshallTime = 00:00:00.0022337; ResponseProcessingTime = 00:00:00.0024265; ClientExecuteTime = 00:00:00.1375685;

…and from a subsequent request:

Request metrics: AsyncCall = True; CanonicalRequest = [REDACTED]; StringToSign = AWS4-HMAC-SHA256\n20231116T162729Z\n20231116/eu-west-1/s3/aws4_request\nf8f6569b2501b42700322a3f2cc6e9d30d4676388cd0911f13ff2534568aa87a; ServiceName = AmazonS3; ServiceEndpoint = https://fen-object-share.s3.eu-west-1.amazonaws.com/; MethodName = ListObjectsV2Request; AmzId2 = c6DRj8nFKTIYU91M/iPPrOK2RB3ZoXpm/u1/2ZvBLnhU5g1MK0xrrUYvsmx1kj0rh9RiuBku0uA=; StatusCode = OK; BytesProcessed = 0; AWSRequestID = K2VKBKNNP5G1244J; CredentialsRequestTime = 00:00:00.0000823; RequestSigningTime = 00:00:00.0001613; HttpRequestTime = 00:00:00.0562662; ResponseUnmarshallTime = 00:00:00.0012703; ResponseProcessingTime = 00:00:00.0014287; ClientExecuteTime = 00:00:00.0581422;

Let's clean that up:

Initial Subsequent
CredentialsRequestTime 00:00:00.0000659 00:00:00.0000823
RequestSigningTime 00:00:00.0141405 00:00:00.0001613
HttpRequestTime 00:00:00.1191039 00:00:00.0562662
ResponseUnmarshallTime 00:00:00.0022337 00:00:00.0012703
ResponseProcessingTime 00:00:00.0024265 00:00:00.0014287
ClientExecuteTime 00:00:00.1375685 00:00:00.0581422

In short, the total time for an initial request is a little under 140ms with subsequent requests under 60ms. Around 120 of those 140 milliseconds were spent in HttpRequestTime. A totally reasonable place to spend milliseconds.

Current Behavior

Doing the same for Amazon Linux 2023, first:

Request metrics: AsyncCall = True; CanonicalRequest = [REDACTED]; StringToSign = AWS4-HMAC-SHA256\n20231116T163811Z\n20231116/eu-west-1/s3/aws4_request\n80c4f0ccf97f8a6a98732829089b00568838935cb9a820541ad204d27df447a9; ServiceName = AmazonS3; ServiceEndpoint = https://fen-object-share.s3.eu-west-1.amazonaws.com/; MethodName = ListObjectsV2Request; AmzId2 = V5p/kJVqud9IIy9+KZLbK/AqdjfvxSp6f4mREQiEooUHtcpqTxkNfCyG+VI2y1bH3gK22JUCcmk=; StatusCode = OK; BytesProcessed = 0; AWSRequestID = 20552BMSN97GSH3T; CredentialsRequestTime = 00:00:00.0000497; RequestSigningTime = 00:00:00.0152391; HttpRequestTime = 00:00:00.3099302; ResponseUnmarshallTime = 00:00:00.0054981; ResponseProcessingTime = 00:00:00.0056516; ClientExecuteTime = 00:00:00.3326032;

…and subsequent:

Request metrics: AsyncCall = True; CanonicalRequest = [REDACTED]; StringToSign = AWS4-HMAC-SHA256\n20231116T163819Z\n20231116/eu-west-1/s3/aws4_request\nf3ceeb3c74e79254fc97210de4475dae052f7cf7a2564f4456f7000aefc60786; ServiceName = AmazonS3; ServiceEndpoint = https://fen-object-share.s3.eu-west-1.amazonaws.com/; MethodName = ListObjectsV2Request; AmzId2 = VnPpOSZCVArdTrl5tOhRHZAEDl3IYCbSUpB8ddHuDklLENmeCUvrj/sAYh6F7qVrv5fT4eJmUlM=; StatusCode = OK; BytesProcessed = 0; AWSRequestID = CT6NQGKCR4DCFGR8; CredentialsRequestTime = 00:00:00.0000801; RequestSigningTime = 00:00:00.0002212; HttpRequestTime = 00:00:00.0450372; ResponseUnmarshallTime = 00:00:00.0016921; ResponseProcessingTime = 00:00:00.0018422; ClientExecuteTime = 00:00:00.0474034;

Cleaning that up as before, we get:

Initial Subsequent
CredentialsRequestTime 00:00:00.0000497 00:00:00.0000801
RequestSigningTime 00:00:00.0152391 00:00:00.0002212
HttpRequestTime 00:00:00.3099302 00:00:00.0450372
ResponseUnmarshallTime 00:00:00.0054981 00:00:00.0016921
ResponseProcessingTime 00:00:00.0056516 00:00:00.0018422
ClientExecuteTime 00:00:00.3326032 00:00:00.0474034

The overall ClientExecuteTime for the initial request has more than doubled. Around 310 of the total 330 milliseconds were spent in HttpRequestTime. This and the above are only a single sample – I've sampled this many times and I first saw it manifest in a running application:

image

These are stats for duration in milliseconds of first request binned by 1 day. Before and after the bump are "provided.al2". The bump is "provided.al2023".

Reproduction Steps

Repro is committed here. Publishing with CI set to true will perform AOT compilation. Please do let me know if there's more I can do.

Possible Solution

No response

Additional Information/Context

No response

AWS .NET SDK and/or Package version used

AWSSDK.S3: 3.7.300.3
AWSSDK.StepFunctions: 3.7.301.1

Targeted .NET Platform

.NET 8

Operating System and version

Amazon Linux 2, Amazon Linux 2023

@chrisoverzero chrisoverzero added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Nov 16, 2023
@ashishdhingra ashishdhingra added needs-reproduction This issue needs reproduction. needs-review and removed needs-triage This issue or PR still needs to be triaged. labels Nov 16, 2023
@normj
Copy link
Member

normj commented Nov 16, 2023

Thanks for all of this data @chrisoverzero. I'll do some investigation to see what I can find.

@ashishdhingra ashishdhingra added service-api This issue is due to a problem in a service API, not the SDK implementation. and removed needs-review needs-reproduction This issue needs reproduction. labels Nov 17, 2023
@ashishdhingra ashishdhingra transferred this issue from aws/aws-sdk-net Nov 17, 2023
@normj
Copy link
Member

normj commented Mar 30, 2024

This issue was most likely due to the the Amazon Linux 2023 using OpenSSL 3.0. We found a performance regression in OpenSSL 3.0 and .NET which caused the cert store to loaded twice. For the dotnet8 runtime we made a special tweak to compensate for this issue. We also pushed a fix to the .NET runtime but Microsoft won't ship that fix till .NET 9. Since the provided.al2023 does not use the bash script I linked before it will be affected by the regression. I would advice using the dotnet8 runtime over the provided.al2023 runtime even if you aren't using the .NET runtime installed in the managed runtime.

@normj normj closed this as completed Mar 30, 2024
Copy link
Contributor

Comments on closed issues are hard for our team to see.
If you need more assistance, please either tag a team member or open a new issue that references this one.
If you wish to keep having a conversation with other community members under this issue feel free to do so.

@chrisoverzero
Copy link
Contributor Author

@normj Thanks for all the work that you and others put into this issue – I was following along on other GitHub projects.

Does the equivalent advice apply to Image-style Lambda Functions, as well? That is, even for a NativeAOT application, would one be better off using the dotnet8 base image rather than the provided.al2023 image?

@normj
Copy link
Member

normj commented Apr 5, 2024

You can see the tweak we ended up doing was just set the SSL_CERT_FILE to non existing file outside of the usual cert directory to avoid cert double load. So you could set that environment variable yourself on provided.al2023. Something to keep in mind with provided.al2023 is libicu is not preinstalled which .NET needs for internalization. So unless you deploy with your function configured for invariant mode you have to include libicu in the deployment bundle. You can include in .NET by adding the following to your project but it does increase the deployment bundle quite a bit.

  <ItemGroup>
    <RuntimeHostConfigurationOption Include="System.Globalization.AppLocalIcu" Value="68.2.0.9" />
    <PackageReference Include="Microsoft.ICU.ICU4C.Runtime" Version="68.2.0.9" />
  </ItemGroup>

The lack of libicu is another strong reason I recommend deploying dotnet8 to avoid having include it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. module/custom-runtime service-api This issue is due to a problem in a service API, not the SDK implementation.
Projects
None yet
Development

No branches or pull requests

3 participants