
[Bug]: MaxListenersExceededWarning when using newer AWS SDK version #500

Closed
peter-asztalos-mf opened this issue Jun 20, 2024 · 22 comments

peter-asztalos-mf commented Jun 20, 2024

Describe the bug

Hi,

Recently we tried to upgrade our dependencies; as part of this we upgraded @aws-sdk/client-sqs from v3.576.0 to v3.600.0. After the upgrade I noticed we are getting MaxListenersExceededWarning warnings.

After looking into it, it turns out there was a change in one of the SDK's dependencies, and the issue can be triggered starting from version v3.582.0. The change: smithy-lang/smithy-typescript#1308.

I believe the issue is due to reusing the same abort controller for every request; if I set a new AbortController after a successful poll, the warning goes away.
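
For illustration, here's roughly what I mean (a minimal sketch; the queue URL and wait time are placeholders):

    import { SQSClient, ReceiveMessageCommand } from "@aws-sdk/client-sqs";

    const sqs = new SQSClient({});

    // Create a fresh AbortController for every receive call, so the listeners
    // the SDK registers never pile up on a single long-lived signal.
    async function pollOnce(queueUrl: string) {
      const abortController = new AbortController();
      return sqs.send(
        new ReceiveMessageCommand({ QueueUrl: queueUrl, WaitTimeSeconds: 20 }),
        { abortSignal: abortController.signal }
      );
    }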

The full warning:

(node:27582) MaxListenersExceededWarning: Possible EventTarget memory leak detected. 11 abort listeners added to [AbortSignal]. Use events.setMaxListeners() to increase limit
    at EventTarget.[kNewListener] (node:internal/event_target:571:17)
    at EventTarget.[kNewListener] (node:internal/abort_controller:240:24)
    at EventTarget.addEventListener (node:internal/event_target:684:23)
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/node-http-handler/dist-cjs/index.js:331:23
    at new Promise (<anonymous>)
    at _NodeHttpHandler.handle (<project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/node-http-handler/dist-cjs/index.js:248:12)
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/smithy-client/dist-cjs/index.js:192:35
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/middleware-serde/dist-cjs/index.js:33:30
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/core/dist-cjs/index.js:165:24
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@smithy/middleware-retry/dist-cjs/index.js:320:38
    at <project>/node_modules/@aws-sdk/middleware-sdk-sqs/dist-cjs/index.js:92:18
    at <project>/node_modules/@aws-sdk/client-sqs/node_modules/@aws-sdk/middleware-logger/dist-cjs/index.js:34:22
    at Consumer.receiveMessage (<project>/node_modules/sqs-consumer/dist/cjs/consumer.js:214:28)

Your minimal, reproducible example

https://codesandbox.io/p/sandbox/upbeat-jerry-k6pplg

Steps to reproduce

1. Get the example from the Code Sandbox
2. Start Docker Compose to run the SQS mock service
3. Start the example script: NODE_OPTIONS="--trace-warnings" DEBUG=sqs-consumer npm run start
4. Observe the warning

Otherwise:

  1. Upgrade to a newer @aws-sdk/client-sqs version (>= 3.582.0)
  2. Set the WaitTimeSeconds in the ReceiveMessageCommand to 1 second to trigger the warning faster (see the sketch below)
  3. Observe the warning
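
For reference, a minimal consumer along these lines triggers the warning (the queue URL and endpoint are placeholders for the mock service):

    import { SQSClient } from "@aws-sdk/client-sqs";
    import { Consumer } from "sqs-consumer";

    const consumer = Consumer.create({
      queueUrl: "http://127.0.0.1:4566/000000000000/test-queue",
      sqs: new SQSClient({ endpoint: "http://127.0.0.1:4566" }),
      // Short long-poll so receive calls happen quickly; more than ten of them
      // is enough to trip the default listener limit.
      waitTimeSeconds: 1,
      handleMessage: async () => {},
    });

    consumer.start();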

Expected behavior

There shouldn't be a warning.

How often does this bug happen?

Every time

Screenshots or Videos

No response

Platform

  • OS: macOS Sonoma
  • Node: v20.13.1.

Package version

v10.3.0

AWS SDK version

>=3.582.0

Additional context

No response

@nicholasgriffintn
Member

Hey, thanks for raising an issue.

3.6 of the SDK would be a major version; we don't naturally support major version releases as they come out, as they come with breaking changes like this.

It would be something that we'd have to look at for our next major/minor release ourselves.

I am rather busy at the moment though, so I would suggest staying on 3.5 for a little while longer.

We'd be happy to take a PR though if this is an issue for anyone.

@peter-asztalos-mf
Author

Hey, thanks for the quick response.

I can reproduce this issue just by upgrading to 3.582.0, which, if I understood correctly, is part of 3.5, right? 🤔

I can look into it and open a PR. My fix during the test was very simple: passing a new AbortController per request. Do you see any issue with that?

@nicholasgriffintn
Member

nicholasgriffintn commented Jun 21, 2024

Ah sorry, I thought you said it was 3.6; the PR you linked also seems like a breaking change to me.

If I have time I might be able to look into it. In terms of a possible solution, that might be it; I've not looked too deeply though.

@erichkuba

I'm having the same issue and can confirm that this is not just a warning: the process continues to leak until it runs out of memory.

@nicholasgriffintn
Member

nicholasgriffintn commented Jun 24, 2024

If I have spare time this week I'll take a look; unfortunately, until then, I'd have to ask that everyone bears with me.

If anyone has the time themselves to look, that'd be super helpful, but if not, you'll have to wait on me. Unfortunately, it has come at a particularly busy time for the BBC.

@erichkuba

erichkuba commented Jun 24, 2024

I've possibly been able to resolve this by pinning the @smithy/node-http-handler package (which is used internally by @aws-sdk) to version 3.0.1.

With this, I have been able to upgrade the @aws-sdk packages to version ^3.600.0. This is not a long-term solution but provides a bridging solution until the root cause is addressed.

So, in short, add the following to your dependencies in your package.json file:

    "@smithy/node-http-handler": "3.0.1",

Note: After doing this, you will need to delete your package-lock.json file, or npm will install the version pinned in the lock file, which may not be the same as the version you have pinned in your package.json file.
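
Put together, the relevant part of package.json looks something like this (the @aws-sdk/client-sqs version shown is just the one discussed above; adjust to your own):

    {
      "dependencies": {
        "@aws-sdk/client-sqs": "^3.600.0",
        "@smithy/node-http-handler": "3.0.1"
      }
    }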

I've deployed this change into our systems and will monitor memory usage over the next few days, but the warning no longer presents.

I hope this is helpful.

@erichkuba

After reviewing the impact of the above fix, it does appear that the leak has been resolved, so I'm happy to recommend it as a temporary solution until a more permanent fix is available.

[Graph: memory usage after applying the fix (SQS-Consumer-Memory-Leak-Fix)]

@nicholasgriffintn
Member

nicholasgriffintn commented Jun 24, 2024

Awesome! So it is a change in the next version that causes the issue; that makes it easier to figure out what's going on, thanks!

(This one: https://github.com/smithy-lang/smithy-typescript/releases/tag/%40smithy/node-http-handler%403.1.0 which is the change mentioned at the top, spot on!)

@nicholasgriffintn
Member

nicholasgriffintn commented Jun 28, 2024

On further research, this may be because the abort listeners are not automatically being cleaned up; there's an old Node issue around this: nodejs/node#46525

Possibly because smithy is adding event listeners: https://github.com/smithy-lang/smithy-typescript/pull/1308/files#diff-98216b22e6b39a1cd6f2ab013304d4964152b554c4ad8fee4182a961b42285d8R192 but not removing them?

Update: I've opened an issue around this here: smithy-lang/smithy-typescript#1318
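
To illustrate the pattern in isolation (a minimal sketch, not smithy's actual code):

    // One shared AbortSignal, one 'abort' listener added per request and never removed.
    const controller = new AbortController();

    for (let i = 0; i < 11; i++) {
      // Roughly what the handler does for each outgoing request.
      controller.signal.addEventListener("abort", () => {
        /* abort the in-flight request */
      });
    }
    // Once the 11th listener is added, Node prints:
    // MaxListenersExceededWarning: Possible EventTarget memory leak detected.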

@nicholasgriffintn
Member

nicholasgriffintn commented Jun 28, 2024

Update on this: I have published 11.0.0-canary.1 with the latest version of the aws-sdk to test against that new version, and I've made a new example that sets up Docker with LocalStack for a truer test: https://github.com/bbc/sqs-consumer-starter/tree/main/examples/docker

With this, I have replicated the issue:
[Screenshot: the MaxListenersExceededWarning reproduced in the Docker example]

Still trying to figure out the best way to resolve it though.

@joseamt

joseamt commented Jul 2, 2024

@erichkuba I still have the issue.

Node version: 18.20.2
"@aws-sdk/client-sqs": "^3.600.0",
"@ssut/nestjs-sqs": "^2.1.0",

I tried pinning the @smithy/node-http-handler package and the issue still persists.

@nicholasgriffintn
Member

Hey, I'm still waiting on a response from AWS on the issue I raised above.

Outside of that, the current suggestion is to do what was described here: #500 (comment), or pin your AWS SDK version to before this change.

Beyond that, if anyone has any ideas for how it might be resolved in SQS Consumer, feel free to open a PR; we are in an election period at the moment, so I'm strapped for time.

Could we avoid adding "me too" comments though please? This just fills up the thread and makes it unreadable for other people coming in.

@erichkuba

erichkuba commented Jul 2, 2024

@joseamt Thanks for bringing this up. I could have been clearer in #500 (comment), and have updated it accordingly.

You need to pin the @smithy/node-http-handler package in your package.json, AND delete your package-lock.json, or the version in the package-lock.json file will be used.

I hope that gets you across the line.

@joseamt

joseamt commented Jul 3, 2024

@erichkuba Thanks for the update, it worked 👍 👍 👍! Please add a comment here if you get any updates from AWS.
Once again, thanks for the great help.

cheers 👍 💯

@nicholasgriffintn
Member

A PR has been raised by another member of the BBC that may fix this: smithy-lang/smithy-typescript#1331

Still waiting on a response from AWS on the issue, but hopefully they'll review this as a fix.

@paulbrimicombe

I had a little time to mess around with this today. I ran the docker example @nicholasgriffintn linked above (but not with tsx — I compiled the TypeScript with tsc and then ran the JS code). The consumer options are:

import * as http from "node:http";
import * as https from "node:https";
import { SQSClient } from "@aws-sdk/client-sqs";
import { NodeHttpHandler } from "@smithy/node-http-handler";
import type { ConsumerOptions } from "sqs-consumer";

// Counter used below to log heap usage every 20 receive calls
let requests = 0;

const options: ConsumerOptions = {
  queueUrl: "http://127.0.0.1:4566/000000000000/sqs-consumer-test",
  sqs: new SQSClient({
    endpoint: "http://127.0.0.1:4566/000000000000/sqs-consumer-test",
    // This is the latest version of @smithy/node-http-handler
    requestHandler: new NodeHttpHandler({
      // Disable keepAlive in order to keep as little state around in memory as possible
      httpsAgent: new https.Agent({ keepAlive: false }),
      httpAgent: new http.Agent({ keepAlive: false }),
    }),
  }),
  // Artificially fast in order to test the memory pressure over time
  pollingWaitTimeMs: 5,
  waitTimeSeconds: 0.01,
  postReceiveMessageCallback: async () => {
    requests += 1;
    if (requests % 20 === 0) {
      // Print the memory usage
      console.log(`${requests}\t\t${process.memoryUsage().heapUsed}`);
    }
  },
  handleMessage: async () => {},
};

Then I run the scripts with:

# Artificially limit the memory available to ensure that the Garbage Collector kicks in regularly
tsc && \
  NODE_OPTIONS='--max-old-space-size=14' \
  AWS_REGION=eu-west-1 AWS_ACCESS_KEY_ID=123 AWS_SECRET_ACCESS_KEY=456 \
  node ./index.js > some-data-file.tsv

Results

Here are some results for:

  • The original production code
    • This fails with an out-of-memory error where the line stops
  • The original production code but with { once: true } set in @smithy/node-http-handler (see "Execute AbortSignal listeners once to avoid memory leaks", smithy-lang/smithy-typescript#1331)
    • This fails with an out-of-memory error pretty quickly (not surprisingly since unless you're actually aborting, the listeners are never removed)
  • With sqs-consumer patched so that no abortSignal is passed to the AWS SDK
    • This is our baseline — whatever we do with AbortSignals, we're never going to get better performance than this.
[Graph: heap used vs. number of requests for the three configurations above]

Now the interesting bit....

These results are for:

  • No AbortSignals (our baseline)

  • Using a new AbortSignal for every request from sqs-consumer — the abort isn't actually wired up (so it can never be triggered), but this tests how well these signals are garbage collected

  • Using a new AbortSignal in every request from sqs-consumer but wiring it up to the global AbortSignal:

    const requestAbortController = new AbortController();
    const abortHandler = (reason) => requestAbortController.abort(reason);
    this.abortController.signal.addEventListener('abort', abortHandler);
    // Call the AWS SDK passing requestAbortController.signal
    this.abortController.signal.removeEventListener('abort', abortHandler);
  • The same as the previous example, but with patched code in @smithy/node-http-handler to remove the AbortSignal listeners when the request finished:

    if (typeof abortSignal.addEventListener === "function") {
      abortSignal.addEventListener("abort", onAbort, { once: true });
      // This line is new:
      req.on('close', () => abortSignal.removeEventListener('abort', onAbort))
    }
  • The original sqs-consumer code (with a single AbortSignal shared across all requests) but with the patch to tidy listeners in @smithy/node-http-handler

[Graph: heap used vs. number of requests for the five configurations above]

Discussion / Conclusions

In the graph above, it's clear that the best memory performance comes when not using any AbortSignal — this is not at all surprising. While it might be tempting to try to infer some differences between the performance of the other options, I think that any differences are due to different Garbage Collection processing and that if I re-ran the tests, any differences would be within the measurement error.

From these measurements, I can say that we could make changes to sqs-consumer to fix this issue without any changes to the @smithy/node-http-handler library (a rough sketch follows this list):

  • Use a new AbortSignal for each request
  • Add an abort listener to the global AbortSignal that triggers the request AbortSignal
  • Ensure that the abort listener is removed from the global AbortSignal when the request is complete
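
A rough sketch of that approach (the function and parameter names are illustrative, not the actual sqs-consumer internals):

    import { SQSClient, ReceiveMessageCommand } from "@aws-sdk/client-sqs";

    async function receiveWithFreshSignal(
      sqs: SQSClient,
      command: ReceiveMessageCommand,
      globalController: AbortController // the consumer's long-lived controller
    ) {
      // 1. A new AbortController for this request only.
      const requestAbortController = new AbortController();
      // 2. Forward aborts from the global controller to the per-request one.
      const abortHandler = () => requestAbortController.abort();
      globalController.signal.addEventListener("abort", abortHandler);
      try {
        return await sqs.send(command, {
          abortSignal: requestAbortController.signal,
        });
      } finally {
        // 3. Remove the forwarding listener so the global signal stays clean.
        globalController.signal.removeEventListener("abort", abortHandler);
      }
    }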

In the meantime I'll raise another PR for the @smithy/node-http-handler library that removes the event listeners from the AbortSignal when the request has completed (this is slightly more tricky for the fetch handler than the http and https handlers).

@paulbrimicombe

As promised: smithy-lang/smithy-typescript#1332

@0xMALVEE

0xMALVEE commented Jul 8, 2024

@paulbrimicombe is this error fixed?

@nicholasgriffintn
Member

nicholasgriffintn commented Jul 8, 2024

I've just run our Docker setup for 5 minutes (https://github.com/bbc/sqs-consumer-starter/tree/main/examples/docker) and got no errors.

This is with 11.0.0-canary.6, but that's just a version to indicate our support of the new AWS SDK major version; on your side, you'll just need to make sure your package lock lists 3.1.2 or above for the @smithy/node-http-handler package.
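
You can check which version your lockfile actually resolves with:

    npm ls @smithy/node-http-handler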

If we don't see any further issues, we'll release v11 to the latest tag to indicate our support.

Thanks @paulbrimicombe btw!

@nicholasgriffintn
Member

Hey, we've now released v11.0.0 of the package (https://github.com/bbc/sqs-consumer/releases/tag/v11.0.0).

I'm going to close this issue as it looks like the update to node-http-handler fixed it. Please feel free to comment if the issue crops up again; if we receive no further comments, this issue will be auto-locked in 30 days.

@joseamt

joseamt commented Jul 11, 2024

@nicholasgriffintn @paulbrimicombe @erichkuba Thanks for the update and support!! 👍 💯


This issue has been closed for more than 30 days. If this issue is still occurring, please open a new issue with more recent context.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 10, 2024