Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS::Http::CurlHandleContainer will cause the service to be permanently unavailable #3167

Closed
1 task
USTCZJY opened this issue Oct 31, 2024 · 4 comments
Closed
1 task
Labels
bug This issue is a bug. p3 This is a minor priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.

Comments

@USTCZJY
Copy link

USTCZJY commented Oct 31, 2024

Describe the bug

The DestroyCurlHandle function will fail to rebuild the curl connection, but at this time m_poolSize has not changed, which may result in no connection available in the end. For example: there are N connections in the connection pool, and there are currently 2N concurrent requests. Unfortunately, the N threads that have obtained the connection have problems when making http calls, and DestroyCurlHandle fails to rebuild the curl connection. At this time, there is no connection in the connection pool, but m_poolSize is equal to m_maxPoolSize, so the blocked N threads cannot be awakened, and the new working thread will also be blocked, causing the service to be permanently unavailable.
2. If function AWS::Utils::ExclusiveOwnershipResourceManager::ShutdownAndWait encounters the above situation, it will also be permanently blocked because it does not meet the condition of _resources.size() == resourceCount.

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

none

Current Behavior

I haven't actually encountered it, but it's a potential risk.

Reproduction Steps

none

Possible Solution

none

Additional Information/Context

none

AWS CPP SDK version used

1.11.438

Compiler and Version used

4.8.5

Operating System and version

linux

@USTCZJY USTCZJY added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Oct 31, 2024
@jmklix
Copy link
Member

jmklix commented Nov 1, 2024

Do you have any potential repro code that might reproduce this? I noticed that you said you haven't encountered this. How likely do you expect this edge case to happen and how do you expect the sdk to handle it?

@jmklix jmklix added response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. p3 This is a minor priority issue and removed needs-triage This issue or PR still needs to be triaged. labels Nov 1, 2024
@USTCZJY
Copy link
Author

USTCZJY commented Nov 3, 2024

image
image
The problem is mainly in this part of the code. When the curl_easy_init() function returns nullptr, the connection pool will permanently lack a connection, but m_poolSize will not change. This makes the connection pool actually have only N-1 connections, but the worker thread thinks the connection pool is full and will not call the grow operation. If this error continues to occur, the service will be unavailable. However, the probability of curl_easy_init() function returning nullptr error is mainly related to the actual operating environment and is difficult to estimate.

@USTCZJY
Copy link
Author

USTCZJY commented Nov 3, 2024

image
image
As for the improvement method, I think you can follow the modification method in the figure, but this method still has problems, which will also happen when the connection reconstruction fails. For example: the connection pool has N connections, and there are N+1 worker threads to obtain connections, but the N threads that obtain the connection http requests all fail and the connection reconstruction fails. At this time, although m_poolSize is restored to 0, if there is no new worker thread to perform the grow operation, the previously blocked thread will never be awakened.

@github-actions github-actions bot removed the response-requested Waiting on additional info and feedback. Will move to "closing-soon" in 10 days. label Nov 4, 2024
@jmklix jmklix added the pending-release This issue will be fixed by an approved PR that hasn't been released yet. label Nov 19, 2024
Copy link

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue is a bug. p3 This is a minor priority issue pending-release This issue will be fixed by an approved PR that hasn't been released yet.
Projects
None yet
Development

No branches or pull requests

2 participants