Connections hang in CLOSE-WAIT state #3182

Open
Yerkwell opened this issue Nov 2, 2023 · 5 comments · May be fixed by #3454
Labels
A-http project: actix-http C-bug Category: bug

Comments

@Yerkwell

Yerkwell commented Nov 2, 2023

If a POST request with a large body receives an error response (413, 404, or any other), the connection is not closed but hangs in the CLOSE-WAIT state.

Expected Behavior

After the error response is sent, the connection is closed successfully.

Current Behavior

The connection hangs in the CLOSE-WAIT state with unread bytes in Recv-Q.

Possible Solution

Since this only happens for requests with a large body, I suspect it is related to the bytes that had not yet been read from the socket when the error occurred. We probably need to make sure these bytes are read before the socket is closed.
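To make the idea above concrete, here is a minimal sketch of "drain, then close" on a plain socket. It only illustrates the general technique and is not how actix-http handles connections; the function name `drain_and_close` and its `max_bytes` and `timeout` parameters are assumptions made up for this example.

```python
import socket


def drain_and_close(sock: socket.socket, max_bytes: int = 1 << 20, timeout: float = 1.0) -> int:
    """Half-close the write side, discard pending input, then close.

    Returns the number of bytes discarded. Stops after max_bytes, or when a
    single read waits longer than timeout seconds, so a slow peer cannot
    hold the socket open indefinitely.
    """
    discarded = 0
    sock.settimeout(timeout)
    try:
        # Signal that no more data will be sent (the error response is done).
        sock.shutdown(socket.SHUT_WR)
        while discarded < max_bytes:
            chunk = sock.recv(min(65536, max_bytes - discarded))
            if not chunk:  # peer closed its side: nothing left to read
                break
            discarded += len(chunk)
    except OSError:
        # Covers timeouts and resets; either way we stop draining.
        pass
    finally:
        sock.close()
    return discarded
```

The cap on discarded bytes is a deliberate design choice: without it, a client that keeps sending could keep the server reading forever.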

Steps to Reproduce (for bugs)

  1. Create a simple server:
use actix_web::{App, HttpServer};

#[actix_web::main]
pub async fn main() {
    // No routes are registered, so every request is answered with 404.
    HttpServer::new(move || {
        App::new()
    })
        .bind("127.0.0.1:5556").unwrap()
        .run().await.unwrap();
}
  2. Make a request with a large body that ends in an error (404 in this case):
import requests
requests.post('http://127.0.0.1:5556/data', json={"data": "a"*1000000})
  3. Check the sockets. The connection to the server is in the CLOSE-WAIT state with a lot of bytes in Recv-Q:
$ ss -n4t | grep 5556
FIN-WAIT-2  0       0            127.0.0.1:59240        127.0.0.1:5556
CLOSE-WAIT  787381  0            127.0.0.1:5556         127.0.0.1:59240
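
For anyone who wants to check this repeatedly (for example in a loop while retrying the request), the `ss` output can be filtered with a small script. This is only a sketch: the names `SocketRow`, `parse_ss`, and `stuck_close_wait` are made up for this example, and it assumes the five-column layout shown above (state, Recv-Q, Send-Q, local address, peer address).

```python
from typing import NamedTuple


class SocketRow(NamedTuple):
    state: str
    recv_q: int
    send_q: int
    local: str
    peer: str


def parse_ss(output: str) -> list[SocketRow]:
    """Parse `ss -n4t` lines of the form: State Recv-Q Send-Q Local Peer."""
    rows = []
    for line in output.splitlines():
        parts = line.split()
        # Skip blank lines and the header row (its Recv-Q column is not numeric).
        if len(parts) < 5 or not parts[1].isdigit():
            continue
        rows.append(SocketRow(parts[0], int(parts[1]), int(parts[2]), parts[3], parts[4]))
    return rows


def stuck_close_wait(rows: list[SocketRow], port: int) -> list[SocketRow]:
    """Return CLOSE-WAIT sockets on the given local port that hold unread bytes."""
    suffix = f":{port}"
    return [r for r in rows if r.state == "CLOSE-WAIT" and r.local.endswith(suffix) and r.recv_q > 0]


# The two lines from the report above, used as sample input.
sample = """\
FIN-WAIT-2  0       0            127.0.0.1:59240        127.0.0.1:5556
CLOSE-WAIT  787381  0            127.0.0.1:5556         127.0.0.1:59240
"""
stuck = stuck_close_wait(parse_ss(sample), 5556)
print(len(stuck), stuck[0].recv_q)  # prints: 1 787381
```

In practice the `sample` string would be replaced by the captured output of `ss -n4t`.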

Context

We have an application that uses actix-web as an HTTP API server. We discovered that at some point it starts leaking memory and connections. We also found that another application was sending huge files to our app and kept retrying because it received a 413 error. Further investigation showed that the two are connected: every 413 error leaves a connection in the CLOSE-WAIT state. Later we managed to reproduce the problem with other errors (such as the 404 in the minimal example).

Your Environment

  • Rust Version: 1.70.0
  • Actix Web Version: 4.4.0
@robjtede robjtede added C-bug Category: bug A-http project: actix-http labels Nov 2, 2023
@robjtede robjtede added this to the actix-web v4.4 milestone Nov 2, 2023
@robjtede robjtede removed this from the actix-web v4.4 milestone Feb 4, 2024
@coolacid

coolacid commented Aug 8, 2024

This appears to be specifically related to keep-alive connections.

The following doesn't cause this issue:

import requests
s = requests.Session()
s.headers['Connection'] = 'close'
s.post('http://127.0.0.1:5556/data', json={"data": "a"*1000000})

@gschulze

I can confirm this behavior. The number of connections that hang in the CLOSE-WAIT state increases every time a too-large payload has been sent. I'd very much appreciate a fix, as we are having huge issues in production right now. I'd also be willing to contribute, but I'm currently having difficulties finding my way around the codebase. My guess is something has to change in actix-http/src/h1/dispatcher.rs.

@Remby

Remby commented Aug 16, 2024

I am experiencing a similar issue. After deploying my website, it becomes inaccessible after running for some time. Upon investigation, I discovered a large number of connections stuck in the CLOSE-WAIT state. Could you please advise on how to resolve this problem? Thank you for your assistance.

@gschulze

gschulze commented Aug 16, 2024

I did some systematic testing to narrow down the problem, by modifying the following parameters:

  • Server Keep-Alive: Indicates whether HTTP keep-alive is enabled server-side via HttpServer::keep_alive(...).
  • Client Keep-Alive: Indicates whether Connection: keep-alive or Connection: close is passed as header.
  • Payload: Indicates whether a normal or a too-large payload (one above 256 KB) was sent.

On the client-side, I used the Python requests library with the following setup:

session = requests.Session()
session.headers["Connection"] = "close"  # or "keep-alive"
session.post(...)

| Server Keep-Alive | Client Keep-Alive | Payload | Observation |
|---|---|---|---|
| disabled | disabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| disabled | disabled | too large | No open connections after request has finished |
| disabled | enabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| disabled | enabled | too large | No open connections after request has finished |
| enabled | disabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| enabled | disabled | too large | No open connections after request has finished |
| enabled | enabled | normal | Server connection remains open in TIME_WAIT state for 30 seconds |
| enabled | enabled | too large | Server connection remains open in CLOSE_WAIT state for 1 minute; client connection remains open in FIN_WAIT2 state for 1 minute |
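
For anyone who wants to repeat these runs, the client-side half of the matrix (client keep-alive × payload) could be generated roughly like this. It is a sketch, not the script used for the results above: `client_cases` and `PAYLOAD_SIZES` are hypothetical names, and the two body sizes are assumptions picked to land on either side of the 256 KB limit. Server keep-alive still has to be toggled on the server.

```python
from itertools import product

PAYLOAD_LIMIT = 256 * 1024  # the 256 KB threshold mentioned above

# Hypothetical body sizes chosen to land on either side of the limit.
PAYLOAD_SIZES = {"normal": 1024, "too large": 1_000_000}


def client_cases():
    """Yield one dict per client-side combination (keep-alive x payload)."""
    for keep_alive, (label, size) in product((False, True), PAYLOAD_SIZES.items()):
        yield {
            "headers": {"Connection": "keep-alive" if keep_alive else "close"},
            "payload": label,
            "body": {"data": "a" * size},
            "over_limit": size > PAYLOAD_LIMIT,
        }


for case in client_cases():
    print(case["headers"]["Connection"], case["payload"], len(case["body"]["data"]))
```

Each case can then be sent with `requests.post(url, json=case["body"], headers=case["headers"])`, checking the socket states after every request.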

@gschulze gschulze linked a pull request Aug 16, 2024 that will close this issue
@anilaltuner

Hey everyone,

Has this problem been resolved? I'm running into it when I use async HTTP libraries in Python.

6 participants