Support timeouts for COPY commands #1109

seanlinsley · 2024-02-29T22:31:38Z

COPY doesn't respect statement_timeout, which for us has led to stalled connections on production. This can also be confirmed locally:

$ time psql -c "set statement_timeout = '1s'; select pg_sleep(2);"
SET
ERROR:  canceling statement due to statement timeout

real	0m1.028s
user	0m0.020s
sys	0m0.005s

$ time psql -c "set statement_timeout = '1s'; copy foo from stdin" < /dev/zero
SET
^CERROR:  COPY from stdin failed: canceled by user
CONTEXT:  COPY foo, line 1

real	0m4.816s
user	0m1.781s
sys	0m3.031s

Another project has noticed this issue and worked around it: npgsql/npgsql#1328 npgsql/npgsql@9fda64a

Here's the relevant configuration from our application (note we're also using deadpool for connection pooling):

let mut pg_config = tokio_postgres::Config::from_str(url).unwrap();
pg_config.options("-c timezone=UTC -c statement_timeout=120s");
pg_config.connect_timeout(Duration::from_secs(30));
pg_config.keepalives_idle(Duration::from_secs(5 * 60));

I see there is tcp_user_timeout which could possibly help, but it's apparently not straightforward to use because of surprising behavior: https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die

The text was updated successfully, but these errors were encountered:

sfackler · 2024-02-29T23:48:21Z

You can just use tokio's own timeout functionality: https://docs.rs/tokio/latest/tokio/time/fn.timeout.html

seanlinsley · 2024-02-29T23:55:44Z

In this case the COPY command survived across an application deploy (where the server is shut down and completely replaced), so a tokio timeout wouldn't have helped.

sfackler · 2024-02-29T23:59:24Z

Can you explain to me how the linked npgsql change would have helped there? If the client process is no longer running, there's nothing it can do to affect the state of the database.

seanlinsley · 2024-03-01T01:12:05Z

I don't understand the npgsql change myself (and the PR doesn't go into detail), I just linked to it because it's the only place I've found where others have discussed this issue.

Upstream Postgres should definitely document this, and if possible enforce statement_timeout on COPY commands. If there isn't a TCP setting (or other thing) that can help, we will likely revert back to using INSERT to avoid this issue.

sfackler · 2024-03-01T01:56:06Z

To be very clear, the npgsql change will not solve your problem. If the server is not detecting that its client has vanished, you need to change the behavior of the server, not the behavior of the client.

Is the server blocked indefinitely trying to read more data from the client? If that's the case, you should probably enable TCP keepalive probes on the server side (and look into why your server shutdown process isn't cleanly terminating connections). If the server's blocked something else internally, then you'd need to ask the Postgres maintainers to do something about that.

seanlinsley · 2024-10-23T13:24:44Z

FWIW, we haven't seen stalled COPY commands since moving expensive operations that may fail out of the loop that generates data for the COPY, instead doing that work before or after the COPY is completed. So it could be that our issue was that we never called writer.finish() if an error occurred.

I wonder if the API could change so finish is called on Drop? Otherwise the documentation could more explicit on what will happen if you don't call finish, strongly recommending to not do work that may fail while the connection is open.

sfackler · 2024-10-23T14:29:12Z

Like with transactions, we want to require clients to explicitly opt-in to committing the write since otherwise you'll get broken truncated inserts if the logic writing copy data exits early due to e.g. an error. When that happens, the client will abort the copy and it should be as if it had never happened: https://github.com/sfackler/rust-postgres/blob/master/tokio-postgres/tests/test/main.rs#L646-L671

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support timeouts for COPY commands #1109

Support timeouts for COPY commands #1109

seanlinsley commented Feb 29, 2024

sfackler commented Feb 29, 2024

seanlinsley commented Feb 29, 2024

sfackler commented Feb 29, 2024

seanlinsley commented Mar 1, 2024

sfackler commented Mar 1, 2024

seanlinsley commented Oct 23, 2024 •

edited

Loading

sfackler commented Oct 23, 2024

Support timeouts for COPY commands #1109

Support timeouts for COPY commands #1109

Comments

seanlinsley commented Feb 29, 2024

sfackler commented Feb 29, 2024

seanlinsley commented Feb 29, 2024

sfackler commented Feb 29, 2024

seanlinsley commented Mar 1, 2024

sfackler commented Mar 1, 2024

seanlinsley commented Oct 23, 2024 • edited Loading

sfackler commented Oct 23, 2024

seanlinsley commented Oct 23, 2024 •

edited

Loading