Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Mark net.Conn failed writes as recoverable when 0 bytes were written. #1013

Merged
merged 1 commit into from
Nov 23, 2020

Conversation

dhermes
Copy link
Contributor

@dhermes dhermes commented Nov 19, 2020

Fixes #723, #870, #897.

if err != nil {
if n == 0 {
err = &safeRetryError{Err: err}
}
Copy link
Contributor Author

@dhermes dhermes Nov 19, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The motivation here is pgx:

The underlying connection just invokes Write() on a net.Conn and a "broken pipe" comes backs (same as here). But then a safeToRetry flag is set if exactly n == 0 bytes went through the write (https://github.com/jackc/pgconn/blob/cba610c245265ff50ea3c56a9961da218ed7d730/pgconn.go#L923).

Then in the high-level code used by the driver, if an error is detected but has safeToRetry set, a driver.ErrBadConn will be surfaced so the database/sql package can handle evicting the connection and replacing it in the pool (https://github.com/jackc/pgx/blob/1df45d758d4f57b45d56fbce1e5bb8cf9aabaaf0/stdlib/sql.go#L336).

@maddyblue maddyblue merged commit fbd2a9a into lib:master Nov 23, 2020
@maddyblue
Copy link
Collaborator

Merging this because it looks pretty good? Unsure if it'll break anything.

@dhermes dhermes deleted the handle-broken-pipe branch November 23, 2020 16:50
@dhermes
Copy link
Contributor Author

dhermes commented Nov 23, 2020

Thanks @mjibson. Turns out the Fixes #N1, #N2, #N3 syntax I used only closed #N1. (I guess I needed Fixes #N1. Fixes #N2. Fixes #N3.)

Mind manually closing out #870 and #897?

@maddyblue
Copy link
Collaborator

Done.

fho added a commit to simplesurance/migrate that referenced this pull request Mar 15, 2021
This reverts commit ac3fb2a.

Switch back to the upstream lib/pq version.
The issue that was the cause for switching to simplesurance/pq was fixed
in lib/pq 1.8 (lib/pq#1013).
roylee17 added a commit to roylee17/sqlx that referenced this pull request Mar 21, 2021
    I'm seeing "broken pipe" errors when working with CRDB using sqlx.
    The issue seemed to be the tcp connections were diconnected while the conns
    in db driver (pq) still has stale connection.

    It happens more often when the DB is behind a proxy.
    In our cases, the pods were proxied by the envoy sidecar.

    There were other instances on the community reporting similar issues,
    and took different workaround by sebding perodic dummy queries in app
    mimicing keepalive, enlenghthen proxy idle timeout, or shortening the
    lifetime of db conn.

    This has been reported and fixed by the lib/pq upstream in v1.9+

      lib/pq#1013

      lib/pq#723
      lib/pq#897
      lib/pq#870

      grafana/grafana#29957
yangru added a commit to greenplum-db/pq that referenced this pull request Aug 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

goroutines and tcp errors
2 participants