Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

connection_loss thrown on protocol errors #686

Closed
alexolog opened this issue May 8, 2023 · 7 comments
Closed

connection_loss thrown on protocol errors #686

alexolog opened this issue May 8, 2023 · 7 comments

Comments

@alexolog
Copy link

alexolog commented May 8, 2023

As a follow-up to #287:

If other people get the same message, hopefully the error message will take them to a discussion such as this one.

I am getting the same message, and it took me to that discussion

When I generate the problem, I get SQL error code 08P01: protocol_violation.

And herein lies the problem: pqxx::broken_connection does not expose the error code. If it did, I could check it and decide what to do based on that.

Passing the wrong number of parameters to a statement would normally indicate a bug in the application, so hopefully this complication won't inconvenience too many people.

A reasonable assumption that a pqxx::broken_connection just indicates a connection loss, could (and has!) lead one to write code that catches the exception and attempts to reconnect and resubmit, resulting a loop that cannot be resolved.

At least you get a somewhat helpful error message.

Parsing an error message is a terrible advice. That's what error codes are for!

If you have any good ideas about detecting this exact problem without risking confusion with different errors, feel free to file a new one.

The simple solution is to attach an error code to the exception.
In fact, any exception generated from a condition that has an error code (as per https://www.postgresql.org/docs/current/errcodes-appendix.html), should expose it.

Looking at result.cxx I see that the problem is with all of class 08, as well as with codes 53300 and 42501.

There are several options to deal with it:

  1. Move connection_error under sql_error.
  2. Introduce an intermediary failure_with_code or sql_failure as a parent of both connection_error and sql_error, and move sqlstate() to it.
  3. Just add sqlstate() to connection_error.

I personally prefer #​1.

Thank you!

@alexolog
Copy link
Author

alexolog commented May 9, 2023

I looked at other exceptions that don't expose an error code:

pqxx::too_many_connections - a subclass of pqxx::broken_connection for error code 53300.
pqxx::insufficient_privilege - doesn't expose code 42501, but does expose other codes.
pqxx::blob_already_exists - couldn't find any code that throws it.
pqxx::in_doubt_error - not triggered by postgres, so no error code.
pqxx::variable_set_to_null - not triggered by postgres, so no error code.

@jtv
Copy link
Owner

jtv commented May 10, 2023

Okay, I'll look into exposing sqlstate on more exception classes. Thanks for digging into this. Haven't had time to respond, but I'll incorporate it one way or another.

jtv added a commit that referenced this issue May 13, 2023
See #686.  This is not a full implementatino of what is suggested there.
But it will enable a program to avoid the problem of catching a
`broken_connection` and retrying ad infniitum.
@jtv
Copy link
Owner

jtv commented May 13, 2023

Doing #687 as a first step in addressing the concrete hazard, by enabling a broken application to detect that a particular broken_connection is due to a bug in its own code and so not worth retrying. We'll want to add sqlstate to more exception classes, but that will take a bit more work.

(For the record, nobody recommended parsing error messages. I totally agree that parsing error messages would be a terrible suggestion, but that is just not a thing that happened.)

@alexolog
Copy link
Author

Does it break the connection in the sense that a reconnect is needed?

@jtv
Copy link
Owner

jtv commented May 16, 2023

Yes, a broken_connection signals that the connection is broken and you'll need a new one.

That's also one of the reasons by the way why I don't think it'd work to make broken_connection a subclass of sql_error. An sql_error normally means that your transaction is doomed, but you can still use your connection. A broken_connection means your entire connection is shot. (There are other reasons as well.)

I had a look into what it would take to add sqlstate to broken_connection, but there are complications. In most code paths where broken_connection can occur, there is no actual sqlstate variable that we can query — because that information only comes with a query result. In most cases we'd have to make our own decisions about which sqlstate string to use in which scenario. That would leave us with a baroque boundary of choices and responsibilities. Your application might make the same call twice and hit two connection errors with the same sqlstate code — but one might be coming from the database, and the other might originate inside libpqxx with an sqlstate code that libpqxx picked. In one case a libpq upgrade might change the code, in the other a libpqxx upgrade might change the code.

In most cases, libpqxx would choose an sqlstate code, but it can't necessarily promise that it'll be a good choice. For example, when unable to connect to the server, how would libpqxx know whether to use 08001 (SQL-client unable to establish SQL-connection) or 08004 (SQL-server rejected establishment of SQL-connection)? What if it used 08000 and maybe later versions made the code more specific in some scenarios? It seems quite arbitrary and subject to change, which still leaves you with a messy decision matrix for drawing useful conclusions from the sqlstate.

So for now I'm leaning in the direction of keeping the existing approach: if you've got a useful distinction between errors, I try to give them separate classes, and inheritance keeps the change compatible with older code. When that doesn't work, we discuss and look for a better solution. And from a practical perspective, I think "my app needs to know this distinction" is a better driver for that discussion than "which is the more appropriate sqlstate code."

@alexolog
Copy link
Author

It is a little surprising that a failure to prepare kills a connection.
Does one need to recreate a connection every on every broken_connection exception? Are there "best practice" recommendations (or better yet, sample code)?

Thanks!

@jtv
Copy link
Owner

jtv commented May 18, 2023

Yes, it's absolutely surprising! But not much I can do about it.

Yes, a broken_connection signals that the connection is broken and you'll need to either give up and report an error, or make a new connection.

Best practices? The usual stuff I suppose... Don't retry the same thing forever. Report exceptions somewhere unless you know very precisely what caused them and why it's irrelevant, so a human can see problem patterns happening. Don't over-complicate error handling, especially since error handling code is often hard to test thoroughly. Accept that anything can fail, including your own code, so don't try to fix every possible problem. And of course if you want to reconnect, you'll have to have your connection string again.

jtv added a commit that referenced this issue May 20, 2023
See #686.  This is not a full implementatino of what is suggested there.
But it will enable a program to avoid the problem of catching a
`broken_connection` and retrying ad infniitum.
jtv added a commit that referenced this issue May 20, 2023
jtv added a commit that referenced this issue May 21, 2023
@jtv jtv closed this as completed Jun 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants