pgwire: implement cancellation protocol #34520
Conversation
Force-pushed from 70379ac to bfc2746.
```go
// GetPGWireCancelInfo extracts the timestamp of the cluster-wide ID as
// two int32s, for use by the pgwire cancellation protocol.
func (id ClusterWideID) GetPGWireCancelInfo() (int32, int32) {
	return int32(id.Hi >> 32), int32(0xFFFFFFFF & id.Hi)
}
```
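For context (not code from this PR): the two int32s returned above are what a server sends in the pgwire BackendKeyData message, which the client later echoes verbatim in a CancelRequest. A minimal sketch of writing that message:

```go
package pgwire

import (
	"encoding/binary"
	"io"
)

// writeBackendKeyData writes the pgwire BackendKeyData message: the
// type byte 'K', a 4-byte length (12, which includes the length field
// itself), then the 32-bit "process ID" and 32-bit "secret key". The
// client stores both and echoes them back in a later CancelRequest.
func writeBackendKeyData(w io.Writer, pid, secret int32) error {
	var buf [13]byte
	buf[0] = 'K'
	binary.BigEndian.PutUint32(buf[1:5], 12)
	binary.BigEndian.PutUint32(buf[5:9], uint32(pid))
	binary.BigEndian.PutUint32(buf[9:13], uint32(secret))
	_, err := w.Write(buf[:])
	return err
}
```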
The original id is 128 bits. I'd like a comment here showing that using only 64 of the bits is secure. Presumably the 128 was chosen for a good reason and for some reason 64 wasn't good enough. So, why is it ok here?
Well, the cluster-wide ID needs to be unique across the entire cluster. This ID only needs to be unique on a single node, so I figured you could just use the timestamp part of the cluster-wide ID. Maybe I'm wrong though, thoughts?
This commit wouldn't support canceling queries on node A by sending the message to node B, but judging by the lib/pq implementation that's not needed or expected anyway.
Is it theoretically possible for two identical wall times to be used for two different connections on the same node? If so, one connection could incorrectly cancel another. I'm not sure exactly how the HLC timestamp is generated here, or whether this is possible. It seems like it might be, because of the logical part: if one node's clock gets too far ahead of (or behind) the others in the cluster, its HLC clock will likely not advance its wall time but instead increment its logical component, right? In that case, two connections created during that period would share the same wall time and could hit this. If this can't happen, I'd like a comment explaining why.
I'm very science dog here, but after the absence of this PR annoyed me for the millionth time, I looked. So we need 64 bits but our IDs are 128 bits, and that's making us uneasy, right?
We have to keep the NodeID for load-balanced cancellation to work. NodeIDs are int32 but they're always positive, so we have 16+32 bits left. If we fill those 48 bits with true pseudo-randomness (i.e. don't pick part of the timestamp here), wouldn't that be good enough? We could even boost it up if we dedicate the first bit to deciding whether the NodeID is uvarint-encoded or not; NodeIDs are almost certainly nowhere close to MaxInt32, which gives us extra bits in the expected case. But honestly, that seems over the top. 48 bits of randomness should be plenty, and at the same time cheap enough to generate.
I haven't looked, but I'm guessing that we like that our ClusterWideIDs contain relevant data (?), so my suggestion would be to just tack these random 48 bits on locally (they're not shared with other nodes).
To handle a cancel() request, the receiving node first checks if the NodeID matches, in which case it cancels locally. Otherwise it proxies the cancel() over to the node that is supposed to handle it.
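A minimal sketch of that dispatch, with hypothetical types and helpers (this is not the PR's code):

```go
package pgwire

import "context"

// Server holds just enough state for the sketch: the local NodeID plus
// hooks for local cancellation and for proxying to another node.
type Server struct {
	nodeID             int32
	cancelLocalSession func(secret int32) error
	forwardCancel      func(ctx context.Context, nodeID, secret int32) error
}

// handleCancelRequest cancels locally when the NodeID embedded in the
// cancel key is ours; otherwise it proxies to the owning node. The
// proxy hop is what keeps cancellation working behind a load balancer,
// where the CancelRequest can land on any node.
func (s *Server) handleCancelRequest(ctx context.Context, nodeID, secret int32) error {
	if nodeID == s.nodeID {
		return s.cancelLocalSession(secret)
	}
	return s.forwardCancel(ctx, nodeID, secret)
}
```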
Would love for this to get finished. We write this code way too often:

```go
query := `SET statement_timeout='5s'; SELECT ...`
```

or worse, we forget, and things just hang if they didn't work out the way they ought to.
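Spelled out, the workaround looks something like this (a sketch; the function name and query are placeholders). `sql.Conn` pins a single session, so the `SET` applies to the query that follows:

```go
package main

import (
	"context"
	"database/sql"
)

// countWithTimeout runs a query with a server-side statement timeout,
// so a stuck statement errors out after 5s instead of hanging forever.
func countWithTimeout(ctx context.Context, db *sql.DB) (int, error) {
	conn, err := db.Conn(ctx)
	if err != nil {
		return 0, err
	}
	defer conn.Close()
	if _, err := conn.ExecContext(ctx, `SET statement_timeout = '5s'`); err != nil {
		return 0, err
	}
	var n int
	err = conn.QueryRowContext(ctx, `SELECT count(*) FROM t`).Scan(&n)
	return n, err
}
```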
> NodeIDs are int32 but they're always positive, so we have 16+32 bits left.
Your math is off. The sign is just 1 bit. There's just 33 bits left, not 48.
The rest of the argument about randomness feels weird to me. Your argument holds trivially if we're guaranteed that all queries terminate nearly instantaneously. But what happens when a query is long-running? The chance of a conflict increases.
We can probably compute the probability from a distribution of latencies and the frequency of new queries. Unfortunately I don't have the math off the top of my head. Do you remember how that works?
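(Not from the thread, but for reference: the standard birthday-problem approximation gives a ballpark, assuming each live connection holds an independent, uniformly random 33-bit secret and that only concurrently live connections can conflict.)

```go
package main

import (
	"fmt"
	"math"
)

// collisionProb approximates the chance that at least two of k
// concurrently live connections draw the same secret out of 2^bits
// equally likely values: p ≈ 1 - exp(-k(k-1) / 2^(bits+1)).
func collisionProb(k, bits float64) float64 {
	return 1 - math.Exp(-k*(k-1)/math.Exp2(bits+1))
}

func main() {
	// With 33 random bits and 10,000 concurrently live connections on
	// one node, the collision chance is about 0.6%.
	fmt.Printf("%.2e\n", collisionProb(10000, 33)) // ≈ 5.80e-03
}
```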
> the receiving node first checks if the NodeID matches, in which case it cancels locally. Otherwise it proxies the cancel() over to the node that is supposed to handle it.
Yes that bit is correct (and necessary when load balancers are involved).
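To make the 31+33 split concrete, a sketch (helper names are assumptions, and real code would draw the random bits from crypto/rand rather than math/rand so keys aren't guessable):

```go
package pgwire

import "math/rand"

// makeCancelKey packs the always-positive NodeID into the high 31 bits,
// so the receiving node can route the cancel, and fills the low 33 bits
// with per-node randomness.
func makeCancelKey(nodeID int32, rng *rand.Rand) uint64 {
	const randBits = 33
	random := rng.Uint64() & (1<<randBits - 1)
	return uint64(uint32(nodeID))<<randBits | random
}

// nodeIDFromCancelKey recovers the routing half of the key.
func nodeIDFromCancelKey(key uint64) int32 {
	return int32(key >> 33)
}
```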
> we can use it to implement our own thing in tests.

I just looked, and `CustomCancel` allows us to just call `conn.Cancel()`, which is exactly what I think we'd want to do throughout all CRDB-related uses of this.
I don't have a strong case for implementing the cancel protocol with this option.
It looks like the `CustomCancel` config was removed in the latest version of pgx, so we should avoid introducing a dependency on it. This is discussed in jackc/pgx#679, which is also a generally interesting discussion on PgBouncer's handling of the cancel protocol: jackc/pgx#679 (comment). @andreimatei might be interested in that.
`CustomCancel` is removed in the latest version because that version now closes the connection upon context cancellation. That's great! Even better, we could use that to fix our test behavior.
Go isn't the only language that uses the cancel protocol, though! Let's not make the mistake of blinding ourselves to other client behavior: even if Go's cancellation works well for our purposes, other languages still rely on the cancel protocol.
See PGJDBC, the main Java driver: https://github.com/pgjdbc/pgjdbc/blob/14576f4bca3a2484fd4f81a0d8276ae5cab9a419/pgjdbc/src/main/java/org/postgresql/core/QueryExecutorBase.java#L164
Jordan, my comment above is driven by context: the reason why Tobias (and then myself) got interested in this recently is that it's annoying our own test suite, which is focused on Go, lib/pq, and hopefully soon pgx.
For the wider picture, unfortunately, as established above, we are still in a bind and there's no good solution in sight. This conclusion is further substantiated by the discussion on pgx's own repository; see the PR linked by Nathan above.
As per my comment at #34520 (comment), as well as Matt's at #34520 (comment), I personally wouldn't spend time on this further than needed to advance our own testing agenda. The design of the Postgres cancel protocol is just very bad, and trying to do something here would create a lot of complexity on our side (with unclear gain).
We could also investigate just generating a random 64-bit number per connection that is per-server instead of per-cluster. The cancel protocol would only work for connections to that server. This could be guaranteed to be unique. I did this a few years ago in a branch, but it's gone.
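A minimal sketch of that per-server idea (hypothetical names): a process-wide counter makes every key unique by construction, though unlike a random key it is trivially guessable, so a real version would also mix in random bits.

```go
package pgwire

import "sync/atomic"

// nextCancelKey is a process-wide counter; AddUint64 never hands out
// the same value twice within one server process, so two live
// connections can never share a key.
var nextCancelKey uint64

func newCancelKey() uint64 {
	return atomic.AddUint64(&nextCancelKey, 1)
}
```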
That wouldn't work for any environment with a load balancer, including Cockroach Cloud.
PGWire connections now send back the PID/secret combo at the start of a connection that permits PGWire-based cancellation messages to do their job. This allows lib/pq context cancellations from a client to propagate all the way up into a running query in the CockroachDB server and cancel it gracefully.

Release note (sql change): Postgres wire protocol cancellation messages are now respected by the server.
Force-pushed from bfc2746 to 1204ef2.
❌ The GitHub CI (Cockroach) build has failed on 1204ef28.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
Before we build this, is it beneficial to our users/customers? A custom Go-based driver is great for us, but maybe not too useful for anyone else. It seems like we should consider limiting our solutions to things that are usable by lots of our users.
Matt,
The reason why this came up recently is that in our own test suite, when there's a timeout in a test, the test uses context cancellation, which currently doesn't do anything: lib/pq relies on the Postgres cancellation protocol, which we don't implement.
What we want (in tests) is for the driver to handle context cancellation by closing the connection. But barring that, we've been discussing implementing the cancel message instead.
Context cancellation doesn't do anything since lib/pq uses the Postgres cancellation protocol, which we don't implement. See discussion on: cockroachdb#34520

The TL;DR is that we ought to be using pgx instead, which would just close the underlying conn on cancellation, which is just what we want here.

Release note: None
47693: sql: make statusServer optional r=andreimatei,nvanbenschoten a=tbg

SQL depends on the status server to list sessions and nodes. A pure SQL server (used for multi-tenancy) will not have access to these. Wrap the status server in a `(serverpb.StatusServer, bool)` wrapper that indicates when the status server is not present, and return the newly introduced `pgerr.UnsupportedWithMultiTenancy()` in such cases.

Over time, we'll have to take stock of the functionality lost that way and we may have to restore some of it; however, at the moment the focus is on discovering which functionality is affected and turning its resolution into work items which can be handed to the SQL team. The intention is that looking at the callers of the above error constructor provides an overview to that end.

Release note: None

47746: roachtest: add stmt_timeout to consistency checks r=knz a=tbg

Context cancellation doesn't do anything since lib/pq uses the Postgres cancellation protocol, which we don't implement. See discussion on: #34520

The TL;DR is that we ought to be using pgx instead, which would just close the underlying conn on cancellation, which is just what we want here.

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>
Closes cockroachdb#41522.

This commit enables LISTEN/NOTIFY. Rangefeeds must be enabled. One problem is that the 32-bit session ID is back to bite us from cockroachdb#34520. We don't have a 32-bit identifier that's unique across all sessions of the cluster. Currently, we report the NodeID for the PID of the notifying process instead.

Release note (sql change): implement the LISTEN/NOTIFY Postgres protocol; LISTEN, NOTIFY, UNLISTEN, and pg_notify.
Closing this in favor of #67501. Anyone who's still interested, please take a look!
Superseded by #41335, actually.
Closes cockroachdb#41522.

This commit enables LISTEN/NOTIFY. One problem is that the 32-bit session ID is back to bite us from cockroachdb#34520. We don't have a 32-bit identifier that's unique across all sessions of the cluster. Currently, we report the NodeID for the PID of the notifying process instead.

Release note (sql change): implement the LISTEN/NOTIFY Postgres protocol; LISTEN, NOTIFY, UNLISTEN, and pg_notify.
PGWire connections now send back the PID/secret combo at the start of a
connection that permits PGWire-based cancellation messages to do their
job. This allows lib/pq context cancellations from a client to propagate
all the way up into a running query in the CockroachDB server and cancel
it gracefully.
closes #41335
Release note (sql change): Postgres wire protocol cancellation messages
are now respected by the server.