sql: make SQL statements operate on a read snapshot #42862
The changes (and those of the parent commit) are sufficiently lightweight that we may consider back-porting this to 19.2 and perhaps 19.1.
51efb5d
to
451ecd7
Compare
088ac43
to
da262f6
Compare
cc @RaduBerinde FYI see the ugly workaround (= code to disable the fix) around FK existence checks and cascading actions.
da262f6
to
2d9bd98
Compare
Only remaining failing test |
22e7690
to
10662ad
Compare
Rebased on top of the latest #42854 implementation, which fixes Also, updated |
@RaduBerinde ^^ |
10662ad
to
f825da2
Compare
0988369
to
35c3103
Compare
@andreimatei this is ready I think (assuming you're also ok with #42854). @RaduBerinde please check that I did the post-query logic properly.
d4d036c
to
84f0349
Compare
Reviewed 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/sql/alter_table.go, line 104 at r3 (raw file):
// want to see their previous writes. Disable step-wise execution
// for that phase.
if err := params.p.Txn().DisableStepping(); err != nil {
there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out. Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this. Have you considered alternatives?
pkg/sql/conn_executor_exec.go, line 363 at r3 (raw file):
// For regular statements (the ones that get to this point), we
// don't return any event unless an an error happens.
/an an/an if you're taking the blame anyway
pkg/sql/conn_executor_exec.go, line 462 at r3 (raw file):
// Although this is not required for further SQL statements (these
// will define their own sequence point above), there may be
// additional non-SQL KV activity beyond this point which may want
I'm curious - like what activity?
pkg/sql/conn_executor_test.go, line 18 at r3 (raw file):
"database/sql/driver"
"fmt"
_ "net/http/pprof"
did you want to leave this in?
pkg/sql/conn_executor_test.go, line 379 at r3 (raw file):
params.Insecure = true
s, db, _ := serverutils.StartServer(t, params)
defer s.Stopper().Stop(context.TODO())
nit: context.Background() has won for tests
pkg/sql/create_sequence.go, line 120 at r3 (raw file):
}
// The event logging wants to operate in step-wise execution. Mark a
please clarify "wants to operate in step-wise execution" a bit.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/sql/alter_table.go, line 104 at r3 (raw file):
there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out.
That's backward thinking. The SQL standard is pretty clear about the fact that all statements operate on a snapshot taken when the statement starts. It's our implementation of DDL that's backward and mis-architected to require see-your-own-writes in KV (it should operate on RAM data structures until a final KV put. But that's out of scope here.)
Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this.
Yes that seems reasonable. Especially because I just found a good reason why there is a problem in the current approach...
Working on it, will ping you when I have a solution.
pkg/sql/conn_executor_exec.go, line 363 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
/an an/an if you're taking the blame anyway
Done.
pkg/sql/conn_executor_exec.go, line 462 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I'm curious - like what activity?
Added example in comment.
pkg/sql/conn_executor_test.go, line 18 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
did you want to leave this in?
No sorry about that
pkg/sql/conn_executor_test.go, line 379 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: context.Background() has won for tests
Done.
84f0349
to
3d10880
Compare
RFAL:
I found a cleaner approach than the spurious DisableStepping() calls. Instead, each planNode can implement an interface to declare that it wants to deviate from the standard SQL semantics. The conditional is handled cleanly in just one place.
Also there's now a NewTxnWithSteppingEnabled and the conn executor uses that.
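For readers following along, the marker-interface approach described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `planNodeReadingOwnWrites`, `ReadingOwnWrites`, `ConfigureStepping`, `SteppingEnabled` and `startExec` are names taken from this thread, while the `txn` type, the `seen` fields and the two example nodes are hypothetical stand-ins.

```go
package main

import "fmt"

// SteppingMode mirrors the enum mentioned in the review; this exact
// definition is an assumption made for the sketch.
type SteppingMode int

const (
	SteppingDisabled SteppingMode = iota
	SteppingEnabled
)

// txn is a hypothetical stand-in for the KV transaction handle.
type txn struct{ mode SteppingMode }

// ConfigureStepping sets the stepping mode and returns the previous
// mode so that the caller can restore it afterwards.
func (t *txn) ConfigureStepping(mode SteppingMode) (prev SteppingMode) {
	prev, t.mode = t.mode, mode
	return prev
}

// planNode is a heavily simplified plan node interface.
type planNode interface {
	startExec(t *txn) error
}

// planNodeReadingOwnWrites is the marker interface: nodes implement
// it to declare that their startExec must read their own writes.
type planNodeReadingOwnWrites interface {
	ReadingOwnWrites()
}

// scanNode follows standard SQL semantics (reads from a snapshot).
type scanNode struct{ seen SteppingMode }

func (n *scanNode) startExec(t *txn) error { n.seen = t.mode; return nil }

// alterTableNode is a DDL-like node that opts out of stepping.
type alterTableNode struct{ seen SteppingMode }

func (n *alterTableNode) startExec(t *txn) error { n.seen = t.mode; return nil }
func (n *alterTableNode) ReadingOwnWrites()      {}

// startExec is the single place where the conditional lives: stepping
// is disabled only for the duration of the node's startExec call.
func startExec(t *txn, n planNode) error {
	if _, ok := n.(planNodeReadingOwnWrites); ok {
		prev := t.ConfigureStepping(SteppingDisabled)
		defer t.ConfigureStepping(prev)
	}
	return n.startExec(t)
}

func main() {
	tx := &txn{mode: SteppingEnabled}
	scan, alter := &scanNode{}, &alterTableNode{}
	_ = startExec(tx, scan)
	_ = startExec(tx, alter)
	fmt.Println(scan.seen == SteppingEnabled)   // true
	fmt.Println(alter.seen == SteppingDisabled) // true
	fmt.Println(tx.mode == SteppingEnabled)     // true: mode restored
}
```

The point of the design is that individual statements only declare their requirement; the save/disable/restore sequence is written once.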
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/sql/alter_table.go, line 104 at r3 (raw file):
Previously, knz (kena) wrote…
there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out.
That's backward thinking. The SQL standard is pretty clear about the fact that all statements operate on a snapshot taken when the statement starts. It's our implementation of DDL that's backward and mis-architected to require see-your-own-writes in KV (it should operate on RAM data structures until a final KV put. But that's out of scope here.)
Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this.
Yes that seems reasonable. Especially because I just found a good reason why there is a problem in the current approach...
Working on it, will ping you when I have a solution.
Done.
3d10880
to
fda15ca
Compare
Reviewed 1 of 36 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/internal/client/sender.go, line 255 at r4 (raw file):
// does not need it.
//
// Calling ConfigureStepping(true) when the stepping mode is
s/true/SteppingEnabled
pkg/internal/client/txn.go, line 1209 at r4 (raw file):
// transaction.
//
// This function is guaranteed to not return an error if it previously
I find this error comment quite confusing, and also it doesn't seem to line up with a comment at the point in this commit where one of these errors is ignored.
We seem to only return errors from assertions on leaves. I'd much rather fatal and remove the error from the interface.
pkg/internal/client/txn.go, line 1210 at r4 (raw file):
//
// This function is guaranteed to not return an error if it previously
// succeeded once with some txn and mode, then provided its own return
s/with some txn and mode/with some mode
pkg/kv/txn_coord_sender.go, line 1089 at r4 (raw file):
}
// DisableStepping is part of the TxnSender interface.
stale comment
pkg/kv/txn_coord_sender.go, line 1098 at r4 (raw file):
tc.mu.Lock()
defer tc.mu.Unlock()
prevEnabled := tc.interceptorAlloc.txnSeqNumAllocator.configureSteppingLocked(mode == client.SteppingEnabled)
long line
pkg/kv/txn_interceptor_seq_num_allocator.go, line 143 at r4 (raw file):
}
// configureSteppingLocked configures the stepping mode.
pls comment that when disabling stepping, the readSeq is bumped
pkg/kv/txn_interceptor_seq_num_allocator.go, line 144 at r4 (raw file):
// configureSteppingLocked configures the stepping mode.
// Used by the TxnCoordSender's ConfigureStepping() method.
nit: since you're taking the blame, I think the 2nd line of the comment doesn't add anything
pkg/kv/txn_interceptor_seq_num_allocator.go, line 145 at r4 (raw file):
// configureSteppingLocked configures the stepping mode.
// Used by the TxnCoordSender's ConfigureStepping() method.
func (s *txnSeqNumAllocator) configureSteppingLocked(enabled bool) (prevEnabled bool) {
nit: I'd carry over the enum from all the higher layers. Not for the s.steppingModeEnabled field, though; that one should stay a bool.
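To make the comments on `configureSteppingLocked` concrete, here is a rough sketch of a sequence number allocator with the suggested shape: the enum at the method boundary, a plain bool for `s.steppingModeEnabled`, and `readSeq` bumped when stepping is disabled. Only those names come from this thread; `writeSeq`, `nextWriteSeq`, `stepLocked` and the `SteppingMode` definition are assumptions, and the actual interceptor in `pkg/kv` may behave differently.

```go
package main

import "fmt"

// SteppingMode is an assumed definition of the enum used by the
// higher layers.
type SteppingMode int

const (
	SteppingDisabled SteppingMode = iota
	SteppingEnabled
)

// txnSeqNumAllocator is a simplified model: writes get increasing
// sequence numbers, and reads observe writes up to readSeq.
type txnSeqNumAllocator struct {
	writeSeq            int32
	readSeq             int32
	steppingModeEnabled bool // stays a bool, as suggested in the review
}

// nextWriteSeq allocates the sequence number for a new write. With
// stepping disabled, reads immediately follow the latest write.
func (s *txnSeqNumAllocator) nextWriteSeq() int32 {
	s.writeSeq++
	if !s.steppingModeEnabled {
		s.readSeq = s.writeSeq
	}
	return s.writeSeq
}

// stepLocked is a sequence point: later reads see all prior writes.
func (s *txnSeqNumAllocator) stepLocked() {
	s.readSeq = s.writeSeq
}

// configureSteppingLocked configures the stepping mode and returns
// the previous mode. When stepping is disabled, readSeq is bumped to
// writeSeq so that subsequent reads observe every write so far.
func (s *txnSeqNumAllocator) configureSteppingLocked(
	newMode SteppingMode,
) (prevMode SteppingMode) {
	prevMode = SteppingDisabled
	if s.steppingModeEnabled {
		prevMode = SteppingEnabled
	}
	s.steppingModeEnabled = newMode == SteppingEnabled
	if !s.steppingModeEnabled {
		s.readSeq = s.writeSeq
	}
	return prevMode
}

func main() {
	s := &txnSeqNumAllocator{steppingModeEnabled: true}
	s.nextWriteSeq()
	s.nextWriteSeq()
	fmt.Println(s.readSeq) // 0: the two writes are not yet visible

	prev := s.configureSteppingLocked(SteppingDisabled)
	fmt.Println(s.readSeq) // 2: readSeq bumped on disable

	s.configureSteppingLocked(prev) // restore step-wise execution
	s.nextWriteSeq()
	fmt.Println(s.writeSeq, s.readSeq) // 3 2
}
```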
pkg/sql/create_table.go, line 1384 at r5 (raw file):
} }
nit: spurious diff
pkg/sql/plan.go, line 135 at r5 (raw file):
}
// planNodeReadingOwnWrites can be implemented by planNodes which do
s/can be/is
Unless you're trying to highlight that there are alternatives to achieving what this achieves?
pkg/sql/plan.go, line 141 at r5 (raw file):
// descriptors, expecting to read their own writes.
//
// This constraint is obeyed by (*planner).startExec().
please clarify this saying that it's only startExec that runs with the stepping disabled, not the plan node more broadly.
pkg/sql/plan.go, line 143 at r5 (raw file):
// This constraint is obeyed by (*planner).startExec().
type planNodeReadingOwnWrites interface {
// ReadingOwnWrites can be implemented as no-op by nodes wishing
nit: just say ReadingOwnWrites is a marker interface.
pkg/sql/row/kv_batch_fetcher.go, line 349 at r5 (raw file):
}
// TestingSetKVBatchSize changes the kvBatchFetcher batch size, and returns a function that restores it.
nit: if you don't want me to curse at you every time I rediscover this function and rage-blame, put it where it was so that I curse at someone else.
fda15ca
to
2f26c54
Compare
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)
pkg/internal/client/sender.go, line 255 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
s/true/SteppingEnabled
Done.
pkg/internal/client/txn.go, line 1209 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I find this error comment quite confusing, and also it doesn't seem to line up with a comment at the point in this commit where one of these errors is ignored.
We seem to only return errors from assertions on leaves. I'd much rather fatal and remove the error from the interface.
Done.
pkg/internal/client/txn.go, line 1210 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
s/with some txn and mode/with some mode
Done.
pkg/kv/txn_coord_sender.go, line 1089 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
stale comment
Done.
pkg/kv/txn_coord_sender.go, line 1098 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
long line
Done.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 143 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pls comment that when disabling stepping, the readSeq is bumped
Done.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 145 at r4 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: I'd carry over the enum from all the higher layers. Not for the s.steppingModeEnabled field, though; that one should stay a bool.
Done.
pkg/sql/create_sequence.go, line 120 at r3 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please clarify "wants to operate in step-wise execution" a bit.
Done.
pkg/sql/plan.go, line 135 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
s/can be/is
Unless you're trying to highlight that there are alternatives to achieving what this achieves?
Done.
pkg/sql/plan.go, line 141 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
please clarify this saying that it's only startExec that runs with the stepping disabled, not the plan node more broadly.
Done.
pkg/sql/plan.go, line 143 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: just say ReadingOwnWrites is a marker interface.
Done.
pkg/sql/row/kv_batch_fetcher.go, line 349 at r5 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: if you don't want me to curse at you every time I rediscover this function and rage-blame, put it where it was so that I curse at someone else.
Done.
20d180d
to
0322fe1
Compare
Previously, all individual KV reads performed by a SQL statement were able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate that statements can only observe data as per a read snapshot taken at the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a statement that reads and writes to the same table may never complete, as the read part may become able to consume the rows that it itself writes. Or worse, it could cause logical operations to be performed multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it (partially) by exploiting the new KV `Step()` API which decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also be introduced prior to FK existence checks and cascading actions. See [cockroachdb#42864](cockroachdb#42864) and [cockroachdb#33475](cockroachdb#33475) for details.

For now, this patch excludes any mutation that 1) involves a foreign key and 2) does not use the new CBO-driven FK logic, from the new (fixed) semantics. When a mutation involves a FK without CBO involvement, the previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables with no foreign key relationships now correctly read data as per the state of the database when the statement started execution. This is required for compatibility with PostgreSQL and to ensure deterministic behavior when certain operations are parallelized. Prior to this fix, a statement [could incorrectly operate multiple times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that itself was writing, and potentially never terminate. This fix is limited to tables without FK relationships, and for certain operations on tables with FK relationships; in other cases, the fix is not active and the bug is still present. A full fix will be provided in a later release.
0322fe1
to
0a658c1
Compare
bors r=andreimatei
42862: sql: make SQL statements operate on a read snapshot r=andreimatei a=knz

Fixes #33473. Fixes #28842. Informs #41569 and #42864.

Previously, all individual KV reads performed by a SQL statement were able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate that statements can only observe data as per a read snapshot taken at the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a statement that reads and writes to the same table may never complete, as the read part may become able to consume the rows that it itself writes. Or worse, it could cause logical operations to be performed multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it by exploiting the new KV `Step()` API which decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also be introduced prior to FK existence checks and cascading actions. See #42864 and #33475 for details.

For now, this patch excludes any mutation that involves a foreign key from the new (fixed) semantics. When a mutation involves a FK, the previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables with no foreign key relationships now correctly read data as per the state of the database when the statement started execution. This is required for compatibility with PostgreSQL and to ensure deterministic behavior when certain operations are parallelized. Prior to this fix, a statement [could incorrectly operate multiple times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that itself was writing, and potentially never terminate. This fix is limited to tables without FK relationships, and for certain operations on tables with FK relationships; in other cases, the fix is not active and the bug is still present. A full fix will be provided in a later release.
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Build succeeded
Too little too late, but I got a chance to look through the diff and it looks great to me!
radu 💖 you answered all my questions
These comments were added and accidentally left by cockroachdb#42862. Release note: None
Fixes #33473.
Fixes #28842.
Informs #41569 and #42864.
Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.
This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.
Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem
This patch fixes it by exploiting the new KV `Step()` API which
decouples the read and write sequence numbers.
The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
#42864 and #33475 for details.
For now, this patch excludes any mutation that involves a foreign key
from the new (fixed) semantics. When a mutation involves a FK, the
previous (broken) semantics still apply.
Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement could incorrectly operate multiple times on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.