sql: make SQL statements operate on a read snapshot #42862

knz · 2019-11-29T14:27:55Z

Fixes #33473.
Fixes #28842.
Informs #41569 and #42864.

Previously, all individual KV reads performed by a SQL statement were
able to observe the most recent KV writes that it performed itself.

This is in violation of PostgreSQL's dialect semantics, which mandate
that statements can only observe data as per a read snapshot taken at
the instant a statement begins execution.

Moreover, this invalid behavior causes a real observable bug: a
statement that reads and writes to the same table may never complete,
as the read part may become able to consume the rows that it itself
writes. Or worse, it could cause logical operations to be performed
multiple times: https://en.wikipedia.org/wiki/Halloween_Problem

This patch fixes it by exploiting the new KV Step() API which
decouples the read and write sequence numbers.

The fix is not complete however; additional sequence points must also
be introduced prior to FK existence checks and cascading actions. See
#42864 and #33475 for details.

For now, this patch excludes any mutation that involves a foreign key
from the new (fixed) semantics. When a mutation involves a FK, the
previous (broken) semantics still apply.

Release note (bug fix): SQL mutation statements that target tables
with no foreign key relationships now correctly read data as per the
state of the database when the statement started execution. This is
required for compatibility with PostgreSQL and to ensure deterministic
behavior when certain operations are parallelized. Prior to this fix,
a statement could incorrectly operate multiple
times on data that
itself was writing, and potentially never terminate. This fix is
limited to tables without FK relationships, and for certain operations
on tables with FK relationships; in other cases, the fix is not
active and the bug is still present. A full fix will be provided
in a later release.

cockroach-teamcity · 2019-11-29T14:28:03Z

This change is

knz · 2019-11-29T14:51:52Z

The changes (and that of the parent commit) are sufficiently lightweight that we may consider back-porting this to 19.2 and perhaps 19.1.

knz · 2019-11-29T18:11:31Z

cc @RaduBerinde FYI see the ugly workaround (= code to disable the fix) around FK existence checks and cascading actions.

knz · 2019-11-29T19:55:16Z

Only remaining failing test TestUpsertFastPath is pending resolution of this comment: #42854 (comment)

knz · 2019-12-02T22:04:20Z

Rebased on top of the latest #42854 implementation, which fixes TestUpsertFastPath

Also, updated (*DistSQLPlanner) PlanAndRunPostqueries() to add calls to Step as recommended by Radu. This enables the remaining logic tests to complete without error.

knz · 2019-12-02T22:05:03Z

@RaduBerinde ^^

knz · 2020-01-06T07:32:37Z

@andreimatei this is ready I think (assuming you're also ok with #42854).

@RaduBerinde please check I did do the post-query logic properly.

andreimatei

Reviewed 2 of 2 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)

pkg/sql/alter_table.go, line 104 at r3 (raw file):

	// want to see their previous writes. Disable step-wise execution
	// for that phase.
	if err := params.p.Txn().DisableStepping(); err != nil {

there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out. Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this. Have you considered alternatives?

pkg/sql/conn_executor_exec.go, line 363 at r3 (raw file):

	// For regular statements (the ones that get to this point), we
	// don't return any event unless an an error happens.

/an an/an if you're taking the blame anyway

pkg/sql/conn_executor_exec.go, line 462 at r3 (raw file):

	// Although this is not required for further SQL statements (these
	// will define their own sequence point above), there may be
	// additional non-SQL KV activity beyond this point which may want

I'm curious - like what activity?

pkg/sql/conn_executor_test.go, line 18 at r3 (raw file):

	"database/sql/driver"
	"fmt"
	_ "net/http/pprof"

did you want to leave this in?

pkg/sql/conn_executor_test.go, line 379 at r3 (raw file):

	params.Insecure = true
	s, db, _ := serverutils.StartServer(t, params)
	defer s.Stopper().Stop(context.TODO())

nit: context.Background() has won for tests

pkg/sql/create_sequence.go, line 120 at r3 (raw file):

	}

	// The event logging wants to operate in step-wise execution. Mark a

please clarify "wants to operate in step-wise execution" a bit.

knz

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)

pkg/sql/alter_table.go, line 104 at r3 (raw file):

there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out.

That's backward thinking. The SQL standard is pretty clear about the fact that all statements operate on a snapshot taken when the statement starts. It's our implementation of DDL that's backward and mis-architected to require see-your-own-writes in KV (it should operate on RAM data structures until a final KV put. But that's out of scope here.)

Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this.

Yes that seems reasonable. Especially because I just found a good reason why there is a problem in the current approach...

Working on it, will ping you when I have a solution.

pkg/sql/conn_executor_exec.go, line 363 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

/an an/an if you're taking the blame anyway

Done.

pkg/sql/conn_executor_exec.go, line 462 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I'm curious - like what activity?

Added example in comment.

pkg/sql/conn_executor_test.go, line 18 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

did you want to leave this in?

No sorry about that

pkg/sql/conn_executor_test.go, line 379 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: context.Background() has won for tests

Done.

knz

RFAL:

I found a cleaner approach than the spurious DisableStepping() calls. Instead, each planNode can implement an interface to declare that it wants to deviate from the standard SQL semantics. The conditional is handled cleanly in just one place.

Also there's now a NewTxnWithSteppingEnabled and the conn executor uses that.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)

pkg/sql/alter_table.go, line 104 at r3 (raw file):

Previously, knz (kena) wrote…

there's a few too many of these DisableStepping(), which makes me wonder if the stepping behavior for SQL statements should be opt-in instead of opt-out.

That's backward thinking. The SQL standard is pretty clear about the fact that all statements operate on a snapshot taken when the statement starts. It's our implementation of DDL that's backward and mis-architected to require see-your-own-writes in KV (it should operate on RAM data structures until a final KV put. But that's out of scope here.)

Or in any case declared in some uniform way by the different statements rather than sprinkled through the code like this.

Yes that seems reasonable. Especially because I just found a good reason why there is a problem in the current approach...

Working on it, will ping you when I have a solution.

Done.

andreimatei

Reviewed 1 of 36 files at r5.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei and @knz)

pkg/internal/client/sender.go, line 255 at r4 (raw file):

	// does not need it.
	//
	// Calling ConfigureStepping(true) when the stepping mode is

s/true/SteppingEnabled

pkg/internal/client/txn.go, line 1209 at r4 (raw file):

// transaction.
//
// This function is guaranteed to not return an error if it previously

I find this error comment quite confusing, and also it doesn't seem to line up with a comment at the point in this commit where one of these errors is ignored.
We seem to only return errors from assertions on leaves. I'd much rather fatal and remove the error from the interface.

pkg/internal/client/txn.go, line 1210 at r4 (raw file):

//
// This function is guaranteed to not return an error if it previously
// succeeded once with some txn and mode, then provided its own return

s/with some txn and mode/with some mode

pkg/kv/txn_coord_sender.go, line 1089 at r4 (raw file):

}

// DisableStepping is part of the TxnSender interface.

stale comment

pkg/kv/txn_coord_sender.go, line 1098 at r4 (raw file):

	tc.mu.Lock()
	defer tc.mu.Unlock()
	prevEnabled := tc.interceptorAlloc.txnSeqNumAllocator.configureSteppingLocked(mode == client.SteppingEnabled)

long line

pkg/kv/txn_interceptor_seq_num_allocator.go, line 143 at r4 (raw file):

}

// configureSteppingLocked configures the stepping mode.

pls comment that when disabling stepping, the readSeq is bumped

pkg/kv/txn_interceptor_seq_num_allocator.go, line 144 at r4 (raw file):

// configureSteppingLocked configures the stepping mode.
// Used by the TxnCoordSender's ConfigureStepping() method.

nit: since you're taking the blame, I think the 2nd line of the comment doesn't add anything

pkg/kv/txn_interceptor_seq_num_allocator.go, line 145 at r4 (raw file):

// configureSteppingLocked configures the stepping mode.
// Used by the TxnCoordSender's ConfigureStepping() method.
func (s *txnSeqNumAllocator) configureSteppingLocked(enabled bool) (prevEnabled bool) {

nit: I'd carry over the enum from all the higher layers. Not for the s.steppingModeEnabled field, though; that one should stay a bool.

pkg/sql/create_table.go, line 1384 at r5 (raw file):

		}
	}

nit: spurious diff

pkg/sql/plan.go, line 135 at r5 (raw file):

}

// planNodeReadingOwnWrites can be implemented by planNodes which do

s/can be/is
Unless you're trying to highlight that there are alternatives to achieving what this achieves?

pkg/sql/plan.go, line 141 at r5 (raw file):

// descriptors, expecting to read their own writes.
//
// This constraint is obeyed by (*planner).startExec().

please clarify this saying that it's only startExec that runs with the stepping disabled, not the plan node more broadly.

pkg/sql/plan.go, line 143 at r5 (raw file):

// This constraint is obeyed by (*planner).startExec().
type planNodeReadingOwnWrites interface {
	// ReadingOwnWrites can be implemented as no-op by nodes wishing

nit: just say ReadingOwnWrites is a marker interface.

pkg/sql/row/kv_batch_fetcher.go, line 349 at r5 (raw file):

}

// TestingSetKVBatchSize changes the kvBatchFetcher batch size, and returns a function that restores it.

nit: if you don't want to curse at you every time I rediscover this function and rage-blame, put it where it was so that I curse at someone else.

knz

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @andreimatei)

pkg/internal/client/sender.go, line 255 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/true/SteppingEnabled

Done.

pkg/internal/client/txn.go, line 1209 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

I find this error comment quite confusing, and also it doesn't seem to line up with a comment at the point in this commit where one of these errors is ignored.
We seem to only return errors from assertions on leaves. I'd much rather fatal and remove the error from the interface.

Done.

pkg/internal/client/txn.go, line 1210 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/with some txn and mode/with some mode

Done.

pkg/kv/txn_coord_sender.go, line 1089 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

stale comment

Done.

pkg/kv/txn_coord_sender.go, line 1098 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

long line

Done.

pkg/kv/txn_interceptor_seq_num_allocator.go, line 143 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

pls comment that when disabling stepping, the readSeq is bumped

Done.

pkg/kv/txn_interceptor_seq_num_allocator.go, line 145 at r4 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: I'd carry over the enum from all the higher layers. Not for the s.steppingModeEnabled field, though; that one should stay a bool.

Done.

pkg/sql/create_sequence.go, line 120 at r3 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please clarify "wants to operate in step-wise execution" a bit.

Done.

pkg/sql/plan.go, line 135 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

s/can be/is
Unless you're trying to highlight that there are alternatives to achieving what this achieves?

Done.

pkg/sql/plan.go, line 141 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

please clarify this saying that it's only startExec that runs with the stepping disabled, not the plan node more broadly.

Done.

pkg/sql/plan.go, line 143 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: just say ReadingOwnWrites is a marker interface.

Done.

pkg/sql/row/kv_batch_fetcher.go, line 349 at r5 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

nit: if you don't want to curse at you every time I rediscover this function and rage-blame, put it where it was so that I curse at someone else.

Done.

Previously, all individual KV reads performed by a SQL statement were able to observe the most recent KV writes that it performed itself. This is in violation of PostgreSQL's dialect semantics, which mandate that statements can only observe data as per a read snapshot taken at the instant a statement begins execution. Moreover, this invalid behavior causes a real observable bug: a statement that reads and writes to the same table may never complete, as the read part may become able to consume the rows that it itself writes. Or worse, it could cause logical operations to be performed multiple times: https://en.wikipedia.org/wiki/Halloween_Problem This patch fixes it (partially) by exploiting the new KV `Step()` API which decouples the read and write sequence numbers. The fix is not complete however; additional sequence points must also be introduced prior to FK existence checks and cascading actions. See [cockroachdb#42864](cockroachdb#42864) and [cockroachdb#33475](cockroachdb#33475) for details. For now, this patch excludes any mutation that 1) involves a foreign key and 2) does not uyse the new CBO-driven FK logic, from the new (fixed) semantics. When a mutation involves a FK without CBO involvement, the previous (broken) semantics still apply. Release note (bug fix): SQL mutation statements that target tables with no foreign key relationships now correctly read data as per the state of the database when the statement started execution. This is required for compatibility with PostgreSQL and to ensure deterministic behavior when certain operations are parallelized. Prior to this fix, a statement [could incorrectly operate multiple times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that itself was writing, and potentially never terminate. This fix is limited to tables without FK relationships, and for certain operations on tables with FK relationships; in other cases, the fix is not active and the bug is still present. A full fix will be provided in a later release.

knz · 2020-01-20T21:27:54Z

bors r=andreimatei

42862: sql: make SQL statements operate on a read snapshot r=andreimatei a=knz Fixes #33473. Fixes #28842. Informs #41569 and #42864. Previously, all individual KV reads performed by a SQL statement were able to observe the most recent KV writes that it performed itself. This is in violation of PostgreSQL's dialect semantics, which mandate that statements can only observe data as per a read snapshot taken at the instant a statement begins execution. Moreover, this invalid behavior causes a real observable bug: a statement that reads and writes to the same table may never complete, as the read part may become able to consume the rows that it itself writes. Or worse, it could cause logical operations to be performed multiple times: https://en.wikipedia.org/wiki/Halloween_Problem This patch fixes it by exploiting the new KV `Step()` API which decouples the read and write sequence numbers. The fix is not complete however; additional sequence points must also be introduced prior to FK existence checks and cascading actions. See #42864 and #33475 for details. For now, this patch excludes any mutation that involves a foreign key from the new (fixed) semantics. When a mutation involves a FK, the previous (broken) semantics still apply. Release note (bug fix): SQL mutation statements that target tables with no foreign key relationships now correctly read data as per the state of the database when the statement started execution. This is required for compatibility with PostgreSQL and to ensure deterministic behavior when certain operations are parallelized. Prior to this fix, a statement [could incorrectly operate multiple times](https://en.wikipedia.org/wiki/Halloween_Problem) on data that itself was writing, and potentially never terminate. This fix is limited to tables without FK relationships, and for certain operations on tables with FK relationships; in other cases, the fix is not active and the bug is still present. A full fix will be provided in a later release. Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

craig · 2020-01-20T22:14:01Z

Build succeeded

GitHub CI (Cockroach)

RaduBerinde · 2020-01-21T18:03:18Z

Too little too late, but I got a chance to look through the diff and it looks great to me!

knz · 2020-01-21T18:06:17Z

radu 💖 you answered all my questions

These comments were added and accidentally left by cockroachdb#42862. Release note: None

44174: kv: delete disabled debug logging about txn stepping r=nvanbenschoten a=nvanbenschoten These comments were added and accidentally left by #42862. Release note: None Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

knz requested review from andreimatei and nvanbenschoten November 29, 2019 14:27

knz changed the title ~~sql: make SQL statements operate on a read snapshot~~ [DNM] sql: make SQL statements operate on a read snapshot Nov 29, 2019

knz force-pushed the 20191129-sql-step branch 4 times, most recently from 51efb5d to 451ecd7 Compare November 29, 2019 17:05

knz changed the title ~~[DNM] sql: make SQL statements operate on a read snapshot~~ sql: make SQL statements operate on a read snapshot Nov 29, 2019

knz force-pushed the 20191129-sql-step branch 2 times, most recently from 088ac43 to da262f6 Compare November 29, 2019 17:59

knz mentioned this pull request Nov 29, 2019

roachpb,kv: new TxnCoordSender API for step-wise execution #42854

Merged

knz force-pushed the 20191129-sql-step branch from da262f6 to 2d9bd98 Compare November 29, 2019 19:52

knz force-pushed the 20191129-sql-step branch 4 times, most recently from 22e7690 to 10662ad Compare December 2, 2019 22:03

knz force-pushed the 20191129-sql-step branch from 10662ad to f825da2 Compare December 2, 2019 23:36

knz requested a review from a team as a code owner December 2, 2019 23:36

knz force-pushed the 20191129-sql-step branch 4 times, most recently from 0988369 to 35c3103 Compare December 7, 2019 17:12

knz force-pushed the 20191129-sql-step branch from 35c3103 to cc29ba8 Compare January 6, 2020 07:31

knz removed the request for review from nvanbenschoten January 6, 2020 07:31

knz force-pushed the 20191129-sql-step branch 2 times, most recently from d4d036c to 84f0349 Compare January 6, 2020 08:53

andreimatei reviewed Jan 6, 2020

View reviewed changes

knz commented Jan 15, 2020

View reviewed changes

knz force-pushed the 20191129-sql-step branch from 84f0349 to 3d10880 Compare January 15, 2020 10:44

knz commented Jan 15, 2020

View reviewed changes

knz force-pushed the 20191129-sql-step branch from 3d10880 to fda15ca Compare January 15, 2020 11:39

andreimatei approved these changes Jan 17, 2020

View reviewed changes

knz force-pushed the 20191129-sql-step branch from fda15ca to 2f26c54 Compare January 20, 2020 14:10

knz commented Jan 20, 2020

View reviewed changes

knz force-pushed the 20191129-sql-step branch 4 times, most recently from 20d180d to 0322fe1 Compare January 20, 2020 17:34

knz force-pushed the 20191129-sql-step branch from 0322fe1 to 0a658c1 Compare January 20, 2020 21:22

knz mentioned this pull request Jan 20, 2020

sql: use the new KV sequence field to separate reads from writes in SQL mutations #33473

Closed

3 tasks

craig bot merged commit 0a658c1 into cockroachdb:master Jan 20, 2020

knz deleted the 20191129-sql-step branch January 20, 2020 22:15

nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Jan 21, 2020

kv: delete disabled debug logging about txn stepping

d2cbd97

These comments were added and accidentally left by cockroachdb#42862. Release note: None

nvanbenschoten mentioned this pull request Jan 21, 2020

kv: delete disabled debug logging about txn stepping #44174

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: make SQL statements operate on a read snapshot #42862

sql: make SQL statements operate on a read snapshot #42862

knz commented Nov 29, 2019 •

edited

Loading

cockroach-teamcity commented Nov 29, 2019

knz commented Nov 29, 2019

knz commented Nov 29, 2019

knz commented Nov 29, 2019

knz commented Dec 2, 2019 •

edited

Loading

knz commented Dec 2, 2019

knz commented Jan 6, 2020

andreimatei left a comment

knz left a comment

knz left a comment

andreimatei left a comment

knz left a comment

knz commented Jan 20, 2020

craig bot commented Jan 20, 2020

RaduBerinde commented Jan 21, 2020

knz commented Jan 21, 2020

sql: make SQL statements operate on a read snapshot #42862

sql: make SQL statements operate on a read snapshot #42862

Conversation

knz commented Nov 29, 2019 • edited Loading

cockroach-teamcity commented Nov 29, 2019

knz commented Nov 29, 2019

knz commented Nov 29, 2019

knz commented Nov 29, 2019

knz commented Dec 2, 2019 • edited Loading

knz commented Dec 2, 2019

knz commented Jan 6, 2020

andreimatei left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

andreimatei left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

knz commented Jan 20, 2020

craig bot commented Jan 20, 2020

Build succeeded

RaduBerinde commented Jan 21, 2020

knz commented Jan 21, 2020

knz commented Nov 29, 2019 •

edited

Loading

knz commented Dec 2, 2019 •

edited

Loading