Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

jepsen-monotonic failure when using RootTxn on the gateway #41424

Closed
andreimatei opened this issue Oct 8, 2019 · 7 comments · Fixed by #41438
Closed

jepsen-monotonic failure when using RootTxn on the gateway #41424

andreimatei opened this issue Oct 8, 2019 · 7 comments · Fixed by #41438
Assignees
Labels
A-kv-client Relating to the KV client and the KV interface. S-1 High impact: many users impacted, serious risk of high unavailability or data loss

Comments

@andreimatei
Copy link
Contributor

#41307 changed how DistSQL uses transactions. It's first commit 29e18f0 makes a seemingly innocuous change about using RootTxns on the gateway more often than before. This change causes the monotonic jepsen test to fail. It fails about 50% with no nemesis, and always with some of the nemeses (e.g. majority-ring).

The test essentially uses two tables and runs transactions that read the maximum from both tables, increment by one, and write to one of the tables that maximum and the cluster_logical_timestamp(). It then verifies that both the maximum values and the timestamps increase monotonically. But lo and behold, they're no longer monotonic.

I have a theory about the cause: I believe it's because of this check intended to prevent refreshes when cluster_logical_timestamp() (or other things that lock a txn's timestamp) has been used

if txn == nil || txn.OrigTimestampWasObserved {

I believe the check is looking at the wrong txn proto. It's looking at a proto coming from a pErr, but it's unlikely for that one to have the OrigTimestampWasObserved bit set. That bit is probably only set on the client, not the server.
If this theory is correct, this means that we've always had this bug, but #41307 exposed it more broadly. It used to be the case that only queries forced to be planned entirely on the gateway (e.g. mutations) would use the RootTxn, others would use the LeafTxn. Leaves don't do refreshes, and so they're safe (and suboptimal when a refresh would help).
Besides regular queries, schema changes used to use the OrigTimestampWasObserved bit too, so I think those probably are at risk in 19.1 and below.

If this theory is correct, this use of the wrong txn object steams from the fact that we put transactions in pErrs, for maximal confusion. It's always been a pet peeve of mine:
#18919 (comment)

@andreimatei andreimatei added A-kv-client Relating to the KV client and the KV interface. S-1 High impact: many users impacted, serious risk of high unavailability or data loss labels Oct 8, 2019
@andreimatei andreimatei self-assigned this Oct 8, 2019
@knz
Copy link
Contributor

knz commented Oct 8, 2019

I confirm this bug is present in master (post revert of Andrei's recent fixes),
also in 19.1!

also I have written a simplified version of the jepsen test in Go suitable for make stress:

monotonic_test.zip (place in pkg/sql then run make stress PKG=./pkg/sql TESTS=TestMonotonic)

Now also testing 2.1 for good measure.

@knz
Copy link
Contributor

knz commented Oct 8, 2019

Confirming bug present in 2.1 too.

@bdarnell
Copy link
Contributor

bdarnell commented Oct 8, 2019

What do you mean by the bug being present in older releases? Just that we're looking at the wrong Transaction object, or is the monotonic test failing there too? If the latter, do you understand why this test is failing while the jepsen test has never (to my knowledge) hit this before on older versions?

@knz
Copy link
Contributor

knz commented Oct 8, 2019

What do you mean by the bug being present in older releases? Just that we're looking at the wrong Transaction object, or is the monotonic test failing there too?

Both

If the latter, do you understand why this test is failing while the jepsen test has never (to my knowledge) hit this before on older versions?

Yes I understand it.

The Jepsen test separates the phases in a multi-statement txn, using standalone SELECTs first to find the max and cluster timestamp, and then a separate INSERT ... VALUES for the mutation. These standalone SELECTs are using LeafTxn objects since at least 2.1 (ever since we use distsql to serve them, which may have been 2.0), so were never able to auto-refresh (instead any conflict would have caused a client-side retry)

In order to tickle the bug, you need a read that encounters a refresh error on a RootTxn object with a mutation that also gets a call to cluster_logical_timestamp(). To get this, you need a SQL that combines INSERT with cluster_logical_timestamp() (and possibly some SELECT too, I'm not sure), which is what my simplified Go code does:

INSERT INTO t.t%d(v, ts) SELECT max(x)+1, if(pg_sleep(0.005*random()), cluster_logical_timestamp(), 0)
  FROM (VALUES
   ((SELECT max(v) FROM t.t1)),
   ((SELECT max(v) FROM t.t2))
) AS x(x)

This triggers the bug because in 2.1 19.1 and current master, a mutation is planned locally and is not distributed and so everything uses the RootTxn.

I'd say client apps that only use INSERT ... VALUES are unaffected.

However an UPDATE combined with cluster_logical_timestamp() is likely to be affected since 2.1 too.

@knz knz changed the title jepsen-monotonic failure when using RootTxn on the gateway jepsen-monotonic failure when using RootTxn + cluster_logical_timestamp on the gateway Oct 8, 2019
@knz knz changed the title jepsen-monotonic failure when using RootTxn + cluster_logical_timestamp on the gateway jepsen-monotonic failure when using RootTxn on the gateway Oct 8, 2019
@knz
Copy link
Contributor

knz commented Oct 8, 2019

Andrei tells me that it's not only queries that use cluster_logical_timestamp() that are affected; anything using the txn.OrigTimestamp() call probably had that problem too. That includes schema changes up to and including 19.1.

@andreimatei
Copy link
Contributor Author

The theory about us looking at pErr.Txn being wrong didn't quite pan out. On the response path, the DistSender updates pErr.Txn with the response's txn, which comes from the request's txn. I find all this very unlikely to be correct and I'll try to clean it up, but it's not our smoking gun. Fixing it does not make Jepsen happy.

However, I think the real problem is that cluster_logical_timestamp() is more clearly broken. It calls TxnCoordSender.CommitTimestmp(), which simply returns txn.OrigTimestamp. But, in case the txn already refreshed, the OrigTimestamp is not going to be the CommitTimestamp. The commit timestamp is:

commitTS := txn.OrigTimestamp
commitTS.Forward(txn.RefreshedTimestamp)

andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 8, 2019
Before this patch, the txn.CommitTimestamp() function (which powers
the cluster_logical_timestamp(), among others) was failing to take into
consideration possible refreshes that might have happened since the
current epoch began (i.e. txn.RefreshedTimestamp).

Fixes cockroachdb#41424

Release note (bug fix): Fix a bug causing the
cluster_logical_timestamp() function to sometimes return incorrect
results.
@bdarnell
Copy link
Contributor

bdarnell commented Oct 8, 2019

in case the txn already refreshed

Ah, that makes sense. Originally refreshes were only possible at the end of the transaction, but early refreshes/pushes are now possible.

andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 8, 2019
Before this patch, the txn.CommitTimestamp() function (which powers
the cluster_logical_timestamp(), among others) was failing to take into
consideration possible refreshes that might have happened since the
current epoch began (i.e. txn.RefreshedTimestamp).

Fixes cockroachdb#41424

Release note (bug fix): Fix a bug causing the
cluster_logical_timestamp() function to sometimes return incorrect
results.
craig bot pushed a commit that referenced this issue Oct 8, 2019
41323: sql: bevy of fixes for ActiveRecord compatibility r=jordanlewis a=jordanlewis

- improve efficiency of `'foo'::REGCLASS`
- improve efficiency of `pg_get_coldesc`
- parse INTERVAL(6) as a no-op meaning INTERVAL
- bugfix to generate_subscripts for oidvector and int2vector
- add pg_available_extensions
- fix `pg_get_indexdef` and `pg_attrdef.adbin` to be more Postgres compatible

With these fixes, 90% of the fork in `activerecord-cockroachdb-adapter` can go away.

41438: kv: fix cluster_logical_timestamp() r=andreimatei a=andreimatei

Before this patch, the txn.CommitTimestamp() function (which powers
the cluster_logical_timestamp(), among others) was failing to take into
consideration possible refreshes that might have happened since the
current epoch began (i.e. txn.RefreshedTimestamp).

Fixes #41424

Release note (bug fix): Fix a bug causing the
cluster_logical_timestamp() function to sometimes return incorrect
results.

Co-authored-by: Jordan Lewis <jordanthelewis@gmail.com>
Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 8, 2019
Before this patch, the txn.CommitTimestamp() function (which powers
the cluster_logical_timestamp(), among others) was failing to take into
consideration possible refreshes that might have happened since the
current epoch began (i.e. txn.RefreshedTimestamp).

Fixes cockroachdb#41424

Release note (bug fix): Fix a bug causing the
cluster_logical_timestamp() function to sometimes return incorrect
results.
@craig craig bot closed this as completed in b491dde Oct 8, 2019
andreimatei added a commit to andreimatei/cockroach that referenced this issue Oct 8, 2019
Before this patch, the txn.CommitTimestamp() function (which powers
the cluster_logical_timestamp(), among others) was failing to take into
consideration possible refreshes that might have happened since the
current epoch began (i.e. txn.RefreshedTimestamp).

Fixes cockroachdb#41424

Release note (bug fix): Fix a bug causing the
cluster_logical_timestamp() function to sometimes return incorrect
results.
craig bot pushed a commit that referenced this issue Oct 8, 2019
41439: release-19.2: kv: fix cluster_logical_timestamp() r=andreimatei a=andreimatei

Backport 1/1 commits from #41438.

/cc @cockroachdb/release

---

Before this patch, the txn.CommitTimestamp() function (which powers
the cluster_logical_timestamp(), among others) was failing to take into
consideration possible refreshes that might have happened since the
current epoch began (i.e. txn.RefreshedTimestamp).

Fixes #41424

Release note (bug fix): Fix a bug causing the
cluster_logical_timestamp() function to sometimes return incorrect
results.


Co-authored-by: Andrei Matei <andrei@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-kv-client Relating to the KV client and the KV interface. S-1 High impact: many users impacted, serious risk of high unavailability or data loss
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants