Ability to crash crdb on DDL using cluster_logical_timestamp() #98269
Comments
The bigger deal is the uncaught panic, which crashes the server. Related to #58164. The backfill code itself could catch some panics, but it's orchestrated through distsql, and it seems better to always catch the panics there.
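To illustrate what catching a panic at an orchestration boundary could look like, here is a minimal Go sketch. It is not CockroachDB code: `row`, `evalDefault`, and `safeEval` are hypothetical names, and the nil dereference is contrived to imitate the failure described in this issue.

```go
package main

import "fmt"

// row is a hypothetical stand-in for a table row being backfilled.
type row struct{ id int }

// evalDefault is a hypothetical DEFAULT-expression evaluator. For id 2 it
// dereferences a nil pointer to imitate the kind of bug reported here.
func evalDefault(r row) int {
	var txn *struct{ ts int }
	if r.id != 2 {
		txn = &struct{ ts int }{ts: 100}
	}
	return txn.ts + r.id // panics when txn is nil
}

// safeEval calls evalDefault and converts any panic into an ordinary error,
// so a single bad expression fails the operation instead of the process.
func safeEval(r row) (v int, err error) {
	defer func() {
		if p := recover(); p != nil {
			err = fmt.Errorf("evaluating default for row %d: recovered panic: %v", r.id, p)
		}
	}()
	return evalDefault(r), nil
}

func main() {
	for _, r := range []row{{id: 1}, {id: 2}, {id: 3}} {
		v, err := safeEval(r)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println("value:", v)
	}
}
```

Recovering in one place at the boundary means every code path that runs under it is covered, including ones nobody thought to guard individually.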
cockroach/pkg/sql/sem/eval/context.go Lines 499 to 502 in d2594fc
This code can panic in two places!
@Xiang-Gu can you have a look at fixing the proximate problem (nil check in the above code) and then have the caller check for a nil value?
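The two-step fix suggested above (a nil check in the accessor, then a nil check in the caller) could be sketched in Go roughly as follows. All types and names here (`txn`, `evalContext`, `clusterTimestamp`, `evalClusterLogicalTimestamp`, `errNoTxn`) are hypothetical and do not reflect the actual code in `context.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// txn and evalContext are hypothetical stand-ins, not CockroachDB types.
type txn struct{ commitTimestamp int64 }

type evalContext struct {
	Txn *txn // assumed to be nil when no transaction is attached
}

// errNoTxn is a hypothetical "unimplemented" error for the caller to return.
var errNoTxn = errors.New("unimplemented: cluster_logical_timestamp() is not available without a transaction")

// clusterTimestamp returns nil instead of dereferencing a nil context or a
// nil transaction. This is the "proximate" nil check.
func (c *evalContext) clusterTimestamp() *int64 {
	if c == nil || c.Txn == nil {
		return nil
	}
	ts := c.Txn.commitTimestamp
	return &ts
}

// evalClusterLogicalTimestamp is the caller. It checks for the nil value and
// reports an error rather than panicking.
func evalClusterLogicalTimestamp(c *evalContext) (int64, error) {
	ts := c.clusterTimestamp()
	if ts == nil {
		return 0, errNoTxn
	}
	return *ts, nil
}

func main() {
	if _, err := evalClusterLogicalTimestamp(&evalContext{}); err != nil {
		fmt.Println("error:", err)
	}
	ts, _ := evalClusterLogicalTimestamp(&evalContext{Txn: &txn{commitTimestamp: 42}})
	fmt.Println("timestamp:", ts)
}
```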
After the panic, I'm unable to perform DDL on the table. For instance:
This just hangs. I suspect there are a few things going on, in addition to the panic.
What's actually going on is that the schema change that crashed is still in the running state, but we have exponential backoff to deal with failures like this so it doesn't just keep killing nodes. You should be able to cancel the original job.
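For readers unfamiliar with the idea, a generic exponential backoff can be sketched in Go as below. This only illustrates the general technique; the function name `backoffDelay` and the initial and maximum delays are assumptions, not the values or code the jobs system actually uses.

```go
package main

import (
	"fmt"
	"time"
)

// backoffDelay returns how long to wait before the next retry: the initial
// delay doubled once per previous retry, capped at maxDelay. The function
// and its parameters are hypothetical.
func backoffDelay(retries int, initial, maxDelay time.Duration) time.Duration {
	d := initial
	for i := 0; i < retries; i++ {
		if d >= maxDelay/2 {
			return maxDelay // doubling again would reach or exceed the cap
		}
		d *= 2
	}
	if d > maxDelay {
		return maxDelay
	}
	return d
}

func main() {
	// Assumed values, chosen only for illustration.
	initial, maxDelay := 30*time.Second, 24*time.Hour
	for retries := 0; retries < 5; retries++ {
		fmt.Printf("retry %d: wait %v\n", retries, backoffDelay(retries, initial, maxDelay))
	}
}
```

Under a scheme like this, a job that keeps failing is retried less and less often, which is why the table can appear stuck for a long time after a few crashes.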
I think pausing the job before canceling it will help clear the backoff.
…backfilling Previously, `ADD COLUMN ... DEFAULT cluster_logical_timestamp()` would crash the node and leave the table in a corrupt state. The root cause is a nil pointer dereference. This commit fixes it by returning an unimplemented error, which disallows using this builtin function as a default value when backfilling. Release note (bug fix): fixed a bug as detailed in cockroachdb#98269.
…backfilling Previously, `ADD COLUMN ... DEFAULT cluster_logical_timestamp()` would crash the node and leave the table in a corrupt state. The root cause is a nil pointer dereference. This commit fixes it by returning an unimplemented error, which disallows using this builtin function as a default value when backfilling. Release note (bug fix): fixed a bug as detailed in cockroachdb#98269. Release justification: Fixed a bug that can crash a node.
Describe the problem
It is possible to crash crdb using `cluster_logical_timestamp()` as a DEFAULT expression.
To Reproduce
After the node comes back up, the schema appears to be in an inconsistent state.
Environment:
CockroachDB CCL v22.2.5 (aarch64-apple-darwin21.2, built 2023/02/16 16:37:38, go1.19.4)
Darwin 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:38:37 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6000 arm64
Jira issue: CRDB-25159