Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sql/schemachanger: rules are missing for column families leading to broken rollbacks #99796

Closed
fqazi opened this issue Mar 28, 2023 · 1 comment
Assignees
Labels
branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)

Comments

@fqazi
Copy link
Collaborator

fqazi commented Mar 28, 2023

Currently, on master we are missing rules and logic for rolling back column families, this can lead to rollbacks failing. On 22.2 in one scenario, we observed data corruption because the column family was never properly cleaned up. Presently on master, the situation is a bit better in some ways, since we at least panic about the schema change, but still problematic, we need to clean up column families properly.

I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899  job 851769633655848961: reverting execution encountered retriable error: executing declarative schema change PostCommitNonRevertiblePhase stage 1 of 2 with 14 MutationType ops (rollback
=true) for ALTER TABLE: runtime error: invalid memory address or nil pointer dereference
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +(1) attached stack trace
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  -- stack trace:
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scerrors.EventLogger.HandlePanicAndLogError
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scerrors/errors.go:80
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  -- stack trace:
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scerrors.EventLogger.HandlePanicAndLogError
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scerrors/errors.go:80
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | runtime.gopanic
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        GOROOT/src/runtime/panic.go:884
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | runtime.panicmem
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        GOROOT/src/runtime/panic.go:260
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | runtime.sigpanic
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        GOROOT/src/runtime/signal_unix.go:835
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/scmutationexec.(*immediateVisitor).NotImplementedForPublicObjects
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/scmutationexec/scmutationexec.go:60
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scop.NotImplementedForPublicObjects.Visit
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/bazel-out/darwin-fastbuild/bin/pkg/sql/schemachanger/scop/immediate_mutation_visitor_generated.go:130
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec.executeMutationOps
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/exec_mutation.go:32
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec.ExecuteStage
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scexec/executor.go:28
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scrun.executeStage
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scrun/scrun.go:188
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scrun.RunSchemaChangesInJob.func1
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scrun/scrun.go:134
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scdeps.(*jobExecutionDeps).WithTxnInJob.func1
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scdeps/run_deps.go:144
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).DescsTxn.func1
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/internal.go:1412
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn.func4
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/internal.go:1500
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn.func1
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/kv/db.go:965
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/kv/txn.go:928
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/kv/db.go:964
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/kv/db.go:927
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/kv/db.go:902
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).txn
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/internal.go:1488
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql.(*InternalDB).DescsTxn
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/internal.go:1410
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scdeps.(*jobExecutionDeps).WithTxnInJob
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scdeps/run_deps.go:109
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scrun.RunSchemaChangesInJob
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scrun/scrun.go:130
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scjob.(*newSchemaChangeResumer).run
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scjob/job.go:114
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scjob.(*newSchemaChangeResumer).OnFailOrCancel
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/sql/schemachanger/scjob/job.go:60
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine.func3
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1667
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).stepThroughStateMachine
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/jobs/registry.go:1668
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).runJob
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:467
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/jobs.(*Registry).resumeJob.func2
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/jobs/adopt.go:385
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
I230328 13:25:08.469331 59352 jobs/registry.go:1685 ⋮ [T1,n1] 10899 +  |        github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470

Jira issue: CRDB-26085

@fqazi fqazi added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. GA-blocker T-sql-schema-deprecated Use T-sql-foundations instead labels Mar 28, 2023
@fqazi fqazi self-assigned this Mar 28, 2023
@blathers-crl
Copy link

blathers-crl bot commented Mar 28, 2023

Hi @fqazi, please add branch-* labels to identify which branch(es) this release-blocker affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@fqazi fqazi added branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 branch-master Failures and bugs on the master branch. labels Mar 28, 2023
fqazi added a commit to fqazi/cockroach that referenced this issue Mar 29, 2023
Previously, dropping column families was not implemented,
when we eliminated oprules, we replaced the ops with
NotImplementedForDrop, which wasn't sufficient for
dropped columns. The column families are cleaned up
when the column type itself is dropped, so we don't
really need to do much here. To address this, this
patch will add code to only assert that either the
table is being dropped or the column family has been
removed earlier.

Fixes: cockroachdb#99796
Release note: None
@craig craig bot closed this as completed in 6ec1cd7 Mar 30, 2023
blathers-crl bot pushed a commit that referenced this issue Mar 30, 2023
Previously, dropping column families was not implemented,
when we eliminated oprules, we replaced the ops with
NotImplementedForDrop, which wasn't sufficient for
dropped columns. The column families are cleaned up
when the column type itself is dropped, so we don't
really need to do much here. To address this, this
patch will add code to only assert that either the
table is being dropped or the column family has been
removed earlier.

Fixes: #99796
Release note: None
@exalate-issue-sync exalate-issue-sync bot added T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions) and removed T-sql-schema-deprecated Use T-sql-foundations instead labels May 10, 2023
@rytaft rytaft added C-technical-advisory Caused a technical advisory branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 and removed branch-release-23.1 Used to mark GA and release blockers, technical advisories, and bugs for 23.1 GA-blocker labels Dec 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
branch-master Failures and bugs on the master branch. branch-release-22.2 Used to mark GA and release blockers, technical advisories, and bugs for 22.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. C-technical-advisory Caused a technical advisory T-sql-foundations SQL Foundations Team (formerly SQL Schema + SQL Sessions)
Projects
None yet
Development

No branches or pull requests

2 participants