Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ddl: add recover for run ddl job #10981

Merged
merged 6 commits into from
Jul 2, 2019
Merged

Conversation

crazycs520
Copy link
Contributor

What problem does this PR solve?

Fix ddl worker run ddl job panic.

[2019/06/28 15:21:54.379 +08:00] [INFO] [ddl_worker.go:474] ["[ddl] run DDL job"] [worker="worker 1, tp general"] [job="ID:48, Type:create table, State:none, SchemaState:none, SchemaID:1, TableID:47, RowCount:0, ArgLen:0, start time: 2019-06-28 15:19:12.653 +0800 CST, Err:<nil>, ErrCount:0, SnapshotVersion:0"]
[2019/06/28 15:21:54.379 +08:00] [ERROR] [ddl.go:441] ["[ddl] DDL worker meet panic"] [worker="worker 1, tp general"] [ID=8c8b0d58-4238-4f21-bd15-6915af71a983] [stack="github.com/pingcap/tidb/ddl.(*ddl).start.func2
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl.go:441
github.com/pingcap/tidb/util.WithRecovery.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/util/misc.go:72
runtime.gopanic
	/usr/local/go1.12/src/runtime/panic.go:522
runtime.panicmem
	/usr/local/go1.12/src/runtime/panic.go:82
runtime.sigpanic
	/usr/local/go1.12/src/runtime/signal_unix.go:390
github.com/pingcap/tidb/infoschema.(*Handle).Get
	/Users/cs/code/goread/src/github.com/pingcap/tidb/infoschema/infoschema.go:302
github.com/pingcap/tidb/ddl.onCreateTable
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/table.go:56
github.com/pingcap/tidb/ddl.(*worker).runDDLJob
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:499
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:402
github.com/pingcap/tidb/kv.RunInNewTxn
	/Users/cs/code/goread/src/github.com/pingcap/tidb/kv/txn.go:50
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:365
github.com/pingcap/tidb/ddl.(*worker).start
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:137
github.com/pingcap/tidb/ddl.(*ddl).start.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl.go:438
github.com/pingcap/tidb/util.WithRecovery
	/Users/cs/code/goread/src/github.com/pingcap/tidb/util/misc.go:80"]
[2019/06/28 15:21:54.380 +08:00] [ERROR] [misc.go:75] ["panic in the recoverable goroutine"] [r="\"invalid memory address or nil pointer dereference\""] ["stack trace"="github.com/pingcap/tidb/util.WithRecovery.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/util/misc.go:77
runtime.gopanic
	/usr/local/go1.12/src/runtime/panic.go:522
runtime.panicmem
	/usr/local/go1.12/src/runtime/panic.go:82
runtime.sigpanic
	/usr/local/go1.12/src/runtime/signal_unix.go:390
github.com/pingcap/tidb/infoschema.(*Handle).Get
	/Users/cs/code/goread/src/github.com/pingcap/tidb/infoschema/infoschema.go:302
github.com/pingcap/tidb/ddl.onCreateTable
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/table.go:56
github.com/pingcap/tidb/ddl.(*worker).runDDLJob
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:499
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:402
github.com/pingcap/tidb/kv.RunInNewTxn
	/Users/cs/code/goread/src/github.com/pingcap/tidb/kv/txn.go:50
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:365
github.com/pingcap/tidb/ddl.(*worker).start
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:137
github.com/pingcap/tidb/ddl.(*ddl).start.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl.go:438
github.com/pingcap/tidb/util.WithRecovery
	/Users/cs/code/goread/src/github.com/pingcap/tidb/util/misc.go:80"] [stack="github.com/pingcap/tidb/util.WithRecovery.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/util/misc.go:75
runtime.gopanic
	/usr/local/go1.12/src/runtime/panic.go:522
runtime.panicmem
	/usr/local/go1.12/src/runtime/panic.go:82
runtime.sigpanic
	/usr/local/go1.12/src/runtime/signal_unix.go:390
github.com/pingcap/tidb/infoschema.(*Handle).Get
	/Users/cs/code/goread/src/github.com/pingcap/tidb/infoschema/infoschema.go:302
github.com/pingcap/tidb/ddl.onCreateTable
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/table.go:56
github.com/pingcap/tidb/ddl.(*worker).runDDLJob
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:499
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:402
github.com/pingcap/tidb/kv.RunInNewTxn
	/Users/cs/code/goread/src/github.com/pingcap/tidb/kv/txn.go:50
github.com/pingcap/tidb/ddl.(*worker).handleDDLJobQueue
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:365
github.com/pingcap/tidb/ddl.(*worker).start
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl_worker.go:137
github.com/pingcap/tidb/ddl.(*ddl).start.func1
	/Users/cs/code/goread/src/github.com/pingcap/tidb/ddl/ddl.go:438
github.com/pingcap/tidb/util.WithRecovery
	/Users/cs/code/goread/src/github.com/pingcap/tidb/util/misc.go:80"]

What is changed and how it works?

  • Add a recover in runDDLJob function.
  • check nil point.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Side effects

Related changes

  • Need to cherry-pick to the release branch

@codecov
Copy link

codecov bot commented Jun 28, 2019

Codecov Report

Merging #10981 into master will not change coverage.
The diff coverage is n/a.

@@             Coverage Diff             @@
##             master     #10981   +/-   ##
===========================================
  Coverage   81.0665%   81.0665%           
===========================================
  Files           421        421           
  Lines         89577      89577           
===========================================
  Hits          72617      72617           
  Misses        11725      11725           
  Partials       5235       5235

@crazycs520 crazycs520 added the type/bugfix This PR fixes a bug. label Jun 28, 2019
ddl/table.go Outdated Show resolved Hide resolved
ddl/serial_test.go Outdated Show resolved Hide resolved
ddl/ddl_worker.go Outdated Show resolved Hide resolved
ddl/serial_test.go Outdated Show resolved Hide resolved
@@ -471,6 +472,18 @@ func chooseLeaseTime(t, max time.Duration) time.Duration {

// runDDLJob runs a DDL job. It returns the current schema version in this transaction and the error.
func (w *worker) runDDLJob(d *ddlCtx, t *meta.Meta, job *model.Job) (ver int64, err error) {
defer func() {
r := recover()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use WithRecovery

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

logutil.BgLogger().Error("run ddl job panic",
zap.Reflect("r", r),
zap.Stack("stack trace"))
job.State = model.JobStateCancelling
Copy link
Contributor

@winkyao winkyao Jun 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some comments, why cancel the job here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

Copy link
Contributor

@zimulala zimulala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@crazycs520
Copy link
Contributor Author

/run-all-tests

Copy link
Contributor

@winkyao winkyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@bb7133
Copy link
Member

bb7133 commented Jul 1, 2019

LGTM

@bb7133 bb7133 added the status/LGT2 Indicates that a PR has LGTM 2. label Jul 1, 2019
@crazycs520 crazycs520 added status/LGT3 The PR has already had 3 LGTM. and removed status/LGT2 Indicates that a PR has LGTM 2. labels Jul 2, 2019
@crazycs520
Copy link
Contributor Author

/run-all-tests

@crazycs520
Copy link
Contributor Author

/run-unit-test

@winkyao winkyao merged commit 6542e4a into pingcap:master Jul 2, 2019
crazycs520 added a commit to crazycs520/tidb that referenced this pull request Jul 2, 2019
@you06 you06 added the sig/sql-infra SIG: SQL Infra label Mar 4, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
sig/sql-infra SIG: SQL Infra status/LGT3 The PR has already had 3 LGTM. type/bugfix This PR fixes a bug.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants