Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a panic related to batch GC #4903

Merged
merged 4 commits into from
Nov 20, 2018
Merged

Conversation

notnoop
Copy link
Contributor

@notnoop notnoop commented Nov 19, 2018

deleteJobVersions deletes job versions while it's iterating on them.
This seems to cause a panic eventually.

I am able to reliably reproduce the panic in a test environment where I
spin up multiple batch jobs and run
curl -XPUT http://localhost:4646/v1/system/gc, and confirmed that the
panic doesn't occur with this patch. Yet, I am unable to capture the
case in a unit/integration test sadly.
I have been able to reproduce it reliably in test in the PR.

Test case that reproduces a panic with the following stacktrace:

```
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1149715]

goroutine 35 [running]:
testing.tRunner.func1(0xc0001e2200)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:792 +0x387
panic(0x167e400, 0x1c43a30)
	/usr/local/Cellar/go/1.11.2/libexec/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0003a4080, 0x17f7ba0, 0x0, 0xc0002e74a0, 0xc0003a0510, 0xc0003a0530, 0xc0003a0530)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc0003a0420, 0x1756059, 0xb)
	/go/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc00028f7d0, 0x2711, 0xc0002e7680, 0xc000392100, 0xc0003a4040, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1a1
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc00028f7d0, 0x2711, 0x175334f, 0x7, 0xc000306810, 0x2f, 0xc000392100, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x46c
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc000392100, 0x1777ce0, 0xc000392100)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc00028f7d0, 0xc0000d5e48, 0x0, 0x0)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x79
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc0001e2200)
	/go/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc0001e2200, 0x1777138)
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/usr/local/Cellar/go/1.11.2/libexec/src/testing/testing.go:878 +0x353
```
@notnoop notnoop force-pushed the b-delete-versions-mod-while-iter branch from 70255b2 to d4fd12d Compare November 20, 2018 01:59
`deleteJobVersions` does concurrent modifications to iterated items
while iterating, by deleting job versions while it's iterating on them,
@notnoop notnoop force-pushed the b-delete-versions-mod-while-iter branch from d4fd12d to cf51bc6 Compare November 20, 2018 01:59
@notnoop
Copy link
Contributor Author

notnoop commented Nov 20, 2018

Confirmed that the panic occurred in https://travis-ci.org/hashicorp/nomad/jobs/457262238 against the test sha before the fix:

https://travis-ci.org/hashicorp/nomad/jobs/457262238

--- FAIL: TestStateStore_DeleteJobTxn_BatchDeletes (0.01s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x54c455]
goroutine 106 [running]:
testing.tRunner.func1(0xc000180500)
	/home/travis/.gimme/versions/go1.11.2.linux.amd64/src/testing/testing.go:792 +0x387
panic(0xadf740, 0x1115730)
	/home/travis/.gimme/versions/go1.11.2.linux.amd64/src/runtime/panic.go:513 +0x1b9
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix.(*Iterator).Next(0xc0004afd60, 0xc6b5c0, 0x0, 0xc00044a5a0, 0xc00010dd30, 0xc00010dd50, 0xc00010dd50)
	/home/travis/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-immutable-radix/iter.go:81 +0xa5
github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb.(*radixIterator).Next(0xc00010dc40, 0xbc7e55, 0xb)
	/home/travis/gopath/src/github.com/hashicorp/nomad/vendor/github.com/hashicorp/go-memdb/txn.go:634 +0x2e
github.com/hashicorp/nomad/nomad/state.(*StateStore).deleteJobVersions(0xc000190b70, 0x2711, 0xc00044a780, 0xc0000f9d40, 0xc0004afd20, 0x0)
	/home/travis/gopath/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1130 +0x1d3
github.com/hashicorp/nomad/nomad/state.(*StateStore).DeleteJobTxn(0xc000190b70, 0x2711, 0xbc4fb8, 0x7, 0xc0003c7500, 0x2f, 0xc0000f9d40, 0x0, 0x0)
	/home/travis/gopath/src/github.com/hashicorp/nomad/nomad/state/state_store.go:1102 +0x4d0
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes.func1(0xc0000f9d40, 0xbea518, 0xc0000f9d40)
	/home/travis/gopath/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1705 +0x1a2
github.com/hashicorp/nomad/nomad/state.(*StateStore).WithWriteTransaction(0xc000190b70, 0xc00009de48, 0x0, 0x0)
	/home/travis/gopath/src/github.com/hashicorp/nomad/nomad/state/state_store.go:3953 +0x83
github.com/hashicorp/nomad/nomad/state.TestStateStore_DeleteJobTxn_BatchDeletes(0xc000180500)
	/home/travis/gopath/src/github.com/hashicorp/nomad/nomad/state/state_store_test.go:1703 +0x685
testing.tRunner(0xc000180500, 0xbe9970)
	/home/travis/.gimme/versions/go1.11.2.linux.amd64/src/testing/testing.go:827 +0xbf
created by testing.(*T).Run
	/home/travis/.gimme/versions/go1.11.2.linux.amd64/src/testing/testing.go:878 +0x353

nomad/state/state_store_test.go Outdated Show resolved Hide resolved
nomad/state/state_store_test.go Outdated Show resolved Hide resolved
@notnoop notnoop merged commit f098c9a into master Nov 20, 2018
notnoop added a commit that referenced this pull request Nov 20, 2018
@notnoop notnoop deleted the b-delete-versions-mod-while-iter branch November 21, 2018 02:50
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants