Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSI: fix transaction handling in state store #9438

Merged
merged 8 commits into from
Nov 25, 2020
Merged

Conversation

tgross
Copy link
Member

@tgross tgross commented Nov 24, 2020

Fixes #9371 and possibly #9248 and #8034

When making updates to CSI plugins and volumes, the state store methods that have open write transactions were querying the state store using the same methods used by the CSI RPC endpoint, but these methods create their own top-level read transactions. During concurrent plugin updates (as happens when a plugin job is stopped), this can cause write skew in the plugin counts. This bug hasn't yet been identified as the root cause of any volume query bug, but it seems likely.

Best reviewed commit-by-commit.

CSI plugin counts can drift from the correct values if allocations for a CSI
plugin stop concurrently, as with a job stop or a controller and node plugin
on the same client. This test demonstrates the behavior.
CSI volumes need to be "denormalized" with their plugins and (optionally)
allocations. Read-only RPC endpoints should take a snapshot so that we can
make multiple state store method calls with a consistent view.
When making updates to CSI plugins, the state store methods that have open
write transactions were querying the state store using the same methods used
by the CSI RPC endpoint, but these method creates their own top-level read
transactions. During concurrent plugin updates (as happens when a plugin job
is stopped), this can cause write skew in the plugin counts.

Refactor the CSIPlugin query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
Add missing nil checks and missing watch `Add` calls on some state store
methods for CSI volumes
When making updates to CSI volumes, the state store methods that have open
write transactions were querying the state store using the same methods used
by the CSI RPC endpoint, but these method creates their own top-level read
transactions. These have yet not been implicated in any bugs.

Refactor the CSIVolume query methods to have an implementation method that
accepts a transaction, which can be called with either a read txn or a write
txn.
if vol.Namespace != ns {
continue
}

vol, err := snap.CSIVolumeDenormalizePlugins(ws, vol.Copy())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the significance of copying the volume here, but not during Get?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CSIVolumeByID query in Get does a CSIVolumeDenormalizePlugins in the state store, so we already copy in the state store function. (I'm not wild about how that works currently because it's too easy to miss... I didn't want to get too wild with the refactoring in this PR though.)

Copy link
Contributor

@drewbailey drewbailey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just a feww non blocking questions

Copy link
Contributor

@notnoop notnoop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. The commit breakdown with the commit messages are super awesome - made it easy to review this code. Thanks!

Comment on lines 2125 to 2127
if ws != nil {
ws.Add(watchCh)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this is explicit, I always get nervous when handling watchset. Want to note that this is not required - many other cases don't do a nil check, and apparently it's handled in Add: https://github.com/hashicorp/go-memdb/blob/v1.3.0/watch.go#L16-L20

Interestingly, this commit adds a new ws.Add without doing a nil check ;-).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and it's a documented part of the ws.Add method. Maybe we should just remove these extra checks then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 88a7e71

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So that's interesting... removing those caused NPE errors at those lines in the test runs for jobVersionByID!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1a928e1]

goroutine 54 [running]:
github.com/hashicorp/nomad/nomad/state.(*StateStore).jobVersionByID(0xc00062b470, 0xc00057e180, 0x0, 0xc00054b630, 0x7, 0xc000551380, 0x28, 0x0, 0x0, 0x0, ...)
	github.com/hashicorp/nomad/nomad/state/state_store.go:1842 +0x161
github.com/hashicorp/nomad/nomad/state.(*StateStore).upsertJobVersion(0xc00062b470, 0x7, 0xc00064e900, 0xc00057e180, 0x0, 0x0)
	github.com/hashicorp/nomad/nomad/state/state_store.go:1753 +0x249
github.com/hashicorp/nomad/nomad/state.(*StateStore).upsertJobImpl(0xc00062b470, 0x7, 0xc00064e6c0, 0x35de500, 0xc00057e180, 0x0, 0x1)
	github.com/hashicorp/nomad/nomad/state/state_store.go:1523 +0x42c
github.com/hashicorp/nomad/nomad/state.(*StateStore).UpsertJob(0xc00062b470, 0x4, 0x7, 0xc00064e6c0, 0x0, 0x0)
	github.com/hashicorp/nomad/nomad/state/state_store.go:1449 +0xa8
github.com/hashicorp/nomad/nomad.(*nomadFSM).applyUpsertJob(0xc0005f5030, 0x4, 0xc000391801, 0x6b4, 0xbb3, 0x7, 0x0, 0x0)
	github.com/hashicorp/nomad/nomad/fsm.go:507 +0x205
github.com/hashicorp/nomad/nomad.(*nomadFSM).Apply(0xc0005f5030, 0xc000241a90, 0x51dd880, 0xbfe7b879c156277c)
	github.com/hashicorp/nomad/nomad/fsm.go:219 +0x34e
github.com/hashicorp/raft.(*Raft).runFSM.func1(0xc0006a43e0)
	github.com/hashicorp/raft@v1.1.3-0.20200211192230-365023de17e6/fsm.go:90 +0x2c2
github.com/hashicorp/raft.(*Raft).runFSM.func2(0xc000692e00, 0x1, 0x40)
	github.com/hashicorp/raft@v1.1.3-0.20200211192230-365023de17e6/fsm.go:113 +0x75
github.com/hashicorp/raft.(*Raft).runFSM(0xc0005a8000)
	github.com/hashicorp/raft@v1.1.3-0.20200211192230-365023de17e6/fsm.go:219 +0x3c4
github.com/hashicorp/raft.(*raftState).goFunc.func1(0xc0005a8000, 0xc0004c4100)
	github.com/hashicorp/raft@v1.1.3-0.20200211192230-365023de17e6/state.go:146 +0x55
created by github.com/hashicorp/raft.(*raftState).goFunc
	github.com/hashicorp/raft@v1.1.3-0.20200211192230-365023de17e6/state.go:144 +0x66
    jobs_test.go:2176: err: Put "http://127.0.0.1:31070/v1/jobs": EOF

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I must be missing something here, because testing with the actual go-memdb library shows otherwise...

func TestNilWatch(t *testing.T) {

	test := func(ws WatchSet) {
		watchCh := make(chan struct{})
		ws.Add(watchCh)
	}
	test(nil)
	test(NewWatchSet())
}
$ go test . -v -run TestNilWatch -count=1
=== RUN   TestNilWatch
--- PASS: TestNilWatch (0.00s)
PASS
ok      github.com/hashicorp/go-memdb   0.025s

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, it's because jobVersionByID's signature is:

(s *StateStore) jobVersionByID(txn *txn, ws *memdb.WatchSet, namespace, id string) ([]*structs.Job, error)

I don't think we're supposed to be taking that pointer?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting jobVersionByID takes a *memdb.WatchSet unlike the other methods in this file. It was introduced as such since the beginning in https://github.com/hashicorp/nomad/pull/2566/files#diff-53c7d524897a192ece819816a9dcf006dee46747fbb853e7f638601ea556a72cR561-R572 - but without explanation...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it's the only place in the code base where we do that, including Consul and Vault, I suspect it's just a long-lived typo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops - I missed the last comment when I posted mine - I agree, it shouldn't be a pointer here.

@github-actions
Copy link

github-actions bot commented Dec 8, 2022

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Dec 8, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nomad confused about how many CSI plugins should be running
3 participants