storage: account for below Raft stats changes when using DeprecatedDelta #38976

nvanbenschoten · 2019-07-18T22:47:15Z

The explanation of what is going wrong is in #38859 (comment) and the next comment.

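The problem was that we modify the stats delta for entries in `applyRaftCommandToBatch` [here](https://github.com/cockroachdb/cockroach/blob/5a382477a5c38f1718829f2b443783dd8b34e92b/pkg/storage/replica_application.go#L600) and [here](https://github.com/cockroachdb/cockroach/blob/5a382477a5c38f1718829f2b443783dd8b34e92b/pkg/storage/replica_application.go#L631). We then conditionally replace this stats delta [here](https://github.com/cockroachdb/cockroach/blob/5a382477a5c38f1718829f2b443783dd8b34e92b/pkg/storage/replica_application.go#L325) if the raft command was proposed with a `DeprecatedDelta` instead of a new `Delta` field.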
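For readers who want the shape of the bug in miniature, here is a rough sketch. It is not the CockroachDB code: `Stats`, `Command`, `belowRaftAdjustment`, `applyBuggy`, and `applyFixed` are hypothetical names invented for illustration, and the ordering shown in `applyFixed` is only one plausible way to avoid the overwrite, not necessarily what this PR does.

```go
package main

import "fmt"

// Stats is a hypothetical stand-in for a range's statistics.
type Stats struct {
	LiveBytes int64
	SysBytes  int64
}

// Add accumulates another delta into s.
func (s *Stats) Add(o Stats) {
	s.LiveBytes += o.LiveBytes
	s.SysBytes += o.SysBytes
}

// Command is a hypothetical replicated command. In this sketch, old proposers
// populate DeprecatedDelta and new proposers populate Delta.
type Command struct {
	Delta           Stats
	DeprecatedDelta *Stats
}

// belowRaftAdjustment models a stats change that is only computed while the
// entry is being applied, after the command was proposed.
var belowRaftAdjustment = Stats{SysBytes: 8}

// applyBuggy adjusts the delta first and then replaces it wholesale when
// DeprecatedDelta is set, which silently discards the adjustment.
func applyBuggy(cmd Command) Stats {
	delta := cmd.Delta
	delta.Add(belowRaftAdjustment)
	if cmd.DeprecatedDelta != nil {
		delta = *cmd.DeprecatedDelta
	}
	return delta
}

// applyFixed chooses the proposer's delta first and layers the below-Raft
// adjustment on top, so the adjustment survives for both field variants.
func applyFixed(cmd Command) Stats {
	delta := cmd.Delta
	if cmd.DeprecatedDelta != nil {
		delta = *cmd.DeprecatedDelta
	}
	delta.Add(belowRaftAdjustment)
	return delta
}

func main() {
	old := Command{DeprecatedDelta: &Stats{LiveBytes: 100}}
	fmt.Println(applyBuggy(old)) // {100 0}: the adjustment is lost
	fmt.Println(applyFixed(old)) // {100 8}: the adjustment is kept
}
```

In this toy model the two functions agree whenever `DeprecatedDelta` is nil, which is consistent with the bug only surfacing for commands proposed with the old field.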

I'm not adding a unit test here because this is a glaring bug and we don't test either of these migrations at all anymore (since 78e1866). Our acceptance test suite is catching the issue, so I think that's sufficient.

Release note: None

ajwerner

Thanks!

Reviewed 2 of 2 files at r1.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)

pkg/storage/replica_application.go, line 594 at r1 (raw file):

	// upgrades. Thanks to commutativity, the spanlatch manager does not have to
	// serialize on the stats key.
	ms := *replicaState.Stats

nit: this doesn't necessarily seem like the place to do this dereference, it doesn't get used until the next set of conditionals and even then not in the common case. I don't think it'd be crazy to do it in each of the two branches below that use it.

pkg/storage/replica_application.go, line 687 at r1 (raw file):

	start := timeutil.Now()

	// TODO(ajwerner): This assertion no longer makes much sense.

Indeed, it is garbage. We read the same thing from the same batch twice and compare it to itself.

nvanbenschoten

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner)

pkg/storage/replica_application.go, line 594 at r1 (raw file):

Previously, ajwerner wrote…

nit: this doesn't necessarily seem like the place to do this dereference, it doesn't get used until the next set of conditionals and even then not in the common case. I don't think it'd be crazy to do it in each of the two branches below that use it.

Done.

ajwerner

Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)

nvanbenschoten · 2019-07-19T01:08:55Z

I'm going to see what I can do about adding real unit tests around this behavior when I address the feedback on #38954 and add some tighter mocking around entry application.

bors r=ajwerner

38976: storage: account for below Raft stats changes when using DeprecatedDelta r=ajwerner a=nvanbenschoten

Fixes #38859.

The explanation of what is going wrong is in #38859 (comment) and the next comment.

The problem was that we modify the stats delta for entries in `applyRaftCommandToBatch` [here](https://github.com/cockroachdb/cockroach/blob/5a382477a5c38f1718829f2b443783dd8b34e92b/pkg/storage/replica_application.go#L600) and [here](https://github.com/cockroachdb/cockroach/blob/5a382477a5c38f1718829f2b443783dd8b34e92b/pkg/storage/replica_application.go#L631). We then conditionally replace this stats delta [here](https://github.com/cockroachdb/cockroach/blob/5a382477a5c38f1718829f2b443783dd8b34e92b/pkg/storage/replica_application.go#L325) if the raft command was proposed with a `DeprecatedDelta` instead of a new `Delta` field.

I'm not adding a unit test here because this is a glaring bug and we don't test either of these migrations at all anymore (since 78e1866). Our acceptance test suite is catching the issue, so I think that's sufficient.

Release note: None

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>

craig · 2019-07-19T01:31:16Z

Build succeeded

GitHub CI (Cockroach)

The new package provides abstractions and routines associated with the application of committed raft entries to a replicated state machine.

This was inspired by four driving forces:
- We've been having a number of discussions on the Core team about making storage abstractions more clear and easier to understand in isolation. One commonly discussed proposal is introducing a `storage/replicate` package that would encapsulate the concerns of raft replication (e.g. log manipulation, snapshots, leader election, heartbeats, etc.). This `storage/apply` package will fit in nicely alongside a replication abstraction.
- Initial discussion on cockroachdb#38954 concluded that adding an optimization to acknowledge clients after their raft entries have committed but before they had been applied with the current code structure was moving in the opposite direction and making things even harder to understand due to the introduction of more complex state management.
- Recent instability in this area (cockroachdb#38976, cockroachdb#39064, cockroachdb#39135, cockroachdb#39203) has revealed that there exists a high degree of difficulty involved in testing any of the logic in the area of raft entry application. This has naturally led to testing at a distance using tools like testing hooks, which is frustrating and delicate. As a result, we're missing tests for things like old migrations that we still need to support. We also have trouble writing regression tests when bugs do pop up.
- The proposed optimization in cockroachdb#17500 (comment) to apply committed raft entries to the Replica storage engine asynchronously in a thread separate from the raft processing thread will make entry application significantly more complex. For instance, we'll likely need to introduce a separate scheduler to coordinate entry application passes across Ranges on a node. The scheduler will likely want to prioritize leaders over followers and embed other policies to optimize for total system throughput. There's a strong desire to isolate this new complexity and to give the logic a place to live.

The PR begins to address these concerns by formalizing the process of applying committed raft entries. To start, this makes the process easier to understand both in terms of the macro-level steps that are taken during application of a batch of entries and in terms of the impact that an individual command has on the replicated state machine. For instance, the PR helps provide answers to all of the following questions:
- What are the stages of raft entry application?
- What is the difference between a "raft entry" and a "replicated command"?
- What can a command do besides apply its write batch to the storage engine?
- What does it mean for a successfully replicated command to be rejected during application?
- When can we acknowledge the outcome of a raft proposal?

The refactor also uncovers a large testing surface that future PRs will exploit to write targeted unit tests. Not only can the `storage/apply` package be tested with a mock state machine (done in this PR), but we can test Replica's implementation of the state machine interface in isolation without needing to touch raft at all.

Finally, the refactor paves the way for making the proposed change in cockroachdb#38954 in a much cleaner way. This is demonstrated in the next commit, which is being included here to show why certain things were designed the way they were but will not be merged with this PR.

Release note: None
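To make the "mock state machine" idea concrete, the following is a minimal sketch of what testing entry application against a fake could look like. The `Entry` type, the `StateMachine` interface, `ApplyCommitted`, and `mockStateMachine` are all hypothetical and heavily simplified; they are assumptions for illustration and do not reflect the actual API of the `storage/apply` package.

```go
package main

import "fmt"

// Entry is a hypothetical committed log entry.
type Entry struct {
	Index uint64
	Data  string
}

// StateMachine is a hypothetical, minimal interface: it applies one entry
// and reports whether the command was rejected during application.
type StateMachine interface {
	Apply(e Entry) (rejected bool)
}

// ApplyCommitted feeds entries to the state machine in order and returns
// the indexes of the entries that were rejected.
func ApplyCommitted(sm StateMachine, entries []Entry) []uint64 {
	var rejected []uint64
	for _, e := range entries {
		if sm.Apply(e) {
			rejected = append(rejected, e.Index)
		}
	}
	return rejected
}

// mockStateMachine records the indexes it applied and rejects entries with
// an empty payload, so a test can exercise both outcomes without raft.
type mockStateMachine struct {
	applied []uint64
}

func (m *mockStateMachine) Apply(e Entry) bool {
	if e.Data == "" {
		return true
	}
	m.applied = append(m.applied, e.Index)
	return false
}

func main() {
	sm := &mockStateMachine{}
	rejected := ApplyCommitted(sm, []Entry{{1, "a"}, {2, ""}, {3, "c"}})
	fmt.Println(sm.applied, rejected) // [1 3] [2]
}
```

The point of the sketch is only that once application is expressed against an interface, the driver loop can be exercised with a fake and no raft group or storage engine at all.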

nvanbenschoten requested review from ajwerner and a team July 18, 2019 22:47

ajwerner approved these changes Jul 18, 2019

nvanbenschoten force-pushed the nvanbenschoten/fixStatsPanic branch from f5a7be9 to 19c5c09 Compare July 18, 2019 23:10

nvanbenschoten commented Jul 18, 2019

ajwerner reviewed Jul 18, 2019

craig bot merged commit 19c5c09 into cockroachdb:master Jul 19, 2019

nvanbenschoten deleted the nvanbenschoten/fixStatsPanic branch July 19, 2019 01:37

nvanbenschoten mentioned this pull request Aug 2, 2019

storage/apply: create apply package for raft entry application #39254

Merged
