Fix the data inconsistency issue by moving the SetConsistentIndex into the transaction lock #13854

Merged
merged 6 commits into from
Apr 7, 2022

ahrtr
Member

@ahrtr ahrtr commented Mar 30, 2022

Fixes issues/13766.

Previously, SetConsistentIndex() was called during the apply workflow but outside the backend transaction, so the two did not form one atomic operation. If a periodic commit happens between SetConsistentIndex and the rest of the apply workflow, and etcd crashes for whatever reason right after that commit, then etcd runs into the data inconsistency issue. Please refer to the discussion in pull/13844 and issues/13766.
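
To make the window concrete, here is a minimal sketch of the race described above. It is a toy model, not etcd code: `backend`, `disk`, `applyNonAtomic`, and `applyAtomic` are hypothetical names, and the "periodic commit" is simulated by an explicit call to `commit`.

```go
package main

import "fmt"

// disk models the state that has been durably committed.
type disk struct {
	consistentIndex uint64
	keys            map[string]string
}

// backend is a toy batching backend (hypothetical, not etcd's real type):
// writes stay pending until commit makes them durable.
type backend struct {
	pendingIndex uint64
	pendingKeys  map[string]string
	durable      disk
}

func newBackend() *backend {
	return &backend{
		pendingKeys: map[string]string{},
		durable:     disk{keys: map[string]string{}},
	}
}

// commit flushes all pending changes, as a periodic commit would.
func (b *backend) commit() {
	b.durable.consistentIndex = b.pendingIndex
	for k, v := range b.pendingKeys {
		b.durable.keys[k] = v
	}
	b.pendingKeys = map[string]string{}
}

// snapshot copies the durable state, i.e. what would survive a crash now.
func (b *backend) snapshot() disk {
	keys := make(map[string]string, len(b.durable.keys))
	for k, v := range b.durable.keys {
		keys[k] = v
	}
	return disk{consistentIndex: b.durable.consistentIndex, keys: keys}
}

// applyNonAtomic records the index first, lets a commit land in the gap,
// and only then writes the key. It returns what a crash right after that
// commit would leave on disk.
func applyNonAtomic(b *backend, index uint64, key, value string) disk {
	b.pendingIndex = index // index recorded outside the transaction
	b.commit()             // periodic commit lands in the gap
	crash := b.snapshot()
	b.pendingKeys[key] = value // rest of the apply workflow, lost on crash
	return crash
}

// applyAtomic writes the index and the key together, so a commit can only
// happen before both changes or after both.
func applyAtomic(b *backend, index uint64, key, value string) disk {
	b.commit() // a commit before the transaction sees neither change
	b.pendingIndex = index
	b.pendingKeys[key] = value
	b.commit() // a commit after the transaction sees both
	return b.snapshot()
}

func main() {
	bad := applyNonAtomic(newBackend(), 5, "foo", "bar")
	_, hasKey := bad.keys["foo"]
	fmt.Printf("non-atomic: index=%d hasKey=%v\n", bad.consistentIndex, hasKey)

	good := applyAtomic(newBackend(), 5, "foo", "bar")
	_, hasKey = good.keys["foo"]
	fmt.Printf("atomic:     index=%d hasKey=%v\n", good.consistentIndex, hasKey)
}
```

In the non-atomic case the durable state claims index 5 was applied while the key is missing, so after a restart the entry would not be replayed. That is the shape of the inconsistency this PR targets.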

In this PR, I moved SetConsistentIndex into a txPostLockHook, which is executed right after each call to batchTx.Lock().

batchTx.Lock() is called in many places, but txPostLockHook is only supposed to be executed in the apply workflow. So I added one more interface method, LockWithoutHook, to BatchTx; it doesn't execute the txPostLockHook at all.

	// LockWithoutHook doesn't execute the txPostLockHook.
	LockWithoutHook()

Only the operations in the apply workflow should call BatchTx.Lock; all others should call BatchTx.LockWithoutHook. If an operation doesn't have any impact on the apply workflow, such as the etcdutl commands, then it doesn't matter which lock it calls.
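
The difference between the two lock methods can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the actual implementation: `batchTx` here is a stripped-down stand-in, and `indexState` and `demo` are hypothetical helpers that exist only for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// batchTx is a simplified stand-in for the backend's batch transaction.
type batchTx struct {
	mu             sync.Mutex
	txPostLockHook func()
}

// Lock takes the mutex and then runs the hook, so whatever the hook writes
// belongs to the same transaction as the caller's own writes.
func (t *batchTx) Lock() {
	t.mu.Lock()
	if t.txPostLockHook != nil {
		t.txPostLockHook()
	}
}

// LockWithoutHook takes the mutex only; the hook is not executed.
func (t *batchTx) LockWithoutHook() { t.mu.Lock() }

// Unlock releases the mutex.
func (t *batchTx) Unlock() { t.mu.Unlock() }

// indexState is a hypothetical holder for the index being applied and the
// index already written into the current transaction.
type indexState struct {
	applying uint64
	pending  uint64
}

// demo returns the pending index observed after LockWithoutHook and after
// Lock, in that order.
func demo() (withoutHook, withHook uint64) {
	st := &indexState{}
	tx := &batchTx{}
	tx.txPostLockHook = func() { st.pending = st.applying }

	st.applying = 7 // the apply workflow is processing entry 7

	tx.LockWithoutHook() // e.g. a caller outside the apply workflow
	withoutHook = st.pending
	tx.Unlock()

	tx.Lock() // the apply workflow itself
	withHook = st.pending
	tx.Unlock()
	return withoutHook, withHook
}

func main() {
	withoutHook, withHook := demo()
	fmt.Println("pending index after LockWithoutHook:", withoutHook)
	fmt.Println("pending index after Lock:", withHook)
}
```

Because the hook runs while the lock is held, a periodic commit (which would need the same lock) cannot slip in between the index update and the rest of the writes.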

cc all related people @ptabor @serathius @spzala @wilsonwang371 @chaochn47 @tangcong @liuycsd @PaulFurtado @gyuho @hexfusion @jingyih @wpedrak Please take a look. Thanks.

@chaochn47 @serathius @liuycsd @moonovo @michaljasionowski could you please try this PR to check whether you can still reproduce the data inconsistency issue? Thanks.

@ahrtr ahrtr force-pushed the data_corruption branch 5 times, most recently from f72fbb0 to c6dc71f Compare March 30, 2022 18:42
@ahrtr ahrtr changed the title Fix the data inconsistency by moving the SetConsistentIndex into the transaction lock Fix the data inconsistency issue by moving the SetConsistentIndex into the transaction lock Mar 30, 2022
@codecov-commenter

codecov-commenter commented Mar 30, 2022

Codecov Report

Merging #13854 (7b52e74) into main (c4d055f) will decrease coverage by 0.14%.
The diff coverage is 85.96%.

❗ Current head 7b52e74 differs from pull request most recent head 4033f5c. Consider uploading reports for the commit 4033f5c to get more accurate results

@@            Coverage Diff             @@
##             main   #13854      +/-   ##
==========================================
- Coverage   72.45%   72.31%   -0.15%     
==========================================
  Files         469      469              
  Lines       38275    38352      +77     
==========================================
+ Hits        27733    27735       +2     
- Misses       8771     8833      +62     
- Partials     1771     1784      +13     
Flag Coverage Δ
all 72.31% <85.96%> (-0.15%) ⬇️

Impacted Files Coverage Δ
etcdutl/etcdutl/backup_command.go 9.94% <0.00%> (ø)
server/storage/schema/migration.go 81.48% <0.00%> (ø)
server/storage/backend/verify.go 65.21% <50.00%> (-8.12%) ⬇️
server/storage/mvcc/store.go 84.61% <50.00%> (ø)
server/storage/schema/schema.go 92.72% <60.00%> (ø)
server/storage/schema/membership.go 59.85% <71.42%> (ø)
server/storage/schema/auth_roles.go 76.00% <73.91%> (+3.27%) ⬆️
server/storage/backend/batch_tx.go 65.19% <83.33%> (+0.98%) ⬆️
server/storage/schema/auth_users.go 79.16% <89.47%> (+2.97%) ⬆️
server/etcdserver/cindex/cindex.go 92.45% <90.90%> (-1.67%) ⬇️
... and 44 more

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c4d055f...4033f5c. Read the comment docs.

@ptabor
Contributor

ptabor commented Mar 31, 2022

Thank you @ahrtr. The fix logic looks good to me. @serathius is the problem reproducible with the fix?

What I'm afraid of is that it is error-prone to maintain the correct calls going forward:

  • it's easy to forget whether library logic performing a transaction should call Unlock vs UnlockWithoutHooks
  • it's difficult to assess whether the transitively called code is, and will be, calling the correct method.

I think we need some method of verifying that we haven't forgotten or mixed up the calls. For example:

If the ETCD_VERIFY env variable (const ENV_VERIFY = "ETCD_VERIFY") is set, in Unlock & UnlockWithoutHooks we verify whether the 'apply' method is/is not (respectively) on the stack trace. The mode is already enabled by default in both integration & e2e tests. It would be slow... but it would catch most of the wrong calls... It might miss the case where 'apply' creates a goroutine that performs a transaction... but I think that would be a weird pattern we should also be aware of.
An alternative is to use 'ctx' for such verification... (implicit ctx driving business logic is imho an anti-pattern) but it would require us to pass ctx to all methods that perform transactions (which seems a good thing in general).
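
A rough sketch of the stack-trace idea, using only the standard `runtime` package. Everything here is illustrative: `onStack`, `lockWithHook`, `lockWithoutHook`, and `apply` are hypothetical names, and treating any non-empty ETCD_VERIFY value as "enabled" is an assumption of this sketch rather than documented behavior.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"runtime"
	"strings"
)

// verify gates the (slow) stack inspection. Treating any non-empty
// ETCD_VERIFY value as "enabled" is an assumption of this sketch.
var verify = os.Getenv("ETCD_VERIFY") != ""

// onStack reports whether a function whose name ends with suffix is on the
// current goroutine's call stack. Only the innermost 64 frames are checked.
func onStack(suffix string) bool {
	pcs := make([]uintptr, 64)
	n := runtime.Callers(1, pcs) // skip runtime.Callers itself
	frames := runtime.CallersFrames(pcs[:n])
	for {
		frame, more := frames.Next()
		if strings.HasSuffix(frame.Function, suffix) {
			return true
		}
		if !more {
			return false
		}
	}
}

// lockWithHook stands in for a lock that must only be taken under apply.
func lockWithHook() error {
	if verify && !onStack(".apply") {
		return errors.New("lock with hook called outside apply")
	}
	return nil
}

// lockWithoutHook stands in for a lock that must not be taken under apply.
func lockWithoutHook() error {
	if verify && onStack(".apply") {
		return errors.New("lock without hook called inside apply")
	}
	return nil
}

// apply stands in for the apply workflow; it simply runs the operation.
//
//go:noinline
func apply(op func() error) error { return op() }

func main() {
	verify = true // force verification on for the demonstration
	fmt.Println("with hook, inside apply:    ", apply(lockWithHook))
	fmt.Println("with hook, outside apply:   ", lockWithHook())
	fmt.Println("without hook, outside apply:", lockWithoutHook())
	fmt.Println("without hook, inside apply: ", apply(lockWithoutHook))
}
```

As the comment notes, a check like this only sees the current goroutine's stack, so a transaction started from a goroutine spawned by apply would not be detected.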

I reserved a ~full day for tomorrow to potentially try hands-on some of the ideas proposed above.

@ahrtr
Member Author

ahrtr commented Mar 31, 2022

I think we need some method of verifying that we haven't forgotten or mixed up the calls

It's really a good point. But I don't think it's a blocker for the release of 3.5.3, because this issue is urgent and important.

@ahrtr
Member Author

ahrtr commented Mar 31, 2022

Just added a detailed comment to explain the difference between Lock and LockWithoutHook. @ptabor

@michaljasionowski
Contributor

I still managed to reproduce inconsistencies due to downgrade/upgrade as described in #13514 (comment) with #13844 cherry-picked to v3.5.2. I didn't try this PR, as it's more complex to cherry-pick.

@deads2k

deads2k commented Mar 31, 2022

With this PR, the test from #13838 no longer produces "found data inconsistency with peers"

@ahrtr
Member Author

ahrtr commented Mar 31, 2022

With this PR, the test from #13838 no longer produces "found data inconsistency with peers"

Thanks @deads2k for the good news. Just to double-check, can you easily reproduce this issue without this PR in your environment?

@serathius
Member

serathius commented Apr 1, 2022

I have also run the qualification and was no longer able to reproduce the issue on this PR. To give some data: I ran the qualification 4 times, each run taking 10 minutes, and didn't find any issues. Compare this to reproduction on the main branch, which I ran 5 times, each run managing to reproduce the issue in 2 minutes on average (minimum 30 seconds, maximum 9 minutes).

@dims
Contributor

dims commented Apr 1, 2022

This is great news @serathius. Nice job @ahrtr!

@deads2k

deads2k commented Apr 1, 2022

Thanks @deads2k for the good news. Just to double-check, can you easily reproduce this issue without this PR in your environment?

Yes, using #13838 without this PR, I'm able to produce the "found data inconsistency with peers" failure reliably.

@ptabor
Contributor

ptabor commented Apr 1, 2022

Still looking into this. Added verification as described above and am discovering subtle cases, as expected:

  1. We do call (read-only) Authentication from the applier.
    This illustrates that we need to assume there are idempotent/reentrant transactions that will NOT move consistency_index forward. So it's OK, but we cannot assume all nested-apply transactions will call .Lock(WithHook).
    Alternatively, authBackendTx needs to support RLock.
go.etcd.io/etcd/server/v3/storage/backend.(*batchTx).LockWithoutHook
	etcd/server/storage/backend/batch_tx.go:106
go.etcd.io/etcd/server/v3/storage/schema.(*authBatchTx).LockWithoutHook
	etcd/server/storage/schema/auth.go:112
go.etcd.io/etcd/server/v3/storage/schema.(*authBackend).GetUser
	etcd/server/storage/schema/auth_users.go:24
go.etcd.io/etcd/server/v3/auth.(*authStore).Authenticate
	etcd/server/auth/store.go:315
go.etcd.io/etcd/server/v3/etcdserver.(*applierV3backend).Authenticate
	etcd/server/etcdserver/apply.go:827
go.etcd.io/etcd/server/v3/etcdserver.(*applierV3backend).Apply
	etcd/server/etcdserver/apply.go:197
go.etcd.io/etcd/server/v3/etcdserver.(*authApplierV3).Apply
	etcd/server/etcdserver/apply_auth.go:61
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntryNormal
	etcd/server/etcdserver/server.go:1870
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).apply
	etcd/server/etcdserver/server.go:1777
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyEntries
	etcd/server/etcdserver/server.go:1084
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).applyAll
	etcd/server/etcdserver/server.go:904
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).run.func8
	etcd/server/etcdserver/server.go:836
go.etcd.io/etcd/pkg/v3/schedule.(*fifo).run
	etcd/pkg/schedule/schedule.go:157

And the other way around: Authenticate is being called outside of 'apply' from interceptors...

go.etcd.io/etcd/server/v3/storage/backend.(*batchTx).Lock
	/etcd/server/storage/backend/batch_tx.go:89
go.etcd.io/etcd/server/v3/storage/schema.(*authBatchTx).Lock
	/etcd/server/storage/schema/auth.go:108
go.etcd.io/etcd/server/v3/auth.(*authStore).CheckPassword.func1
	/etcd/server/auth/store.go:350
go.etcd.io/etcd/server/v3/auth.(*authStore).CheckPassword
	/etcd/server/auth/store.go:363
go.etcd.io/etcd/server/v3/etcdserver.(*EtcdServer).Authenticate
	/etcd/server/etcdserver/v3_server.go:434
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.(*AuthServer).Authenticate
	/etcd/server/etcdserver/api/v3rpc/auth.go:57
go.etcd.io/etcd/api/v3/etcdserverpb._Auth_Authenticate_Handler.func1
	/etcd/api/etcdserverpb/rpc.pb.go:8064
go.etcd.io/etcd/pkg/v3/grpc_testing.(*GrpcRecorder).UnaryInterceptor.func1
	/etcd/pkg/grpc_testing/recorder.go:38
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/Users/ptab/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
github.com/grpc-ecosystem/go-grpc-prometheus.(*ServerMetrics).UnaryServerInterceptor.func1
	/Users/ptab/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-prometheus@v1.2.0/server_metrics.go:107
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/Users/ptab/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.newUnaryInterceptor.func1
	/etcd/server/etcdserver/api/v3rpc/interceptor.go:74
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/Users/ptab/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
go.etcd.io/etcd/server/v3/etcdserver/api/v3rpc.newLogUnaryInterceptor.func1
	/etcd/server/etcdserver/api/v3rpc/interceptor.go:81
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1
	/Users/ptab/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:25
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1
	/Users/ptab/go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.3.0/chain.go:34
go.etcd.io/etcd/api/v3/etcdserverpb._Auth_Authenticate_Handler
	/etcd/api/etcdserverpb/rpc.pb.go:8066
google.golang.org/grpc.(*Server).processUnaryRPC
	/Users/ptab/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1279
google.golang.org/grpc.(*Server).handleStream
	/Users/ptab/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:1608
google.golang.org/grpc.(*Server).serveStreams.func1.2
	/Users/ptab/go/pkg/mod/google.golang.org/grpc@v1.41.0/server.go:923

@ahrtr ahrtr force-pushed the data_corruption branch 2 times, most recently from be02248 to 45e2ad0 Compare April 1, 2022 22:21
@ahrtr
Member Author

ahrtr commented Apr 1, 2022

Thanks @ptabor .

In your first stack trace, we do need to call Lock(). It's coming from the apply workflow, so we need to move the consistent_index forward although it's read-only.

In your second stack trace coming from outside apply, yes. It's a good catch.

I just enhanced authBackend to support a read-only ReadTx. Please note that some read-only operations may also need to move the consistent_index forward if they come from the apply workflow. But GetUser is a corner case because it's called from a couple of places; some calls come from the apply workflow, while others do not. So I addressed it separately.
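
The idea of serving GetUser through a read-only transaction can be sketched like this. It is an illustration only: `store`, `lockWithHook`, `putUser`, and `getUser` are hypothetical names, and a `sync.RWMutex` plus a counter stand in for the real read transaction and for the hook.

```go
package main

import (
	"fmt"
	"sync"
)

// store is a toy auth backend guarded by a read/write mutex.
type store struct {
	mu        sync.RWMutex
	users     map[string]string
	hookCalls int // how many times the post-lock hook has run
}

func newStore() *store {
	return &store{users: map[string]string{}}
}

// lockWithHook models the write lock taken by the apply workflow: the
// hook runs while the lock is held.
func (s *store) lockWithHook() {
	s.mu.Lock()
	s.hookCalls++
}

// putUser is a write that belongs to the apply workflow.
func (s *store) putUser(name, role string) {
	s.lockWithHook()
	defer s.mu.Unlock()
	s.users[name] = role
}

// getUser only reads, so it takes the read lock and never runs the hook,
// no matter whether the caller is inside or outside the apply workflow.
func (s *store) getUser(name string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	role, ok := s.users[name]
	return role, ok
}

func main() {
	s := newStore()
	s.putUser("alice", "root")
	role, ok := s.getUser("alice")
	fmt.Println(role, ok, "hook calls:", s.hookCalls)
}
```

With the read path separated this way, a function that is reached both from apply and from gRPC interceptors no longer has to pick between the two write-lock variants.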

I rebased this PR, so all three commits are submitted again, but you only need to look at the last one to check the latest change I made.

@ahrtr ahrtr force-pushed the data_corruption branch 7 times, most recently from d6d7c53 to c698e13 Compare April 7, 2022 02:22
Contributor

@ptabor ptabor left a comment

Thank you so much !!!
Great work.

server/etcdserver/cindex/cindex.go Outdated
server/storage/backend/backend.go Outdated
@ptabor ptabor added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 7, 2022
…ackage

Removed the fields consistentIdx and consistentTerm from struct EtcdServer,
and added applyingIndex and applyingTerm into struct consistentIndex in
package cindex. We may remove the two fields completely if we decide to
remove the OnPreCommitUnsafe, and it will depend on the performance test
result.
// and raftpb.Entry.Term, and they are not ready to be persisted yet. They will be
// saved to consistentIndex and term above in the txPostLockInsideApplyHook.
//
// TODO(ahrtr): try to remove the OnPreCommitUnsafe, and compare the
Member

Can we do the performance check before merging this PR? I would prefer not to backport a PR with todo. Please let me know if you need help with measuring it.

Member Author

@ahrtr ahrtr Apr 7, 2022

I think we are all good so far. As a next step, I will investigate whether we should completely remove OnPreCommitUnsafe, in the main branch only. That change (possibly removing OnPreCommitUnsafe) will not be cherry-picked to 3.5.

We can use the tools/benchmark, correct?

Member

I wanted to backport this change because it makes the code much simpler, which will be important for long-term maintenance of the release-3.5 branch. I just want to make sure that the decision not to fix it immediately is not rushed; let's not sacrifice quality. However, if you think it's not worth backporting, that's also OK.

Member Author

Your comment is valid, but usually we only backport bug fixes to previous releases, and this PR is one. Completely removing OnPreCommitUnsafe, however, would be a refactor, so I don't think we should backport it to 3.5. Please note that refactoring may also introduce regressions, so we should only do it in the main branch.

Contributor

I think this PR is good to go.

@serathius
Member

Can you squash the commits?

@ahrtr
Member Author

ahrtr commented Apr 7, 2022

Can you squash the commits?

I intentionally added multiple commits; each has a clear goal/comment. I can merge the first two commits. WDYT?

@serathius
Member

@ahrtr can you handle backport to v3.5?

@ahrtr
Member Author

ahrtr commented Apr 7, 2022

@ahrtr can you handle backport to v3.5?

Definitely yes, will do it today.

Many thanks to @ptabor and @serathius for all the help and valuable review comments.

@ahrtr
Member Author

ahrtr commented Apr 7, 2022

Also thanks to @deads2k @serathius and @michaljasionowski for verifying the issue against this PR!

@ahrtr
Member Author

ahrtr commented Apr 9, 2022

FYI. In case anyone needs a simple summary

Labels
priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
Development

Successfully merging this pull request may close these issues.

Inconsistent revision and data occurs
7 participants