Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: optimize auth/etcdserver logs to facilitate troubleshooting data inconsistency #11670

Merged
merged 3 commits into from
Mar 15, 2020

Conversation

tangcong
Copy link
Contributor

@tangcong tangcong commented Mar 3, 2020

There are three minor changes:

  1. add warning log for troubleshooting when auth revision is inconsistent(current,no log is printed when node failed to apply command because of inconsistent auth revision).
    for example.
    17:04:56 etcd1 | {"level":"warn","ts":"2020-03-03T17:04:56.173+0800","caller":"auth/store.go:856","msg":"request auth revision is less than current node auth revision","auth store revision":23,"request auth revision":19,"request key":"hello","error":"auth: revision in header is old"}

  2. no need to save consistentIndex in NewAuthStore(no command is executed) and it will cause the error log to be printed when starting an empty etcd cluster.

  3. print warning log when failed to apply request.

auth/store.go Outdated Show resolved Hide resolved
@codecov-io
Copy link

codecov-io commented Mar 3, 2020

Codecov Report

Merging #11670 into master will decrease coverage by 0.81%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #11670      +/-   ##
==========================================
- Coverage   66.65%   65.83%   -0.82%     
==========================================
  Files         402      402              
  Lines       36655    36673      +18     
==========================================
- Hits        24432    24145     -287     
- Misses      10734    11020     +286     
- Partials     1489     1508      +19
Impacted Files Coverage Δ
etcdserver/util.go 98.8% <100%> (+0.19%) ⬆️
etcdserver/apply.go 79.33% <100%> (+0.03%) ⬆️
auth/store.go 59.41% <14.28%> (-20.37%) ⬇️
clientv3/txn.go 77.77% <0%> (-22.23%) ⬇️
client/members.go 65.32% <0%> (-20.17%) ⬇️
etcdserver/api/v3rpc/util.go 51.61% <0%> (-16.13%) ⬇️
client/client.go 50% <0%> (-15.04%) ⬇️
auth/jwt.go 69.66% <0%> (-7.87%) ⬇️
pkg/tlsutil/tlsutil.go 86.2% <0%> (-6.9%) ⬇️
... and 25 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cb63341...526add1. Read the comment docs.

Copy link
Contributor

@mitake mitake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content LGTM. The commit logs should be auth: .... etcd's commit log format is <package name>: <a short description of the commit>.

@tangcong tangcong closed this Mar 3, 2020
@tangcong tangcong reopened this Mar 3, 2020
@tangcong tangcong changed the title auth/store: optimize auth store log auth: optimize auth store log Mar 3, 2020
@tangcong tangcong changed the title auth: optimize auth store log *: optimize auth/etcdserver logs to facilitate troubleshooting data inconsistency problem Mar 10, 2020
@tangcong tangcong changed the title *: optimize auth/etcdserver logs to facilitate troubleshooting data inconsistency problem *: optimize auth/etcdserver logs to facilitate troubleshooting data inconsistency Mar 10, 2020
@tangcong
Copy link
Contributor Author

tangcong commented Mar 10, 2020

we encounter another serious data inconsistency issue and can make sure that it has nothing to do with auth revision. we are striving to reproduce it but there is no valid information in etcd log, so it is better to print warning log when failed to apply request. i have added it in this pr.
@jingyih @mitake PTAL. thanks.

@@ -132,6 +132,9 @@ func (a *applierV3backend) Apply(r *pb.InternalRaftRequest) *applyResult {
ar := &applyResult{}
defer func(start time.Time) {
warnOfExpensiveRequest(a.s.getLogger(), start, &pb.InternalRaftStringer{Request: r}, ar.resp, ar.err)
if ar.err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering how much does this affect server throughput. Maybe use debug level to print?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logging overhead will be introduced only for failed requests, so I feel adding the warning log would help troubleshooting.

Copy link
Contributor Author

@tangcong tangcong Mar 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes ,i agree with @mitake. if there are too much failed log,something may be wrong with the cluster and users can also adjust the log leve to error. @jingyih

@jingyih
Copy link
Contributor

jingyih commented Mar 10, 2020

we encounter another serious data inconsistency issue and can make sure that it has nothing to do with auth revision. we are striving to reproduce it but there is no valid information in etcd log, so it is better to print warning log when failed to apply request. i have added it in this pr.
@jingyih @mitake PTAL. thanks.

FYI, there is a known data inconsistency bug which is fixed by #11613.

auth/store.go Show resolved Hide resolved
@tangcong
Copy link
Contributor Author

tangcong commented Mar 10, 2020

we encounter another serious data inconsistency issue and can make sure that it has nothing to do with auth revision. we are striving to reproduce it but there is no valid information in etcd log, so it is better to print warning log when failed to apply request. i have added it in this pr.
@jingyih @mitake PTAL. thanks.

FYI, there is a known data inconsistency bug which is fixed by #11613.

We have excluded this bug and have reproduced it, and we are investigating further.

@tangcong
Copy link
Contributor Author

tangcong commented Mar 11, 2020

#11689 @jingyih

we encounter another serious data inconsistency issue and can make sure that it has nothing to do with auth revision. we are striving to reproduce it but there is no valid information in etcd log, so it is better to print warning log when failed to apply request. i have added it in this pr.
@jingyih @mitake PTAL. thanks.

FYI, there is a known data inconsistency bug which is fixed by #11613.

Copy link
Contributor

@mitake mitake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, defer to @jingyih

Copy link
Contributor

@jingyih jingyih left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

tangcong added a commit to tangcong/etcd that referenced this pull request Apr 5, 2020
jingyih added a commit that referenced this pull request Apr 10, 2020
jingyih added a commit that referenced this pull request Apr 10, 2020
jingyih added a commit that referenced this pull request Apr 10, 2020
@tangcong tangcong deleted the optimize-auth-store-log branch February 26, 2021 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants