Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: use txn for saving timestamp #6199

Merged
merged 6 commits into from
Mar 22, 2023
Merged

*: use txn for saving timestamp #6199

merged 6 commits into from
Mar 22, 2023

Conversation

rleungx
Copy link
Member

@rleungx rleungx commented Mar 21, 2023

What problem does this PR solve?

Issue Number: Ref #5895.

What is changed and how does it work?

This PR supports using Txn to save the timestamp to prevent TSO from falling back when two places advance at the same time.

Check List

Tests

  • Unit test

Release note

None.

@ti-chi-bot
Copy link
Member

ti-chi-bot commented Mar 21, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • JmPotato
  • lhy1024

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot ti-chi-bot added the release-note-none Denotes a PR that doesn't merit a release note. label Mar 21, 2023
@ti-chi-bot ti-chi-bot added the status/LGT1 Indicates that a PR has LGTM 1. label Mar 21, 2023
@rleungx rleungx removed the request for review from nolouch March 21, 2023 07:08
@rleungx rleungx force-pushed the tso-txn branch 2 times, most recently from a7a59b2 to 087c36d Compare March 21, 2023 07:14
@rleungx rleungx requested a review from JmPotato March 21, 2023 07:20
@ti-chi-bot ti-chi-bot added status/LGT2 Indicates that a PR has LGTM 2. and removed status/LGT1 Indicates that a PR has LGTM 1. labels Mar 21, 2023
@JmPotato
Copy link
Member

/merge

@ti-chi-bot
Copy link
Member

@JmPotato: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: 087c36d

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 21, 2023
Signed-off-by: Ryan Leung <rleungx@gmail.com>
@lhy1024
Copy link
Contributor

lhy1024 commented Mar 21, 2023

ci failed

Signed-off-by: Ryan Leung <rleungx@gmail.com>
@ti-chi-bot ti-chi-bot removed the status/can-merge Indicates a PR has been approved by a committer. label Mar 21, 2023
@rleungx
Copy link
Member Author

rleungx commented Mar 21, 2023

/hold

@ti-chi-bot ti-chi-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 21, 2023
Copy link
Contributor

@binshi-bing binshi-bing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left a few comments.

func (se *StorageEndpoint) SaveTimestamp(prefix string, key string, ts time.Time) error {
return se.RunInTxn(context.Background(), func(txn kv.Txn) error {
prefixEnd := clientv3.GetPrefixRangeEnd(prefix)
keys, values, err := txn.LoadRange(prefix, prefixEnd, 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my understanding is correct, for global location, we use '/pd/[cluster-id]' to do prefix scan, but there are a lot of things, under this path, irrelevant to tso. In order to "“LoadTimestamp will get all time windows of Local/Global TSOs from etcd and return the biggest one.”, shall we use the /pd//timestamp/ for local TSOs?

previousTS := typeutil.ZeroTime
for i, key := range keys {
key := strings.TrimSpace(key)
if !strings.HasSuffix(key, timestampKey) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this condition "!strings.HasSuffix(key, timestampKey)" too loose, considering we're using "/pd/[cluster-id]" to do prefix range scan? Could there be chance that 'timestamp' suffix are added to the path for other use cases?

return nil
}
data := typeutil.Uint64ToBytes(uint64(ts.UnixNano()))
return txn.Save(key, string(data))
Copy link
Contributor

@binshi-bing binshi-bing Mar 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because LoadTimestamp() use the logic to do prefix scan then return the largest timestamp among all timestamps, it seems that we don't need the whole change in this pr -- tso service doesn't need to use the same path as pd's but the same prefix for timestamp range scan, then anyway LoadTimestamp() will return the largest one among the last writes of pd and tso service. Thus we can avoid using the expensive transaction here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this transaction, we scan all the dir and sub-dir of /pd/【cluster-id】 in a transaction, which is too expensive.

Signed-off-by: Ryan Leung <rleungx@gmail.com>
Signed-off-by: Ryan Leung <rleungx@gmail.com>
Copy link
Contributor

@binshi-bing binshi-bing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

return nil
}
data := typeutil.Uint64ToBytes(uint64(ts.UnixNano()))
return txn.Save(key, string(data))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this transaction, we scan all the dir and sub-dir of /pd/【cluster-id】 in a transaction, which is too expensive.

}
data := typeutil.Uint64ToBytes(uint64(ts.UnixNano()))
return txn.Save(key, string(data))
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this pr,even LoadTimestamp is invoked only once after it becomes the leader/primary,when we switch service mode, we need to ensure the it campaigns to be the leader/primary then invoke LoadTimestamp() to load the largest one among the last writes of pd and tso service which use the same prefix but different paths.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

already change to Load which will only get one key.

}

previousTS := typeutil.ZeroTime
if value != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if value != "" {
if len(value) > 0 {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is not much difference.

}
}
if previousTS != typeutil.ZeroTime && typeutil.SubRealTimeByWallClock(ts, previousTS) <= 0 {
log.Warn("save timestamp failed, the timestamp is not bigger than the previous one", zap.Time("previous", previousTS), zap.Time("current", ts))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this log line? It seems to be a normal case.

@codecov
Copy link

codecov bot commented Mar 22, 2023

Codecov Report

Patch coverage: 78.57% and project coverage change: +0.04 🎉

Comparison is base (738e15f) 74.77% compared to head (b4d822a) 74.81%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6199      +/-   ##
==========================================
+ Coverage   74.77%   74.81%   +0.04%     
==========================================
  Files         395      395              
  Lines       38704    38716      +12     
==========================================
+ Hits        28939    28966      +27     
+ Misses       7232     7225       -7     
+ Partials     2533     2525       -8     
Flag Coverage Δ
unittests 74.81% <78.57%> (+0.04%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
pkg/storage/endpoint/tso.go 83.87% <78.57%> (+4.92%) ⬆️

... and 22 files with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

Signed-off-by: Ryan Leung <rleungx@gmail.com>
@rleungx
Copy link
Member Author

rleungx commented Mar 22, 2023

/hold cancel

@ti-chi-bot ti-chi-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 22, 2023
@rleungx
Copy link
Member Author

rleungx commented Mar 22, 2023

/merge

@ti-chi-bot
Copy link
Member

@rleungx: It seems you want to merge this PR, I will help you trigger all the tests:

/run-all-tests

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@ti-chi-bot
Copy link
Member

This pull request has been accepted and is ready to merge.

Commit hash: d491ede

@ti-chi-bot ti-chi-bot added the status/can-merge Indicates a PR has been approved by a committer. label Mar 22, 2023
@ti-chi-bot
Copy link
Member

@rleungx: Your PR was out of date, I have automatically updated it for you.

If the CI test fails, you just re-trigger the test that failed and the bot will merge the PR for you after the CI passes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.

@lhy1024
Copy link
Contributor

lhy1024 commented Mar 22, 2023

ci failed

@rleungx
Copy link
Member Author

rleungx commented Mar 22, 2023

/hold

@ti-chi-bot ti-chi-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 22, 2023
@rleungx
Copy link
Member Author

rleungx commented Mar 22, 2023

/hold cancel

@ti-chi-bot ti-chi-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 22, 2023
@ti-chi-bot ti-chi-bot merged commit 27b9474 into tikv:master Mar 22, 2023
@rleungx rleungx deleted the tso-txn branch March 22, 2023 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-note-none Denotes a PR that doesn't merit a release note. status/can-merge Indicates a PR has been approved by a committer. status/LGT2 Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants