Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

session: make tidb_disable_txn_auto_retry disable retry for the explicit transactions #10339

Merged
merged 2 commits into from
May 5, 2019

Conversation

jackysp
Copy link
Member

@jackysp jackysp commented May 4, 2019

Signed-off-by: Shuaipeng Yu jackysp@gmail.com

What problem does this PR solve?

It is not a good idea to make tidb_disable_txn_auto_retry deal with the write-conflict error only. Some rigorous tests e.g. jepsen will find that this will make the transaction behave abnormally. Although we can set tidb_retry_limit = 0 to disable retry either.

What is changed and how it works?

Make tidb_disable_txn_auto_retry work the same as before, it will disable retry for the explicit transactions, no matter what kind of error.

Check List

Tests

  • Unit test

Code changes

  • Has exported function/method change

Signed-off-by: Shuaipeng Yu <jackysp@gmail.com>
@jackysp jackysp added the type/enhancement The issue or PR belongs to an enhancement. label May 4, 2019
@jackysp jackysp requested review from AndreMouche, disksing and lysu May 4, 2019 11:13
@codecov
Copy link

codecov bot commented May 4, 2019

Codecov Report

Merging #10339 into master will decrease coverage by 0.0339%.
The diff coverage is 100%.

@@               Coverage Diff               @@
##             master     #10339       +/-   ##
===============================================
- Coverage   77.6992%   77.6652%   -0.034%     
===============================================
  Files           411        411               
  Lines         85441      85405       -36     
===============================================
- Hits          66387      66330       -57     
- Misses        14095      14111       +16     
- Partials       4959       4964        +5

@codecov
Copy link

codecov bot commented May 4, 2019

Codecov Report

Merging #10339 into master will decrease coverage by 0.0114%.
The diff coverage is 100%.

@@               Coverage Diff                @@
##             master     #10339        +/-   ##
================================================
- Coverage   77.6767%   77.6652%   -0.0115%     
================================================
  Files           411        411                
  Lines         85440      85405        -35     
================================================
- Hits          66367      66330        -37     
+ Misses        14112      14111         -1     
- Partials       4961       4964         +3

@jackysp
Copy link
Member Author

jackysp commented May 4, 2019

/run-all-tests

@jackysp
Copy link
Member Author

jackysp commented May 4, 2019

/run-unit-test

@siddontang
Copy link
Member

do we need to set tidb_retry_limit to 0 by default later?

@siddontang
Copy link
Member

Please test this in jepsen @overvenus

Copy link
Contributor

@winkyao winkyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use failpoint to add some tests ?

@shenli
Copy link
Member

shenli commented May 4, 2019

@siddontang Yes, I think so. But for autocommit txn, we could do the retry.

@shenli
Copy link
Member

shenli commented May 4, 2019

Should we merge this PR after #10266 ? I think this PR will make user's query easier to fail.

@shenli shenli added the priority/P1 The issue has P1 priority. label May 4, 2019
@winkyao
Copy link
Contributor

winkyao commented May 4, 2019

@jackysp
Copy link
Member Author

jackysp commented May 4, 2019

Could you use failpoint to add some tests?

I think it is not necessary, this pr actually reverts something to it used to be.

Is this PR duplicated with https://github.com/pingcap/tidb/pull/10266/files#diff-928cb614ae092ed4c6f349f4c4c991e4R408?

It is part of #10266, I think it is better to split it out of that pr.

@winkyao

@jackysp
Copy link
Member Author

jackysp commented May 4, 2019

Should we merge this PR after #10266 ? I think this PR will make user's query easier to fail.

I think no need to do it. tidb_disable_txn_auto_retry is not suggested to our users to use it. It works for write conflict recently, it didn't retry all errors for a long time. @shenli

@jackysp
Copy link
Member Author

jackysp commented May 4, 2019

do we need to set tidb_retry_limit to 0 by default later?

I think we could set tidb_disable_txn_auto_retry to 1 by default later. Then the auto-commit transactions could retry. @siddontang

@siddontang
Copy link
Member

Then the auto-commit transactions could retry

@jackysp so we still need to improve other retires like PD crashed later?

@jackysp
Copy link
Member Author

jackysp commented May 5, 2019

Then the auto-commit transactions could retry

@jackysp so we still need to improve other retires like PD crashed later?

We could improve the backoff in TiDB.

@disksing
Copy link
Contributor

disksing commented May 5, 2019

LGTM

Copy link
Contributor

@winkyao winkyao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@lysu lysu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jackysp jackysp deleted the revert_retry branch May 29, 2019 07:36
@liuhuanHappyStudy
Copy link

I am a little confused, how can I solve the write conflict? Can I still do this on the official website?

9.1.1 Error 8005 (HY000): Write conflict, txnStartTS is outdated
You can check if the tidb_disable_txn_auto_retry is on. If so, set it to off; if it is already off, increase the tidb_retry_limit until the error no longer occurs.
@jackysp

@jackysp
Copy link
Member Author

jackysp commented Oct 15, 2019

Yes, you could follow our documents on the official website. Or you could catch the error code and retry the transaction in the application logic. @liuhuanHappyStudy

@liuhuanHappyStudy
Copy link

Thank you for your reply!
If we use the method recommended by the official website, can we guarantee data consistency? I am still a little confused because I saw some people say that it will lead to data inconsistency. Also, I saw that in your reply above: Tidb_disable_txn_auto_retry is not suggested to our users to use it. It works for write conflict recently, it didn't retry all errors for a long time. For some financial companies, they are very sensitive to data. For such enterprises, what advice do you have for solving the problem of writing conflicts? Thank you very much!
@jackysp

@jackysp
Copy link
Member Author

jackysp commented Oct 16, 2019

I think the data consistency here refers to the problem that the isolation level is broken. Indeed, retrying brings this problem, the same as the Read Committed isolation level, it has the same problem as retrying. If the application is written using the Read Committed isolation level, there should be no problems. Because the application has handled this problem. I think many financial scenarios use the Read Committed isolation level because they like to use Oracle, the default level of Oracle is Read Committed. However, it is recommended to disable retry to capture the error code yourself to avoid this problem. @liuhuanHappyStudy

@liuhuanHappyStudy
Copy link

Thank you very much for your advice!
@jackysp

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
priority/P1 The issue has P1 priority. type/enhancement The issue or PR belongs to an enhancement.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants