tx/rm_stm: deal with duplicate begin_tx requests #18187

Merged
merged 1 commit into from
May 2, 2024

bharathv
Contributor

Chaos logs showed that there can be duplicate (back to back) begin_tx requests due to certain client quirks beyond our control. We had some checks in place to ignore them but they were invalidated by #18076 which redefined the transaction boundaries. After #18076 an entry in the ongoing_map does not mean the data has been replicated as a part of the transaction. This PR adjusts the checks accordingly.

Note: This code is going away soon, and we will have a better API in producer_state that makes the code easier to understand. This PR is to make chaos happy until then.
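
To picture the distinction described above, here is a minimal sketch, not the actual rm_stm code. It assumes a hypothetical `toy_stm` whose ongoing map records a begin marker separately from whether any data batch has been replicated. All names (`toy_stm`, `ongoing_entry`, `last_data`, `begin_result`) are invented for illustration, and rejecting a begin once data exists is an assumption of the sketch rather than confirmed behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical stand-ins; none of these are Redpanda types.
using producer_id = std::int64_t;
using offset = std::int64_t;

enum class begin_result { started, duplicate_ignored, rejected };

struct ongoing_entry {
    offset begin_offset;              // where the begin marker landed
    std::optional<offset> last_data;  // set once a data batch is replicated
};

class toy_stm {
public:
    begin_result begin_tx(producer_id pid, offset at) {
        auto it = _ongoing.find(pid);
        if (it == _ongoing.end()) {
            _ongoing.emplace(pid, ongoing_entry{at, std::nullopt});
            return begin_result::started;
        }
        // An entry alone only proves that begin_tx ran. Whether the
        // transaction already holds data is tracked separately.
        if (!it->second.last_data.has_value()) {
            return begin_result::duplicate_ignored;
        }
        // Assumption of this sketch: a begin after data is not a duplicate.
        return begin_result::rejected;
    }

    void replicate_data(producer_id pid, offset at) {
        auto it = _ongoing.find(pid);
        assert(it != _ongoing.end() && "data batch without begin_tx");
        it->second.last_data = at;
    }

private:
    std::unordered_map<producer_id, ongoing_entry> _ongoing;
};
```

In this model a second back-to-back `begin_tx` for the same producer returns `duplicate_ignored`, while the same call after `replicate_data` returns `rejected`.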

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@vbotbuildovich
Collaborator

vbotbuildovich commented May 1, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/48540#018f3196-772a-4c10-8f40-4dc1748d061b:

"rptest.tests.read_replica_e2e_test.ReadReplicasUpgradeTest.test_upgrades.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48540#018f3196-772e-4b6d-9378-c928356f7ec3:

"rptest.tests.upgrade_test.UpgradeFromPriorFeatureVersionCloudStorageTest.test_rolling_upgrade.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48540#018f319f-1a85-488f-91b2-cb1e3bd69456:

"rptest.tests.read_replica_e2e_test.ReadReplicasUpgradeTest.test_upgrades.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_cross_shard.num_to_upgrade=2.cloud_storage_type=CloudStorageType.S3"
"rptest.tests.partition_movement_test.SIPartitionMovementTest.test_shadow_indexing.num_to_upgrade=2.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48540#018f319f-1a8e-4294-9e0b-78a68b967939:

"rptest.tests.upgrade_test.UpgradeFromPriorFeatureVersionCloudStorageTest.test_rolling_upgrade.cloud_storage_type=CloudStorageType.S3"

@bharathv bharathv requested review from ztlpn and mmaslankaprv May 1, 2024 03:08
@mmaslankaprv mmaslankaprv requested a review from bashtanov May 2, 2024 07:12
@mmaslankaprv
Member

Is the duplicated begin_tx request identical to the first one, i.e. same sequence number, epoch, etc.?

@bharathv
Contributor Author

bharathv commented May 2, 2024

Is the duplicated begin_tx request identical to the first one, i.e. same sequence number, epoch, etc.?

Yes, the check right above the changed lines of code checks for it:

    auto txseq_it = _log_state.current_txes.find(pid);
    if (txseq_it != _log_state.current_txes.end()) {
        if (txseq_it->second.tx_seq != tx_seq) {
            vlog(
              _ctx_log.warn,
              "can't begin a tx {} with tx_seq {}: a producer id is already "
              "involved in a tx with tx_seq {}",
              pid,
              tx_seq,
              txseq_it->second.tx_seq);
            co_return tx_errc::unknown_server_error;
        }
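
The three outcomes implied by the excerpt above can be spelled out as a standalone sketch. This is a simplification under stated assumptions: the map value is a bare sequence number instead of a struct with a `tx_seq` field, and `classify_begin`, `begin_check`, and the non-negative sequence precondition are hypothetical, not part of rm_stm.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical simplifications of the real identifiers.
using producer_id = std::int64_t;
using tx_seq_t = std::int64_t;

enum class begin_check { no_current_tx, same_tx_seq, tx_seq_mismatch };

// Classifies a begin_tx request against the transactions already tracked.
inline begin_check classify_begin(
  const std::unordered_map<producer_id, tx_seq_t>& current_txes,
  producer_id pid,
  tx_seq_t tx_seq) {
    // Assumption of this sketch: sequence numbers are non-negative.
    assert(tx_seq >= 0);
    auto it = current_txes.find(pid);
    if (it == current_txes.end()) {
        return begin_check::no_current_tx;  // first begin for this producer
    }
    if (it->second != tx_seq) {
        // Corresponds to the warning-and-error branch in the excerpt.
        return begin_check::tx_seq_mismatch;
    }
    // Same producer and same tx_seq: a candidate duplicate begin.
    return begin_check::same_tx_seq;
}
```

Only the `same_tx_seq` case reaches the duplicate handling that this PR adjusts; a mismatch is rejected before that point.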

@bharathv bharathv requested a review from mmaslankaprv May 2, 2024 16:22
@bharathv
Contributor Author

bharathv commented May 2, 2024

/ci-repeat 1

@piyushredpanda
Contributor

Failures are all known and fixed by #18199

@piyushredpanda piyushredpanda merged commit ebdbc4d into redpanda-data:dev May 2, 2024
12 of 18 checks passed