Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rm stm state cleanup #18277

Merged
merged 6 commits into from
May 14, 2024
Merged

Conversation

bharathv
Copy link
Contributor

@bharathv bharathv commented May 7, 2024

Main changes in this PR

  • Makes the state machine rm_stm lean by moving all the types to a separate header/cc
  • Moves all transaction related state to cluster::tx namespace from cluster
  • Removes snapshot v3 related dead code.

The change is purely mechanical. This preps the code for the next PR that moves even more state out of the state machine. The end goal is to make the state machine lean and easy to reason about.

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.1.x
  • v23.3.x
  • v23.2.x

Release Notes

  • none

@bharathv
Copy link
Contributor Author

bharathv commented May 7, 2024

/dt

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented May 7, 2024

new failures in https://buildkite.com/redpanda/redpanda/builds/48762#018f50fd-d113-4dd7-8ff3-f4ce242ba3b7:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48762#018f50fd-d115-4dd2-8caf-73486656d256:

"rptest.tests.controller_snapshot_test.ControllerSnapshotPolicyTest.test_upgrade_auto_enable"

new failures in https://buildkite.com/redpanda/redpanda/builds/48762#018f50fd-d10f-4469-a349-bb63573fb6ef:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/48762#018f5105-93e0-4313-ac3e-84f231967828:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.S3"

new failures in https://buildkite.com/redpanda/redpanda/builds/48762#018f5105-93db-4385-b29f-250fa99d2215:

"rptest.tests.controller_snapshot_test.ControllerSnapshotPolicyTest.test_upgrade_auto_enable"

new failures in https://buildkite.com/redpanda/redpanda/builds/48762#018f5105-93de-4c27-bc31-6b015249222d:

"rptest.tests.e2e_shadow_indexing_test.EndToEndThrottlingTest.test_throttling.cloud_storage_type=CloudStorageType.ABS"

new failures in https://buildkite.com/redpanda/redpanda/builds/49048#018f7458-6b5a-4989-9551-f49a77ada713:

"rptest.tests.cloud_storage_scrubber_test.CloudStorageScrubberTest.test_scrubber.cloud_storage_type=CloudStorageType.ABS"

@bharathv bharathv force-pushed the rm_stm_type_cleanup branch from efe0525 to dd9a388 Compare May 9, 2024 18:43
@bharathv
Copy link
Contributor Author

bharathv commented May 9, 2024

/dt

@bharathv bharathv changed the title Rm stm type cleanup rm stm state cleanup May 9, 2024
@bharathv bharathv marked this pull request as ready for review May 9, 2024 22:39
@@ -99,7 +99,8 @@ inline const std::unordered_set<std::string_view> retired_features = {
"mtls_authentication",
"rm_stm_kafka_cache",
"transaction_ga",
};
"idempotency_v2",
"transaction_partitioning"};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how do you make sure the feature flag is not in use anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://redpandadata.atlassian.net/wiki/spaces/CORE/pages/170491930/Features+Cluster+Logical+Versioning is relevant. Once the cluster is fully upgraded, the feature is always true and this coupled with no rolling back requirement across minors guarantees it is always true.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so how do you know all clusters are upgraded so that the feature is always true on all of them?

model::tx_seq tx_seq,
std::chrono::milliseconds transaction_timeout_ms,
model::partition_id tm) {
if (is_transaction_partitioning()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it looks like the last use of function is_transaction_partitioning, should we remove it?


return std::move(builder).build();
}

SEASTAR_THREAD_TEST_CASE(fence_batch_compatibility) {
vlog(logger.info, "Test fence_batch_v1");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we still need it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can still read old fence batch versions in the log after the upgrade. (eg: if old data is not truncated).

Copy link
Contributor Author

@bharathv bharathv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added an extra commit on top (that was supposed to come in the next PR) since it is mechanical logging change, so thought it'd make sense to lump with this one.

src/v/features/feature_table.h Show resolved Hide resolved

return std::move(builder).build();
}

SEASTAR_THREAD_TEST_CASE(fence_batch_compatibility) {
vlog(logger.info, "Test fence_batch_v1");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can still read old fence batch versions in the log after the upgrade. (eg: if old data is not truncated).

@@ -99,7 +99,8 @@ inline const std::unordered_set<std::string_view> retired_features = {
"mtls_authentication",
"rm_stm_kafka_cache",
"transaction_ga",
};
"idempotency_v2",
"transaction_partitioning"};
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://redpandadata.atlassian.net/wiki/spaces/CORE/pages/170491930/Features+Cluster+Logical+Versioning is relevant. Once the cluster is fully upgraded, the feature is always true and this coupled with no rolling back requirement across minors guarantees it is always true.

These are always true by default going forward.
@bharathv bharathv force-pushed the rm_stm_type_cleanup branch from dd9a388 to 8f04fd1 Compare May 13, 2024 18:37
src/v/cluster/rm_stm_types.h Outdated Show resolved Hide resolved
bharathv added 5 commits May 13, 2024 14:00
snapshot version=5 is the active version since 23.3.x,
Retiring code related to snapshot version=3.
Next change consolidates more types in this header
consolidate under cluster::tx so we donot polluter top level ns.
Break the monolith of rm_stm further and move unnecessarly logic
out of it.
@bharathv bharathv force-pushed the rm_stm_type_cleanup branch from 8f04fd1 to 63bc2c0 Compare May 13, 2024 22:33
}
};

struct transaction_info {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: now that this is in tx namespace, the name transaction_info has additional gravity :) Can we modify the name to hint that this is about a single partition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ya, this is changing in the next PR, trying to cleanup a lot of stuff in this area.

@piyushredpanda piyushredpanda merged commit d76e9a6 into redpanda-data:dev May 14, 2024
14 of 17 checks passed
@bharathv bharathv deleted the rm_stm_type_cleanup branch May 14, 2024 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants