Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

alias property names do not work correctly while bootstrapping from redpanda.yaml #13362

Closed
andijcr opened this issue Sep 11, 2023 · 1 comment · Fixed by #15725 · May be fixed by #13820
Closed

alias property names do not work correctly while bootstrapping from redpanda.yaml #13362

andijcr opened this issue Sep 11, 2023 · 1 comment · Fixed by #15725 · May be fixed by #13820
Assignees
Labels
area/cloud-storage Shadow indexing subsystem kind/bug Something isn't working sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages

Comments

@andijcr
Copy link
Contributor

andijcr commented Sep 11, 2023

Version & Environment

Redpanda version: > v23.2 (when property aliases were introduced):

What went wrong?

Aliases for properties can be used in a running redpanda cluster instead of the property name.
This feature is meant to evolve property names while allowing some backward compatibility.

edit: this should be fixed with #15605

Properties can also be set via repdanda.yaml while bootstrapping in a new node; this is for legacy reasons and to allow some disaster recovery.

Properties aliases don't work in this context: this is due to this incorrect block of code https://github.com/redpanda-data/redpanda/blob/1c3069e06f5e79e9c1e3b021ae30c21388e26347/src/v/config/config_store.h#L71-L87 the lookup in aliases is done in the block that will already skip setting the value.~~~

Note that only fixing this is not enough: with delete_retention_ms -> log_retention_ms alias,

def test_upgrade_redpanda_yaml(self):
"""
Verify that on first start, values are imported from redpanda.yaml
to facilitate upgrades, but on subsequent startups we just emit
a log message to say we're ignoring them.
"""
node = self.redpanda.nodes[0]
admin = Admin(self.redpanda)
# Since we skip RedpandaService.start, must clean node explicitly
self.redpanda.clean_node(node)
# Start node outside of the usual RedpandaService.start, so that we
# skip writing out bootstrap.yaml files (the presence of which disables
# the upgrade import of values from redpanda.yaml)
self.redpanda.start_node(
node, override_cfg_params={'delete_retention_ms': '9876'})
# On first startup, redpanda should notice the value in
# redpanda.yaml and import it into central config store
assert admin.get_cluster_config()['delete_retention_ms'] == 9876
# On second startup, central config is already initialized,
# so the modified value in redpanda.yaml should be ignored.
self.redpanda.restart_nodes(
[node], override_cfg_params={'delete_retention_ms': '1234'})
assert admin.get_cluster_config()['delete_retention_ms'] == 9876
assert self.redpanda.search_log_any(
"Ignoring value for 'delete_retention_ms'")
this test fails even after applying the above fix. The failure is not in accepting the alias but in the fact that at the second reboot, the property is reset back to an earlier value with a mechanism that is not yet clear. see these two log lines where a property was set via an alias cloud_storage_graceful_transfer_timeout to 1235, but after reboot the value is reset to an earlier value:

TRACE 2023-09-08 09:15:47,340 [shard 0:main] cluster - config_manager.cc:313 - Loaded property cloud_storage_graceful_transfer_timeout=1235 from local cache
TRACE 2023-09-08 09:15:47,342 [shard 0:main] cluster - config_manager.cc:313 - Loaded property cloud_storage_graceful_transfer_timeout_ms=1234 from local cache

What should have happened instead?

alias names for properties should work in redpanda.yaml bootstrapping

@andijcr andijcr added the kind/bug Something isn't working label Sep 11, 2023
@andijcr andijcr self-assigned this Sep 11, 2023
@andijcr andijcr added area/cloud-storage Shadow indexing subsystem sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages labels Sep 11, 2023
@andijcr
Copy link
Contributor Author

andijcr commented Sep 11, 2023

sev/low since the use is niche - disaster recovery by a human operator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/cloud-storage Shadow indexing subsystem kind/bug Something isn't working sev/low Bugs which are non-functional paper cuts, e.g. typos, issues in log messages
Projects
None yet
1 participant