
cluster/config_manager: fix double apply of configs #13820

Draft · wants to merge 15 commits into base: dev from fix/config_issues

Conversation

@andijcr (Contributor) commented Sep 29, 2023

Previously, to support validation, the code would apply a config update locally and then apply it again on every shard via smp::invoke_on_all, which double-applied it on the current shard.

This PR preserves that behavior but uses smp::invoke_on_others to eliminate the double apply on the current shard.

To support this, the lambdas need to be nothrow copy/move constructible, so the parameters are taken by pointer; this is safe because they live in the parent scope.

Fixes #13362
Fixes #13787

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

  • none

@andijcr force-pushed the fix/config_issues branch from 5b82e4c to e8d6bd3 on October 2, 2023

@andijcr (Contributor, Author) commented Oct 4, 2023

Note: I've been following admin_server::patch_cluster_config_handler, and there are some issues there as well. It seems a configuration value becomes visible on the shard before it has been propagated, which conflicts with the purpose of API validation.

@andijcr (Contributor, Author) commented Oct 6, 2023

Found the manifestation of the alias not persisting after a reboot.

In the excerpt below, after a restart, the node rejoins and applies a snapshot of configs.

In the log, delete_retention_ms has the new value, but just after it another entry sets log_retention_ms to the old value, and since it comes later, it overwrites the new value.

The name log_retention_ms appears even though the test only applied the old name. This is because the new name is used when dumping the properties, and somehow this dump was merged with a dump where the old names appear (and the old name actually carries the new value!).

This means these values need to be normalized. In the context of partial upgrades plus rollback, this might cause the original cluster to lose some properties, so a test needs to be crafted for this scenario.

INFO  2023-10-06 10:11:43,633 seastar - Reactor backend: linux-aio
INFO  2023-10-06 10:11:43,641 [shard 0:main] seastar - Created fair group io-queue-0 for 2 queues, capacity rate 2147483:2147483, limit 12582912, rate 16777216 (factor 1), threshold 2000, per tick grab 6291456
INFO  2023-10-06 10:11:43,641 [shard 0:main] seastar - IO queue uses 0.75ms latency goal for device 0
INFO  2023-10-06 10:11:43,641 [shard 0:main] seastar - Created io group dev(0), length limit 4194304:4194304, rate 2147483647:2147483647
INFO  2023-10-06 10:11:43,641 [shard 0:main] seastar - Created io queue dev(0) capacities: 512:2000:2000 1024:3000:3000 2048:5000:5000 4096:9000:9000 8192:17000:17000 16384:33000:33000 32768:65000:65000 65536:129000:129000 131072:257000:257000
INFO  2023-10-06 10:11:43,642 [shard 0:main] main - application.cc:351 - Redpanda v23.3.0-dev-766-gb7ba0a8c37 - b7ba0a8c379e1c00095174499d3c43fcfe6158f8-dirty
INFO  2023-10-06 10:11:43,642 [shard 0:main] main - application.cc:359 - kernel=6.2.0-34-generic, nodename=057835480b2c, machine=x86_64
INFO  2023-10-06 10:11:43,642 [shard 0:main] main - application.cc:287 - System resources: { cpus: 2, available memory: 2.000GiB, reserved memory: 0.000bytes}
INFO  2023-10-06 10:11:43,642 [shard 0:main] main - application.cc:295 - File handle limit: 65535/65535
INFO  2023-10-06 10:11:43,645 [shard 0:main] cluster - config_manager.cc:313 - Loaded property cluster_id=f826043f-afed-4074-87cd-27363ab446c8 from local cache
INFO  2023-10-06 10:11:43,645 [shard 0:main] cluster - config_manager.cc:313 - Loaded property default_topic_partitions=4 from local cache
INFO  2023-10-06 10:11:43,645 [shard 0:main] cluster - config_manager.cc:313 - Loaded property delete_retention_ms=5678 from local cache
INFO  2023-10-06 10:11:43,646 [shard 0:main] cluster - config_manager.cc:313 - Loaded property enable_metrics_reporter=false from local cache
INFO  2023-10-06 10:11:43,646 [shard 0:main] cluster - config_manager.cc:313 - Loaded property join_retry_timeout_ms=200 from local cache
INFO  2023-10-06 10:11:43,646 [shard 0:main] cluster - config_manager.cc:313 - Loaded property kafka_connections_max=2048 from local cache
INFO  2023-10-06 10:11:43,646 [shard 0:main] cluster - config_manager.cc:313 - Loaded property kafka_connections_max_overrides=[1.2.3.4:5] from local cache
INFO  2023-10-06 10:11:43,646 [shard 0:main] cluster - config_manager.cc:313 - Loaded property kafka_connections_max_per_ip=1024 from local cache
INFO  2023-10-06 10:11:43,646 [shard 0:main] cluster - config_manager.cc:313 - Loaded property log_retention_ms=1234 from local cache
INFO  2023-10-06 10:11:43,646 [shard 0:main] cluster - config_manager.cc:313 - Loaded property log_segment_size_jitter_percent=0 from local cache
INFO  2023-10-06 10:11:43,646 [shard 0:main] cluster - config_manager.cc:313 - Loaded property superusers=[admin] from local cache

@piyushredpanda (Contributor) commented:
@andijcr : Is this PR still valid? What is next here?
