Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor/testing domain update callback #6365

Conversation

davidporter-id-au
Copy link
Member

@davidporter-id-au davidporter-id-au commented Oct 16, 2024

What changed?

This predominantly adds some unit test coverage for the domain callback function triggered during domain updates and failovers. It also does some refactoring to enable such test coverage and fixes a few associated minor issues.

Refactoring

  • pulled the domain change callback into it's own method
  • pulled graceful failover checks into their own method
  • pulled locking/unlocking into its own method

failover documentation

The sheer number of domain notification and version fields is quite confusing and requires a lot of careful understanding to even understand how they interact. The tests setup use some actual data examples and lay out a snapshot from the database of each of the scenarios before encoding that into assertions, along with some brief explanation of how notification versions and failover versions work.

Misc other changes:

  • removes not-used cross-cluster config (which is easily confused with the XDC config and I wasted a bunch of time editing dead config).
  • Adds the necessary shard-manager config to the XDC config to enable it to work
  • fixes noisy log for failovers complaining about nongraceful failover (-1) type failovers.

Why?
This is intended to improve the reliability and coverage.

How did you test it?
Tested locally with three clusters talking to one another.

Potential risks
Assuming a problem here, this could break domain failover and possibly replication

Release notes

Documentation Changes

@@ -53,107 +53,121 @@ func (e *historyEngineImpl) registerDomainFailoverCallback() {
// failover min / max task levels calculated & updated to shard (using shard lock) -> failover start
// above 2 guarantees that failover start is after persistence of the task.

failoverPredicate := func(shardNotificationVersion int64, nextDomain *cache.DomainCacheEntry, action func()) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this large function has been decomposed into a few methods. It's not super easy to read as a diff, but hopefully it's more readable with a degree of decomposition

@@ -77,7 +77,7 @@ func NewParams(serviceName string, config *config.Config, dc *dynamicconfig.Coll
return Params{}, fmt.Errorf("inbound TLS config: %v", err)
}
outboundTLS := map[string]*tls.Config{}
for _, outboundServiceName := range service.List {
for _, outboundServiceName := range service.ShardedServicesList {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was to fix a startup issue where the services were not being configured correctly on start. It might be no longer necessary, I might remove it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you move it to a separate PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, good call

@davidporter-id-au davidporter-id-au changed the title Feature/adding test refactor/testing domain update callback Oct 16, 2024
@davidporter-id-au davidporter-id-au marked this pull request as ready for review October 16, 2024 02:22
Copy link

codecov bot commented Oct 16, 2024

Codecov Report

Attention: Patch coverage is 79.26829% with 17 lines in your changes missing coverage. Please review.

Project coverage is 74.92%. Comparing base (5270ea8) to head (4117969).

Files with missing lines Patch % Lines
...ne/engineimpl/register_domain_failover_callback.go 76.38% 14 Missing and 3 partials ⚠️
Additional details and impacted files
Files with missing lines Coverage Δ
common/cache/domainCache.go 75.28% <100.00%> (+0.17%) ⬆️
common/rpc/params.go 78.21% <100.00%> (ø)
common/service/name.go 100.00% <ø> (ø)
...ne/engineimpl/register_domain_failover_callback.go 77.92% <76.38%> (+72.03%) ⬆️

... and 9 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5270ea8...4117969. Read the comment docs.

@davidporter-id-au davidporter-id-au merged commit f2452ed into cadence-workflow:master Oct 16, 2024
18 checks passed
@davidporter-id-au davidporter-id-au deleted the feature/adding-test branch October 16, 2024 20:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants