state: ensure that identical manual virtual IP updates result in not bumping the modify indexes #21909
base: main
Conversation
@@ -1106,6 +1108,9 @@ func (s *Store) AssignManualServiceVIPs(idx uint64, psn structs.PeeredServiceNam
	for _, ip := range ips {
		assignedIPs[ip] = struct{}{}
	}

	txnNeedsCommit := false
I don't think this is practically an issue, but I did notice that the logic was:
begin txn
maybe write
maybe early return
write
commit
and with this change I fixed it to:
begin txn
maybe write
maybe write
maybe commit
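As a rough sketch of that shape (using go-memdb directly here; the real AssignManualServiceVIPs goes through the state store's wrapped transaction types, so this is illustrative only), the commit becomes conditional on an actual write having happened:

```go
package main

import memdb "github.com/hashicorp/go-memdb"

// assignSketch illustrates the "maybe write, maybe write, maybe commit" shape
// described above. needsA/needsB stand in for the two conditional writes in
// AssignManualServiceVIPs; this is not the production code.
func assignSketch(db *memdb.MemDB, needsA, needsB bool) {
	tx := db.Txn(true) // begin write txn
	defer tx.Abort()   // Abort is a no-op once Commit has run

	txnNeedsCommit := false

	if needsA {
		// ... first conditional write ...
		txnNeedsCommit = true
	}
	if needsB {
		// ... second conditional write ...
		txnNeedsCommit = true
	}

	// Only commit when something was actually written, so a pure no-op call
	// never bumps indexes or wakes watchers.
	if txnNeedsCommit {
		tx.Commit()
	}
}
```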
	newEntry.ManualIPs = filteredIPs
	newEntry.ModifyIndex = idx
	if err := tx.Insert(tableServiceVirtualIPs, newEntry); err != nil {
		return false, nil, fmt.Errorf("failed inserting service virtual IP entry: %s", err)
	}
	modifiedEntries[newEntry.Service] = struct{}{}

	if err := updateVirtualIPMaxIndexes(tx, idx, thisServiceName.PartitionOrDefault(), thisPeer); err != nil {
Previously we were not updating the max index table for the entries that had VIPs stolen from them.
@@ -1130,13 +1141,20 @@ func (s *Store) AssignManualServiceVIPs(idx uint64, psn structs.PeeredServiceNam
			filteredIPs = append(filteredIPs, existingIP)
		}
	}
	sort.Strings(filteredIPs)
Previously we were storing VIPs in whatever order they happened to be in. It seemed silly not to sort them.
agent/consul/state/catalog.go (outdated diff)
	if err := updateVirtualIPMaxIndexes(tx, idx, psn.ServiceName.PartitionOrDefault(), psn.Peer); err != nil {
		return false, nil, err
	// Check to see if the slice already contains the same ips.
	if !vipSliceEqualsMapKeys(newEntry.ManualIPs, assignedIPs) {
This is the key part of the fix.
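For reference, a helper with this behavior can be as simple as the following sketch (the actual vipSliceEqualsMapKeys in the PR may be written differently); it assumes the slice carries no duplicates, which holds for stored ManualIPs built from a de-duplicated set:

```go
// vipSliceEqualsMapKeys reports whether slice contains exactly the keys of m,
// ignoring order. Assumes slice has no duplicate entries.
func vipSliceEqualsMapKeys(slice []string, m map[string]struct{}) bool {
	if len(slice) != len(m) {
		return false
	}
	for _, ip := range slice {
		if _, ok := m[ip]; !ok {
			return false
		}
	}
	return true
}
```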
func updateVirtualIPMaxIndexes(txn WriteTxn, idx uint64, partition, peerName string) error {
	// update global max index (for snapshots)
	if err := indexUpdateMaxTxn(txn, idx, tableServiceVirtualIPs); err != nil {
The snapshot logic grabs the max index from this table without peering/partition prefixes, so to make that more correct we update the un-prefixed index here too.
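Sketched as a complete function (WriteTxn, indexUpdateMaxTxn, and tableServiceVirtualIPs come from the surrounding catalog code; partitionedIndexName is a hypothetical stand-in for however the prefixed index entry is actually keyed), the helper bumps both the scoped index and the un-prefixed one:

```go
// Sketch of the helper above, not the exact production code.
func updateVirtualIPMaxIndexes(txn WriteTxn, idx uint64, partition, peerName string) error {
	// Update the partition/peer-scoped max index used by prefixed queries.
	// (partitionedIndexName is a hypothetical placeholder.)
	if err := indexUpdateMaxTxn(txn, idx, partitionedIndexName(tableServiceVirtualIPs, partition, peerName)); err != nil {
		return fmt.Errorf("failed while updating partitioned index: %w", err)
	}
	// Update the global max index (for snapshots), which carries no
	// peering/partition prefix.
	if err := indexUpdateMaxTxn(txn, idx, tableServiceVirtualIPs); err != nil {
		return fmt.Errorf("failed while updating index: %w", err)
	}
	return nil
}
```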
	return lastIndex
}

	testutil.RunStep(t, "assign to nonexistent service is noop", func(t *testing.T) {
New effective start to the test, using the variety of helpers above to hopefully make this clearer to read.
		// No manual IP should be set yet.
		checkManualVIP(t, psn, "0.0.0.1", []string{}, regIndex1)

		checkMaxIndexes(t, regIndex1, 0)
Note that now we actually verify the max index table is correctly updated.
	} else {
		require.Equal(t, expectManual, serviceVIP.ManualIPs)
	}
	require.Equal(t, expectIndex, serviceVIP.ModifyIndex)
All of these tests will verify that the various entries did or did not have their modify index updated when writes occur.
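For context, a checkManualVIP-style helper could look roughly like this sketch. It leans on the ServiceManualVIPs state-store read used elsewhere in this PR; the explicit Store parameter and the IP field access are assumptions about the real helper's shape, which may differ:

```go
// checkManualVIP asserts on both the stored IPs and, crucially for this PR,
// the entry's ModifyIndex. Sketch only; conceptually it lives alongside the
// state package tests.
func checkManualVIP(t *testing.T, s *Store, psn structs.PeeredServiceName,
	expectVIP string, expectManual []string, expectIndex uint64) {
	t.Helper()

	serviceVIP, err := s.ServiceManualVIPs(psn)
	require.NoError(t, err)
	require.NotNil(t, serviceVIP)

	require.Equal(t, expectVIP, serviceVIP.IP.String()) // assumed IP field on ServiceVirtualIP
	require.Equal(t, expectManual, serviceVIP.ManualIPs)
	// The key assertion: a repeated, identical write must not bump this.
	require.Equal(t, expectIndex, serviceVIP.ModifyIndex)
}
```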
		checkMaxIndexes(t, assignIndex4, assignIndex4)
	})

	testutil.RunStep(t, "repeat the last write and no indexes should be bumped", func(t *testing.T) {
This is the test that verifies repeating a write doesn't actually change anything.
	if err != nil {
		return fmt.Errorf("error checking for existing manual ips for service: %w", err)
	}
	if existingIPs != nil && stringslice.EqualMapKeys(existingIPs.ManualIPs, vipMap) {
Here we just return the same positive response that the FSM would have generated in this no-op case without all of the raft expense.
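In other words, the short-circuit has roughly this shape (a fragment-style sketch; the actual reply construction in the endpoint may differ):

```go
// Fragment sketch of the short-circuit described above. existingIPs, vipMap,
// and stringslice.EqualMapKeys come from the diff; buildNoopReply is a
// hypothetical placeholder for constructing the same response the FSM would
// have returned for this no-op write.
if existingIPs != nil && stringslice.EqualMapKeys(existingIPs.ManualIPs, vipMap) {
	*reply = buildNoopReply() // reply as if the write happened; nothing changed
	return nil                // skip the Raft apply entirely
}
```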
	} else {
		if again {
			require.Equal(t, tc.expectAgain, resp)
			require.Equal(t, idx1, idx2, "no raft operations occurred")
This was the cheapest hack I could do to verify the "skip raft" behavior without crazy refactoring of the Server behavior.
		vipMap[ip] = struct{}{}
	}
	// Silently ignore duplicates.
	args.ManualVIPs = maps.Keys(vipMap)

	psn := structs.PeeredServiceName{
		ServiceName: structs.NewServiceName(args.Service, &args.EnterpriseMeta),
	}

	// Check to see if we can skip the raft apply entirely.
	{
		existingIPs, err := m.srv.fsm.State().ServiceManualVIPs(psn)
		if err != nil {
			return fmt.Errorf("error checking for existing manual ips for service: %w", err)
		}
		if existingIPs != nil && stringslice.EqualMapKeys(existingIPs.ManualIPs, vipMap) {
I know we do a similar thing for writing service nodes, but thinking about this, isn't it racy? Another request could be writing this piece of data right after we read it from the state store.
It's safe to do in the FSM because the FSM is single-threaded, but here I'm not sure 🤔
Logically each peered-service-name (PSN) should only be manipulated by one entity at a time externally. In the case of consul-k8s
that is mostly the endpoints controller (EC) workflow. Even if you imagine rearranging the EC to run with more than one instance sharing the work, we'd likely shard it by PSN name, so there wouldn't be two active writers.
Ideally we'd update the EC code to do a read-before-write check like this to avoid a duplicate write, as you'd expect with a controller-type workflow.
There is also a lot of prior art for this sort of thing, such as all config entry writes and the catalog, as you pointed out.
Description
The consul-k8s endpoints controller issues catalog register and manual virtual IP updates without first checking whether the updates are effectively no-ops. This is supposed to be reasonable because the state store functions check for a no-op update and should discard repeat updates, so that downstream blocking queries watching one of the resources don't fire pointlessly (and CPU-wastefully).

While this is true for the check/service/node catalog updates, it is not true for the "manual virtual ip" updates triggered by the PUT /v1/internal/service-virtual-ip endpoint. Forcing the connect injector pod to recycle while watching some lightly modified FSM code shows that a lot of updates are of the form "update list of ips from [A] to [A]". Immediately following such a stray update you can see a lot of activity in the proxycfg and xds packages waking up due to blocking queries triggered by it.

This PR skips updates that change nothing both in the state store (so identical writes no longer bump modify indexes) and in the RPC endpoint (so identical writes skip the Raft apply entirely).
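For reproduction purposes, a call to that endpoint can be driven from a small Go helper like the sketch below. The request body field names Service and ManualVIPs are inferred from args.Service/args.ManualVIPs in this PR's endpoint code and are not an authoritative API reference; issuing the same body twice should leave all modify indexes untouched on the second call.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// assignManualVIPs PUTs a manual virtual IP list for a service against a
// local agent. Field names are assumptions inferred from this PR's diff.
func assignManualVIPs(service string, ips []string) error {
	body, err := json.Marshal(map[string]any{
		"Service":    service,
		"ManualVIPs": ips,
	})
	if err != nil {
		return err
	}

	req, err := http.NewRequest(http.MethodPut,
		"http://127.0.0.1:8500/v1/internal/service-virtual-ip", bytes.NewReader(body))
	if err != nil {
		return err
	}

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()

	fmt.Println("status:", resp.Status)
	return nil
}
```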
Testing & Reproduction steps

consul-k8s + kind with 2 connect-enabled services

PR Checklist

[ ] external facing docs updated