
state: ensure that identical manual virtual IP updates result in not bumping the modify indexes #21909

Open

rboyer wants to merge 5 commits into main
Conversation

@rboyer (Member) commented Oct 31, 2024

Description

The consul-k8s endpoints controller issues catalog register and manual virtual IP updates without first checking whether the updates would actually change anything. This is supposed to be fine because the state store functions check for no-op updates and should discard repeats, so that downstream blocking queries watching one of these resources don't fire pointlessly (and waste CPU).

While this is true for the check/service/node catalog updates, it is not true for the "manual virtual IP" updates triggered by PUT /v1/internal/service-virtual-ip. Forcing the connect injector pod to recycle while watching some lightly modified FSM code shows that many of these updates rewrite the list of IPs from [A] to [A]. Immediately after such a stray update you can see a burst of activity in the proxycfg and xds packages as the blocking queries it triggered wake up.
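To make the waste concrete, here is a very loose, illustrative model of an index-driven watcher such as proxycfg; real blocking queries compare the entry's ModifyIndex (or the table's max index) against the last index the watcher handled, so a write that bumps the index without changing the data still wakes everything downstream.

```go
package main

import "fmt"

// watcher loosely models a blocking-query consumer (e.g. proxycfg): it only
// does work when the index it sees moves past the last one it handled.
type watcher struct{ lastIndex uint64 }

func (w *watcher) notify(modifyIndex uint64, ips []string) {
	if modifyIndex <= w.lastIndex {
		return // nothing new: stay asleep
	}
	w.lastIndex = modifyIndex
	fmt.Println("recompute proxy state for", ips) // the wasted work this PR avoids
}

func main() {
	w := &watcher{lastIndex: 10}
	// Pre-PR behavior: a [A] -> [A] write still bumps ModifyIndex, so the
	// watcher wakes even though the data is identical.
	w.notify(11, []string{"240.0.0.1"})
	// With this PR the index is not bumped for a no-op write, so a second
	// notification at the same index keeps the watcher idle.
	w.notify(11, []string{"240.0.0.1"})
}
```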

This PR skips updates that change nothing at two layers (a minimal sketch follows this list):

  • at the RPC layer, before the request is handed to raft (ideally)
  • in the FSM, if the write does make it through raft and gets applied (failsafe)
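A minimal, self-contained sketch of that two-layer guard, assuming nothing beyond what the diffs below show; `sameVIPSet`, `rpcPreCheck`, and `fsmApply` are illustrative stand-ins, not the actual functions this PR touches:

```go
package main

import "fmt"

// sameVIPSet reports whether the stored VIPs and the requested set are
// identical; both layers below reduce to this comparison.
func sameVIPSet(current []string, requested map[string]struct{}) bool {
	if len(current) != len(requested) {
		return false
	}
	for _, ip := range current {
		if _, ok := requested[ip]; !ok {
			return false
		}
	}
	return true
}

// rpcPreCheck is layer 1: read current state and skip the raft apply
// entirely when the request would change nothing (best effort).
func rpcPreCheck(current []string, requested map[string]struct{}) (skipRaft bool) {
	return sameVIPSet(current, requested)
}

// fsmApply is layer 2: if an identical write reaches raft anyway, refuse to
// rewrite the entry or bump any modify indexes (failsafe).
func fsmApply(current []string, requested map[string]struct{}) (changed bool) {
	return !sameVIPSet(current, requested)
}

func main() {
	current := []string{"240.0.0.1"}
	requested := map[string]struct{}{"240.0.0.1": {}}
	fmt.Println(rpcPreCheck(current, requested)) // true  -> no raft apply needed
	fmt.Println(fsmApply(current, requested))    // false -> no index bump
}
```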

Testing & Reproduction steps

  • Deployed a small 1-node cluster using consul-k8s + kind with 2 connect-enabled services
  • Watched the server debug logs before/during/after recycling the connect injector pod.
  • Before, the API PUTs could be seen immediately preceding the proxycfg/xds activity.
  • After, these no longer appear nearly as often.

PR Checklist

  • updated test coverage
  • [ ] external facing docs updated
  • appropriate backport labels added
  • not a security concern

@rboyer rboyer self-assigned this Oct 31, 2024
@rboyer rboyer requested a review from a team as a code owner October 31, 2024 19:57
@@ -1106,6 +1108,9 @@ func (s *Store) AssignManualServiceVIPs(idx uint64, psn structs.PeeredServiceNam
	for _, ip := range ips {
		assignedIPs[ip] = struct{}{}
	}

	txnNeedsCommit := false
rboyer (Member, Author) commented:
I don't think this is practically an issue, but I did notice that the logic was:

begin txn
maybe write
maybe early return
write
commit

and with this change I fixed it to:

begin txn
maybe write
maybe write
maybe commit


	newEntry.ManualIPs = filteredIPs
	newEntry.ModifyIndex = idx
	if err := tx.Insert(tableServiceVirtualIPs, newEntry); err != nil {
		return false, nil, fmt.Errorf("failed inserting service virtual IP entry: %s", err)
	}
	modifiedEntries[newEntry.Service] = struct{}{}

	if err := updateVirtualIPMaxIndexes(tx, idx, thisServiceName.PartitionOrDefault(), thisPeer); err != nil {
rboyer (Member, Author) commented:
Previously we were not updating the max index table for the entries that had VIPs stolen from them.

@@ -1130,13 +1141,20 @@ func (s *Store) AssignManualServiceVIPs(idx uint64, psn structs.PeeredServiceNam
			filteredIPs = append(filteredIPs, existingIP)
		}
	}
	sort.Strings(filteredIPs)
rboyer (Member, Author) commented:
Previously we were storing VIPs in whatever order they happened to be in. It seemed silly not to sort them.

if err := updateVirtualIPMaxIndexes(tx, idx, psn.ServiceName.PartitionOrDefault(), psn.Peer); err != nil {
return false, nil, err
// Check to see if the slice already contains the same ips.
if !vipSliceEqualsMapKeys(newEntry.ManualIPs, assignedIPs) {
rboyer (Member, Author) commented:
This is the key part of the fix.

func updateVirtualIPMaxIndexes(txn WriteTxn, idx uint64, partition, peerName string) error {
// update global max index (for snapshots)
if err := indexUpdateMaxTxn(txn, idx, tableServiceVirtualIPs); err != nil {
rboyer (Member, Author) commented:
The snapshot logic grabs the max index from this table without peering/partition prefixes, so to keep that correct we update the un-prefixed index here too.
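A simplified, assumption-laden model of what that means; the key names and the map below are illustrative only, while the real code goes through indexUpdateMaxTxn against Consul's index table:

```go
package main

import "fmt"

// bumpVirtualIPIndexes sketches the idea: the snapshot path reads the bare
// table index, while partition/peer-scoped watches read a scoped entry, so a
// write has to advance both. The key formats here are made up.
func bumpVirtualIPIndexes(indexTable map[string]uint64, idx uint64, partition, peer string) {
	for _, key := range []string{
		"service-virtual-ips", // un-prefixed: what the snapshot logic reads
		fmt.Sprintf("service-virtual-ips.%s.%s", partition, peer), // scoped: what watches read
	} {
		if idx > indexTable[key] {
			indexTable[key] = idx
		}
	}
}

func main() {
	indexes := map[string]uint64{}
	bumpVirtualIPIndexes(indexes, 42, "default", "")
	fmt.Println(indexes)
}
```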

		return lastIndex
	}

	testutil.RunStep(t, "assign to nonexistent service is noop", func(t *testing.T) {
rboyer (Member, Author) commented:
This is the new effective start of the test, using the helpers above to hopefully make it clearer to read.

		// No manual IP should be set yet.
		checkManualVIP(t, psn, "0.0.0.1", []string{}, regIndex1)

		checkMaxIndexes(t, regIndex1, 0)
rboyer (Member, Author) commented:
Note that we now actually verify that the max index table is correctly updated.

	} else {
		require.Equal(t, expectManual, serviceVIP.ManualIPs)
	}
	require.Equal(t, expectIndex, serviceVIP.ModifyIndex)
rboyer (Member, Author) commented:
All of these tests verify that the various entries did or did not have their modify indexes bumped when writes occur.

		checkMaxIndexes(t, assignIndex4, assignIndex4)
	})

	testutil.RunStep(t, "repeat the last write and no indexes should be bumped", func(t *testing.T) {
rboyer (Member, Author) commented:
This is the test that verifies repeating a write doesn't actually change anything.

@rboyer added the backport/all label (Apply backports for all active releases per .release/versions.hcl) on Oct 31, 2024
		if err != nil {
			return fmt.Errorf("error checking for existing manual ips for service: %w", err)
		}
		if existingIPs != nil && stringslice.EqualMapKeys(existingIPs.ManualIPs, vipMap) {
rboyer (Member, Author) commented:
Here we just return the same positive response that the FSM would have generated in this no-op case without all of the raft expense.

	} else {
		if again {
			require.Equal(t, tc.expectAgain, resp)
			require.Equal(t, idx1, idx2, "no raft operations occurred")
rboyer (Member, Author) commented:
This was the cheapest hack I could do to verify the "skip raft" behavior without crazy refactoring of the Server behavior.

Comment on lines +782 to +797

		vipMap[ip] = struct{}{}
	}
	// Silently ignore duplicates.
	args.ManualVIPs = maps.Keys(vipMap)

	psn := structs.PeeredServiceName{
		ServiceName: structs.NewServiceName(args.Service, &args.EnterpriseMeta),
	}

	// Check to see if we can skip the raft apply entirely.
	{
		existingIPs, err := m.srv.fsm.State().ServiceManualVIPs(psn)
		if err != nil {
			return fmt.Errorf("error checking for existing manual ips for service: %w", err)
		}
		if existingIPs != nil && stringslice.EqualMapKeys(existingIPs.ManualIPs, vipMap) {
A collaborator commented:
I know we do a similar thing when writing service nodes, but thinking about it, isn't this racy? Another request could write this piece of data right after we read it from the state store.
It's safe to do in the FSM because the FSM is single-threaded, but here I'm not sure 🤔

rboyer (Member, Author) replied:
Logically, each peered-service-name (PSN) should only be manipulated by one entity at a time externally. In the case of consul-k8s that is mostly the endpoints controller (EC) workflow. Even if you imagine rearranging the EC to run with more than one instance sharing the work, we'd likely shard it by PSN name, so there wouldn't be two active writers.

Ideally we'd update the EC code to do a read-before-write check like this to avoid a duplicate write, as you'd expect from a controller-type workflow.

There is also a lot of prior art for this sort of thing, such as config entry writes and the catalog, as you pointed out.
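For illustration, the controller-side read-before-write suggested above could look roughly like this; the function name and the write callback are hypothetical, since the real endpoints controller goes through the Consul API:

```go
package main

import (
	"fmt"
	"slices"
	"sort"
)

// reconcileManualVIPs only issues a write when the desired VIP set differs
// from what is already stored: the controller-side analogue of the
// server-side no-op guard added in this PR.
func reconcileManualVIPs(actual, desired []string, write func([]string) error) error {
	actual, desired = slices.Clone(actual), slices.Clone(desired)
	sort.Strings(actual)
	sort.Strings(desired)
	if slices.Equal(actual, desired) {
		return nil // already converged: no PUT, no raft traffic, no watch churn
	}
	return write(desired)
}

func main() {
	err := reconcileManualVIPs(
		[]string{"240.0.0.1"},
		[]string{"240.0.0.1"},
		func(ips []string) error {
			fmt.Println("writing", ips)
			return nil
		},
	)
	fmt.Println("err:", err) // identical sets, so no write happened
}
```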

Labels
backport/all (Apply backports for all active releases per .release/versions.hcl), pr/no-metrics-test