
Manual Backport of Fix issue where terminating gateway service resolvers weren't properly cleaned up into release/1.13.x #16557

Merged

andrewstucki merged 4 commits into release/1.13.x from release-1.13.x-backport-resolver-fix-1 on Mar 7, 2023


andrewstucki
Contributor

Backport

Manual backport from #16498 to release/1.13.x.

The text below is copied from the body of the original PR.


Description

This fixes an issue where terminating gateways weren't properly cleaning up service resolvers attached to their external services. When a resolver was deleted, the casting guard failed on the nil entry delivered for the deleted resolver and skipped the update, so the old ServiceResolver value was kept around.

However, I do have some open questions about whether ServiceResolver subsets are ever supposed to work with an external service accessed through a terminating gateway, because currently they do not. When a local proxy resolves the endpoint for an upstream that is an external service with a ServiceResolver that defines subsets, it ends up watching the subset upstream directly, which returns no address; without a subset, the same lookup returns the terminating gateway's address. Either way, that would be a separate bug.
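The stale-resolver bug described above comes down to how a comma-ok type assertion behaves on a nil interface value: it does not panic, it simply returns `ok == false`. The Go sketch below is a minimal illustration under that assumption, using hypothetical stand-in types and functions (`ConfigEntry`, `ServiceResolver`, `applyBuggy`, `applyFixed`); it is not Consul's actual code. It assumes the deleted resolver arrives as a nil entry, and contrasts a guard that only handles the success case with one that treats a failed assertion as a deletion.

```go
package main

import "fmt"

// ConfigEntry is a hypothetical stand-in for a generic config-entry interface.
type ConfigEntry interface{ GetName() string }

// ServiceResolver is a hypothetical stand-in for a service-resolver entry.
type ServiceResolver struct {
	Name          string
	DefaultSubset string
}

func (r *ServiceResolver) GetName() string { return r.Name }

// applyBuggy stores the resolver only when the type assertion succeeds.
// When the resolver is deleted, entry is a nil interface, the assertion
// returns ok == false, and the stale resolver stays in the map.
func applyBuggy(resolvers map[string]*ServiceResolver, service string, entry ConfigEntry) {
	if resolver, ok := entry.(*ServiceResolver); ok {
		resolvers[service] = resolver
	}
}

// applyFixed treats a failed assertion (including a nil entry) as a deletion.
func applyFixed(resolvers map[string]*ServiceResolver, service string, entry ConfigEntry) {
	if resolver, ok := entry.(*ServiceResolver); ok {
		resolvers[service] = resolver
		return
	}
	delete(resolvers, service)
}

func main() {
	buggy := map[string]*ServiceResolver{}
	fixed := map[string]*ServiceResolver{}
	r := &ServiceResolver{Name: "external", DefaultSubset: "v1"}

	// The resolver is written.
	applyBuggy(buggy, "external", r)
	applyFixed(fixed, "external", r)

	// The resolver is deleted: the update carries a nil entry.
	applyBuggy(buggy, "external", nil)
	applyFixed(fixed, "external", nil)

	fmt.Println(len(buggy), len(fixed)) // prints "1 0"
}
```

The design point is that the "not a resolver" branch must do something: if absence is a valid state, the failed assertion has to clear the cached value rather than fall through silently.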

Testing & Reproduction steps

Run the following bash script:

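```bash
#!/bin/bash
# Assumes a local Consul agent is already running (for example, `consul agent -dev`)
# with its HTTP API on localhost:8500, and that an envoy binary is on PATH.

cleanup() {
  echo "shutting down upstreams"
}

# On SIGINT/SIGTERM: ignore the SIGTERM that `kill 0` sends back to this script,
# signal every process in the group (including the backgrounded Envoy), wait for
# them to exit, then run cleanup.
trap 'trap " " SIGTERM; kill 0; wait; cleanup' SIGINT SIGTERM

# Default every service to the http protocol.
cat << EOF | ./consul config write -
Kind      = "proxy-defaults"
Name      = "global"
Config {
  protocol = "http"
}
EOF

echo "Writing terminating gateway config entry"
# Link the "external" service to the "gateway" terminating gateway.
cat << EOF | ./consul config write -
Kind = "terminating-gateway"
Name = "gateway"

Services = [
  {
    Name = "external"
  }
]
EOF

# Register two instances of "external" on an external node: v1 on port 9877 ...
cat << EOF > /tmp/external.json
{
  "Node": "hashicorp",
  "Address": "127.0.0.1",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "external-v1",
    "Service": "external",
    "Port": 9877,
    "Meta": {
      "version": "v1"
    }
  }
}
EOF
curl --request PUT --data @/tmp/external.json localhost:8500/v1/catalog/register

# ... and v2 on port 9878.
cat << EOF > /tmp/external.json
{
  "Node": "hashicorp",
  "Address": "127.0.0.1",
  "NodeMeta": {
    "external-node": "true",
    "external-probe": "true"
  },
  "Service": {
    "ID": "external-v2",
    "Service": "external",
    "Port": 9878,
    "Meta": {
      "version": "v2"
    }
  }
}
EOF
curl --request PUT --data @/tmp/external.json localhost:8500/v1/catalog/register

echo "Writing resolver config entry"
# Split "external" into v1/v2 subsets by the version metadata; default to v1.
cat << EOF | ./consul config write -
Kind = "service-resolver"
Name = "external"
DefaultSubset = "v1"
Subsets = {
  v1 = {
    Filter = "Service.Meta.version == v1"
  }
  v2 = {
    Filter = "Service.Meta.version == v2"
  }
}
EOF

echo "Running terminating gateway"
# Start Envoy for the gateway with trace logging; its admin API defaults to localhost:19000.
./consul connect envoy -gateway terminating -register -service gateway -proxy-id gateway -- -l trace &

wait
```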

Before the fix, the v1 and v2 subset clusters remain after the resolver is deleted:

```
➜  consul git:(terminating-gateway-resolvers) ✗ curl -s http://localhost:19000/clusters | grep external | cut -d':' -f1 | uniq | sort
external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
v1.external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
v2.external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
➜  consul git:(terminating-gateway-resolvers) ✗ ./consul config delete -kind service-resolver -name external
Config entry deleted: service-resolver/external
➜  consul git:(terminating-gateway-resolvers) ✗ curl -s http://localhost:19000/clusters | grep external | cut -d':' -f1 | uniq | sort
external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
v1.external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
v2.external.default.dc1.internal.4e38b938-3a3e-7b67-6624-a8dae1058918.consul
```

After the fix, only the default external cluster remains after the resolver is deleted:

```
➜  consul git:(terminating-gateway-resolvers) ✗ curl -s http://localhost:19000/clusters | grep external | cut -d':' -f1 | uniq | sort
external.default.dc1.internal.d27ec462-6e9d-edfa-bf3a-8d603e174492.consul
v1.external.default.dc1.internal.d27ec462-6e9d-edfa-bf3a-8d603e174492.consul
v2.external.default.dc1.internal.d27ec462-6e9d-edfa-bf3a-8d603e174492.consul
➜  consul git:(terminating-gateway-resolvers) ✗ ./consul config delete -kind service-resolver -name external
Config entry deleted: service-resolver/external
➜  consul git:(terminating-gateway-resolvers) ✗ curl -s http://localhost:19000/clusters | grep external | cut -d':' -f1 | uniq | sort
external.default.dc1.internal.d27ec462-6e9d-edfa-bf3a-8d603e174492.consul
```
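The manual check in both transcripts (filtering Envoy's admin `/clusters` output with `grep external | cut -d':' -f1 | uniq | sort`) could be automated. The Go sketch below is a hypothetical helper, not part of Consul's test suite: `clusterNames` and `subsetClusters` are illustrative names, and the sample input is abbreviated, made-up text in the shape of `/clusters` lines rather than captured output. It assumes the subset cluster naming visible above, `<subset>.<service>.<namespace>.<dc>.internal.<trust-domain>.consul`, and needs Go 1.18+ for `strings.Cut`.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// clusterNames mimics `grep <filter> | cut -d':' -f1 | sort -u`: it keeps
// lines containing filter, takes everything before the first ':', and
// returns the distinct names in sorted order.
func clusterNames(clustersOutput, filter string) []string {
	seen := make(map[string]bool)
	scanner := bufio.NewScanner(strings.NewReader(clustersOutput))
	for scanner.Scan() {
		line := scanner.Text()
		if !strings.Contains(line, filter) {
			continue
		}
		name, _, _ := strings.Cut(line, ":")
		seen[name] = true
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// subsetClusters returns names whose second dot-separated label is the
// service name (for example "v1.external.default..."), i.e. subset clusters.
// After the fix, deleting the resolver should leave none of these.
func subsetClusters(names []string, service string) []string {
	var subsets []string
	for _, name := range names {
		labels := strings.Split(name, ".")
		if len(labels) > 1 && labels[1] == service {
			subsets = append(subsets, name)
		}
	}
	return subsets
}

func main() {
	// Abbreviated, made-up lines in the shape of /clusters output.
	sample := strings.Join([]string{
		"external.default.dc1.internal.example.consul::default_priority::max_connections::1024",
		"external.default.dc1.internal.example.consul::127.0.0.1:9877::cx_active::0",
		"v1.external.default.dc1.internal.example.consul::127.0.0.1:9877::cx_active::0",
		"other_cluster::default_priority::max_connections::1024",
	}, "\n")

	names := clusterNames(sample, "external")
	fmt.Println(names)
	fmt.Println(subsetClusters(names, "external"))
}
```

In a real test, the input would be the body of a GET to `http://localhost:19000/clusters` (for example via `net/http`), and the assertion after deleting the resolver would be that `subsetClusters` returns an empty slice. Using a set plus `sort.Strings` also avoids relying on `uniq`, which only collapses adjacent duplicates.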

PR Checklist

  • updated test coverage
  • external facing docs updated
  • not a security concern


Member

nathancoleman left a comment


Note: the diff is slightly different from the original PR because the assert_upstream_missing helper doesn't exist in the base branch.

andrewstucki merged commit 90c4c7d into release/1.13.x on Mar 7, 2023
andrewstucki deleted the release-1.13.x-backport-resolver-fix-1 branch on March 7, 2023 at 20:26