
Scalable direct pod addressability #36596

Open
howardjohn opened this issue on Dec 21, 2021 · 3 comments
Labels: area/networking, lifecycle/staleproof

Comments

@howardjohn (Member)

Currently, Istio identifies traffic as follows:

  • TCP/TLS: Must have a VIP

  • HTTP: Must have the Service name in the Host header

  • Auto: Must have a VIP (even if the traffic is HTTP, the hostname is not used)

  • Headless TCP/TLS: Match any Pod IP via a dedicated listener

  • Headless HTTP: Match the Service name in the Host header, or *.<service>

  • Headless auto: Match any Pod IP via a dedicated listener (even if the traffic is HTTP, the hostname is not used)

There are two main problems here:

  • Headless is not scalable, nor as well supported as the other cases. We create a listener per pod; in large headless services (typically DaemonSets), this leads to massive XDS payloads. On 300-node clusters with a few headless services we saw 15 MB LDS configs. Headless services also typically do not set a Host header, so they only work when using the auto protocol, whereas we generally prefer that users name the protocol explicitly. Headless also has weak auto-mTLS support - it requires a homogeneous cluster (all mTLS or none).
  • Direct pod connections are not identified, and thus do not get mTLS (among other issues). There are some workarounds, like creating an orig_dst DestinationRule (see the sketch below), but it only works with HTTP and has the same mTLS issues as headless. See Make pod addressability work even in meshes and drop fallbacks (knative/serving#10751).

This issue tracks what we can do to improve this.
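For reference, the orig_dst DestinationRule workaround mentioned above is presumably something like the following sketch (the rule name and host are placeholders; PASSTHROUGH forwards each connection to the caller's original destination IP rather than load balancing):

apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: direct-pod-passthrough                       # hypothetical name
spec:
  host: my-headless-svc.default.svc.cluster.local    # placeholder host
  trafficPolicy:
    loadBalancer:
      # Use the original destination (pod) IP requested by the caller,
      # instead of load balancing across Service endpoints
      simple: PASSTHROUGH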

@howardjohn (Member, Author) commented on Dec 21, 2021:

For pod connections, one solution is a smarter PassthroughCluster. Today, it is just an original_dst cluster, so we blindly pass everything through. Ideally, what we would have is:

if destinationIsAPod() { 
  forwardWithMTLS()
} else {
  passthrough()
}

As far as I know, there is no scalable way to do this today. My assumption is that, to scale, we need to provide Envoy with the set of pod IPs in EDS (or equivalent), ideally duplicating each IP no more than once.

One thing that is close is using Envoy endpoint subsets. For example:

  clusters:
  - name: Passthrough
    type: STATIC
    lb_subset_config:
      subset_selectors:
      - keys:
        - address

    load_assignment:
      cluster_name: Passthrough
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 1.2.3.4
                port_value: 80
          metadata:
            filter_metadata:
              envoy.lb:
                address: "1.2.3.4:80"
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 2.3.4.5
                port_value: 80
          metadata:
            filter_metadata:
              envoy.lb:
                address: "2.3.4.5:80"

Then add a network filter that sets the envoy.lb address metadata equal to the original destination (I didn't find a built-in way to do this in Envoy, but it's a trivial filter to write).

What this does is allow us to pass requests through, like orig_dst, but still attach metadata to them. Importantly, this metadata can be our tlsMode metadata and be used for a transport_socket_match, allowing passthrough requests to be upgraded to mTLS. For multi-network setups with IP conflicts, we will favor the local network, since requests to direct pod IPs do not traverse the network gateway and so must target local IPs.
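As a rough sketch of that last point: assuming the passthrough endpoints carry the same tlsMode metadata Istio already attaches to Service endpoints (under the envoy.transport_socket_match namespace), the cluster's transport_socket_match could look along these lines (mTLS certificate/SDS details elided):

clusters:
- name: Passthrough
  # ... subset_selectors / load_assignment as in the example above ...
  transport_socket_matches:
  # endpoints whose metadata contains tlsMode: istio are upgraded to mTLS
  - name: tlsMode-istio
    match:
      tlsMode: istio
    transport_socket:
      name: envoy.transport_sockets.tls
      typed_config:
        "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
        # ... Istio mTLS certificate / SDS config elided ...
  # anything without a match stays plaintext passthrough
  - name: tlsMode-disabled
    match: {}
    transport_socket:
      name: envoy.transport_sockets.raw_buffer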

What is missing here is the fallback if there is no match. Envoy does have a fallback_policy, but it only allows picking any endpoint, falling back to a predefined default subset, or failing; what we want is "fall back to orig_dst".
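For context, a rough sketch of the existing fallback options on the subset config; none of them gives us "fall back to the original destination":

lb_subset_config:
  subset_selectors:
  - keys:
    - address
  # NO_FALLBACK (default): requests that match no subset fail
  # ANY_ENDPOINT: requests that match no subset may go to any endpoint in the cluster
  # DEFAULT_SUBSET: requests that match no subset go to a pre-declared default_subset
  fallback_policy: ANY_ENDPOINT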

Just doing this alone could get us mTLS, but we would still have degraded telemetry and likely other degraded features. To take it a step further, we could consider adding explicit clusters for some groups of pods. What the groups are is open to discussion - it could be Services (and pick one if there are multiple...? we do this for headless), canonical service, or something else. From there, we could add a filter chain match (FCM) extension that could match based on EDS data. For example, to implement headless services we could do:

listeners:
- filter_chain_match:
    from_eds: outbound|80|headless
  filters:
  - set_dest_addr_metadata: {}
  - tcp_proxy: outbound|80|headless
endpoints:
- cluster_name: outbound|80|headless
  subset_config: { ... address subset config ... }
  endpoints: { ... same as above, with address metadata ... }

What this would do is look at all pod IPs/ports in the outbound|80|headless EDS response. If a request matches one of them, the filter chain matches. From there, we send the traffic to this cluster and select the original destination endpoint. Because of the associated metadata, we can selectively enable mTLS, as we do with Services.

related: envoyproxy/envoy#15750

@istio-policy-bot added the lifecycle/stale label on Mar 22, 2022
@howardjohn (Member, Author) commented:

Not stale

@istio-policy-bot removed the lifecycle/stale label on Mar 22, 2022
@istio-policy-bot added the lifecycle/stale label on Jun 21, 2022
@istio-policy-bot added the lifecycle/automatically-closed label on Jul 6, 2022
@howardjohn added the lifecycle/staleproof label and removed the lifecycle/stale and lifecycle/automatically-closed labels on Jul 15, 2022