Enabling Internal Encryption breaks DomainMappings when using Contour #862

Open

KauzClay opened this issue Feb 1, 2023 · 7 comments · Fixed by #860

Labels: bug (Something isn't working), lifecycle/frozen (Indicates that an issue or PR should not be auto-closed due to staleness.)


@KauzClay (Contributor) commented Feb 1, 2023

What version of Knative?

Relocating from knative/serving#13659 since this is just a Contour issue.

Working off the main branch of knative-serving and net-contour.

Discovered while trying to add internal encryption e2e tests for net-contour here: knative/serving#13536

Expected Behavior

When Internal Encryption is enabled and I create a DomainMapping for my Knative Service, I am able to reach the KService successfully.

Actual Behavior

DomainMappings fail to become ready and get stuck in "EndpointsNotReady".

The net-contour controller says:

{"severity":"ERROR","timestamp":"2023-01-26T21:30:40.304820765Z","logger":"net-contour-controller","caller":"status/status.go:404","message":"Probing of http://hello.gen-14.hello.clay.tanzu.biz.default.net-contour.invalid failed, IP: 10.24.2.35:8080, ready: false, error: unexpected status code: want 200, got 503 (depth: 0)","commit":"e458d29-dirty","knative.dev/controller":"knative.dev.net-contour.pkg.reconciler.contour.Reconciler","knative.dev/kind":"networking.internal.knative.dev.Ingress","knative.dev/traceid":"724fb06d-90db-49cf-917e-a664dd798cb8","knative.dev/key":"default/hello.clay.tanzu.biz--ep","stacktrace":"knative.dev/networking/pkg/status.(*Prober).processWorkItem\n\tknative.dev/networking@v0.0.0-20221202133217-891aac251fc2/pkg/status/status.go:404\nknative.dev/networking/pkg/status.(*Prober).Start.func1\n\tknative.dev/networking@v0.0.0-20221202133217-891aac251fc2/pkg/status/status.go:289"}

I also see this in the Envoy logs:

[2023-01-26 21:34:40.776][19][debug][router] [source/common/router/router.cc:1212] [C49017][S13478037937466377759] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2023-01-26 21:34:40.776][19][debug][http] [source/common/http/filter_manager.cc:905] [C49017][S13478037937466377759] Sending local reply with details upstream_reset_before_response_started{connection_failure,TLS_error:_268435703:SSL_routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
[2023-01-26 21:34:40.776][19][debug][http] [source/common/http/conn_manager_impl.cc:1551] [C49017][S13478037937466377759] encoding headers via codec (end_stream=false):
':status', '503'
'content-type', 'text/plain'
'content-encoding', 'gzip'
'vary', 'Accept-Encoding'
'date', 'Thu, 26 Jan 2023 21:34:40 GMT'
'server', 'envoy'
[2023-01-26 21:34:40.735][21][debug][conn_handler] [source/server/active_tcp_listener.cc:147] [C49205] new connection from 10.24.2.43:43224
[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:306] [C49205] new stream
[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:930] [C49205][S163296292697216300] request headers complete (end_stream=true):
':authority', 'hello.gen-3.hello.claysreallyverylongtestineee50218ef4390e47e8e913ebbbebaf8.default.net-contour.invalid'
':path', '/healthz'
':method', 'GET'
'user-agent', 'Knative-Ingress-Probe'
'k-network-hash', 'override'
'k-network-probe', 'probe'
'accept-encoding', 'gzip'
...

[2023-01-26 21:34:40.735][21][debug][http] [source/common/http/conn_manager_impl.cc:913] [C49205][S163296292697216300] request end stream
[2023-01-26 21:34:40.735][21][debug][connection] [./source/common/network/connection_impl.h:92] [C49205] current connecting state: false
[2023-01-26 21:34:40.735][21][debug][router] [source/common/router/router.cc:470] [C49205][S163296292697216300] cluster 'default/hello/80/a67dfba3e6' match for URL '/healthz'
[2023-01-26 21:34:40.735][21][debug][router] [source/common/router/router.cc:678] [C49205][S163296292697216300] router decoding headers:
':authority', 'hello.default.svc.cluster.local'
':path', '/healthz'
':method', 'GET'
':scheme', 'http'
'user-agent', 'Knative-Ingress-Probe'
'k-network-probe', 'probe'
'accept-encoding', 'gzip'
'x-forwarded-for', '10.24.2.43'
'x-forwarded-proto', 'http'
'x-envoy-internal', 'true'
'x-request-id', '9d510376-f3c0-4a55-962d-a1f5a9f0ebe4'
'k-network-hash', 'dc12e833d98a355da2775ad80b3ae02658ed076ec3da0d5670b05f377f36f39e'
'x-request-start', 't=1674768880.735'
...
[2023-01-26 21:34:40.736][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
[2023-01-26 21:34:40.736][21][debug][pool] [source/common/conn_pool/conn_pool_base.cc:453] invoking idle callbacks - is_draining_for_deletion_=false
[2023-01-26 21:34:40.758][21][debug][router] [source/common/router/router.cc:1796] [C49205][S163296292697216300] performing retry
...
[2023-01-26 21:34:40.759][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
...
[2023-01-26 21:34:40.796][21][debug][router] [source/common/router/router.cc:1796] [C49205][S163296292697216300] performing retry
...
[2023-01-26 21:34:40.798][21][debug][router] [source/common/router/router.cc:1212] [C49205][S163296292697216300] upstream reset: reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER
...
[2023-01-26 21:34:40.798][21][debug][http] [source/common/http/filter_manager.cc:905] [C49205][S163296292697216300] Sending local reply with details upstream_reset_before_response_started{connection_failure,TLS_error:_268435703:SSL_routines:OPENSSL_internal:WRONG_VERSION_NUMBER}
[2023-01-26 21:34:40.798][21][debug][http] [source/common/http/conn_manager_impl.cc:1551] [C49205][S163296292697216300] encoding headers via codec (end_stream=false):
':status', '503'
'content-type', 'text/plain'
'content-encoding', 'gzip'
'vary', 'Accept-Encoding'
'date', 'Thu, 26 Jan 2023 21:34:40 GMT'
'server', 'envoy'

When I try this out with AutoTLS enabled, the DomainMappings become ready, but I still get the error when I hit the endpoint:

upstream connect error or disconnect/reset before headers. retried and the latest reset reason: connection failure, transport failure reason: TLS error: 268435703:SSL routines:OPENSSL_internal:WRONG_VERSION_NUMBER

Steps to Reproduce the Problem

  1. Enable internal encryption in config-network
  2. Deploy a simple hello world Knative Service
  3. Set up a ClusterDomainClaim for your new domain
  4. Create a DomainMapping (minimal manifests for all four steps are sketched below)
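
For concreteness, here is a minimal sketch of the manifests for these four steps. The knative-serving namespace and the internal-encryption key match the upstream defaults at the time of this issue; the hello.example.com domain and the sample image are assumptions for illustration.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-network
  namespace: knative-serving
data:
  internal-encryption: "true"        # step 1: enable internal encryption
---
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  template:
    spec:
      containers:
      - image: gcr.io/knative-samples/helloworld-go   # step 2: hello world KService
---
apiVersion: networking.internal.knative.dev/v1alpha1
kind: ClusterDomainClaim
metadata:
  name: hello.example.com            # step 3: claim the new domain
spec:
  namespace: default
---
apiVersion: serving.knative.dev/v1beta1
kind: DomainMapping
metadata:
  name: hello.example.com            # step 4: map the domain to the KService
  namespace: default
spec:
  ref:
    name: hello
    kind: Service
    apiVersion: serving.knative.dev/v1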

Analysis

I think the problem is partly due to the fact that DomainMappings point you back at Envoy.
If you look at the DAG, you can see all the routes point to a service on port 443. However, the one for hello goes to 80:

(DAG output comes from the Contour controller, see here)

[image: contour-dag-encryption — screenshot of the Contour DAG]

That service spec looks like this:

apiVersion: v1
kind: Service
metadata:
  ...
  name: hello
  namespace: default
spec:
  clusterIP: None
  clusterIPs:
  - None
  internalTrafficPolicy: Cluster
  ipFamilies:
  - IPv4
  - IPv6
  ipFamilyPolicy: RequireDualStack
  ports:
  - name: http2
    port: 80
    protocol: TCP
    targetPort: 80
  sessionAffinity: None
  type: ClusterIP

Internal encryption was implemented such that ports named http2 are treated as h2c when internal encryption is disabled, and as h2 when it is enabled.

This means the HTTPProxy defines hitting the hello service on port 80 with the h2 protocol.

So when Envoy tries to make the call, it speaks TLS (because of h2), but it hits its own plaintext HTTP listener, which is what produces the WRONG_VERSION_NUMBER TLS errors above.
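
To make the mismatch concrete, the route ends up shaped roughly like this. This is a hand-written HTTPProxy sketch, not actual controller output, and the name and fqdn are assumptions:

apiVersion: projectcontour.io/v1
kind: HTTPProxy
metadata:
  name: hello.example.com
  namespace: default
spec:
  virtualhost:
    fqdn: hello.example.com
  routes:
  - services:
    - name: hello      # the Service shown above, which points back at Envoy
      port: 80         # port 80 is Envoy's plaintext HTTP listener...
      protocol: h2     # ...but h2 makes Envoy dial the upstream over TLS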

When you turn AutoTLS on, there is at least a listener on 443, but it doesn't have the route data to handle the request (since svc.cluster.local domains don't get TLS certificates).

Suggestion

I think one way around this is to use the internal encryption secrets for the ClusterLocal visibility domains when internal encryption is enabled. That way you get a listener on 443 for those domains. You'd then also need to change the Service to use port 443. The trouble is that this ventures toward TLS for ClusterLocal routes, which is probably a big undertaking.
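
As a sketch of the Service half of that change (hypothetical; it assumes the http2 port name still drives protocol selection), the ports block above would move from 80 to 443:

  ports:
  - name: http2        # still treated as h2 while internal encryption is on
    port: 443          # target Envoy's TLS listener instead of the plaintext one
    protocol: TCP
    targetPort: 443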

I suppose another, simpler option is to make the calls from Envoy back to itself skip encryption. But to me, that seems like leaving a hole in the internal encryption path.

@KauzClay (Contributor, Author)

Okay, #860 didn't totally fix this; going to reopen while I address that.

@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2023
@KauzClay (Contributor, Author) commented Jun 1, 2023

I think the work to add https to clusterLocal routes here (knative-extensions/net-certmanager#538) might help in this scenario.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 2, 2023

github-actions bot commented Sep 1, 2023

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 1, 2023
@KauzClay (Contributor, Author)

There is an effort to rework some of the internal encryption changes for Knative (see https://github.com/orgs/knative/projects/63/views/1).

As we progress with that, I plan to revisit net-contour and will try to address this issue then.

@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 26, 2023
@github-actions

This issue is stale because it has been open for 90 days with no
activity. It will automatically close after 30 more days of
inactivity. Reopen the issue with /reopen. Mark the issue as
fresh by adding the comment /remove-lifecycle stale.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Dec 25, 2023
@dprotaso (Contributor)

/lifecycle frozen

@dprotaso dprotaso reopened this Jan 26, 2024
@knative-prow knative-prow bot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jan 26, 2024