API Gateway Controller in secondary datacenter has insufficient permissions #1344

Closed
krarey opened this issue Jul 15, 2022 · 12 comments · Fixed by #1462 or #1481
Labels
theme/api-gateway Related to Consul API Gateway type/bug Something isn't working

Comments

@krarey
Member

krarey commented Jul 15, 2022

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request. Searching for pre-existing feature requests helps us consolidate datapoints for identical requirements into a single place, thank you!
  • Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.

Overview of the Issue

When deploying a federated secondary Consul datacenter via the Helm chart, the API Gateway Controller deployment is configured to retrieve a token at launch time via the Kubernetes auth method. This token has the local flag set, and the associated policy is further scoped only to the secondary datacenter.

The controller uses this token to create config entries, which are written globally in the primary datacenter and replicated back to the secondaries. Because the token is invalid in the primary DC, attaching new HTTPRoute and TCPRoute resources within the secondary cluster fails to complete. This prevents creation of the underlying Consul *-gateway, service-defaults, and service-intentions resources managed by the API Gateway Controller.
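
The failure mode described above can be illustrated with a deliberately simplified model. This is only a sketch, not Consul's implementation: the `Token` and `Datacenter` classes, the `issue_token` and `write_config_entry` helpers, and the returned strings are all hypothetical names chosen for illustration. The only behaviour it assumes is what the report states: a local token is known only in the datacenter that issued it, and config entry writes are handled by the primary datacenter.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Token:
    secret_id: str
    local: bool  # True: the token exists only in the datacenter that issued it


@dataclass
class Datacenter:
    name: str
    known_tokens: set = field(default_factory=set)

    def resolves(self, token: Token) -> bool:
        """Return True if this datacenter can look the token up."""
        return token.secret_id in self.known_tokens


def issue_token(secret_id, local, created_in, datacenters):
    """Register a token: local tokens stay put, global ones reach every DC."""
    token = Token(secret_id=secret_id, local=local)
    targets = [created_in] if local else datacenters
    for dc in targets:
        dc.known_tokens.add(secret_id)
    return token


def write_config_entry(token, primary):
    """Model a config entry write, which is always decided by the primary DC."""
    if primary.resolves(token):
        return "ok"
    # Placeholder message; the real error text may differ.
    return "rejected: token unknown in primary"


primary = Datacenter("primary")
secondary = Datacenter("secondary")
federation = [primary, secondary]

local_token = issue_token("controller-local", True, secondary, federation)
global_token = issue_token("controller-global", False, secondary, federation)

print(write_config_entry(local_token, primary))   # write is rejected
print(write_config_entry(global_token, primary))  # write succeeds
```

In this model the controller's local token works for reads inside the secondary datacenter but is rejected as soon as a write has to be decided by the primary, which matches the symptom reported here.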

Reproduction Steps

  1. Apply Kubernetes Gateway SIG and Consul API Gateway CRDs to a cluster
  2. Use Helm chart to deploy a Consul secondary datacenter with ACLs enabled
  3. Deploy a Gateway resource in the secondary DC's Kubernetes cluster, e.g.:
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: Gateway
metadata:
  name: ns-api-gateway
  namespace: consul-infra
spec:
  gatewayClassName: consul-api-gateway
  listeners:
  - protocol: HTTP
    port: 8080
    name: http
    allowedRoutes:
      namespaces:
        from: All
  4. Associate an HTTPRoute or TCPRoute with the deployed Gateway that references a running, connect-injected upstream service, e.g.:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dashboard
  namespace: webapp
---
apiVersion: v1
kind: Service
metadata:
  name: dashboard
  namespace: webapp
spec:
  selector:
    app: dashboard
  ports:
  - port: 9002
    targetPort: 9002
    name: dashboard
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: dashboard
  name: dashboard
  namespace: webapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: dashboard
  template:
    metadata:
      annotations:
        'consul.hashicorp.com/connect-inject': 'true'
        'consul.hashicorp.com/connect-service-upstreams': 'counting:9001'
      labels:
        app: dashboard
    spec:
      serviceAccountName: dashboard
      containers:
      - name: dashboard
        image: hashicorp/dashboard-service:0.0.4
        ports:
        - containerPort: 9002
        env:
        - name: COUNTING_SERVICE_URL
          value: 'http://localhost:9001'
---
apiVersion: consul.hashicorp.com/v1alpha1
kind: ServiceDefaults
metadata:
  name: dashboard
  namespace: webapp
spec:
  protocol: http
---
apiVersion: gateway.networking.k8s.io/v1alpha2
kind: HTTPRoute
metadata:
  name: dashboard
  namespace: webapp
spec:
  parentRefs:
  - name: ns-api-gateway
    namespace: consul-infra
  rules:
    - matches:
      - path:
          value: /
    - backendRefs:
      - kind: Service
        name: dashboard
        namespace: webapp
        port: 9002
  5. Check the Consul UI or API and observe that the gateway controller has created no underlying ingress-gateway or service-intentions config entries.
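
The final check can also be scripted against the Consul HTTP API. The following is a minimal sketch that lists config entries of one kind via `GET /v1/config/<kind>` with the `dc` query parameter and the `X-Consul-Token` header. The helper names, the example address, and the `ca_file` parameter are assumptions for illustration; on Consul Enterprise with namespaces enabled, an additional `ns` query parameter may be needed.

```python
import json
import ssl
import urllib.parse
import urllib.request


def config_entries_url(consul_addr, kind, datacenter):
    """Build the URL that lists all config entries of one kind in a datacenter."""
    query = urllib.parse.urlencode({"dc": datacenter})
    return f"{consul_addr.rstrip('/')}/v1/config/{kind}?{query}"


def list_config_entries(consul_addr, kind, datacenter, token, ca_file=None):
    """Fetch and decode the JSON list of config entries of the given kind."""
    request = urllib.request.Request(
        config_entries_url(consul_addr, kind, datacenter),
        headers={"X-Consul-Token": token},
    )
    # ca_file is a hypothetical path to the Consul CA certificate; with
    # ca_file=None the system trust store is used instead.
    context = ssl.create_default_context(cafile=ca_file)
    with urllib.request.urlopen(request, context=context) as response:
        return json.load(response)
```

For example, calling `list_config_entries("https://consul.example:8501", "ingress-gateway", "secondary", token)` (address hypothetical) and getting an empty list back would be consistent with the behaviour described in this report.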

Expected behavior

HTTPRoute and TCPRoute resources created within a Kubernetes cluster configured as a Consul secondary DC should lead to successful creation of the associated config entries within Consul.

Environment details

consul-k8s version: 0.45.0, also tested with 0.41.1 (prior to addition of component auth method)
API Gateway Version: 0.3.0, also tested with 0.1.0
Kubernetes version: v1.22.8-gke.20
Cloud Provider: GCP, also tested with Azure Red Hat OpenShift

Values.yaml:

global:
  name: consul
  image: "hashicorp/consul-enterprise:1.12.2-ent"
  imageK8S: "hashicorp/consul-k8s-control-plane:0.45.0"
  datacenter: secondary
  gossipEncryption:
    secretName: consul-federation
    secretKey: gossipEncryptionKey
  tls:
    enabled: true
    caCert:
      secretName: consul-federation
      secretKey: caCert
    caKey:
      secretName: consul-federation
      secretKey: caKey
  enableConsulNamespaces: true
  acls:
    manageSystemACLs: true
    replicationToken:
      secretName: consul-federation
      secretKey: replicationToken
  enterpriseLicense:
    secretName: consul-license
    secretKey: key
  federation:
    enabled: true
    primaryDatacenter: primary
    primaryGateways: ["[...]"]
    k8sAuthMethodHost: "[...]"
  imageEnvoy: "envoyproxy/envoy:v1.22.2"
server:
  replicas: 3
  storage: 10Gi
  storageClass: premium-rwo
  updatePartition: 0
connectInject:
  enabled: true
  transparentProxy:
    defaultEnabled: false
  consulNamespaces:
    mirroringK8S: true
controller:
  enabled: true
meshGateway:
  enabled: true
apiGateway:
  enabled: true
  image: hashicorp/consul-api-gateway:0.3.0
  logLevel: debug
@krarey krarey added the type/bug Something isn't working label Jul 15, 2022
@nathancoleman nathancoleman self-assigned this Jul 15, 2022
@nathancoleman
Member

Thanks for reporting, @krarey! This is an issue with the Helm chart that the API Gateway team will work on addressing.

@nathancoleman nathancoleman added the theme/api-gateway Related to Consul API Gateway label Jul 15, 2022
@codex70

codex70 commented Aug 2, 2022

I'm also seeing issues that may be related in a federated secondary datacenter. I can set up routes manually and when I check the routes status, it appears to be correct.

When I try to connect to the API gateway using curl, I get an immediate closing of the connection:
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to X.X.X.X:844

In the primary datacenter the connection works correctly as expected.

It seems possible that this is a TLS issue related to the shared consul-federation caCert and caKey, but I'm not at all sure.

I know it's not exactly the same issue as above, but it seems to be closely related.

@manobi

manobi commented Aug 27, 2022

I have just run into this issue while testing the new URLRewrite filter on the latest consul-k8s Helm chart (v0.47.1).

I managed to register routes in the primary datacenter, but similar configs do not work in the secondary datacenter.

@nathancoleman do you think it would work if I switched one of the secondary clusters to cluster peering instead of WAN federation?

@nathancoleman
Member

nathancoleman commented Aug 29, 2022

@manobi I believe it's just a matter of configuring the acl-auth-method and primary-datacenter appropriately when deploying the API Gateway controller into a secondary datacenter. Please see the PR I just put up over at #1462.
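
As a rough illustration of that configuration, the sketch below assembles the `acl-init` arguments for the controller depending on whether federation is configured. It is not the chart's actual logic: the function name is hypothetical, the datacenter-suffixed auth method name follows the template lines quoted later in this thread, and the non-federated name (no suffix) is an assumption.

```python
def acl_init_args(full_name, datacenter, federation_enabled, primary_datacenter=None):
    """Assemble acl-init arguments for the API Gateway controller (sketch)."""
    auth_method = f"{full_name}-k8s-component-auth-method"
    args = [
        "consul-k8s-control-plane",
        "acl-init",
        "-component-name=api-gateway-controller",
    ]
    if federation_enabled and primary_datacenter:
        # Federated secondary: use the datacenter-specific auth method and
        # name the primary datacenter explicitly.
        args.append(f"-acl-auth-method={auth_method}-{datacenter}")
        args.append(f"-primary-datacenter={primary_datacenter}")
    else:
        # Assumption: without federation the auth method name has no suffix.
        args.append(f"-acl-auth-method={auth_method}")
    return args


print(" ".join(acl_init_args("consul", "secondary", True, "primary")))
```

The point of the conditional is that both flags are only added together, and only when federation is enabled and a primary datacenter is named.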

@codex70

codex70 commented Aug 30, 2022

@nathancoleman, do you have any idea when your code is likely to be released so that we can test it?

@nathancoleman
Member

@codex70 #1462 will be included in the next release of consul-k8s, slated for this Thursday, September 1.

@nathancoleman
Member

@codex70 @manobi version 0.48.0 is now available for the consul Helm chart and contains the code changes related to this issue (changelog)

@manobi

manobi commented Sep 2, 2022

Thank you guys, will test it right now.

@manobi

manobi commented Sep 2, 2022

> @codex70 @manobi version 0.48.0 is now available for the consul Helm chart and contains the code changes related to this issue (changelog)

@nathancoleman Consul API gateway controller never becomes ready:

2022-09-02T15:51:21.019Z [ERROR] unable to login: error="Unexpected response code: 403 (rpc error making call: rpc error making call: rpc error making call: Permission denied)"

It looks to be related to serviceaccount/rolebinding stuff, since I've managed to run the following command in controller-acl-init but not in api-gateway-controller-acl-init:

consul-k8s-control-plane acl-init \
            -component-name=api-gateway-controller \
            -acl-auth-method=consul-consul-k8s-component-auth-method-REDACTED \
            -primary-datacenter=REDACTED \
            -consul-api-timeout=1m \
            -log-level=info \
            -log-json=false

I have also been able to complete the initContainer using the "consul-controller" service account instead of "consul-api-gateway-controller".

Any suggestions? Should this be tracked in a separate issue?

@codex70

codex70 commented Sep 12, 2022

Could you please keep us updated on the progress with this? It looks like it has become more complicated.

Unfortunately running a single datacenter isn't an option for us due to the flat networking requirements.

@nathancoleman
Member

nathancoleman commented Sep 14, 2022

Hi @codex70, the insufficient permissions issue described here is resolved by the combination of #1462 (merged + released) and #1481 (in code review).

With the API Gateway controller running in both the primary and the secondary datacenter, there is one other issue preventing you from successfully spinning up a Gateway in the second datacenter. That issue is over in the consul-api-gateway repo, described in hashicorp/consul-api-gateway#361. I'm testing a fix for that this week.

Edit: We've also updated our docs describing current limitations with regard to federation here. I expect I'll have the datacenter federation feature described there working (controller per datacenter, gateways routing within the datacenter they're deployed to) with #1481 and a fix for hashicorp/consul-api-gateway#361; however, routing from a gateway in one datacenter to a service in a different datacenter is unlikely in the short term.

@sergey-kudriavtsev

sergey-kudriavtsev commented Oct 24, 2022

Hi all! @nathancoleman @krarey
Thanks for the fix. When will it be released?
The latest version of the consul chart (0.49.0) already contains the changes, but they are still not in the binary:

  1. The auth method name:
     {{- if and .Values.global.federation.enabled .Values.global.federation.primaryDatacenter }}
             -acl-auth-method={{ template "consul.fullname" . }}-k8s-component-auth-method-{{ .Values.global.datacenter }} \
             -primary-datacenter={{ .Values.global.federation.primaryDatacenter }} \
     {{- else }}
  2. The -primary-datacenter argument:
     {{- if and .Values.global.federation.enabled .Values.global.federation.primaryDatacenter }}
             -primary-datacenter={{ .Values.global.federation.primaryDatacenter }} \
     {{- end }}
