Cluster bootstrapping with Istio #209

Closed
idoshamun opened this issue May 28, 2018 · 29 comments

@idoshamun

Istio blocks direct connections between pods; is there a way to work around this to allow cluster bootstrapping?

This is the error I get right now:

21:03:21.123 [error] akka.management.cluster.bootstrap.internal.HttpContactPointBootstrap - Probing [http://10.60.1.32:3311/bootstrap/seed-nodes] failed due to: The http server closed the connection unexpectedly before delivering responses for 1 outstanding requests
@ktoso
Member

ktoso commented May 31, 2018

We have not looked into Istio so far in this context; if you could look into it and share your findings, that would be very useful.

@idoshamun
Author

The current conclusion is that Istio doesn't allow pod-to-pod communication, and there is no way to disable this restriction. This means we can't form an Akka cluster using this method on an Istio-enabled cluster.
Here is the relevant thread:
https://groups.google.com/forum/#!topic/istio-users/d-THsO19oAM

Any other ideas to form a cluster?

@thomschke
Contributor

It works by using pod DNS names (see):

  • Configure the deployment so that the Akka remoting and management hostnames use the pod DNS name:

apiVersion: "apps/v1"
kind: Deployment
metadata:
  name: akka
spec:
  replicas: 2
  selector:
    matchLabels:
      app: akka
  template:
    metadata:
      labels:
        app: akka
    spec:
      containers:
      - name: akka
        image: <your-akka-image> # placeholder: set to your application image
        ports:
        - containerPort: 10001
          name: "akka-remote"
        - containerPort: 10002
          name: "akka-mgmt-http"
        env:
        - name: "DYN_JAVA_OPTS"
          value: "-Dakka.remote.netty.tcp.hostname=${POD_IP//./-}.default.pod.cluster.local -Dakka.management.http.hostname=${POD_IP//./-}.default.pod.cluster.local"

  • Configure an internal ServiceEntry (see) to allow outgoing calls:
apiVersion: networking.istio.io/v1alpha3
kind: ServiceEntry
metadata:
  name: akka
spec:
  hosts:
  - "*.default.pod.cluster.local"
  location: MESH_INTERNAL
  ports:
  - number: 10001
    name: "akka-remote"
    protocol: TCP
  - number: 10002
    name: "http-akka-mgmt"
    protocol: HTTP
  resolution: NONE
  • Configure a K8S Service (type: ClusterIP) to allow incoming calls:
apiVersion: v1
kind: Service
metadata:
  name: akka
spec:
  ports:
  - name: akka-remote
    port: 10001
    protocol: TCP
    targetPort: 10001
  - name: akka-mgmt-http
    port: 10002
    protocol: TCP
    targetPort: 10002
  selector:
    app: akka
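
For completeness, a minimal sketch of the application-side bootstrap settings this setup assumes (standard akka-management cluster-bootstrap configuration; not part of the original comment):

# application.conf (HOCON): discover contact points via the Kubernetes API
akka.management.cluster.bootstrap {
  contact-point-discovery {
    discovery-method = kubernetes-api
  }
}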

@ktoso
Member

ktoso commented Jun 19, 2018

Interesting, thanks for sharing!
I hope to get time to focus on these things in the upcoming weeks to really confirm what the story is here. In the meantime, all contribs to docs would be very lovely, thanks a ton!

@idoshamun
Author

@thomschke Thanks for sharing, I'll try to give it a go in the upcoming days and let you know how it goes

@thomschke
Contributor

I forgot to mention that my approach has two side effects:

  • Within a k8s cluster, Istio Pilot observes service endpoints to manage the proxy rules. If you configure readiness probes for containers, these endpoints become visible only once the readiness probe succeeds. Therefore, incoming calls are passed only when the readiness probe is ready. You can tune the failureThreshold of the readiness probe (see) and the failure detector of the Akka cluster (see).

  • The internal ServiceEntry is a sledgehammer approach, because it allows outgoing calls to all pods in the namespace on the configured ports. But only outgoing calls are affected! Either live with it or additionally define network policies (see); a sketch follows below.
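
For the second point, here is a minimal sketch of such a NetworkPolicy (illustrative only: the policy name is made up, and the selectors reuse the app: akka label from the manifests above). It restricts the Akka ports so that only peer pods can reach them:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: akka-peers-only
spec:
  podSelector:
    matchLabels:
      app: akka
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: akka
    ports:
    - protocol: TCP
      port: 10001
    - protocol: TCP
      port: 10002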

@idoshamun
Author

I am waiting for #217 to be merged and released so I can evaluate your solution.

ktoso pushed a commit that referenced this issue Jun 20, 2018
@ktoso ktoso added this to the 0.15.0 milestone Jun 20, 2018
@ktoso ktoso closed this as completed Jun 20, 2018
@idoshamun
Author

@thomschke your solution works perfectly. One question though: is there any way to set a readiness probe? It's very important because the services take time to boot and I don't want them to receive traffic before they're ready.

@thomschke
Contributor

If you are using sbt-reactive-app (see), then you get health/readiness checks OOTB :-)
Either you generate your k8s deployment (see) or you define it manually:

readinessProbe:
  httpGet:
    path: "/platform-tooling/ready"
    port: "akka-mgmt-http"
    periodSeconds: ...
livenessProbe:
  httpGet:
    path: "/platform-tooling/healthy"
    port: "akka-mgmt-http"
    periodSeconds: ...
    initialDelaySeconds: ...

Or you can write your own checks (see) and register them (see).
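
For reference, a sketch of what registering a custom readiness check looks like in akka-management's health-check configuration (the class name is hypothetical; the config path follows the akka-management docs):

# application.conf (HOCON)
akka.management.health-checks {
  readiness-checks {
    # the key is a free-form label; the value is the FQCN of the check class
    my-check = "com.example.MyReadinessCheck"
  }
}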

@idoshamun
Author

Yes, I am using sbt-reactive-app, but the problem is that if the readiness check isn't positive, the Akka cluster can't be formed. I think the sbt-reactive-app default readiness check requires the cluster to be formed, so it's kind of a deadlock, right? Or did I miss something...?

@thomschke
Contributor

An important point worth mentioning:

Under Istio, ingress traffic is passed through the sidecar proxy. For this reason, all application sockets have to bind to 127.0.0.1 (or 0.0.0.0) and NOT to the POD_IP:

- akka.remote.netty.tcp.hostname=${POD_IP//./-}.default.pod.cluster.local
- akka.remote.netty.tcp.bind-hostname=127.0.0.1
- akka.management.http.hostname=${POD_IP//./-}.default.pod.cluster.local
- akka.management.http.bind-hostname=127.0.0.1

If you are using sbt-reactive-app and generate your k8s deployment with reactive-cli, then you have to patch the deployment:

spec:
  template:
    spec:
      containers:
      - name: ...
        env:
        - name: "RP_DYN_JAVA_OPTS"
          value: "-Dakka.discovery.kubernetes-api.pod-namespace=$RP_NAMESPACE -Dakka.remote.netty.tcp.hostname=${RP_KUBERNETES_POD_IP//./-}.default.pod.cluster.local -Dakka.remote.netty.tcp.bind-hostname=127.0.0.1 -Dakka.management.http.hostname=${RP_KUBERNETES_POD_IP//./-}.default.pod.cluster.local -Dakka.management.http.bind-hostname=127.0.0.1"

@idoshamun
Author

I've already configured everything regarding the binding and hostname; I use Helm charts for my deployments based on the reactive-cli template. The only difference is that I use only 2 replicas. I'll try to change that. Thanks!

@thomschke
Contributor

Sorry, I have to correct myself:

There is an issue #236 and it's fixed :-)
Try v0.16.0+; it should work with only 2 replicas.

@sebarys

sebarys commented Sep 19, 2018

Hello,
Has anyone been able to use this approach with mutual TLS enabled in Istio? I'm having trouble configuring it...
I'd be grateful for any suggestions.

@mtalbot

mtalbot commented Nov 5, 2018

I think the mutual-auth failure is caused by the bootstrap attempting to call itself: the request doesn't match the local egress cluster that enables TLS, but instead matches the ingress cluster, which expects TLS and drops the connection. This seems to be a flaw in the definition of the clusters and rules in Istio, but one that's not easy to resolve. It only happens when one of the TLS modes is enabled.

@chbatey
Member

chbatey commented Nov 6, 2018

I don't think bootstrap should need to probe itself. Raised #376.

adarshaj added a commit to hackcave/akka-simple-cluster-k8s that referenced this issue Nov 20, 2018
* Update to latest version of akka-management
* Include workarounds to make cluster bootstrap work with istio as documented in akka/akka-management#209 (comment)
@mtalbot

mtalbot commented Feb 5, 2019

So there is a workaround for this: if you disable TLS only on the read-only management port, the bootstrapping will work correctly. This can be done with a permissive policy:

apiVersion: "authentication.istio.io/v1alpha1"
kind: "Policy"
metadata:
  name: {{.Release.Name}}-{{.Values.akka.cluster.role}}-health
  namespace: {{.Release.Namespace}}
spec:
  targets:
    - name: {{.Release.Name}}-{{.Values.akka.cluster.role}}-akka
      ports:
      - number: {{.Values.ports.health}}
  peers:
    - mtls:
        mode: PERMISSIVE

@jroper
Contributor

jroper commented Mar 25, 2019

There is another significant pitfall to this configuration: nodes rarely, if ever, leave the cluster gracefully, e.g. when scaling down or doing a rolling upgrade. The reason is that when the pod is stopped, both the Istio sidecar and the Akka cluster node are stopped simultaneously. The Istio sidecar stops almost immediately, which means the remoting traffic immediately stops, and that means the Akka node cannot attempt to leave the cluster. That failed attempt will actually delay stopping the pod until it times out; meanwhile, from the point of view of all the other nodes in the cluster, it becomes unreachable and stays in that state until a downing strategy kicks in to down it. This makes rolling upgrades in particular quite clunky.

@raboof
Member

raboof commented Mar 25, 2019

when the pod is stopped, both the Istio sidecar and the Akka cluster node are stopped simultaneously. The Istio sidecar stops almost immediately, which means the remoting traffic immediately stops, and that means that the Akka node cannot attempt to leave the cluster

Hmm, that is a pretty serious limitation. Is there any way to postpone the sidecar termination until after the node has successfully left?

@thomschke
Contributor

thomschke commented Mar 25, 2019

We have to wait for k8s container dependencies (kubernetes/enhancements#753).

In the meantime you can use a preStop hook, but this is a hack and prevents us from using automatic sidecar injection.
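
For illustration, a minimal sketch of that hack (it assumes a manually injected sidecar, since the istio-proxy container spec has to be edited by hand, and the sleep duration is an arbitrary guess at how long a graceful leave takes):

# pod spec excerpt: preStop hook on the manually injected istio-proxy container
- name: istio-proxy
  lifecycle:
    preStop:
      exec:
        # keep the sidecar alive long enough for the Akka node to
        # leave the cluster before remoting traffic is cut off
        command: ["sh", "-c", "sleep 20"]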

BTW: this is a good example of why you need a good downing strategy, especially during rolling updates, because there is only a small window of time to react :-)
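
As a pointer, a sketch of configuring such a downing strategy (this uses the Split Brain Resolver that later shipped with Akka 2.6; the chosen strategy is illustrative, not from the original comment):

# application.conf (HOCON)
akka.cluster {
  downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
  split-brain-resolver.active-strategy = keep-majority
}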

@jroper
Contributor

jroper commented Mar 25, 2019

That container dependency feature looks like it will solve the issue perfectly for us. By the way, thanks @thomschke for the instructions posted here; I don't know how long it would have taken me to get a cluster bootstrapping in Istio without them. I'm hoping we (@lightbend) will be able to publish some documentation on doing this soon, and we're also talking to IBM, Red Hat and Google to see whether we can work with them to get better first-class support for the Akka use case (and any other technologies that need pod-to-pod communication) in Istio.

@jroper
Contributor

jroper commented May 9, 2019

I have done some work on this recently.

  • So firstly, by default, all incoming traffic in Istio is allowed. Only those ports that are registered in the pod spec are redirected through Envoy. So, as long as you don't list the remoting and management ports in the pod spec, incoming connections to them will succeed without going through Envoy. Of course, the kubernetes-api discovery mechanism requires the management port to be specified, so that requirement needs to be dropped to take advantage of this.
  • It's possible to run Istio in a mode where everything is redirected through Envoy, but in that case there's an escape hatch: simply add the traffic.sidecar.istio.io/excludeInboundPorts: "2552,8558" annotation to your pod.
  • Outgoing traffic, of course, is another issue: it is all routed through Envoy. I have a solution for that; I've added support for a traffic.sidecar.istio.io/excludeOutboundPorts: "2552,8558" annotation here. I haven't submitted a PR for that yet, but will once I've written tests etc. But testing locally, it works.

So with the above configured, and assuming we can fix the kubernetes-api discovery so that it just looks up IP addresses rather than requiring the port to be configured, we will be able to run an Akka cluster easily in Istio with only those two annotations added. Nothing else would be different from normal Akka cluster development, and the traffic wouldn't go through Envoy, which means the order of container shutdown wouldn't matter, so nodes can exit the cluster gracefully, and readiness checks can be used. On the minus side, remoting can't take advantage of transparent mTLS, but that's probably fine; it should use another mechanism for configuring mTLS. Cluster traffic is not service mesh traffic, and routing it through Envoy is likely to adversely impact performance.

@jroper
Contributor

jroper commented May 10, 2019

I've made the following pull requests:

With these two patches applied, I have been able to establish an Akka cluster in Istio, using the standard Kubernetes API cluster bootstrap mechanism, just by adding the following annotations to my pod template:

      annotations:
        traffic.sidecar.istio.io/excludeInboundPorts: "2552,8558"
        traffic.sidecar.istio.io/excludeOutboundPorts: "2552,8558"

And by not declaring the management port. Readiness checks are implemented and work.

An example working project (though needs to be upgraded to new Akka management once the PR is merged) can be found here:

https://github.com/jroper/akka-grpc-ddata-shopping-cart/tree/istio-future

@jroper
Contributor

jroper commented May 21, 2019

Hi all, Akka Management 1.0.1 has been released with the changes, and the Istio PR has been merged into master. Istio 1.2 is scheduled to be released on June 19, I believe, so if anyone on this thread could test it before then, that would be really helpful. There are instructions here for installing the latest Istio nightly build:

https://github.com/istio/installer

Alternatively, for a very basic setup, simply run the following on a fresh Kubernetes cluster:

kubectl apply -f https://gist.githubusercontent.com/jroper/9d1aa662ea166bdea1f969edd74e34c4/raw/8f8aefe0d2b4cb130177c253983bbe2f30fc4605/istio.yaml

This will install my own build of Istio master, published to Docker Hub.

@thomschke
Contributor

Hi @jroper, I tested my Lagom services under Istio 1.2.0 as you described, with akka-management 1.0.1 and the excludeOutboundPorts annotation, without any problems. GJ!!!

@jroper
Contributor

jroper commented Jun 25, 2019

Thanks @thomschke! To complete the circle, I've added documentation here.

@TimMoore

The documentation is now published at https://doc.akka.io/docs/akka-management/current/bootstrap/istio.html

@ghandim

ghandim commented Jul 19, 2019

Hi @jroper, I tested Eclipse Ditto 0.9.0 with Istio 1.2.2, Akka Management 1.0.1, and the K8s annotations you described. Works without any problems 👍

@vvavepacket

vvavepacket commented Jul 2, 2020

Hi @jroper, it looks like this is no longer working with Akka Management 1.0.8 and Istio 1.6.

I have the following config in my pod:

          ports:
            - name: management
              containerPort: 8558 
              protocol: TCP
            - name: live
              containerPort: 9000
          readinessProbe:
            httpGet:
              path: "/ready"
              port: management
            periodSeconds: 10
            failureThreshold: 10
            initialDelaySeconds: 20
            timeoutSeconds: 1
          livenessProbe:
            httpGet:
              path: "/alive"
              port: management
            periodSeconds: 10
            failureThreshold: 10
            initialDelaySeconds: 20
            timeoutSeconds: 1

And I have added the following annotations on my pod:

      annotations:
        traffic.sidecar.istio.io/includeInboundPorts: "9000"
        traffic.sidecar.istio.io/excludeOutboundPorts: "2552,8558"

I am getting probe failures:

2020-07-02T03:16:51.019635Z     error   Request to probe app failed: Get "http://localhost:8558/ready": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/readyz
app URL path = /ready
2020-07-02T03:16:57.829163Z     error   Request to probe app failed: Get "http://localhost:8558/alive": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/livez
app URL path = /alive
2020-07-02T03:17:01.020098Z     error   Request to probe app failed: Get "http://localhost:8558/ready": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/readyz
app URL path = /ready
2020-07-02T03:17:07.832858Z     error   Request to probe app failed: Get "http://localhost:8558/alive": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/livez
app URL path = /alive
2020-07-02T03:17:11.019798Z     error   Request to probe app failed: Get "http://localhost:8558/ready": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/readyz
app URL path = /ready
2020-07-02T03:17:17.828587Z     error   Request to probe app failed: Get "http://localhost:8558/alive": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/livez
app URL path = /alive
2020-07-02T03:17:21.019642Z     error   Request to probe app failed: Get "http://localhost:8558/ready": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/readyz
app URL path = /ready
2020-07-02T03:17:27.829042Z     error   Request to probe app failed: Get "http://localhost:8558/alive": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/livez
app URL path = /alive
2020-07-02T03:17:31.019802Z     error   Request to probe app failed: Get "http://localhost:8558/ready": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/readyz
app URL path = /ready
2020-07-02T03:17:51.019767Z     error   Request to probe app failed: Get "http://localhost:8558/ready": dial tcp 127.0.0.1:8558: connect: connection refused, original URL path = /app-health/miniapp/readyz
app URL path = /ready

If I try to access the probe directly, it returns 200 OK:

curl http://10.244.1.231:8558/ready
OK

I have created a bug on Istio.
