Add TLS to Fleetautoscaler webhook service #476

Merged
merged 1 commit into from
Jan 10, 2019
189 changes: 189 additions & 0 deletions docs/create_webhook_fleetautoscaler.md
Expand Up @@ -4,6 +4,8 @@ This guide covers how you can create webhook fleet autoscaler policy.
The main difference from the Buffer policy is that the logic on how many target replicas you need is delegated to a separate pod.
This type of Autoscaler would send an HTTP request to the webhook endpoint every sync period (which is currently 30s) with a JSON body, and scale the target fleet based on the data that is returned.

# Chapter 1 Configuring HTTP fleetautoscaler webhook

## Prerequisites

It is assumed that you have read the instructions to [Create a Game Server Fleet](./create_fleet.md)
Expand Down Expand Up @@ -235,6 +237,193 @@ simple-udp-884fg-6q5sk Ready 35.247.117.202 7373
simple-udp-884fg-b7l58 Allocated 35.247.117.202 7766
```

### 8. Cleanup
You can delete the autoscaler service and associated resources with the following commands.

```
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/autoscaler-webhook/autoscaler-service.yaml
```


Removing the fleet:
```
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/simple-udp/fleet.yaml
```


# Chapter 2 Configuring HTTPS fleetautoscaler webhook with CA Bundle

⚠️⚠️⚠️ **This is currently a development feature and has not been released** ⚠️⚠️⚠️

## Objectives

Using TLS and a CA bundle, we can establish trusted communication between the Fleetautoscaler and the webhook that controls the size of the fleet (Replicas count). The webhook's certificate must be signed by the Certificate Authority provided in the fleetautoscaler yaml configuration file. This guards against the man-in-the-middle attacks that are possible when a plain HTTP connection is used to reach a webhook, which can be located inside or outside of our cluster.

Descriptions of the steps shared with Chapter 1 are omitted for simplicity; see the previous chapter for the details.

### 1. Deploy the fleet

Run a fleet in a cluster:
```
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/simple-udp/fleet.yaml
```

### 2. Create X509 Root and Webhook certificates

The procedure for generating a self-signed CA certificate is taken from [here](https://datacenteroverlords.com/2012/03/01/creating-your-own-ssl-certificate-authority/).

The first step is to create the private root key:
```
openssl genrsa -out rootCA.key 2048
```

The next step is to create a self-signed root certificate from this key:
```
openssl req -x509 -new -nodes -key rootCA.key -sha256 -days 1024 -out rootCA.pem
```

This will start an interactive script which will ask you for various bits of information. Fill it out as you see fit.

Every webhook on which you wish to install a trusted certificate will need to go through this process. First, just like with the root CA step, you’ll need to create a private key (different from the root CA key):
```
openssl genrsa -out webhook.key 2048
```

Once the key is created, generate the certificate signing request. When prompted for `Common Name (eg, fully qualified host name)`, enter the webhook's hostname, which is `autoscaler-tls-service.default.svc`:
```
openssl req -new -key webhook.key -out webhook.csr
```

Once that’s done, you’ll sign the CSR, which requires the CA root key:
```
openssl x509 -req -in webhook.csr -CA rootCA.pem -CAkey rootCA.key -CAcreateserial -out webhook.crt -days 500 -sha256
```
This generates the `webhook.crt` certificate.
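The check that the fleetautoscaler will later perform on this certificate can be sketched in Go. This is a minimal illustration, not the Agones implementation: `demoChain` and `verifyWebhookCert` are hypothetical helpers, the chain is built in memory rather than read from `rootCA.pem` and `webhook.crt`, and ECDSA keys are used only to keep the example fast. Note also that Go clients since Go 1.15 match hostnames against Subject Alternative Names rather than the Common Name, so the sketch sets `DNSNames` on the webhook certificate; whether a certificate carrying only a Common Name is accepted depends on the Go version the client is built with.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"time"
)

// newCert creates a certificate from tmpl, signed by parent using parentKey.
// Passing the template itself as parent produces a self-signed certificate.
func newCert(tmpl, parent *x509.Certificate, pub *ecdsa.PublicKey, parentKey *ecdsa.PrivateKey) (*x509.Certificate, error) {
	der, err := x509.CreateCertificate(rand.Reader, tmpl, parent, pub, parentKey)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

// demoChain (hypothetical helper) builds an in-memory root CA and a webhook
// certificate for host, mirroring the openssl steps above.
func demoChain(host string) (*x509.Certificate, *x509.Certificate, error) {
	notBefore := time.Now().Add(-time.Hour)
	notAfter := time.Now().Add(24 * time.Hour)

	caKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	caTmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example root CA"},
		NotBefore:             notBefore,
		NotAfter:              notAfter,
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	root, err := newCert(caTmpl, caTmpl, &caKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}

	leafKey, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, nil, err
	}
	leafTmpl := &x509.Certificate{
		SerialNumber: big.NewInt(2),
		Subject:      pkix.Name{CommonName: host},
		DNSNames:     []string{host}, // SAN entry used for hostname matching
		NotBefore:    notBefore,
		NotAfter:     notAfter,
		KeyUsage:     x509.KeyUsageDigitalSignature,
		ExtKeyUsage:  []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
	}
	leaf, err := newCert(leafTmpl, root, &leafKey.PublicKey, caKey)
	if err != nil {
		return nil, nil, err
	}
	return root, leaf, nil
}

// verifyWebhookCert returns nil only if leaf chains to root and is valid for host.
func verifyWebhookCert(root, leaf *x509.Certificate, host string) error {
	pool := x509.NewCertPool()
	pool.AddCert(root)
	_, err := leaf.Verify(x509.VerifyOptions{Roots: pool, DNSName: host})
	return err
}

func main() {
	const host = "autoscaler-tls-service.default.svc"
	root, leaf, err := demoChain(host)
	if err != nil {
		panic(err)
	}
	fmt.Println("expected host accepted:", verifyWebhookCert(root, leaf, host) == nil)
	fmt.Println("other host accepted:", verifyWebhookCert(root, leaf, "other.default.svc") == nil)
}
```

The hostname passed to the verification must match the host in the webhook URL, which is why the certificate is requested for `autoscaler-tls-service.default.svc`.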

Add a secret, which will later be mounted into the autoscaler-webhook-tls pod:
```
kubectl create secret tls autoscalersecret --cert=webhook.crt --key=webhook.key
```

You need to put the Base64-encoded contents of the root certificate into the `caBundle` field of your fleetautoscaler yaml configuration:
```
base64 -i ./rootCA.pem
```

Copy the output of the command above and, in your text editor, use it to replace the value of the `caBundle` field:
```
vim $GOPATH/src/agones.dev/agones/examples/webhookfleetautoscalertls.yaml
```
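Before applying the configuration, it can help to confirm that the pasted value really is Base64 wrapping a PEM certificate. The following Go sketch shows one way to do that; `checkCABundle` is a hypothetical helper written for this guide, not part of Agones.

```go
package main

import (
	"crypto/x509"
	"encoding/base64"
	"encoding/pem"
	"errors"
	"fmt"
	"os"
)

// checkCABundle decodes a Base64 string, as it would be pasted into the
// caBundle field, and returns the number of PEM certificates it contains.
func checkCABundle(b64 string) (int, error) {
	raw, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return 0, fmt.Errorf("not valid Base64: %w", err)
	}
	count := 0
	for {
		block, rest := pem.Decode(raw)
		if block == nil {
			break // no further PEM blocks
		}
		raw = rest
		if block.Type != "CERTIFICATE" {
			continue // skip keys or other PEM content
		}
		if _, err := x509.ParseCertificate(block.Bytes); err != nil {
			return count, fmt.Errorf("invalid certificate: %w", err)
		}
		count++
	}
	if count == 0 {
		return 0, errors.New("no PEM certificate found")
	}
	return count, nil
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: pass the Base64 caBundle value as the first argument")
		return
	}
	n, err := checkCABundle(os.Args[1])
	if err != nil {
		fmt.Println("caBundle rejected:", err)
		return
	}
	fmt.Println("certificates found:", n)
}
```

With GNU coreutils, `base64` wraps its output at 76 columns unless `-w 0` is given. Go's decoder ignores those newlines, but the value placed in the yaml file should be a single line.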

### 3. Deploy a Webhook service for autoscaling

Run the following command to create a service and a Webhook pod in a cluster:
```
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/autoscaler-webhook/autoscaler-service-tls.yaml
```

To check that it is running and that the liveness probe is passing:
```
kubectl describe pod autoscaler-webhook-tls
```

Wait until the output shows the Running status:
```
Name: autoscaler-webhook-tls-f74c9bff7-ssrsc
Namespace: default
...
Status: Running
```

### 4. Create a Fleet Autoscaler

Let's create a Fleet Autoscaler using the following command:

```
kubectl apply -f $GOPATH/src/agones.dev/agones/examples/webhookfleetautoscalertls.yaml
```

### 5. See the fleet and autoscaler status.

In order to track the list of gameservers which run in your fleet you can run this command in a separate terminal tab:

```
watch "kubectl get gs -n default"
```

### 6. Allocate two Game Servers from the Fleet to trigger scale up

If you're interested in more details for game server allocation, you should consult the [Create a Game Server Fleet](./create_fleet.md) page.
Here we are only interested in triggering allocations to see the autoscaler in action.

```
for i in {0..1} ; do kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/simple-udp/fleetallocation.yaml -o yaml ; done
```

### 7. Check new Autoscaler and Fleet status

Now let's wait a few seconds to allow the autoscaler to detect the change in the fleet, and then check its status again:

```
kubectl describe fleetautoscaler webhook-fleetautoscaler-tls
```

The last part should look similar to this:

```
Spec:
Fleet Name: simple-udp
Policy:
Type: Webhook
Webhook:
Ca Bundle: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN1RENDQWFBQ0NRQ29kcEFNbTlTd0pqQU5CZ2txaGtpRzl3MEJBUXNGQURBZU1Rc3dDUVlEVlFRR0V3SlYKVXpFUE1BMEdBMVVFQ3d3R1FXZHZibVZ6TUI0WERURTVNREV3TkRFeE5URTBORm9YRFRJeE1UQXlOREV4TlRFMApORm93SGpFTE1Ba0dBMVVFQmhNQ1ZWTXhEekFOQmdOVkJBc01Ca0ZuYjI1bGN6Q0NBU0l3RFFZSktvWklodmNOCkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFOQ0h5dndDOTZwZDlTdkFhMUIvRWg2ekcxeDBLS1dPaVhtNzhJcngKKzZ5WHd5YVpsMVo1cVExbUZoOThMSGVZUmQwWVgzRTJnelZ5bFpvUlUra1ZESzRUc0VzV0tNUFVpdVo0MUVrdApwbythbEN6alAyaXZzRGZaOGEvdnByL3dZZ2FrWGtWalBUaGpKUk9xTnFIdWROMjZVcUFJYnNOTVpoUkxkOVFFCnFLSjRPNmFHNVMxTVNqZFRGVHFlbHJiZitDcXNKaHltZEIzZmxGRUVvdXExSmoxS0RoQjRXWlNTbS9VSnpCNkcKNHUzY3BlQm1jTFVRR202ZlFHb2JFQSt5SlpMaEVXcXBrd3ZVZ2dCNmRzWE8xZFNIZXhhZmlDOUVUWGxVdFRhZwo1U2JOeTVoYWRWUVV3Z253U0J2djR2R0t1UUxXcWdXc0JyazB5Wll4Sk5Bb0V5RUNBd0VBQVRBTkJna3Foa2lHCjl3MEJBUXNGQUFPQ0FRRUFRMkgzaWJRcWYzQTNES2l1eGJISURkbll6TlZ2Z0dhRFpwaVZyM25ocm55dmxlNVgKR09hRm0rMjdRRjRWV29FMzZDTGhYZHpEWlM4bEpIY09YUW5KOU83Y2pPYzkxVmh1S2NmSHgwS09hU1oweVNrVAp2bEtXazlBNFdoNGE0QXFZSlc3Z3BUVHR1UFpydnc4VGsvbjFaWEZOYVdBeDd5RU5OdVdiODhoNGRBRDVaTzRzCkc5SHJIdlpuTTNXQzFBUXA0Q3laRjVyQ1I2dkVFOWRkUmlKb3IzM3pLZTRoRkJvN0JFTklZZXNzZVlxRStkcDMKK0g4TW5LODRXeDFUZ1N5Vkp5OHlMbXFpdTJ1aThjaDFIZnh0OFpjcHg3dXA2SEZLRlRsTjlBeXZUaXYxYTBYLwpEVTk1eTEwdi9oTlc0WHpuMDJHNGhrcjhzaUduSEcrUEprT3hBdz09Ci0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K
Service: <nil>
URL: https://autoscaler-tls-service.default.svc:8000/scale
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal AutoScalingFleet 5s fleetautoscaler-controller Scaling fleet simple-udp from 2 to 4
```

You can see that the fleet size has increased, in this particular case doubling to 4 gameservers (based on the custom logic in our webhook), as the autoscaler compensated for the two allocated instances.
The Last Scale Time has been updated and a scaling event has been logged.

Double-check the actual number of game server instances and status by running:

```
kubectl get gs -n default
```

This will get you a list of all the current `GameServers` and their `Status > State`.

```
NAME STATE ADDRESS PORT NODE AGE
simple-udp-njmr7-2t4nx Ready 35.203.159.68 7330 gke-test-cluster2-default-55044752-zt63 1m
simple-udp-njmr7-65rp6 Allocated 35.203.159.68 7294 gke-test-cluster2-default-55044752-zt63 4m
simple-udp-njmr7-klgtt Ready 35.203.159.68 7323 gke-test-cluster2-default-55044752-zt63 1m
simple-udp-njmr7-lglsr Allocated 35.203.159.68 7009 gke-test-cluster2-default-55044752-zt63 4m
```

### 8. Cleanup
You can delete the autoscaler service and associated resources with the following commands.

```
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/autoscaler-webhook/autoscaler-service-tls.yaml
```

Removing x509 key secret:
```
kubectl delete secret autoscalersecret
```

Removing the fleet:
```
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/agones/master/examples/simple-udp/fleet.yaml
```

## Comments

Note that secure communication is now established and we can trust our webhook. If we need to use a server outside of the Kubernetes cluster, we can use another root certificate authority and supply its certificate as the `caBundle` parameter in the fleetautoscaler configuration (in PEM format, Base64-encoded).

## Next Steps

Read the advanced [Scheduling and Autoscaling](scheduling_autoscaling.md) guide, for more details on autoscaling.
4 changes: 4 additions & 0 deletions docs/fleetautoscaler_spec.md
Expand Up @@ -66,6 +66,7 @@ The `spec` field is the actual `FleetAutoscaler` specification and it is compose
If not specified, the "default" would be used
- `path` is an optional URL path which will be sent in any request to this service. (i. e. /scale)
- `url` gives the location of the webhook, in standard URL form (`[scheme://]host:port/path`). Exactly one of `url` or `service` must be specified. The `host` should not refer to a service running in the cluster; use the `service` field instead. (optional, instead of service)
- `caBundle` (⚠️ **development feature** ⚠️) is a PEM-encoded certificate authority bundle, supplied as a Base64-encoded string. It identifies the authority that issued the webhook's server certificate and is used to validate that certificate. Required only for HTTPS. If not present, a plain HTTP client is used.

Note: only one `buffer` or `webhook` could be defined for FleetAutoscaler which is based on the `type` field.

Expand All @@ -77,6 +78,8 @@ Webhook endpoint is used to delegate the scaling logic to a separate pod or serv
The FleetAutoscaler sends a request to the webhook endpoint every sync period (which is currently 30s) with a JSON body, and scales the target fleet based on the data that is returned.
The JSON payload is a FleetAutoscaleReview data structure, which the webhook returns with the FleetAutoscaleResponse field populated. FleetAutoscaleResponse contains the target Replica count, which triggers scaling of the fleet to that size.

To define the path to your Webhook you can use either `url` or `service`. Note that the `caBundle` parameter is required if you use HTTPS for the webhook fleetautoscaler; omit `caBundle` if you want to use an HTTP webhook server.
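The effect of `caBundle` on the client side can be sketched as follows. This is an illustration under stated assumptions, not the Agones controller code: `newWebhookClient` is a hypothetical helper, the 15 second timeout is arbitrary, and `caPEM` stands for the already Base64-decoded bundle (a Go `[]byte` field is decoded automatically when it is read from JSON).

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"time"
)

// newWebhookClient returns a plain HTTP client when no CA bundle is given,
// and a client that trusts only the bundled CA when one is present.
func newWebhookClient(caPEM []byte) (*http.Client, error) {
	client := &http.Client{Timeout: 15 * time.Second}
	if len(caPEM) == 0 {
		return client, nil // no caBundle: default transport, suitable for HTTP
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("caBundle contains no usable PEM certificate")
	}
	client.Transport = &http.Transport{
		// Server certificates are validated against this pool only.
		TLSClientConfig: &tls.Config{RootCAs: pool},
	}
	return client, nil
}

func main() {
	plain, err := newWebhookClient(nil)
	fmt.Println("custom transport without caBundle:", plain.Transport != nil, "error:", err)

	_, err = newWebhookClient([]byte("not a certificate"))
	fmt.Println("error for invalid caBundle:", err)
}
```

Restricting `RootCAs` to the supplied bundle means a webhook certificate signed by any other authority is rejected, which is the trust relationship the `caBundle` field is meant to express.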

The connection to this webhook endpoint should be defined in `FleetAutoscaler` using Webhook policy type.

```go
Expand Down Expand Up @@ -122,6 +125,7 @@ type FleetStatus struct {
}
```

For the Webhook Fleetautoscaler Policy either HTTP or HTTPS can be used. The choice between them depends on the presence of https in `URL` or on the presence of `caBundle`.
An example of a webhook written in Go can be found [here](../examples/autoscaler-webhook/main.go).

It implements the [scaling logic](../examples/autoscaler-webhook/README.md) based on the percentage of allocated gameservers in a fleet.
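A percentage-based rule of this kind can be sketched as below. This is a simplified illustration, not a copy of the example's `main.go`: the `scaleConfig` type and `targetReplicas` function are hypothetical, and the field values mirror the `MIN_REPLICAS_COUNT`, `REPLICA_UPSCALE_TRIGGER`, `REPLICA_DOWNSCALE_TRIGGER` and `SCALE_FACTOR` settings used in the example deployment.

```go
package main

import (
	"fmt"
	"math"
)

// scaleConfig holds the thresholds for the percentage-based rule.
type scaleConfig struct {
	MinReplicas      int32
	UpscaleTrigger   float64 // scale up when allocated/replicas exceeds this
	DownscaleTrigger float64 // scale down when allocated/replicas falls below this
	ScaleFactor      float64
}

// targetReplicas reports whether the fleet should be resized and to what size.
func targetReplicas(cfg scaleConfig, replicas, allocated int32) (bool, int32) {
	if replicas <= 0 {
		// Nothing to compute a ratio from; fall back to the minimum size.
		return cfg.MinReplicas > 0, cfg.MinReplicas
	}
	ratio := float64(allocated) / float64(replicas)
	switch {
	case ratio > cfg.UpscaleTrigger:
		return true, int32(math.Ceil(float64(replicas) * cfg.ScaleFactor))
	case ratio < cfg.DownscaleTrigger && replicas > cfg.MinReplicas:
		target := int32(math.Ceil(float64(replicas) / cfg.ScaleFactor))
		if target < cfg.MinReplicas {
			target = cfg.MinReplicas
		}
		return true, target
	}
	return false, replicas
}

func main() {
	cfg := scaleConfig{MinReplicas: 2, UpscaleTrigger: 0.7, DownscaleTrigger: 0.3, ScaleFactor: 2}
	scale, target := targetReplicas(cfg, 2, 2)
	fmt.Println(scale, target)
}
```

With these settings, a fleet of 2 replicas that are both allocated has an allocation ratio of 1.0, which exceeds 0.7, so the target becomes 2 × 2 = 4.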
2 changes: 1 addition & 1 deletion examples/autoscaler-webhook/Makefile
Expand Up @@ -27,7 +27,7 @@ REPOSITORY = gcr.io/agones-images

mkfile_path := $(abspath $(lastword $(MAKEFILE_LIST)))
project_path := $(dir $(mkfile_path))
autoscaler_webhook_tag = $(REPOSITORY)/autoscaler-webhook:0.1
autoscaler_webhook_tag = $(REPOSITORY)/autoscaler-webhook:0.2
root_path = $(realpath $(project_path)/../..)

# _____ _
Expand Down
81 changes: 81 additions & 0 deletions examples/autoscaler-webhook/autoscaler-service-tls.yaml
@@ -0,0 +1,81 @@
# Copyright 2018 Google Inc. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Define a Service for the autoscaler-webhook

apiVersion: v1
kind: Service
metadata:
  name: autoscaler-tls-service
  labels:
    app: autoscaler-webhook-tls
spec:
  selector:
    app: autoscaler-webhook-tls
  ports:
  - port: 8000
    protocol: TCP
    name: https
    targetPort: autoscaler
---
# Deploy a pod to run the autoscaler-webhook code
apiVersion: apps/v1
kind: Deployment
metadata:
  name: autoscaler-webhook-tls
  namespace: default
  labels:
    app: autoscaler-webhook-tls
spec:
  replicas: 1
  selector:
    matchLabels:
      app: autoscaler-webhook-tls
  template:
    metadata:
      labels:
        app: autoscaler-webhook-tls
    spec:
      #serviceAccount: autoscaler-webhook
      volumes:
      - name: secret-volume
        secret:
          secretName: autoscalersecret
      containers:
      - name: autoscaler-webhook
        image: gcr.io/agones-images/autoscaler-webhook:0.2
        imagePullPolicy: Always
        volumeMounts:
        - mountPath: /home/service/certs
          name: secret-volume
        ports:
        - name: autoscaler
          containerPort: 8000
        livenessProbe:
          httpGet:
            scheme: HTTPS
            path: /health
            port: 8000
          initialDelaySeconds: 3
          periodSeconds: 5
        env:
        - name: MIN_REPLICAS_COUNT
          value: "2"
        - name: REPLICA_UPSCALE_TRIGGER
          value: "0.7"
        - name: REPLICA_DOWNSCALE_TRIGGER
          value: "0.3"
        - name: "SCALE_FACTOR"
          value: "2"

3 changes: 2 additions & 1 deletion examples/autoscaler-webhook/autoscaler-service.yaml
Expand Up @@ -50,7 +50,7 @@ spec:
#serviceAccount: autoscaler-webhook
containers:
- name: autoscaler-webhook
image: gcr.io/agones-images/autoscaler-webhook:0.1
image: gcr.io/agones-images/autoscaler-webhook:0.2
imagePullPolicy: Always
ports:
- name: autoscaler
Expand All @@ -62,3 +62,4 @@ spec:
port: 8000
initialDelaySeconds: 3
periodSeconds: 5
