This repository serves as a guide for the Amazon EKS Networking Bootcamp. Throughout this bootcamp, which focuses on EKS networking, you'll delve into different networking solutions and tackle IP exhaustion concerns. Additionally, you'll gain proficiency in crafting and enforcing network policies at the workload level, safeguarding your applications with enhanced security measures.
Amazon EKS runs upstream Kubernetes, so you can install alternate compatible CNI plugins to Amazon EC2 nodes in your cluster.
Important
If you have Fargate nodes in your cluster, the Amazon VPC CNI plugin for Kubernetes is already on your Fargate nodes. It's the only CNI plugin you can use with Fargate nodes. An attempt to install an alternate CNI plugin on Fargate nodes fails.
Amazon EKS maintains relationships with a network of partners that offer support for alternate compatible CNI plugins. For this workshop we will focus on Calico options for EKS.
The geeky details of what you get:
Policy | IPAM | CNI | Overlay | Routing | Datastore |
---|---|---|---|---|---|
Calico | AWS | AWS | No | VPC Native | Kubernetes |
Calico network policy enforcement on an EKS cluster using the AWS VPC CNI plugin.
The geeky details of what you get:
Policy | IPAM | CNI | Overlay | Routing | Datastore |
---|---|---|---|---|---|
Calico | Calico | Calico | VXLAN | Calico | Kubernetes |
Deploying Calico as the Container Network Interface (CNI), IP Address Management (IPAM), and for network policy enforcement brings numerous advantages to EKS. It not only resolves issues related to IP exhaustion but also enables additional features like the Calico Egress Gateway.
Calico CNI is designed to offer users the flexibility to select the most suitable dataplane option according to their needs. Calico supports several dataplanes, including:
- Standard Linux (iptables)
- eBPF (Extended Berkeley Packet Filter)
- Windows HNS (Host Networking Service)
- VPP (Vector Packet Processing )
Before diving into the installation of Calico in EKS and configuring various dataplane options, let's first set up our AWS CloudShell environment for seamless execution of commands and configurations.
To begin the bootcamp, you'll need to meet the following basic requirements:
- AWS Account AWS Console
- AWS CloudShell https://portal.aws.amazon.com/cloudshell
- Amazon EKS Cluster - to be created later in this bootcamp.
-
Login to AWS Portal at https://portal.aws.amazon.com.
-
Open the AWS CloudShell.
-
Install the bash-completion on the AWS CloudShell.
sudo yum -y install bash-completion
-
Configure the kubectl autocomplete.
# set up autocomplete in bash into the current shell, bash-completion package should be installed first. source <(kubectl completion bash) # add autocomplete permanently to your bash shell. echo "source <(kubectl completion bash)" >> ~/.bashrc
You can also use a shorthand alias for kubectl that also works with completion:
alias k=kubectl complete -o default -F __start_kubectl k echo "alias k=kubectl" >> ~/.bashrc echo "complete -o default -F __start_kubectl k" >> ~/.bashrc /bin/bash
-
Install the eksctl - Installation instructions
mkdir ~/.local/bin curl --silent --location "https://github.com/weaveworks/eksctl/releases/latest/download/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp sudo mv /tmp/eksctl ~/.local/bin && eksctl version
-
Install the K9S, if you like it.
curl --silent --location "https://github.com/derailed/k9s/releases/download/v0.32.5/k9s_Linux_amd64.tar.gz" | tar xz -C /tmp sudo mv /tmp/k9s ~/.local/bin && k9s version
-
Clone this repository in your Azure Cloud Shell.
git clone https://github.com/tigera-solutions/eks-networking-bootcamp.git && \ cd eks-networking-bootcamp
This module provisions an EKS cluster with AWS VPC CNI and connects it to Calico Cloud.
- Define the environment variables to be used by the resources definition.
Note
In this section, we'll create some environment variables. If your terminal session restarts, you may need to reset these variables. You can do that using the following command:
source ~/labVars.env
# Feel free to use the cluster name and the region that better suits you.
export CLUSTERNAME1=eks-calico
export REGION=us-west-2
export VERSION=1.29
export INSTANCETYPE=m5.xlarge
# Persist for later sessions in case of disconnection.
echo "# Start EKS Bootcamp Lab Params" > ~/labVars.env
echo export CLUSTERNAME1=$CLUSTERNAME1 >> ~/labVars.env
echo export REGION=$REGION >> ~/labVars.env
echo export VERSION=$VERSION >> ~/labVars.env
echo export INSTANCETYPE=$INSTANCETYPE >> ~/labVars.env
-
Create the EKS cluster.
eksctl create cluster \ --name $CLUSTERNAME1 \ --region $REGION \ --version $VERSION \ --node-type $INSTANCETYPE
-
Verify you have API access to your new EKS cluster
kubectl get nodes
Tip
If you loose access to your EKS cluster, you can update the kubeconfig to restore the kubectl access to it by running the following command:
source ~/labVars.env
aws eks update-kubeconfig --name $CLUSTERNAME1 --region $REGION
After provisioning the EKS cluster, the PODs will be networked using Amazon VPC CNI, enabling routable IPs. For enhanced security and observability features offered by Calico, you have the option to connect your cluster to Calico Cloud or install Calico Enterprise on it.
Follow instructions in the official Calico Cloud documentation to connect your EKS cluster to Calico Cloud management plane.
During this bootcamp, we will use the Cat Facts Application
as an example to work on. In this module, we will learn how to use Calico to implement the workload-level network policies for each of the workloads.
-
Install the example application stack:
From the cloned directory, execute:
kubectl apply -f pre
Included in the pre
folder, there is the application that will be used in the exercises during the workshop. The diagram below shows how the applications' microservices communicate between themselves.
There are also other objects that will be created. We will learn about them later.
Important
Wait until all the pods are up and running to move to the next step.
Connect to Calico Cloud GUI. From the menu, select Service Graph > Default
. Explore the options.
A global default deny policy ensures that unwanted traffic (ingress and egress) is denied by default. Pods without policy (or incorrect policy) are not allowed traffic until the appropriate network policy is defined. Although the staging policy tool will help you find the incorrect or the missing policy, a global deny policy helps mitigate other lateral malicious attacks.
By default, all traffic is allowed between the pods in a cluster. Let's start by testing connectivity between application components and across application stacks. All of these tests should succeed as there are no policies in place.
We recommend creating a global default deny policy after you complete reviewing the policy for the traffic you want to allow. Use the stage policy feature to get your allowed traffic working as expected, then lock down the cluster to block unwanted traffic.
-
Create a staged global default-deny policy. It will show all the traffic that would be blocked if it were enforced.
- Go to the
Policies Board
- On the bottom of the tier box
default
click onAdd Policy
- In the
Create Policy
page enter the policy name:default-deny
- On the
Applies To
session, clickAdd Namespace Seletor
First, lets apply only to thecatfacts
namespace- Select Key...
kubernetes.io/metadata.name
- =
- Select Value...
catfacts
- Select Key...
- On the field
Type
select both checkboxes: Ingress and Egress. - You are done. Click
Stage
on the top-right of your page.
- In the
The staged policy does not affect the traffic directly but allows you to view the policy impact if it were to be enforced. You can see the denied traffic in the staged policy.
- Go to the
Calico Security Policies provide a richer set of policy capabilities than the native Kubernetes network policies, including:
- Policies that can be applied to any endpoint: pods/containers, VMs, and/or to host interfaces
- Policies that can define rules that apply to ingress, egress, or both
- Policy rules support:
- Actions: allow, deny, log, pass
- Source and destination match criteria:
- Ports: numbered, ports in a range, and Kubernetes named ports
- Protocols: TCP, UDP, ICMP, SCTP, UDPlite, ICMPv6, protocol numbers (1-255)
- HTTP attributes
- ICMP attributes
- IP version (IPv4, IPv6)
- IP or CIDR
- Endpoint selectors (using label expression to select pods, VMs, host interfaces, and/or network sets)
- Namespace selectors
- Service account selectors
-
Based on the application design, the
db
lists on port3306
and receive connections from theworker
and thefacts
microservices. Let's use the Calico Cloud UI to create a policy to microsegment this traffic.- Go to the
Policies Board
- On the bottom of the tier box
platform
click onAdd Policy
- In the
Create Policy
page enter the policy name:db
- Change the
Scope
fromGlobal
toNamespace
and select the namespacecatfacts
- On the
Applies To
session, clickAdd Label Seletor
- Select Key...
app
- =
- Select Value...
db
- Click on
SAVE LABEL SELECTOR
- Select Key...
- On the field
Type
select the checkbox for Ingress. - Click on
Add Ingress Rule
- On the
Create New Policy Rule
window,- Click on the
dropdown
withAny Protocol
- Change the radio button to
Protocol is
and selectTCP
- In the field
To:
click onAdd Port
Port is
3306 - Save- In the field
From:
, clickAdd Label Seletor
- Select Key...
app
- =
- Select Value...
worker
- Click on
SAVE LABEL SELECTOR
- On
OR
, clickAdd Label Seletor
- Select Key...
app
- =
- Select Value...
facts
- Click on
SAVE LABEL SELECTOR
- Select Key...
- Click on the button
Save Rule
- Click on the
- On the
- You are done. Click
Enforce
on the top-right of your page.
- In the
- Go to the
-
Now, let's use the
Recommend a Policy
feature to create the policies for the other workloads.Let's start with the
facts
workload. -
Now that you have learned how to create policies using Calico Cloud UI, go ahead and create microsegmentation policies for the
worker
workload.
Tip
- Look into the Cat Facts application diagram to figure out all the communications to and from the
worker
microservice. - You can use the
Recommend a Policy
feature as well. Don't forget to edit the recommendations to make it as granular as possible, like restricting the domain names of the API's that the worker needs to communicate with (dog.ceo
andcatfact.ninja
).
If you create all the policies correctly, at some point, you will start seeing zero traffic being denied by your default-deny staged policy. At that point, you can go ahead and enforce the default-deny policy. Voilà! The catfacts
namespace is now secure.
Tip
- After enforcing the default-deny policy, if you need to troubleshoot, use the Service Graph or the Flow Visualizations tools to see what traffic is being blocked.
First, we should delete the application to release the load balancer resource associated with it. Once the application is removed, we can proceed with deleting the EKS cluster.
-
Delete the applications stack to clean up any loadbalancer services.
kubectl delete -f pre/40-catfacts-app.yaml
-
Delete EKS cluster.
source ~/labVars.env eksctl delete cluster --name $CLUSTERNAME1 --region $REGION
🎉 Congratulations! You've completed the Amazon EKS Networking Bootcamp.
If you have the time and are still eager to learn, consider exploring the following bonus modules!
- Bonus Module 1: EKS with Calico CNI
- Bonus Module 2: EKS with Windows nodegroup and Calico for Policy
In this module, the EKS cluster is provisioned and configured to use Calico CNI in eBPF mode. With Calico in place, it will handle Policy, IPAM, and CNI responsibilities.
- Define the environment variables to be used by the resources definition.
Note
In this section, we'll create some environment variables. If your terminal session restarts, you may need to reset these variables. You can do that using the following command:
source ~/labVars.env
# Feel free to use the cluster name and the region that better suits you.
export CLUSTERNAME2=eks-calico-ebpf
# Persist for later sessions in case of disconnection.
echo export CLUSTERNAME2=$CLUSTERNAME2 >> ~/labVars.env
- Create the EKS cluster.
Note
If you intend to use Wireguard
based encryption to security pod-to-pod communications, the EKS cluster must be provisioned to use AMI that includes Wireguard
bits. One of such AMI's is Bottlerocket
that is used in this example.
eksctl create cluster \
--name $CLUSTERNAME2 \
--region $REGION \
--version $VERSION \
--without-nodegroup
-
Verify you have API access to your new EKS cluster
kubectl get pod -A
Tip
If you loose access to your EKS cluster, you can update the kubeconfig to restore the kubectl access to it by running the following command:
source ~/labVars.env
aws eks update-kubeconfig --name $CLUSTERNAME2 --region $REGION
-
Uninstall the AWS VPC CNI and install Calico CNI.
To install Calico CNI, the first step is to remove the AWS VPC CNI, and then proceed with the installation of Calico. For further information about Calico CNI installation on AWS EKS, please refer to the Project Calico documentation
Delete the
aws-node
daemonset resource to uninstall AWS VPN CNIkubectl delete daemonset -n kube-system aws-node
-
Install Calico CNI
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.28.2/manifests/tigera-operator.yaml
-
Create the installation configuration.
kubectl create -f - <<EOF kind: Installation apiVersion: operator.tigera.io/v1 metadata: name: default spec: kubernetesProvider: EKS cni: type: Calico calicoNetwork: bgp: Disabled linuxDataplane: BPF hostPorts: Disabled EOF
-
Check data plane configuration
kubectl get installation.operator.tigera.io default -ojsonpath='{.spec.calicoNetwork.linuxDataplane}'
-
Create the nodegroup and the nodes. Two nodes are enough to demonstrate the concept.
eksctl create nodegroup \ --cluster $CLUSTERNAME2 \ --region $REGION \ --node-type $INSTANCETYPE \ --node-ami-family Bottlerocket \ --max-pods-per-node 100
Wait for all Calico components to get deployed and only then proceed to next step.
kubectl get tigerastatus
-
Configure Calico to talk directly to the API server
-
Get config values from EKS cluster settings
APISERVER_ADDR=$(kubectl get configmap -n kube-system kube-proxy -o yaml | grep server | awk -F "/" '{print $3}') APISERVER_PORT=$(kubectl get endpoints kubernetes -ojsonpath='{.subsets[0].ports [0].port}')
-
Configure kubernetes-services-endpoint configmap
kubectl apply -f - <<-EOF kind: ConfigMap apiVersion: v1 metadata: name: kubernetes-services-endpoint namespace: tigera-operator data: KUBERNETES_SERVICE_HOST: "${APISERVER_ADDR}" KUBERNETES_SERVICE_PORT: "${APISERVER_PORT}" EOF
-
-
Disable
kube-proxy
In eBPF mode Calico replaces kube-proxy so it doesn't make sense to run both. Disable the kube-proxy in the cluster.
kubectl patch ds -n kube-system kube-proxy -p '{"spec":{"template":{"spec":{"nodeSelector":{"non-calico": "true"}}}}}'
- [optional] Enable DSR mode
Direct return mode skips a hop through the network for traffic to services (such as node ports) from outside the cluster. This reduces latency and CPU overhead but it requires the underlying network to allow nodes to send traffic with each other's IPs. In AWS, this requires all your nodes to be in the same subnet and for the source/dest check to be disabled.
DSR mode is disabled by default; to enable it, set the BPFExternalServiceMode Felix configuration parameter to "DSR". This can only be done with calicoctl:
-
Use the following command to download the calicoctl binary.
curl -L https://github.com/projectcalico/calico/releases/download/v3.28.2/calicoctl-linux-amd64 -o kubectl-calico
-
Set the file to be executable.
chmod +x kubectl-calico
-
Configure Calico to use DSR
calicoctl patch felixconfiguration default --patch='{"spec": {"bpfExternalServiceMode": "DSR"}}'
If you want to learn more about eBPF dataplane, here are some links:
- Introducing the Calico eBPF dataplane
- eBPF Explained: Use Cases, Concepts, and Architecture
- eBPF XDP: The Basics and a Quick Tutorial
- Get started with XDP
With Calico CNI you avoid dependencies on AWS VPC CNI. Allocating pod IPs from the underlying VPC is problematic due to IP address range exhaustion challenges, or if the maximum number of pods supported per node by the Amazon VPC CNI plugin is not sufficient for your needs, we recommend using Calico networking in cross-subnet overlay mode. Pod IPs will not be routable outside of the cluster, but you can scale the cluster up to the limits of Kubernetes with no dependencies on the underlying cloud network.
Important
Before deleting the EKS cluster, it's essential to uninstall any sample application and release the load balancer resources allocated to it. This ensures that no resources are left dangling and allows for a clean deletion of the cluster.
-
Delete EKS cluster.
source ~/labVars.env eksctl delete cluster --name $CLUSTERNAME2 --region $REGION
In this module EKS cluster is provisioned with AWS VPC CNI option where each pod gets a routable IP assigned from the Amazon EKS VPC. A Windows nodegroup is also added to the cluster. Calico plugin is installed for network policy enforcement.
- Define the environment variables to be used by the resources definition.
Note
In this section, we'll create some environment variables. If your terminal session restarts, you may need to reset these variables. You can do that using the following command:
source ~/labVars.env
# Feel free to use the cluster name and the region that better suits you.
export CLUSTERNAME3=eks-windows
# Persist for later sessions in case of disconnection.
echo export CLUSTERNAME3=$CLUSTERNAME3 >> ~/labVars.env
-
Create the EKS cluster.
eksctl create cluster \ --name $CLUSTERNAME3 \ --region $REGION \ --version $VERSION \ --node-type $INSTANCETYPE
-
Verify you have API access to your new EKS cluster
kubectl get nodes
-
Tip
If you loose access to your EKS cluster, you can update the kubeconfig to restore the kubectl access to it by running the following command:
source ~/labVars.env
aws eks update-kubeconfig --name $CLUSTERNAME3 --region $REGION
-
Enable Windows support for your Amazon EKS cluster
kubectl apply -f - <<EOF apiVersion: v1 kind: ConfigMap metadata: name: amazon-vpc-cni namespace: kube-system data: enable-windows-ipam: "true" EOF
-
Create the Windows nodegroup
eksctl create nodegroup \ --cluster=$CLUSTERNAME3 \ --region=$REGION \ --node-ami-family=WindowsServer2022CoreContainer
-
Get config values from EKS cluster settings
APISERVER_ADDR=$(kubectl get configmap -n kube-system kube-proxy -o yaml | grep server | awk -F "/" '{print $3}') APISERVER_PORT=$(kubectl get endpoints kubernetes -ojsonpath='{.subsets[0].ports[0].port}') SERVICE_CIDR=$(aws eks describe-cluster --name $CLUSTERNAME3 --region $REGION --query 'cluster.kubernetesNetworkConfig.serviceIpv4Cidr' --no-cli-pager | tr -d '"')
-
Configure kubernetes-services-endpoint configmap
kubectl apply -f - << EOF kind: ConfigMap apiVersion: v1 metadata: name: kubernetes-services-endpoint namespace: tigera-operator data: KUBERNETES_SERVICE_HOST: "${APISERVER_ADDR}" KUBERNETES_SERVICE_PORT: "${APISERVER_PORT}" EOF
-
Configure service CIDR in the Installation CR
kubectl patch installation default --type merge --patch="{\"spec\": {\"serviceCIDRs\": [\"${SERVICE_CIDR}\"], \"calicoNetwork\": {\"windowsDataplane\": \"HNS\"}}}"
-
Install kube-proxy on Windows nodes.
curl -L https://raw.githubusercontent.com/kubernetes-sigs/sig-windows-tools/master/hostprocess/calico/kube-proxy/kube-proxy.yml | sed "s/KUBE_PROXY_VERSION/v1.28.2/g" | kubectl apply -f -
-
Enable firewall rules for Prometheus
kubectl patch felixConfiguration default --type merge --patch '{"spec": {"windowsManageFirewallRules": "Enabled"}}'
Important
Before deleting the EKS cluster, it's essential to uninstall any sample application and release the load balancer resources allocated to it. This ensures that no resources are left dangling and allows for a clean deletion of the cluster.
-
Delete EKS cluster.
source ~/labVars.env eksctl delete cluster --name $CLUSTERNAME3 --region $REGION