- Certified Kubernetes Administrator(CKA)
- Useful Links
- Basic Kubernetes Architecture
- API Primitives (or) Cluster Objects
- Services & Network Primitives
- Designing a K8s Cluster
- Hardware and Underlying Infrastructure
- Securing Cluster Communications
- High Availability (HA)
- End-to-End Testing & Validation of Nodes & the Cluster
- Labels & Selectors
- Taints and Tolerations
- Manually Scheduling Pods
- Monitoring Cluster and Application Components (5% of the Exam)
- Cluster Maintenance (11% of the Exam)
- Networking (11%)
- Storage (7%)
- Security
- Liveness and Readiness Probes
- Troubleshooting
Infrastructure for container projects - LXC, LXD & LXCFS; further reading on the LXD and LXC container runtime technologies is highly recommended
Running Locally (non-Minikube)
Monitoring Kubernetes with Prometheus
- Master/Control Plane - cluster's control plane, either with an HA setup (or) as a single node instance
  - kube-apiserver - answers API calls
  - etcd - key/value store used by the API Server for configuration and other persistent storage needs
  - kube-scheduler - determines which nodes are responsible for PODS and their respective containers as they come up in the cluster
  - kube-controller-manager - component on the master that runs controllers. Logically, each controller is a separate process, but to reduce complexity, they are all compiled into a single binary and run in a single process.
These controllers include:
- Node Controller: Responsible for noticing and responding when nodes go down.
- Replication Controller: Responsible for maintaining the correct number of pods for every replication controller object in the system.
- Endpoints Controller: Populates the Endpoints object (that is, joins Services & Pods).
- Service Account & Token Controllers: Create default accounts and API access tokens for new namespaces.
  - cloud-controller-manager - split out into several containers depending on the cloud platform we are running on; responsible for persistent storage and routes for networking (see the sketch after this list)
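To see these control-plane components on a running cluster, something like the following works on a kubeadm-style setup (a quick sketch; the namespace and component layout assume a default kubeadm install):

```sh
# Control-plane components run as static pods in the kube-system namespace on kubeadm clusters
kubectl get pods -n kube-system -o wide

# Check the health of the scheduler, controller-manager and etcd
kubectl get componentstatuses
```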
- Nodes/Workers - workers/minions are the work force of the cluster
  - kubelet - takes orders from the master to run PODS
  - kube-proxy - assists with the Container Network Interface (CNI) by routing traffic around the cluster
  - POD - one (or) more containers run as part of a POD, which are considered disposable and replaceable
Persistent entities that represent the state of the cluster, like:
- What applications are running
- Which node those applications are running on
- Policies around those applications
K8s Objects are records of intent
- Spec - the user provides the desired state of an object
- Status - provided by K8s to describe the actual state of the object, which is managed by the K8s control plane
- Nodes - the machines which make up the cluster
- PODS - a single instance of an application and its containers
- Deployments - controller for PODS; it ensures that resources are available, such as IP address and storage, and then deploys a ReplicaSet (see the sketch after this list)
- ReplicaSet - controller which deploys and restarts PODS until the requested number of PODS are running
- Services - expose deployments to external networks by means of a 3rd-party LoadBalancer (or) ingress rules
- ConfigMaps - KV pairs that can be dynamically plugged in to other objects as needed, which allows decoupling configuration from individual PODS (or) Deployments and gives a lot of flexibility
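As a minimal illustration of an object as a "record of intent", here is a sketch of a Deployment manifest (the name, labels and image are hypothetical placeholders): the spec describes the desired state, and the control plane works to keep the actual status matching it.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web            # hypothetical name
  labels:
    app: web
spec:
  replicas: 3          # desired state: keep 3 PODS running via a ReplicaSet
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.15   # placeholder image/tag
        ports:
        - containerPort: 80
```

Applying it with `kubectl apply -f deployment.yaml` records the intent; `kubectl get deployment web` then shows the desired vs. actual state.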
- Names - lowercase alphanumeric, max length 253 characters; provided by the client, can be reused, but must be unique. BTW: - and . are allowed.
- UIDs - generated by K8s
- Namespaces
  - multiple virtual clusters backed by the same physical cluster
  - best for large deployments to isolate cluster resources and to define RBAC and quotas (see the sketch after this list)
  - kube-system - special namespace used to differentiate system PODS from user PODS
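A quick sketch of isolating a team into its own namespace with a quota (the namespace name and limits are made-up examples):

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: team-a            # hypothetical namespace
---
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a
spec:
  hard:
    pods: "20"            # example limits
    requests.cpu: "4"
    requests.memory: 8Gi
```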
- Route Controller (GCE clusters only)
- Service Controller - responsible for listening to service create/update/delete events and updating cloud load balancers like AWS ELB, Google LB, Azure LB to reflect the state of services in K8s
- Persistent Volume Labels Controller - applies labels to AWS EBS and GCE PD (persistent disk) volumes when they get created, which otherwise would have to be defined manually by the user. These labels are central to scheduling of PODS, as they constrain the volumes to work only in the region/zone they are in.
- Assigns a CIDR block to new nodes (if CIDR assignment is turned on).
- Keeps track of node health and changes the node status from Not Ready to UNKNOWN when a node becomes unreachable within the heartbeat interval (40 sec timeout). If the node stays flagged as UNKNOWN for 5 min after that, it evicts PODS from the unhealthy node (at a rate of 0.1/sec [affected by cluster size (default 50), configurable by large_cluster_size_threshold], meaning it will not evict PODS from more than 1 node every 10 sec). BTW: evictions are stopped if nodes in all availability zones are flagged as UNKNOWN, because that could be a scenario where the master node is having problems connecting to the worker nodes.
- Checks the status of nodes every few seconds, which is configurable by setting node_monitor_period
- Services refer to deployments and expose a particular port via kube-proxy (or) use a special IP Address for the entire POD; the preference of this setup strictly depends on the network policies, security policies and ingress rules that handle load balancing and port forwarding within each cluster deployment.
- Minikube - single node K8s on local workstation
- Kubeadm - multi-node cluster on local workstation
- Ubuntu on LXD - provides a nine-instance supported cluster on a local workstation
- conjure-up - deploy the Canonical Distribution of Kubernetes with Ubuntu on AWS, Azure, Google and Oracle Cloud
- AWS EKS, Azure AKS, Google GKE, Tectonic by CoreOS, Stackpoint.io, kube2go.io as well as madcore.ai
- Container Network Interface(CNI)
- Calico - secured L3 networking and policy provider
- Canal unites Flannel & Calico - provides networking and network policies
- Cilium - L3 network and policy plugin that can enforce HTTP/API/L7 policies transparently
- CNI-Genie - enables K8s to seamlessly connect to CNI of any choice such as Calico, Canal, Flannel, Romana (or) Weave
- NSX-T(NCP) provides integration between VMWare NSX-T and container orchestrator such as K8s as well as container-based platforms(CaaS,PaaS) like PCF and Openshift.
- Nuage is an SDN platform that provides policy-based networking between K8s PODS and non-K8s environments with visibility and security monitoring
- Weave Net provides networking and policy and doesn't require an external database.
- Nodes, including the master, can be physical/virtual machines running K8s components and a container runtime like Docker or rocket (rkt)
- Nodes should be connected by a common network, where Flannel is used as the basic PODS networking application.
- Communication to the API Server/Service, control-plane communications inside the cluster, and even POD-to-POD communications
- Default encrypted communication is TLS
- Ensure any insecure traffic/ports are DISABLED
- Anything that connects to the API, including nodes, proxies, the scheduler, volume plugins and users, should be Authenticated
- Certs for all internal communication channels are automatically created unless you chose to deploy the K8s cluster the hard way
- Once Authenticated, Authorization checks should be passed. The RBAC Control Component is used to match users/groups to permission sets that are organized into roles
- The Kubelet by default exposes HTTPS endpoints which give access to both data and actions on the nodes; to enable the Kubelet to use RBAC policies, start it with the flag --anonymous-auth=false and assign an appropriate x509 client certificate in its configuration.
- Network policies restrict network access for a particular namespace (see the NetworkPolicy sketch after this list)
- Users can be assigned quotas or limit ranges to prevent overuse of node ports (or) LoadBalancer services
- etcd is the default key/value store for the entire cluster, holding its configuration and secrets, so isolate its access behind a firewall so it is only reachable through the API Server, which should be well enforced using RBAC
- Enable audit logging and architect it so the audit file is stored on a secure server
- Rotate infra credentials frequently by setting short lifetimes, and use automated rotation methods where applicable
- Don't allow third-party integrations into the kube-system namespace
- Always review an integration before enabling it
Join K8s announcement group for emails about security announcements.
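As referenced above, a minimal sketch of a namespace-scoped NetworkPolicy (the name and namespace are hypothetical): this default-deny policy blocks all ingress traffic to pods in the team-a namespace unless another policy explicitly allows it.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # hypothetical name
  namespace: team-a            # hypothetical namespace
spec:
  podSelector: {}              # empty selector = all pods in the namespace
  policyTypes:
  - Ingress                    # no ingress rules listed, so all ingress is denied
```

It only takes effect if the cluster's CNI plugin (Calico, Canal, Cilium, Weave Net, etc.) enforces NetworkPolicy.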
- Create reliable nodes that form the cluster
- Setup redundant and reliable storage service with multinode deployment of etcd
- Start replicated and load balanced API Server/Service
- Setup a master-elected scheduler and controller manager daemons.
- Ensure cluster services automatically restart if they fail. The Kubelet already does this, but what if the Kubelet itself goes DOWN? For that we need something like monit to watch the service/process.
- Clustered etcd already replicates the storage to all master instances in the cluster
- Add additional reliability by increasing the etcd cluster from 3 to 5 nodes and even more where applicable
- If using a cloud provider, use any Block Device Persistent Storage offerings of that cloud provider to map/mount them on to your virtual machines in the cloud.
- If running on bare metal, use options like NFS, a clustered filesystem like Gluster (or) Ceph, or hardware-level RAID
- Refer to the images in images/k8s-ha-step3-1.png and images/k8s-ha-step3-2.png for detailed steps
- To allow our state to change safely, we use a lease lock in the API to perform master election. Each scheduler and controller manager will be started with the --leader-elect flag to ensure only 1 instance of the scheduler and controller manager (CM) is running and making changes to the cluster
- The scheduler and CM can be configured to talk to the API Server/Service that is on the same node (or) through a load-balanced IP (preferred).
- Create empty log files on each node, so that Docker will mount the files and not make new directories:

      touch /var/log/kube-scheduler.log
      touch /var/log/kube-controller-manager.log

- Set up the description of the scheduler and CM PODS on each node by copying the kube-scheduler.yaml and kube-controller-manager.yaml files into the /etc/kubernetes/manifests/ directory (a sketch of such a manifest follows).
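A rough sketch of what such a static-pod manifest can look like for the scheduler (the image tag and kubeconfig path are assumptions and will differ per setup); the important part is the --leader-elect=true flag from the step above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: kube-scheduler
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-scheduler
    image: k8s.gcr.io/kube-scheduler:v1.11.3       # placeholder version
    command:
    - kube-scheduler
    - --kubeconfig=/etc/kubernetes/scheduler.conf  # assumed path
    - --leader-elect=true                          # only one active scheduler at a time
    volumeMounts:
    - name: kubeconfig
      mountPath: /etc/kubernetes/scheduler.conf
      readOnly: true
  volumes:
  - name: kubeconfig
    hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: File
```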
- Kubetest Suite - ideal for GCE and AWS users to:
  - Build
- Stage
- Extract
- Bring up the Cluster
- Test
- Dump logs
- Tear Down the cluster.
- Manual validation:

      kubectl get nodes
      kubectl describe node <node name>
      kubectl get pods --all-namespaces -o wide
      kubectl get pods -n kube-system -o wide
      ps aux | grep [k]ube   # validate which kube processes are running
- Labels are key/value pairs that are attached to objects, such as pods
- Labels are intended to be used to specify identifying attributes of objects that are meaningful and relevant to users
- One use case of labels is to constrain the set of nodes onto which a pod can schedule (see the nodeSelector sketch after this list)
- The API currently supports two types of selectors: equality-based and set-based
  - equality-based

        kubectl get pods -l environment=production,tier=frontend

  - set-based
    - Newer resources, such as Job, Deployment, ReplicaSet, and DaemonSet support set-based requirements as well.

          kubectl get pods -l 'environment in (production),tier in (frontend)'
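A small sketch tying labels and selectors together (the names and labels are made up): the pod carries identifying labels, and its nodeSelector constrains scheduling to nodes labeled disktype=ssd.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: frontend-pod              # hypothetical
  labels:
    environment: production
    tier: frontend
spec:
  nodeSelector:
    disktype: ssd                 # pod only schedules onto nodes with this label
  containers:
  - name: web
    image: nginx:1.15             # placeholder image
```

Label a node with `kubectl label node <node-name> disktype=ssd`, and the equality/set-based queries above will match this pod.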
- Node affinity, described here, is a property of pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite – they allow a node to repel a set of pods.
- Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints. Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints (see the sketch below).
- Taint Nodes by Condition is beta in v1.12
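A quick sketch (the node name, key and value are placeholders): taint a node, then give a pod a matching toleration so it may still schedule there.

```sh
# Repel pods that do not tolerate dedicated=gpu (NoSchedule effect)
kubectl taint nodes node1 dedicated=gpu:NoSchedule

# Remove the taint again
kubectl taint nodes node1 dedicated:NoSchedule-
```

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-tolerant-pod        # hypothetical
spec:
  tolerations:
  - key: "dedicated"            # matches the taint applied above
    operator: "Equal"
    value: "gpu"
    effect: "NoSchedule"
  containers:
  - name: app
    image: nginx:1.15           # placeholder image
```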
- cAdvisor exposes a simple UI for local containers on port 4194 (default)
- metrics-server (Heapster for clusters < K8s v1.8)
- Logging
- Good discussion in a K8s GitHub issue - Kubernetes logging, journalD, fluentD, and Splunk, oh my!
- Cluster setup using kube-up.sh configures the logrotate tool to run each hour and rotate container logs when a log file exceeds 10MB (default)
- The container engine/runtime can also rotate logs, for example: Docker Logging Drivers
- Two types of system components: those that run in a container and those that do not run in a container. For example:
  - The Kubernetes scheduler and kube-proxy run in a container.
  - The kubelet and container runtime, for example Docker, do not run in containers.
  - On machines with systemd, the kubelet and container runtime write to journald.
  - If systemd is not present, they write to .log files in the /var/log directory.
  - System components inside containers always write to the /var/log directory, bypassing the default logging mechanism. They use the glog logging library.
    - Conventions for logging severity for these components can be found in the development docs on logging
  - Log rotation happens either daily or once the size exceeds 100MB (default).
- Logs should have separate storage and a lifecycle independent of nodes, pods, or containers
- Here are some options:
  - Use a node-level logging agent that runs on every node.
  - Include a dedicated streaming sidecar container (or) a dedicated sidecar container with a logging agent for logging in an application pod.
  - Push logs directly to a backend from within an application.
- Because the logging agent must run on every node, it's implemented as a DaemonSet replica
- Best suited only for applications emitting logs to stdout and stderr
- The individual streaming sidecar container streams application logs to its own stdout and stderr (see the sketch after this list)
- The logging-agent sidecar container runs a logging agent, which is configured to pick up logs from an application container.
- The sidecar containers read logs from a file, a socket, or journald.
- Writing logs to a file and then streaming them to stdout can double disk usage
- If you have an application that writes to a single file, it's generally better to set /dev/stdout as the destination rather than implementing the streaming sidecar container approach.
- Sidecar Container with Logging Agent
  - ConfigMaps are used to hold the configuration for logging agents like Fluentd, Splunk (or) Splunkforwarder-with-DeploymentServer
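A minimal streaming-sidecar sketch (the container names, image and log path are made up): the app writes to a file on a shared emptyDir, and the sidecar tails that file to its own stdout so a node-level agent can pick it up.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar        # hypothetical
spec:
  containers:
  - name: app
    image: busybox                  # placeholder: writes a log file
    command: [/bin/sh, -c, 'while true; do echo "$(date) app log line" >> /var/log/app/app.log; sleep 5; done']
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: log-streamer              # streaming sidecar
    image: busybox
    command: [/bin/sh, -c, 'tail -n+1 -F /var/log/app/app.log']
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  volumes:
  - name: app-logs
    emptyDir: {}
```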
- Upgrading Kubernetes Components using kubeadm
  - We need to have a kubeadm Kubernetes cluster
  - Swap must be disabled
  - The cluster should use a static control plane and etcd pods.
  - Make sure you read the release notes carefully.
  - Make sure to back up any important components, such as app-level state stored in a database.
  - Upgrade the kubeadm pkg using the pkg manager specific to the OS - yum (or) apt
  - kubeadm upgrade does not touch your workloads, only components internal to Kubernetes, but backups are always a best practice.
  - All containers are restarted after upgrade, because the container spec hash value is changed.
  - You can upgrade only from one minor version to the next minor version. That is, you cannot skip versions when you upgrade. For example, you can upgrade only from 1.10 to 1.11, not from 1.9 to 1.11. A sketch of the control-plane upgrade commands follows.
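A hedged sketch of the control-plane upgrade flow with kubeadm (the target version is a placeholder; exact package commands depend on the distro):

```sh
# On the master, after upgrading the kubeadm package itself (yum/apt):
kubeadm version                  # confirm the new kubeadm version
kubeadm upgrade plan             # shows available versions and what will be upgraded
kubeadm upgrade apply v1.11.3    # placeholder target version

# Then upgrade the kubelet/kubectl packages on each node and restart the kubelet
systemctl daemon-reload && systemctl restart kubelet
```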
- Drain the node in preparation for maintenance by using kubectl drain <node-name> --ignore-daemonsets. The given node will be marked unschedulable to prevent new pods from arriving. drain evicts the pods if the API Server supports eviction; otherwise, it will use a normal DELETE to delete the pods.
- If there are DaemonSet-managed pods, drain will not proceed without --ignore-daemonsets, and regardless it will not delete any DaemonSet-managed pods, because those pods would be immediately replaced by the DaemonSet controller, which ignores unschedulable markings
- If there are any PODS that are neither mirror pods nor managed by a ReplicationController, ReplicaSet, DaemonSet, StatefulSet or Job, then drain will not delete any pods unless you use --force.
- drain waits for graceful termination. You should not operate on the machine until the command completes.
- Upgrade the kubelet pkg using the pkg manager specific to the OS - yum (or) apt
- Verify using systemctl status kubelet to ensure it is still active after the upgrade
- Once upgraded successfully, finally uncordon the node so it is marked as schedulable again
- drain the node
- Ensure all pods are evicted and the node status is flagged as NotReady,SchedulingDisabled
- As we are not going to bring the same machine back into the cluster, we delete the node from the known list of kubeadm using kubectl delete node <node name>
- To add a new upgraded server as a node to the cluster, follow these steps to get the cluster join command with a valid token:
  - get the available token list: kubeadm token list
  - if no tokens are found (or) they are expired, generate a new token using kubeadm token generate
  - create the token and request the join command:

        kubeadm token create <token-from-generate-cmd> --ttl <h> --print-join-command
        kubeadm join <master-ip>:6443 --token <token-provided> --discovery-token-ca-cert-hash sha256:<random-hash>

  - (or) just use kubeadm token create --print-join-command to generate a new token and print the join command in one single command (a combined sketch follows)
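Putting the node-replacement flow together as a sketch (node and master names are placeholders):

```sh
kubectl drain worker-1 --ignore-daemonsets        # evict pods, mark unschedulable
kubectl get nodes                                 # worker-1 shows SchedulingDisabled
kubectl delete node worker-1                      # remove the old node from the cluster

# On the master: print a join command with a fresh token
kubeadm token create --print-join-command

# On the new worker: run the printed join command, e.g.
# kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
```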
Master Node
Ports (over TCP) | Service |
---|---|
6443 | API Server |
2379, 2380 | etcd server client API |
10250 | Kubelet API |
10251 | kube-scheduler |
10252 | kube-controller-manager |
10255 | read-only Kubelet API |
Worker Node
Ports (over TCP) | Service |
---|---|
10250 | Kubelet API |
10255 | read-only Kubelet API |
30000 - 32767 | NodePort Services |
- Kubernetes ServiceTypes allow you to specify what kind of service you want [ the default is ClusterIP ]. Type values and their behaviors are:
  - ClusterIP: Exposes the service on a cluster-internal IP. Choosing this value makes the service only reachable from within the cluster. This is the default ServiceType.
  - NodePort: Exposes the service on each Node's IP at a static port (the NodePort). A ClusterIP service, to which the NodePort service will route, is automatically created. You'll be able to contact the NodePort service, from outside the cluster, by requesting <NodeIP>:<NodePort>.
  - LoadBalancer: Exposes the service externally using a cloud provider's load balancer. NodePort and ClusterIP services, to which the external load balancer will route, are automatically created.
  - ExternalName: Maps the service to the contents of the externalName field (e.g. foo.bar.example.com), by returning a CNAME record with its value. No proxying of any kind is set up. This requires version 1.7 or higher of kube-dns.
- LoadBalancer - On cloud providers which support external load balancers, setting the type field to LoadBalancer will provision a load balancer for your Service. The actual creation of the load balancer happens asynchronously, and information about the provisioned balancer will be published in the Service's .status.loadBalancer field. For example:

      kind: Service
      apiVersion: v1
      metadata:
        name: my-service
      spec:
        selector:
          app: MyApp
        ports:
        - protocol: TCP
          port: 80
          targetPort: 9376
        clusterIP: 10.0.171.239
        loadBalancerIP: 78.11.24.19
        type: LoadBalancer
      status:
        ## This information is published
        loadBalancer:
          ingress:
          - ip: 146.148.47.155

  Note: Special notes for Azure: To use a user-specified public type loadBalancerIP, a static type public IP address resource needs to be created first, and it should be in the same resource group as the other automatically created resources of the cluster. For example, MC_myResourceGroup_myAKSCluster_eastus. Specify the assigned IP address as loadBalancerIP. Ensure that you have updated the securityGroupName in the cloud provider configuration file. For information about troubleshooting CreatingLoadBalancerFailed permission issues see, Use a static IP address with the Azure Kubernetes Service (AKS) load balancer or CreatingLoadBalancerFailed on AKS cluster with advanced networking.
- Spec for Service with Selector:

      kind: Service
      apiVersion: v1
      metadata:
        name: my-service
      spec:
        selector:
          app: MyApp
        ports:
        - protocol: TCP
          port: 80
          targetPort: 9376
- Spec for Service without Selector
  - Services generally abstract access to Kubernetes Pods, but they can also abstract other kinds of backends. For example:
    - You want to have an external database cluster in production, but in test you use your own databases.
    - You want to point your service to a service in another Namespace or on another cluster.
    - You are migrating your workload to Kubernetes and some of your backends run outside of Kubernetes.
  - In any of these scenarios you can define a service without a selector:

        kind: Service
        apiVersion: v1
        metadata:
          name: my-service
        spec:
          ports:
          - protocol: TCP
            port: 80
            targetPort: 9376

    and because this service has no selector, the corresponding Endpoints object will not be created. You can manually map the service to your own specific endpoints:

        kind: Endpoints
        apiVersion: v1
        metadata:
          name: my-service
        subsets:
        - addresses:
          - ip: 1.2.3.4
          ports:
          - port: 9376
- Multi-Port Services

      kind: Service
      apiVersion: v1
      metadata:
        name: my-service
      spec:
        selector:
          app: MyApp
        ports:
        - name: http
          protocol: TCP
          port: 80
          targetPort: 9376
        - name: https
          protocol: TCP
          port: 443
          targetPort: 9377

  Note: Port names must only contain lowercase alphanumeric characters and -, and must begin & end with an alphanumeric character. 123-abc and web are valid, but 123_abc and -web are not valid names.
- Ingress
  - Manages external access to the services in a cluster, typically HTTP.
  - Provides load balancing, SSL termination and name-based virtual hosting.
  - Collection of rules that allow inbound connections
  - Exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the ingress resource.
  - An ingress does not expose arbitrary ports or protocols. Exposing services other than HTTP and HTTPS to the internet typically uses a service of type Service.Type=NodePort or Service.Type=LoadBalancer.

        apiVersion: extensions/v1beta1
        kind: Ingress
        metadata:
          name: test-ingress
          annotations:
            nginx.ingress.kubernetes.io/rewrite-target: /
        spec:
          rules:
          - http:
              paths:
              - path: /testpath
                backend:
                  serviceName: test
                  servicePort: 80

  - Single Service Ingress (see the sketch below):
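For the single-Service case referenced above, a minimal sketch (it assumes a Service named test on port 80 already exists): an Ingress with no rules sends all traffic to one default backend.

```yaml
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: test-single-ingress     # hypothetical name
spec:
  backend:                      # default backend, no host/path rules
    serviceName: test
    servicePort: 80
```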
- The lifetime of a volume is the same as that of the POD enclosing it
- Volumes outlive any containers that run within the POD
- To use a volume, a Pod specifies what volumes to provide for the Pod (the .spec.volumes field) and where to mount those into Containers (the .spec.containers.volumeMounts field).
- Volumes can not mount onto other volumes or have hard links to other volumes
- emptyDir
  - The volume is first created when a Pod is assigned to a Node, and exists as long as that Pod is running on that node.
  - When a POD [not when a container crashes] is removed from a node for any reason, the data in the emptyDir is deleted forever.
  - Some uses for an emptyDir are:
    - scratch space, such as for a disk-based merge sort
    - checkpointing a long computation for recovery from crashes
    - holding files that a content-manager Container fetches while a webserver Container serves the data
  - The emptyDir.medium field in .spec.volumes can be set to "Memory" to tell Kubernetes to mount a tmpfs (RAM-backed filesystem) for you instead, but files you write will count against your Container's memory limit. Beware: unlike disks, tmpfs is cleared on node reboot

        apiVersion: v1
        kind: Pod
        metadata:
          name: test-pd
        spec:
          containers:
          - image: k8s.gcr.io/test-webserver
            name: test-container
            volumeMounts:
            - mountPath: /cache
              name: cache-volume
          volumes:
          - name: cache-volume
            emptyDir: {}

- azureDisk

      ....
      volumeMounts:
      - name: azure
        mountPath: /mnt/azure
      volumes:
      - name: azure
        azureDisk:
          diskName: test.vhd
          diskURI: https://someaccount.blob.microsoft.net/vhds/test.vhd
      ...

- downwardAPI - is used to make downward API data available to applications. It mounts a directory and writes the requested data in plain text files.
- Persistent Volumes (PV)
  - Provisioned storage in the cluster
  - Cluster resource
  - A volume plugin with a lifecycle independent from pods
  - Plain volumes share the lifecycle of the pod; Persistent Volumes don't
  - An API object (YAML) details the implementation
- Persistent Volume Claims (PVC)
  - Request for storage, part of the cluster's resources
  - Pods consume node resources, PVCs consume PV resources
  - Pods can request specific CPU and memory; PVCs can request a specific size and access mode.
- Lifecycle: PV's are Provisioned and then bound to PVC's, and PV's are reclaimed once the PVC's are released.
- Provisioning
  - Static
    - The administrator creates PV's in the K8s API and they are made available for consumption
  - Dynamic
    - Used when none of the static PV's match the PVC, and it is strictly based on StorageClasses
      - The PVC must request a created and configured storage class
      - Claims requesting nameless ("") classes disable dynamic provisioning
    - To enable dynamic storage provisioning, the DefaultStorageClass admission controller on the API server must be enabled.
- The kubelet uses liveness probes to know when to restart a Container. For example, liveness probes could catch a deadlock, where an application is running, but unable to make progress. Restarting a Container in such a state can help to make the application more available despite bugs.
- The kubelet uses readiness probes to know when a Container is ready to start accepting traffic. A Pod is considered ready when all of its Containers are ready. One use of this signal is to control which Pods are used as backends for Services. When a Pod is not ready, it is removed from Service load balancers.
- Ensure the kubelet process is up and running on the nodes
- Ensure the CNI plugin and kube-proxy pods are running across the cluster nodes
- Check whether any taints and un-ignored Tolerations are the cause of missing core service PODS on nodes; if so, fix it by updating the necessary objects to unblock it
- Use kubectl describe node/pod <node-name/pod-name> to look at the events to understand more about the status
- Look at log files under /var/log/containers for the core service PODS (a combined sketch of these checks follows).