tags : Infrastructure

FAQ

Resources

Liveness vs Readiness

  • Liveness probes tell Kubernetes when to restart a container. If the liveness probe fails, the container is restarted. This can catch issues such as a deadlock and make your application more available. My colleagues at Cloudflare have written about how we use this to restart “stuck” Kafka consumers.
  • Readiness probes signal that a container is ready to start receiving traffic. A pod is considered ready to receive traffic when all of its containers are ready. If any container in a pod fails its readiness probe, the pod is removed from the service load balancer and will not receive requests. Failing a readiness probe does not restart your pod the way failing a liveness probe does. (Sketch below.)
  • From Distributed Systems Horror Stories: Kubernetes Deep Health Checks – Encore Blog
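  • A minimal probe sketch, assuming a hypothetical app that serves /healthz and /ready on port 8080 (name and image are made up):
$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo                  # hypothetical name
spec:
  containers:
  - name: app
    image: example.com/my-app:latest   # hypothetical image
    ports:
    - containerPort: 8080
    livenessProbe:                  # failure => kubelet restarts this container
      httpGet:
        path: /healthz
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:                 # failure => pod removed from Service endpoints, no restart
      httpGet:
        path: /ready
        port: 8080
      periodSeconds: 5
EOF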

Setting it up from scratch

  • Much like the Linux kernel needs to be combined with additional software to make a complete operating system, Kubernetes is only an orchestrator and needs to be combined with additional software to make a complete cluster:
    • CRI: It needs a container runtime to run containers (See OCI Ecosystem, Containers)
    • CNI: It needs network plugins to do networking
    • PersistentVolume subsystem: It needs storage to be managed

K8s YAML management best practices?

Package management?

  • Helm uses charts. It’s “the” package manager for k8s. (Usage sketch below.)
  • Helm charts are comparable to Ansible roles.
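  • Rough usage sketch, assuming the public Bitnami chart repo and a made-up release name:
# add a chart repo, install a chart as a named release, list releases
$ helm repo add bitnami https://charts.bitnami.com/bitnami
$ helm install my-nginx bitnami/nginx
$ helm list
# render the chart's manifests locally without installing anything
$ helm template my-nginx bitnami/nginx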

What’s KNative?

  • Just another abstraction over Kubernetes objects with some unique features.
  • Its autoscaling is more advanced than what the native HorizontalPodAutoscaler (HPA) provides. HPA only works with deployments where at least 1 pod exists, because it checks pod metrics: no pods, no metrics. Knative, on the other hand, is focused on HTTP services and can scale them to zero. (Sketch after this list.)
  • Tools like Knative are largely solving organizational problems, not technical ones (an easier k8s, so to speak).
  • Google Cloud Run is essentially a managed version of Knative.
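  • A minimal Knative Service sketch, assuming Knative Serving is installed in the cluster (name and image are made up); scale-to-zero is the default behaviour:
$ kubectl apply -f - <<'EOF'
apiVersion: serving.knative.dev/v1
kind: Service
metadata:
  name: hello                       # hypothetical name
spec:
  template:
    spec:
      containers:
      - image: example.com/hello:latest   # hypothetical image
        ports:
        - containerPort: 8080
EOF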

Components

  • A Node runs pods, and pods run containers. Running more than one container in a pod is generally discouraged unless it is a sidecar.

Pods

  • A Pod by itself defines nothing about when/how/where to run, so k8s won’t know whether it needs to restart it, etc. For that we need a Deployment object (a persistent object). (Sketch below.)
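  • A minimal Deployment sketch (names and image are made up); its controller keeps the desired number of pod replicas running and replaces them when they die:
$ kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                         # hypothetical name
spec:
  replicas: 3                       # desired state; the controller reconciles towards it
  selector:
    matchLabels:
      app: web
  template:                         # the pod template the Deployment stamps out
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.25           # any image works; nginx used just for the sketch
        ports:
        - containerPort: 80
EOF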

DaemonSet

  • Specifies something that is supposed to run on every node as a pod by default.
  • It’s started on node creation and stopped on node deletion.
  • It runs what it specifies as a pod on the node.
  • Similar to an init process (think systemd), but for the node. (Sketch below.)
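  • A DaemonSet sketch for a hypothetical per-node log agent (names and image are made up):
$ kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-agent                   # hypothetical name
spec:
  selector:
    matchLabels:
      app: log-agent
  template:
    metadata:
      labels:
        app: log-agent
    spec:
      containers:
      - name: agent
        image: example.com/log-agent:latest   # hypothetical image
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true
      volumes:
      - name: varlog
        hostPath:
          path: /var/log            # read node-level logs from the host
EOF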

Networking

Overlay network

  • Our cluster (nodes, pods) is one big flat IP network with no NAT etc. in between; network namespaces are not a security feature here.
  • By default everything can reach everything; we can add network policies to address this (see the NetworkPolicy sketch after this list).
  • Responsibilities
    • Make sure your pods can send network requests outside your cluster
    • Keep a stable mapping of nodes to subnets and keep every node in your cluster updated with that mapping. Do the right thing when nodes are added & removed.
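  • A default-deny NetworkPolicy sketch; it only takes effect if the CNI plugin enforces NetworkPolicy (namespace is made up):
$ kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: my-namespace           # hypothetical namespace
spec:
  podSelector: {}                   # selects every pod in the namespace
  policyTypes:
  - Ingress                         # with no ingress rules listed, all ingress is denied
EOF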

Pod networking

Inside Pod

  • Containers running in the same pod share the same network namespace, i.e. they can talk over loopback; e.g. a sidecar can use loopback to talk to the main container.

Pod to Pod in same Node

  • Different pods can talk to each other without using NAT
  • The IP that a Pod sees itself as is the same IP that others see it as.
  • This is essentially communicating across network namespaces which can be done via veth pair. See Containers.

Pod to Pod in different nodes

  • Every Node in the cluster is assigned a CIDR block; IPs from that block are used by the pods running on that Node (see the snippet after this list).
  • Each Node knows how to deliver packets to Pods that are running within it.
  • How do we know which node to send the traffic to?
    • This is network specific, basically needs a router.
    • This is now basically node-node networking
    • All Nodes can communicate with all Pods without NAT. Just routing.
    • This is where CNI comes in.
    • AWS Example
      • In AWS, a Kubernetes cluster runs within a VPC
      • AWS has a CNI plugin that manages this based on AWS VPC, IAM, Security Groups, ENIs, etc.
      • ENIs are accessible to other ENIs across the VPC.
      • Each Node (EC2 instance) creates ENIs and allocates IPs to them, creating a pool (forming the CIDR block).
      • CNI plugins usually set up a DaemonSet (one pod per node) which picks IPs from the pool and adds newly created pods to the network.
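  • To see which CIDR block each node got (note: with some CNIs, e.g. the AWS VPC CNI, .spec.podCIDR may be empty because the plugin does its own IPAM):
$ kubectl get nodes -o custom-columns=NAME:.metadata.name,POD_CIDR:.spec.podCIDR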

More on CNI

  • Provides a common API for connecting containers to the outside network.
  • A well-defined spec for writing K8s network plugins; it is not tied to k8s and was created as an interface that can talk to any runtime and any network.
  • The plugins are simple executables that take simple commands like ADD, DEL, etc.
  • It’s responsible for setting up the network interfaces for the pods, configuring routes, allocating IPs, etc. (Sketch below.)
  • K8s does not mandate any particular implementation; the docs currently list about 15 different ones.
  • The plugin can in turn use a DHCP server to get the IPs if needed, etc.
  • Options: Calico (with the Tigera operator), Cilium, Flannel, Weave.
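  • As a rough illustration, a minimal config for the reference bridge plugin dropped into /etc/cni/net.d (subnet and name are made up; real clusters normally get this file from the CNI plugin’s installer DaemonSet):
# the kubelet / container runtime reads configs from /etc/cni/net.d and invokes
# the named plugin binary with CNI_COMMAND=ADD/DEL etc. for each pod sandbox
$ cat <<'EOF' | sudo tee /etc/cni/net.d/10-demo.conf
{
  "cniVersion": "0.4.0",
  "name": "demo-net",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.1.0/24"
  }
}
EOF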

Egress and Ingress

Egress (out of k8s)

  • A pod tries to connect to the internet using a pod IP / cluster IP (when going via a service) as the source IP.
  • Problem: VPC gateways at cloud providers are not container aware. They know about the node (EC2) IP address but have no idea about pod IPs or cluster IPs, so NATing this directly is awkward.
  • Solution: iptables
    • SNAT (mangle the pod IP to the node IP) before the packet leaves the node (see the sketch after this list).
    • Another SNAT of the VPC VM (node) IP to a public IP by the cloud-provider gateway.
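  • The node-side SNAT is conceptually a masquerade rule like the one below (assumes a pod CIDR of 10.244.0.0/16; real CNIs / kube-proxy generate their own, more specific chains):
# SNAT traffic leaving the pod network for destinations outside it to the node's IP
$ sudo iptables -t nat -A POSTROUTING -s 10.244.0.0/16 ! -d 10.244.0.0/16 -j MASQUERADE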

Ingress (in to k8s)

  • This is more involved than egress and requires a dedicated k8s controller.
  • It needs
    • L4: A Service with type LoadBalancer (the load balancer itself is provided by the cloud provider)
    • L7: An Ingress controller
      • Uses HTTP, built on top of Service
      • Needs a Service with NodePort
      • Maps HTTP requests to Kubernetes Services (see the sketch after this list)
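  • A minimal Ingress sketch, assuming an ingress controller (e.g. ingress-nginx) is installed and a Service named web exists on port 80 (host and names are made up):
$ kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress                 # hypothetical name
spec:
  ingressClassName: nginx           # assumes an nginx ingress controller is installed
  rules:
  - host: app.example.com           # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web               # maps matching HTTP requests to this Service
            port:
              number: 80
EOF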

Persistent network IPs & Load balancing

  • Services are stored in etcd as k8s REST objects; controllers can watch for changes via the api-server.
  • A Service is an abstraction to help you expose a group of Pods over the network, because the Pod IPs assigned by the CNI are ephemeral.
  • A Service (L4) assigns a single stable virtual IP to a group of pods (L3). (Sketch after this list.)
  • This virtual IP is also known as the cluster IP.
  • You can do L2 networks in K8s, but many cloud providers don’t allow it.
  • Since a Service is more of an L4 concept, L4 load balancers are a better fit.
  • Whether iptables or IPVS is used is a kube-proxy setting (its proxy mode), configured cluster-wide rather than per Service.
  • The load balancing we talk about here is not the same as a Service with the LoadBalancer type; that one is provided by the cloud provider and is not container aware.
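  • A minimal ClusterIP Service sketch (names are made up; it fronts whatever pods carry the app: web label):
$ kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: web                         # hypothetical name
spec:
  type: ClusterIP                   # stable virtual IP inside the cluster
  selector:
    app: web                        # the group of pods this Service exposes
  ports:
  - port: 80                        # the Service (cluster IP) port
    targetPort: 80                  # the pod port traffic is forwarded to
EOF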

Load Balancing w iptables (L3)

  • Any traffic coming to the cluster IP is load balanced by k8s
  • It uses netfilter(backend to iptables) for doing the load balancing
    • The iptables rules are configured by the kube-proxy controller, which watches the api-server for changes.
    • When Services or their endpoints change, kube-proxy updates the iptables rules, which are basically DNAT rules (see the inspection snippet after this list).
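  • On a node running kube-proxy in iptables mode, these chains can be inspected directly (the chain names below are the ones kube-proxy creates; output differs per cluster):
# per-service entry points
$ sudo iptables -t nat -L KUBE-SERVICES -n | head
# the DNAT rules that pick a backend pod live in KUBE-SVC-* / KUBE-SEP-* chains
$ sudo iptables -t nat -L -n | grep -E 'KUBE-(SVC|SEP)' | head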

More on kube-proxy

  • Certain CNI plugins make it possible to use the cluster without kube-proxy (Eg. cilium, uses ebpf instead of iptables)
  • It runs on each node
  • A huge part of the k8s ecosystem currently relies on iptables (kube-proxy).
  • Standard kube-proxy is just a lot of chains in the nat (iptables -t nat -L -v) and filter (iptables -L -v) tables, and maybe some in the mangle table.
  • (old info): kube-proxy uses iptables and -j DNAT rules to provide load balancing for services.
    • There are multiple iptables rules for each backend a service is serving to.
    • With each service added, the list of iptables rules to traverse keeps growing.
  • iptables lacks incremental updates: the entire list of rules has to be replaced each time a rule is added. E.g. 5 hours to install the 160K iptables rules representing 20K k8s services.

Load Balancing w IPVS (L4)

  • IPVS (IP Virtual Server) is built on top of netfilter.
  • It implements transport-layer (L4) TCP/UDP load balancing as part of the Linux kernel. (Snippet below.)
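  • kube-proxy is switched to this mode with --proxy-mode=ipvs (needs the IPVS kernel modules); the resulting virtual servers can be listed on a node with ipvsadm:
# list IPVS virtual services and their real-server (pod) backends, numeric output
$ sudo ipvsadm -Ln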

DNS in K8s

  • DNS is optional.
  • Can be used to resolve the cluster IP of a service, or the IP of a pod without a service (see the lookup example after this list).
  • Involves configuring the kubelet on each node to facilitate this.
  • The K8s DNS pod itself runs behind a Service in the cluster with a static cluster IP.
    • SRV record can be used to specify port within a server (See DNS)
    • DNS pod consists of following containers
      • kubedns : watches the api-server for changes. This has since been replaced by CoreDNS.
      • dnsmasq : dns cache (dnsmasq is both a dns cache and a dhcp server btw)
      • sidecar : healthcheck
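  • A quick lookup sketch from inside the cluster, assuming a Service named my-svc in namespace my-ns (busybox’s nslookup is enough for this):
# services resolve as <service>.<namespace>.svc.cluster.local
$ kubectl run -it --rm dns-test --image=busybox:1.36 --restart=Never -- \
    nslookup my-svc.my-ns.svc.cluster.local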

Operator Pattern

  • CRDs + Controllers (see the CRD sketch after this list).
  • Thinking of Kubernetes as just a collection of operators, where some are maintained by the project (“in tree”) and some are maintained by others (“out of tree”), is an interesting thought experiment.
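  • A minimal CRD sketch (group and kind are made up); an operator pairs something like this with a controller that watches these objects and reconciles real state towards their spec:
$ kubectl apply -f - <<'EOF'
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databases.example.com       # must be <plural>.<group>
spec:
  group: example.com                # hypothetical API group
  scope: Namespaced
  names:
    plural: databases
    singular: database
    kind: Database
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            x-kubernetes-preserve-unknown-fields: true   # skip a real schema for this sketch
EOF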

Practical notes

  • Signal handling is important: if your program does not handle signals properly, the container is only killed after the termination timeout you set. If the timeout is 1 min and 100 machines need a rolling update done sequentially, that takes 100 minutes; if signals are handled properly, the container shuts down as soon as it gets the signal, after doing any cleanup it is assigned. (Sketch after this list.)
  • Blue/Green Deployment: you have two production environments, as identical as possible. As you prepare a new release of your software, you do your final stage of testing in the green environment. Once the software is working in the green environment, you switch the router so that all incoming requests go to the green environment; the blue one is now idle.
  • Out of many ways, two ways to be highly available: make individual components HA, or do something at a bigger scale, e.g. switch to another VM entirely if something is not working in one VM.
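  • A shell-entrypoint sketch of the signal-handling idea (the cleanup is hypothetical; a real service would handle SIGTERM in the application itself):
#!/bin/sh
# entrypoint.sh: exit promptly on SIGTERM instead of waiting to be SIGKILLed
cleanup() {
  echo "caught signal, cleaning up"   # e.g. finish in-flight work, deregister, flush
  exit 0
}
trap cleanup TERM INT
echo "running"
while true; do
  sleep 1 &
  wait $!          # wait on a background sleep so the trap fires immediately
done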

TODO Usage Snippets

  • Move these to cheats later
# show the spec fields of Node for the running k8s version
$ kubectl explain node.spec
# recursively show all fields of Node
# helpful when you can't look up the online docs.
$ kubectl explain node --recursive
# getting pods
$ kubectl get pods --all-namespaces
$ kubectl get pods --namespace=<name>
# execing into a pod
$ kubectl exec -it <pod-name> -- /bin/bash