当前位置：首页 > news >正文

k8s之POD资源限制和健康监测

news 2026/5/26 17:28:33

写在前面本文一起看下POD的资源限制配置和健康监测的相关内容。1资源限制如果是不对POD设置资源限制的话若任由其占用系统资源可能会造成非常严重的后果所以我们需要根据具体情况来设置资源限制如使用多少内存多少CPU多少IOPS等实现这些限制使用的是cgroup技术而配置这些限制使用到属性是resources。在看具体的例子之前需要先来看下cpu资源的表示方法cpu标识的最小单位是0.001m,这里的m是milli的意思1000m就代表使用一个cpu500m就代表0.5cpu具体配置cpu限制的话使用XmX这种后者是直接指定使用CPU的个数。如下yamlapiVersion: v1 kind: Pod metadata: name: ngx-pod-resources spec: containers: - image: nginx:alpine name: ngx resources: requests: cpu: 10m memory: 100Mi limits: cpu: 20m memory: 200Mi通过requests配置初始时需要0.01cpu100MB内存通过limits配置最多只能申请0.02cpu200MB内存如果超过最大限制则会被强制停止。配置了资源限制后k8s会根据每个Node的剩余资源情况来将POD分配到合适的Node上尽量的将Node的资源榨取干净。这个过程有些类似于俄罗斯方块每个方块是分配到某Node上的POD如下图如果我们申请的资源集群中所有的Node都无法满足会发生什么呢比如定义如下申请10个CPU的yamldongyunqimongodaddy:~/k8s$ cat appy-10-cpu.yml apiVersion: v1 kind: Pod metadata: name: ngx-pod-resources spec: tolerations: - key: node-role.kubernetes.io/master effect: NoSchedule operator: Exists containers: - image: nginx:alpine name: ngx resources: requests: cpu: 10 memory: 100Mi limits: cpu: 20 memory: 200Miapply后查看dongyunqimongodaddy:~/k8s$ kubectl get pod | egrep NAME|ngx-pod-resources NAME READY STATUS RESTARTS AGE ngx-pod-resources 0/1 Pending 0 52s可以看到其处于PENDING状态describe查看dongyunqimongodaddy:~/k8s$ kubectl describe pod ngx-pod-resources ... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 22s default-scheduler 0/2 nodes are available: 2 Insufficient cpu.0/2 nodes are available: 2 Insufficient cpu.的意思是2个node中有0个可用2个都是因为没有足够的CPU资源导致不可用。2健康监测对于k8s来说应用就像是一个黑盒子只能知道其启动了运行中停止了但是到底是不是能够正常的对外提供服务k8s是不知道的因此应用就不能是一个黑盒子要是一个灰盒子必须能够让k8s获取应用的一些信息为此应用需要预留一些口子给k8s使用这个过程有点类似于核酸检测护士姐姐使用棉签从我们的口腔中蘸取唾液。这个k8s从预留的口子中获取信息的过程我们叫做是探针 probe或者叫做探测器但我觉得叫做探针则更加形象准确。当前k8s定义了3中探针如下startup:加载各种配置文件后启动成功如spring加载器配置文件后启动 liveness:startup之后进入的状态但此时可能因为一些初始化工作还没有完成还不能对外服务如加载MySQL中的数据到缓存中 ready:liveness之后的状态进入该状态则应用就可以正常的对外提供服务了可以看到k8s对于探针的定义还是比较细的参考下图三种探针在失败时k8s的处理方式也不尽相同如下startup失败重启容器不会启动后续的探针检测 liveness失败重启容器不会进行后的探针检测 ready失败认为POD无法对外提供服务会将其排除在集群之外如下图那么该如何使用探针呢3中探针的定义都是一样的主要的属性如下periodSeconds执行探测动作的时间间隔默认是 10 秒探测一次。 timeoutSeconds探测动作的超时时间如果超时就认为探测失败默认是 1 秒。 successThreshold连续几次探测成功才认为是正常对于 startupProbe 和 livenessProbe 来说它只能是 1。 failureThreshold连续探测失败几次才认为是真正发生了异常默认是 3 次。具体探测的方式有shelltcp socket,http GET 三种如下exec执行一个 Linux 命令比如 ps、cat 等等和 container 的 command 字段很类似。 tcpSocket使用 TCP 协议尝试连接容器的指定端口。 httpGet连接端口并发送 HTTP GET 请求。下面我们以Nginx为例来看具体看下为了便于试验先来定义ConfigMap如下:dongyunqimongodaddy:~/k8s$ cat probe-cm.yml apiVersion: v1 kind: ConfigMap metadata: name: ngx-conf data: default.conf: | server { listen 80; location /ready { return 200 I am ready; } }然后我们定义如下pod分别使用三种探针来进行探针检测,yaml如下dongyunqimongodaddy:~/k8s$ cat probe-pod.yml apiVersion: v1 kind: Pod metadata: name: ngx-pod-probe spec: volumes: - name: ngx-conf-vol configMap: name: ngx-conf containers: - image: nginx:alpine name: ngx ports: - containerPort: 80 volumeMounts: - mountPath: /etc/nginx/conf.d name: ngx-conf-vol startupProbe: periodSeconds: 1 exec: command: [cat, /var/run/nginx.pid] livenessProbe: periodSeconds: 10 tcpSocket: port: 80 readinessProbe: periodSeconds: 5 httpGet: path: /ready port: 80startup probe使用了shell探针检测方式liveness probe使用了tcp socket探针检测方式ready probe 使用httpGet检测方式apply后查看如下dongyunqimongodaddy:~/k8s$ kubectl get pod NAME READY STATUS RESTARTS AGE ngx-pod-probe 1/1 Running 0 18skubectl logs查看日志可以看到定期的调用/ready接口如下dongyunqimongodaddy:~/k8s$ kubectl logs ngx-pod-probe /docker-entrypoint.sh: /docker-entrypoint.d/ is not empty, will attempt to perform configuration /docker-entrypoint.sh: Looking for shell scripts in /docker-entrypoint.d/ /docker-entrypoint.sh: Launching /docker-entrypoint.d/10-listen-on-ipv6-by-default.sh 10-listen-on-ipv6-by-default.sh: info: can not modify /etc/nginx/conf.d/default.conf (read-only file system?) /docker-entrypoint.sh: Launching /docker-entrypoint.d/20-envsubst-on-templates.sh /docker-entrypoint.sh: Launching /docker-entrypoint.d/30-tune-worker-processes.sh /docker-entrypoint.sh: Configuration complete; ready for start up 2023/01/29 07:54:33 [notice] 1#1: using the epoll event method 2023/01/29 07:54:33 [notice] 1#1: nginx/1.21.5 2023/01/29 07:54:33 [notice] 1#1: built by gcc 10.3.1 20211027 (Alpine 10.3.1_git20211027) 2023/01/29 07:54:33 [notice] 1#1: OS: Linux 5.15.0-58-generic 2023/01/29 07:54:33 [notice] 1#1: getrlimit(RLIMIT_NOFILE): 1048576:1048576 2023/01/29 07:54:33 [notice] 1#1: start worker processes 2023/01/29 07:54:33 [notice] 1#1: start worker process 23 10.10.4.1 - - [29/Jan/2023:07:54:36 0000] GET /ready HTTP/1.1 200 10 - kube-probe/1.23 - 10.10.4.1 - - [29/Jan/2023:07:54:37 0000] GET /ready HTTP/1.1 200 10 - kube-probe/1.23 - ...接下来我们修改startupProbe让其获取一个不可能存在的文件即让startup probe失败看下POD会是什么状态修改后如下apiVersion: v1 kind: Pod metadata: name: ngx-pod-probe spec: volumes: - name: ngx-conf-vol configMap: name: ngx-conf containers: - image: nginx:alpine name: ngx ports: - containerPort: 80 volumeMounts: - mountPath: /etc/nginx/conf.d name: ngx-conf-vol startupProbe: periodSeconds: 1 exec: command: [cat, /xxxx/run/nginx.pid] livenessProbe: periodSeconds: 10 tcpSocket: port: 80 readinessProbe: periodSeconds: 5 httpGet: path: /ready port: 80apply后查看如下dongyunqimongodaddy:~/k8s$ kubectl get pod NAME READY STATUS RESTARTS AGE ngx-pod-probe 0/1 Running 2 (3s ago) 13s dongyunqimongodaddy:~/k8s$ kubectl get pod NAME READY STATUS RESTARTS AGE ngx-pod-probe 0/1 CrashLoopBackOff 2 (8s ago) 22s dongyunqimongodaddy:~/k8s$ kubectl get pod NAME READY STATUS RESTARTS AGE ngx-pod-probe 0/1 CrashLoopBackOff 2 (12s ago) 26s可以看到POD在不断的重启最后变为CrashLoopBackOff状态我们可以通过kuectl describe来查看具体原因如下dongyunqimongodaddy:~/k8s$ kubectl describe pod ngx-pod-probe Name: ngx-pod-probe Namespace: default ... Events: Type Reason Age From Message ---- ------ ---- ---- ------- Normal Scheduled 4m33s default-scheduler Successfully assigned default/ngx-pod-probe to mongomongo Normal Pulled 4m23s (x3 over 4m32s) kubelet Container image nginx:alpine already present on machine Normal Created 4m23s (x3 over 4m32s) kubelet Created container ngx Normal Started 4m23s (x3 over 4m32s) kubelet Started container ngx Warning Unhealthy 4m19s (x9 over 4m30s) kubelet Startup probe failed: cat: cant open /xxxx/run/nginx.pid: No such file or directory Normal Killing 4m19s (x3 over 4m27s) kubelet Container ngx failed startup probe, will be restarted Warning BackOff 4m13s (x4 over 4m19s) kubelet Back-off restarting failed container从Startup probe failed: cat: cant open /xxxx/run/nginx.pid: No such file or directory可以看出来确实是因为startup probe检测失败导致的。写在后面小结本文一起看了对POD的资源限制配置以及startuplivenessready三种探针及其配置并给出了一个具体的实例。希望本文能够帮助到你。参考文章列表

查看全文

http://www.rkmt.cn/news/1394089.html