Kubernetes — All Three Probes at Once

May 6, 2022

When running an application on Kubernetes (k8s), you usually pair it with the k8s probe feature to confirm that the application is actually working.

k8s currently supports three kinds of probes:

  1. livenessProbe
  2. readinessProbe
  3. startupProbe

Let's take a closer look at each of them:

livenessProbe

A livenessProbe periodically checks whether a running Pod is still working properly. It guards against the situation where a Pod has been running for a long time, something inside breaks, and nobody notices. It is normally a self-check only; it should not test connectivity to other services (for example, failing the probe just because a DB connection is down).

Here is an example. After the Pod starts, the first check is delayed for five seconds, and the `livenessProbe` then runs every three seconds.

A response status code greater than or equal to 200 and less than 400 is treated as success. By default, after three consecutive failures the kubelet kills the container and restarts it.
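
All of these knobs are configurable. As a minimal sketch (the field names below are the standard probe fields; the values shown are the Kubernetes defaults, not the ones used in this article's examples), a probe accepts the following timing parameters:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0   # wait this long after the container starts before the first check (default 0)
  periodSeconds: 10        # how often to probe (default 10)
  timeoutSeconds: 1        # how long to wait for a reply before counting a failure (default 1)
  successThreshold: 1      # consecutive successes needed to be marked healthy again (must be 1 for liveness)
  failureThreshold: 3      # consecutive failures before the container is restarted (default 3)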

Create the file: $ vim livenessProbe.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: key
          value: value
      initialDelaySeconds: 5
      periodSeconds: 3

Create the Pod in k8s:

$ kubectl apply -f livenessProbe.yaml

We can see what happened in the Events section of kubectl describe po liveness-http:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23s default-scheduler Successfully assigned default/liveness-http to ip-172-31-29-13.us-east-2.compute.internal
Normal Pulled 22s kubelet Successfully pulled image "k8s.gcr.io/liveness" in 439.355834ms
Normal Pulling 5s (x2 over 22s) kubelet Pulling image "k8s.gcr.io/liveness"
Warning Unhealthy 5s (x3 over 11s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Killing 5s kubelet Container liveness failed liveness probe, will be restarted
Normal Created 4s (x2 over 21s) kubelet Created container liveness
Normal Started 4s (x2 over 21s) kubelet Started container liveness
Normal Pulled 4s kubelet Successfully pulled image "k8s.gcr.io/liveness" in 433.622205ms
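
The failures and the restart in these Events are expected: the sample k8s.gcr.io/liveness image (from the official Kubernetes probe documentation) is built so that /healthz returns 200 only for the first 10 seconds of the container's life and 500 after that. Once the probe has failed three times in a row, the kubelet kills the container and starts it again, which is why a second Pulling/Created/Started cycle shows up. Running kubectl get po liveness-http a few times would show the RESTARTS count climbing.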

readinessProbe

In some scenarios an application goes through a phase where it cannot accept traffic yet, for example while it is still loading a large piece of data, or while it cannot reach the other services it depends on. In that case a readinessProbe lets k8s mark the Pod as Not Ready, removing it from the matching Service's endpoints so that no more traffic is sent to it.

Create the file: $ vim readinessProbe.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-http
spec:
  containers:
  - name: readiness-ctn
    image: nginx
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

Create the Pod in k8s:

$ kubectl apply -f readinessProbe.yaml

Check the result:

$ kubectl describe po 
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m default-scheduler Successfully assigned default/readiness-http to ip-172-31-29-13.us-east-2.compute.internal
Normal Pulling 2m59s kubelet Pulling image "nginx"
Normal Pulled 2m58s kubelet Successfully pulled image "nginx" in 419.628325ms
Normal Created 2m58s kubelet Created container readiness-ctn
Normal Started 2m58s kubelet Started container readiness-ctn
Warning Unhealthy 73s (x21 over 2m53s) kubelet Readiness probe failed: cat: /tmp/healthy: No such file or directory
$ kubectl get po
NAME READY STATUS RESTARTS AGE
readiness-http 0/1 Running 0 3m9s

Because the file /tmp/healthy that our YAML tells the probe to check does not exist, get po shows that the container never becomes Ready. Let's change the probe's command to check a file that does exist.
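
(As a quicker alternative sketch, instead of editing the YAML you could create the file the existing probe is looking for inside the running container:

$ kubectl exec readiness-http -- touch /tmp/healthy

On the next probe run the container would flip to Ready without recreating the Pod. Below, we take the YAML route instead.)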

Edit the file: $ vim readinessProbe.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-http
spec:
  containers:
  - name: readiness-ctn
    image: nginx
    readinessProbe:
      exec:
        command:
        - cat
        - /etc/hosts
      initialDelaySeconds: 5
      periodSeconds: 5

Recreate the Pod:

$ kubectl delete po readiness-http
$ kubectl apply -f readinessProbe.yaml

This time the container starts successfully and becomes Ready:

$ kubectl describe po 
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m15s default-scheduler Successfully assigned default/readiness-http to ip-172-31-29-13.us-east-2.compute.internal
Normal Pulling 2m14s kubelet Pulling image "nginx"
Normal Pulled 2m14s kubelet Successfully pulled image "nginx" in 305.561941ms
Normal Created 2m14s kubelet Created container readiness-ctn
Normal Started 2m14s kubelet Started container readiness-ctn

$ kubectl get po
NAME READY STATUS RESTARTS AGE
readiness-http 1/1 Running 0 2m18s

startupProbe

A startupProbe prevents a container with a long startup time from being killed by a livenessProbe that keeps failing while the application is still booting. Once a startupProbe is configured, the livenessProbe only begins to run after the startupProbe has succeeded.

Let's take the earlier livenessProbe.yaml and modify it, adding a startupProbe whose path is set to /healthx.

Create the file: $ vim startupProbe.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: startup
  name: startup-http
spec:
  containers:
  - name: startup
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: key
          value: value
      initialDelaySeconds: 5
      periodSeconds: 3
    startupProbe:
      httpGet:
        path: /healthx
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
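
With failureThreshold: 30 and periodSeconds: 10, the application gets up to 30 × 10 = 300 seconds to finish starting. The livenessProbe does not run at all during that window; it only begins once the startup probe succeeds. If the startup probe never succeeds, the kubelet kills the container and it is restarted according to the Pod's restartPolicy, and since /healthx does not exist in this image, that is what will eventually happen here.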

We can see that the startup probe is the first one to act:

$ kubectl describe po
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12s default-scheduler Successfully assigned default/liveness-http to ip-172-31-29-13.us-east-2.compute.internal
Normal Pulling 11s kubelet Pulling image "k8s.gcr.io/liveness"
Normal Pulled 11s kubelet Successfully pulled image "k8s.gcr.io/liveness" in 431.313727ms
Normal Created 11s kubelet Created container liveness
Normal Started 11s kubelet Started container liveness
Warning Unhealthy 3s kubelet Startup probe failed: HTTP probe failed with statuscode: 404

As long as the startup probe keeps failing, the container never becomes Ready:

$ kubectl get po
NAME READY STATUS RESTARTS AGE
startup-http 0/1 Running 0 2m19s

Summary

Of the three probes, startupProbe is the first one to run. In the past, to get things up quickly, I would only set the container image and at most a resource limit before running apply. But as a standard practice you should also configure a liveness probe, so that over long-term operations k8s can automatically tell whether your containers are still running normally.
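
As a minimal sketch of what that looks like in practice (the demo-app name, image tag and /healthz path below are placeholders for illustration, not taken from the examples above), a typical Deployment with readiness and liveness probes might look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                 # hypothetical application name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: demo-app:1.0.0    # placeholder image
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
        readinessProbe:          # keep traffic away until the app can serve requests
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5
        livenessProbe:           # restart the container if it stops responding
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10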
