Kubernetes — All Three Probes at Once

May 6, 2022

When running an application on Kubernetes (k8s), you usually pair it with the k8s probe feature to confirm that the application is actually working.

k8s currently supports three kinds of probes:

  1. livenessProbe
  2. readinessProbe
  3. startupProbe

Let's take a closer look at each of them:

livenessProbe

A livenessProbe periodically checks whether a running Pod is still working properly. It guards against the situation where a Pod has been running for a long time, something inside breaks, and nobody notices. It is normally a self-check only; it should not test connectivity to other services (for example, failing the probe just because a DB connection is down).

Here is an example. After the Pod starts, the first check is delayed for five seconds, and the `livenessProbe` then runs every three seconds.

A response status code greater than or equal to 200 and less than 400 is treated as success. By default, after three consecutive failures the kubelet kills the container and restarts it.
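
All of these knobs are configurable. As a minimal sketch (the field names below are the standard probe fields; the values shown are the Kubernetes defaults, not the ones used in this article's examples), a probe accepts the following timing parameters:

livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  initialDelaySeconds: 0   # wait this long after the container starts before the first check (default 0)
  periodSeconds: 10        # how often to probe (default 10)
  timeoutSeconds: 1        # how long to wait for a reply before counting a failure (default 1)
  successThreshold: 1      # consecutive successes needed to be marked healthy again (must be 1 for liveness)
  failureThreshold: 3      # consecutive failures before the container is restarted (default 3)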

Create the file: $ vim livenessProbe.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: key
          value: value
      initialDelaySeconds: 5
      periodSeconds: 3

Create the Pod in k8s:

$ kubectl apply -f livenessProbe.yaml

We can see what happened in the Events section of kubectl describe po liveness-http:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23s default-scheduler Successfully assigned default/liveness-http to ip-172-31-29-13.us-east-2.compute.internal
Normal Pulled 22s kubelet Successfully pulled image "k8s.gcr.io/liveness" in 439.355834ms
Normal Pulling 5s (x2 over 22s) kubelet Pulling image "k8s.gcr.io/liveness"
Warning Unhealthy 5s (x3 over 11s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
Normal Killing 5s kubelet Container liveness failed liveness probe, will be restarted
Normal Created 4s (x2 over 21s) kubelet Created container liveness
Normal Started 4s (x2 over 21s) kubelet Started container liveness
Normal Pulled 4s kubelet Successfully pulled image "k8s.gcr.io/liveness" in 433.622205ms
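
The failures and the restart in these Events are expected: the sample k8s.gcr.io/liveness image (from the official Kubernetes probe documentation) is built so that /healthz returns 200 only for the first 10 seconds of the container's life and 500 after that. Once the probe has failed three times in a row, the kubelet kills the container and starts it again, which is why a second Pulling/Created/Started cycle shows up. Running kubectl get po liveness-http a few times would show the RESTARTS count climbing.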

readinessProbe

In some scenarios an application goes through a phase where it cannot accept traffic yet, for example while it is still loading a large piece of data, or while it cannot reach the other services it depends on. In that case a readinessProbe lets k8s mark the Pod as Not Ready, removing it from the matching Service's endpoints so that no more traffic is sent to it.

Create the file: $ vim readinessProbe.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-http
spec:
  containers:
  - name: readiness-ctn
    image: nginx
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5

Create the Pod in k8s:

$ kubectl apply -f readinessProbe.yaml

Check the result:

$ kubectl describe po 
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m default-scheduler Successfully assigned default/readiness-http to ip-172-31-29-13.us-east-2.compute.internal
Normal Pulling 2m59s kubelet Pulling image "nginx"
Normal Pulled 2m58s kubelet Successfully pulled image "nginx" in 419.628325ms
Normal Created 2m58s kubelet Created container readiness-ctn
Normal Started 2m58s kubelet Started container readiness-ctn
Warning Unhealthy 73s (x21 over 2m53s) kubelet Readiness probe failed: cat: /tmp/healthy: No such file or directory
$ kubectl get po
NAME READY STATUS RESTARTS AGE
readiness-http 0/1 Running 0 3m9s

Because the file /tmp/healthy that our YAML tells the probe to check does not exist, get po shows that the container never becomes Ready. Let's change the probe's command to check a file that does exist.
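
(As a quicker alternative sketch, instead of editing the YAML you could create the file the existing probe is looking for inside the running container:

$ kubectl exec readiness-http -- touch /tmp/healthy

On the next probe run the container would flip to Ready without recreating the Pod. Below, we take the YAML route instead.)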

Edit the file: $ vim readinessProbe.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness-http
spec:
  containers:
  - name: readiness-ctn
    image: nginx
    readinessProbe:
      exec:
        command:
        - cat
        - /etc/hosts
      initialDelaySeconds: 5
      periodSeconds: 5

Recreate the Pod:

$ kubectl delete po readiness-http
$ kubectl apply -f readinessProbe.yaml

This time the container starts successfully and becomes Ready:

$ kubectl describe po 
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m15s default-scheduler Successfully assigned default/readiness-http to ip-172-31-29-13.us-east-2.compute.internal
Normal Pulling 2m14s kubelet Pulling image "nginx"
Normal Pulled 2m14s kubelet Successfully pulled image "nginx" in 305.561941ms
Normal Created 2m14s kubelet Created container readiness-ctn
Normal Started 2m14s kubelet Started container readiness-ctn

$ kubectl get po
NAME READY STATUS RESTARTS AGE
readiness-http 1/1 Running 0 2m18s

startupProbe

A startupProbe prevents a container with a long startup time from being killed by a livenessProbe that keeps failing while the application is still booting. Once a startupProbe is configured, the livenessProbe only begins to run after the startupProbe has succeeded.

Let's take the earlier livenessProbe.yaml and modify it, adding a startupProbe whose path is set to /healthx.

Create the file: $ vim startupProbe.yaml

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: startup
  name: startup-http
spec:
  containers:
  - name: startup
    image: k8s.gcr.io/liveness
    args:
    - /server
    livenessProbe:
      httpGet:
        path: /healthz
        port: 8080
        httpHeaders:
        - name: key
          value: value
      initialDelaySeconds: 5
      periodSeconds: 3
    startupProbe:
      httpGet:
        path: /healthx
        port: 8080
      failureThreshold: 30
      periodSeconds: 10
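
With failureThreshold: 30 and periodSeconds: 10, the application gets up to 30 × 10 = 300 seconds to finish starting. The livenessProbe does not run at all during that window; it only begins once the startup probe succeeds. If the startup probe never succeeds, the kubelet kills the container and it is restarted according to the Pod's restartPolicy, and since /healthx does not exist in this image, that is what will eventually happen here.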

We can see that the startup probe is the first one to act:

$ kubectl describe po
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12s default-scheduler Successfully assigned default/liveness-http to ip-172-31-29-13.us-east-2.compute.internal
Normal Pulling 11s kubelet Pulling image "k8s.gcr.io/liveness"
Normal Pulled 11s kubelet Successfully pulled image "k8s.gcr.io/liveness" in 431.313727ms
Normal Created 11s kubelet Created container liveness
Normal Started 11s kubelet Started container liveness
Warning Unhealthy 3s kubelet Startup probe failed: HTTP probe failed with statuscode: 404

As long as the startup probe keeps failing, the container never becomes Ready:

$ kubectl get po
NAME READY STATUS RESTARTS AGE
startup-http 0/1 Running 0 2m19s

Summary

Of the three probes, startupProbe is the first one to run. In the past, to get things up quickly, I would only set the container image and at most a resource limit before running apply. But as a standard practice you should also configure a liveness probe, so that over long-term operations k8s can automatically tell whether your containers are still running normally.
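
As a minimal sketch of what that looks like in practice (the demo-app name, image tag and /healthz path below are placeholders for illustration, not taken from the examples above), a typical Deployment with readiness and liveness probes might look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app                 # hypothetical application name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: demo-app:1.0.0    # placeholder image
        ports:
        - containerPort: 8080
        resources:
          limits:
            cpu: 500m
            memory: 256Mi
        readinessProbe:          # keep traffic away until the app can serve requests
          httpGet:
            path: /healthz
            port: 8080
          periodSeconds: 5
        livenessProbe:           # restart the container if it stops responding
          httpGet:
            path: /healthz
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10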
