This guide explains how to decommission (take offline) specific Pods in Kafka clusters managed by KubeBlocks. Decommissioning provides precise control over cluster resources while maintaining availability. Use this for workload rebalancing, node maintenance, or addressing failures.
In traditional StatefulSet-based deployments, Kubernetes cannot decommission a specific Pod. StatefulSets ensure the order and identity of Pods, and scaling down always removes the Pod with the highest ordinal number (e.g., scaling down from 3 replicas removes Pod-2 first). This limitation prevents precise control over which Pod to take offline, which can complicate maintenance, workload distribution, or failure handling.
KubeBlocks overcomes this limitation by enabling administrators to decommission specific Pods directly. This fine-grained control ensures high availability and allows better resource management without disrupting the entire cluster.
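To make the StatefulSet behaviour described above concrete, the following sketch prints the Pods that an ordinal-based scale-in would remove, highest ordinal first. The function name `statefulset_scale_in_order` is hypothetical and exists only for illustration; it does not call Kubernetes.

```bash
# Print the Pods a StatefulSet would remove when scaled in,
# highest ordinal first.
#   $1: workload name, $2: current replicas, $3: target replicas
statefulset_scale_in_order() (
  name=$1 from=$2 to=$3
  i=$((from - 1))
  out=""
  while [ "$i" -ge "$to" ]; do
    out="${out:+$out }$name-$i"
    i=$((i - 1))
  done
  printf '%s\n' "$out"
)

statefulset_scale_in_order kafka-broker 3 1
# prints: kafka-broker-2 kafka-broker-1
```

With plain ordinal-based scale-in there is no way to remove `kafka-broker-1` while keeping `kafka-broker-2`, which is the gap that instance-level decommissioning fills.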
Before proceeding, ensure the following:
kubectl create ns demo
namespace/demo created
KubeBlocks uses a declarative approach for managing Kafka Clusters. Below is an example configuration for deploying a Kafka Cluster with three components (broker, controller, and exporter).
Apply the following YAML configuration to deploy the cluster:
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: kafka-separated-cluster
  namespace: demo
spec:
  terminationPolicy: Delete
  clusterDef: kafka
  topology: separated_monitor
  componentSpecs:
    - name: kafka-broker
      replicas: 1
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      env:
        - name: KB_KAFKA_BROKER_HEAP
          value: "-XshowSettings:vm -XX:MaxRAMPercentage=100 -Ddepth=64"
        - name: KB_KAFKA_CONTROLLER_HEAP
          value: "-XshowSettings:vm -XX:MaxRAMPercentage=100 -Ddepth=64"
        - name: KB_BROKER_DIRECT_POD_ACCESS
          value: "true"
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
        - name: metadata
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
    - name: kafka-controller
      replicas: 1
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      volumeClaimTemplates:
        - name: metadata
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
    - name: kafka-exporter
      replicas: 1
      resources:
        limits:
          cpu: "0.5"
          memory: "1Gi"
        requests:
          cpu: "0.1"
          memory: "0.2Gi"
These three components are created strictly in controller -> broker -> exporter order, as defined in the ClusterDefinition.
Monitor the cluster status until it transitions to the Running state:
kubectl get cluster kafka-separated-cluster -n demo -w
Expected Output:
kubectl get cluster kafka-separated-cluster -n demo
NAME CLUSTER-DEFINITION TERMINATION-POLICY STATUS AGE
kafka-separated-cluster kafka Delete Creating 13s
kafka-separated-cluster kafka Delete Running 63s
Check the pod status and roles:
kubectl get pods -l app.kubernetes.io/instance=kafka-separated-cluster -n demo
Expected Output:
NAME READY STATUS RESTARTS AGE
kafka-separated-cluster-kafka-broker-0 2/2 Running 0 13m
kafka-separated-cluster-kafka-controller-0 2/2 Running 0 13m
kafka-separated-cluster-kafka-exporter-0 1/1 Running 0 12m
Once the cluster status becomes Running, your Kafka cluster is ready for use.
If you are creating the cluster for the very first time, it may take some time to pull images before running.
Expected Workflow:
The instance specified in onlineInstancesToOffline is removed, and the cluster status transitions from Updating to Running.
Before decommissioning a specific Pod from a component, make sure the component has more than one replica. If it does not, scale out the component first.
For example, patch the Cluster CR to declare 3 replicas for the kafka-broker component, which is the first entry (index 0) in componentSpecs:
kubectl patch cluster kafka-separated-cluster -n demo --type='json' -p='[
  {
    "op": "replace",
    "path": "/spec/componentSpecs/0/replicas",
    "value": 3
  }
]'
Wait until all Pods are running:
kubectl get pods -n demo -l app.kubernetes.io/instance=kafka-separated-cluster,apps.kubeblocks.io/component-name=kafka-broker
Expected Output:
NAME READY STATUS RESTARTS AGE
kafka-separated-cluster-kafka-broker-0 2/2 Running 0 18m
kafka-separated-cluster-kafka-broker-1 2/2 Running 0 3m33s
kafka-separated-cluster-kafka-broker-2 2/2 Running 0 2m1s
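The scale-out patch above addresses the component by its position in `componentSpecs`, which silently targets a different component if the order in the manifest changes. One way to avoid this is to look the index up by name first. The helper names below (`component_index`, `replicas_patch`, `scale_component`) are hypothetical; this is a minimal sketch that assumes `kubectl` is configured for the cluster used in this guide.

```bash
# Print the zero-based position of component $1 within the names $2..$n.
# Returns non-zero if the component is not in the list.
component_index() (
  target=$1
  shift
  i=0
  for name in "$@"; do
    if [ "$name" = "$target" ]; then
      printf '%s\n' "$i"
      exit 0
    fi
    i=$((i + 1))
  done
  exit 1
)

# Print a JSON Patch that sets replicas to $2 for the component at index $1.
replicas_patch() {
  printf '[{"op":"replace","path":"/spec/componentSpecs/%s/replicas","value":%s}]\n' "$1" "$2"
}

# Scale component $3 of Cluster $1 in namespace $2 to $4 replicas.
scale_component() (
  # jsonpath with [*] prints the component names separated by spaces.
  names=$(kubectl get cluster "$1" -n "$2" \
    -o jsonpath='{.spec.componentSpecs[*].name}') || exit 1
  # $names is intentionally unquoted so that it splits into one word per name.
  idx=$(component_index "$3" $names) || {
    echo "component $3 not found in cluster $1" >&2
    exit 1
  }
  kubectl patch cluster "$1" -n "$2" --type=json -p "$(replicas_patch "$idx" "$4")"
)
```

For the cluster in this guide, `scale_component kafka-separated-cluster demo kafka-broker 3` would issue the same kind of patch as the manual command, but against whichever index `kafka-broker` currently occupies.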
To decommission a specific Pod (e.g., 'kafka-separated-cluster-kafka-broker-1'), you can use one of the following methods:
Option 1: Using OpsRequest
Create an OpsRequest to mark the Pod as offline:
apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: kafka-separated-cluster-decommission-ops
  namespace: demo
spec:
  clusterName: kafka-separated-cluster
  type: HorizontalScaling
  horizontalScaling:
    - componentName: kafka-broker
      scaleIn:
        onlineInstancesToOffline:
          - 'kafka-separated-cluster-kafka-broker-1'  # Specifies the instance names that need to be taken offline
Check the progress of the decommissioning operation:
kubectl get ops kafka-separated-cluster-decommission-ops -n demo -w
Example Output:
NAME TYPE CLUSTER STATUS PROGRESS AGE
kafka-separated-cluster-decommission-ops HorizontalScaling kafka-separated-cluster Running 0/1 8s
kafka-separated-cluster-decommission-ops HorizontalScaling kafka-separated-cluster Running 1/1 31s
kafka-separated-cluster-decommission-ops HorizontalScaling kafka-separated-cluster Succeed 1/1 31s
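In scripts, watching the output interactively is not practical. The sketch below polls until a command prints an expected value and gives up after a fixed number of attempts. `wait_for_output` and `ops_phase` are hypothetical helper names, and reading the status from `.status.phase` is an assumption about which field backs the STATUS column shown above.

```bash
# Poll a command until its output equals the expected value.
#   $1: expected output, $2: maximum attempts, $3: seconds between attempts
#   $4..$n: the command to run
# Returns 0 on a match and 1 if all attempts are used up.
wait_for_output() (
  expected=$1 tries=$2 delay=$3
  shift 3
  while [ "$tries" -gt 0 ]; do
    if [ "$("$@" 2>/dev/null)" = "$expected" ]; then
      exit 0
    fi
    tries=$((tries - 1))
    [ "$tries" -gt 0 ] && sleep "$delay"
  done
  exit 1
)

# Print the phase of OpsRequest $1 in namespace $2.
# Assumption: the STATUS column is populated from .status.phase.
ops_phase() {
  kubectl get ops "$1" -n "$2" -o jsonpath='{.status.phase}'
}
```

A call such as `wait_for_output Succeed 60 5 ops_phase kafka-separated-cluster-decommission-ops demo` would wait up to roughly five minutes. The helper only matches one value, so an operation that ends in a failed state shows up as a timeout rather than as a distinct error.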
Option 2: Using Cluster API
Alternatively, update the Cluster resource directly to decommission the Pod:
apiVersion: apps.kubeblocks.io/v1
kind: Cluster
spec:
  componentSpecs:
    - name: kafka-broker
      replicas: 2  # expected replicas after decommissioning
      offlineInstances:
        - kafka-separated-cluster-kafka-broker-1  # <----- Specify Pod to be decommissioned
    ...
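If editing the full manifest is inconvenient, the same two fields can be set with a JSON Patch. The sketch below only builds the patch document; `offline_instance_patch` is a hypothetical helper name, and the component index passed to it is assumed to be the position of kafka-broker in `componentSpecs` (0 in the manifest used in this guide).

```bash
# Print a JSON Patch that sets replicas and offlineInstances for one component.
#   $1: component index in componentSpecs
#   $2: replica count expected after decommissioning
#   $3: name of the instance to take offline
# Note: the "add" operation replaces any existing offlineInstances list.
offline_instance_patch() {
  printf '[{"op":"replace","path":"/spec/componentSpecs/%s/replicas","value":%s},' "$1" "$2"
  printf '{"op":"add","path":"/spec/componentSpecs/%s/offlineInstances","value":["%s"]}]\n' "$1" "$3"
}

offline_instance_patch 0 2 kafka-separated-cluster-kafka-broker-1
```

The printed document can be passed to `kubectl patch cluster kafka-separated-cluster -n demo --type=json -p "$(offline_instance_patch 0 2 kafka-separated-cluster-kafka-broker-1)"`. Both fields are changed in one request so that the replica count and the offline list stay consistent.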
After applying the updated configuration, verify the remaining Pods in the cluster:
kubectl get pods -n demo -l app.kubernetes.io/instance=kafka-separated-cluster,apps.kubeblocks.io/component-name=kafka-broker
Example Output:
NAME READY STATUS RESTARTS AGE
kafka-separated-cluster-kafka-broker-0 2/2 Running 0 24m
kafka-separated-cluster-kafka-broker-2 2/2 Running 0 2m1s
Key takeaways:
This provides granular cluster management while maintaining availability.