Kafka Cluster Lifecycle Management

This guide demonstrates how to manage a Kafka Cluster's operational state in KubeBlocks, including:

  • Stopping the cluster to conserve resources
  • Starting a stopped cluster
  • Restarting cluster components

These operations help optimize resource usage and reduce operational costs in Kubernetes environments.

Lifecycle management operations in KubeBlocks:

Operation   Effect                              Use Case
---------   ---------------------------------   --------------------------------------
Stop        Suspends cluster, retains storage   Cost savings, maintenance
Start       Resumes cluster operation           Restore service after pause
Restart     Recreates pods for a component      Configuration changes, troubleshooting

Prerequisites

Before proceeding, ensure the following:

• Environment Setup:
  • A Kubernetes cluster is up and running.
  • The kubectl CLI tool is configured to communicate with your cluster.
  • KubeBlocks CLI and KubeBlocks Operator are installed. Follow the installation instructions here.
• Namespace Preparation: To keep resources isolated, create a dedicated namespace for this tutorial:

  kubectl create ns demo
  namespace/demo created

Deploy a Kafka Cluster

KubeBlocks uses a declarative approach for managing Kafka Clusters. Below is an example configuration for deploying a Kafka Cluster with three components: kafka-broker, kafka-controller, and kafka-exporter.

Apply the following YAML configuration to deploy the cluster:

apiVersion: apps.kubeblocks.io/v1
kind: Cluster
metadata:
  name: kafka-separated-cluster
  namespace: demo
spec:
  terminationPolicy: Delete
  clusterDef: kafka
  topology: separated_monitor
  componentSpecs:
    - name: kafka-broker
      replicas: 1
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      env:
        - name: KB_KAFKA_BROKER_HEAP
          value: "-XshowSettings:vm -XX:MaxRAMPercentage=100 -Ddepth=64"
        - name: KB_KAFKA_CONTROLLER_HEAP
          value: "-XshowSettings:vm -XX:MaxRAMPercentage=100 -Ddepth=64"
        - name: KB_BROKER_DIRECT_POD_ACCESS
          value: "true"
      volumeClaimTemplates:
        - name: data
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 20Gi
        - name: metadata
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
    - name: kafka-controller
      replicas: 1
      resources:
        limits:
          cpu: "0.5"
          memory: "0.5Gi"
        requests:
          cpu: "0.5"
          memory: "0.5Gi"
      volumeClaimTemplates:
        - name: metadata
          spec:
            storageClassName: ""
            accessModes:
              - ReadWriteOnce
            resources:
              requests:
                storage: 1Gi
    - name: kafka-exporter
      replicas: 1
      resources:
        limits:
          cpu: "0.5"
          memory: "1Gi"
        requests:
          cpu: "0.1"
          memory: "0.2Gi"
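One way to submit this manifest is sketched below; the file name is only an assumption for illustration.

# Save the manifest above as kafka-separated-cluster.yaml (any name works),
# then create the Cluster in the demo namespace.
kubectl apply -f kafka-separated-cluster.yaml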
NOTE

The three components are created strictly in the order controller -> broker -> exporter, as defined in the ClusterDefinition. You can observe this ordering with the watch command shown below.
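A minimal sketch for watching the components come up in that order (the label selector is the same one used elsewhere in this guide):

# Watch pods being created: the controller pod should appear first,
# then the broker, and finally the exporter.
kubectl get pods -n demo -l app.kubernetes.io/instance=kafka-separated-cluster -w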

Verifying the Deployment

Monitor the cluster status until it transitions to the Running state:

kubectl get cluster kafka-separated-cluster -n demo -w

Expected Output:

NAME                      CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
kafka-separated-cluster   kafka                Delete               Creating   13s
kafka-separated-cluster   kafka                Delete               Running    63s

Check the pod status:

kubectl get pods -l app.kubernetes.io/instance=kafka-separated-cluster -n demo

Expected Output:

NAME                                         READY   STATUS    RESTARTS   AGE
kafka-separated-cluster-kafka-broker-0       2/2     Running   0          13m
kafka-separated-cluster-kafka-controller-0   2/2     Running   0          13m
kafka-separated-cluster-kafka-exporter-0     1/1     Running   0          12m

Once the cluster status becomes Running, your Kafka cluster is ready for use. A quick smoke test is sketched after the tip below.

TIP

If you are creating the cluster for the very first time, pulling the images may take some time before the pods reach the Running state.
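Optionally, you can run a quick smoke test against the broker. The sketch below assumes the broker container is named kafka, that the standard Kafka CLI scripts are on its PATH, and that the broker listens on port 9092; adjust these to match your image and addon configuration.

# Create a test topic from inside the broker pod (container name "kafka" and
# listener port 9092 are assumptions).
kubectl exec -n demo kafka-separated-cluster-kafka-broker-0 -c kafka -- \
  kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic lifecycle-smoke-test --partitions 1 --replication-factor 1

# List topics to confirm the broker is serving requests.
kubectl exec -n demo kafka-separated-cluster-kafka-broker-0 -c kafka -- \
  kafka-topics.sh --bootstrap-server localhost:9092 --list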

Cluster Lifecycle Operations

Stopping the Cluster

Stopping a Kafka Cluster in KubeBlocks:

1. Terminates all running pods
2. Retains persistent storage (PVCs)
3. Maintains the cluster configuration

This operation is ideal for:

• Temporary cost savings
• Maintenance windows
• Development environment pauses

Option 1: OpsRequest API

Create a Stop operation request:

apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: kafka-separated-cluster-stop-ops
  namespace: demo
spec:
  clusterName: kafka-separated-cluster
  type: Stop
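A sketch for submitting the request from a local file (the file name is illustrative) and following the operation until it completes:

# Submit the Stop OpsRequest and watch its status until it reports Succeed.
kubectl apply -f kafka-separated-cluster-stop-ops.yaml
kubectl get opsrequest kafka-separated-cluster-stop-ops -n demo -w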

Option 2: Cluster API Patch

Modify the cluster spec directly by patching the stop field:

kubectl patch cluster kafka-separated-cluster -n demo --type='json' -p='[
  { "op": "add", "path": "/spec/componentSpecs/0/stop", "value": true },
  { "op": "add", "path": "/spec/componentSpecs/1/stop", "value": true },
  { "op": "add", "path": "/spec/componentSpecs/2/stop", "value": true }
]'
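To confirm that the patch was recorded, you can read the stop flag back from the cluster spec (this mirrors the yq usage shown later in this guide):

# Each of the three components should now report stop: true.
kubectl get cluster kafka-separated-cluster -n demo -oyaml | yq '.spec.componentSpecs[].stop'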

Verifying Cluster Stop

To confirm a successful stop operation:

1. Check cluster status transition:

   kubectl get cluster kafka-separated-cluster -n demo -w

   Example Output:

   NAME                      CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
   kafka-separated-cluster   kafka                Delete               Stopping   16m3s
   kafka-separated-cluster   kafka                Delete               Stopped    16m55s

2. Verify no running pods:

   kubectl get pods -l app.kubernetes.io/instance=kafka-separated-cluster -n demo

   Example Output:

   No resources found in demo namespace.

3. Confirm persistent volumes remain:

   kubectl get pvc -l app.kubernetes.io/instance=kafka-separated-cluster -n demo

   Example Output:

   NAME                                                  STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
   data-kafka-separated-cluster-kafka-broker-0           Bound    pvc-ddd54e0f-414a-49ed-8e17-41e9f5082af1   20Gi       RWO            standard       <unset>                 14m
   metadata-kafka-separated-cluster-kafka-broker-0       Bound    pvc-d63b7d80-cac5-41b9-b694-6a298921003b   1Gi        RWO            standard       <unset>                 14m
   metadata-kafka-separated-cluster-kafka-controller-0   Bound    pvc-e6263eb1-405a-4090-b2bb-f92cca0ba36d   1Gi        RWO            standard       <unset>                 14m
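If you stopped the cluster via the OpsRequest API, you can also confirm that the operation itself finished successfully:

# The Stop OpsRequest should report a Succeed status once the pods are gone.
kubectl get opsrequest kafka-separated-cluster-stop-ops -n demo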

Starting the Cluster

Starting a stopped Kafka Cluster:

1. Recreates all pods
2. Reattaches persistent storage
3. Restores service endpoints

Expected behavior:

• Cluster returns to its previous state
• No data loss occurs
• Services resume automatically

Option 1: OpsRequest API

Initiate a Start operation request:

apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: kafka-separated-cluster-start-ops
  namespace: demo
spec:
  # Specifies the name of the Cluster resource that this operation is targeting.
  clusterName: kafka-separated-cluster
  type: Start
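As with the Stop operation, a sketch for submitting the request from a local file (illustrative file name) and tracking it:

# Submit the Start OpsRequest and follow it until it succeeds.
kubectl apply -f kafka-separated-cluster-start-ops.yaml
kubectl get opsrequest kafka-separated-cluster-start-ops -n demo -w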

Option 2: Cluster API Patch

Modify the cluster spec to resume operation, either by:

1. Setting stop: false (see the sketch below), or
2. Removing the stop field entirely

The following patch removes the field from every component:

kubectl patch cluster kafka-separated-cluster -n demo --type='json' -p='[
  { "op": "remove", "path": "/spec/componentSpecs/0/stop" },
  { "op": "remove", "path": "/spec/componentSpecs/1/stop" },
  { "op": "remove", "path": "/spec/componentSpecs/2/stop" }
]'
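If you prefer the first variant, keeping the field but setting it back to false, a sketch using JSON patch replace operations (the field must already exist for replace to succeed):

# Alternative to removing the field: explicitly set stop back to false
# on every component.
kubectl patch cluster kafka-separated-cluster -n demo --type='json' -p='[
  { "op": "replace", "path": "/spec/componentSpecs/0/stop", "value": false },
  { "op": "replace", "path": "/spec/componentSpecs/1/stop", "value": false },
  { "op": "replace", "path": "/spec/componentSpecs/2/stop", "value": false }
]'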

Verifying Cluster Start

To confirm a successful start operation:

1. Check cluster status transition:

   kubectl get cluster kafka-separated-cluster -n demo -w

   Example Output:

   NAME                      CLUSTER-DEFINITION   TERMINATION-POLICY   STATUS     AGE
   kafka-separated-cluster   kafka                Delete               Updating   24m
   kafka-separated-cluster   kafka                Delete               Running    24m
   kafka-separated-cluster   kafka                Delete               Running    24m

2. Verify pod recreation:

   kubectl get pods -n demo -l app.kubernetes.io/instance=kafka-separated-cluster

   Example Output:

   NAME                                         READY   STATUS    RESTARTS   AGE
   kafka-separated-cluster-kafka-broker-0       2/2     Running   0          2m4s
   kafka-separated-cluster-kafka-controller-0   2/2     Running   0          104s
   kafka-separated-cluster-kafka-exporter-0     1/1     Running   0          84s

Restarting the Cluster

Restart operations provide:

• Pod recreation without a full cluster stop
• Component-level granularity
• Minimal service disruption

Use cases:

• Configuration changes requiring a restart
• Resource refresh
• Troubleshooting

Check Components

The Kafka Cluster has three components. To get the list of components, run:

kubectl get cluster -n demo kafka-separated-cluster -oyaml | yq '.spec.componentSpecs[].name'

Expected Output:

kafka-controller
kafka-broker
kafka-exporter

Restart Components via OpsRequest API

List the specific components to be restarted:

apiVersion: operations.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: kafka-separated-cluster-restart-ops
  namespace: demo
spec:
  clusterName: kafka-separated-cluster
  type: Restart
  restart:
    - componentName: kafka-broker
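A sketch for submitting the restart request (illustrative file name); the next section shows how to track its progress:

# Submit the Restart OpsRequest; only the kafka-broker component is affected.
kubectl apply -f kafka-separated-cluster-restart-ops.yaml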

Verifying Restart Completion

To verify a successful component restart:

1. Track the OpsRequest progress:

   kubectl get opsrequest kafka-separated-cluster-restart-ops -n demo -w

   Example Output:

   NAME                                  TYPE      CLUSTER                   STATUS    PROGRESS   AGE
   kafka-separated-cluster-restart-ops   Restart   kafka-separated-cluster   Running   0/1        8s
   kafka-separated-cluster-restart-ops   Restart   kafka-separated-cluster   Running   1/1        22s
   kafka-separated-cluster-restart-ops   Restart   kafka-separated-cluster   Running   1/1        23s
   kafka-separated-cluster-restart-ops   Restart   kafka-separated-cluster   Succeed   1/1        23s

2. Check the pod status:

   kubectl get pods -n demo -l app.kubernetes.io/instance=kafka-separated-cluster

   Note: Pods show new creation timestamps after the restart. Only the pods belonging to the kafka-broker component have been restarted; a check is sketched below.
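One way to confirm that only the broker pods were recreated is to compare pod creation timestamps (custom-columns is standard kubectl output formatting):

# The kafka-broker pod should show a newer creation timestamp than the
# controller and exporter pods, which were not restarted.
kubectl get pods -n demo -l app.kubernetes.io/instance=kafka-separated-cluster \
  -o custom-columns=NAME:.metadata.name,CREATED:.metadata.creationTimestamp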

Once the operation is complete, the cluster will return to the Running state.

Summary

In this guide, you learned how to:

1. Stop a Kafka Cluster to suspend operations while retaining persistent storage.
2. Start a stopped cluster to bring it back online.
3. Restart specific cluster components to recreate their Pods without stopping the entire cluster.

By managing the lifecycle of your Kafka Cluster, you can optimize resource utilization, reduce costs, and maintain flexibility in your Kubernetes environment. KubeBlocks provides a seamless way to perform these operations, ensuring high availability and minimal disruption.
