Scale a Kafka cluster
You can scale a Kafka cluster in two ways: vertical scaling and horizontal scaling.
Vertical scaling
You can vertically scale a cluster by changing resource requirements and limits (CPU and memory). For example, you can change the resource class from 1C2G to 2C4G by performing vertical scaling.
Before you start
Check whether the cluster status is `Running`. Otherwise, the following operations may fail.
- kbcli
- kubectl
```bash
kbcli cluster list mycluster -n demo
>
NAME        NAMESPACE   CLUSTER-DEFINITION   VERSION       TERMINATION-POLICY   STATUS    CREATED-TIME
mycluster   demo        kafka                kafka-3.3.2   Delete               Running   Sep 27,2024 15:15 UTC+0800
```
```bash
kubectl -n demo get cluster mycluster
>
NAME        CLUSTER-DEFINITION   VERSION       TERMINATION-POLICY   STATUS    AGE
mycluster   kafka                kafka-3.3.2   Delete               Running   19m
```
Steps
- kbcli
- OpsRequest
- Edit cluster YAML file
Configure the parameters `--components`, `--memory`, and `--cpu`, and run the command.

```bash
kbcli cluster vscale mycluster -n demo --components="broker" --memory="4Gi" --cpu="2"
```

- `--components` value can be `broker` or `controller`.
  - broker: all nodes in combined mode, or all broker nodes in separated mode.
  - controller: all the corresponding nodes in separated mode.
- `--memory` describes the requested and limited size of the component memory.
- `--cpu` describes the requested and limited size of the component CPU.
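For example, in separated mode you can target the controller nodes instead. The command below is a sketch; it assumes your cluster runs in separated mode with a component named `controller`.

```bash
# Sketch: vertically scale the controller component (separated mode is an assumption).
kbcli cluster vscale mycluster -n demo --components="controller" --memory="2Gi" --cpu="1"
```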
Validate the vertical scaling operation.

- View the OpsRequest progress.

  KubeBlocks automatically outputs a command for you to view the OpsRequest progress. The output includes the status of this OpsRequest and its Pods. When the status is `Succeed`, this OpsRequest is completed.

  ```bash
  kbcli cluster describe-ops mycluster-verticalscaling-g67k9 -n demo
  ```

- Check the cluster status.

  ```bash
  kbcli cluster list mycluster -n demo
  >
  NAME        NAMESPACE   CLUSTER-DEFINITION   VERSION       TERMINATION-POLICY   STATUS     CREATED-TIME
  mycluster   demo        kafka                kafka-3.3.2   Delete               Updating   Sep 27,2024 15:15 UTC+0800
  ```

  - STATUS=Updating: the vertical scaling is in progress.
  - STATUS=Running: the vertical scaling has been applied.
  - STATUS=Abnormal: the vertical scaling is abnormal. This may happen when the number of normal instances is less than the total number of instances, or when the leader instance is running properly while other instances are abnormal.

  To solve the problem, manually check whether this error is caused by insufficient resources. If the Kubernetes cluster supports AutoScaling, the system recovers once enough resources are available. Otherwise, create enough resources and troubleshoot with the `kubectl describe` command, as in the sketch below.
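A minimal troubleshooting sketch, assuming the standard `app.kubernetes.io/instance` label and a hypothetical Pod name `mycluster-broker-0` (list the Pods first to find yours):

```bash
# Sketch: find the cluster's Pods, then inspect one for scheduling events
# such as "FailedScheduling ... Insufficient cpu" (the Pod name is an assumption).
kubectl get pods -n demo -l app.kubernetes.io/instance=mycluster
kubectl describe pod mycluster-broker-0 -n demo
```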
Vertical scaling does not synchronize CPU- and memory-related parameters, so you need to manually trigger a configuration OpsRequest to change them accordingly. Refer to Configuration for instructions.

After the OpsRequest status is `Succeed` or the cluster status is `Running` again, check whether the corresponding resources change.

```bash
kbcli cluster describe mycluster -n demo
```
Apply an OpsRequest to the specified cluster. Configure the parameters according to your needs.

```bash
kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: ops-vertical-scaling
  namespace: demo
spec:
  clusterRef: mycluster
  type: VerticalScaling
  verticalScaling:
  - componentName: broker
    requests:
      memory: "2Gi"
      cpu: "1"
    limits:
      memory: "4Gi"
      cpu: "2"
EOF
```

Check the operation status to validate the vertical scaling.

```bash
kubectl get ops -n demo
>
NAMESPACE   NAME                   TYPE              CLUSTER     STATUS    PROGRESS   AGE
demo        ops-vertical-scaling   VerticalScaling   mycluster   Succeed   3/3        6m
```

If an error occurs, you can troubleshoot with the `kubectl describe ops -n demo` command to view the events of this operation.

Check whether the corresponding resources change.

```bash
kubectl describe cluster mycluster -n demo
>
......
Component Specs:
  Component Def Ref:  kafka
  Enabled Logs:
    running
  DisableExporter:    true
  Name:               kafka
  Replicas:           2
  Resources:
    Limits:
      Cpu:     2
      Memory:  4Gi
    Requests:
      Cpu:     1
      Memory:  2Gi
```
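You can also confirm that the new requests and limits landed on the Pods themselves. A sketch, assuming a Pod named `mycluster-broker-0`:

```bash
# Sketch: print the container resources of a broker Pod (the Pod name is an assumption).
kubectl get pod mycluster-broker-0 -n demo -o jsonpath='{.spec.containers[0].resources}'
```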
Change the configuration of `spec.componentSpecs.resources` in the YAML file. `spec.componentSpecs.resources` controls the requests and limits of resources, and changing them triggers a vertical scaling.

```yaml
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mycluster
  namespace: demo
spec:
  clusterDefinitionRef: kafka
  clusterVersionRef: kafka-3.3.2
  componentSpecs:
  - name: broker
    componentDefRef: broker
    replicas: 1
    resources: # Change the values of resources.
      requests:
        memory: "2Gi"
        cpu: "1"
      limits:
        memory: "4Gi"
        cpu: "2"
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
  terminationPolicy: Delete
```

Check whether the corresponding resources change.

```bash
kubectl describe cluster mycluster -n demo
>
......
Component Specs:
  Component Def Ref:  kafka
  Enabled Logs:
    running
  DisableExporter:    true
  Name:               kafka
  Replicas:           2
  Resources:
    Limits:
      Cpu:     2
      Memory:  4Gi
    Requests:
      Cpu:     1
      Memory:  2Gi
```
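If you prefer not to open an interactive editor, the same change can be applied with a JSON patch. A sketch, assuming the broker is the first entry in `spec.componentSpecs` (adjust the index otherwise):

```bash
# Sketch: patch the broker's memory request and limit in place (index 0 is an assumption).
kubectl patch cluster mycluster -n demo --type json -p '[
  {"op": "replace", "path": "/spec/componentSpecs/0/resources/requests/memory", "value": "2Gi"},
  {"op": "replace", "path": "/spec/componentSpecs/0/resources/limits/memory", "value": "4Gi"}
]'
```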
Horizontal scaling
Horizontal scaling changes the number of Pods. For example, you can scale out replicas from three to five.

From v0.9.0, besides scaling replicas, KubeBlocks also supports scaling specified instances in and out. Refer to the Horizontal Scale tutorial for more details and examples.
Before you start
Check whether the cluster status is `Running`. Otherwise, the following operations may fail.

- kbcli
- kubectl

```bash
kbcli cluster list mycluster -n demo
>
NAME        NAMESPACE   CLUSTER-DEFINITION   VERSION       TERMINATION-POLICY   STATUS    CREATED-TIME
mycluster   demo        kafka                kafka-3.3.2   Delete               Running   Sep 27,2024 15:15 UTC+0800
```

```bash
kubectl -n demo get cluster mycluster
>
NAME        CLUSTER-DEFINITION   VERSION       TERMINATION-POLICY   STATUS    AGE
mycluster   kafka                kafka-3.3.2   Delete               Running   19m
```

Performing horizontal scaling on the controller nodes is not recommended, whether the cluster runs in combined mode or in separated mode.

When scaling in, make sure you know how topic partitions are stored across the brokers. If a topic has only one replica, scaling in brokers may cause data loss.
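One way to check, as a sketch: describe the topics from inside a broker and review each topic's replication factor. The Pod name, script name, and port are assumptions for a typical Kafka image.

```bash
# Sketch: topics showing "ReplicationFactor: 1" risk data loss when brokers are removed.
kubectl exec -it mycluster-broker-0 -n demo -- \
  kafka-topics.sh --describe --bootstrap-server localhost:9092
```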
Steps
- kbcli
- OpsRequest
- Edit cluster YAML file
Configure the parameters `--components` and `--replicas`, and run the command.

```bash
kbcli cluster hscale mycluster -n demo --components="broker" --replicas=3
```

- `--components` describes the component name ready for horizontal scaling.
- `--replicas` describes the replica amount of the specified component. Edit the amount based on your demands to scale replicas in or out.

Validate the horizontal scaling operation.

- View the OpsRequest progress.

  KubeBlocks automatically outputs a command for you to view the OpsRequest progress. The output includes the status of this OpsRequest and its Pods. When the status is `Succeed`, this OpsRequest is completed.

  ```bash
  kbcli cluster describe-ops mycluster-horizontalscaling-ffp9p -n demo
  ```

- View the cluster status.

  ```bash
  kbcli cluster list mycluster -n demo
  ```

  - STATUS=Updating: the horizontal scaling is in progress.
  - STATUS=Running: the horizontal scaling has been applied.

Check whether the corresponding resources change.

```bash
kbcli cluster describe mycluster -n demo
```
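To see the result at the Pod level, list the cluster's Pods. A sketch, using the standard `app.kubernetes.io/instance` label:

```bash
# Sketch: the number of broker Pods should match the new replica count.
kubectl get pods -n demo -l app.kubernetes.io/instance=mycluster
```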
Apply an OpsRequest to a specified cluster. Configure the parameters according to your needs.

The example below adds two replicas.

```bash
kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: ops-horizontal-scaling
  namespace: demo
spec:
  clusterRef: mycluster
  type: HorizontalScaling
  horizontalScaling:
  - componentName: broker
    scaleOut:
      replicaChanges: 2
EOF
```

If you want to scale in replicas, replace `scaleOut` with `scaleIn`.

The example below deletes two replicas.

```bash
kubectl apply -f - <<EOF
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: ops-horizontal-scaling
  namespace: demo
spec:
  clusterRef: mycluster
  type: HorizontalScaling
  horizontalScaling:
  - componentName: broker
    scaleIn:
      replicaChanges: 2
EOF
```

Check the operation status to validate the horizontal scaling.

```bash
kubectl get ops -n demo
>
NAMESPACE   NAME                     TYPE                CLUSTER     STATUS    PROGRESS   AGE
demo        ops-horizontal-scaling   HorizontalScaling   mycluster   Succeed   3/3        6m
```

If an error occurs, you can troubleshoot with the `kubectl describe ops -n demo` command to view the events of this operation.

Check whether the corresponding resources change.

```bash
kubectl describe cluster mycluster -n demo
>
......
Component Specs:
  Component Def Ref:  kafka
  Enabled Logs:
    running
  DisableExporter:    true
  Name:               kafka
  Replicas:           2
  Resources:
    Limits:
      Cpu:     2
      Memory:  4Gi
    Requests:
      Cpu:     1
      Memory:  2Gi
```
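After scaling out, you can confirm that the new brokers registered with the cluster. A sketch; the Pod name, script name, and port are assumptions for a typical Kafka image.

```bash
# Sketch: each registered broker appears with its id in the output.
kubectl exec -it mycluster-broker-0 -n demo -- \
  kafka-broker-api-versions.sh --bootstrap-server localhost:9092 | grep "id:"
```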
Change the configuration of `spec.componentSpecs.replicas` in the YAML file. `spec.componentSpecs.replicas` stands for the Pod amount, and changing this value triggers a horizontal scaling of the cluster.

```bash
kubectl edit cluster mycluster -n demo
>
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: mycluster
  namespace: demo
spec:
  clusterDefinitionRef: kafka
  clusterVersionRef: kafka-3.3.2
  componentSpecs:
  - name: broker
    componentDefRef: broker
    replicas: 2 # Change the amount
    volumeClaimTemplates:
    - name: data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 20Gi
  terminationPolicy: Delete
```

Check whether the corresponding resources change.

```bash
kubectl describe cluster mycluster -n demo
>
......
Component Specs:
  Component Def Ref:  kafka
  Enabled Logs:
    running
  DisableExporter:    true
  Name:               kafka
  Replicas:           2
  Resources:
    Limits:
      Cpu:     2
      Memory:  4Gi
    Requests:
      Cpu:     1
      Memory:  2Gi
```
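As an alternative to the interactive edit, the replica count can be changed with a one-line patch. A sketch, assuming the broker is the first entry in `spec.componentSpecs`:

```bash
# Sketch: set the broker replicas to 3 via a JSON patch (index 0 is an assumption).
kubectl patch cluster mycluster -n demo --type json \
  -p '[{"op": "replace", "path": "/spec/componentSpecs/0/replicas", "value": 3}]'
```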
Handle the snapshot exception
If `STATUS=ConditionsError` occurs during the horizontal scaling process, you can find the cause in `cluster.status.condition.message` for troubleshooting.
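A quick way to read those messages, as a sketch:

```bash
# Sketch: print each condition's type and message from the cluster status.
kubectl get cluster mycluster -n demo \
  -o jsonpath='{range .status.conditions[*]}{.type}{": "}{.message}{"\n"}{end}'
```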
In the example below, a snapshot exception occurs.
```bash
Status:
  conditions:
  - lastTransitionTime: "2023-02-08T04:20:26Z"
    message: VolumeSnapshot/mycluster-kafka-scaling-dbqgp: Failed to set default snapshot
      class with error cannot find default snapshot class
    reason: ApplyResourcesFailed
    status: "False"
    type: ApplyResources
```
Reason
This exception occurs because the `VolumeSnapshotClass` is not configured. The exception can be fixed by configuring the `VolumeSnapshotClass`, but the horizontal scaling still cannot continue, because the failed backup (the VolumeSnapshot is generated by the backup) and the previously generated VolumeSnapshot still exist. Delete these two wrong resources, and KubeBlocks then re-generates new ones.
Steps:
Configure the VolumeSnapshotClass by running the command below.

```bash
kubectl create -f - <<EOF
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-aws-vsc
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"
driver: ebs.csi.aws.com
deletionPolicy: Delete
EOF
```

Delete the wrong backup (the VolumeSnapshot is generated by the backup) and VolumeSnapshot resources.

```bash
kubectl delete backup -l app.kubernetes.io/instance=mycluster -n demo
kubectl delete volumesnapshot -l app.kubernetes.io/instance=mycluster -n demo
```
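You can confirm that both resources are gone before the scaling retries. A sketch, reusing the label selector from the delete commands above:

```bash
# Sketch: no backups or volume snapshots should remain for this cluster.
kubectl get backup,volumesnapshot -n demo -l app.kubernetes.io/instance=mycluster
```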
Result
The horizontal scaling continues after the backup and VolumeSnapshot are deleted, and the cluster returns to Running status.