This page describes how KubeBlocks deploys an Apache ZooKeeper ensemble on Kubernetes — covering the resource hierarchy, pod internals, the ZAB consensus protocol, and traffic routing.
KubeBlocks models a ZooKeeper ensemble as a hierarchy of Kubernetes custom resources:
Cluster → Component → InstanceSet → Pod × N
| Resource | Role |
|---|---|
| Cluster | User-facing declaration — specifies the number of ensemble members, storage size, and resources |
| Component | Generated automatically; references a ComponentDefinition that describes container specs, lifecycle actions, and services |
| InstanceSet | KubeBlocks custom workload (replaces StatefulSet); manages pods with stable identities and role awareness |
| Pod | Actual running ZooKeeper server; each pod gets a unique ordinal (myid), a stable DNS name, and its own PVC |
ZooKeeper requires an odd number of members (3, 5, or 7) to maintain a voting quorum. KubeBlocks assigns a unique myid to each pod, derived from its ordinal, which persists across restarts.
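The myid derivation can be sketched as below. ZooKeeper requires myid values in the range 1–255 while pod ordinals start at 0, so this sketch assumes the common convention myid = ordinal + 1; the exact offset KubeBlocks uses is an assumption here:

```python
import re

def myid_from_pod_name(pod_name: str) -> int:
    """Derive a ZooKeeper myid from the pod's ordinal suffix.

    Assumes myid = ordinal + 1, since ZooKeeper requires myid in 1..255
    while Kubernetes pod ordinals start at 0.
    """
    match = re.search(r"-(\d+)$", pod_name)
    if match is None:
        raise ValueError(f"pod name has no ordinal suffix: {pod_name}")
    return int(match.group(1)) + 1
```

Because the ordinal is part of the pod's stable identity, the derived myid survives pod restarts.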
Every ZooKeeper pod runs three containers:
| Container | Port | Purpose |
|---|---|---|
| zookeeper | 2181 (client), 2888 (quorum/follower), 3888 (leader election) | ZooKeeper server participating in the ZAB consensus protocol and serving client requests |
| kbagent | 5001 | Role probe endpoint — KubeBlocks queries GET /v1.0/getrole periodically to identify leader vs. follower vs. observer |
| metrics-exporter | 9187 | Prometheus metrics exporter |
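A caller of the kbagent role probe can normalize its response with a small parser. The response shape here is an assumption (a bare role string or a JSON object with a `role` field), not a documented KubeBlocks contract; adapt it to what your kbagent version actually returns:

```python
import json

VALID_ROLES = {"leader", "follower", "observer"}

def parse_role(body: str) -> str:
    """Normalize a role-probe response body to a lowercase role name.

    Accepts either a bare role string or a JSON object with a "role"
    field -- the exact shape is an assumption, not a confirmed API.
    """
    text = body.strip()
    try:
        role = json.loads(text).get("role", "")
    except (json.JSONDecodeError, AttributeError):
        role = text
    role = role.lower()
    if role not in VALID_ROLES:
        raise ValueError(f"unexpected role: {role!r}")
    return role
```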
Each pod mounts its own PVC for the ZooKeeper data directory (/data), preserving the transaction log and snapshot files across pod restarts.

Each ZooKeeper server holds one of three roles, which the kbagent probe reports:
| Role | Description |
|---|---|
| Leader | Coordinates all write transactions; broadcasts proposals to followers and observers; elected via the ZAB leader election protocol |
| Follower | Participates in voting for write quorum; serves client read requests locally; forwards writes to the leader |
| Observer | Non-voting member that replicates state from the leader; serves read requests; used to scale read throughput without affecting write quorum |
ZooKeeper provides HA through the ZooKeeper Atomic Broadcast (ZAB) protocol, which guarantees total order of updates and crash-recovery:
| ZAB Phase | Description |
|---|---|
| Leader election | On startup or after leader failure, servers exchange votes using a FastLeaderElection algorithm; the server with the most up-to-date transaction log and highest ID wins |
| Synchronization | The new leader synchronizes followers to bring them up to date before resuming normal operation |
| Broadcast | All write requests go through the leader; the leader sends a proposal to all followers; a write is committed when a quorum acknowledges it |
| Quorum | (N/2) + 1 servers must be available for writes to succeed; reads can be served by any server |
A 3-member ensemble tolerates 1 failure; a 5-member ensemble tolerates 2 failures.
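The quorum arithmetic and the vote preference above can be sketched in a few lines. The vote comparison is simplified (real FastLeaderElection also tracks election epochs), but the ordering shown — higher zxid wins, myid breaks ties — matches the table:

```python
def quorum_size(n_voting: int) -> int:
    """Minimum acknowledgements needed to commit a write: floor(N/2) + 1."""
    return n_voting // 2 + 1

def failures_tolerated(n_voting: int) -> int:
    """Voting members that can fail while the ensemble still accepts writes."""
    return n_voting - quorum_size(n_voting)

def preferred_vote(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Pick the winning (zxid, myid) vote, simplified from FastLeaderElection.

    Higher zxid (most up-to-date transaction log) wins; myid breaks ties.
    """
    return max(a, b)  # tuple ordering compares zxid first, then myid
```

Note that an even-sized ensemble adds no fault tolerance over the next smaller odd size (4 members still tolerate only 1 failure), which is why odd member counts are required.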
When the leader becomes unavailable, the remaining voting members hold a new FastLeaderElection round, comparing zxid (transaction ID) and myid to elect the most up-to-date server as the new leader; writes resume once a quorum has synchronized with it.

KubeBlocks creates two services for each ZooKeeper ensemble:
| Service | Type | Port | Selector |
|---|---|---|---|
| {cluster}-zookeeper | ClusterIP | 2181 (client) | all pods |
| {cluster}-zookeeper-headless | Headless | 2181, 2888, 3888 | all pods |
Client applications (e.g., Kafka, ClickHouse, or application code) connect to port 2181 on the ClusterIP service. Any ZooKeeper server (leader or follower) can serve client read requests; write requests are automatically forwarded to the leader.
Quorum and leader election traffic (ports 2888 and 3888) uses the headless service, where each ensemble member is individually addressable by its stable pod DNS name:
{pod-name}.{cluster}-zookeeper-headless.{namespace}.svc.cluster.local
The zoo.cfg configuration file references all peer addresses using these stable DNS names, ensuring correct cluster membership after pod restarts or rolling updates.
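Rendering those peer entries can be sketched as below. This is illustrative only — the actual template KubeBlocks renders may differ (for example, newer ZooKeeper versions can append a `;2181` client-port suffix) — but the server.N line format itself is ZooKeeper's standard:

```python
def zoo_cfg_servers(cluster: str, namespace: str, replicas: int) -> list[str]:
    """Render zoo.cfg server.N entries from stable headless-service DNS names.

    Format: server.<myid>=<host>:2888:3888, where 2888 carries quorum
    traffic and 3888 carries leader-election traffic.
    """
    headless = f"{cluster}-zookeeper-headless"
    lines = []
    for ordinal in range(replicas):
        host = (f"{cluster}-zookeeper-{ordinal}.{headless}"
                f".{namespace}.svc.cluster.local")
        lines.append(f"server.{ordinal + 1}={host}:2888:3888")
    return lines
```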
When a ZooKeeper ensemble member fails, the InstanceSet recreates the pod with the same stable identity and re-attaches its PVC. The replacement server reads its persisted myid from the PVC, contacts the leader, and syncs any missed transactions before rejoining the ensemble.
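The identity-recovery step can be illustrated as follows; the /data path matches the PVC mount described earlier, and the plain-text myid file is ZooKeeper's own convention:

```python
from pathlib import Path

def read_myid(data_dir: str = "/data") -> int:
    """Recover the server's persisted myid from its data directory.

    ZooKeeper stores the id as a plain-text integer in <dataDir>/myid,
    so a restarted pod rejoins the ensemble under the same identity.
    """
    return int(Path(data_dir, "myid").read_text().strip())
```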