PostgreSQL High Availability Architecture

This page describes how KubeBlocks deploys a PostgreSQL high-availability (HA) cluster on Kubernetes — covering the resource hierarchy, pod internals, traffic routing, and automatic failover.

[Architecture diagram] Clients reach the cluster through the ClusterIP service pg-cluster-postgresql (5432 for PostgreSQL, 6432 for pgbouncer), whose endpoints always select the pod labeled kubeblocks.io/role=primary, or through the headless service pg-cluster-postgresql-headless for stable per-pod DNS and direct replica access. Each of the three pods (postgresql-0 as primary, postgresql-1 and postgresql-2 as replicas) runs the postgresql (Patroni), pgbouncer, dbctl (kbagent), and pg-exporter containers and mounts its own 20 Gi PVC. The primary streams WAL to both replicas (sync or async, configurable). Patroni uses the Kubernetes API as its DCS: a ConfigMap for configuration, an Endpoints leader lease, and Secrets for system-account passwords. The KubeBlocks Operator watches and reconciles the Cluster → Component → InstanceSet → Pod hierarchy, probes pod roles, updates role labels, and drives switchover and scaling operations.

Resource Hierarchy

KubeBlocks models a PostgreSQL cluster as a hierarchy of Kubernetes custom resources:

Cluster  →  Component  →  InstanceSet  →  Pod × N
| Resource | Role |
| --- | --- |
| Cluster | User-facing declaration — specifies topology, replicas, storage, and resources |
| Component | Generated automatically; references a ComponentDefinition that describes container specs, lifecycle actions, and services |
| InstanceSet | KubeBlocks custom workload (replaces StatefulSet); manages pods with stable identities and role awareness |
| Pod | Actual running instance; each pod gets a unique ordinal and its own PVC |
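
A minimal Cluster manifest makes this hierarchy concrete. The sketch below is illustrative: the names (pg-cluster, the postgresql component) are placeholders, and the field names follow the apps.kubeblocks.io/v1alpha1 API, which may differ in newer KubeBlocks releases.

```yaml
# Illustrative Cluster manifest; KubeBlocks generates the Component,
# InstanceSet, and Pods from this single resource.
apiVersion: apps.kubeblocks.io/v1alpha1
kind: Cluster
metadata:
  name: pg-cluster
  namespace: default
spec:
  clusterDefinitionRef: postgresql
  terminationPolicy: Delete
  componentSpecs:
    - name: postgresql
      replicas: 3                     # one primary + two replicas
      resources:
        limits:
          cpu: "1"
          memory: 1Gi
      volumeClaimTemplates:
        - name: data                  # becomes PVC data-{ordinal} per pod
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 20Gi
```

Applying this one resource is enough: the operator creates the Component, the InstanceSet, and the three pods with their PVCs.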

Containers Inside Each Pod

Every PostgreSQL pod runs four containers:

| Container | Ports | Purpose |
| --- | --- | --- |
| postgresql (Patroni + Spilo) | 5432 (PG), 8008 (Patroni API) | Database engine with built-in HA via Patroni |
| pgbouncer | 6432 | Per-pod connection pool (always enabled, no on/off switch) |
| dbctl (kbagent) | 5001 | Role probe endpoint — KubeBlocks queries GET /v1.0/getrole every second |
| pg-exporter | 9187 | Prometheus metrics exporter |

An init container (pg-init-container) runs postgres-pre-setup.sh before the main containers start to initialize the data directory and configuration files.

Each pod mounts its own PVC (data-{ordinal}, default 20 Gi) at /home/postgres/pgdata, providing independent persistent storage.

High Availability via Patroni

KubeBlocks uses Patroni for PostgreSQL HA, with the Kubernetes API as the DCS (Distributed Configuration Store):

| Kubernetes Object | Purpose |
| --- | --- |
| ConfigMap {scope}-config | Cluster-wide Patroni configuration; TTL 30 s, loop interval 10 s |
| Endpoints {scope} | Leader lease — the pod holding this lease is the primary |
| Secret account-* | Auto-generated passwords for system accounts |

On startup, every pod joins the Patroni cluster under the same scope. Patroni holds a leader election by acquiring the Endpoints lease; the winner becomes primary and the others stream WAL as replicas.
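
The leader lease is ordinary Kubernetes metadata. As a rough sketch (annotation keys vary across Patroni versions, so treat this as illustrative rather than verbatim), the {scope} Endpoints object carries the leader record in its annotations:

```yaml
# Illustrative shape of the Patroni leader record; inspect a real one with:
#   kubectl get endpoints pg-cluster-postgresql -o yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: pg-cluster-postgresql          # the Patroni {scope}
  annotations:
    leader: pg-cluster-postgresql-0    # member currently holding the lease
    ttl: "30"                          # lease expires if not renewed in time
    renewTime: "2026-01-01T00:00:10Z"  # last heartbeat timestamp
```

Because the lease is just an annotated object, a replica takes over simply by winning the update race on it once the TTL has expired.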

Traffic Routing

KubeBlocks creates two services for each PostgreSQL cluster:

| Service | Type | Ports | Selector |
| --- | --- | --- | --- |
| {cluster}-postgresql | ClusterIP | 5432 (PG), 6432 (pgbouncer) | kubeblocks.io/role=primary |
| {cluster}-postgresql-headless | Headless | — | all pods |

The key mechanism is roleSelector: primary on the ClusterIP service. KubeBlocks probes each pod's role via dbctl every second and updates the pod label kubeblocks.io/role. The service's Endpoints therefore always point at the current primary — no VIP or external load balancer required.
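
The generated read/write Service is, at its core, a plain Kubernetes Service with the role label in its selector. A sketch (the exact set of labels KubeBlocks adds may vary by version):

```yaml
# Approximation of the generated read/write Service.
apiVersion: v1
kind: Service
metadata:
  name: pg-cluster-postgresql                # {cluster}-postgresql
spec:
  type: ClusterIP
  selector:
    app.kubernetes.io/instance: pg-cluster   # illustrative instance label
    kubeblocks.io/role: primary              # only the current primary matches
  ports:
    - name: tcp-postgresql
      port: 5432
      targetPort: 5432
    - name: tcp-pgbouncer
      port: 6432
      targetPort: 6432
```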

  • Read/write traffic: connect to {cluster}-postgresql:5432 or :6432 (pgbouncer)
  • Direct pod access (e.g., read replicas, Patroni heartbeats): use the headless service DNS pod-N.{cluster}-postgresql-headless.{namespace}.svc.cluster.local
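
The two access paths can be written out concretely. The cluster name, namespace, and ordinal below are placeholders; only the DNS pattern follows from the services described above.

```shell
CLUSTER=pg-cluster
NAMESPACE=default

# Read/write: the ClusterIP service always resolves to the current primary.
RW_HOST="${CLUSTER}-postgresql.${NAMESPACE}.svc.cluster.local"

# Direct access to one replica via the headless service (stable per-pod DNS).
ORDINAL=1
POD_HOST="${CLUSTER}-postgresql-${ORDINAL}.${CLUSTER}-postgresql-headless.${NAMESPACE}.svc.cluster.local"

echo "$RW_HOST"    # pg-cluster-postgresql.default.svc.cluster.local
echo "$POD_HOST"   # pg-cluster-postgresql-1.pg-cluster-postgresql-headless.default.svc.cluster.local

# From inside the cluster you would then connect with, e.g.:
#   psql "host=$RW_HOST port=5432 user=postgres"    # PostgreSQL directly
#   psql "host=$RW_HOST port=6432 user=postgres"    # through pgbouncer
```
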
NOTE

pgbouncer proxies connections to the local pod's PostgreSQL instance (its own pod IP), not to the primary. Traffic steering to the primary is handled by the ClusterIP service's role selector, not by pgbouncer.

Automatic Failover

When the primary pod fails, the following sequence restores service automatically:

  1. Primary pod crashes (process death, node failure, network partition)
  2. Patroni detects heartbeat timeout — the leader lease expires (TTL ≈ 30 s)
  3. Replica acquires the lease — the first healthy replica to write the Endpoints object wins and promotes itself to primary
  4. KubeBlocks detects the role change — the dbctl role probe returns primary for the new winner
  5. Pod label updated — kubeblocks.io/role=primary is applied to the new primary pod
  6. Service Endpoints switch — the ClusterIP service automatically routes traffic to the new primary

Total failover time is typically within 30–60 seconds, bounded by Patroni's TTL and the role probe interval.

For a planned switchover (e.g., maintenance), KubeBlocks calls the Patroni switchover API via switchover.sh, which performs a graceful demotion of the current primary and promotion of a chosen replica with no data loss.
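
A planned switchover is typically triggered declaratively through an OpsRequest. The sketch below uses illustrative names, and the exact field spelling (e.g., clusterRef vs. clusterName) depends on the KubeBlocks API version:

```yaml
# Illustrative switchover request: gracefully demote the current primary
# and promote the chosen replica.
apiVersion: apps.kubeblocks.io/v1alpha1
kind: OpsRequest
metadata:
  name: pg-switchover
spec:
  clusterRef: pg-cluster                      # clusterName in some releases
  type: Switchover
  switchover:
    - componentName: postgresql
      instanceName: pg-cluster-postgresql-1   # replica to promote
```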

System Accounts

KubeBlocks automatically creates and manages the following PostgreSQL system accounts. Passwords are auto-generated and stored in Secrets named {cluster}-{component}-account-{name}.

| Account | Role | Purpose |
| --- | --- | --- |
| postgres | Superuser | Default admin account |
| kbadmin | Superuser | KubeBlocks internal management |
| kbdataprotection | Superuser | Backup and restore operations |
| kbprobe | Monitor (read-only) | Health checks |
| kbmonitoring | Monitor | Prometheus metrics collection |
| kbreplicator | Replication | Account used by replicas for streaming replication |
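
Given the naming scheme above, retrieving a password is a get-plus-decode. The kubectl line needs a running cluster and uses illustrative names; the decode step itself can be demonstrated locally:

```shell
# Fetch the auto-generated postgres password (requires a running cluster;
# the Secret name follows {cluster}-{component}-account-{name}):
#   kubectl get secret pg-cluster-postgresql-account-postgres \
#     -o jsonpath='{.data.password}' | base64 -d
#
# Secret data is base64-encoded, so the final decode step works like this:
ENCODED=$(printf 's3cret' | base64)
PASSWORD=$(printf '%s' "$ENCODED" | base64 -d)
echo "$PASSWORD"   # prints: s3cret
```
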

© 2026 KUBEBLOCKS INC