High Availability
Configure replication, failover, and self-healing behavior.
Keldon uses PostgreSQL streaming replication with synchronous or asynchronous commit modes. The operator continuously monitors cluster health and performs automatic failover when the primary becomes unreachable.
Replication topology
A standard production cluster consists of one primary and two replicas:
spec:
  instances: 3
  minSyncReplicas: 1
  maxSyncReplicas: 1
With minSyncReplicas: 1, the primary waits for at least one replica to acknowledge each commit before reporting success to the client.
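To make the quorum idea concrete, the sketch below builds a value for PostgreSQL's synchronous_standby_names setting using its quorum syntax (ANY n (...)). This is an illustration only: the guide does not say how the operator translates minSyncReplicas into PostgreSQL settings, and the function name and the replica names keldon-2 and keldon-3 are hypothetical.

```python
from __future__ import annotations


def synchronous_standby_names(replicas: list[str], min_sync: int) -> str:
    """Build a quorum-based value for PostgreSQL's synchronous_standby_names.

    Assumption: the operator maps minSyncReplicas onto quorum commit.
    This mapping is illustrative, not taken from the operator's source.
    """
    if min_sync < 0 or min_sync > len(replicas):
        raise ValueError("min_sync must be between 0 and the number of replicas")
    if min_sync == 0:
        # An empty value disables synchronous replication (asynchronous commit).
        return ""
    # Double-quote each name so that names containing hyphens are accepted.
    quoted = ", ".join(f'"{name}"' for name in replicas)
    return f"ANY {min_sync} ({quoted})"


# Hypothetical replica names for a three-instance cluster (one primary, two replicas).
print(synchronous_standby_names(["keldon-2", "keldon-3"], min_sync=1))
# prints: ANY 1 ("keldon-2", "keldon-3")
```

With a quorum of one out of two replicas, a commit can still complete while one replica is down or lagging, which is why this layout tolerates the loss of a single replica without blocking writes.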
Failover behavior
When a primary failure is detected, the operator:
- Promotes the replica with the most advanced WAL position
- Reconfigures remaining replicas to stream from the new primary
- Updates the -rw service endpoint
- Reports the failover event in cluster status
A typical failover completes in under 30 seconds.
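The promotion rule in the first step ("most advanced WAL position") can be sketched as a comparison of log sequence numbers (LSNs). PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, so it has to be converted to an integer before comparing; a plain string comparison would give wrong answers. The function names and replica names below are hypothetical, and the operator's real selection logic may weigh other factors.

```python
from __future__ import annotations


def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000148' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def pick_promotion_candidate(replica_lsns: dict[str, str]) -> str:
    """Return the name of the replica with the most advanced WAL position."""
    if not replica_lsns:
        raise ValueError("no replicas available for promotion")
    return max(replica_lsns, key=lambda name: parse_lsn(replica_lsns[name]))


# Hypothetical replay positions reported by two replicas.
replicas = {"keldon-2": "0/3000148", "keldon-3": "0/3000060"}
print(pick_promotion_candidate(replicas))  # prints: keldon-2
```

Promoting the most advanced replica minimizes the amount of WAL that could be lost when replication is asynchronous.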
Pod disruption budgets
The operator creates PDBs automatically to prevent Kubernetes from evicting all replicas at once during node maintenance.
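As a rough illustration of how a PDB limits evictions, the sketch below mirrors the arithmetic Kubernetes uses for a budget expressed as minAvailable: the number of voluntary disruptions allowed is the count of healthy pods minus the required minimum, never below zero. The minAvailable value of 1 is an assumption made for the example; the guide does not state what the operator's PDBs specify.

```python
def disruptions_allowed(healthy_pods: int, min_available: int) -> int:
    """Voluntary evictions a minAvailable-style PDB permits right now."""
    return max(0, healthy_pods - min_available)


# Assumed budget: at least one of the two replicas must stay available.
print(disruptions_allowed(healthy_pods=2, min_available=1))  # prints: 1
# After one replica has been evicted, further evictions are refused.
print(disruptions_allowed(healthy_pods=1, min_available=1))  # prints: 0
```

Under this assumed budget, a node drain can evict one replica, and the second eviction is blocked until the first replica is healthy again.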
Anti-affinity
By default, the operator schedules instances across different nodes. For stricter isolation across availability zones:
spec:
  affinity:
    topologyKey: topology.kubernetes.io/zone