
ADR-0015: Distributed storage

Status: proposed
Date: 2026-03-09
Group: storage
Depends-on: ADR-0008, ADR-0009, ADR-0014

Context

The platform needs distributed storage that replicates data across nodes and availability zones (ADR-0009), runs hyperconverged on bare metal (ADR-0014), and provides block and file storage to tenant clusters via Kubernetes CSI, plus object storage via an S3-compatible API.
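For orientation, tenant clusters would consume this storage through ordinary Kubernetes volume claims against CSI-backed storage classes. A minimal sketch (the class name `ceph-block` is illustrative, not a decision of this ADR):

```yaml
# Hypothetical tenant-side claim against a Ceph-RBD-backed storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: tenant-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: ceph-block   # illustrative class name, backed by CSI
  resources:
    requests:
      storage: 10Gi
```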

Options

Option 1: Ceph (via Rook operator)

  • Pros: proven at scale (exabyte-class deployments); provides block (RBD), file (CephFS), and object (RGW) from one system; cross-AZ replication built in; Rook provides Kubernetes-native lifecycle management; open source (LGPL); supports both hyperconverged and disaggregated topologies

  • Cons: operationally complex; requires tuning for bare-metal performance; minimum 3 nodes per storage cluster; significant resource overhead (MON, OSD, MGR daemons)

Option 2: LINSTOR/DRBD

  • Pros: proven in European hosting (LINBIT is Austrian); efficient synchronous block replication via DRBD; low overhead; supports both hyperconverged and disaggregated topologies

  • Cons: block storage only — no file or object; smaller community than Ceph; dual licensing (GPLv2/commercial); less Kubernetes-native than Rook

Option 3: Longhorn

  • Pros: simple to deploy and operate; Kubernetes-native; CNCF project

  • Cons: block storage only — no file or object; not proven at large scale; no cross-AZ replication; limited to hyperconverged

Option 4: Specialized tools per storage tier (e.g. LINSTOR for block, MinIO for object, CephFS for file)

  • Pros: best-of-breed per tier; each component optimized for its workload

  • Cons: three systems to operate, monitor, and upgrade; no unified management; cross-tier consistency is the operator’s problem; higher operational complexity

Option 5: OpenEBS (Mayastor)

  • Pros: NVMe-optimized; Kubernetes-native; CNCF project

  • Cons: primarily block storage; less mature than Ceph; smaller community; limited large-scale production references

Decision

Ceph via Rook. Ceph is the only option that provides block, file, and object storage from a single system at our target scale. LINSTOR is a credible block storage alternative with European roots, but lacks file and object storage — we would need additional systems (e.g. MinIO for object), increasing operational complexity. The specialized-tools-per-tier approach was considered but rejected: operating three storage systems is significantly more complex than one. Longhorn and OpenEBS are not proven at the scale we need (ADR-0002). Cross-AZ replication satisfies the 3-AZ requirement (ADR-0009). Rook handles Ceph lifecycle within Kubernetes.
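As a sketch of what the Rook-managed deployment might look like (a minimal example, not the final configuration; the Ceph image version is illustrative, and device selection would be tightened for production):

```yaml
# Sketch of a Rook CephCluster spanning 3 AZs: one MON per zone,
# OSDs on all eligible bare-metal nodes and devices.
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v18   # illustrative version
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3                       # one MON per availability zone
    allowMultiplePerNode: false
  storage:
    useAllNodes: true              # hyperconverged: OSDs on every node
    useAllDevices: true            # would be narrowed by device filter in practice
```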

Consequences

  • The Rook operator manages Ceph cluster lifecycle, either per tenant or as shared infrastructure

  • Storage replication is configured across 3 AZs

  • Object storage (Ceph RGW) may eliminate the need for a separate object storage solution

  • Ceph tuning and monitoring are first-class operational concerns

  • Backup and disaster recovery strategy must account for Ceph’s replication model (separate ADR)
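The 3-AZ replication consequence could be expressed in Rook by setting the pool's failure domain to the zone level, so CRUSH places each replica in a different AZ. A sketch, assuming nodes carry the standard topology.kubernetes.io/zone label; pool and class names are illustrative, and the CSI secret parameters a real StorageClass needs are omitted for brevity:

```yaml
# Pool with one replica per availability zone.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicated-3az
  namespace: rook-ceph
spec:
  failureDomain: zone    # CRUSH spreads replicas across AZs
  replicated:
    size: 3              # one copy per AZ
---
# Storage class exposing the pool to tenant clusters via the RBD CSI driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ceph-block-3az
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicated-3az
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
```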