Skip to content

PHASE 5 — REPLICATION IMPLEMENTATION

Status

  • Phase: 5
  • State: Implementation
  • Authority: Normative
  • Depends on:
  • Phase 0–4 (Frozen and Audited)
  • Replication semantics defined in Phase 2B
  • Observability & Explanation infrastructure from Phase 4

Phase 5 implements replication exactly as specified. It introduces no new replication semantics.


1. Purpose of Phase 5

Phase 5 exists to answer one question:

“Can AeroDB’s replication semantics, already proven on paper,
be implemented faithfully, observably, and deterministically?”

This phase turns frozen replication specifications into running code.


2. What Phase 5 Is (and Is Not)

Phase 5 IS

  • A faithful implementation of Phase 2B replication semantics
  • Single-writer, primary/replica replication
  • WAL-based log shipping
  • Snapshot-based bootstrap
  • Deterministic replica recovery
  • Observable and explainable replication state

Phase 5 IS NOT

  • A redesign of replication
  • Leader election
  • Automatic failover
  • Multi-primary writes
  • Consensus protocols (Raft/Paxos)
  • Availability-first behavior
  • Performance tuning beyond correctness

If it changes replication meaning, it does not belong in Phase 5.


3. Replication Model (Frozen Reference)

Phase 5 implements the following unchanged model:

  • One Primary
  • One or more Replicas
  • CommitId authority resides only on the Primary
  • Replicas apply WAL as a strict prefix
  • Replicas never acknowledge writes
  • Replica reads are allowed only when MVCC-safe

All of this is already defined. Phase 5 makes it real.


4. Phase 5 Core Principles

P5-1: Semantics Are Frozen

Replication behavior is governed by: - REPL_VISION.md - REPL_INVARIANTS.md - REPL_PROOFS.md

Phase 5 MUST NOT: - Add rules - Remove rules - Infer rules

Implementation follows spec verbatim.


P5-2: Determinism Over Availability

Replication prioritizes: 1. Correctness 2. Determinism 3. Explicit failure 4. Availability (last)

If replication cannot proceed safely: - It must stop - It must explain why


P5-3: Explicit Failure Over Silent Degradation

Replication failures MUST be: - Detected - Surfaced - Observable - Explainable

There is no “best effort” replication.


5. Phase 5 Scope

5.1 In Scope

Phase 5 implements:

  • WAL segment shipping from Primary to Replicas
  • Replica WAL validation (checksums, ordering)
  • Replica WAL application
  • Snapshot bootstrap (MVCC cut + WAL resume)
  • Replica restart and recovery
  • Replica read-safety enforcement
  • Replication observability
  • Replication explanations (why a replica is / is not safe)

5.2 Explicitly Out of Scope

Phase 5 does NOT include:

  • Automatic role changes
  • Promotion of replicas
  • Failover orchestration
  • Split-brain handling beyond detection
  • Network encryption
  • Authentication / authorization
  • Dynamic cluster membership

These belong to future phases.


6. Interaction with Earlier Phases

Phase 1 (Core)

  • WAL format is unchanged
  • Storage format is unchanged
  • Recovery logic is reused

Phase 2A (MVCC)

  • Snapshot isolation rules remain authoritative
  • Replica reads use identical MVCC rules

Phase 3 (Performance)

  • Replication may benefit from existing optimizations
  • No new optimizations are introduced

Phase 4 (DX)

  • Replication state is fully observable
  • Replication behavior is explainable
  • No replication control via DX

7. Phase 5 Invariants (High-Level)

Phase 5 MUST preserve:

  • WAL prefix rule
  • CommitId monotonicity
  • Deterministic replay
  • Snapshot consistency
  • Replica derivability
  • No acknowledged write loss
  • No silent divergence

Violation of any invariant is a Phase 5 failure.


8. Phase 5 Success Criteria

Phase 5 is successful if:

  • A replica can be bootstrapped from scratch
  • A replica can crash and recover deterministically
  • Replica reads are allowed only when safe
  • Replication lag is observable
  • Replication failures are explicit and explainable
  • Primary behavior is unchanged

All of this must be provable via tests and explanations.


9. Phase 5 Failure Philosophy

Replication failure is not exceptional. It is expected and must be handled explicitly.

If replication cannot guarantee correctness: - The replica must refuse to serve reads - The system must explain why

Availability is optional. Correctness is not.


10. Deliverables of Phase 5

Phase 5 produces:

  • Replication runtime implementation
  • Replica bootstrap mechanism
  • Replica recovery logic
  • Replication observability endpoints
  • Replication explanation artifacts
  • Exhaustive replication tests

No UI or operational tooling is required in this phase.


11. Phase 5 Completion Rules

Phase 5 is complete when:

  • All replication specs are implemented
  • All replication invariants are enforced
  • All failure modes are tested
  • Replication explanations are accurate
  • Phase 0–4 behavior remains unchanged

Only then may replication be considered “implemented”.


12. Guiding Statement

Phase 5 does not make AeroDB highly available.
It makes AeroDB correctly replicated.

Correct replication is more valuable than fast replication. Explainable replication is more valuable than opaque replication.


END OF DOCUMENT