REPLICATION_VISION.md¶
AeroDB Phase 2B — Replication Vision¶
Status¶
- This document is authoritative
- It defines intent, scope, and boundaries
- No implementation details appear here
- Phase-1 and MVCC semantics are frozen and inviolable
1. Purpose of Replication in AeroDB¶
Replication exists to extend AeroDB from a single-node, correctness-complete database into a multi-node system without weakening any existing guarantees.
Replication in AeroDB is not a scaling shortcut. It is a durability, availability, and correctness extension.
The primary goal is state continuity, not throughput.
2. Non-Negotiable Continuity¶
Replication must preserve all existing guarantees:
Phase-1 (unchanged)¶
- No acknowledged write is ever lost
- WAL is the sole durability authority
- Recovery is deterministic
- Corruption is detected, never repaired silently
- Queries are bounded and deterministic
- Snapshot, checkpoint, backup, restore semantics remain intact
Phase-2 MVCC (unchanged)¶
- CommitId total order is authoritative
- Visibility is deterministic and snapshot-based
- Write atomicity is absolute
- GC rules are proof-based and WAL-governed
- MVCC semantics are frozen
Replication consumes these properties. It does not reinterpret them.
3. Replication Philosophy¶
Replication in AeroDB is:
- Deterministic — same WAL ⇒ same state everywhere
- Explicit — no hidden quorum rules or timing tricks
- Fail-stop — uncertainty results in refusal, not guesses
- History-preserving — replicas never rewrite or “heal” history
- MVCC-aware — version semantics are preserved exactly
Replication correctness is valued above availability.
4. What Replication Means in AeroDB¶
At a high level, replication introduces:
-
A single Primary
-
Sole authority for accepting writes
- Sole assigner of CommitIds
-
One or more Replicas
-
Receive ordered state from the Primary
- Never generate new history
-
Deterministic State Propagation
-
WAL-based replication
- Explicit snapshot transfer when needed
Replication is state replication, not command replication.
5. Explicit Non-Goals¶
Replication in this phase does not attempt to provide:
- Multi-leader writes
- Automatic leader election
- Write availability during primary loss
- Eventual consistency
- Conflict resolution
- Byzantine fault tolerance
If a behavior is not specified, it does not exist.
6. Replication Safety Model¶
Replication must obey the following high-level safety rules:
-
Single Writer Rule Only one node may acknowledge writes at any time.
-
CommitId Authority Rule CommitIds are assigned only by the Primary.
-
History Prefix Rule A Replica’s state must always be a prefix of the Primary’s history.
-
No Guessing Rule If ordering, completeness, or authority is uncertain → replication halts.
-
No Invisible Divergence Rule Divergence must be detectable and fatal, never silent.
7. Replication and Durability¶
Replication does not redefine durability.
-
A write is durable when:
-
WAL fsync semantics are satisfied per policy
- Replication may strengthen durability guarantees
- Replication must never weaken them
Acknowledgement semantics will be defined explicitly later.
8. Replication and MVCC¶
Replication must preserve:
- CommitId ordering
- Version immutability
- Snapshot isolation
- GC eligibility rules
A Replica must produce identical visibility outcomes for the same read view boundary.
MVCC correctness is replicated, not recomputed.
9. Crash & Failure Perspective¶
Replication must be correct under:
- Primary crash
- Replica crash
- Network partition
- Partial WAL transfer
- Snapshot transfer interruption
There are no undefined states. Every failure must map to a specified outcome.
10. Design Deferrals¶
This document intentionally does not define:
- WAL shipping mechanics
- Snapshot vs log transfer rules
- Replica read permissions
- Failure handling algorithms
- Promotion or leadership changes
Those belong in subsequent replication documents.
11. Vision Summary¶
Replication in AeroDB aims to:
- Extend correctness across nodes
- Preserve determinism end-to-end
- Make failure explicit and survivable
- Avoid all heuristic behavior
- Remain compatible with frozen MVCC semantics
Replication is not an optimization. It is correctness, extended in space.