FAILOVER_ARCHITECTURE.md — Failover & Promotion

Status

  • Phase: 6
  • Authority: Normative
  • Depends on:
      • FAILOVER_VISION.md
      • FAILOVER_SCOPE.md
      • FAILOVER_INVARIANTS.md
  • Frozen Dependencies: Phases 0–5

1. Purpose

This document defines the logical architecture for Phase 6.

It specifies:

  • New logical components
  • Authority boundaries
  • Interaction points with frozen subsystems

It does not define:

  • Concrete data structures
  • Algorithms
  • APIs
  • Performance optimizations

Architecture here is descriptive, not prescriptive.


2. Architectural Principle

Phase 6 architecture follows one rule:

Failover logic observes and validates existing state;
it does not create new authority sources.

All authoritative state remains governed by:

  • WAL
  • Replication state (Phase 5)
  • Recovery logic (Phases 0–5)

Phase 6 introduces decision logic, not new truth.


3. New Logical Components

Phase 6 introduces the following logical-only components.

3.1 Promotion Controller

Responsibility

  • Coordinates promotion attempts
  • Orchestrates validation and decision flow
  • Emits observability and explanation data

Non-Responsibilities

  • Does not write WAL
  • Does not modify replication protocol
  • Does not infer liveness
  • Does not retry automatically

The Promotion Controller is purely coordinating.
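
One way to picture this coordinating role, as a sketch under assumptions: the controller receives the validator, the transition step, and the event sink as callables and only sequences them. The function name, the event dictionaries, and the `(allowed, reasons)` decision shape are illustrative and not defined by this architecture.

```python
from typing import Callable, Tuple

# Hypothetical shape: a decision is (allowed, reasons).
Decision = Tuple[bool, Tuple[str, ...]]


def attempt_promotion(
    validate: Callable[[], Decision],
    apply_transition: Callable[[], None],
    emit: Callable[[dict], None],
) -> Decision:
    """Run one promotion attempt: validate, apply if allowed, report. No retry."""
    emit({"event": "promotion_attempt_started"})
    allowed, reasons = validate()
    if allowed:
        apply_transition()  # delegated; the controller changes no state itself
        emit({"event": "promotion_applied"})
    else:
        emit({"event": "promotion_denied", "reasons": list(reasons)})
    return allowed, reasons
```

The function makes exactly one pass. A denied attempt returns its reasons to the caller, and any further attempt must be started explicitly.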


3.2 Promotion Validator

Responsibility

  • Evaluates whether promotion is allowed
  • Validates all Phase 6 safety invariants
  • Produces an explicit allow/deny decision

Inputs

  • Replica replication state
  • WAL position and metadata
  • Known primary authority state

Outputs

  • Deterministic validation result
  • Explicit failure reasons if denied

Validator logic MUST be:

  • Deterministic
  • Side-effect free
  • Fully explainable
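
As a minimal sketch of those three properties, the validator can be modeled as a pure function over an immutable input. The field names and the three checks below are hypothetical placeholders. The actual invariants are defined in FAILOVER_INVARIANTS.md and are not reproduced here.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ValidationInput:
    # Hypothetical fields standing in for the three input groups listed above.
    replica_role: str              # replica replication state
    replica_wal_position: int      # WAL position known to the replica
    primary_commit_position: int   # last commit position known for the primary
    primary_holds_authority: bool  # known primary authority state


@dataclass(frozen=True)
class ValidationResult:
    allowed: bool
    reasons: Tuple[str, ...]       # empty when allowed


def validate_promotion(inp: ValidationInput) -> ValidationResult:
    """Pure function: no I/O, no clock, same input always gives the same result."""
    reasons = []
    if inp.replica_role != "replica":
        reasons.append("node is not in the replica role")
    if inp.replica_wal_position < inp.primary_commit_position:
        reasons.append("replica WAL is behind the known primary commit position")
    if inp.primary_holds_authority:
        reasons.append("primary still holds authority")
    return ValidationResult(allowed=not reasons, reasons=tuple(reasons))
```

Every check is evaluated rather than stopping at the first failure, so a denial lists all of its reasons.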


3.3 Authority Transition Manager

Responsibility

  • Applies an approved authority transition
  • Ensures atomicity of authority change
  • Integrates with existing replication state

Constraints

  • Must not overlap authority
  • Must be crash-safe
  • Must leave recovery unambiguous

This component performs state transition only after validation succeeds.


4. Interaction with Existing Subsystems

4.1 Replication Subsystem (Phase 5)

Phase 6:

  • Reads replication role and state
  • Validates replica readiness
  • Updates replication role only through allowed transitions

Phase 6 MUST NOT:

  • Modify replication protocol
  • Introduce new replication states without explicit definition
  • Alter Phase 5 authority checks
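
The "allowed transitions" constraint can be pictured as a closed table that is consulted before any role change. This is only a sketch: the role names and the single table entry are assumptions, since the real roles and transitions are defined by Phase 5.

```python
# Hypothetical role names and transition table; Phase 5 defines the real ones.
ALLOWED_TRANSITIONS = frozenset({
    ("replica", "primary"),  # promotion
})


def next_role(current: str, requested: str) -> str:
    """Return the requested role if the transition is allowed, else raise."""
    if (current, requested) not in ALLOWED_TRANSITIONS:
        raise ValueError(f"transition {current!r} -> {requested!r} is not allowed")
    return requested
```

Any pair missing from the table is rejected, so adding a role or a transition requires an explicit change to the table.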


4.2 WAL & Recovery (Phases 0–3)

Phase 6:

  • Treats WAL as the sole durability authority
  • Uses WAL metadata to validate safety
  • Relies on existing recovery semantics

Phase 6 MUST NOT modify the WAL. In particular, it MUST NOT:

  • Add WAL records
  • Change WAL formats

Phase 6 MAY use non-WAL durability for authority transition:

  • A single fsynced marker file (metadata/authority_transition.marker)
  • This marker is atomic: present = new authority, absent = old authority
  • The marker file is the SOLE authority state durability mechanism
  • This does NOT constitute a new WAL record or format change

Phase 6 MUST NOT:

  • Add recovery-time heuristics
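
A minimal sketch of how such a marker could be made durable on a POSIX filesystem, assuming the common sequence of writing a temp file, fsyncing it, renaming it, and fsyncing the directory. The function name and the `.tmp` suffix are hypothetical. This document does not prescribe the mechanism.

```python
import os

MARKER_NAME = "authority_transition.marker"  # file name taken from the text above


def write_marker(metadata_dir: str) -> str:
    """Durably create the marker and return its path. Assumes a POSIX filesystem."""
    final_path = os.path.join(metadata_dir, MARKER_NAME)
    tmp_path = final_path + ".tmp"  # hypothetical temp name
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.fsync(fd)  # make the (empty) file itself durable
    finally:
        os.close(fd)
    os.replace(tmp_path, final_path)  # atomic rename: the marker appears all at once
    dir_fd = os.open(metadata_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the new directory entry durable
    finally:
        os.close(dir_fd)
    return final_path
```

A crash before the rename leaves at most a stray `.tmp` file. That file is not the marker, so the state still reads as old authority.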


4.3 MVCC & Visibility (Phase 2)

Phase 6:

  • Preserves all MVCC visibility rules
  • Ensures promotion does not alter snapshot semantics

Phase 6 MUST NOT:

  • Reinterpret commit identities
  • Adjust visibility boundaries
  • Introduce new MVCC states


4.4 Observability & Explanation (Phase 4)

Phase 6:

  • Emits observability events for promotion attempts
  • Produces explanation artifacts for decisions

Phase 6 MUST NOT:

  • Use observability data to guide decisions
  • Introduce feedback loops

Observability remains passive.


5. Authority Boundaries

5.1 Authority Sources (Unchanged)

The only authoritative sources remain:

  • WAL
  • Replication state
  • Recovery state

Phase 6 does not add authority; it rebinds authority explicitly.


5.2 Decision vs Authority

Important separation:

  • Decision: Phase 6 logic (can approve or deny)
  • Authority: Existing core systems (can commit or reject)

Phase 6 decisions are advisory until applied atomically.


6. Crash & Recovery Integration

Phase 6 architecture MUST ensure:

  • No partial authority transition is durable
  • Recovery always reconstructs a single authority state
  • Promotion outcomes are unambiguous after restart

All crash safety relies on existing recovery guarantees.
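
Combined with the marker rule from section 4.2 (present = new authority, absent = old authority), the restart decision can be sketched as a single existence check. The enum and function names are hypothetical, and the sketch assumes the marker lives in a metadata directory passed in by the caller.

```python
import os
from enum import Enum


class Authority(Enum):
    OLD = "old"
    NEW = "new"


def recover_authority(metadata_dir: str) -> Authority:
    """Marker present means new authority, absent means old; nothing else is consulted."""
    marker = os.path.join(metadata_dir, "authority_transition.marker")
    return Authority.NEW if os.path.exists(marker) else Authority.OLD
```

Because the result depends on one boolean fact, there is no third outcome to interpret and no heuristic to apply after a restart.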


7. Explicitly Forbidden Architectural Patterns

Phase 6 MUST NOT introduce:

  • Background promotion threads
  • Heartbeat-driven logic
  • Time-based transitions
  • Distributed consensus subsystems
  • Hidden shared state between nodes

Any such pattern violates Phase 6 scope.


8. Architectural Completeness Criteria

This architecture is complete when:

  • Every promotion path maps to components above
  • Every failure path has a single handling point
  • No component has mixed authority roles
  • All interactions with frozen phases are read-only or explicit

9. Architectural Stability Rule

Once Phase 6 is frozen:

No new architectural component may be added
without a new phase.


END OF DOCUMENT