PHASE 3 — PERFORMANCE (CORRECTNESS-PRESERVING OPTIMIZATIONS)
Status
- Phase: 3
- State: Design-first, proof-gated
- Prerequisites:
  - Phase 1: Core Storage & Correctness — Frozen
  - Phase 2A: MVCC — Frozen
  - Phase 2B: Replication Semantics — Frozen
This document is authoritative for Phase 3 intent and scope.
1. Purpose of Phase 3
Phase 3 exists to improve performance without altering semantics.
This phase introduces mechanical optimizations only, under strict rules:
- No behavior changes
- No semantic relaxation
- No heuristic shortcuts
- No timing assumptions
Every optimization must be:
1. Explicitly specified
2. Correctness-proven
3. Opt-in (compile-time or config-gated)
4. Independently testable
5. Reversible without data impact
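
As an illustration of the opt-in requirement, the sketch below models config gating as an immutable set of flags that all default to off. The class name `OptimizationFlags` and the three flag names are hypothetical and are not taken from AeroDB; they only show one possible shape for a gate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OptimizationFlags:
    """Hypothetical Phase 3 switches. Every optimization defaults to off."""
    group_fsync: bool = False
    checksum_memo: bool = False
    wal_batching: bool = False


def enabled(flags: OptimizationFlags) -> list[str]:
    """Return the names of enabled optimizations in a fixed, sorted order."""
    return sorted(name for name, on in vars(flags).items() if on)
```

Because the flags object is frozen, the set of active optimizations cannot change while the process runs, which keeps each optimization independently testable and trivially reversible by configuration.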
If an optimization cannot be proven correct, it is forbidden.
2. Definition of “Correctness-Preserving Performance”
An optimization is correctness-preserving if and only if:
- All externally observable behavior is bit-for-bit equivalent
- All failure modes remain detectable and explicit
- All crash-recovery outcomes remain deterministic
- All MVCC visibility rules remain unchanged
- All replication invariants remain unchanged
- All acknowledged writes maintain identical durability guarantees
Performance improvements must not introduce:
- New states
- New timing dependencies
- New partial-commit behavior
- New implicit ordering rules
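
The bit-for-bit requirement can be checked mechanically. The sketch below is a minimal harness, assuming the baseline and optimized paths can each be wrapped as a function from a workload to its externally observable bytes. The names `observe` and `first_divergence` are hypothetical. Failures are compared as well as results, since failure modes are part of observable behavior.

```python
from typing import Callable, Iterable, Optional

Runner = Callable[[bytes], bytes]


def observe(run: Runner, workload: bytes) -> tuple[str, object]:
    """Capture either the result bytes or the kind of failure raised."""
    try:
        return ("ok", run(workload))
    except Exception as exc:
        return ("error", type(exc).__name__)


def first_divergence(baseline: Runner, optimized: Runner,
                     workloads: Iterable[bytes]) -> Optional[int]:
    """Return the index of the first workload whose outcome differs, else None."""
    for index, workload in enumerate(workloads):
        if observe(baseline, workload) != observe(optimized, workload):
            return index
    return None
```

A harness like this can only refute equivalence on the workloads it is given. It supplements the required proof and does not replace it.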
3. Absolute Non-Goals
Phase 3 explicitly does NOT include:
- New features
- New APIs
- New consistency levels
- New isolation semantics
- New replication modes
- New storage formats
- New heuristics
- Adaptive or ML-based behavior
- Background threads that influence correctness
If a change would be marketed as a “feature”, it does not belong in Phase 3.
4. Frozen Foundations (Non-Negotiable)
Phase 3 MUST treat the following as immutable law:
4.1 Phase 1 Guarantees
- No acknowledged write is ever lost
- Corruption is detected, never repaired silently
- Recovery is deterministic
- Queries are bounded and deterministic
- Snapshots are read-only and manifest-driven
- Checkpoints bound WAL growth
- Observability never affects behavior
4.2 MVCC Guarantees
- CommitId authority is WAL-governed
- Snapshot isolation semantics are fixed
- Version chains are immutable
- Visibility rules are deterministic
- Garbage collection is WAL-governed
- Crash behavior is exhaustively defined
4.3 Replication Guarantees
- Single-writer invariant
- CommitId authority exists only on Primary
- WAL prefix rule is inviolable
- Replica state is always derivable
- No silent divergence
- Deterministic restart semantics
No optimization may reinterpret, soften, or bypass these guarantees.
5. Allowed Optimization Classes
Only the following classes of optimizations are allowed in Phase 3:
5.1 Mechanical Reordering (Proven Equivalent)
Examples:
- Grouping independent fsync calls
- Combining buffer flushes
Rules:
- Logical order must remain identical
- Failure boundaries must be preserved
- Acknowledgment semantics must be unchanged
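
A minimal sketch of fsync grouping follows. It assumes an append-only log file and that nothing is acknowledged to any caller before the shared `os.fsync` returns. The function names and the file name `wal.log` are hypothetical. Records are written in their given logical order, and an I/O error propagates before any acknowledgment is produced.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> None:
    """Write every byte of data, looping because os.write may write partially."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def append_group(fd: int, records: list[bytes]) -> int:
    """Append records in order, fsync once, then report how many are acknowledged."""
    for record in records:
        write_all(fd, record)
    os.fsync(fd)  # a single fsync covers the whole group
    return len(records)  # reached only if every write and the fsync succeeded


def demo(records: list[bytes]) -> bytes:
    """Append a group to a temporary log and return the resulting file contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            append_group(fd, records)
        finally:
            os.close(fd)
        with open(path, "rb") as handle:
            return handle.read()
```

Compared with one fsync per record, the on-disk byte sequence is identical and no record is acknowledged earlier than before. Whether the failure boundaries are also preserved is what the accompanying proof has to establish.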
5.2 Redundant Work Elimination
Examples:
- Avoiding duplicate checksum computation
- Memoizing schema validation results
Rules:
- Inputs must be provably identical
- Cached results must be immutable
- Cache invalidation must be explicit and total
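
The three rules map naturally onto a memoized pure function. The sketch below uses CRC32 from `zlib` purely as a stand-in; the source does not name AeroDB's checksum algorithm. The names `checksum` and `invalidate_all` are hypothetical.

```python
import zlib
from functools import lru_cache


@lru_cache(maxsize=None)
def checksum(page: bytes) -> int:
    """Checksum of an immutable page; repeated calls with equal bytes hit the cache."""
    return zlib.crc32(page)


def invalidate_all() -> None:
    """Explicit and total invalidation: drop every cached result at once."""
    checksum.cache_clear()
```

The cache key is the `bytes` value itself, so two inputs share a result only if they are byte-for-byte equal. A mutable `bytearray` is unhashable and is rejected with `TypeError`, and the cached `int` results are immutable. The unbounded cache size is a simplification for this sketch.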
5.3 Read-Only Fast Paths
Examples:
- Zero-copy reads
- Snapshot-local caching
Rules:
- No writes
- No visibility shortcuts
- MVCC rules must be explicitly enforced
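
The following sketch shows a zero-copy read that still performs the visibility check on every call. The visibility rule used here (a version is visible when its commit id is at most the snapshot's id) and the `(commit_id, offset, length)` layout are assumptions made for illustration; the authoritative rules are those frozen in Phase 2A.

```python
from typing import Optional

# Hypothetical version entry: (commit_id, offset, length), sorted by commit_id.
Version = tuple[int, int, int]


def read_visible(buf: bytes, versions: list[Version],
                 snapshot_id: int) -> Optional[memoryview]:
    """Return a read-only view of the newest version visible to the snapshot."""
    chosen = None
    for commit_id, offset, length in versions:
        if commit_id <= snapshot_id:  # visibility is checked explicitly, never skipped
            chosen = (offset, length)
    if chosen is None:
        return None
    offset, length = chosen
    return memoryview(buf)[offset:offset + length]  # slices the buffer without copying
```

A `memoryview` over `bytes` is read-only, so the fast path cannot write through the view it returns.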
5.4 Deterministic Batching
Examples:
- WAL record batching
- Commit group formation
Rules:
- No reordering across CommitId boundaries
- No partial acknowledgment
- Crash behavior must match unbatched execution
5.5 Memory Layout Optimization
Examples:
- Cache-line alignment
- Structure packing
Rules:
- No semantic coupling to layout
- No reliance on undefined behavior
- No platform-specific correctness assumptions
6. Forbidden Optimization Patterns
The following are explicitly forbidden:
- Lazy durability
- Async acknowledgment
- Background retries
- Time-based flushing
- Adaptive thresholds
- Best-effort behavior
- Silent fallback paths
- “Usually safe” logic
- Hardware-specific correctness dependencies
If an optimization requires a disclaimer, it is not allowed.
7. Proof Requirements
Every Phase 3 optimization document MUST include:
- Baseline Semantics Section
  - What the system does today
  - Why it is correct
- Optimization Description
  - Exact mechanical change
  - No implementation shortcuts
- Invariant Preservation Proof
  - Durability
  - Determinism
  - MVCC visibility
  - Replication safety
- Failure Matrix
  - Power loss
  - Process crash
  - Partial I/O
  - Disk error
  - Replica disconnect (if applicable)
- Equivalence Argument
  - Why optimized execution is observationally identical
- Rollback Plan
  - How optimization can be disabled
  - Proof that disabling does not affect stored data
No code may be written without an accepted proof.
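
The required structure lends itself to a mechanical completeness check before review. The sketch below assumes a proof document has already been parsed into a mapping from section title to the set of items it covers. That representation and the names `REQUIRED_SECTIONS` and `missing_items` are hypothetical. A check like this shows only that every part is present, not that the proof is sound.

```python
REQUIRED_SECTIONS: dict[str, list[str]] = {
    "Baseline Semantics Section": ["What the system does today", "Why it is correct"],
    "Optimization Description": ["Exact mechanical change", "No implementation shortcuts"],
    "Invariant Preservation Proof": ["Durability", "Determinism", "MVCC visibility",
                                     "Replication safety"],
    "Failure Matrix": ["Power loss", "Process crash", "Partial I/O", "Disk error",
                       "Replica disconnect"],
    "Equivalence Argument": ["Why optimized execution is observationally identical"],
    "Rollback Plan": ["How optimization can be disabled",
                      "Proof that disabling does not affect stored data"],
}


def missing_items(doc: dict[str, set[str]], replicated: bool = True) -> list[str]:
    """List every required 'section: item' pair that the proof document lacks."""
    missing = []
    for section, items in REQUIRED_SECTIONS.items():
        present = doc.get(section, set())
        for item in items:
            if item == "Replica disconnect" and not replicated:
                continue  # required only where replication applies
            if item not in present:
                missing.append(f"{section}: {item}")
    return missing
```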
8. Observability Rules in Phase 3
Performance instrumentation:
- May measure
- May record
- May expose metrics
It may never:
- Influence control flow
- Influence scheduling
- Influence batching decisions
- Influence retry behavior
Observability remains strictly passive.
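
A minimal sketch of passive instrumentation, assuming a simple in-memory recorder. The `Metrics` class and the `timed` wrapper are hypothetical names. The wrapper measures and records, but the wrapped function's arguments, result, and exceptions pass through untouched, and nothing reads the metrics back to make a decision.

```python
import time
from typing import Any, Callable


class Metrics:
    """Append-only sample store. It is written to and exposed, never consulted."""

    def __init__(self) -> None:
        self.samples: dict[str, list[float]] = {}

    def record(self, name: str, value: float) -> None:
        self.samples.setdefault(name, []).append(value)


def timed(metrics: Metrics, name: str, fn: Callable[..., Any], *args: Any) -> Any:
    """Run fn(*args), record its duration, and return its result unchanged."""
    start = time.perf_counter()
    try:
        return fn(*args)
    finally:
        # Recording happens on success and on failure; it never alters either outcome.
        metrics.record(name, time.perf_counter() - start)
```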
9. Testing Requirements
Each optimization must introduce:
- Deterministic unit tests
- Crash-recovery equivalence tests
- Phase 1 regression tests
- MVCC regression tests
- Replication regression tests (if applicable)
All existing tests MUST pass unmodified.
If a test must change, the optimization is invalid.
10. Phase 3 Exit Criteria
Phase 3 is complete when:
- All selected optimizations are proven and implemented
- No correctness regressions exist
- No semantics are altered
- All optimizations are optional
- Documentation is complete and authoritative
Performance gains are secondary to proof quality.
11. Guiding Principle
AeroDB would rather be slow and correct than fast and ambiguous.
Phase 3 exists to make AeroDB faster only where correctness is untouched.
If there is doubt, we do not optimize.
END OF DOCUMENT