SEMANTIC EQUIVALENCE — PHASE 3¶

Status¶

Phase: 3
Authority: Normative
Scope: Equivalence definition for all Phase 3 optimizations
Dependencies:
PERF_VISION.md
PERF_INVARIANTS.md
PERF_PROOF_RULES.md
PERFORMANCE_BASELINE.md
CRITICAL_PATHS.md

No optimization is valid unless it is proven semantically equivalent under the definitions in this document.

1. Purpose¶

Phase 3 permits performance changes only if behavior is unchanged.

This document defines: - What “behavior” means in AeroDB - What is observable vs non-observable - What must remain identical - What differences are permitted - How equivalence is evaluated under failures

Equivalence is not similarity.
Equivalence is indistinguishability to all permitted observers.

2. Definition of Semantic Equivalence¶

Two executions are semantically equivalent if and only if:

For all permitted observers, under all allowed failure scenarios, the observable outcomes are identical.

This must hold for: - Successful execution - Partial execution - Crash and recovery - Restart and replay - Replication consumption

Performance differences are irrelevant.

3. Permitted Observers¶

Semantic equivalence is evaluated relative to the following observers:

3.1 Client Observer¶

A client may observe: - Acknowledgment timing (relative to fsync only) - Success or failure - Returned query results - Error codes

A client may NOT observe: - Internal scheduling - Intermediate states - Background activity

3.2 Crash-Recovery Observer¶

The recovery process may observe: - WAL contents - Snapshot contents - Checkpoints - Checksums

Recovery MUST produce identical state.

3.3 Replica Observer¶

A replica may observe: - WAL stream - Snapshot bootstrap data - Commit ordering - Visibility boundaries

Replica-visible state MUST be derivable identically.

3.4 Diagnostic Observer¶

Logs and metrics may observe: - Events - Counters - Timing

Diagnostics MUST NOT influence semantics and are not authoritative for equivalence.

4. Observable State Surface¶

The following are observable and must be equivalent:

4.1 Persistent State¶

WAL records (logical content and ordering)
Snapshot files
Snapshot manifests
Checkpoint markers

Binary equality is not required, but logical equivalence is mandatory.

4.2 Logical Database State¶

Document contents
MVCC version chains
CommitId assignments
Tombstones
Index-visible state

After recovery, state MUST be identical.

4.3 Visibility Semantics¶

Snapshot isolation behavior
Version visibility
Query results

A read must return the same result set.

4.4 Failure Outcomes¶

Which operations succeed
Which operations fail
Which operations are retried
Which operations are rolled back

Failures MUST be detected at the same semantic boundaries.

5. Non-Observable Differences (Permitted)¶

The following differences are allowed only if they are not observable:

Internal buffering
Memory layout
CPU instruction count
Syscall batching
Temporary allocations
Lock granularity (if determinism preserved)

If a difference becomes observable under any failure, it is forbidden.

6. Temporal Considerations¶

6.1 Time Is Not a Semantic Dimension¶

Equivalence MUST NOT depend on: - Wall-clock time - CPU speed - I/O latency - Scheduling fairness

An optimization that relies on “fast enough” behavior is invalid.

6.2 Ordering Is Semantic¶

Ordering of: - WAL records - CommitIds - Visibility transitions

is semantically observable and MUST be preserved.

7. Write Path Equivalence¶

Two executions are equivalent on the write path if:

Acknowledgment occurs after equivalent durability
CommitIds are assigned in the same order
WAL records represent the same logical commits
Partial writes fail identically

Batching is allowed only if it produces the same logical sequence.

8. Read Path Equivalence¶

Two executions are equivalent on the read path if:

The same snapshot sees the same versions
Version chain traversal yields the same result
Queries return identical result sets
Errors occur at the same semantic points

Caching is allowed only if visibility rules are re-applied.

9. MVCC Equivalence¶

MVCC behavior is equivalent if:

Snapshot isolation holds identically
No snapshot observes new or missing versions
CommitId visibility rules are unchanged
Garbage collection effects are identical

Optimizations MUST NOT: - Skip visibility checks - Pre-compute visibility without proof

10. Replication Equivalence¶

Replication behavior is equivalent if:

WAL stream is a strict prefix-equivalent sequence
Snapshots represent the same MVCC cut
Replicas reach the same logical state
Replica restarts behave identically

Any optimization affecting WAL emission MUST prove prefix preservation.

11. Failure Equivalence¶

For each failure point:

Baseline outcome MUST be described
Optimized outcome MUST match
Recovery MUST converge to identical state

Failure equivalence is evaluated at: - Before fsync - During WAL write - After fsync but before acknowledgment - During checkpoint - During snapshot - During replication transfer (if applicable)

12. Equivalence Under Disablement¶

An optimization is equivalent under disablement if:

Data written with optimization enabled can be read with it disabled
WAL replay behavior is identical
No migration or cleanup is required

Disablement MUST NOT alter semantics.

13. Prohibited Equivalence Arguments¶

The following arguments are invalid:

“Equivalent in practice”
“Equivalent under normal load”
“Equivalent assuming no crash”
“Equivalent because tests pass”
“Equivalent because performance improved”

Equivalence is structural, not empirical.

14. Equivalence Proof Obligations¶

Every optimization MUST:

Reference this document explicitly
State which equivalence clauses apply
Prove equivalence clause-by-clause
Include failure equivalence

Missing a clause is proof failure.

15. Final Rule¶

In AeroDB, two executions are equivalent
if no observer can ever tell them apart.

If an observer can distinguish them, the optimization is rejected.

END OF DOCUMENT