PHASE 7 FAILURE MODEL — CONTROL PLANE CRASH, ERROR, AND RECOVERY SEMANTICS¶
Project: AeroDB Phase: Phase 7 — Control Plane, Admin UI, and Operator Tooling Status: AUTHORITATIVE · NON-NEGOTIABLE
1. Purpose of This Document¶
This document defines the failure model for Phase 7.
Phase 7 failures are fundamentally different from data-plane failures:
- They MUST NOT affect data correctness
- They MUST NOT alter kernel state implicitly
- They MUST NOT create ambiguity about what has or has not executed
This document ensures that control-plane failure cannot become system failure.
2. Failure Philosophy¶
When the control plane fails, the system must remain unchanged.
Phase 7 failures are handled by:
- Explicit rejection
- Explicit termination
- Explicit operator re-issuance
Phase 7 never attempts to repair, retry, or complete an action after failure.
3. Failure Domains¶
Phase 7 defines the following failure domains:
- Client-side failures (UI / CLI)
- Control-plane process failures
- Network and transport failures
- Kernel rejection or failure
- Operator-induced errors
Each domain is handled explicitly and deterministically.
4. Client-Side Failures (UI / CLI)¶
4.1 Crash or Termination Before Request Submission¶
If the UI or CLI crashes before a request is submitted:
- No request reaches the kernel
- No state change occurs
- No audit record of execution is created
Outcome:
- System state is unchanged
- Operator must reissue the request
4.2 Crash After Request Submission, Before Confirmation¶
If a mutating action requires confirmation and the client crashes before confirmation:
- The request MUST NOT be executed
- No kernel action occurs
Outcome:
- Action is discarded
- Operator must restart and reissue
4.3 Client Crash After Confirmation, Before Response¶
If the client crashes after confirmation but before receiving a response:
- The execution outcome MUST be determined from kernel state
- The control plane MUST NOT guess
Rules:
- If kernel state reflects execution → action succeeded
- Otherwise → action did not occur
Client-side uncertainty does not alter system truth.
5. Control-Plane Process Failures¶
5.1 Crash Before Kernel Request Dispatch¶
If the control-plane process crashes before forwarding a request:
- No kernel request occurs
- No state change occurs
Outcome:
- Action did not execute
5.2 Crash During Kernel Request Dispatch¶
If the control-plane process crashes while dispatching a request:
- The request is treated as not executed unless kernel durability proves otherwise
Rules:
- No assumptions are allowed
- Kernel state is authoritative
5.3 Crash After Kernel Acknowledgement¶
If the control-plane process crashes after the kernel acknowledges execution:
- The action is considered executed
- Recovery MUST rely on kernel state only
The control plane does not attempt replay.
6. Network and Transport Failures¶
6.1 Request Transmission Failure¶
If a request fails to reach the kernel:
- The action MUST NOT execute
Outcome:
- Operator must reissue
6.2 Duplicate Requests¶
If duplicate requests are received:
Rules:
- The control plane MUST detect duplicates where possible
- Otherwise, the kernel MUST reject duplicates deterministically
Silent duplicate execution is forbidden.
7. Kernel Rejection and Failure¶
7.1 Kernel Rejection¶
If the kernel rejects a request:
- The rejection MUST be surfaced verbatim
- The control plane MUST NOT reinterpret or retry
Kernel rejection is final.
7.2 Kernel Failure¶
If the kernel fails (crash, fatal error) during request handling:
- The control plane MUST treat the request as not executed unless kernel durability proves otherwise
Operator must investigate kernel state explicitly.
8. Operator-Induced Errors¶
8.1 Invalid Requests¶
If an operator issues an invalid request:
- The request is rejected
- The rejection must include explanation
No attempt is made to auto-correct input.
8.2 Abandoned Operations¶
If an operator abandons an operation mid-flow:
- No partial execution may occur
- No cleanup action is triggered automatically
The system remains unchanged.
9. Failure Visibility and Audit¶
All failures MUST:
- Be observable
- Be logged
- Be explainable
Audit records MUST include:
- Failure type
- Time
- Affected request
- Outcome (executed / not executed)
10. Recovery Rules¶
Phase 7 recovery is trivial by design.
Rules:
- No in-flight state is recovered
- No partial execution is completed
- No retries are attempted
Recovery means restart and resume observation only.
11. Explicit Non-Recovery¶
Phase 7 MUST NOT:
- Attempt rollback
- Attempt roll-forward
- Attempt reconciliation
Those concepts belong to the kernel only.
12. Testing Requirements¶
Phase 7 failure handling MUST be tested for:
- Client crash scenarios
- Control-plane crash scenarios
- Network failure scenarios
- Kernel rejection propagation
Tests MUST assert:
- No partial execution
- No hidden retries
- Deterministic outcomes
13. Final Statement¶
Phase 7 failure handling exists to protect correctness by doing nothing.
It prefers:
- Rejection over uncertainty
- Restart over repair
- Human re-issuance over automation
If the control plane fails, the safest action is inaction.
This failure model is absolute.
END OF PHASE 7 FAILURE MODEL