diff --git a/docs/superpowers/specs/2026-05-22-task-evidence-validation-design.md b/docs/superpowers/specs/2026-05-22-task-evidence-validation-design.md
index e3d260e..5bd54cc 100644
--- a/docs/superpowers/specs/2026-05-22-task-evidence-validation-design.md
+++ b/docs/superpowers/specs/2026-05-22-task-evidence-validation-design.md
@@ -49,6 +49,13 @@ evidence_gaps: list[str]
 recommended_revision_prompt: str
 ```
 
+`status` is the business decision field. `passed` is a compatibility boolean derived from `status`, not an independent source of truth. The mapping is:
+
+- `status == "accepted"` -> `passed=True`
+- `status in {"rejected", "insufficient_evidence", "validator_error"}` -> `passed=False`
+
+Task mode, retry, and status transition logic must branch on `status`. `accepted` remains `passed and score >= 0.75` for backward compatibility, but new code should not infer failure from `passed=False` alone.
+
 Rules:
 
 - `accepted`: the final answer is supported by available evidence and satisfies the task. The task enters `awaiting_feedback`.
@@ -70,6 +77,14 @@ Task statuses:
 
 `needs_review` remains an open status for the active task API, but the UI should distinguish it from `running`. `failed`, `closed`, and `abandoned` are terminal.
 
+Open status does not mean auto-runnable. The backend should split status semantics:
+
+- `is_open`: the task can still receive user feedback or revision.
+- `is_execution_active`: the backend is currently running or validating work.
+- `requires_user_action`: the task has stopped automatic execution and needs user input.
+
+`needs_review` should have `is_open=True`, `is_execution_active=False`, and `requires_user_action=True`. Schedulers, automatic retry loops, and active-task polling must not treat `needs_review` as a reason to continue execution. It should appear in the active task API only so the user can review, mark satisfied, revise, or abandon.
+
 User feedback is authoritative:
 
 - `satisfied` closes the task.
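
The two additions in this diff (the `status` to `passed` mapping and the split status semantics) could be sketched in Python roughly as follows. This is a minimal illustration, not the project's implementation: `derive_passed`, `StatusSemantics`, `TASK_STATUS_SEMANTICS`, and `should_continue_execution` are hypothetical names, and only the `needs_review` flags come from the spec. The flags for `running` and the terminal statuses are assumptions.

```python
from dataclasses import dataclass

# Validation statuses named in the spec's mapping.
VALIDATION_STATUSES = frozenset(
    {"accepted", "rejected", "insufficient_evidence", "validator_error"}
)


def derive_passed(status: str) -> bool:
    """Derive the compatibility boolean `passed` from `status`.

    `status` is the source of truth; `passed` is never set independently.
    """
    if status not in VALIDATION_STATUSES:
        raise ValueError(f"unknown validation status: {status!r}")
    return status == "accepted"


@dataclass(frozen=True)
class StatusSemantics:
    """Hypothetical container for the three flags described in the spec."""

    is_open: bool
    is_execution_active: bool
    requires_user_action: bool


TASK_STATUS_SEMANTICS = {
    # From the spec: open, not executing, waiting on the user.
    "needs_review": StatusSemantics(
        is_open=True, is_execution_active=False, requires_user_action=True
    ),
    # Assumption: a running task is open and actively executing.
    "running": StatusSemantics(
        is_open=True, is_execution_active=True, requires_user_action=False
    ),
    # Assumption: terminal statuses clear all three flags.
    "failed": StatusSemantics(False, False, False),
    "closed": StatusSemantics(False, False, False),
    "abandoned": StatusSemantics(False, False, False),
}


def should_continue_execution(task_status: str) -> bool:
    """Schedulers and retry loops check execution activity, not openness."""
    return TASK_STATUS_SEMANTICS[task_status].is_execution_active
```

Under these assumptions, a scheduler that calls `should_continue_execution("needs_review")` gets `False` even though the task is still open, which is the behavior the spec asks for.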