diff --git a/docs/superpowers/specs/2026-05-26-task-detail-live-execution-design.md b/docs/superpowers/specs/2026-05-26-task-detail-live-execution-design.md
new file mode 100644
index 0000000..a6f6855
--- /dev/null
+++ b/docs/superpowers/specs/2026-05-26-task-detail-live-execution-design.md
@@ -0,0 +1,440 @@
+# Task Detail Live Execution Design
+
+## Purpose
+
+Task detail should be a live execution surface for ordinary users. It should answer "what is Beaver doing now?", "what has already happened?", "what changed because of a tool or agent result?", and "what can I inspect or accept?" without forcing the user to wait for a final answer.
+
+This page is not primarily a developer audit view. It should expose enough execution detail to create confidence, while keeping raw payloads, long tool output, and debug metadata behind progressive disclosure.
+
+## User Experience Principles
+
+- Show progress as a chronological card feed that grows while the task runs.
+- Prefer user-facing explanations over raw internal event names.
+- Show skill selection, tool usage, tool result, agent team activity, artifacts, and final result as first-class cards.
+- Do not expose hidden chain-of-thought. Use brief action summaries such as "Beaver found the relevant files and will now inspect the API response shape."
+- Keep the user oriented with a persistent task header and clear current status.
+- Stop live updates once the task reaches a terminal state, while still allowing manual refresh.
+
+## Page Layout
+
+### Persistent Header
+
+The top header remains visible while scrolling and contains:
+
+- task title
+- task status: open, running, awaiting acceptance, needs revision, closed, abandoned, error, or cancelled
+- current stage label
+- elapsed time
+- compact progress summary
+- link back to task list
+- link to source conversation
+- acceptance entry point when a run is ready for review
+
+### Main Timeline
+
+The main column is a chronological card feed.
+Cards append as execution events arrive.
+
+Expected card sequence:
+
+1. task created
+2. planning started or completed
+3. skill selected
+4. tool call started
+5. tool call finished
+6. model next step
+7. agent team created
+8. sub-agent started
+9. sub-agent progress
+10. agent handoff
+11. sub-agent finished
+12. artifact created
+13. result ready
+14. acceptance recorded
+
+Cards should visually appear in order and keep enough prior context visible so the page feels like a live work log rather than a static report.
+
+### Side Rail
+
+The side rail contains compact, always-accessible context:
+
+- agent team map
+- currently active agent or tool
+- artifacts list
+- latest warning or blocked state
+- acceptance state
+
+On small screens, the side rail collapses below the header or into tabs.
+
+## Card Types
+
+### Task Created Card
+
+Shows that Beaver recognized the user message as a task.
+
+Fields:
+
+- task goal
+- source session
+- created time
+- initial status
+
+### Plan Card
+
+Shows the execution approach.
+
+Fields:
+
+- mode: single agent or agent team
+- planned stages
+- attempt index
+- strategy summary
+
+### Skill Card
+
+Shows which skill Beaver selected and why it matters.
+
+Fields:
+
+- skill name
+- skill version if available
+- user-facing reason
+- capabilities or method guidance summary
+
+If multiple skills are selected, render one grouped card with individual rows.
+
+### Tool Call Card
+
+Shows that Beaver is using a tool.
+
+Fields:
+
+- tool name
+- action summary
+- actor name
+- status: running, done, failed
+- started time
+- duration if completed
+
+Raw tool arguments are hidden by default.
+
+### Tool Result Card
+
+Shows what the tool found or produced.
+
+Fields:
+
+- success or failure
+- result summary
+- error message if any
+- links to artifact or output
+- expandable raw result
+
+### Next Step Card
+
+Shows Beaver's next user-visible action after interpreting a result.
+
+Fields:
+
+- short action explanation
+- related prior card or run
+- expected next event type when known
+
+This card must not contain private reasoning traces.
+
+### Agent Team Card
+
+Shows that Beaver created a multi-agent team.
+
+Fields:
+
+- team strategy
+- agent count
+- dependency shape
+- agent names and assigned tasks
+
+### Sub-Agent Card
+
+Shows progress from an individual agent.
+
+Fields:
+
+- agent name
+- assigned task
+- current status
+- progress text
+- latest output summary
+
+### Agent Handoff Card
+
+Shows interaction between agents.
+
+Fields:
+
+- source agent
+- target agent
+- handoff reason
+- summary of transferred result
+
+### Artifact Card
+
+Shows an output created during execution.
+
+Fields:
+
+- artifact title
+- artifact type
+- source agent or run
+- created time
+- open or download action
+- summary or preview where safe
+
+### Error or Blocked Card
+
+Shows that execution hit a problem.
+
+Fields:
+
+- problem summary
+- affected stage or tool
+- whether Beaver can continue automatically
+- action required from the user, if any
+
+### Final Result Card
+
+Shows the result that the user can review.
+
+Fields:
+
+- final answer or result summary
+- important artifacts
+- validation or evidence status when available
+- accept, revise, and abandon actions
+
+## Realtime Behavior
+
+### Live Updates
+
+The page should subscribe to task-related process events while the task is active.
+The following updates should append or update cards in real time:
+
+- skill selected
+- tool call started
+- tool call finished
+- agent team created
+- sub-agent started
+- sub-agent progress
+- sub-agent finished
+- agent handoff
+- artifact created
+- task result ready
+- task error or blocked state
+- acceptance recorded
+
+### Initial Load
+
+On page load, call `GET /api/tasks/{task_id}` and hydrate:
+
+- task metadata
+- lifecycle events
+- process runs
+- process events
+- process artifacts
+- readable run messages
+- existing feedback
+
+The frontend should build the initial card feed from these persisted records so a refreshed page reconstructs the same execution timeline.
+
+### Fallback Polling
+
+If WebSocket updates are unavailable, active tasks should poll `GET /api/tasks/{task_id}` every 3 to 5 seconds.
+
+Polling stops when the task reaches a terminal state:
+
+- closed
+- abandoned
+- cancelled
+- error
+
+Manual refresh remains available.
+
+### Large Content Loading
+
+The following content should not be loaded or expanded by default:
+
+- raw tool arguments
+- full tool output
+- raw process event payloads
+- full transcript
+- memory retrieval trace
+- debug metadata
+
+These belong behind "show details" controls or a later advanced view.
+
+## Backend Event Contract
+
+The existing task detail API already exposes useful primitives:
+
+- `process_runs`
+- `process_events`
+- `process_artifacts`
+- `runs`
+- `events`
+- `skill_names`
+- task metadata and feedback
+
+For a reliable user-facing timeline, backend events should become more explicit.
+Recommended event kinds:
+
+- `task_created`
+- `task_planned`
+- `skill_selected`
+- `tool_call_started`
+- `tool_call_finished`
+- `agent_team_created`
+- `agent_started`
+- `agent_progress`
+- `agent_handoff`
+- `agent_finished`
+- `artifact_created`
+- `task_result_ready`
+- `task_acceptance_recorded`
+- `task_error`
+
+Each event should include:
+
+- `event_id`
+- `task_id`
+- `run_id` when applicable
+- `parent_run_id` when applicable
+- `actor_type`
+- `actor_name`
+- `kind`
+- `status`
+- `text`
+- `created_at`
+- compact `metadata`
+
+Metadata should contain structured fields for rendering, not only raw provider or tool payloads.
+
+## Frontend Rendering Model
+
+The frontend should normalize events into a `TaskTimelineCard` view model.
+
+Recommended fields:
+
+```ts
+type TaskTimelineCard = {
+  id: string;
+  taskId: string;
+  runId?: string | null;
+  parentRunId?: string | null;
+  type:
+    | 'task_created'
+    | 'plan'
+    | 'skill'
+    | 'tool_call'
+    | 'tool_result'
+    | 'next_step'
+    | 'agent_team'
+    | 'agent_progress'
+    | 'agent_handoff'
+    | 'artifact'
+    | 'error'
+    | 'result'
+    | 'acceptance';
+  title: string;
+  summary?: string;
+  actorName?: string;
+  status?: string;
+  createdAt: string;
+  relatedArtifactIds?: string[];
+  // Structured, render-ready detail fields (not raw payloads).
+  details?: Record<string, unknown>;
+};
+```
+
+This keeps rendering stable even if backend event payloads evolve.
+
+## Empty, Loading, and Error States
+
+### No Events Yet
+
+Show the task created card and a running placeholder:
+
+"Beaver is preparing the first step."
+
+### Waiting on Tool
+
+Show the active tool call card with a spinner and elapsed time.
+
+### Waiting on Agent
+
+Show the active agent card with its assigned task and current status.
+
+### Failed Tool
+
+Show an error card with a concise reason and whether Beaver is retrying or changing approach.
+
+### Lost Connection
+
+Keep existing cards visible and show a small reconnecting indicator. If reconnect fails, fall back to polling.
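The reconnect, polling, and terminal-state rules from the realtime and lost-connection sections can be expressed as a small pure state machine. This is a minimal sketch under stated assumptions: `LiveMode`, `LiveSignal`, `nextLiveMode`, and `pollIntervalMs` are hypothetical names, the status strings assume snake_case spellings of the header statuses, and spreading polls across the 3 to 5 second window with jitter is an added assumption rather than part of the spec.

```ts
// Hypothetical sketch: names and status spellings are assumptions, not an existing API.
type TaskStatus =
  | 'open'
  | 'running'
  | 'awaiting_acceptance'
  | 'needs_revision'
  | 'closed'
  | 'abandoned'
  | 'error'
  | 'cancelled';

// Terminal states stop live updates and polling.
const TERMINAL_STATUSES: ReadonlySet<TaskStatus> = new Set<TaskStatus>([
  'closed',
  'abandoned',
  'cancelled',
  'error',
]);

// How the page currently receives updates.
type LiveMode = 'websocket' | 'reconnecting' | 'polling' | 'stopped';

// Signals the page observes from the socket or from fetched task detail.
type LiveSignal =
  | { kind: 'socket_open' }
  | { kind: 'socket_closed' }
  | { kind: 'reconnect_failed' }
  | { kind: 'task_status'; status: TaskStatus };

function nextLiveMode(mode: LiveMode, signal: LiveSignal): LiveMode {
  // Once stopped, only a manual refresh fetches new data.
  if (mode === 'stopped') return 'stopped';
  if (signal.kind === 'task_status') {
    return TERMINAL_STATUSES.has(signal.status) ? 'stopped' : mode;
  }
  // A (re)opened socket takes over from reconnecting or polling.
  if (signal.kind === 'socket_open') return 'websocket';
  // Existing cards stay visible while the UI shows a reconnecting indicator.
  if (signal.kind === 'socket_closed') {
    return mode === 'polling' ? 'polling' : 'reconnecting';
  }
  // reconnect_failed: fall back to polling GET /api/tasks/{task_id}.
  return 'polling';
}

// Delay before the next poll, kept inside the 3 to 5 second window.
// `jitter` is expected in [0, 1] (e.g. Math.random()) and is clamped.
function pollIntervalMs(jitter: number): number {
  const clamped = Math.min(Math.max(jitter, 0), 1);
  return 3000 + Math.round(2000 * clamped);
}
```

In the page, each WebSocket lifecycle callback and each fetched task status would pass through `nextLiveMode`; while the result is `'polling'`, the page would schedule its next fetch with something like `setTimeout(refetch, pollIntervalMs(Math.random()))`, where `refetch` is a hypothetical loader. Keeping the transitions pure makes the terminal-state rule testable without a live socket.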
+
+## Acceptance Flow
+
+The final result card is the primary acceptance surface.
+
+Actions:
+
+- Accept: closes the task and can trigger skill learning.
+- Needs revision: requires a comment, appends a new revision card, and starts another attempt in the same timeline.
+- Abandon: closes the task as abandoned and preserves the execution history.
+
+After any acceptance action, the page should immediately update local UI state and refetch the task detail.
+
+## V1 Scope
+
+V1 includes:
+
+- persistent task header
+- live chronological card feed
+- skill cards
+- tool call and result cards
+- agent team card
+- sub-agent progress cards
+- artifact cards
+- final result and acceptance card
+- WebSocket-first updates with polling fallback
+- collapsed raw details
+
+V1 excludes:
+
+- full administrator audit mode
+- memory retrieval graph visualization
+- raw provider request/response viewer
+- advanced event payload debugger
+- editable task graph
+
+## Implementation Notes
+
+The existing `tasks/[taskId]/page.tsx` already has useful pieces, but the main hierarchy should shift from phase groups and selected node detail to a timeline-first experience.
+
+Likely frontend modules:
+
+- `TaskLiveHeader`
+- `TaskTimeline`
+- `TaskTimelineCard`
+- `TaskSideRail`
+- `TaskAcceptanceCard`
+- `buildTaskTimelineCards`
+
+Likely backend work:
+
+- emit explicit process events for skill selection and tool calls
+- include user-facing text summaries in event metadata
+- ensure task detail reconstruction uses persisted events
+- expose enough run and actor metadata for agent team rendering
+
+## Self-Review
+
+- No placeholders remain.
+- The design is scoped to ordinary-user task detail, not admin audit.
+- Realtime requirements distinguish live updates from expandable heavy details.
+- Backend event requirements are explicit enough for frontend implementation.
+- V1 scope avoids memory graph and debug payload work.
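The `buildTaskTimelineCards` module named in the implementation notes could normalize persisted and live events into `TaskTimelineCard` values. The sketch below is one possible shape, not a settled design. It assumes events follow the recommended backend contract with consistently formatted ISO-8601 `created_at` strings; the metadata keys `tool_call_id` and `skill_name` are hypothetical. The update-in-place rules are one reading of the card descriptions: one card per sub-agent run, a finished tool call that closes its call card and appends a result card, and consecutive skill selections grouped into one card. The view model is repeated so the block stands alone.

```ts
// Event shape from the recommended backend contract. The metadata keys read
// below (`tool_call_id`, `skill_name`) are hypothetical.
type ProcessEvent = {
  event_id: string;
  task_id: string;
  run_id?: string | null;
  parent_run_id?: string | null;
  actor_type: string;
  actor_name: string;
  kind: string;
  status: string;
  text: string;
  created_at: string; // assumed ISO-8601, one consistent format
  metadata?: Record<string, unknown>;
};

type CardType =
  | 'task_created' | 'plan' | 'skill' | 'tool_call' | 'tool_result' | 'next_step'
  | 'agent_team' | 'agent_progress' | 'agent_handoff' | 'artifact' | 'error'
  | 'result' | 'acceptance';

// Same fields as the TaskTimelineCard view model in this spec.
type TaskTimelineCard = {
  id: string;
  taskId: string;
  runId?: string | null;
  parentRunId?: string | null;
  type: CardType;
  title: string;
  summary?: string;
  actorName?: string;
  status?: string;
  createdAt: string;
  relatedArtifactIds?: string[];
  details?: Record<string, unknown>;
};

// Kinds where one event produces exactly one new card.
const ONE_TO_ONE = new Map<string, CardType>([
  ['task_created', 'task_created'],
  ['task_planned', 'plan'],
  ['agent_team_created', 'agent_team'],
  ['agent_handoff', 'agent_handoff'],
  ['artifact_created', 'artifact'],
  ['task_result_ready', 'result'],
  ['task_acceptance_recorded', 'acceptance'],
  ['task_error', 'error'],
]);

const cmp = (a: string, b: string): number => (a < b ? -1 : a > b ? 1 : 0);

function buildTaskTimelineCards(events: ProcessEvent[]): TaskTimelineCard[] {
  // Deterministic order so a refreshed page rebuilds the same feed.
  const sorted = [...events].sort(
    (a, b) => cmp(a.created_at, b.created_at) || cmp(a.event_id, b.event_id),
  );
  const cards: TaskTimelineCard[] = [];
  const open = new Map<string, TaskTimelineCard>(); // cards that later events update

  const card = (e: ProcessEvent, type: CardType): TaskTimelineCard => ({
    id: e.event_id, taskId: e.task_id, runId: e.run_id, parentRunId: e.parent_run_id,
    type, title: e.text, actorName: e.actor_name, status: e.status, createdAt: e.created_at,
  });

  for (const e of sorted) {
    const meta: Record<string, unknown> = e.metadata ?? {};
    const last = cards[cards.length - 1];
    if (e.kind === 'skill_selected') {
      // Consecutive skill selections in one run become a single grouped card.
      const skills = last?.type === 'skill' && last.runId === e.run_id ? last.details?.skills : undefined;
      if (Array.isArray(skills)) skills.push(meta.skill_name ?? e.text);
      else cards.push({ ...card(e, 'skill'), details: { skills: [meta.skill_name ?? e.text] } });
    } else if (e.kind === 'tool_call_started') {
      const c = card(e, 'tool_call');
      open.set(`tool:${String(meta.tool_call_id ?? e.event_id)}`, c);
      cards.push(c);
    } else if (e.kind === 'tool_call_finished') {
      // Mark the matching call done or failed, then append its result card.
      const call = open.get(`tool:${String(meta.tool_call_id)}`);
      if (call) call.status = e.status;
      cards.push(card(e, 'tool_result'));
    } else if (e.kind === 'agent_started' || e.kind === 'agent_progress' || e.kind === 'agent_finished') {
      // One card per sub-agent run, updated in place as progress arrives.
      const key = `agent:${e.run_id ?? e.actor_name}`;
      const existing = open.get(key);
      if (existing) {
        existing.status = e.status;
        existing.summary = e.text;
      } else {
        const c = card(e, 'agent_progress');
        open.set(key, c);
        cards.push(c);
      }
    } else {
      const type = ONE_TO_ONE.get(e.kind);
      // Unknown kinds are skipped so new backend events cannot break rendering.
      if (type) cards.push(card(e, type));
    }
  }
  return cards;
}
```

Because live WebSocket events would share the persisted contract, the page could append each incoming event to the hydrated list and rebuild. This keeps a refreshed timeline identical to a live one, at the cost of rescanning the feed on every update.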