Add task detail live execution design

2026-05-26 10:55:16 +08:00
parent 030bce8a60
commit 16347caf5e
1 changed files with 440 additions and 0 deletions
--- a/docs/superpowers/specs/2026-05-26-task-detail-live-execution-design.md
+++ b/docs/superpowers/specs/2026-05-26-task-detail-live-execution-design.md
@@ -0,0 +1,440 @@
+# Task Detail Live Execution Design
+
+## Purpose
+
+Task detail should be a live execution surface for ordinary users. It should answer "what is Beaver doing now?", "what has already happened?", "what changed because of a tool or agent result?", and "what can I inspect or accept?" without forcing the user to wait for a final answer.
+
+This page is not primarily a developer audit view. It should expose enough execution detail to build confidence, while keeping raw payloads, long tool output, and debug metadata behind progressive disclosure.
+
+## User Experience Principles
+
+- Show progress as a chronological card feed that grows while the task runs.
+- Prefer user-facing explanations over raw internal event names.
+- Show skill selection, tool usage, tool result, agent team activity, artifacts, and final result as first-class cards.
+- Do not expose hidden chain-of-thought. Use brief action summaries such as "Beaver found the relevant files and will now inspect the API response shape."
+- Keep the user oriented with a persistent task header and clear current status.
+- Stop live updates once the task reaches a terminal state, while still allowing manual refresh.
+
+## Page Layout
+
+### Persistent Header
+
+The top header remains visible while scrolling and contains:
+
+- task title
+- task status: open, running, awaiting acceptance, needs revision, closed, abandoned, error, or cancelled
+- current stage label
+- elapsed time
+- compact progress summary
+- link back to task list
+- link to source conversation
+- acceptance entry point when a run is ready for review
+
+### Main Timeline
+
+The main column is a chronological card feed. Cards append as execution events arrive.
+
+Expected card sequence:
+
+1. task created
+2. planning started or completed
+3. skill selected
+4. tool call started
+5. tool call finished
+6. model next step
+7. agent team created
+8. sub-agent started
+9. sub-agent progress
+10. agent handoff
+11. sub-agent finished
+12. artifact created
+13. result ready
+14. acceptance recorded
+
+Cards should visually appear in order and keep enough prior context visible so the page feels like a live work log rather than a static report.
+
+### Side Rail
+
+The side rail contains compact, always-accessible context:
+
+- agent team map
+- currently active agent or tool
+- artifacts list
+- latest warning or blocked state
+- acceptance state
+
+On small screens, the side rail collapses below the header or into tabs.
+
+## Card Types
+
+### Task Created Card
+
+Shows that Beaver recognized the user message as a task.
+
+Fields:
+
+- task goal
+- source session
+- created time
+- initial status
+
+### Plan Card
+
+Shows the execution approach.
+
+Fields:
+
+- mode: single agent or agent team
+- planned stages
+- attempt index
+- strategy summary
+
+### Skill Card
+
+Shows which skill Beaver selected and why it matters.
+
+Fields:
+
+- skill name
+- skill version if available
+- user-facing reason
+- capabilities or method guidance summary
+
+If multiple skills are selected, render one grouped card with one row per skill.
+
+### Tool Call Card
+
+Shows that Beaver is using a tool.
+
+Fields:
+
+- tool name
+- action summary
+- actor name
+- status: running, done, or failed
+- started time
+- duration if completed
+
+Raw tool arguments are hidden by default.
+
+### Tool Result Card
+
+Shows what the tool found or produced.
+
+Fields:
+
+- success or failure
+- result summary
+- error message if any
+- links to artifact or output
+- expandable raw result
+
+### Next Step Card
+
+Shows Beaver's next user-visible action after interpreting a result.
+
+Fields:
+
+- short action explanation
+- related prior card or run
+- expected next event type when known
+
+This card must not contain private reasoning traces.
+
+### Agent Team Card
+
+Shows that Beaver created a multi-agent team.
+
+Fields:
+
+- team strategy
+- agent count
+- dependency shape
+- agent names and assigned tasks
+
+### Sub-Agent Card
+
+Shows progress from an individual agent.
+
+Fields:
+
+- agent name
+- assigned task
+- current status
+- progress text
+- latest output summary
+
+### Agent Handoff Card
+
+Shows interaction between agents.
+
+Fields:
+
+- source agent
+- target agent
+- handoff reason
+- summary of transferred result
+
+### Artifact Card
+
+Shows an output created during execution.
+
+Fields:
+
+- artifact title
+- artifact type
+- source agent or run
+- created time
+- open or download action
+- summary or preview where safe
+
+### Error or Blocked Card
+
+Shows that execution hit a problem.
+
+Fields:
+
+- problem summary
+- affected stage or tool
+- whether Beaver can continue automatically
+- action required from user if any
+
+### Final Result Card
+
+Shows the result that the user can review.
+
+Fields:
+
+- final answer or result summary
+- important artifacts
+- validation or evidence status when available
+- accept, revise, and abandon actions
+
+## Realtime Behavior
+
+### Live Updates
+
+The page should subscribe to task-related process events while the task is active. The following updates should append or update cards in real time:
+
+- skill selected
+- tool call started
+- tool call finished
+- agent team created
+- sub-agent started
+- sub-agent progress
+- sub-agent finished
+- agent handoff
+- artifact created
+- task result ready
+- task error or blocked state
+- acceptance recorded
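
The "append or update" behavior described above can be sketched as a small pure function. This is only an illustration, assuming each card carries a stable `id` that later events for the same tool call or agent reuse; `LiveCard` and `upsertCard` are hypothetical names, not part of the existing codebase.

```typescript
// Hypothetical minimal card shape: only `id` is needed for upserting.
type LiveCard = { id: string; status?: string; summary?: string };

// Returns a new array. A card whose id is already present is merged in place
// (incoming fields win); an unseen id is appended to the end of the feed.
function upsertCard<T extends { id: string }>(cards: readonly T[], incoming: T): T[] {
  const exists = cards.some((card) => card.id === incoming.id);
  if (!exists) {
    return [...cards, incoming];
  }
  return cards.map((card) => (card.id === incoming.id ? { ...card, ...incoming } : card));
}

// Example: a running tool call card is updated rather than duplicated.
const liveFeed: LiveCard[] = upsertCard(
  [{ id: "tool-1", status: "running" }],
  { id: "tool-1", status: "done" },
);
```

Returning a new array instead of mutating the input keeps the function easy to use with state containers that compare by reference, which is an assumption about the frontend rather than something the spec states.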
+
+### Initial Load
+
+On page load, call `GET /api/tasks/{task_id}` and hydrate:
+
+- task metadata
+- lifecycle events
+- process runs
+- process events
+- process artifacts
+- readable run messages
+- existing feedback
+
+The frontend should build the initial card feed from these persisted records so a refreshed page reconstructs the same execution timeline.
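
One way to make a refreshed page reconstruct the same timeline is to merge the persisted record lists, drop duplicates, and sort by creation time. The sketch below assumes every record exposes a unique `id` and an ISO 8601 `createdAt` string; `TimelineRecord` and `buildChronologicalFeed` are hypothetical names used only for illustration.

```typescript
// Hypothetical minimal shape shared by persisted events and artifacts.
type TimelineRecord = { id: string; createdAt: string };

// Merges record lists, drops duplicate ids (first occurrence wins), and sorts
// by timestamp. Ties keep their input order because Array.prototype.sort is stable.
function buildChronologicalFeed<T extends TimelineRecord>(...sources: Array<readonly T[]>): T[] {
  const seen = new Set<string>();
  const merged: T[] = [];
  for (const source of sources) {
    for (const record of source) {
      if (!seen.has(record.id)) {
        seen.add(record.id);
        merged.push(record);
      }
    }
  }
  // Assumes valid ISO timestamps; an invalid date would yield NaN here.
  return merged.sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
}
```

Because the result depends only on the persisted records, the same function could also be reused after a polling refetch, under the same assumptions.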
+
+### Fallback Polling
+
+If WebSocket updates are unavailable, active tasks should poll `GET /api/tasks/{task_id}` every 3 to 5 seconds.
+
+Polling stops when the task reaches a terminal state:
+
+- closed
+- abandoned
+- cancelled
+- error
+
+Manual refresh remains available.
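
The polling rule above can be captured in a small decision function. This is a sketch under stated assumptions: the status strings are taken from the terminal list in this section, while `nextPollDelayMs`, its `webSocketConnected` flag, and the 4000 ms default are hypothetical choices within the 3 to 5 second window.

```typescript
// Terminal statuses listed in the spec; polling stops once one is reached.
const TERMINAL_STATUSES: ReadonlySet<string> = new Set([
  "closed",
  "abandoned",
  "cancelled",
  "error",
]);

function isTerminalStatus(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Returns the delay before the next poll in milliseconds, or null when no
// poll should be scheduled (WebSocket is healthy or the task has finished).
function nextPollDelayMs(
  status: string,
  webSocketConnected: boolean,
  intervalMs: number = 4000,
): number | null {
  if (webSocketConnected || isTerminalStatus(status)) {
    return null;
  }
  // Clamp the interval to the 3 to 5 second window from the spec.
  return Math.min(5000, Math.max(3000, intervalMs));
}
```

A caller would schedule `GET /api/tasks/{task_id}` with a timer whenever the function returns a number and skip scheduling when it returns `null`; manual refresh stays outside this logic.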
+
+### Large Content Loading
+
+The following content should not be loaded or expanded by default:
+
+- raw tool arguments
+- full tool output
+- raw process event payloads
+- full transcript
+- memory retrieval trace
+- debug metadata
+
+These belong behind "show details" controls or a later advanced view.
+
+## Backend Event Contract
+
+The existing task detail API already exposes useful primitives:
+
+- `process_runs`
+- `process_events`
+- `process_artifacts`
+- `runs`
+- `events`
+- `skill_names`
+- task metadata and feedback
+
+For a reliable user-facing timeline, backend events should become more explicit. Recommended event kinds:
+
+- `task_created`
+- `task_planned`
+- `skill_selected`
+- `tool_call_started`
+- `tool_call_finished`
+- `agent_team_created`
+- `agent_started`
+- `agent_progress`
+- `agent_handoff`
+- `agent_finished`
+- `artifact_created`
+- `task_result_ready`
+- `task_acceptance_recorded`
+- `task_error`
+
+Each event should include:
+
+- `event_id`
+- `task_id`
+- `run_id` when applicable
+- `parent_run_id` when applicable
+- `actor_type`
+- `actor_name`
+- `kind`
+- `status`
+- `text`
+- `created_at`
+- compact `metadata`
+
+Metadata should contain structured fields for rendering, not only raw provider or tool payloads.
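
To make the contract concrete, the event envelope could be typed and validated roughly as follows. The field names and event kinds come from the lists above; the field types (for example, strings for `status` and `created_at`), the optional markers, and the names `TaskEvent` and `isTaskEvent` are assumptions for illustration.

```typescript
// Event kinds copied from the recommended list in this section.
const EVENT_KINDS = [
  "task_created", "task_planned", "skill_selected",
  "tool_call_started", "tool_call_finished",
  "agent_team_created", "agent_started", "agent_progress",
  "agent_handoff", "agent_finished", "artifact_created",
  "task_result_ready", "task_acceptance_recorded", "task_error",
] as const;

type EventKind = (typeof EVENT_KINDS)[number];

// Assumed types for each field; the spec only names the fields.
type TaskEvent = {
  event_id: string;
  task_id: string;
  run_id?: string | null;
  parent_run_id?: string | null;
  actor_type: string;
  actor_name: string;
  kind: EventKind;
  status: string;
  text: string;
  created_at: string;
  metadata: Record<string, unknown>;
};

// Runtime guard: checks required string fields, a known kind, and object metadata.
function isTaskEvent(value: unknown): value is TaskEvent {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const record = value as Record<string, unknown>;
  const requiredStrings = [
    "event_id", "task_id", "actor_type", "actor_name", "status", "text", "created_at",
  ];
  const kind = record["kind"];
  const metadata = record["metadata"];
  return (
    requiredStrings.every((key) => typeof record[key] === "string") &&
    EVENT_KINDS.some((known) => known === kind) &&
    typeof metadata === "object" &&
    metadata !== null
  );
}
```

A guard like this would let the frontend ignore malformed or unknown events instead of rendering broken cards, which fits the goal of a reliable user-facing timeline.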
+
+## Frontend Rendering Model
+
+The frontend should normalize events into a `TaskTimelineCard` view model.
+
+Recommended fields:
+
+```ts
+type TaskTimelineCard = {
+  id: string;
+  taskId: string;
+  runId?: string | null;
+  parentRunId?: string | null;
+  type:
+    | 'task_created'
+    | 'plan'
+    | 'skill'
+    | 'tool_call'
+    | 'tool_result'
+    | 'next_step'
+    | 'agent_team'
+    | 'agent_progress'
+    | 'agent_handoff'
+    | 'artifact'
+    | 'error'
+    | 'result'
+    | 'acceptance';
+  title: string;
+  summary?: string;
+  actorName?: string;
+  status?: string;
+  createdAt: string;
+  relatedArtifactIds?: string[];
+  details?: Record<string, unknown>;
+};
+```
+
+Normalizing events into this view model keeps rendering stable even if backend event payloads evolve.
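
A normalization step along these lines might map backend event kinds to card types. The mapping table is an assumption: the spec lists both sets of names but does not state which kind produces which card. `TimelineCardSketch` is a reduced, hypothetical version of `TaskTimelineCard`, and deriving the title from the kind is a placeholder for real user-facing copy.

```typescript
type CardType =
  | "task_created" | "plan" | "skill" | "tool_call" | "tool_result"
  | "next_step" | "agent_team" | "agent_progress" | "agent_handoff"
  | "artifact" | "error" | "result" | "acceptance";

// Assumed kind-to-card mapping; a Map avoids accidental prototype lookups.
const CARD_TYPE_BY_KIND = new Map<string, CardType>([
  ["task_created", "task_created"],
  ["task_planned", "plan"],
  ["skill_selected", "skill"],
  ["tool_call_started", "tool_call"],
  ["tool_call_finished", "tool_result"],
  ["agent_team_created", "agent_team"],
  ["agent_started", "agent_progress"],
  ["agent_progress", "agent_progress"],
  ["agent_finished", "agent_progress"],
  ["agent_handoff", "agent_handoff"],
  ["artifact_created", "artifact"],
  ["task_result_ready", "result"],
  ["task_acceptance_recorded", "acceptance"],
  ["task_error", "error"],
]);

// Hypothetical subset of the backend event fields used by this sketch.
type RawTaskEvent = {
  event_id: string;
  task_id: string;
  kind: string;
  text: string;
  created_at: string;
};

type TimelineCardSketch = {
  id: string;
  taskId: string;
  type: CardType;
  title: string;
  summary: string;
  createdAt: string;
};

// Returns null for unknown kinds so new backend events do not break rendering.
function toTimelineCard(event: RawTaskEvent): TimelineCardSketch | null {
  const type = CARD_TYPE_BY_KIND.get(event.kind);
  if (type === undefined) {
    return null;
  }
  const words = event.kind.replace(/_/g, " ");
  return {
    id: event.event_id,
    taskId: event.task_id,
    type,
    title: words.charAt(0).toUpperCase() + words.slice(1), // placeholder title
    summary: event.text,
    createdAt: event.created_at,
  };
}
```

Keeping the mapping in one table means a renamed or newly added backend kind only requires a change in a single place, with unknown kinds skipped in the meantime.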
+
+## Empty, Loading, and Error States
+
+### No Events Yet
+
+Show a task created card and a running placeholder:
+
+"Beaver is preparing the first step."
+
+### Waiting on Tool
+
+Show the active tool call card with a spinner and elapsed time.
+
+### Waiting on Agent
+
+Show the active agent card with its assigned task and current status.
+
+### Failed Tool
+
+Show an error card with a concise reason and whether Beaver is retrying or changing approach.
+
+### Lost Connection
+
+Keep existing cards visible and show a small reconnecting indicator. If reconnection fails, fall back to polling.
+
+## Acceptance Flow
+
+The final result card is the primary acceptance surface.
+
+Actions:
+
+- Accept: closes the task and can trigger skill learning.
+- Needs revision: requires a comment, appends a new revision card, and starts another attempt in the same timeline.
+- Abandon: closes the task as abandoned and preserves the execution history.
+
+After any acceptance action, the page should immediately update local UI state and refetch the task detail.
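
The immediate local update could be modeled as a pure status transition that runs before the refetch. This is a sketch only: the underscore status values, the `AcceptanceAction` shape, and the rule that actions apply only while a result awaits acceptance are assumptions, not confirmed backend behavior.

```typescript
// Hypothetical wire values for the statuses named in the header section.
type TaskStatus =
  | "open" | "running" | "awaiting_acceptance" | "needs_revision"
  | "closed" | "abandoned" | "error" | "cancelled";

type AcceptanceAction =
  | { kind: "accept" }
  | { kind: "needs_revision"; comment: string }
  | { kind: "abandon" };

// Computes the status to show immediately, before the refetch confirms it.
function optimisticStatus(current: TaskStatus, action: AcceptanceAction): TaskStatus {
  // Assumption: acceptance actions only apply while a result awaits review.
  if (current !== "awaiting_acceptance") {
    return current;
  }
  switch (action.kind) {
    case "accept":
      return "closed";
    case "abandon":
      return "abandoned";
    case "needs_revision":
      // The spec requires a comment for revision requests.
      if (action.comment.trim() === "") {
        throw new Error("A comment is required when requesting a revision.");
      }
      return "needs_revision";
  }
}
```

The refetched task detail remains the source of truth; if the server reports a different status, the page would simply replace the optimistic value.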
+
+## V1 Scope
+
+V1 includes:
+
+- persistent task header
+- live chronological card feed
+- skill cards
+- tool call and result cards
+- agent team card
+- sub-agent progress cards
+- artifact cards
+- final result and acceptance card
+- WebSocket-first updates with polling fallback
+- collapsed raw details
+
+V1 excludes:
+
+- full administrator audit mode
+- memory retrieval graph visualization
+- raw provider request/response viewer
+- advanced event payload debugger
+- editable task graph
+
+## Implementation Notes
+
+The existing `tasks/[taskId]/page.tsx` already has useful pieces, but the main hierarchy should shift from phase groups and selected node detail to a timeline-first experience.
+
+Likely frontend modules:
+
+- `TaskLiveHeader`
+- `TaskTimeline`
+- `TaskTimelineCard`
+- `TaskSideRail`
+- `TaskAcceptanceCard`
+- `buildTaskTimelineCards`
+
+Likely backend work:
+
+- emit explicit process events for skill selection and tool calls
+- include user-facing text summaries in event metadata
+- ensure task detail reconstruction uses persisted events
+- expose enough run and actor metadata for agent team rendering
+
+## Self-Review
+
+- No placeholders remain.
+- The design is scoped to ordinary-user task detail, not admin audit.
+- Realtime requirements distinguish live updates from expandable heavy details.
+- Backend event requirements are explicit enough for frontend implementation.
+- V1 scope avoids memory graph and debug payload work.