beaver_project/docs/superpowers/specs/2026-05-26-task-detail-live-execution-design.md

# Task Detail Live Execution Design
## Purpose
Task detail should be a live execution surface for ordinary users. It should answer "what is Beaver doing now?", "what has already happened?", "what changed because of a tool or agent result?", and "what can I inspect or accept?" without forcing the user to wait for a final answer.
This page is not primarily a developer audit view. It should expose enough execution detail to create confidence, while keeping raw payloads, long tool output, and debug metadata behind progressive disclosure.
## User Experience Principles
- Show progress as a chronological card feed that grows while the task runs.
- Prefer user-facing explanations over raw internal event names.
- Show skill selection, tool usage, tool result, agent team activity, artifacts, and final result as first-class cards.
- Do not expose hidden chain-of-thought. Use brief action summaries such as "Beaver found the relevant files and will now inspect the API response shape."
- Keep the user oriented with a persistent task header and clear current status.
- Stop live updates once the task reaches a terminal state, while still allowing manual refresh.
## Page Layout
### Persistent Header
The top header remains visible while scrolling and contains:
- task title
- task status: open, running, awaiting acceptance, needs revision, closed, abandoned, error, or cancelled
- current stage label
- elapsed time
- compact progress summary
- link back to task list
- link to source conversation
- acceptance entry point when a run is ready for review
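The elapsed-time field in the header could be rendered with a small pure helper like the one below. This is a minimal sketch: `formatElapsed` and its `m:ss` / `h:mm:ss` output format are assumptions made for illustration, not part of the existing code.

```ts
// Hypothetical helper: formats the time between two epoch-millisecond
// timestamps as "m:ss", or "h:mm:ss" once the task has run for an hour.
function formatElapsed(startedAtMs: number, nowMs: number): string {
  // Clamp to zero so clock skew never produces a negative duration.
  const totalSeconds = Math.max(0, Math.floor((nowMs - startedAtMs) / 1000));
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (n: number): string => String(n).padStart(2, "0");
  return hours > 0
    ? `${hours}:${pad(minutes)}:${pad(seconds)}`
    : `${minutes}:${pad(seconds)}`;
}
```

Taking `nowMs` as a parameter keeps the helper deterministic; the header component can pass `Date.now()` on each tick and stop ticking once the task is terminal.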
### Main Timeline
The main column is a chronological card feed. Cards append as execution events arrive.
Expected card sequence:
1. task created
2. planning started or completed
3. skill selected
4. tool call started
5. tool call finished
6. next step
7. agent team created
8. sub-agent started
9. sub-agent progress
10. agent handoff
11. sub-agent finished
12. artifact created
13. result ready
14. acceptance recorded
Cards should visually appear in order and keep enough prior context visible so the page feels like a live work log rather than a static report.
### Side Rail
The side rail contains compact, always-accessible context:
- agent team map
- currently active agent or tool
- artifacts list
- latest warning or blocked state
- acceptance state
On small screens, the side rail collapses below the header or into tabs.
## Card Types
### Task Created Card
Shows that Beaver recognized the user message as a task.
Fields:
- task goal
- source session
- created time
- initial status
### Plan Card
Shows the execution approach.
Fields:
- mode: single agent or agent team
- planned stages
- attempt index
- strategy summary
### Skill Card
Shows which skill Beaver selected and why it matters.
Fields:
- skill name
- skill version if available
- user-facing reason
- capabilities or method guidance summary
If multiple skills are selected, render one grouped card with individual rows.
### Tool Call Card
Shows that Beaver is using a tool.
Fields:
- tool name
- action summary
- actor name
- status: running, done, failed
- started time
- duration if completed
Raw tool arguments are hidden by default.
### Tool Result Card
Shows what the tool found or produced.
Fields:
- success or failure
- result summary
- error message if any
- links to artifact or output
- expandable raw result
### Next Step Card
Shows Beaver's next user-visible action after interpreting a result.
Fields:
- short action explanation
- related prior card or run
- expected next event type when known
This card must not contain private reasoning traces.
### Agent Team Card
Shows that Beaver created a multi-agent team.
Fields:
- team strategy
- agent count
- dependency shape
- agent names and assigned tasks
### Sub-Agent Card
Shows progress from an individual agent.
Fields:
- agent name
- assigned task
- current status
- progress text
- latest output summary
### Agent Handoff Card
Shows interaction between agents.
Fields:
- source agent
- target agent
- handoff reason
- summary of transferred result
### Artifact Card
Shows an output created during execution.
Fields:
- artifact title
- artifact type
- source agent or run
- created time
- open or download action
- summary or preview where safe
### Error or Blocked Card
Shows that execution hit a problem.
Fields:
- problem summary
- affected stage or tool
- whether Beaver can continue automatically
- action required from user if any
### Final Result Card
Shows the result that the user can review.
Fields:
- final answer or result summary
- important artifacts
- validation or evidence status when available
- accept, revise, and abandon actions
## Realtime Behavior
### Live Updates
The page should subscribe to task-related process events while the task is active. The following updates should append or update cards in real time:
- skill selected
- tool call started
- tool call finished
- agent team created
- sub-agent started
- sub-agent progress
- sub-agent finished
- agent handoff
- artifact created
- task result ready
- task error or blocked state
- acceptance recorded
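The "append or update" behavior can be sketched as an upsert keyed by card id. This is an illustrative sketch only: `FeedCard` is a reduced, hypothetical card shape, `upsertCard` is a hypothetical name, and the sort assumes `createdAt` values are ISO 8601 strings in a single timezone format so that string order matches time order.

```ts
// Hypothetical reduced card shape, just enough to show the upsert.
type FeedCard = {
  id: string;
  createdAt: string; // assumed ISO 8601, same format for every card
  status?: string;
  summary?: string;
};

// Returns a new feed with `incoming` appended, or merged into the
// existing card that has the same id.
function upsertCard(feed: FeedCard[], incoming: FeedCard): FeedCard[] {
  const index = feed.findIndex((card) => card.id === incoming.id);
  if (index === -1) {
    // New card: append, then keep the feed in chronological order.
    return [...feed, incoming].sort((a, b) =>
      a.createdAt.localeCompare(b.createdAt),
    );
  }
  // Existing card: merge fields in place so the card keeps its position.
  const next = feed.slice();
  next[index] = { ...feed[index], ...incoming };
  return next;
}
```

Returning a new array instead of mutating the old one fits state containers that detect changes by reference, which is a common pattern in React-style frontends.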
### Initial Load
On page load, call `GET /api/tasks/{task_id}` and hydrate:
- task metadata
- lifecycle events
- process runs
- process events
- process artifacts
- readable run messages
- existing feedback
The frontend should build the initial card feed from these persisted records so a refreshed page reconstructs the same execution timeline.
### Fallback Polling
If WebSocket updates are unavailable, active tasks should poll `GET /api/tasks/{task_id}` every 3 to 5 seconds.
Polling stops when the task reaches a terminal state:
- closed
- abandoned
- cancelled
- error
Manual refresh remains available.
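A minimal sketch of the polling fallback, assuming the caller supplies a `fetchTask` function that wraps `GET /api/tasks/{task_id}`. The names `startPolling`, `isTerminal`, and `TaskSnapshot`, and the 4-second default interval, are assumptions chosen for illustration within the 3 to 5 second range.

```ts
// Terminal states listed above; polling stops once one is reached.
const TERMINAL_STATUSES = new Set(["closed", "abandoned", "cancelled", "error"]);

function isTerminal(status: string): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Hypothetical minimal response shape; the real payload carries more.
type TaskSnapshot = { status: string };

// Polls until the task is terminal. Returns a function that cancels polling.
function startPolling(
  fetchTask: () => Promise<TaskSnapshot>,
  onUpdate: (task: TaskSnapshot) => void,
  intervalMs = 4000,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    try {
      const task = await fetchTask();
      if (stopped) return; // cancelled while the request was in flight
      onUpdate(task);
      if (isTerminal(task.status)) {
        stopped = true;
        return;
      }
    } catch {
      // Transient failure: fall through and try again on the next tick.
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };

  void tick();
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Scheduling the next tick only after the previous request settles avoids overlapping requests when the API is slow. Manual refresh can call `fetchTask` directly, independent of this loop.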
### Large Content Loading
The following content should not be loaded or expanded by default:
- raw tool arguments
- full tool output
- raw process event payloads
- full transcript
- memory retrieval trace
- debug metadata
These belong behind "show details" controls or a later advanced view.
## Backend Event Contract
The existing task detail API already exposes useful primitives:
- `process_runs`
- `process_events`
- `process_artifacts`
- `runs`
- `events`
- `skill_names`
- task metadata and feedback
For a reliable user-facing timeline, backend events should become more explicit. Recommended event kinds:
- `task_created`
- `task_planned`
- `skill_selected`
- `tool_call_started`
- `tool_call_finished`
- `agent_team_created`
- `agent_started`
- `agent_progress`
- `agent_handoff`
- `agent_finished`
- `artifact_created`
- `task_result_ready`
- `task_acceptance_recorded`
- `task_error`
Each event should include:
- `event_id`
- `task_id`
- `run_id` when applicable
- `parent_run_id` when applicable
- `actor_type`
- `actor_name`
- `kind`
- `status`
- `text`
- `created_at`
- compact `metadata`
Metadata should contain structured fields for rendering, not only raw provider or tool payloads.
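The recommended contract could be typed on the frontend roughly as follows. This is a sketch under assumptions: the field types (for example, plain strings for `actor_type`, `status`, and `created_at`) are guesses, and `TaskEvent`, `TaskEventKind`, and `isTaskEventKind` are hypothetical names.

```ts
// The recommended event kinds from the list above.
const EVENT_KINDS = [
  "task_created",
  "task_planned",
  "skill_selected",
  "tool_call_started",
  "tool_call_finished",
  "agent_team_created",
  "agent_started",
  "agent_progress",
  "agent_handoff",
  "agent_finished",
  "artifact_created",
  "task_result_ready",
  "task_acceptance_recorded",
  "task_error",
] as const;

type TaskEventKind = (typeof EVENT_KINDS)[number];

// Field names follow the spec; the field types are assumptions.
type TaskEvent = {
  event_id: string;
  task_id: string;
  run_id?: string | null;
  parent_run_id?: string | null;
  actor_type: string;
  actor_name: string;
  kind: TaskEventKind;
  status: string;
  text: string;
  created_at: string;
  metadata: Record<string, unknown>;
};

// Runtime guard for payloads that arrive as untyped JSON.
function isTaskEventKind(value: unknown): value is TaskEventKind {
  return (
    typeof value === "string" &&
    (EVENT_KINDS as readonly string[]).includes(value)
  );
}
```

Deriving `TaskEventKind` from the `EVENT_KINDS` array keeps the compile-time union and the runtime check in sync when a kind is added.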
## Frontend Rendering Model
The frontend should normalize events into a `TaskTimelineCard` view model.
Recommended fields:
```ts
type TaskTimelineCard = {
  id: string;
  taskId: string;
  runId?: string | null;
  parentRunId?: string | null;
  type:
    | 'task_created'
    | 'plan'
    | 'skill'
    | 'tool_call'
    | 'tool_result'
    | 'next_step'
    | 'agent_team'
    | 'agent_progress'
    | 'agent_handoff'
    | 'artifact'
    | 'error'
    | 'result'
    | 'acceptance';
  title: string;
  summary?: string;
  actorName?: string;
  status?: string;
  createdAt: string;
  relatedArtifactIds?: string[];
  details?: Record<string, unknown>;
};
```
Components render only this view model, so changes to backend event payloads are absorbed in the normalization step rather than in each card component.
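One possible shape for the normalization step, `buildTaskTimelineCards`, is sketched below. The mapping from event kinds to card types is an assumption (the spec does not define it), `RawEvent` and `NormalizedCard` are reduced hypothetical shapes, and deriving the title from the event kind is a placeholder for real user-facing copy.

```ts
type CardType =
  | "task_created" | "plan" | "skill" | "tool_call" | "tool_result"
  | "next_step" | "agent_team" | "agent_progress" | "agent_handoff"
  | "artifact" | "error" | "result" | "acceptance";

// Hypothetical reduced event and card shapes for this sketch.
type RawEvent = {
  event_id: string;
  task_id: string;
  run_id?: string | null;
  kind: string;
  status?: string;
  text?: string;
  actor_name?: string;
  created_at: string; // assumed ISO 8601, same format for every event
};

type NormalizedCard = {
  id: string;
  taskId: string;
  runId?: string | null;
  type: CardType;
  title: string;
  summary?: string;
  actorName?: string;
  status?: string;
  createdAt: string;
};

// Assumed mapping from backend event kinds to card types.
const KIND_TO_CARD_TYPE = new Map<string, CardType>([
  ["task_created", "task_created"],
  ["task_planned", "plan"],
  ["skill_selected", "skill"],
  ["tool_call_started", "tool_call"],
  ["tool_call_finished", "tool_result"],
  ["agent_team_created", "agent_team"],
  ["agent_started", "agent_progress"],
  ["agent_progress", "agent_progress"],
  ["agent_finished", "agent_progress"],
  ["agent_handoff", "agent_handoff"],
  ["artifact_created", "artifact"],
  ["task_result_ready", "result"],
  ["task_acceptance_recorded", "acceptance"],
  ["task_error", "error"],
]);

function buildTaskTimelineCards(events: RawEvent[]): NormalizedCard[] {
  const cards: NormalizedCard[] = [];
  for (const event of events) {
    const type = KIND_TO_CARD_TYPE.get(event.kind);
    if (type === undefined) continue; // unknown kinds are skipped, not rendered raw
    cards.push({
      id: event.event_id,
      taskId: event.task_id,
      runId: event.run_id ?? null,
      type,
      title: event.kind.replace(/_/g, " "), // placeholder title
      summary: event.text,
      actorName: event.actor_name,
      status: event.status,
      createdAt: event.created_at,
    });
  }
  // Chronological order, relying on the ISO 8601 assumption above.
  return cards.sort((a, b) => a.createdAt.localeCompare(b.createdAt));
}
```

Skipping unknown kinds means a backend that starts emitting a new event kind does not break the page; the card simply does not appear until the mapping is extended.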
## Empty, Loading, and Error States
### No Events Yet
Show a task created card and a running placeholder:
"Beaver is preparing the first step."
### Waiting on Tool
Show the active tool call card with a spinner and elapsed time.
### Waiting on Agent
Show the active agent card with its assigned task and current status.
### Failed Tool
Show an error card with a concise reason and whether Beaver is retrying or changing approach.
### Lost Connection
Keep existing cards visible and show a small reconnecting indicator. If reconnect fails, fall back to polling.
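The reconnect-then-fall-back decision can be expressed as a small pure function. This is a sketch only: the exponential backoff, the limit of five attempts, and the delay bounds are all assumptions, since the spec does not fix a retry policy, and `nextConnectionStep` is a hypothetical name.

```ts
type ConnectionPlan =
  | { mode: "websocket"; retryInMs: number }
  | { mode: "polling" };

// Decides what to do after `failedAttempts` consecutive failed reconnects.
// All default values are illustrative assumptions.
function nextConnectionStep(
  failedAttempts: number,
  maxAttempts = 5,
  baseDelayMs = 1000,
  maxDelayMs = 15000,
): ConnectionPlan {
  if (failedAttempts >= maxAttempts) {
    // Give up on the socket and switch to the polling fallback.
    return { mode: "polling" };
  }
  // Exponential backoff, capped so the user never waits too long.
  const retryInMs = Math.min(maxDelayMs, baseDelayMs * 2 ** failedAttempts);
  return { mode: "websocket", retryInMs };
}
```

While the plan is in `websocket` mode the page shows the reconnecting indicator; once it returns `polling`, the indicator can be replaced by the normal polling behavior.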
## Acceptance Flow
The final result card is the primary acceptance surface.
Actions:
- Accept: closes the task and can trigger skill learning.
- Needs revision: requires a comment, appends a new revision card, and starts another attempt in the same timeline.
- Abandon: closes the task as abandoned and preserves the execution history.
After any acceptance action, the page should immediately update local UI state and refetch the task detail.
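The three actions and the comment requirement can be sketched as a pure validation step that runs before the request is sent. The names `AcceptanceAction`, `AcceptanceOutcome`, and `applyAcceptance` are hypothetical, and the resulting status strings reuse the header status labels as an assumption about how the backend names them.

```ts
type AcceptanceAction =
  | { kind: "accept" }
  | { kind: "needs_revision"; comment: string }
  | { kind: "abandon" };

type AcceptanceOutcome =
  | { ok: true; nextStatus: string; startsNewAttempt: boolean }
  | { ok: false; reason: string };

// Validates an action and describes the optimistic local state change.
function applyAcceptance(action: AcceptanceAction): AcceptanceOutcome {
  switch (action.kind) {
    case "accept":
      return { ok: true, nextStatus: "closed", startsNewAttempt: false };
    case "needs_revision":
      // A revision request without a comment is rejected locally.
      if (action.comment.trim() === "") {
        return { ok: false, reason: "A comment is required to request a revision." };
      }
      return { ok: true, nextStatus: "needs revision", startsNewAttempt: true };
    case "abandon":
      return { ok: true, nextStatus: "abandoned", startsNewAttempt: false };
  }
}
```

The returned `nextStatus` is only the optimistic local update; the refetch of the task detail remains the source of truth.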
## V1 Scope
V1 includes:
- persistent task header
- live chronological card feed
- skill cards
- tool call and result cards
- agent team card
- sub-agent progress cards
- artifact cards
- final result and acceptance card
- WebSocket-first updates with polling fallback
- collapsed raw details
V1 excludes:
- full administrator audit mode
- memory retrieval graph visualization
- raw provider request/response viewer
- advanced event payload debugger
- editable task graph
## Implementation Notes
The existing `tasks/[taskId]/page.tsx` already has useful pieces, but the main hierarchy should shift from phase groups and selected node detail to a timeline-first experience.
Likely frontend modules:
- `TaskLiveHeader`
- `TaskTimeline`
- `TaskTimelineCard`
- `TaskSideRail`
- `TaskAcceptanceCard`
- `buildTaskTimelineCards`
Likely backend work:
- emit explicit process events for skill selection and tool calls
- include user-facing text summaries in each event's `text` field, with structured rendering fields in `metadata`
- ensure task detail reconstruction uses persisted events
- expose enough run and actor metadata for agent team rendering
## Self-Review
- No placeholders remain.
- The design is scoped to ordinary-user task detail, not admin audit.
- Realtime requirements distinguish live updates from expandable heavy details.
- Backend event requirements are explicit enough for frontend implementation.
- V1 scope avoids memory graph and debug payload work.