beaver_project/docs/superpowers/specs/2026-05-26-task-detail-live-execution-design.md

# Task Detail Live Execution Design

## Purpose

Task detail should be a live execution surface for ordinary users. It should answer "what is Beaver doing now?", "what has already happened?", "what changed because of a tool or agent result?", and "what can I inspect or accept?" without forcing the user to wait for a final answer.

This page is not primarily a developer audit view. It should expose enough execution detail to create confidence, while keeping raw payloads, long tool output, and debug metadata behind progressive disclosure.

## User Experience Principles

- Show progress as a chronological card feed that grows while the task runs.
- Prefer user-facing explanations over raw internal event names.
- Show skill selection, tool usage, tool result, agent team activity, artifacts, and final result as first-class cards.
- Do not expose hidden chain-of-thought. Use brief action summaries such as "Beaver found the relevant files and will now inspect the API response shape."
- Keep the user oriented with a persistent task header and clear current status.
- Stop live updates once the task reaches a terminal state, while still allowing manual refresh.

## Page Layout

### Persistent Header

The top header remains visible while scrolling and contains:

- task title
- task status: open, running, awaiting acceptance, needs revision, closed, abandoned, error, or cancelled
- current stage label
- elapsed time
- compact progress summary
- link back to task list
- link to source conversation
- acceptance entry point when a run is ready for review

### Main Timeline

The main column is a chronological card feed. Cards append as execution events arrive.

Typical card sequence (a task may skip steps or repeat them, for example with several tool calls):

1. task created
2. planning started or completed
3. skill selected
4. tool call started
5. tool call finished
6. next step announced
7. agent team created
8. sub-agent started
9. sub-agent progress
10. agent handoff
11. sub-agent finished
12. artifact created
13. result ready
14. acceptance recorded

Cards should appear in chronological order, and earlier cards should stay visible so the page reads as a live work log rather than a static report.

### Side Rail

The side rail contains compact, always-accessible context:

- agent team map
- currently active agent or tool
- artifacts list
- latest warning or blocked state
- acceptance state

On small screens, the side rail collapses below the header or into tabs.

## Card Types

### Task Created Card

Shows that Beaver recognized the user message as a task.

Fields:

- task goal
- source session
- created time
- initial status

### Plan Card

Shows the execution approach.

Fields:

- mode: single agent or agent team
- planned stages
- attempt index
- strategy summary

### Skill Card

Shows which skill Beaver selected and why it matters.

Fields:

- skill name
- skill version if available
- user-facing reason
- capabilities or method guidance summary

If multiple skills are selected, render one grouped card with individual rows.
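
The grouping rule above can be sketched as a small pure function. This is an illustrative sketch, not existing code: the names `SkillSelection`, `SkillCardView`, and `buildSkillCard`, and the title wording, are assumptions.

```ts
// Hypothetical shapes; field names mirror the Skill Card field list.
type SkillSelection = {
  name: string;
  version?: string; // shown only if available
  reason: string; // user-facing reason
};

type SkillCardView = {
  title: string;
  rows: SkillSelection[];
};

// Collapse one or more selected skills into a single grouped card.
// Returns null when nothing was selected, so no empty card is rendered.
function buildSkillCard(skills: readonly SkillSelection[]): SkillCardView | null {
  if (skills.length === 0) return null;
  const rows = [...skills];
  const title =
    rows.length === 1
      ? `Skill selected: ${rows[0].name}`
      : `${rows.length} skills selected`;
  return { title, rows };
}
```

A single skill keeps its name in the card title, while several skills share one card with one row each.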

### Tool Call Card

Shows that Beaver is using a tool.

Fields:

- tool name
- action summary
- actor name
- status: running, done, or failed
- started time
- duration if completed

Raw tool arguments are hidden by default.
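
The "duration if completed" field could be derived from the start and finish timestamps. The sketch below assumes ISO 8601 timestamp strings. The helper names `formatDuration` and `toolCallDuration` and the display format are hypothetical.

```ts
// Format a millisecond span as "4s" or "1m 05s". Negative spans clamp to zero.
function formatDuration(ms: number): string {
  const totalSeconds = Math.max(0, Math.floor(ms / 1000));
  const minutes = Math.floor(totalSeconds / 60);
  const seconds = totalSeconds % 60;
  return minutes > 0
    ? `${minutes}m ${String(seconds).padStart(2, '0')}s`
    : `${seconds}s`;
}

// Returns null while the tool call is still running or if a timestamp
// cannot be parsed, so the card can simply omit the field.
function toolCallDuration(startedAt: string, finishedAt?: string | null): string | null {
  if (!finishedAt) return null;
  const start = Date.parse(startedAt);
  const end = Date.parse(finishedAt);
  if (Number.isNaN(start) || Number.isNaN(end)) return null;
  return formatDuration(end - start);
}
```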

### Tool Result Card

Shows what the tool found or produced.

Fields:

- success or failure
- result summary
- error message if any
- links to artifact or output
- expandable raw result

### Next Step Card

Shows Beaver's next user-visible action after interpreting a result.

Fields:

- short action explanation
- related prior card or run
- expected next event type when known

This card must not contain private reasoning traces.

### Agent Team Card

Shows that Beaver created a multi-agent team.

Fields:

- team strategy
- agent count
- dependency shape
- agent names and assigned tasks

### Sub-Agent Card

Shows progress from an individual agent.

Fields:

- agent name
- assigned task
- current status
- progress text
- latest output summary

### Agent Handoff Card

Shows interaction between agents.

Fields:

- source agent
- target agent
- handoff reason
- summary of transferred result

### Artifact Card

Shows an output created during execution.

Fields:

- artifact title
- artifact type
- source agent or run
- created time
- open or download action
- summary or preview where safe

### Error or Blocked Card

Shows that execution hit a problem.

Fields:

- problem summary
- affected stage or tool
- whether Beaver can continue automatically
- action required from user if any

### Final Result Card

Shows the result that the user can review.

Fields:

- final answer or result summary
- important artifacts
- validation or evidence status when available
- accept, revise, and abandon actions

## Realtime Behavior

### Live Updates

The page should subscribe to task-related process events while the task is active. Each of the following events should append a new card or update an existing one in real time:

- skill selected
- tool call started
- tool call finished
- agent team created
- sub-agent started
- sub-agent progress
- sub-agent finished
- agent handoff
- artifact created
- task result ready
- task error or blocked state
- acceptance recorded
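
One way to implement "append or update" is a pure function keyed by card id. This is a minimal sketch: it assumes each card has a stable `id`, so a later event for the same card (for example a tool call moving from running to done) replaces it in place. The names `LiveCard` and `applyCardUpdate` are illustrative.

```ts
// Reduced card shape for this sketch only.
type LiveCard = {
  id: string;
  status?: string;
  summary?: string;
  createdAt: string;
};

// Returns a new array: the incoming card is merged over an existing card
// with the same id (keeping its position), or appended if the id is new.
function applyCardUpdate(cards: readonly LiveCard[], incoming: LiveCard): LiveCard[] {
  let found = false;
  const next = cards.map((card) => {
    if (card.id !== incoming.id) return card;
    found = true;
    return { ...card, ...incoming };
  });
  return found ? next : [...next, incoming];
}
```

Because the input array is never mutated, the function fits a reducer or state-setter pattern, and replaying the same event twice leaves the feed unchanged.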

### Initial Load

On page load, call `GET /api/tasks/{task_id}` and hydrate:

- task metadata
- lifecycle events
- process runs
- process events
- process artifacts
- readable run messages
- existing feedback

The frontend should build the initial card feed from these persisted records so a refreshed page reconstructs the same execution timeline.

### Fallback Polling

If WebSocket updates are unavailable, the page should poll `GET /api/tasks/{task_id}` every 3 to 5 seconds while the task is active.

Polling stops when the task reaches a terminal state:

- closed
- abandoned
- cancelled
- error

Manual refresh remains available.
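
The polling decision can be kept as a pure function that the page consults after each response. In this sketch the status string values (for example `awaiting_acceptance`) are assumed spellings of the statuses listed in this document, and `nextPollDelayMs` and the 4000 ms default are hypothetical.

```ts
// Assumed wire values for the statuses named in the Persistent Header section.
type TaskStatus =
  | 'open'
  | 'running'
  | 'awaiting_acceptance'
  | 'needs_revision'
  | 'closed'
  | 'abandoned'
  | 'error'
  | 'cancelled';

const TERMINAL_STATUSES: ReadonlySet<TaskStatus> = new Set<TaskStatus>([
  'closed',
  'abandoned',
  'cancelled',
  'error',
]);

function isTerminal(status: TaskStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Returns the delay before the next poll, or null when no poll should run.
function nextPollDelayMs(
  status: TaskStatus,
  socketConnected: boolean,
  intervalMs = 4000, // within the 3 to 5 second range given above
): number | null {
  if (isTerminal(status)) return null; // terminal state: stop polling
  if (socketConnected) return null; // live updates are flowing
  return intervalMs;
}
```

A caller would pass a non-null result to its timer of choice and re-evaluate after every response. Manual refresh bypasses this check entirely.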

### Large Content Loading

The following content should not be loaded or expanded by default:

- raw tool arguments
- full tool output
- raw process event payloads
- full transcript
- memory retrieval trace
- debug metadata

These belong behind "show details" controls or a later advanced view.
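
A possible way to enforce this on the frontend is to split event metadata into fields rendered with the card and keys that are only listed behind a "show details" control. The key names in `COLLAPSED_KEYS` are hypothetical, since this document does not define the metadata schema, and `splitDetails` is an illustrative name.

```ts
// Hypothetical metadata keys for heavy content.
const COLLAPSED_KEYS: ReadonlySet<string> = new Set([
  'raw_arguments',
  'raw_output',
  'raw_payload',
  'transcript',
  'memory_trace',
  'debug',
]);

type SplitDetails = {
  visible: Record<string, unknown>; // rendered with the card
  collapsed: string[]; // key names offered behind "show details"
};

// Heavy values are dropped from the visible set; only their key names are
// kept so the UI knows which details can be requested on demand.
function splitDetails(metadata: Record<string, unknown>): SplitDetails {
  const visible: Record<string, unknown> = {};
  const collapsed: string[] = [];
  for (const [key, value] of Object.entries(metadata)) {
    if (COLLAPSED_KEYS.has(key)) collapsed.push(key);
    else visible[key] = value;
  }
  return { visible, collapsed };
}
```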

## Backend Event Contract

The existing task detail API already exposes useful primitives:

- `process_runs`
- `process_events`
- `process_artifacts`
- `runs`
- `events`
- `skill_names`
- task metadata and feedback

For a reliable user-facing timeline, backend events should become more explicit. Recommended event kinds:

- `task_created`
- `task_planned`
- `skill_selected`
- `tool_call_started`
- `tool_call_finished`
- `agent_team_created`
- `agent_started`
- `agent_progress`
- `agent_handoff`
- `agent_finished`
- `artifact_created`
- `task_result_ready`
- `task_acceptance_recorded`
- `task_error`

Each event should include:

- `event_id`
- `task_id`
- `run_id` when applicable
- `parent_run_id` when applicable
- `actor_type`
- `actor_name`
- `kind`
- `status`
- `text`
- `created_at`
- compact `metadata`

Metadata should contain structured fields for rendering, not only raw provider or tool payloads.
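
The field list above could be expressed as a TypeScript type with a runtime guard on the consuming side. This is a sketch under assumptions: the names `TaskProcessEvent` and `isTaskProcessEvent` are illustrative, `created_at` is assumed to be an ISO 8601 string, and the allowed values of `actor_type` and `status` are left open because this document does not list them.

```ts
// Event kinds recommended in this section.
const EVENT_KINDS = [
  'task_created',
  'task_planned',
  'skill_selected',
  'tool_call_started',
  'tool_call_finished',
  'agent_team_created',
  'agent_started',
  'agent_progress',
  'agent_handoff',
  'agent_finished',
  'artifact_created',
  'task_result_ready',
  'task_acceptance_recorded',
  'task_error',
] as const;

type TaskEventKind = (typeof EVENT_KINDS)[number];

// Hypothetical type name; field names follow the list above.
type TaskProcessEvent = {
  event_id: string;
  task_id: string;
  run_id?: string | null;
  parent_run_id?: string | null;
  actor_type: string; // allowed values are not specified here
  actor_name: string;
  kind: TaskEventKind;
  status: string;
  text: string;
  created_at: string; // assumed ISO 8601
  metadata: Record<string, unknown>;
};

// Narrow an untrusted payload before it reaches the timeline.
function isTaskProcessEvent(value: unknown): value is TaskProcessEvent {
  if (typeof value !== 'object' || value === null) return false;
  const v = value as Record<string, unknown>;
  const requiredStrings = [
    'event_id', 'task_id', 'actor_type', 'actor_name', 'status', 'text', 'created_at',
  ];
  if (!requiredStrings.every((key) => typeof v[key] === 'string')) return false;
  const kind = v['kind'];
  if (typeof kind !== 'string') return false;
  if (!(EVENT_KINDS as readonly string[]).includes(kind)) return false;
  const metadata = v['metadata'];
  return typeof metadata === 'object' && metadata !== null;
}
```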

## Frontend Rendering Model

The frontend should normalize events into a `TaskTimelineCard` view model.

Recommended fields:

```ts
type TaskTimelineCard = {
  id: string;
  taskId: string;
  runId?: string | null;
  parentRunId?: string | null;
  type:
    | 'task_created'
    | 'plan'
    | 'skill'
    | 'tool_call'
    | 'tool_result'
    | 'next_step'
    | 'agent_team'
    | 'agent_progress'
    | 'agent_handoff'
    | 'artifact'
    | 'error'
    | 'result'
    | 'acceptance';
  title: string;
  summary?: string;
  actorName?: string;
  status?: string;
  createdAt: string;
  relatedArtifactIds?: string[];
  details?: Record<string, unknown>;
};
```

Components render only this view model, so changes to backend event payloads are absorbed in the normalization step rather than in the UI.
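
A minimal sketch of the normalization step, using the `buildTaskTimelineCards` name from the Implementation Notes. The mapping from event kind to card type is an assumption, as are the reduced `CardView` and `EventInput` shapes, which exist only so the sketch stands on its own. No recommended event kind maps to `next_step` yet, so that card type is not produced here.

```ts
type CardType =
  | 'task_created' | 'plan' | 'skill' | 'tool_call' | 'tool_result'
  | 'next_step' | 'agent_team' | 'agent_progress' | 'agent_handoff'
  | 'artifact' | 'error' | 'result' | 'acceptance';

// Reduced copy of the card view model for this sketch.
type CardView = {
  id: string;
  taskId: string;
  runId?: string | null;
  type: CardType;
  title: string;
  actorName?: string;
  status?: string;
  createdAt: string;
};

// Reduced event shape; field names follow the Backend Event Contract.
type EventInput = {
  event_id: string;
  task_id: string;
  run_id?: string | null;
  actor_name: string;
  kind: string;
  status: string;
  text: string;
  created_at: string;
};

// Assumed mapping from event kind to card type.
const KIND_TO_CARD = new Map<string, CardType>([
  ['task_created', 'task_created'],
  ['task_planned', 'plan'],
  ['skill_selected', 'skill'],
  ['tool_call_started', 'tool_call'],
  ['tool_call_finished', 'tool_result'],
  ['agent_team_created', 'agent_team'],
  ['agent_started', 'agent_progress'],
  ['agent_progress', 'agent_progress'],
  ['agent_finished', 'agent_progress'],
  ['agent_handoff', 'agent_handoff'],
  ['artifact_created', 'artifact'],
  ['task_result_ready', 'result'],
  ['task_acceptance_recorded', 'acceptance'],
  ['task_error', 'error'],
]);

function buildTaskTimelineCards(events: readonly EventInput[]): CardView[] {
  const cards: CardView[] = [];
  for (const event of events) {
    const type = KIND_TO_CARD.get(event.kind);
    if (type === undefined) continue; // unknown kinds are skipped, not rendered raw
    cards.push({
      id: event.event_id,
      taskId: event.task_id,
      runId: event.run_id ?? null,
      type,
      title: event.text,
      actorName: event.actor_name,
      status: event.status,
      createdAt: event.created_at,
    });
  }
  // Chronological order; Array.prototype.sort is stable, so ties keep arrival order.
  return cards.sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
}
```

Running the same function over persisted events on page load and over the live stream should yield the same feed, which is what lets a refreshed page reconstruct the timeline.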

## Empty, Loading, and Error States

### No Events Yet

Show a task created card and a running placeholder:

"Beaver is preparing the first step."

### Waiting on Tool

Show the active tool call card with a spinner and elapsed time.

### Waiting on Agent

Show the active agent card with its assigned task and current status.

### Failed Tool

Show an error card with a concise reason and whether Beaver is retrying or changing approach.

### Lost Connection

Keep existing cards visible and show a small reconnecting indicator. If reconnect fails, fall back to polling.
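
The reconnect-then-fallback behavior can be modeled as a small state machine. This is a sketch under assumptions: the state and event names are illustrative, and the limit of three reconnect attempts is a placeholder, since this document does not specify one.

```ts
type ConnectionState =
  | { mode: 'live' }
  | { mode: 'reconnecting'; attempt: number } // show the reconnecting indicator
  | { mode: 'polling' }; // fallback polling is active

type ConnectionEvent = 'socket_open' | 'socket_closed' | 'reconnect_failed';

const MAX_RECONNECT_ATTEMPTS = 3; // assumed limit

function nextConnectionState(
  state: ConnectionState,
  event: ConnectionEvent,
): ConnectionState {
  switch (event) {
    case 'socket_open':
      // Any successful connection returns the page to live updates.
      return { mode: 'live' };
    case 'socket_closed':
      // Only a live connection starts a new reconnect cycle.
      return state.mode === 'live' ? { mode: 'reconnecting', attempt: 1 } : state;
    case 'reconnect_failed':
      if (state.mode !== 'reconnecting') return state;
      return state.attempt >= MAX_RECONNECT_ATTEMPTS
        ? { mode: 'polling' }
        : { mode: 'reconnecting', attempt: state.attempt + 1 };
  }
}
```

The card feed is not part of this state, so existing cards stay visible through every transition.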

## Acceptance Flow

The final result card is the primary acceptance surface.

Actions:

- Accept: closes the task and can trigger skill learning.
- Needs revision: requires a comment, appends a new revision card, and starts another attempt in the same timeline.
- Abandon: closes the task as abandoned and preserves the execution history.

After any acceptance action, the page should immediately update local UI state and refetch the task detail.
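
The immediate local update could be computed by a pure function before the refetch confirms it. In this sketch the action and status string values are assumed spellings of the actions and statuses named in this document, and `resolveAcceptance` is a hypothetical helper.

```ts
type AcceptanceAction =
  | { kind: 'accept' }
  | { kind: 'needs_revision'; comment: string }
  | { kind: 'abandon' };

type AcceptanceOutcome =
  | { ok: true; nextStatus: 'closed' | 'needs_revision' | 'abandoned' }
  | { ok: false; reason: string };

// Validates the action and returns the status to show optimistically.
function resolveAcceptance(action: AcceptanceAction): AcceptanceOutcome {
  switch (action.kind) {
    case 'accept':
      return { ok: true, nextStatus: 'closed' };
    case 'needs_revision':
      // A revision request without a comment is rejected before any request is sent.
      if (action.comment.trim() === '') {
        return { ok: false, reason: 'A comment is required to request a revision.' };
      }
      return { ok: true, nextStatus: 'needs_revision' };
    case 'abandon':
      return { ok: true, nextStatus: 'abandoned' };
  }
}
```

The refetch remains the source of truth: if the server response disagrees with the optimistic status, the refetched task detail should win.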

## V1 Scope

V1 includes:

- persistent task header
- live chronological card feed
- skill cards
- tool call and result cards
- agent team card
- sub-agent progress cards
- artifact cards
- final result and acceptance card
- WebSocket-first updates with polling fallback
- collapsed raw details

V1 excludes:

- full administrator audit mode
- memory retrieval graph visualization
- raw provider request/response viewer
- advanced event payload debugger
- editable task graph

## Implementation Notes

The existing `tasks/[taskId]/page.tsx` already has useful pieces, but the main hierarchy should shift from phase groups and selected node detail to a timeline-first experience.

Likely frontend modules:

- `TaskLiveHeader`
- `TaskTimeline`
- `TaskTimelineCard`
- `TaskSideRail`
- `TaskAcceptanceCard`
- `buildTaskTimelineCards`

Likely backend work:

- emit explicit process events for skill selection and tool calls
- include a user-facing summary in each event's `text` field, with structured rendering fields in `metadata`
- ensure task detail reconstruction uses persisted events
- expose enough run and actor metadata for agent team rendering

## Self-Review

- No placeholders remain.
- The design is scoped to ordinary-user task detail, not admin audit.
- Realtime requirements distinguish live updates from expandable heavy details.
- Backend event requirements are explicit enough for frontend implementation.
- V1 scope avoids memory graph and debug payload work.