Task Detail Live Execution Design

Purpose

Task detail should be a live execution surface for ordinary users. It should answer "what is Beaver doing now?", "what has already happened?", "what changed because of a tool or agent result?", and "what can I inspect or accept?" without forcing the user to wait for a final answer.

This page is not primarily a developer audit view. It should expose enough execution detail to create confidence, while keeping raw payloads, long tool output, and debug metadata behind progressive disclosure.

User Experience Principles

  • Show progress as a chronological card feed that grows while the task runs.
  • Prefer user-facing explanations over raw internal event names.
  • Show skill selection, tool usage, tool result, agent team activity, artifacts, and final result as first-class cards.
  • Do not expose hidden chain-of-thought. Use brief action summaries such as "Beaver found the relevant files and will now inspect the API response shape."
  • Keep the user oriented with a persistent task header and clear current status.
  • Stop live updates once the task reaches a terminal state, while still allowing manual refresh.

Page Layout

Persistent Header

The top header remains visible while scrolling and contains:

  • task title
  • task status: open, running, awaiting acceptance, needs revision, closed, abandoned, error, or cancelled
  • current stage label
  • elapsed time
  • compact progress summary
  • link back to task list
  • link to source conversation
  • acceptance entry point when a run is ready for review

Main Timeline

The main column is a chronological card feed. Cards append as execution events arrive.

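Typical card sequence. Steps can repeat, and cards for steps that do not occur (for example, agent team cards in single-agent mode) are omitted: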

  1. task created
  2. planning started or completed
  3. skill selected
  4. tool call started
  5. tool call finished
  6. model next step
  7. agent team created
  8. sub-agent started
  9. sub-agent progress
  10. agent handoff
  11. sub-agent finished
  12. artifact created
  13. result ready
  14. acceptance recorded

Cards should appear in chronological order, and earlier cards should stay visible, so the page feels like a live work log rather than a static report.

Side Rail

The side rail contains compact, always-accessible context:

  • agent team map
  • currently active agent or tool
  • artifacts list
  • latest warning or blocked state
  • acceptance state

On small screens, the side rail collapses below the header or into tabs.

Card Types

Task Created Card

Shows that Beaver recognized the user message as a task.

Fields:

  • task goal
  • source session
  • created time
  • initial status

Plan Card

Shows the execution approach.

Fields:

  • mode: single agent or agent team
  • planned stages
  • attempt index
  • strategy summary

Skill Card

Shows which skill Beaver selected and why it matters.

Fields:

  • skill name
  • skill version if available
  • user-facing reason
  • capabilities or method guidance summary

If multiple skills are selected, render one grouped card with individual rows.
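
The grouping rule above could be sketched as follows. This is only an illustration: SkillSelection, SkillCardModel, buildSkillCard, and the title strings are hypothetical and not taken from the existing codebase.

```typescript
// Hypothetical row shape, following the Skill Card fields listed above.
interface SkillSelection {
  name: string;
  version?: string;
  reason: string; // user-facing reason
}

// Hypothetical view model: one card, one row per selected skill.
interface SkillCardModel {
  type: 'skill';
  title: string;
  rows: SkillSelection[];
}

// Builds a single grouped card, or null when no skill was selected.
function buildSkillCard(selections: readonly SkillSelection[]): SkillCardModel | null {
  const [first] = selections;
  if (first === undefined) return null;
  // Title wording is illustrative only.
  const title =
    selections.length === 1
      ? `Skill selected: ${first.name}`
      : `${selections.length} skills selected`;
  return { type: 'skill', title, rows: selections.slice() };
}
```

Returning one card with rows, rather than one card per skill, keeps the feed short when several skills are picked at once.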

Tool Call Card

Shows that Beaver is using a tool.

Fields:

  • tool name
  • action summary
  • actor name
  • status: running, done, failed
  • started time
  • duration if completed

Raw tool arguments are hidden by default.

Tool Result Card

Shows what the tool found or produced.

Fields:

  • success or failure
  • result summary
  • error message if any
  • links to artifact or output
  • expandable raw result

Next Step Card

Shows Beaver's next user-visible action after interpreting a result.

Fields:

  • short action explanation
  • related prior card or run
  • expected next event type when known

This card must not contain private reasoning traces.

Agent Team Card

Shows that Beaver created a multi-agent team.

Fields:

  • team strategy
  • agent count
  • dependency shape
  • agent names and assigned tasks

Sub-Agent Card

Shows progress from an individual agent.

Fields:

  • agent name
  • assigned task
  • current status
  • progress text
  • latest output summary

Agent Handoff Card

Shows interaction between agents.

Fields:

  • source agent
  • target agent
  • handoff reason
  • summary of transferred result

Artifact Card

Shows an output created during execution.

Fields:

  • artifact title
  • artifact type
  • source agent or run
  • created time
  • open or download action
  • summary or preview where safe

Error or Blocked Card

Shows that execution hit a problem.

Fields:

  • problem summary
  • affected stage or tool
  • whether Beaver can continue automatically
  • action required from user if any

Final Result Card

Shows the result that the user can review.

Fields:

  • final answer or result summary
  • important artifacts
  • validation or evidence status when available
  • accept, revise, and abandon actions

Realtime Behavior

Live Updates

The page should subscribe to task-related process events while the task is active. The following updates should append or update cards in real time:

  • skill selected
  • tool call started
  • tool call finished
  • agent team created
  • sub-agent started
  • sub-agent progress
  • sub-agent finished
  • agent handoff
  • artifact created
  • task result ready
  • task error or blocked state
  • acceptance recorded
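
A minimal sketch of the append-or-update behaviour, assuming each card carries a stable id and an ISO 8601 createdAt timestamp. LiveCard and upsertCard are hypothetical names used only for illustration.

```typescript
// Hypothetical minimal card shape; the real view model has more fields.
interface LiveCard {
  id: string;
  createdAt: string; // assumed ISO 8601
  status?: string;
  summary?: string;
}

// Returns a new array: merges into the card with the same id if it exists,
// otherwise inserts the new card and keeps the feed in chronological order.
function upsertCard(cards: readonly LiveCard[], incoming: LiveCard): LiveCard[] {
  const index = cards.findIndex((card) => card.id === incoming.id);
  if (index !== -1) {
    const next = cards.slice();
    next[index] = { ...cards[index], ...incoming };
    return next;
  }
  const next = [...cards, incoming];
  next.sort((a, b) => Date.parse(a.createdAt) - Date.parse(b.createdAt));
  return next;
}
```

Because the function never mutates its input, it can be used directly as a state updater in the page component.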

Initial Load

On page load, call GET /api/tasks/{task_id} and hydrate:

  • task metadata
  • lifecycle events
  • process runs
  • process events
  • process artifacts
  • readable run messages
  • existing feedback

The frontend should build the initial card feed from these persisted records so a refreshed page reconstructs the same execution timeline.
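
One way to make the reconstruction deterministic is sketched below. It assumes the response exposes lifecycle events under events and process events under process_events, and that every record has event_id and created_at. The record shape and the helper names taskDetailPath and buildInitialFeed are assumptions, not the current API contract.

```typescript
// Assumed persisted record shape; only the fields needed for ordering.
interface PersistedEvent {
  event_id: string;
  kind: string;
  created_at: string; // assumed ISO 8601
}

// Assumed subset of the GET /api/tasks/{task_id} response.
interface TaskDetailResponse {
  events: PersistedEvent[];
  process_events: PersistedEvent[];
}

// Builds the request path for the task detail endpoint.
function taskDetailPath(taskId: string): string {
  return `/api/tasks/${encodeURIComponent(taskId)}`;
}

// Merges both event lists, drops duplicates by event_id, and sorts by time.
// Ties are broken by event_id so a refresh always yields the same order.
function buildInitialFeed(detail: TaskDetailResponse): PersistedEvent[] {
  const byId = new Map<string, PersistedEvent>();
  for (const event of [...detail.events, ...detail.process_events]) {
    byId.set(event.event_id, event);
  }
  return Array.from(byId.values()).sort((a, b) => {
    const delta = Date.parse(a.created_at) - Date.parse(b.created_at);
    if (delta !== 0) return delta;
    return a.event_id < b.event_id ? -1 : a.event_id > b.event_id ? 1 : 0;
  });
}
```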

Fallback Polling

If WebSocket updates are unavailable, the page should poll GET /api/tasks/{task_id} every 3 to 5 seconds while the task is active.

Polling stops when the task reaches a terminal state:

  • closed
  • abandoned
  • cancelled
  • error

Manual refresh remains available.
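
The polling rule can be sketched roughly as below. The status identifiers are assumed spellings of the statuses named in this document, the 4000 ms interval is one arbitrary value inside the 3 to 5 second range, and fetchStatus is a hypothetical caller-supplied function that wraps the task detail request.

```typescript
// Assumed identifiers for the statuses listed in this document.
type TaskStatus =
  | 'open' | 'running' | 'awaiting_acceptance' | 'needs_revision'
  | 'closed' | 'abandoned' | 'error' | 'cancelled';

const TERMINAL_STATUSES = new Set<TaskStatus>(['closed', 'abandoned', 'cancelled', 'error']);
const POLL_INTERVAL_MS = 4000; // assumption: any value from 3000 to 5000 fits the spec

function isTerminal(status: TaskStatus): boolean {
  return TERMINAL_STATUSES.has(status);
}

// Poll only when live updates are unavailable and the task is still active.
function shouldPoll(status: TaskStatus, webSocketConnected: boolean): boolean {
  return !webSocketConnected && !isTerminal(status);
}

// Polls until a terminal status arrives; returns a function that stops polling.
function startPolling(
  fetchStatus: () => Promise<TaskStatus>,
  onStatus: (status: TaskStatus) => void,
  intervalMs: number = POLL_INTERVAL_MS,
): () => void {
  let stopped = false;
  let timer: ReturnType<typeof setTimeout> | undefined;
  const tick = async (): Promise<void> => {
    if (stopped) return;
    try {
      const status = await fetchStatus();
      if (stopped) return;
      onStatus(status);
      if (isTerminal(status)) {
        stopped = true;
        return;
      }
    } catch {
      // Treat a failed request as transient and try again on the next tick.
    }
    if (!stopped) timer = setTimeout(tick, intervalMs);
  };
  timer = setTimeout(tick, intervalMs);
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Scheduling the next tick only after the previous request settles avoids overlapping requests when the backend is slow.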

Large Content Loading

The following content should not be loaded or expanded by default:

  • raw tool arguments
  • full tool output
  • raw process event payloads
  • full transcript
  • memory retrieval trace
  • debug metadata

These belong behind "show details" controls or a later advanced view.

Backend Event Contract

The existing task detail API already exposes useful primitives:

  • process_runs
  • process_events
  • process_artifacts
  • runs
  • events
  • skill_names
  • task metadata and feedback

For a reliable user-facing timeline, backend events should become more explicit. Recommended event kinds:

  • task_created
  • task_planned
  • skill_selected
  • tool_call_started
  • tool_call_finished
  • agent_team_created
  • agent_started
  • agent_progress
  • agent_handoff
  • agent_finished
  • artifact_created
  • task_result_ready
  • task_acceptance_recorded
  • task_error

Each event should include:

  • event_id
  • task_id
  • run_id when applicable
  • parent_run_id when applicable
  • actor_type
  • actor_name
  • kind
  • status
  • text
  • created_at
  • compact metadata

Metadata should contain structured fields for rendering, not only raw provider or tool payloads.
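
As an illustration, the contract above might be typed and validated on the frontend as follows. Field names follow the list above; the field types, the metadata key, the fallback defaults, and the helper names are assumptions rather than the actual backend schema.

```typescript
// Event kinds recommended in this section.
const TASK_EVENT_KINDS = [
  'task_created', 'task_planned', 'skill_selected',
  'tool_call_started', 'tool_call_finished', 'agent_team_created',
  'agent_started', 'agent_progress', 'agent_handoff', 'agent_finished',
  'artifact_created', 'task_result_ready', 'task_acceptance_recorded', 'task_error',
] as const;

type TaskEventKind = (typeof TASK_EVENT_KINDS)[number];

// Assumed wire shape for one event.
interface TaskProcessEvent {
  event_id: string;
  task_id: string;
  run_id: string | null;
  parent_run_id: string | null;
  actor_type: string;
  actor_name: string;
  kind: TaskEventKind;
  status: string;
  text: string;
  created_at: string; // assumed ISO 8601
  metadata: Record<string, unknown>;
}

function isTaskEventKind(value: unknown): value is TaskEventKind {
  return typeof value === 'string' && (TASK_EVENT_KINDS as readonly string[]).indexOf(value) !== -1;
}

function asString(value: unknown, fallback: string): string {
  return typeof value === 'string' ? value : fallback;
}

// Returns a typed event, or null when required fields are missing or the kind is unknown.
function parseTaskProcessEvent(raw: unknown): TaskProcessEvent | null {
  if (typeof raw !== 'object' || raw === null) return null;
  const record = raw as Record<string, unknown>;
  const { event_id, task_id, kind, created_at, metadata } = record;
  if (typeof event_id !== 'string' || typeof task_id !== 'string') return null;
  if (typeof created_at !== 'string' || !isTaskEventKind(kind)) return null;
  return {
    event_id,
    task_id,
    run_id: typeof record['run_id'] === 'string' ? record['run_id'] : null,
    parent_run_id: typeof record['parent_run_id'] === 'string' ? record['parent_run_id'] : null,
    actor_type: asString(record['actor_type'], 'unknown'),
    actor_name: asString(record['actor_name'], ''),
    kind,
    status: asString(record['status'], ''),
    text: asString(record['text'], ''),
    created_at,
    metadata: typeof metadata === 'object' && metadata !== null ? (metadata as Record<string, unknown>) : {},
  };
}
```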

Frontend Rendering Model

The frontend should normalize events into a TaskTimelineCard view model.

Recommended fields:

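type TaskTimelineCard = {
  // Stable card identity, used to update a card in place.
  id: string;
  taskId: string;
  runId?: string | null;
  parentRunId?: string | null;
  // Card type; selects which card component renders this entry.
  type:
    | 'task_created'
    | 'plan'
    | 'skill'
    | 'tool_call'
    | 'tool_result'
    | 'next_step'
    | 'agent_team'
    | 'agent_progress'
    | 'agent_handoff'
    | 'artifact'
    | 'error'
    | 'result'
    | 'acceptance';
  // User-facing text; never raw internal event names.
  title: string;
  summary?: string;
  actorName?: string;
  status?: string;
  // Used for chronological ordering of the feed.
  createdAt: string;
  relatedArtifactIds?: string[];
  // Heavy or raw data, shown only behind "show details".
  details?: Record<string, unknown>;
};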

This keeps rendering stable even if backend event payloads evolve.
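
A rough sketch of the normalization step follows. The mapping from backend event kinds to card types is an assumption derived from the names in this document (for example, tool_call_finished is assumed to produce a tool_result card), and BackendEvent, TimelineCard, and toTimelineCard are hypothetical names. No recommended event kind corresponds to next_step, so this sketch does not produce that card type.

```typescript
type CardType =
  | 'task_created' | 'plan' | 'skill' | 'tool_call' | 'tool_result' | 'next_step'
  | 'agent_team' | 'agent_progress' | 'agent_handoff' | 'artifact'
  | 'error' | 'result' | 'acceptance';

// Assumed mapping from event kind to card type.
const CARD_TYPE_BY_KIND = new Map<string, CardType>([
  ['task_created', 'task_created'],
  ['task_planned', 'plan'],
  ['skill_selected', 'skill'],
  ['tool_call_started', 'tool_call'],
  ['tool_call_finished', 'tool_result'],
  ['agent_team_created', 'agent_team'],
  ['agent_started', 'agent_progress'],
  ['agent_progress', 'agent_progress'],
  ['agent_finished', 'agent_progress'],
  ['agent_handoff', 'agent_handoff'],
  ['artifact_created', 'artifact'],
  ['task_result_ready', 'result'],
  ['task_acceptance_recorded', 'acceptance'],
  ['task_error', 'error'],
]);

// Assumed subset of the backend event fields.
interface BackendEvent {
  event_id: string;
  task_id: string;
  run_id?: string | null;
  actor_name?: string;
  kind: string;
  status?: string;
  text: string;
  created_at: string;
  metadata?: Record<string, unknown>;
}

// Trimmed version of the card view model, enough for this sketch.
interface TimelineCard {
  id: string;
  taskId: string;
  runId?: string | null;
  type: CardType;
  title: string;
  actorName?: string;
  status?: string;
  createdAt: string;
  details?: Record<string, unknown>;
}

// Converts one event into a card; unknown kinds are skipped, not rendered raw.
function toTimelineCard(event: BackendEvent): TimelineCard | null {
  const type = CARD_TYPE_BY_KIND.get(event.kind);
  if (type === undefined) return null;
  return {
    id: event.event_id,
    taskId: event.task_id,
    runId: event.run_id ?? null,
    type,
    title: event.text, // assumes text is already user-facing
    actorName: event.actor_name,
    status: event.status,
    createdAt: event.created_at,
    details: event.metadata,
  };
}
```

Keeping the mapping in one table means a renamed or newly added backend kind only requires a one-line change on the frontend.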

Empty, Loading, and Error States

No Events Yet

Show a task created card and a running placeholder:

"Beaver is preparing the first step."

Waiting on Tool

Show the active tool call card with a spinner and elapsed time.

Waiting on Agent

Show the active agent card with its assigned task and current status.

Failed Tool

Show an error card with a concise reason and whether Beaver is retrying or changing approach.

Lost Connection

Keep existing cards visible and show a small reconnecting indicator. If reconnect fails, fall back to polling.
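
The reconnect-then-poll behaviour can be modelled as a small state machine, sketched below. The event names, the state shape, and the limit of three reconnect attempts are assumptions; this document only says that a failed reconnect falls back to polling.

```typescript
type Transport = 'websocket' | 'reconnecting' | 'polling';
type ConnectionEvent = 'socket_open' | 'socket_closed' | 'reconnect_failed';

interface ConnectionState {
  transport: Transport;
  failedAttempts: number;
}

const MAX_RECONNECT_ATTEMPTS = 3; // assumption, not specified in this document

// Pure transition function: cards are never touched here, only the transport.
function nextConnectionState(state: ConnectionState, event: ConnectionEvent): ConnectionState {
  switch (event) {
    case 'socket_open':
      return { transport: 'websocket', failedAttempts: 0 };
    case 'socket_closed':
      return state.transport === 'websocket'
        ? { transport: 'reconnecting', failedAttempts: 0 }
        : state;
    case 'reconnect_failed': {
      if (state.transport !== 'reconnecting') return state;
      const failedAttempts = state.failedAttempts + 1;
      return failedAttempts >= MAX_RECONNECT_ATTEMPTS
        ? { transport: 'polling', failedAttempts }
        : { transport: 'reconnecting', failedAttempts };
    }
  }
}

// The small reconnecting indicator is shown only in the reconnecting state.
function showReconnectingIndicator(state: ConnectionState): boolean {
  return state.transport === 'reconnecting';
}
```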

Acceptance Flow

The final result card is the primary acceptance surface.

Actions:

  • Accept: closes the task and can trigger skill learning.
  • Needs revision: requires a comment, appends a new revision card, and starts another attempt in the same timeline.
  • Abandon: closes the task as abandoned and preserves the execution history.

After any acceptance action, the page should immediately update local UI state and refetch the task detail.
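
The rules for the three actions might be checked on the client before the request is sent, as in this sketch. The action and status identifiers, the validateAcceptance helper, and the error message are hypothetical. The optimistic status is only a local guess and would be replaced by the refetched task detail.

```typescript
type AcceptanceAction = 'accept' | 'needs_revision' | 'abandon';
type OptimisticStatus = 'closed' | 'needs_revision' | 'abandoned';

interface AcceptanceRequest {
  action: AcceptanceAction;
  comment?: string;
}

type AcceptanceCheck =
  | { ok: true; nextStatus: OptimisticStatus }
  | { ok: false; reason: string };

// Validates the action and returns the status to show until the refetch completes.
function validateAcceptance(request: AcceptanceRequest): AcceptanceCheck {
  const comment = (request.comment ?? '').trim();
  switch (request.action) {
    case 'accept':
      return { ok: true, nextStatus: 'closed' };
    case 'abandon':
      return { ok: true, nextStatus: 'abandoned' };
    case 'needs_revision':
      // A revision request without a comment is rejected before any request is sent.
      return comment.length > 0
        ? { ok: true, nextStatus: 'needs_revision' }
        : { ok: false, reason: 'A comment is required when requesting a revision.' };
  }
}
```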

V1 Scope

V1 includes:

  • persistent task header
  • live chronological card feed
  • skill cards
  • tool call and result cards
  • agent team card
  • sub-agent progress cards
  • artifact cards
  • final result and acceptance card
  • WebSocket-first updates with polling fallback
  • collapsed raw details

V1 excludes:

  • full administrator audit mode
  • memory retrieval graph visualization
  • raw provider request/response viewer
  • advanced event payload debugger
  • editable task graph

Implementation Notes

The existing tasks/[taskId]/page.tsx already has useful pieces, but the main hierarchy should shift from phase groups and selected node detail to a timeline-first experience.

Likely frontend modules:

  • TaskLiveHeader
  • TaskTimeline
  • TaskTimelineCard
  • TaskSideRail
  • TaskAcceptanceCard
  • buildTaskTimelineCards

Likely backend work:

  • emit explicit process events for skill selection and tool calls
  • include user-facing text summaries in event metadata
  • ensure task detail reconstruction uses persisted events
  • expose enough run and actor metadata for agent team rendering

Self-Review

  • No placeholders remain.
  • The design is scoped to ordinary-user task detail, not admin audit.
  • Realtime requirements distinguish live updates from expandable heavy details.
  • Backend event requirements are explicit enough for frontend implementation.
  • V1 scope avoids memory graph and debug payload work.