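steven_li fc9fd93c36 feat: support multilingual prompt localization and UI polish

- Add a prompt_locale parameter supporting prompt localization in Simplified Chinese, Traditional Chinese, and English
- Remove the built-in agents configuration to simplify the system architecture
- Update ContextBuilder to use dynamic prompt templates instead of hard-coded content
- Pass the locale parameter through AgentLoop, the Web interface, and AgentService
- Add an output-language instruction so that user-facing content is generated in the specified language
- Extend the frontend LanguageSwitcher component to support three language options
- Improve responsive layout and text truncation in the Header and sidebar components
- Update test cases to verify prompt correctness under different locales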

2026-06-10 16:11:05 +08:00

PRD: Beaver Agent Sandbox

Date: 2026-06-09

Status: Product discovery draft for the whole Beaver product

1. Summary

Beaver Agent Sandbox is a private-deployable workspace for enterprise Agent work. It lets users move from chat to managed tasks, execute work with files and tools, track evidence, accept or revise outputs, and turn successful work into reusable skills and memory.

The first product goal is to prove that Beaver can help a pilot team complete repeatable knowledge work with more control, traceability, and reuse than chat-only AI tools.

2. Contacts

Role	Owner	Comment
Product owner	TBD	Owns positioning, roadmap, pilot metrics, research
Engineering owner	TBD	Owns platform architecture and implementation quality
Design owner	TBD	Owns workspace, task, review, admin, and onboarding UX
Deployment owner	TBD	Owns Docker deployment, routing, instance lifecycle
Security/review owner	TBD	Owns tool policy, data boundaries, connector safety
Pilot owner	TBD	Owns customer/team selection and feedback loop

3. Background

Most enterprise AI experiments start with chat. Chat is useful, but it is weak at real work:

There is no durable task lifecycle.
It is hard to see what the Agent did or why.
File and tool work is scattered.
Results are not formally accepted or rejected.
Successful workflows are not turned into reusable team capability.
Admins cannot easily control deployment, tools, memory, and connectors.

Beaver addresses this gap by acting as an Agent execution and governance layer. It combines a user workspace, task runtime, evidence timeline, file and tool operations, skill learning, scheduled work, connectors, and private multi-instance deployment.

Why now:

Teams are moving from AI demos to operational AI workflows.
Enterprise buyers need governance, not only model access.
Beaver already has enough implementation to support pilot workflows.
The next step is product packaging, validation, and operational hardening.

4. Objective

Objective

Prove Beaver can deliver trusted, repeatable Agent work for pilot teams.

Key Results

Key Result	Target
Time to first accepted task	Pilot user reaches first accepted task within first session
Accepted Agent Workflows	>=30 accepted tasks across pilot team within 30 days
Acceptance Rate	>=60% of completed task runs accepted
Evidence Coverage	>=90% of task runs show useful timeline/tool/artifact evidence
Skill Reuse	>=5 reusable skills created, >=3 reused at least twice
Deployment Repeatability	Fresh pilot deployment under 2 hours with documented steps
Critical Incidents	0 control-plane exposure, data leakage, or unintended external-write incidents
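
The ratio-based Key Results above can be checked mechanically once task runs are recorded. The following is a minimal sketch, not Beaver's actual schema: the `TaskRun` record, its fields, and the `pilot_scorecard` helper are hypothetical names, and it assumes one run per task so that accepted runs can stand in for accepted tasks.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """Hypothetical record of one completed task run (illustrative only)."""
    accepted: bool       # the reviewer accepted the result
    has_evidence: bool   # the run shows useful timeline/tool/artifact evidence


def acceptance_rate(runs: list[TaskRun]) -> float:
    """Share of completed runs that were accepted (0.0 when there are no runs)."""
    return sum(r.accepted for r in runs) / len(runs) if runs else 0.0


def evidence_coverage(runs: list[TaskRun]) -> float:
    """Share of completed runs that carry useful evidence."""
    return sum(r.has_evidence for r in runs) / len(runs) if runs else 0.0


def pilot_scorecard(runs: list[TaskRun]) -> dict[str, bool]:
    """Compare recorded runs against the targets in the Key Results table."""
    return {
        "accepted_tasks_ok": sum(r.accepted for r in runs) >= 30,
        "acceptance_rate_ok": acceptance_rate(runs) >= 0.60,
        "evidence_coverage_ok": evidence_coverage(runs) >= 0.90,
    }


if __name__ == "__main__":
    # 32 accepted runs, 18 not accepted, 4 of which lack evidence.
    runs = (
        [TaskRun(True, True)] * 32
        + [TaskRun(False, True)] * 14
        + [TaskRun(False, False)] * 4
    )
    print(pilot_scorecard(runs))
```

Skill Reuse, Deployment Repeatability, and Critical Incidents are left out because they need data that this sketch does not model.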

5. Market Segments

Primary Segment: Enterprise Teams Doing Repeatable Knowledge Work

Examples:

Project delivery teams.
Operations teams.
Internal strategy/research teams.
Technical support and engineering teams.
Customer success and sales operations teams.

Their work is a good fit when it is:

Repeated often.
Multi-step.
File-heavy.
Tool-heavy.
Subject to review or approval.
Likely to benefit from a traceable process.

Buyer Segment: AI Platform Owner / IT Leader

They need to provide AI capability without losing control over deployment, data, tools, and governance.

Admin Segment: Operator / Implementation Owner

They set up Beaver, manage model providers, monitor health, handle connectors, and support users.

Maintainer Segment: Skill Owner

They curate reusable skills and make sure published skills are safe, useful, and reviewable.

6. Value Propositions

For Workflow Teams

Beaver turns AI conversations into managed work. A request can become a task, produce artifacts, show evidence, and continue through revision until accepted.

For Platform Owners

Beaver offers a private Agent sandbox with instance boundaries, tool governance, skills, and operational controls.

For Admins

Beaver makes onboarding and operations more repeatable through the auth portal, deploy control, routing, settings, status, and logs.

For Skill Maintainers

Beaver turns accepted work into reusable skills through a candidate, draft, safety/eval, review, and publish flow.

For End Users

Beaver gives users one place to chat, upload files, run tasks, preview outputs, review results, and reuse proven methods.

7. Solution

7.1 User Experience

First-Run Experience

User registers
  -> app instance is created
  -> user configures model provider
  -> user enters Beaver workspace
  -> user starts from a workflow template or chat
  -> Beaver creates or continues a task
  -> user accepts or revises the result

Requirements:

Registration and instance provisioning must show clear progress and errors.
Provider setup must be understandable and recoverable.
If provider setup is skipped, the app must clearly explain why Agent calls cannot run.

Daily User Workspace

Primary screens:

Chat workbench.
Task list and task details.
Files.
Notifications and scheduled work.
Skills and marketplace.
Tool management.
Settings/status/logs.

Core user loop:

Ask
  -> execute
  -> inspect evidence
  -> accept/revise
  -> reuse

Admin Experience

Admin needs:

See instance health.
Configure providers.
Configure channels/connectors.
Restart safely.
Inspect logs.
Manage tools and skills.
Understand failures.

7.2 Key Features

Authentication And Instance Provisioning

Requirements:

Users register or log in through the auth portal.
Registration triggers creation of an app-instance container.
Router maps instance host to container.
Provider onboarding can configure model provider after instance creation.

Acceptance criteria:

New user can reach a working instance.
Failed provisioning shows a recoverable error.
deploy-control and authz-service are not public surfaces.
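
The host-to-container mapping and the rule that internal services stay private could be sketched as follows. This is an illustration under stated assumptions, not the router-proxy implementation: `InstanceRegistry`, its methods, and the example hostnames are hypothetical, and a real router would also need TLS, health checks, and IPv6-aware host parsing.

```python
from typing import Optional

# Services that must never be reachable through a public instance host.
INTERNAL_SERVICES = {"deploy-control", "authz-service"}


class InstanceRegistry:
    """Hypothetical in-memory map from instance host to container name."""

    def __init__(self) -> None:
        self._by_host: dict[str, str] = {}

    def register(self, host: str, container: str) -> None:
        """Record a newly provisioned instance; refuse internal services."""
        if container in INTERNAL_SERVICES:
            raise ValueError(f"{container} must not be exposed as a public surface")
        self._by_host[host.lower()] = container

    def resolve(self, host_header: str) -> Optional[str]:
        """Return the container for a request's Host header, or None if unknown."""
        # Strip an optional port (hostname or IPv4 form only) and normalize case.
        host = host_header.split(":")[0].lower()
        return self._by_host.get(host)


if __name__ == "__main__":
    registry = InstanceRegistry()
    registry.register("alice.beaver.example", "app-instance-alice")
    print(registry.resolve("alice.beaver.example:443"))
    print(registry.resolve("unknown.beaver.example"))
```

Returning `None` for an unknown host leaves the caller free to answer with a clear, recoverable error instead of forwarding the request.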

Chat Workbench

Requirements:

Users can create/select sessions.
Users can send text and attachments.
Users can see Assistant messages, task cards, Agent run progress, and acceptance controls.
Users can jump from chat to task detail.

Acceptance criteria:

User can complete one full chat-to-task-to-accept flow.
Attachments can be uploaded and used.
Current task status is visible.

Task Lifecycle

Requirements:

System can distinguish ordinary chat from task requests.
Task can be created, run, continued, revised, accepted, abandoned, or deleted.
Task detail shows timeline, runs, tools, artifacts, result, and acceptance controls.

Acceptance criteria:

Task list and detail remain useful on mobile and desktop.
Acceptance actions are persisted.
Revision feedback continues the same task context.
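
One way to reason about the lifecycle above is as a small state machine. The sketch below is an assumption for illustration: the state names, the action names, and the `apply_action` helper are hypothetical and do not describe Beaver's actual task model. "Continue" and "revise" both return the same task to the running state, which reflects the criterion that revision feedback stays in the same task context.

```python
from enum import Enum


class TaskState(Enum):
    CREATED = "created"
    RUNNING = "running"
    AWAITING_REVIEW = "awaiting_review"
    ACCEPTED = "accepted"
    ABANDONED = "abandoned"
    DELETED = "deleted"


# action -> (states the action is allowed from, resulting state)
TRANSITIONS = {
    "run": ({TaskState.CREATED}, TaskState.RUNNING),
    "complete": ({TaskState.RUNNING}, TaskState.AWAITING_REVIEW),
    "continue": ({TaskState.AWAITING_REVIEW}, TaskState.RUNNING),
    "revise": ({TaskState.AWAITING_REVIEW}, TaskState.RUNNING),
    "accept": ({TaskState.AWAITING_REVIEW}, TaskState.ACCEPTED),
    "abandon": (
        {TaskState.CREATED, TaskState.RUNNING, TaskState.AWAITING_REVIEW},
        TaskState.ABANDONED,
    ),
    # Any task that still exists can be deleted.
    "delete": (set(TaskState) - {TaskState.DELETED}, TaskState.DELETED),
}


def apply_action(state: TaskState, action: str) -> TaskState:
    """Return the next state, or raise ValueError for an invalid transition."""
    sources, target = TRANSITIONS[action]
    if state not in sources:
        raise ValueError(f"cannot {action!r} a task in state {state.value!r}")
    return target


if __name__ == "__main__":
    state = TaskState.CREATED
    for action in ["run", "complete", "revise", "complete", "accept"]:
        state = apply_action(state, action)
        print(action, "->", state.value)
```

Keeping the transitions in one table makes it straightforward to persist each acceptance action alongside the state it produced.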

Agent Team Execution

Requirements:

Complex tasks can be planned as sequence, parallel, or DAG execution.
Subtasks can bind skills or ephemeral guidance.
Main Agent synthesizes final answer from evidence.

Acceptance criteria:

Subtask results are visible and debuggable.
Failed team execution is shown without hiding partial evidence.
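
Sequence, parallel, and DAG plans can all be expressed as a dependency graph: a sequence is a chain, and a parallel plan is a graph with no edges. The sketch below uses Python's standard `graphlib` module to group subtasks into batches that could run concurrently. The `execution_batches` helper and the subtask names are hypothetical, and this is not a description of how `beaver.coordinator` schedules work.

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into ordered batches.

    `dependencies` maps each subtask to the subtasks it depends on.
    Every subtask in a batch has all of its dependencies finished,
    so the members of one batch could run in parallel.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    batches: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    plan = {
        "collect_sources": set(),
        "summarize": {"collect_sources"},
        "extract_figures": {"collect_sources"},
        "final_report": {"summarize", "extract_figures"},
    }
    for number, batch in enumerate(execution_batches(plan), start=1):
        print(number, batch)
```

Rejecting cyclic plans at `prepare()` means a malformed plan fails before any subtask runs, which leaves no partial evidence to reconcile.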

Files Workspace

Requirements:

Users can upload, create folders, browse, preview, download, and delete files.
Workspace roots stay understandable.
File operations are safe within instance boundaries.

Acceptance criteria:

Root and nested directories work.
Text/Markdown/image preview works.
Long file names do not break layout.

Tools And MCP

Requirements:

Admins can view, test, add, edit, delete, and refresh tools where supported.
Agent runtime can expose tools based on task and skill context.
Tool calls are recorded as evidence.

Acceptance criteria:

Tool detail and test flows work.
Dangerous tools are governed by policy before broad rollout.

Skills And Marketplace

Requirements:

Published skills can be listed, inspected, installed, uploaded, disabled, rolled back, or deleted where supported.
Accepted work can create skill candidates.
Candidates can become drafts.
Drafts require safety/eval/review gates before publish.
Marketplace supports discovery and install.

Acceptance criteria:

Candidate and draft flows do not reset UI state unexpectedly.
Publish requires review gates.
Published skill can be reused by later tasks.
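
The "publish requires review gates" criterion can be pictured as a guard that refuses to publish until every gate has passed. This is a minimal sketch under assumptions: the `SkillDraft` fields and the `publish` helper are hypothetical names chosen for illustration, not the `beaver.skills` API.

```python
from dataclasses import dataclass


@dataclass
class SkillDraft:
    """Hypothetical draft record with one flag per gate named in the requirements."""
    name: str
    safety_passed: bool = False
    eval_passed: bool = False
    review_approved: bool = False


def missing_gates(draft: SkillDraft) -> list[str]:
    """List the gates that have not passed yet, in a fixed order."""
    gates = {
        "safety": draft.safety_passed,
        "eval": draft.eval_passed,
        "review": draft.review_approved,
    }
    return [gate for gate, passed in gates.items() if not passed]


def publish(draft: SkillDraft) -> dict[str, str]:
    """Publish the draft only when no gate is missing."""
    missing = missing_gates(draft)
    if missing:
        raise PermissionError(f"cannot publish {draft.name!r}: missing gates {missing}")
    return {"name": draft.name, "status": "published"}


if __name__ == "__main__":
    draft = SkillDraft("weekly-status-report", safety_passed=True, eval_passed=True)
    print(missing_gates(draft))
    draft.review_approved = True
    print(publish(draft))
```

Reporting which gates are missing, instead of a bare refusal, gives the skill maintainer a concrete next step.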

Memory

Requirements:

Beaver can store long-term preferences, business knowledge, historical task learning, file/artifact memory, tool experience, and reusable workflows.
Before broad product use, users/admins need memory inspect/edit/delete/freeze controls.

Acceptance criteria for Memory Control Center MVP:

User can see what is remembered.
User can see source and last-used context.
User can edit, delete, or freeze memory.
Task detail can show when memory affected execution.
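
The inspect/edit/delete/freeze controls could look like the sketch below. It rests on an assumption that the PRD does not state: "freeze" is interpreted here as blocking automatic updates from the Agent while still allowing explicit user edits. `MemoryRecord`, `MemoryStore`, and all field names are hypothetical and do not describe `beaver.memory`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryRecord:
    """Hypothetical memory entry with the provenance the MVP criteria ask for."""
    key: str
    value: str
    source: str                         # where the memory came from, e.g. a task id
    last_used_in: Optional[str] = None  # last task that used this memory
    frozen: bool = False                # assumption: frozen blocks automatic updates


class MemoryStore:
    """In-memory stand-in for a Memory Control Center backend."""

    def __init__(self) -> None:
        self._records: dict[str, MemoryRecord] = {}

    def remember(self, record: MemoryRecord) -> None:
        self._records[record.key] = record

    def inspect(self) -> list[MemoryRecord]:
        """Let the user see everything that is remembered."""
        return list(self._records.values())

    def edit(self, key: str, value: str) -> None:
        """Explicit user edit; allowed even when the record is frozen."""
        self._records[key].value = value

    def freeze(self, key: str) -> None:
        self._records[key].frozen = True

    def delete(self, key: str) -> None:
        del self._records[key]

    def update_from_agent(self, key: str, value: str) -> bool:
        """Automatic update; returns False if the record is missing or frozen."""
        record = self._records.get(key)
        if record is None or record.frozen:
            return False
        record.value = value
        return True


if __name__ == "__main__":
    store = MemoryStore()
    store.remember(MemoryRecord("report_style", "bullet summary", source="task-12"))
    store.freeze("report_style")
    print(store.update_from_agent("report_style", "long form"))
    print(store.inspect())
```

If freezing should instead exclude a memory from use altogether, the same `frozen` flag could be checked where memory is read; which behavior is intended is among the open design questions.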

Scheduled Work And Notifications

Requirements:

Users can create scheduled jobs.
Scheduled runs can produce notifications or tasks.
Users can review, revise, or accept scheduled outputs.

Acceptance criteria:

A scheduled job can be created, toggled, run on demand, and deleted.
Scheduled output can enter normal task review flow.

Connectors

Requirements:

Beaver can connect to external systems such as Outlook and selected IM/channel connectors.
Connector status, setup, errors, and reconnect path must be visible.
External writes require clear policy and safety boundary.

Acceptance criteria:

Pilot-safe connector list is documented.
External connector callbacks route correctly in multi-instance deployment.
Failed connector auth or setup is recoverable.

Settings, Status, Logs

Requirements:

Users/admins can configure provider, Agent settings, channels, and runtime.
Status page shows current app health.
Logs help operators diagnose failures.
Restart is confirmed before execution.

Acceptance criteria:

Provider save flow works.
Runtime restart flow is protected by confirmation.
Long config values do not break UI.

7.3 Technology

Frontend:

Next.js app inside app-instance/frontend.
App shell with chat, tasks, files, skills, marketplace, tools, connectors, settings, status, logs.

Backend:

Python Beaver backend inside app-instance/backend.
Unified beaver.engine for Agent runtime.
beaver.coordinator for multi-agent execution.
beaver.services for task, cron, process, and application orchestration.
beaver.tools for built-in/MCP tool execution.
beaver.skills for skill loading, learning, review, publishing.
beaver.memory for run memory, skills memory, long-term memory foundation.
beaver.interfaces for web, MCP, channels, CLI/gateway surfaces.

Deployment:

auth-portal.
authz-service.
deploy-control.
router-proxy.
app-instance.
Docker network and per-instance mounted runtime directories.

7.4 Data And Evidence

Important product data:

Users and auth handoff.
Instance registry.
Provider configuration.
Conversations and messages.
Tasks, task runs, run events, timeline events.
Tool calls and results.
Files and artifacts.
Skill receipts, candidates, drafts, safety/eval reports, reviews, published versions.
Memory records.
Scheduled jobs and scheduled runs.
Connector state and events.

Evidence principle:

Every meaningful Agent action should become explainable later.

7.5 Assumptions

The best first customers are teams with repeatable knowledge workflows.
Task acceptance is the right primary quality signal.
Private deployment is a benefit, not a barrier, for early enterprise pilots.
Teams will value skill/memory reuse after enough accepted tasks.
Admins can operate a Docker-based deployment with a clear runbook.
Memory must be controllable before it can be trusted.

7.6 Non-Goals For First Pilot

Broad public SaaS launch.
Full multi-tenant organization management.
Fully autonomous skill publishing.
Production external writes without clear review.
Complete enterprise RBAC.
Unlimited connector support.
Perfect long-term memory automation.
Replacing human review for high-risk work.

8. Release

Release 0: Internal Demo Readiness

Scope:

Clean local deployment.
Auth portal registration/login.
Provider onboarding.
Chat-to-task demo.
Task detail evidence.
File upload/preview.
Skills and marketplace demo.
Settings/status/logs.

Exit criteria:

Demo flow works on fresh environment.
Known limitations are documented.
No critical security/deployment issue.

Release 1: Pilot Workflow Release

Scope:

2-3 packaged workflows.
Task acceptance and evidence as main flow.
Files and selected tools.
Basic scheduled workflow.
One pilot-safe connector if stable.
Skill candidate/draft/review/publish.
Deployment runbook and support checklist.

Exit criteria:

Pilot team reaches >=30 accepted tasks in 30 days.
>=5 reusable skills created.
0 critical incidents.
Deployment under 2 hours on fresh host.

Release 2: Governance And Reuse Release

Scope:

Evidence narrative.
Memory Control Center.
Skill replay/eval governance.
Admin health console.
Connector policy hardening.
Pilot scorecard.

Exit criteria:

Reviewers understand evidence.
Users can inspect and control memory.
Admins can diagnose provider/connector/runtime issues.
Skill reuse is visible in metrics.

Release 3: Expansion Release

Scope:

Team/workspace concepts if validated.
More connectors.
Audit export.
Cross-instance analytics.
Policy profiles.
Instance lifecycle automation.

Exit criteria:

Multiple teams can run without high support load.
Governance story supports enterprise buying process.

Open Questions

Is the first paying segment project teams, operations teams, engineering/support, or internal AI platform teams?
Should Beaver optimize for single-user instances first or team workspaces sooner?
Which connector is the safest and most valuable pilot connector?
What exact tool policy should apply in customer pilots?
What memory behavior should be on by default?
How much raw evidence should normal users see versus admins?
What is the backup/restore SLA for app instances?

Success Review Checklist

Can a new user get to first accepted task quickly?
Can a reviewer understand what the Agent did?
Can an admin recover from provider or connector errors?
Can a successful task become a reusable skill?
Can a pilot owner prove value with metrics?
Can security explain the deployment and tool boundaries?
