PRD: Beaver Agent Sandbox
Date: 2026-06-09
Status: Product discovery draft for whole Beaver product
1. Summary
Beaver Agent Sandbox is a private-deployable workspace for enterprise Agent work. It lets users move from chat to managed tasks, execute work with files and tools, track evidence, accept or revise outputs, and turn successful work into reusable skills and memory.
The first product goal is to prove that Beaver can help a pilot team complete repeatable knowledge work with more control, traceability, and reuse than chat-only AI tools.
2. Contacts
| Role | Owner | Comment |
|---|---|---|
| Product owner | TBD | Owns positioning, roadmap, pilot metrics, research |
| Engineering owner | TBD | Owns platform architecture and implementation quality |
| Design owner | TBD | Owns workspace, task, review, admin, and onboarding UX |
| Deployment owner | TBD | Owns Docker deployment, routing, instance lifecycle |
| Security/review owner | TBD | Owns tool policy, data boundaries, connector safety |
| Pilot owner | TBD | Owns customer/team selection and feedback loop |
3. Background
Most enterprise AI experiments start with chat. Chat is useful, but it is weak at real work:
- There is no durable task lifecycle.
- It is hard to see what happened.
- File and tool work is scattered.
- Results are not formally accepted or rejected.
- Successful workflows are not turned into reusable team capability.
- Admins cannot easily control deployment, tools, memory, and connectors.
Beaver addresses this gap by acting as an Agent execution and governance layer. It combines a user workspace, task runtime, evidence timeline, file and tool operations, skill learning, scheduled work, connectors, and private multi-instance deployment.
Why now:
- Teams are moving from AI demos to operational AI workflows.
- Enterprise buyers need governance, not only model access.
- Beaver already has enough implementation to support pilot workflows.
- The next step is product packaging, validation, and operational hardening.
4. Objective
Objective
Prove Beaver can deliver trusted, repeatable Agent work for pilot teams.
Key Results
| Key Result | Target |
|---|---|
| Time to first accepted task | Pilot user reaches first accepted task within the first session |
| Accepted Agent Workflows | >=30 accepted tasks across pilot team within 30 days |
| Acceptance Rate | >=60% of completed task runs accepted |
| Evidence Coverage | >=90% of task runs show useful timeline/tool/artifact evidence |
| Skill Reuse | >=5 reusable skills created, >=3 reused at least twice |
| Deployment Repeatability | Fresh pilot deployment under 2 hours with documented steps |
| Critical Incidents | 0 control-plane exposure, data leakage, or unintended external-write incidents |
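A minimal sketch of how two of these key results might be computed from task-run records. The `TaskRun` record and its fields are hypothetical and do not come from the Beaver codebase. The sketch also treats "has at least one recorded event" as a stand-in for "useful evidence", which in practice would need a reviewer's judgment.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    """Hypothetical task-run record; field names are illustrative only."""
    status: str          # e.g. "completed", "failed", "running"
    accepted: bool       # True once a user accepts the result
    evidence_count: int  # number of timeline/tool/artifact events recorded


def acceptance_rate(runs: list[TaskRun]) -> float:
    """Share of completed runs that were accepted (target: >= 0.60)."""
    completed = [r for r in runs if r.status == "completed"]
    if not completed:
        return 0.0
    return sum(r.accepted for r in completed) / len(completed)


def evidence_coverage(runs: list[TaskRun]) -> float:
    """Share of all runs with at least one evidence event (target: >= 0.90)."""
    if not runs:
        return 0.0
    return sum(r.evidence_count > 0 for r in runs) / len(runs)


runs = [
    TaskRun("completed", True, 4),
    TaskRun("completed", False, 2),
    TaskRun("failed", False, 0),
    TaskRun("completed", True, 1),
]
print(f"acceptance rate: {acceptance_rate(runs):.2f}")
print(f"evidence coverage: {evidence_coverage(runs):.2f}")
```

In this sample, two of three completed runs are accepted and three of four runs carry evidence, so the acceptance target would be met and the evidence target would not.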
5. Market Segments
Primary Segment: Enterprise Teams Doing Repeatable Knowledge Work
Examples:
- Project delivery teams.
- Operations teams.
- Internal strategy/research teams.
- Technical support and engineering teams.
- Customer success and sales operations teams.
Their work is a good fit when it is:
- Repeated often.
- Multi-step.
- File-heavy.
- Tool-heavy.
- Subject to review or approval.
- Improved by a traceable process.
Buyer Segment: AI Platform Owner / IT Leader
They need to provide AI capability without losing control over deployment, data, tools, and governance.
Admin Segment: Operator / Implementation Owner
They set up Beaver, manage model providers, monitor health, handle connectors, and support users.
Maintainer Segment: Skill Owner
They curate reusable skills and make sure published skills are safe, useful, and reviewable.
6. Value Propositions
For Workflow Teams
Beaver turns AI conversations into managed work. A request can become a task, produce artifacts, show evidence, and continue through revision until accepted.
For Platform Owners
Beaver offers a private Agent sandbox with instance boundaries, tool governance, skills, and operational controls.
For Admins
Beaver makes onboarding and operations more repeatable through the auth portal, deploy control, routing, settings, status, and logs.
For Skill Maintainers
Beaver turns accepted work into reusable skills through candidate, draft, safety/eval, review, and publish flow.
For End Users
Beaver gives one place to chat, upload files, run tasks, preview outputs, review results, and reuse proven methods.
7. Solution
7.1 User Experience
First-Run Experience
User registers
-> app instance is created
-> user configures model provider
-> user enters Beaver workspace
-> user starts from a workflow template or chat
-> Beaver creates or continues a task
-> user accepts or revises the result
Requirements:
- Registration and instance provisioning must show clear progress and errors.
- Provider setup must be understandable and recoverable.
- If provider setup is skipped, the app must clearly explain why Agent calls cannot run.
Daily User Workspace
Primary screens:
- Chat workbench.
- Task list and task details.
- Files.
- Notifications and scheduled work.
- Skills and marketplace.
- Tool management.
- Settings/status/logs.
Core user loop:
Ask
-> execute
-> inspect evidence
-> accept/revise
-> reuse
Admin Experience
Admin needs:
- See instance health.
- Configure providers.
- Configure channels/connectors.
- Restart safely.
- Inspect logs.
- Manage tools and skills.
- Understand failures.
7.2 Key Features
Authentication And Instance Provisioning
Requirements:
- Users register or log in through auth portal.
- Registration triggers an app-instance container.
- Router maps instance host to container.
- Provider onboarding can configure model provider after instance creation.
Acceptance criteria:
- New user can reach a working instance.
- Failed provisioning shows a recoverable error.
- `deploy-control` and `authz-service` are not public surfaces.
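The routing requirement and the "not public" criterion can be illustrated with a small lookup sketch. The registry contents, host names, and ports below are assumptions made up for illustration. The actual `router-proxy` may resolve instances differently.

```python
from typing import Optional

# Hypothetical registry: public instance host -> container address.
INSTANCE_REGISTRY = {
    "alice.beaver.example": "app-instance-alice:3000",
    "bob.beaver.example": "app-instance-bob:3000",
}

# Internal services that must never be reachable through the public router.
INTERNAL_ONLY = {"deploy-control", "authz-service"}


def resolve_upstream(host: str) -> Optional[str]:
    """Return the container address for a public host, or None to reject."""
    name = host.split(":")[0].lower()  # drop any port suffix
    if name.split(".")[0] in INTERNAL_ONLY:
        return None  # refuse to route to control-plane services
    return INSTANCE_REGISTRY.get(name)  # None for unknown hosts


print(resolve_upstream("alice.beaver.example:443"))
print(resolve_upstream("deploy-control.beaver.example"))
```

Rejecting by default (unknown hosts return `None`) keeps a missing registry entry from falling through to an unintended container.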
Chat Workbench
Requirements:
- Users can create/select sessions.
- Users can send text and attachments.
- Users can see Assistant messages, task cards, Agent run progress, and acceptance controls.
- Users can jump from chat to task detail.
Acceptance criteria:
- User can complete one full chat-to-task-to-accept flow.
- Attachments can be uploaded and used.
- Current task status is visible.
Task Lifecycle
Requirements:
- System can distinguish ordinary chat from task requests.
- Task can be created, run, continued, revised, accepted, abandoned, or deleted.
- Task detail shows timeline, runs, tools, artifacts, result, and acceptance controls.
Acceptance criteria:
- Task list and detail remain useful on mobile and desktop.
- Acceptance actions are persisted.
- Revision feedback continues the same task context.
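One way to read the lifecycle above is as a small state machine. The state and action names in this sketch are assumptions, not Beaver's actual model, and deletion is left out because it removes the record rather than changing its state.

```python
from enum import Enum


class TaskState(Enum):
    CREATED = "created"
    RUNNING = "running"
    AWAITING_REVIEW = "awaiting_review"
    ACCEPTED = "accepted"
    ABANDONED = "abandoned"


# Allowed transitions, keyed by (current state, action).
TRANSITIONS = {
    (TaskState.CREATED, "run"): TaskState.RUNNING,
    (TaskState.RUNNING, "finish"): TaskState.AWAITING_REVIEW,
    (TaskState.AWAITING_REVIEW, "accept"): TaskState.ACCEPTED,
    # Revision feedback re-enters the same task instead of creating a new one.
    (TaskState.AWAITING_REVIEW, "revise"): TaskState.RUNNING,
    (TaskState.CREATED, "abandon"): TaskState.ABANDONED,
    (TaskState.RUNNING, "abandon"): TaskState.ABANDONED,
    (TaskState.AWAITING_REVIEW, "abandon"): TaskState.ABANDONED,
}


def apply_action(state: TaskState, action: str) -> TaskState:
    """Return the next state, or raise ValueError for a disallowed action."""
    try:
        return TRANSITIONS[(state, action)]
    except KeyError:
        raise ValueError(
            f"cannot {action!r} a task in state {state.value!r}"
        ) from None


state = TaskState.CREATED
for action in ["run", "finish", "revise", "finish", "accept"]:
    state = apply_action(state, action)
print(state.value)
```

Keeping transitions in one table makes it straightforward to persist each acceptance action and to reject actions that do not apply to the current state.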
Agent Team Execution
Requirements:
- Complex tasks can be planned as sequence, parallel, or DAG execution.
- Subtasks can bind skills or ephemeral guidance.
- Main Agent synthesizes final answer from evidence.
Acceptance criteria:
- Subtask results are visible and debuggable.
- Failed team execution is shown without hiding partial evidence.
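The three planning modes can be treated as one case: a sequence is a chain-shaped DAG and a parallel plan is a DAG with no edges. The sketch below uses Python's standard `graphlib` module to group subtasks into batches that could run concurrently. The subtask names are invented for illustration, and this is not a description of how `beaver.coordinator` is implemented.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group subtasks into batches; each batch depends only on earlier ones.

    `dependencies` maps a subtask name to the names it must wait for.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for stable output
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch finished
    return batches


# Hypothetical plan: two independent subtasks feed a final synthesis step.
plan = {
    "collect_sources": set(),
    "summarize_docs": {"collect_sources"},
    "extract_metrics": {"collect_sources"},
    "draft_report": {"summarize_docs", "extract_metrics"},
}
for batch in plan_batches(plan):
    print(batch)
```

Recording which batch each subtask ran in would also give the task detail view a natural way to show partial evidence when a later batch fails.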
Files Workspace
Requirements:
- Users can upload, create folders, browse, preview, download, and delete files.
- Workspace roots stay understandable.
- File operations are safe within instance boundaries.
Acceptance criteria:
- Root and nested directories work.
- Text/Markdown/image preview works.
- Long file names do not break layout.
Tools And MCP
Requirements:
- Admins can view, test, add, edit, delete, and refresh tools where supported.
- Agent runtime can expose tools based on task and skill context.
- Tool calls are recorded as evidence.
Acceptance criteria:
- Tool detail and test flows work.
- Dangerous tools are governed by policy before broad rollout.
Skills And Marketplace
Requirements:
- Published skills can be listed, inspected, installed, uploaded, disabled, rolled back, or deleted where supported.
- Accepted work can create skill candidates.
- Candidates can become drafts.
- Drafts require safety/eval/review gates before publish.
- Marketplace supports discovery and install.
Acceptance criteria:
- Candidate and draft flows do not reset UI state unexpectedly.
- Publish requires review gates.
- Published skill can be reused by later tasks.
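The "publish requires review gates" criterion could be enforced with a check like the following sketch. The `SkillDraft` record and its flags are hypothetical. They mirror the safety/eval/review wording above and are not taken from `beaver.skills`.

```python
from dataclasses import dataclass


@dataclass
class SkillDraft:
    """Hypothetical draft record; the gate names mirror the PRD wording."""
    name: str
    safety_passed: bool = False
    eval_passed: bool = False
    review_approved: bool = False


def missing_gates(draft: SkillDraft) -> list[str]:
    """List the gates a draft has not yet passed."""
    gates = {
        "safety": draft.safety_passed,
        "eval": draft.eval_passed,
        "review": draft.review_approved,
    }
    return [name for name, passed in gates.items() if not passed]


def publish(draft: SkillDraft) -> str:
    """Publish only when every gate has passed; otherwise refuse."""
    missing = missing_gates(draft)
    if missing:
        raise PermissionError(
            f"{draft.name}: missing gates: {', '.join(missing)}"
        )
    return f"{draft.name} published"


draft = SkillDraft("weekly-report", safety_passed=True, eval_passed=True)
print(missing_gates(draft))
```

Returning the list of missing gates, rather than a single yes/no, lets the UI tell a maintainer exactly what still blocks publication.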
Memory
Requirements:
- Beaver can store long-term preferences, business knowledge, historical task learning, file/artifact memory, tool experience, and reusable workflows.
- Before broad product use, users/admins need memory inspect/edit/delete/freeze controls.
Acceptance criteria for Memory Control Center MVP:
- User can see what is remembered.
- User can see source and last-used context.
- User can edit, delete, or freeze memory.
- Task detail can show when memory affected execution.
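The controls above might map onto a store like the one sketched here. Everything in it is an assumption: the record fields, the method names, and in particular the reading of "freeze" as "the Agent can no longer change this memory, but the user still can". The PRD does not define freeze semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MemoryRecord:
    key: str
    content: str
    source: str                         # where the memory came from, e.g. a task ID
    last_used_in: Optional[str] = None  # last task that read this memory
    frozen: bool = False                # assumption: frozen blocks Agent updates


class MemoryStore:
    def __init__(self) -> None:
        self._records: dict[str, MemoryRecord] = {}

    def remember(self, record: MemoryRecord) -> None:
        self._records[record.key] = record

    def inspect(self) -> list[MemoryRecord]:
        """Let the user see everything that is remembered."""
        return list(self._records.values())

    def agent_update(self, key: str, content: str) -> bool:
        """Automated update; refused for missing or frozen records."""
        record = self._records.get(key)
        if record is None or record.frozen:
            return False
        record.content = content
        return True

    def user_edit(self, key: str, content: str) -> None:
        """Explicit user edit; allowed even when the record is frozen."""
        self._records[key].content = content

    def freeze(self, key: str) -> None:
        self._records[key].frozen = True

    def delete(self, key: str) -> None:
        self._records.pop(key, None)


store = MemoryStore()
store.remember(MemoryRecord("report_tone", "formal", source="task-1"))
store.freeze("report_tone")
print(store.agent_update("report_tone", "casual"))
```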
Scheduled Work And Notifications
Requirements:
- Users can create scheduled jobs.
- Scheduled runs can produce notifications or tasks.
- Users can review, revise, or accept scheduled outputs.
Acceptance criteria:
- A scheduled job can be created, toggled, run immediately, and deleted.
- Scheduled output can enter normal task review flow.
Connectors
Requirements:
- Beaver can connect to external systems such as Outlook and selected IM/channel connectors.
- Connector status, setup, errors, and reconnect path must be visible.
- External writes require clear policy and safety boundary.
Acceptance criteria:
- Pilot-safe connector list is documented.
- External connector callbacks route correctly in multi-instance deployment.
- Failed connector auth or setup is recoverable.
Settings, Status, Logs
Requirements:
- Users/admins can configure provider, Agent settings, channels, and runtime.
- Status page shows current app health.
- Logs help operators diagnose failures.
- Restart is confirmed before execution.
Acceptance criteria:
- Provider save flow works.
- Runtime restart flow is protected by confirmation.
- Long config values do not break UI.
7.3 Technology
Frontend:
- Next.js app inside `app-instance/frontend`.
- App shell with chat, tasks, files, skills, marketplace, tools, connectors, settings, status, logs.
Backend:
- Python Beaver backend inside `app-instance/backend`.
- Unified `beaver.engine` for Agent runtime.
- `beaver.coordinator` for multi-agent execution.
- `beaver.services` for task, cron, process, and application orchestration.
- `beaver.tools` for built-in/MCP tool execution.
- `beaver.skills` for skill loading, learning, review, publishing.
- `beaver.memory` for run memory, skills memory, long-term memory foundation.
- `beaver.interfaces` for web, MCP, channels, CLI/gateway surfaces.
Deployment:
- `auth-portal`.
- `authz-service`.
- `deploy-control`.
- `router-proxy`.
- `app-instance`.
- Docker network and per-instance mounted runtime directories.
7.4 Data And Evidence
Important product data:
- Users and auth handoff.
- Instance registry.
- Provider configuration.
- Conversations and messages.
- Tasks, task runs, run events, timeline events.
- Tool calls and results.
- Files and artifacts.
- Skill receipts, candidates, drafts, safety/eval reports, reviews, published versions.
- Memory records.
- Scheduled jobs and scheduled runs.
- Connector state and events.
Evidence principle:
Every meaningful Agent action should become explainable later.
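As an illustration of the evidence principle, the sketch below records each action as an append-only JSON line that can be filtered per task later. The `EvidenceEvent` fields and the helper names are hypothetical and are not Beaver's actual schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class EvidenceEvent:
    task_id: str
    run_id: str
    kind: str       # e.g. "tool_call", "artifact", "message"
    summary: str
    timestamp: str  # ISO 8601, supplied by the caller


def append_event(log: list[str], event: EvidenceEvent) -> None:
    """Append one event as a JSON line; existing lines are never rewritten."""
    log.append(json.dumps(asdict(event), sort_keys=True))


def events_for_task(log: list[str], task_id: str) -> list[dict]:
    """Rebuild the timeline for one task from the raw log."""
    return [e for e in map(json.loads, log) if e["task_id"] == task_id]


log: list[str] = []
append_event(log, EvidenceEvent(
    "task-7", "run-1", "tool_call", "read report.md", "2026-06-09T10:00:00Z"))
append_event(log, EvidenceEvent(
    "task-8", "run-1", "message", "asked for scope", "2026-06-09T10:01:00Z"))
append_event(log, EvidenceEvent(
    "task-7", "run-1", "artifact", "wrote summary.md", "2026-06-09T10:02:00Z"))
for event in events_for_task(log, "task-7"):
    print(event["kind"], "-", event["summary"])
```

An append-only shape keeps earlier evidence intact when a task is revised, so a reviewer can still see what each run did.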
7.5 Assumptions
- The best first customers are teams with repeatable knowledge workflows.
- Task acceptance is the right primary quality signal.
- Private deployment is a benefit, not a barrier, for early enterprise pilots.
- Teams will value skill/memory reuse after enough accepted tasks.
- Admins can operate a Docker-based deployment with a clear runbook.
- Memory must be controllable before it can be trusted.
7.6 Non-Goals For First Pilot
- Broad public SaaS launch.
- Full multi-tenant organization management.
- Fully autonomous skill publishing.
- Production external writes without clear review.
- Complete enterprise RBAC.
- Unlimited connector support.
- Perfect long-term memory automation.
- Replacing human review for high-risk work.
8. Release
Release 0: Internal Demo Readiness
Scope:
- Clean local deployment.
- Auth portal registration/login.
- Provider onboarding.
- Chat-to-task demo.
- Task detail evidence.
- File upload/preview.
- Skills and marketplace demo.
- Settings/status/logs.
Exit criteria:
- Demo flow works on fresh environment.
- Known limitations are documented.
- No critical security/deployment issue.
Release 1: Pilot Workflow Release
Scope:
- 2-3 packaged workflows.
- Task acceptance and evidence as main flow.
- Files and selected tools.
- Basic scheduled workflow.
- One pilot-safe connector if stable.
- Skill candidate/draft/review/publish.
- Deployment runbook and support checklist.
Exit criteria:
- Pilot team reaches >=30 accepted tasks in 30 days.
- >=5 reusable skills created.
- 0 critical incidents.
- Deployment under 2 hours on fresh host.
Release 2: Governance And Reuse Release
Scope:
- Evidence narrative.
- Memory Control Center.
- Skill replay/eval governance.
- Admin health console.
- Connector policy hardening.
- Pilot scorecard.
Exit criteria:
- Reviewers understand evidence.
- Users can inspect and control memory.
- Admins can diagnose provider/connector/runtime issues.
- Skill reuse is visible in metrics.
Release 3: Expansion Release
Scope:
- Team/workspace concepts if validated.
- More connectors.
- Audit export.
- Cross-instance analytics.
- Policy profiles.
- Instance lifecycle automation.
Exit criteria:
- Multiple teams can run without high support load.
- Governance story supports enterprise buying process.
Open Questions
- Is the first paying segment project teams, operations teams, engineering/support, or internal AI platform teams?
- Should Beaver optimize for single-user instances first or team workspaces sooner?
- Which connector is the safest and most valuable pilot connector?
- What exact tool policy should apply in customer pilots?
- What memory behavior should be on by default?
- How much raw evidence should normal users see versus admins?
- What is the backup/restore SLA for app instances?
Success Review Checklist
- Can a new user get to first accepted task quickly?
- Can a reviewer understand what the Agent did?
- Can an admin recover from provider or connector errors?
- Can a successful task become a reusable skill?
- Can a pilot owner prove value with metrics?
- Can security explain the deployment and tool boundaries?