Beaver Launch And Maintenance Runbook
Date: 2026-06-09
Scope: whole Beaver product.
1. Launch Principle
Launch Beaver through controlled pilots before broad rollout.
The product has a wide operational surface: auth, deployment control, routing, per-instance app containers, model providers, Agent runtime, tools, files, skills, memory, scheduled work, and connectors. A successful launch depends as much on reliability and trust as on feature completeness.
2. Launch Roles
| Role | Responsibility |
|---|---|
| Launch owner | Owns readiness, go/no-go, rollout phases |
| Deployment owner | Owns Docker images, network, router, instance lifecycle |
| Backend owner | Owns Agent runtime, tasks, tools, skills, cron, APIs |
| Frontend owner | Owns user-facing flows and UI verification |
| Security owner | Owns control-plane exposure, data boundaries, tool/connector policy |
| Pilot owner | Owns user onboarding, workflow selection, feedback, metrics |
| Support owner | Owns incident triage, runbook updates, user support |
3. Launch Phases
Phase 0: Local Internal Readiness
Audience: builders and internal testers.
Goals:
- Full local deployment works.
- Core demo flows are stable.
- Known risks are documented.
Required flows:
- Register/login.
- Provider onboarding.
- First chat response.
- Chat-to-task.
- Task acceptance/revision.
- File upload/preview/download/delete.
- Skill list/candidate/draft/review.
- Settings/status/restart.
Exit criteria:
- Fresh deployment run completed from docs.
- No P0 or launch-blocking P1 issues.
- Demo script works end to end.
Phase 1: Controlled Pilot
Audience: one internal team or one trusted customer team.
Goals:
- Validate real workflow value.
- Validate deployment and support process.
- Validate trust, evidence, and governance story.
Constraints:
- Narrow workflow scope.
- Narrow connector scope.
- Conservative tool policy.
- Human review for skill publishing.
- No opaque memory use for sensitive data.
Exit criteria:
- >=30 accepted tasks in 30 days.
- >=2 recurring workflows.
- 0 critical incidents.
- Deployment/support issues documented and reduced.
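The numeric exit criteria above can be checked mechanically at the end of the pilot window. The following is a minimal sketch, assuming the thresholds are "at least 30 accepted tasks", "at least 2 recurring workflows", and "zero critical incidents". `PilotStats` and `phase1_exit_gaps` are hypothetical names, not part of Beaver, and the qualitative criterion (issues documented and reduced) still needs a human judgment.

```python
from dataclasses import dataclass


@dataclass
class PilotStats:
    # Hypothetical container; how these counts are collected is not specified here.
    accepted_tasks_30d: int    # accepted tasks in the trailing 30 days
    recurring_workflows: int   # workflows counted as recurring by the pilot owner
    critical_incidents: int    # critical incidents during the pilot


def phase1_exit_gaps(stats: PilotStats) -> list[str]:
    """Return the unmet numeric exit criteria; an empty list means all are met."""
    gaps = []
    if stats.accepted_tasks_30d < 30:
        gaps.append(f"accepted tasks: {stats.accepted_tasks_30d} of 30")
    if stats.recurring_workflows < 2:
        gaps.append(f"recurring workflows: {stats.recurring_workflows} of 2")
    if stats.critical_incidents != 0:
        gaps.append(f"critical incidents: {stats.critical_incidents} (must be 0)")
    return gaps
```

Returning the list of gaps rather than a single boolean keeps the weekly review concrete: the pilot owner sees which criterion is short and by how much.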
Phase 2: Expanded Pilot
Audience: more users in the same team or a second pilot team.
Goals:
- Test repeatability across workflows.
- Introduce Memory Control Center or stricter memory policy if ready.
- Strengthen skill reuse and scheduled work.
Exit criteria:
- Skill reuse becomes visible.
- Admin can operate without developer pairing for common tasks.
- Evidence and report quality are accepted by workflow owner.
Phase 3: Production Candidate
Audience: broader customer or department rollout.
Goals:
- Stabilized deployment.
- Health monitoring.
- Incident response.
- Backup/restore process.
- Policy profiles.
Exit criteria:
- Launch owner, security owner, and deployment owner approve.
- Support process has clear ownership.
- Rollback and restore are rehearsed.
4. Pre-Launch Checklist
Deployment
- Images build successfully.
- Docker network exists.
- Router proxy starts.
- AuthZ service starts.
- Deploy control starts.
- Auth portal starts.
- App instance can be created.
- App instance route works through router proxy.
- Provider config can be written and instance restarted.
- Runtime directories are persistent.
- Public exposure is limited to intended services.
Product Flows
- Register/login works.
- Provider onboarding works.
- Chat workbench loads.
- Task creation works.
- Task detail timeline works.
- Acceptance/revision/abandon works.
- Files page works.
- Tools page works for pilot tools.
- Skills page works.
- Marketplace install works if included.
- Cron/scheduled flow works if included.
- Connector flow works if included.
- Settings/status/logs work.
Governance
- Tool policy for pilot is documented.
- Connector side effects are understood.
- Skill publish gates are documented.
- Memory behavior is documented.
- Data retention expectations are documented.
- User-facing limitations are documented.
Support
- Pilot support channel exists.
- Incident owner assigned.
- Logs and health checks are accessible.
- Backup/restore expectations are clear.
- Known issues list exists.
5. Monitoring
Product Metrics
| Metric | Owner | Cadence |
|---|---|---|
| Accepted tasks | Pilot owner | Weekly |
| Acceptance rate | Product owner | Weekly |
| Revision rate | Product owner | Weekly |
| Active workflows | Pilot owner | Weekly |
| Skill candidates and reuse | Product owner | Weekly |
| Scheduled run success | Backend owner | Weekly |
| Time to first accepted task | Product/design | Per onboarding |
Operational Metrics
| Metric | Owner | Alert |
|---|---|---|
| Instance creation failures | Deployment owner | >10% during pilot |
| Router route failures | Deployment owner | Any repeated failure |
| Provider setup failures | Support owner | >20% of onboarded users |
| Task run failures | Backend owner | >20% for 2 days |
| WebSocket/runtime disconnects | Backend/frontend | Repeated user-visible failures |
| File operation failures | Backend owner | Any permission/path issue |
| Tool execution failures | Backend owner | Repeated by tool category |
| Cron failures | Backend owner | Any critical scheduled workflow missed |
| Connector failures | Integration owner | Failed auth or unintended write |
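The percentage thresholds in the table above translate directly into small predicate functions. This is a sketch under stated assumptions: the function names are hypothetical, "for 2 days" is read as "the two most recent daily rates both exceed the threshold", and an empty denominator is treated as no alert.

```python
def rate(failures: int, total: int) -> float:
    """Failure rate in [0, 1]; 0.0 when there were no attempts."""
    return failures / total if total else 0.0


def instance_creation_alert(failures: int, attempts: int) -> bool:
    # Table threshold: >10% instance creation failures during pilot.
    return rate(failures, attempts) > 0.10


def provider_setup_alert(failed_users: int, onboarded_users: int) -> bool:
    # Table threshold: >20% of onboarded users fail provider setup.
    return rate(failed_users, onboarded_users) > 0.20


def task_run_alert(daily_failure_rates: list[float]) -> bool:
    # Table threshold: >20% for 2 days. Assumption: the last two days,
    # oldest first in the list, must both exceed the threshold.
    recent = daily_failure_rates[-2:]
    return len(recent) == 2 and all(r > 0.20 for r in recent)
```

The qualitative alerts in the table ("any repeated failure", "any permission/path issue") are left out on purpose, since they need an owner's judgment rather than a numeric cutoff.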
Security Metrics
| Metric | Alert |
|---|---|
| Control-plane public exposure | Immediate P0 |
| Cross-instance data access | Immediate P0 |
| Unintended external write | Immediate P0 |
| Credential leak in logs/report | Immediate P0 |
| Unsafe skill publish | P1, or P0 if external action risk |
6. Health Checks
Control Plane
- Auth portal reachable.
- AuthZ service reachable internally.
- Deploy control reachable internally with token.
- Router proxy has generated routes.
- Instance registry is readable and consistent.
App Instance
- Frontend loads.
- Backend `/api/status` responds.
- WebSocket works.
- Provider config present.
- Workspace path mounted.
- Initial skills present.
- Logs accessible.
Product Runtime
- Chat request succeeds.
- Task run succeeds.
- File API succeeds.
- Tool registry loads.
- Skills list loads.
- Cron scheduler active if enabled.
- Connector status loads if enabled.
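Two of the App Instance checks (frontend loads, backend `/api/status` responds) are easy to script. The sketch below uses only the Python standard library and makes several assumptions: the frontend is served at the instance root, an HTTP 200 counts as healthy, and the response body of `/api/status` is ignored because its format is not described here. `http_status` and `check_instance` are hypothetical helper names.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code  # server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, etc.


def check_instance(
    base_url: str, fetch: Callable[[str], int] = http_status
) -> dict[str, bool]:
    """Run the scripted App Instance checks against one instance URL."""
    base = base_url.rstrip("/")
    checks = {
        "frontend": base + "/",                # assumption: frontend at root
        "backend_status": base + "/api/status",
    }
    return {name: fetch(url) == 200 for name, url in checks.items()}
```

The `fetch` parameter exists so the check can be exercised without a live instance. WebSocket, mount, and log checks are not covered and would need instance-specific tooling.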
7. Incident Response
P0: Control Plane Exposed
Examples:
- `deploy-control` accessible from public internet.
- `authz-service` accessible from public internet.
- Internal token leaked.
Actions:
- Remove public route/firewall exposure.
- Rotate affected tokens.
- Review access logs.
- Confirm no unauthorized instance operations.
- Update deployment checklist.
P0: Cross-Instance Data Leak
Examples:
- Instance A reads Instance B workspace.
- Router sends user to wrong instance.
- Shared connector callback writes to wrong instance.
Actions:
- Disable affected route or instance.
- Preserve logs and registry.
- Identify path/host/callback mapping failure.
- Patch and add regression test.
- Notify affected stakeholders.
P0: Unintended External Action
Examples:
- Email or IM message sent unexpectedly.
- Calendar invite created unexpectedly.
- External system updated without user intent.
Actions:
- Disable connector or tool.
- Preserve task/tool evidence.
- Identify initiating task, tool, arguments, user, connector account.
- Patch policy or confirmation gate.
- Add test case and update pilot policy.
P1: New User Cannot Reach Instance
Actions:
- Check auth portal logs.
- Check authz register flow.
- Check deploy-control register/configure flow.
- Check instance registry.
- Check router route generation.
- Check container state and app logs.
P1: Provider Config Broken
Actions:
- Check settings/status.
- Confirm config path and provider fields.
- Test provider credentials.
- Restart instance if config was changed.
- Improve onboarding copy if the cause was user error.
P1: Task Runtime Failing
Actions:
- Check backend logs.
- Check provider availability.
- Check tool registry.
- Check task event timeline.
- Reproduce with minimal chat request.
- Mark affected pilot workflow as paused if repeated.
P2: UI Flow Confusing
Actions:
- Record screen and user quote.
- Add to UX issue list.
- Determine whether it blocks pilot success.
- Fix copy/layout if low effort.
8. Maintenance Cadence
Daily During Pilot
- Check critical incidents.
- Check instance health.
- Check failed task runs.
- Check support channel.
- Review provider/connector errors.
Weekly
- Review accepted tasks and acceptance rate.
- Review workflow success/failure.
- Review skill candidates and reuse.
- Review deployment issues.
- Review security/tool/connector events.
- Update known issues and runbook.
Monthly
- Rehearse fresh deployment.
- Review backup/restore approach.
- Review memory and skill governance.
- Review connector roadmap.
- Review pilot ROI and expansion decision.
Quarterly
- Revisit product positioning.
- Revisit architecture scaling assumptions.
- Decide team workspace / RBAC roadmap.
- Review security model and policy profiles.
9. Backup And Restore
Minimum data to preserve:
- `authz-service/runtime/data`
- `app-instance/runtime/instances`
- `app-instance/runtime/registry`
- `router-proxy/runtime/conf.d`
Per instance:
- `beaver-home/config.json`
- `beaver-home/web_auth_users.json`
- `beaver-home/workspace/`
- skill and runtime state under instance data.
Pilot requirements:
- Document manual backup command.
- Document manual restore procedure.
- Test restore for at least one non-production instance before expanded pilot.
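One possible starting point for the documented manual backup is a script that archives the minimum data directories into a timestamped tarball. This is a sketch, not the project's backup tooling: it assumes the four directories live under a single deployment root, the `backup` function and archive name are hypothetical, and it does not stop services or guarantee a consistent snapshot.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path

# Directories from the "Minimum data to preserve" list.
# Assumption: they are relative to one deployment root directory.
BACKUP_PATHS = [
    "authz-service/runtime/data",
    "app-instance/runtime/instances",
    "app-instance/runtime/registry",
    "router-proxy/runtime/conf.d",
]


def backup(deploy_root: Path, dest_dir: Path) -> Path:
    """Archive the minimum data set into dest_dir and return the archive path."""
    # Fail before writing anything if a source directory is missing.
    missing = [rel for rel in BACKUP_PATHS if not (deploy_root / rel).exists()]
    if missing:
        raise FileNotFoundError(f"missing backup sources: {missing}")

    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"beaver-backup-{stamp}.tar.gz"  # hypothetical name
    with tarfile.open(archive, "w:gz") as tar:
        for rel in BACKUP_PATHS:
            # arcname keeps paths relative so a restore can target any root.
            tar.add(deploy_root / rel, arcname=rel)
    return archive
```

Storing relative paths in the archive means a restore rehearsal can extract into a scratch directory for a non-production instance without touching live data.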
10. Change Management
Before changing any of these, require launch owner review:
- Routing/proxy config.
- AuthZ issuer/internal URL.
- Deploy token names or values.
- Instance registry format.
- Workspace mount paths.
- Provider config schema.
- Tool execution policy.
- Connector callback routing.
- Skill publish gates.
- Memory default behavior.
11. Rollback
Rollback options:
- Roll back frontend/backend image for app instances.
- Disable specific connector.
- Disable scheduled job execution.
- Disable skill learning worker.
- Disable skill publish.
- Fall back to chat-only mode for affected workflow.
- Remove public route to affected instance.
- Restore instance data from backup.
Rollback triggers:
- P0 incident.
- Repeated instance creation failure.
- Repeated task runtime failure blocking pilot work.
- Provider config issue affecting most users.
- Connector side-effect risk.
- UI issue blocking first accepted task.
12. Launch Communication
Internal
Beaver is launching as a controlled Agent execution pilot. The launch goal is not maximum feature breadth. The goal is to prove repeatable AI-assisted work with task acceptance, evidence, and reuse.
Pilot Users
Use Beaver for selected workflows where you need a concrete output. Review each result. Accept it if usable, request revision if it is close, or abandon it if it is not worth continuing. Your feedback is the signal that helps Beaver improve and reuse work.
Admins
Treat Beaver as an app platform with a control plane and per-instance runtime. Keep deploy-control and authz private. Monitor instance health, provider config, tool behavior, and connector side effects.
13. Known Limitations To Disclose
- Memory is not yet fully productized with user controls.
- Connector maturity varies by provider.
- The first pilot should use a narrow set of workflows.
- Some operations may still require engineering support.
- Skill learning needs human review before publish.
- Multi-user organization features are not the first pilot focus.
14. Go / No-Go Criteria
Go if:
- Fresh deployment works.
- First accepted task flow works.
- Evidence timeline is readable enough for pilot.
- Tool and connector policy is documented.
- Support owner is assigned.
- No critical security issue is open.
No-go if:
- Control-plane exposure risk is unresolved.
- Cross-instance isolation is unverified.
- Provider onboarding fails for most users.
- Task runtime is unreliable.
- Pilot workflow is not defined.
- No one owns incidents or support.
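The two lists above can be combined into a single decision record for the launch review. The sketch below assumes the strictest reading: the decision is "go" only when every go check has passed and no no-go blocker is open. The short keys and the `go_no_go` function are hypothetical labels for the criteria listed above.

```python
# Hypothetical short keys, one per criterion in the lists above.
GO_CHECKS = [
    "fresh_deployment_works",
    "first_accepted_task_flow_works",
    "evidence_timeline_readable",
    "tool_connector_policy_documented",
    "support_owner_assigned",
    "no_open_critical_security_issue",
]

NO_GO_BLOCKERS = [
    "control_plane_exposure_unresolved",
    "isolation_unverified",
    "provider_onboarding_failing",
    "task_runtime_unreliable",
    "pilot_workflow_undefined",
    "no_incident_or_support_owner",
]


def go_no_go(passed: set[str], open_blockers: set[str]) -> tuple[str, list[str]]:
    """Return ("go" or "no-go", reasons). Any blocker or unmet check means no-go."""
    reasons = [f"blocker: {b}" for b in NO_GO_BLOCKERS if b in open_blockers]
    reasons += [f"unmet: {c}" for c in GO_CHECKS if c not in passed]
    return ("go" if not reasons else "no-go", reasons)
```

For example, `go_no_go(set(GO_CHECKS), {"isolation_unverified"})` yields a no-go with one listed reason, which gives the launch owner a concrete item to close before the next review.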