feat(app): remove built-in agents, add CORS support, and improve skill upload

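Removed all built-in agent configurations from agents/registry.json, leaving the agents array empty.
Added CORS middleware to the web app so that the specified frontend origins can make cross-origin requests.
Refactored skill upload and added an LLM rewrite step that automatically normalizes the format of uploaded skills.
Added tool-name extraction that automatically detects the Required Tools section in the skill body.
Updated the payload structure of skill-learning candidates and drafts to include evaluation report statistics.
Revised the intent-routing skill instructions and improved the task status management logic.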
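
The tool-name extraction mentioned above could look roughly like the sketch below. It is not the code from this commit: the function name `extract_required_tools`, the assumption that the skill body is Markdown with a "Required Tools" heading, and the assumption that tools appear as bullet items are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical format: a Markdown heading named "Required Tools" followed by
# bullet items. Neither detail is confirmed by the commit itself.
_HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$")
_BULLET = re.compile(r"^\s*[-*]\s+`?([A-Za-z_][\w.-]*)`?")


def extract_required_tools(skill_body: str) -> list[str]:
    """Return tool names listed under a 'Required Tools' heading, in order."""
    tools: list[str] = []
    in_section = False
    for line in skill_body.splitlines():
        heading = _HEADING.match(line)
        if heading:
            # Enter the section on its heading; any other heading ends it.
            in_section = heading.group(1).strip().lower() == "required tools"
            continue
        if in_section:
            bullet = _BULLET.match(line)
            # Keep the first token of each bullet and skip duplicates.
            if bullet and bullet.group(1) not in tools:
                tools.append(bullet.group(1))
    return tools
```

A body without such a heading yields an empty list, which lets the caller decide whether to fall back to another source such as the LLM rewrite step.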
This commit is contained in:
2026-06-12 13:25:20 +08:00
parent fc9fd93c36
commit 8aeb97a5fc
76 changed files with 3382 additions and 553 deletions

View File

@@ -8,7 +8,7 @@ Beaver is an enterprise Agent sandbox and execution platform. It combines privat
- [Business Strategy HTML](./index.html): business-style product discovery, strategy canvas, target users, segmentation, and competitors.
- [Product PRD HTML](./product-prd.html): product PRD, outcome roadmap, module job stories, WWA backlog items, and test scenarios.
- [Product Discovery Report](./product-discovery-report.md): product understanding, users, JTBD, opportunities, assumptions, experiments, priorities, metrics, and 30/90 day recommendations.
- [Product Discovery Report](./product-discovery-report.md): product understanding, users, JTBD, opportunities, assumptions, experiments, priorities, and 30/90 day recommendations.
- [Product Architecture Brief](./product-architecture-brief.md): product-facing architecture across auth, deployment control, routing, app instances, frontend, backend, Agent runtime, tools, skills, memory, files, connectors, and operations.
- [PRD](./PRD-beaver-agent-sandbox.md): full-product PRD for the Beaver Agent Sandbox.
- [Validation Plan](./validation-plan.md): customer, product, technical, security, usability, and business validation plan.

View File

@@ -738,7 +738,6 @@
<a href="#personas">User Personas</a>
<a href="#behavior">Behavioral Segments</a>
<a href="#competitors">Competitors</a>
<a href="#metrics">Acceptance Metrics</a>
</nav>
</div>
</header>
@@ -758,7 +757,7 @@
<div class="kpi"><span>Product focus</span><b>Execution</b> not chat</div>
<div class="kpi"><span>Commercial wedge</span><b>Team</b> knowledge work</div>
<div class="kpi"><span>Core moat</span><b>Reusable</b> skills and memory</div>
<div class="kpi"><span>Pilot metric</span><b>Accepted</b> real tasks</div>
<div class="kpi"><span>Value test</span><b>Delivered</b> real tasks</div>
</div>
</div>
@@ -853,10 +852,9 @@
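<article class="card accent-amber"><span class="tag amber">3. Relative Costs</span><h3>Compete on controllable value, not the lowest price</h3><p>Beaver should take the high-value route of "private deployment + execution governance + reusable assets" rather than compete on price with general-purpose SaaS chat tools.</p></article>
<article class="card"><span class="tag">4. Value Proposition</span><h3>From answers to deliverables</h3><p>Before: AI output is scattered across chats. How: task-based execution, tool evidence, and user acceptance. After: artifacts are deliverable and experience is retained.</p></article>
<article class="card"><span class="tag">5. Trade-offs</span><h3>Be explicit about what not to do</h3><p>Do not start with a mass-market chat SaaS; do not cover every connector first; do not auto-publish skills by default; do not enable sensitive long-term memory at scale before a control console exists.</p></article>
<article class="card"><span class="tag">6. Metrics</span><h3>The North Star is "accepted work"</h3><p>The core metric is not message count but the number of Agent work items each pilot team completes and accepts per week. Quarterly OMTM: the number of accepted tasks in the first pilots.</p></article>
<article class="card"><span class="tag">7. Growth</span><h3>Sales-led + pilot-to-expansion</h3><p>Enter customers through high-value workflow pilots, expand from one team to the department, and then drive expansion through skills, templates, connectors, and governance capabilities.</p></article>
<article class="card"><span class="tag">8. Capabilities</span><h3>Capabilities to strengthen</h3><p>Workflow templates, evidence narrative, Memory Control Center, Admin Health Console, connector security policies, and skill evaluation gates.</p></article>
<article class="card"><span class="tag">9. Can't / Won't</span><h3>The moat comes from the closed operating loop</h3><p>A single chat UI is easy to copy; what is hard to copy is the combination of private instances, task evidence, acceptance feedback, accumulated skills and memory, and data from customers' real workflows.</p></article>
<article class="card"><span class="tag">6. Growth</span><h3>Sales-led + pilot-to-expansion</h3><p>Enter customers through high-value workflow pilots, expand from one team to the department, and then drive expansion through skills, templates, connectors, and governance capabilities.</p></article>
<article class="card"><span class="tag">7. Capabilities</span><h3>Capabilities to strengthen</h3><p>Workflow templates, evidence narrative, Memory Control Center, Admin Health Console, connector security policies, and skill evaluation gates.</p></article>
<article class="card"><span class="tag">8. Can't / Won't</span><h3>The moat comes from the closed operating loop</h3><p>A single chat UI is easy to copy; what is hard to copy is the combination of private instances, task evidence, acceptance feedback, accumulated skills and memory, and data from customers' real workflows.</p></article>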
</div>
</section>
@@ -1209,29 +1207,12 @@
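<li>Do not start with a general-purpose AI assistant for everyone.</li>
<li>Do not compete head-on with Dify/Stack AI over "who is better at building Agents".</li>
<li>Do not commit too early to every connector or to full autonomy.</li>
<li>Do not put acceptance metrics, the roadmap, and the launch plan up front where they compete with the main thread.</li>
<li>Do not put the roadmap and the launch plan up front where they compete with the product discovery thread.</li>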
</ul>
</article>
</div>
</section>
<section id="metrics">
<div class="section-head">
<div>
<div class="eyebrow">Acceptance Metrics</div>
<h2>Acceptance metrics come last</h2>
</div>
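<p>These metrics serve only as exit criteria for later pilot acceptance; the first half of this page does not expand on the roadmap or launch maintenance.</p>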
</div>
<div class="grid-4">
<div class="kpi"><span>North Star</span><b>Accepted tasks</b> per week per team</div>
<div class="kpi"><span>30-day target</span><b>30+</b> real accepted tasks</div>
<div class="kpi"><span>Reuse target</span><b>5</b> skills, 3 of them reused</div>
<div class="kpi"><span>Safety target</span><b>0</b> critical incidents</div>
</div>
</section>
<section id="sources">
<div class="section-head">
<div>

View File

@@ -87,7 +87,6 @@ For product pilots:
| Connector maturity varies by channel | Customer demos must avoid overpromising |
| Multi-instance deployment is powerful but operationally sensitive | Pilot success depends on stable setup and clear runbooks |
| Skill learning needs strong governance | Reuse can become risk if publishing is weak |
| Metrics are not yet productized | Hard to prove pilot value without baseline and target |
| Customer research is not yet captured | Current roadmap is inferred from implementation and product judgment |
## User Segments
@@ -345,51 +344,6 @@ Opportunity 3: I need successful work to become reusable.
| Production writes through connectors without review | Trust risk |
| Complex enterprise RBAC before pilot validation | May overbuild before segment clarity |
## Metrics Dashboard
### North Star Metric
Accepted Agent Workflows:
> Number of AI-assisted tasks or scheduled workflows accepted by users per active pilot team per week.
Why this metric: it captures real delivered value better than messages sent, tokens used, or model calls.
### Input Metrics
| Metric | Definition | Target For Pilot |
| --- | --- | --- |
| Task Creation Rate | Tasks created / active users / week | Increasing weekly |
| Acceptance Rate | Accepted task runs / completed task runs | >=60% in pilot |
| Revision Rate | Runs needing revision / completed runs | Decreasing over time |
| Evidence Coverage | Task runs with timeline/tool/artifact evidence / task runs | >=90% |
| Skill Candidate Rate | Accepted tasks producing candidates / accepted tasks | >=20% after week 2 |
| Skill Reuse Rate | Runs activating published pilot skills / task runs | >=15% after skills exist |
| Scheduled Success Rate | Accepted scheduled outputs / scheduled runs | >=50% for selected workflows |
| Deployment Success Time | Fresh deployment time to first working user | <2 hours for pilot |
### Guardrail Metrics
| Metric | Alert |
| --- | --- |
| Critical tool/security incident | Any occurrence |
| Instance creation failure rate | >10% in pilot |
| Provider configuration failure rate | >20% |
| Task run failure rate | >20% for 2 consecutive days |
| Connector side-effect incident | Any unintended external write |
| User file permission/storage incident | Any cross-user or cross-instance leak |
| p95 task completion latency | Exceeds pilot workflow tolerance |
### Business Metrics
- Pilot activation: teams reaching first accepted task.
- Time to first accepted task.
- Weekly active task users.
- Repeated workflow count.
- Skill reuse per team.
- Customer-reported time saved.
- Pilot conversion intent.
## Customer Research Plan
No direct interview transcripts were provided. Research should start immediately, before the roadmap is locked.
@@ -454,7 +408,7 @@ We are studying how teams move AI from chat into real work. We are not asking wh
1. Pick 2-3 pilot workflows: project brief, weekly report, document review, support triage, or file processing.
2. Run fresh deployment rehearsal from README/deployment guide and record gaps.
3. Define pilot metrics and instrument accepted tasks, revisions, skill candidates, skill reuse, and run failures.
3. Define pilot learning questions and instrument the events needed to answer them.
4. Create a task evidence narrative prototype on top of existing timeline data.
5. Package pilot workflow templates as skills or documented demos.
6. Validate provider onboarding with 3 non-engineer users.

View File

@@ -733,7 +733,7 @@
<span class="tag green">2. Contacts</span>
<h3>Key roles</h3>
<ul>
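<li>Product lead: defines the first scenarios, acceptance metrics, and module priorities.</li>
<li>Product lead: defines the first scenarios, pilot questions, and module priorities.</li>
<li>Engineering lead: ensures the instance, task, tool, skill, and connector architecture is feasible to build.</li>
<li>Design lead: ensures the workbench, task details, skill review, and configuration experience are understandable.</li>
<li>Operations lead: ensures deployment, routing, logging, backup, and failure recovery are executable.</li>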