feat(app): remove built-in agents, add CORS support, and improve skill upload

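Removes all built-in agent configurations from agents/registry.json, leaving the agents array empty. Adds CORS middleware to the web app so that the specified frontend origins can make cross-origin requests. Refactors skill upload with an LLM rewrite step that automatically normalizes the format of uploaded skills. Adds tool-name extraction that automatically identifies the Required Tools section in a skill body. Updates the payload structure of skill-learning candidates and drafts to include evaluation report statistics. Revises the instructions of the intent-routing skill and improves the task state management logic.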
2026-06-12 13:25:20 +08:00
parent fc9fd93c36
commit 8aeb97a5fc
76 changed files with 3382 additions and 553 deletions
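The tool-name extraction mentioned in the commit message is not part of the hunks shown here, so the following is only a minimal sketch of how such a step might work. It assumes the skill body is Markdown, that the section is introduced by a heading named "Required Tools", and that each tool is listed as a bullet. The function name `extract_required_tools` and the parsing rules are hypothetical, not the commit's actual code.

```python
import re

# Hypothetical helper: illustrates the idea only, not the commit's implementation.
_HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*$")
_BULLET = re.compile(r"^\s*(?:[-*+]|\d+\.)\s+(.*)$")
_CODE_SPAN = re.compile(r"`([^`]+)`")


def extract_required_tools(skill_body: str) -> list[str]:
    """Return tool names listed under a 'Required Tools' heading, in order."""
    tools: list[str] = []
    in_section = False
    for line in skill_body.splitlines():
        heading = _HEADING.match(line)
        if heading:
            # Any heading starts a new section; only one title is of interest.
            in_section = heading.group(2).strip().lower() == "required tools"
            continue
        if not in_section:
            continue
        bullet = _BULLET.match(line)
        if not bullet:
            continue
        item = bullet.group(1).strip()
        # Prefer a backticked name; otherwise take the first token of the item.
        code = _CODE_SPAN.search(item)
        name = code.group(1) if code else re.split(r"[\s:(]", item, maxsplit=1)[0]
        name = name.strip()
        if name and name not in tools:
            tools.append(name)
    return tools
```

Given a body whose "Required Tools" section lists `web_search` and `read_file`, the function returns those two names in order and ignores bullets under other headings. A real implementation would also need to decide how to handle nested lists and localized headings.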
--- a/docs/product-discovery/beaver/README.md
+++ b/docs/product-discovery/beaver/README.md
@@ -8,7 +8,7 @@ Beaver is an enterprise Agent sandbox and execution platform. It combines privat

 - [Business Strategy HTML](./index.html): business-style product discovery, strategy canvas, target users, segmentation, and competitors.
 - [Product PRD HTML](./product-prd.html): product PRD, outcome roadmap, module job stories, WWA backlog items, and test scenarios.
-- [Product Discovery Report](./product-discovery-report.md): product understanding, users, JTBD, opportunities, assumptions, experiments, priorities, metrics, and 30/90 day recommendations.
+- [Product Discovery Report](./product-discovery-report.md): product understanding, users, JTBD, opportunities, assumptions, experiments, priorities, and 30/90 day recommendations.
 - [Product Architecture Brief](./product-architecture-brief.md): product-facing architecture across auth, deployment control, routing, app instances, frontend, backend, Agent runtime, tools, skills, memory, files, connectors, and operations.
 - [PRD](./PRD-beaver-agent-sandbox.md): full-product PRD for the Beaver Agent Sandbox.
 - [Validation Plan](./validation-plan.md): customer, product, technical, security, usability, and business validation plan.
--- a/docs/product-discovery/beaver/index.html
+++ b/docs/product-discovery/beaver/index.html
@@ -738,7 +738,6 @@
        <a href="#personas">用户画像</a>
        <a href="#behavior">行为分群</a>
        <a href="#competitors">竞品</a>
-        <a href="#metrics">验收指标</a>
      </nav>
    </div>
  </header>
@@ -758,7 +757,7 @@
          <div class="kpi"><span>产品主线</span><b>执行</b>不是聊天</div>
          <div class="kpi"><span>商业切口</span><b>团队</b>知识工作</div>
          <div class="kpi"><span>核心壁垒</span><b>复用</b>技能与记忆</div>
-          <div class="kpi"><span>试点指标</span><b>验收</b>真实任务</div>
+          <div class="kpi"><span>价值判断</span><b>交付</b>真实任务</div>
        </div>
      </div>

@@ -853,10 +852,9 @@
        <article class="card accent-amber"><span class="tag amber">3. Relative Costs</span><h3>不打最低价，强调可控价值</h3><p>Beaver 应走“私有部署 + 执行治理 + 复用资产”的高价值路线，而不是和通用 SaaS 聊天工具比低价。</p></article>
        <article class="card"><span class="tag">4. Value Proposition</span><h3>从回答到交付</h3><p>Before：AI 输出散落在聊天里；How：任务化执行、工具证据、用户验收；After：产物可交付，经验可沉淀。</p></article>
        <article class="card"><span class="tag">5. Trade-offs</span><h3>明确不做什么</h3><p>不先做大众聊天 SaaS；不先铺满所有连接器；不默认自动发布技能；不在无控制台前大规模启用敏感长期记忆。</p></article>
-        <article class="card"><span class="tag">6. Metrics</span><h3>北极星是“已验收工作”</h3><p>核心指标不是消息数，而是每个试点团队每周完成并被接受的 Agent 工作数。季度 OMTM：首批试点的已验收任务数。</p></article>
-        <article class="card"><span class="tag">7. Growth</span><h3>销售驱动 + 试点转扩展</h3><p>先通过高价值工作流试点进入客户，再从一个团队扩展到部门，最后以技能、模板、连接器和治理能力形成扩张。</p></article>
-        <article class="card"><span class="tag">8. Capabilities</span><h3>需要补强的能力</h3><p>工作流模板、证据叙事、Memory Control Center、Admin Health Console、连接器安全策略、技能评估门禁。</p></article>
-        <article class="card"><span class="tag">9. Can't / Won't</span><h3>护城河来自运行闭环</h3><p>单个聊天 UI 容易复制；难复制的是私有实例、任务证据、验收反馈、技能记忆沉淀和客户真实工作流数据。</p></article>
+        <article class="card"><span class="tag">6. Growth</span><h3>销售驱动 + 试点转扩展</h3><p>先通过高价值工作流试点进入客户，再从一个团队扩展到部门，最后以技能、模板、连接器和治理能力形成扩张。</p></article>
+        <article class="card"><span class="tag">7. Capabilities</span><h3>需要补强的能力</h3><p>工作流模板、证据叙事、Memory Control Center、Admin Health Console、连接器安全策略、技能评估门禁。</p></article>
+        <article class="card"><span class="tag">8. Can't / Won't</span><h3>护城河来自运行闭环</h3><p>单个聊天 UI 容易复制；难复制的是私有实例、任务证据、验收反馈、技能记忆沉淀和客户真实工作流数据。</p></article>
      </div>
    </section>

@@ -1209,29 +1207,12 @@
            <li>不要先做所有人的通用 AI 助手。</li>
            <li>不要和 Dify/Stack AI 正面比“谁更会搭 Agent”。</li>
            <li>不要过早承诺所有连接器和完全自治。</li>
-            <li>不要把验收指标、路线图和上线计划放在前面抢主线。</li>
+            <li>不要把路线图和上线计划放在前面抢产品发现主线。</li>
          </ul>
        </article>
      </div>
    </section>

-    <section id="metrics">
-      <div class="section-head">
-        <div>
-          <div class="eyebrow">Acceptance Metrics</div>
-          <h2>验收指标放在最后</h2>
-        </div>
-        <p>这些指标只作为后续试点验收的出口，不在当前页面前半段展开路线图和上线维护。</p>
-      </div>
-
-      <div class="grid-4">
-        <div class="kpi"><span>北极星</span><b>已验收任务</b>每周/每团队</div>
-        <div class="kpi"><span>30 天目标</span><b>30+</b>真实验收任务</div>
-        <div class="kpi"><span>复用目标</span><b>5</b>技能，其中 3 个复用</div>
-        <div class="kpi"><span>安全目标</span><b>0</b>关键事故</div>
-      </div>
-    </section>
-
    <section id="sources">
      <div class="section-head">
        <div>
--- a/docs/product-discovery/beaver/product-discovery-report.md
+++ b/docs/product-discovery/beaver/product-discovery-report.md
@@ -87,7 +87,6 @@ For product pilots:
 | Connector maturity varies by channel | Customer demos must avoid overpromising |
 | Multi-instance deployment is powerful but operationally sensitive | Pilot success depends on stable setup and clear runbooks |
 | Skill learning needs strong governance | Reuse can become risk if publishing is weak |
-| Metrics are not yet productized | Hard to prove pilot value without baseline and target |
 | Customer research is not yet captured | Current roadmap is inferred from implementation and product judgment |

 ## User Segments
@@ -345,51 +344,6 @@ Opportunity 3: I need successful work to become reusable.
 | Production writes through connectors without review | Trust risk |
 | Complex enterprise RBAC before pilot validation | May overbuild before segment clarity |

-## Metrics Dashboard
-
-### North Star Metric
-
-Accepted Agent Workflows:
-
-> Number of AI-assisted tasks or scheduled workflows accepted by users per active pilot team per week.
-
-Why this metric: it captures real delivered value better than messages sent, tokens used, or model calls.
-
-### Input Metrics
-
-| Metric | Definition | Target For Pilot |
-| --- | --- | --- |
-| Task Creation Rate | Tasks created / active users / week | Increasing weekly |
-| Acceptance Rate | Accepted task runs / completed task runs | >=60% in pilot |
-| Revision Rate | Runs needing revision / completed runs | Track down over time |
-| Evidence Coverage | Task runs with timeline/tool/artifact evidence / task runs | >=90% |
-| Skill Candidate Rate | Accepted tasks producing candidates / accepted tasks | >=20% after week 2 |
-| Skill Reuse Rate | Runs activating published pilot skills / task runs | >=15% after skills exist |
-| Scheduled Success Rate | Accepted scheduled outputs / scheduled runs | >=50% for selected workflows |
-| Deployment Success Time | Fresh deployment time to first working user | <2 hours for pilot |
-
-### Guardrail Metrics
-
-| Metric | Alert |
-| --- | --- |
-| Critical tool/security incident | Any occurrence |
-| Instance creation failure rate | >10% in pilot |
-| Provider configuration failure rate | >20% |
-| Task run failure rate | >20% for 2 consecutive days |
-| Connector side-effect incident | Any unintended external write |
-| User file permission/storage incident | Any cross-user or cross-instance leak |
-| p95 task completion latency | Exceeds pilot workflow tolerance |
-
-### Business Metrics
-
-- Pilot activation: teams reaching first accepted task.
-- Time to first accepted task.
-- Weekly active task users.
-- Repeated workflow count.
-- Skill reuse per team.
-- Customer-reported time saved.
-- Pilot conversion intent.
-
 ## Customer Research Plan

 No direct interview transcripts were provided. Research should start immediately before locking roadmap.
@@ -454,7 +408,7 @@ We are studying how teams move AI from chat into real work. We are not asking wh

 1. Pick 2-3 pilot workflows: project brief, weekly report, document review, support triage, or file processing.
 2. Run fresh deployment rehearsal from README/deployment guide and record gaps.
-3. Define pilot metrics and instrument accepted tasks, revisions, skill candidates, skill reuse, and run failures.
+3. Define pilot learning questions and instrument the events needed to answer them.
 4. Create a task evidence narrative prototype on top of existing timeline data.
 5. Package pilot workflow templates as skills or documented demos.
 6. Validate provider onboarding with 3 non-engineer users.
--- a/docs/product-discovery/beaver/product-prd.html
+++ b/docs/product-discovery/beaver/product-prd.html
@@ -733,7 +733,7 @@
          <span class="tag green">2. Contacts</span>
          <h3>关键角色</h3>
          <ul>
-            <li>产品负责人：定义首批场景、验收指标和模块优先级。</li>
+            <li>产品负责人：定义首批场景、试点问题和模块优先级。</li>
            <li>工程负责人：保证实例、任务、工具、技能和连接器架构可落地。</li>
            <li>设计负责人：保证工作台、任务详情、技能审核和配置体验可理解。</li>
            <li>运维负责人：保证部署、路由、日志、备份和故障恢复可执行。</li>