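feat(app): remove built-in agents, add CORS support, and improve skill upload

Removed all built-in agent configurations from agents/registry.json, leaving the agents array empty.
Added CORS middleware to the web app so that the specified frontend origins can make cross-origin requests.
Refactored skill upload and added an LLM rewrite step that automatically normalizes the format of uploaded skills.
Added tool-name extraction that automatically detects the Required Tools section in the skill body.
Updated the payload structure of skill-learning candidates and drafts to include evaluation report statistics.
Revised the description of the intent-routing skill and improved the task state management logic.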
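The tool-name extraction mentioned above could look roughly like the following. This is only a sketch under stated assumptions: the heading text "Required Tools" comes from the commit message, but the Markdown layout (a heading followed by a bullet list) and the function name `extract_required_tools` are hypothetical and are not taken from the Beaver codebase.

```python
import re

# Assumed layout: a Markdown heading "Required Tools" followed by bullets.
_SECTION = re.compile(r"^#{1,6}\s*Required Tools\s*$", re.IGNORECASE)
_HEADING = re.compile(r"^#{1,6}\s+\S")
# A bullet whose first token is a tool name, optionally wrapped in backticks.
_BULLET = re.compile(r"^\s*[-*]\s+`?([A-Za-z_][\w.-]*)`?")


def extract_required_tools(skill_body: str) -> list[str]:
    """Return tool names listed under the Required Tools heading, in order."""
    tools: list[str] = []
    in_section = False
    for line in skill_body.splitlines():
        stripped = line.strip()
        if _SECTION.match(stripped):
            in_section = True
            continue
        if in_section and _HEADING.match(stripped):
            break  # the next heading closes the section
        if in_section:
            match = _BULLET.match(line)
            if match and match.group(1) not in tools:
                tools.append(match.group(1))  # keep first occurrence only
    return tools
```

Returning an empty list when the section is missing lets a caller decide whether to reject the upload or fall back to another step, such as the LLM rewrite.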
@@ -8,7 +8,7 @@ Beaver is an enterprise Agent sandbox and execution platform. It combines privat
 - [Business Strategy HTML](./index.html): business-style product discovery, strategy canvas, target users, segmentation, and competitors.
 - [Product PRD HTML](./product-prd.html): product PRD, outcome roadmap, module job stories, WWA backlog items, and test scenarios.
-- [Product Discovery Report](./product-discovery-report.md): product understanding, users, JTBD, opportunities, assumptions, experiments, priorities, metrics, and 30/90 day recommendations.
+- [Product Discovery Report](./product-discovery-report.md): product understanding, users, JTBD, opportunities, assumptions, experiments, priorities, and 30/90 day recommendations.
 - [Product Architecture Brief](./product-architecture-brief.md): product-facing architecture across auth, deployment control, routing, app instances, frontend, backend, Agent runtime, tools, skills, memory, files, connectors, and operations.
 - [PRD](./PRD-beaver-agent-sandbox.md): full-product PRD for the Beaver Agent Sandbox.
 - [Validation Plan](./validation-plan.md): customer, product, technical, security, usability, and business validation plan.
@@ -738,7 +738,6 @@
 <a href="#personas">User Personas</a>
 <a href="#behavior">Behavioral Segments</a>
 <a href="#competitors">Competitors</a>
-<a href="#metrics">Acceptance Metrics</a>
 </nav>
 </div>
 </header>
@@ -758,7 +757,7 @@
 <div class="kpi"><span>Product focus</span><b>Execution</b> not chat</div>
 <div class="kpi"><span>Commercial entry point</span><b>Team</b> knowledge work</div>
 <div class="kpi"><span>Core moat</span><b>Reusable</b> skills and memory</div>
-<div class="kpi"><span>Pilot metric</span><b>Accepted</b> real tasks</div>
+<div class="kpi"><span>Value test</span><b>Delivered</b> real tasks</div>
 </div>
 </div>
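@@ -853,10 +852,9 @@
 <article class="card accent-amber"><span class="tag amber">3. Relative Costs</span><h3>Don't compete on lowest price; emphasize controllable value</h3><p>Beaver should take the high-value route of "private deployment + execution governance + reusable assets" rather than competing on price with general-purpose SaaS chat tools.</p></article>
 <article class="card"><span class="tag">4. Value Proposition</span><h3>From answers to delivery</h3><p>Before: AI output is scattered across chats. How: task-based execution, tool evidence, user acceptance. After: artifacts are deliverable and experience accumulates.</p></article>
 <article class="card"><span class="tag">5. Trade-offs</span><h3>Be explicit about what we won't do</h3><p>No mass-market chat SaaS first; no full connector coverage first; no automatic skill publishing by default; no large-scale sensitive long-term memory before a control console exists.</p></article>
-<article class="card"><span class="tag">6. Metrics</span><h3>The North Star is "accepted work"</h3><p>The core metric is not message count but the number of Agent work items completed and accepted per pilot team per week. Quarterly OMTM: accepted task count across the first pilots.</p></article>
-<article class="card"><span class="tag">7. Growth</span><h3>Sales-led + pilot-to-expansion</h3><p>Enter customers through pilots on high-value workflows, expand from one team to the department, then drive expansion with skills, templates, connectors, and governance capabilities.</p></article>
-<article class="card"><span class="tag">8. Capabilities</span><h3>Capabilities to strengthen</h3><p>Workflow templates, evidence narrative, Memory Control Center, Admin Health Console, connector security policy, skill evaluation gates.</p></article>
-<article class="card"><span class="tag">9. Can't / Won't</span><h3>The moat comes from the operating loop</h3><p>A single chat UI is easy to copy; what is hard to copy is private instances, task evidence, acceptance feedback, accumulated skills and memory, and data from customers' real workflows.</p></article>
+<article class="card"><span class="tag">6. Growth</span><h3>Sales-led + pilot-to-expansion</h3><p>Enter customers through pilots on high-value workflows, expand from one team to the department, then drive expansion with skills, templates, connectors, and governance capabilities.</p></article>
+<article class="card"><span class="tag">7. Capabilities</span><h3>Capabilities to strengthen</h3><p>Workflow templates, evidence narrative, Memory Control Center, Admin Health Console, connector security policy, skill evaluation gates.</p></article>
+<article class="card"><span class="tag">8. Can't / Won't</span><h3>The moat comes from the operating loop</h3><p>A single chat UI is easy to copy; what is hard to copy is private instances, task evidence, acceptance feedback, accumulated skills and memory, and data from customers' real workflows.</p></article>
 </div>
 </section>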
@@ -1209,29 +1207,12 @@
 <li>Don't start with a general-purpose AI assistant for everyone.</li>
 <li>Don't compete head-on with Dify/Stack AI over "who is better at building Agents".</li>
 <li>Don't promise all connectors and full autonomy too early.</li>
-<li>Don't let acceptance metrics, the roadmap, and the launch plan crowd out the main storyline up front.</li>
+<li>Don't let the roadmap and the launch plan crowd out the product discovery storyline up front.</li>
 </ul>
 </article>
 </div>
 </section>
-<section id="metrics">
-<div class="section-head">
-<div>
-<div class="eyebrow">Acceptance Metrics</div>
-<h2>Acceptance metrics come last</h2>
-</div>
-<p>These metrics serve only as exit criteria for later pilot acceptance; the roadmap and post-launch maintenance are not expanded in the first half of this page.</p>
-</div>
-<div class="grid-4">
-<div class="kpi"><span>North Star</span><b>Accepted tasks</b> per week per team</div>
-<div class="kpi"><span>30-day target</span><b>30+</b> real accepted tasks</div>
-<div class="kpi"><span>Reuse target</span><b>5</b> skills, 3 of them reused</div>
-<div class="kpi"><span>Safety target</span><b>0</b> critical incidents</div>
-</div>
-</section>
 <section id="sources">
 <div class="section-head">
 <div>
@@ -87,7 +87,6 @@ For product pilots:
 | Connector maturity varies by channel | Customer demos must avoid overpromising |
 | Multi-instance deployment is powerful but operationally sensitive | Pilot success depends on stable setup and clear runbooks |
 | Skill learning needs strong governance | Reuse can become risk if publishing is weak |
-| Metrics are not yet productized | Hard to prove pilot value without baseline and target |
 | Customer research is not yet captured | Current roadmap is inferred from implementation and product judgment |
 ## User Segments
@@ -345,51 +344,6 @@ Opportunity 3: I need successful work to become reusable.
 | Production writes through connectors without review | Trust risk |
 | Complex enterprise RBAC before pilot validation | May overbuild before segment clarity |
-## Metrics Dashboard
-### North Star Metric
-Accepted Agent Workflows:
-> Number of AI-assisted tasks or scheduled workflows accepted by users per active pilot team per week.
-Why this metric: it captures real delivered value better than messages sent, tokens used, or model calls.
-### Input Metrics
-| Metric | Definition | Target For Pilot |
-| --- | --- | --- |
-| Task Creation Rate | Tasks created / active users / week | Increasing weekly |
-| Acceptance Rate | Accepted task runs / completed task runs | >=60% in pilot |
-| Revision Rate | Runs needing revision / completed runs | Track down over time |
-| Evidence Coverage | Task runs with timeline/tool/artifact evidence / task runs | >=90% |
-| Skill Candidate Rate | Accepted tasks producing candidates / accepted tasks | >=20% after week 2 |
-| Skill Reuse Rate | Runs activating published pilot skills / task runs | >=15% after skills exist |
-| Scheduled Success Rate | Accepted scheduled outputs / scheduled runs | >=50% for selected workflows |
-| Deployment Success Time | Fresh deployment time to first working user | <2 hours for pilot |
-### Guardrail Metrics
-| Metric | Alert |
-| --- | --- |
-| Critical tool/security incident | Any occurrence |
-| Instance creation failure rate | >10% in pilot |
-| Provider configuration failure rate | >20% |
-| Task run failure rate | >20% for 2 consecutive days |
-| Connector side-effect incident | Any unintended external write |
-| User file permission/storage incident | Any cross-user or cross-instance leak |
-| p95 task completion latency | Exceeds pilot workflow tolerance |
-### Business Metrics
-- Pilot activation: teams reaching first accepted task.
-- Time to first accepted task.
-- Weekly active task users.
-- Repeated workflow count.
-- Skill reuse per team.
-- Customer-reported time saved.
-- Pilot conversion intent.
 ## Customer Research Plan
 No direct interview transcripts were provided. Research should start immediately before locking roadmap.
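The input and guardrail metrics defined in the Metrics Dashboard hunk above are plain ratios and threshold checks. A minimal Python sketch of three of the ratios and of the "task run failure rate" alert follows. The `TaskRun` record, its field names, and both function names are hypothetical and chosen only for illustration; only the metric definitions and the 20% / 2-day threshold come from the tables.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    # Hypothetical record; field names are illustrative assumptions.
    completed: bool
    accepted: bool = False
    needs_revision: bool = False
    has_evidence: bool = False  # timeline, tool, or artifact evidence exists


def ratio(numerator: int, denominator: int) -> float:
    """Safe division: an empty denominator yields 0.0 instead of an error."""
    return numerator / denominator if denominator else 0.0


def input_metrics(runs: list[TaskRun]) -> dict[str, float]:
    completed = [r for r in runs if r.completed]
    return {
        # Accepted task runs / completed task runs
        "acceptance_rate": ratio(sum(r.accepted for r in completed), len(completed)),
        # Runs needing revision / completed runs
        "revision_rate": ratio(sum(r.needs_revision for r in completed), len(completed)),
        # Task runs with evidence / all task runs
        "evidence_coverage": ratio(sum(r.has_evidence for r in runs), len(runs)),
    }


def failure_alert(daily_failure_rates: list[float],
                  threshold: float = 0.20, days: int = 2) -> bool:
    """True if the failure rate exceeds the threshold on `days` consecutive days."""
    streak = 0
    for rate in daily_failure_rates:
        streak = streak + 1 if rate > threshold else 0
        if streak >= days:
            return True
    return False
```

Keeping the ratio definitions in one function makes it easier to compare pilot results against the stated targets (for example, acceptance rate >=60% and evidence coverage >=90%) without re-deriving the denominators each time.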
@@ -454,7 +408,7 @@ We are studying how teams move AI from chat into real work. We are not asking wh
 1. Pick 2-3 pilot workflows: project brief, weekly report, document review, support triage, or file processing.
 2. Run fresh deployment rehearsal from README/deployment guide and record gaps.
-3. Define pilot metrics and instrument accepted tasks, revisions, skill candidates, skill reuse, and run failures.
+3. Define pilot learning questions and instrument the events needed to answer them.
 4. Create a task evidence narrative prototype on top of existing timeline data.
 5. Package pilot workflow templates as skills or documented demos.
 6. Validate provider onboarding with 3 non-engineer users.
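@@ -733,7 +733,7 @@
 <span class="tag green">2. Contacts</span>
 <h3>Key roles</h3>
 <ul>
-<li>Product lead: defines the first scenarios, acceptance metrics, and module priorities.</li>
+<li>Product lead: defines the first scenarios, pilot questions, and module priorities.</li>
 <li>Engineering lead: ensures the instance, task, tool, skill, and connector architecture is implementable.</li>
 <li>Design lead: ensures the workspace, task details, skill review, and configuration experience are understandable.</li>
 <li>Operations lead: ensures deployment, routing, logging, backup, and failure recovery are executable.</li>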