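steven_li 8aeb97a5fc feat(app): remove built-in agents, add CORS support, and improve skill upload

Removed all built-in agent configurations from agents/registry.json, leaving the agents array empty.
Added CORS middleware support to the web app, allowing cross-origin access from specified frontend addresses.
Refactored skill upload to add an LLM rewrite step that automatically normalizes the format of uploaded skills.
Added tool-name extraction logic that automatically detects the Required Tools section in the skill body.
Updated the payload structure of skill learning candidates and drafts to include evaluation report statistics.
Revised the description of the intent-routing skill and improved the task state management logic.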

2026-06-12 13:25:20 +08:00

Beaver Product Discovery Report

Date: 2026-06-09

Product stage: existing product

Scope: the whole Beaver product, including deployment, runtime, UI, Agent execution, tasks, files, tools, skills, memory, connectors, scheduled work, governance, validation, launch, and maintenance.

Executive Summary

Beaver is an enterprise Agent sandbox and execution platform. Its product promise is to move AI from "chat that gives answers" to "controlled Agent work that creates deliverables, records evidence, asks for acceptance, and turns accepted work into reusable capability."

The strongest product wedge is not another chatbot UI. It is the full execution loop:

user request
  -> task recognition
  -> Agent/team execution
  -> tool and file work
  -> evidence timeline
  -> user acceptance or revision
  -> skill and memory learning
  -> future reuse
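
One way to read the loop above is as a small state machine. The sketch below is illustrative only: the `Stage` names mirror the steps listed in this report, but the `advance` helper and the rule that a revision returns to execution are assumptions, not taken from the Beaver codebase.

```python
from enum import Enum


class Stage(Enum):
    REQUEST = "user request"
    RECOGNITION = "task recognition"
    EXECUTION = "Agent/team execution"
    TOOL_WORK = "tool and file work"
    EVIDENCE = "evidence timeline"
    REVIEW = "user acceptance or revision"
    LEARNING = "skill and memory learning"
    REUSE = "future reuse"


# Forward order of the loop, in the order the report lists the steps.
ORDER = list(Stage)


def advance(stage: Stage, accepted: bool = True) -> Stage:
    """Return the next stage of the loop.

    Assumption: a revision request at REVIEW sends the task back to EXECUTION.
    """
    if stage is Stage.REVIEW and not accepted:
        return Stage.EXECUTION
    if stage is Stage.REUSE:
        raise ValueError("REUSE is the last stage of one pass through the loop")
    return ORDER[ORDER.index(stage) + 1]


# Walk one pass in which the first review asks for a revision.
stage = Stage.REQUEST
path = [stage]
revised_once = False
while stage is not Stage.REUSE:
    accepted = not (stage is Stage.REVIEW and not revised_once)
    if stage is Stage.REVIEW:
        revised_once = True
    stage = advance(stage, accepted)
    path.append(stage)
```

In this walk the task passes through execution, tool work, evidence, and review twice before it reaches learning, which is the behavior that separates a managed task from a single chat reply.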

The current codebase already supports major parts of this loop: multi-instance Docker deployment, auth portal, app instances, chat workbench, task center, task details, user acceptance, files, tools, skills, skill learning, marketplace, settings, connectors, scheduled jobs, and backend Agent team orchestration. The next product challenge is packaging these capabilities into a clear buyer story, validating the highest-value use cases, hardening operational reliability, and making governance understandable to non-engineer stakeholders.

Product Summary

Product Description

Beaver is a private-deployable Agent workspace for teams that need AI to perform work, not only answer questions. A user can chat, upload files, trigger tasks, review execution evidence, accept or revise results, manage tools, install or publish skills, configure model providers, connect external systems, and run scheduled work.

Target Users

Segment	Primary Need	Why Beaver Fits
Enterprise AI platform owner	Provide controlled Agent capability to teams	Private deployment, per-instance boundaries, tools, skills, governance
Knowledge workflow team	Finish recurring multi-step work faster	Task execution, files, tools, acceptance, scheduled work
Project / delivery team	Produce and revise deliverables with traceability	Task timeline, artifacts, evidence, revision loop
Engineering / support team	Use AI with files, commands, logs, and review	Tool execution, task evidence, multi-agent planning
Operations / admin	Configure models, users, connectors, and instances	Auth portal, deploy control, settings, status, logs
Skill owner / reviewer	Turn successful work into reusable methods	Skill candidates, drafts, safety/eval reports, review, publish

Current Feature Map

Domain	Current State	Product Meaning
Auth and onboarding	Auth portal, register/login, model provider onboarding	Users can enter a controlled workspace
Multi-instance deployment	Deploy control creates isolated app-instance containers; router proxy routes by host	Enables per-user or per-team sandboxing
Chat workbench	Conversations, attachments, task cards, current task progress, acceptance controls	Main user workspace
Task runtime	Auto task recognition, task creation, runs, timeline, status, acceptance	Converts chat into managed work
Agent execution	Unified engine, main agent, sub-agent/team execution, sequence/parallel/DAG coordinator	Handles complex work beyond one response
Tools	Built-in tools, MCP tools, tool management UI	Lets Agents act on files, web, terminal, integrations
Files	Workspace file browser, upload, preview, download, delete	Gives AI and users a shared working surface
Skills	Published skills, candidates, drafts, safety/eval, review, publish	Turns accepted work into reusable methods
Marketplace	Skill discovery/install flow	Foundation for capability distribution
Memory	Backend long-term memory foundation exists; product integration is still incomplete	Future compounding personalization and organization knowledge
Scheduled work	Cron jobs, notifications, scheduled task flows	Moves from reactive chat to proactive work
Connectors	Outlook and external connector architecture; Feishu/Weixin-related sidecar paths	Brings Agent into real business channels
Settings/status/logs	Provider config, agent config, channel config, runtime status, restart	Admin control and troubleshooting

Current Value Proposition

For enterprise teams:

Beaver provides a private Agent workspace where AI work is executed, tracked, reviewed, and reused. It gives teams the speed of AI assistance with the control needed for real business workflows.

For product pilots:

Beaver is strongest when a team has recurring knowledge work that crosses files, tools, systems, and reviews.

Current Challenges

Challenge	Why It Matters
Product breadth is large	Buyers may not understand what to adopt first
Memory is partly backend-ready but not fully productized	The "understands you better the more you use it" story needs visible control
Connector maturity varies by channel	Customer demos must avoid overpromising
Multi-instance deployment is powerful but operationally sensitive	Pilot success depends on stable setup and clear runbooks
Skill learning needs strong governance	Reuse can become risk if publishing is weak
Customer research is not yet captured	Current roadmap is inferred from implementation and product judgment

User Segments

Segment 1: Enterprise AI Platform Owner

They need to safely introduce Agent capability into an organization. Their concern is not whether an LLM can answer a question; it is whether teams can use it without losing control of data, tools, cost, and quality.

Segment 2: Workflow Owner

They own a recurring process such as weekly reporting, project status, proposal drafting, research, operations follow-up, support triage, or document review. They want less manual coordination and more repeatable output.

Segment 3: Individual Knowledge Worker

They want one workspace where they can chat, upload files, run tools, generate artifacts, and continue a task until the output is usable.

Segment 4: Admin / Operator

They need to create instances, configure models, monitor status, debug logs, manage connectors, and keep deployment safe.

Segment 5: Skill Maintainer

They curate reusable skills, review drafts, evaluate safety, publish stable versions, and prevent low-quality automation from spreading.

JTBD

User	Job Story	Current Alternative	Beaver Outcome
Platform owner	When teams ask for AI tools, I want a controlled Agent workspace so they can experiment without unmanaged SaaS sprawl	ChatGPT accounts, custom scripts, internal demos	Private, governed Agent workspace
Workflow owner	When a recurring process takes many manual steps, I want AI to execute and track it so my team can review the result	Manual docs, spreadsheets, Slack/email coordination	Task with timeline, artifacts, acceptance
Knowledge worker	When I ask AI to produce something, I want to revise and accept it as work, not just receive a message	Chat thread and copy/paste	Task lifecycle and deliverable loop
Admin	When a user registers, I want a workspace created and routed automatically so onboarding is repeatable	Manual container setup	Auth portal + deploy control + router proxy
Skill maintainer	When a task succeeds, I want to turn its method into a reusable skill so future tasks improve	Prompt docs, tribal knowledge	Skill candidate/draft/review/publish
Security reviewer	When Agents use tools, I want evidence and boundaries so I can audit behavior	Opaque model/tool calls	Tool traces, task evidence, instance sandbox

Opportunity Areas

Opportunity scores are qualitative estimates from current docs and product context. They need validation with customer interviews and pilot data.

Opportunity	Importance	Current Satisfaction	Opportunity Score	Notes
I need AI outputs to become reviewable tasks, not loose chat replies	0.95	0.30	0.67	Core wedge
I need evidence of what the Agent did	0.90	0.35	0.59	Governance driver
I need repeatable workflows to become reusable skills	0.85	0.40	0.51	Learning moat
I need private deployment and instance boundaries	0.90	0.45	0.50	Enterprise adoption
I need AI to work across files, tools, and external systems	0.85	0.45	0.47	Workflow depth
I need proactive scheduled work, not only reactive chat	0.70	0.45	0.39	Expansion opportunity
I need memory that I can inspect and control	0.80	0.25	0.60	High future leverage
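
The report does not state how the Opportunity Score column is derived. Every row is consistent with Importance x (1 - Current Satisfaction), rounded half-up to two decimals, so the sketch below assumes that formula. The short row labels and the `opportunity_score` helper are hypothetical names chosen for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def opportunity_score(importance: str, satisfaction: str) -> Decimal:
    """Assumed formula: importance x (1 - satisfaction), rounded half-up."""
    raw = Decimal(importance) * (Decimal(1) - Decimal(satisfaction))
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (importance, current satisfaction) pairs copied from the table above.
rows = {
    "reviewable tasks": ("0.95", "0.30"),
    "evidence of Agent actions": ("0.90", "0.35"),
    "reusable skills": ("0.85", "0.40"),
    "private deployment": ("0.90", "0.45"),
    "files, tools, external systems": ("0.85", "0.45"),
    "proactive scheduled work": ("0.70", "0.45"),
    "inspectable memory": ("0.80", "0.25"),
}

scores = {name: opportunity_score(i, s) for name, (i, s) in rows.items()}

# Highest score first.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Decimal arithmetic is used so that values such as 0.585 round the same way as the table rather than depending on binary floating-point representation. Sorted by this score, inspectable memory (0.60) sits second, just ahead of evidence (0.59).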

Top opportunities:

Make AI work reviewable and acceptable.
Make process evidence and governance visible.
Turn accepted work into reusable skills and memory.

Product Positioning

Recommended primary positioning:

Beaver is an enterprise Agent execution and governance platform for repeatable knowledge work.

Supporting message:

It gives teams a private Agent sandbox where AI can use tools, manage files, execute tasks, record evidence, ask for acceptance, and learn reusable skills from approved work.

Avoid positioning Beaver as:

A generic chatbot.
A pure model gateway.
A standalone RPA replacement.
A developer-only Agent framework.
A marketplace-only skill product.

Competitive Frame

Category	Strength	Gap Beaver Addresses
AI chat apps	Fast answers and content generation	Weak task lifecycle, evidence, acceptance, and reuse
RPA / automation	Repeatable process execution	Rigid flows, harder natural-language adaptation
Agent frameworks	Developer flexibility	Missing complete user workspace and governance surface
Internal scripts	Fast local automation	Poor product UX, auditability, onboarding, and scaling
Enterprise AI platforms	Governance and admin	Often weak on task-level execution and skill learning loop

Product Ideas

Generated from PM, design, and engineering perspectives.

PM Ideas

Pilot Workflow Templates: package 3-5 high-value workflows such as weekly report, project brief, support triage, document review.
Team Workspace Mode: group multiple users under one organization workspace with shared skills and controlled memory.
Governance Scorecard: show evidence coverage, accepted tasks, skill reuse, failed runs, and tool risk.
Skill Quality Lifecycle: strengthen candidate -> draft -> safety -> eval -> review -> publish -> version rollback.
ROI Dashboard: measure time saved, accepted tasks, revision rounds, reusable skill adoption.
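
The Skill Quality Lifecycle idea above can be pictured as an ordered set of gates. This is a minimal sketch under stated assumptions: the stage names come from the report, while `promote`, `rollback`, and the rule that a skill advances one gate at a time are hypothetical and do not describe Beaver's actual implementation.

```python
# Stage names as listed in the report; rollback is modeled separately below.
STAGES = ["candidate", "draft", "safety", "eval", "review", "publish"]


def promote(current: str, gate_passed: bool) -> str:
    """Move a skill one stage forward only if its current gate passed."""
    idx = STAGES.index(current)  # raises ValueError for an unknown stage
    if current == "publish":
        raise ValueError("already published; use rollback to revert a version")
    if not gate_passed:
        return current
    return STAGES[idx + 1]


def rollback(versions: list[str]) -> list[str]:
    """Drop the latest published version, keeping at least one earlier version."""
    if len(versions) < 2:
        raise ValueError("no earlier version to roll back to")
    return versions[:-1]


# A skill that fails its safety gate stays where it is.
stage = promote("draft", gate_passed=True)   # moves to "safety"
stage = promote(stage, gate_passed=False)    # stays at "safety"
```

Keeping promotion to a single step per call means no gate can be skipped, which matches the report's position that human review stays mandatory before publishing.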

Design Ideas

Work Inbox: unify tasks, scheduled runs, notifications, and pending reviews.
Task Evidence Narrative: convert raw events into readable "what happened" timeline.
Memory Control Center: show what Beaver remembers, why, source, confidence, and edit/delete controls.
First-Run Product Tour: guide a new user from provider setup to first accepted task.
Admin Health Console: one page for instance, provider, connector, queue, and runtime health.

Engineering Ideas

Tenant/Workspace Policy Profiles: control allowed tools, connectors, memory behavior, and publish gates per deployment.
Connector Sandbox Layer: test external channel actions without touching production systems.
Unified Evidence Schema: normalize task, tool, artifact, skill, memory, and connector events.
Replay-Based Skill Evaluation: evaluate skill drafts against historical accepted runs.
Instance Lifecycle Automation: suspend, resume, backup, restore, rotate secrets, inspect health.
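
As a rough illustration of the Unified Evidence Schema idea, the sketch below normalizes the six event sources named above into one record type and renders a readable timeline from it. All field names, the `EvidenceEvent` class, and the `timeline` helper are assumptions made for this example, not Beaver's schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any

# The six event sources named in the Unified Evidence Schema idea.
EVENT_KINDS = {"task", "tool", "artifact", "skill", "memory", "connector"}


@dataclass(frozen=True)
class EvidenceEvent:
    kind: str
    task_id: str
    summary: str
    occurred_at: datetime
    payload: dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.kind not in EVENT_KINDS:
            raise ValueError(f"unknown evidence kind: {self.kind}")


def timeline(events: list[EvidenceEvent], task_id: str) -> list[str]:
    """Render one task's events as readable lines, oldest first."""
    own = sorted(
        (e for e in events if e.task_id == task_id),
        key=lambda e: e.occurred_at,
    )
    return [f"{e.occurred_at:%H:%M} [{e.kind}] {e.summary}" for e in own]


# Hypothetical events for one task, deliberately out of order.
events = [
    EvidenceEvent("artifact", "t-1", "wrote brief.md",
                  datetime(2026, 6, 9, 9, 5, tzinfo=timezone.utc)),
    EvidenceEvent("tool", "t-1", "searched workspace files",
                  datetime(2026, 6, 9, 9, 0, tzinfo=timezone.utc)),
]
lines = timeline(events, "t-1")
```

A single record type like this would also give the Task Evidence Narrative one input format to read from, rather than one per subsystem.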

Top 5 product ideas to pursue:

Rank	Idea	Why Selected	Assumptions
1	Pilot Workflow Templates	Gives customers a concrete starting point	Initial buyers share common workflows
2	Task Evidence Narrative	Makes governance understandable	Reviewers value readable evidence
3	Memory Control Center	Unlocks long-term differentiation	Users trust memory if they can inspect/control it
4	Governance Scorecard	Helps buyers justify adoption	Platform owners need measurable proof
5	Instance Lifecycle Automation	Reduces pilot operational risk	Deployments will grow beyond a few instances

Key Assumptions

Assumption	Category	Impact	Uncertainty
Enterprise teams feel enough pain with chat-only AI to adopt an Agent workspace	Value	High	Medium
Task acceptance is a meaningful quality signal	Value	High	Medium
Users will tolerate a task workflow instead of expecting instant chat only	Usability	High	Medium
Per-instance deployment is operationally acceptable for early customers	Feasibility	High	Medium
Workflow owners can identify repeatable tasks worth piloting	Value	High	Low
Skill reuse creates visible productivity gains	Business Viability	High	High
Memory control is required before customers trust long-term memory	Trust	High	Medium
Connectors are necessary for customer stickiness	Value	Medium	Medium
Admins can manage model provider configuration without heavy support	Usability	Medium	Medium
The team can maintain broad product surface without quality drift	Team Capability	High	High

Prioritized Assumptions

P0 Validate Immediately

Assumption	Why It Matters	What Could Go Wrong	Validation
Customers prefer task-based AI execution over chat-only for real work	Core product wedge	Users see tasks as overhead	Run 3 workflow pilots and compare chat-only vs task loop
Evidence timeline increases trust	Governance story depends on it	Evidence is too technical or noisy	Reviewer usability test with task timelines
Private multi-instance deployment is acceptable	Adoption depends on ops fit	Setup too fragile or expensive	Deploy pilot on fresh Linux host and measure time/errors
Accepted tasks can generate reusable skills that users value	Learning loop depends on this	Skills are low quality or unused	Track reuse of skills from accepted pilot tasks

P1 Important

Assumption	Why It Matters	Validation
Memory control center is required before broad rollout	Trust and differentiation	Interview pilot admins and users
Connectors drive retention	External systems make workflows real	Compare pilot workflows with and without Outlook/IM connectors
Scheduled work creates high-value usage	Moves Beaver from reactive to proactive	Test weekly report and reminder workflows
Marketplace/skill distribution is a buyer requirement	Scaling reuse across teams	Ask platform owners during procurement

P2 Later

Assumption	Why It Matters	Validation
Multi-user team workspace is required for first paid pilots	Could reshape architecture	Validate with buyer interviews
Fine-grained per-tool policies are needed in UI	Admin complexity	Observe support requests
Cross-instance organization analytics is needed early	Enterprise reporting	Validate after 2-3 pilots

Opportunity Solution Tree

Desired outcome:

Within 90 days, prove that a pilot team can complete repeatable AI-assisted work with acceptance, evidence, and reuse: at least 30 accepted tasks, 5 reusable skills, 2 recurring workflows, and 0 critical deployment/security incidents.
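
The targets in the desired outcome are concrete enough to check mechanically. The sketch below encodes the four thresholds stated above; the `PilotMetrics` fields and the `unmet_targets` helper are hypothetical names, and how each count would be collected is left open.

```python
from dataclasses import dataclass


@dataclass
class PilotMetrics:
    accepted_tasks: int
    reusable_skills: int
    recurring_workflows: int
    critical_incidents: int


def unmet_targets(m: PilotMetrics) -> list[str]:
    """List the 90-day targets from the desired outcome that are not yet met."""
    checks = {
        "at least 30 accepted tasks": m.accepted_tasks >= 30,
        "at least 5 reusable skills": m.reusable_skills >= 5,
        "at least 2 recurring workflows": m.recurring_workflows >= 2,
        "0 critical deployment/security incidents": m.critical_incidents == 0,
    }
    return [name for name, ok in checks.items() if not ok]


# Example mid-pilot snapshot with made-up numbers.
snapshot = PilotMetrics(
    accepted_tasks=18, reusable_skills=5,
    recurring_workflows=1, critical_incidents=0,
)
gaps = unmet_targets(snapshot)
```

For this made-up snapshot, `gaps` names the accepted-task and recurring-workflow targets, which is the kind of summary a pilot scorecard could surface.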

Outcome: Trusted repeatable Agent work in pilot teams

Opportunity 1: I need AI outputs to become reviewable deliverables.
  Solution 1.1: Task lifecycle with acceptance and revision.
    Experiment: Run a project brief workflow and measure accepted output rate.
  Solution 1.2: Task details page with evidence narrative.
    Experiment: Ask reviewers to reconstruct what happened from timeline.
  Solution 1.3: Work Inbox for pending reviews and scheduled outputs.
    Experiment: Fake-door navigation item and measure clicks/asks.

Opportunity 2: I need confidence that Agent tool use is controlled.
  Solution 2.1: Tool traces and artifact timeline.
    Experiment: Security review of 5 real tasks.
  Solution 2.2: Admin health and policy console.
    Experiment: Operator performs setup/debug checklist on fresh instance.
  Solution 2.3: Connector sandbox and side-effect journals.
    Experiment: Test external send/reply flows in sandbox mode.

Opportunity 3: I need successful work to become reusable.
  Solution 3.1: Skill candidate -> draft -> review -> publish.
    Experiment: Convert 5 accepted tasks into skills and track reuse.
  Solution 3.2: Memory Control Center.
    Experiment: Prototype memory review UI and test trust/comprehension.
  Solution 3.3: Pilot workflow templates.
    Experiment: Package 3 templates and measure first-task success rate.

Validation Experiments

Assumption	Hypothesis	Experiment	Duration	Success Criteria
Task loop beats chat-only	Users complete more usable work with task acceptance than plain chat	Same workflow performed in chat-only and Beaver task loop	1 week	Beaver output accepted in fewer revision rounds
Evidence creates trust	Reviewers can understand and audit what happened	Give 5 timelines to reviewers	2 days	>=80% identify tools, artifacts, result, and risk
Deployment is pilot-ready	Fresh host setup is repeatable	Deploy on clean Linux/WSL2 machine using docs	1 day	Setup under 2 hours with no undocumented step
Skills create reuse	Accepted tasks can become useful skills	Convert 5 pilot tasks into skills	2 weeks	3 skills reused at least twice
Memory needs control UI	Users trust memory more with inspect/edit/delete	Clickable prototype or simple page	3 days	>=80% say they would enable memory with controls
Scheduled work matters	Recurring workflows create repeat usage	Weekly report or reminder pilot	2-4 weeks	At least 2 recurring jobs run and get accepted outputs

Feature Prioritization

Must Have

Feature	Impact	Effort	Risk	Reason
Auth portal and instance onboarding	High	High	Medium	Required for any user to start
Provider configuration flow	High	Medium	Medium	Model access is prerequisite
Chat workbench	High	High	Medium	Primary user surface
Task lifecycle and acceptance	High	High	Medium	Core differentiation
Task timeline/evidence	High	High	Medium	Governance and review
Files workspace	High	Medium	Medium	Most real workflows need files
Tool management	High	Medium	High	Agents need controlled action surface
Skills review/publish	High	High	High	Reuse loop
Settings/status/logs	High	Medium	Medium	Operational support
Basic deployment guide/runbook	High	Medium	Medium	Pilot readiness

Should Have

Feature	Impact	Effort	Risk	Reason
Pilot workflow templates	High	Medium	Low	Creates adoption path
Evidence narrative layer	High	Medium	Medium	Makes audit readable
Memory Control Center	High	High	Medium	Unlocks long-term trust
Skill replay/eval hardening	High	High	High	Makes learning safer
Scheduled workflow polish	Medium	Medium	Medium	Supports proactive use cases
Connector onboarding polish	Medium	High	High	Needed for real systems
Admin health console	Medium	Medium	Medium	Reduces support load

Could Have

Feature	Reason
Multi-user organization workspace	Valuable, but changes scope and permissions
Cross-instance analytics	Useful after multiple deployments
Fine-grained policy UI	Need policy demand before UI complexity
Audit export	Strong sales support, not first pilot blocker
Cost/quality model router	Useful after usage volume grows

Not Yet

Feature	Reason
Broad public SaaS launch	Product and ops need pilot hardening first
Fully autonomous publish of skills	Human review should remain mandatory
Production writes through connectors without review	Trust risk
Complex enterprise RBAC before pilot validation	May overbuild before segment clarity

Customer Research Plan

No direct interview transcripts were provided. Research should start immediately, before the roadmap is locked.

Participants

5 knowledge workers with recurring document/report/research workflows.
3 workflow owners or team leads.
3 enterprise AI platform/admin stakeholders.
2 security or IT reviewers.
2 engineers/operators who would deploy and maintain Beaver.

Questions

What recurring work is painful enough to delegate to an Agent?
What would make an AI output "acceptable" instead of just "interesting"?
What evidence do you need to trust Agent work?
What systems must the Agent connect to for the workflow to matter?
What would make you stop a pilot?
What memory or reuse behavior feels helpful vs risky?
What does a successful 30-day pilot need to prove?

Interview Guide

Opening

We are studying how teams move AI from chat into real work. We are not asking whether you like an idea. We want examples of work you recently did.

Current Behavior

Walk me through the last time you used AI for a real work deliverable.
What happened after the AI gave an answer?
What did you copy, edit, verify, or redo manually?
Who reviewed the result?

Pain

What was the slowest or most annoying part?
What made the output hard to trust?
What tools or files were involved?
What evidence did you need but did not have?

Reuse

Have you repeated a similar workflow since then?
Did you reuse prompts, templates, scripts, or notes?
What would make that reuse safe for a team?

Governance

What AI actions would need approval?
What data or tools should be off limits?
Who needs to see the history of what happened?

Pilot

Which one workflow would you test first?
What result would make you expand usage?
What failure would make you stop?

Recommended Next 30 Days

Pick 2-3 pilot workflows: project brief, weekly report, document review, support triage, or file processing.
Run a fresh deployment rehearsal from the README/deployment guide and record gaps.
Define pilot learning questions and instrument the events needed to answer them.
Create a task evidence narrative prototype on top of existing timeline data.
Package pilot workflow templates as skills or documented demos.
Validate provider onboarding with 3 non-engineer users.
Run security review for file boundaries, tool execution, connectors, and deploy-control exposure.
Decide which connector(s) are pilot-safe.

Recommended Next 90 Days

Complete Memory Control Center MVP.
Harden skill learning with replay/eval and publish gates.
Add Admin Health Console for provider, instance, connector, task queue, and runtime status.
Improve instance lifecycle: suspend, resume, backup, restore, rotate secrets.
Add customer-facing pilot scorecard.
Formalize tool/connector policy profiles.
Expand pilot from one workflow to one department.
Build audit export after evidence narrative stabilizes.

Biggest Risks

Risk	Severity	Mitigation
Product appears too broad and hard to adopt	High	Lead with pilot workflows and task loop
Deployment complexity blocks pilots	High	Rehearsed runbook, health checks, support checklist
Agent actions cause unintended side effects	Critical	Conservative tool policy, explicit connector sandboxing, evidence logs
Task evidence is too technical	High	Evidence narrative and reviewer testing
Skill learning publishes poor methods	High	Human review, safety/eval, replay validation
Memory feels creepy or uncontrollable	High	Memory control UI before broad enablement
Team spreads effort across too many modules	High	Prioritize task loop, evidence, skills, deployment reliability

Recommended Immediate Actions

Reframe all main product docs around Beaver as an Agent execution and governance platform.
Treat Skill Replay Eval as a subfeature under the skill governance loop.
Build the next roadmap around pilot workflows, not isolated modules.
Make accepted tasks the main success metric.
Productize memory and evidence before adding many new connectors.
Prove deployment repeatability before selling broad private deployments.
