Files
beaver_project/docs/superpowers/plans/2026-06-15-plugin-skill-mirroring.md

59 KiB

Plugin Skill Mirroring And Upgrade Learning Implementation Plan

For agentic workers: REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (- [ ]) syntax for tracking.

Goal: Add declarative Beaver plugins whose skills are mirrored as normal managed skills, learn normally, and merge plugin upgrades through the existing safety, replay evaluation, review, publish, and rollback lifecycle.

Architecture: A new beaver.plugins package discovers and validates beaver.plugin.json, computes content and full-tree hashes, persists enable/sync state, and stages immutable upstream/version trees before atomic promotion under a workspace write lock. Plugin upgrades become deterministic plugin_skill_update learning candidates using old upstream, current local, and new upstream inputs; the existing learning pipeline remains the only path for update publication, with sync-time reconciliation repairing failed state acknowledgements.

Tech Stack: Python dataclasses and file-backed JSON stores, existing SkillSpecStore and skill-learning pipeline, FastAPI, pytest, Next.js/TypeScript, existing shadcn UI components.


Scope

This plan implements declarative skill plugins only. Do not add Python plugin entrypoints, hooks, providers, channels, dependency installation, or marketplace download support. Plugin-provided tools continue to use MCP.

File Structure

Create focused plugin modules:

  • app-instance/backend/beaver/plugins/models.py: manifest, discovery, state, and sync result dataclasses.
  • app-instance/backend/beaver/plugins/manifest.py: JSON parsing, identifier validation, and contained-path validation.
  • app-instance/backend/beaver/plugins/hashing.py: canonical skill-content and full-tree hashing.
  • app-instance/backend/beaver/plugins/tree_merge.py: deterministic three-way supporting-file merge plans.
  • app-instance/backend/beaver/plugins/state.py: atomic .beaver/plugins/state.json persistence.
  • app-instance/backend/beaver/plugins/discovery.py: scan workspace and configured plugin roots.
  • app-instance/backend/beaver/plugins/transaction.py: same-filesystem staging and immutable directory promotion.
  • app-instance/backend/beaver/plugins/skills.py: initial mirror, update classification, candidate creation, reconciliation, pause/resume, disable, and adopt.
  • app-instance/backend/beaver/plugins/__init__.py: public exports.
  • app-instance/backend/beaver/foundation/utils/file_lock.py: reentrant cross-process workspace write lock.

Modify skill lifecycle modules:

  • app-instance/backend/beaver/skills/specs/models.py: add upstream snapshot and draft provenance models.
  • app-instance/backend/beaver/skills/specs/storage.py: persist immutable upstream snapshots and safely copy supporting files.
  • app-instance/backend/beaver/memory/skills/store.py: lock candidate existence checks and JSONL mutations.
  • app-instance/backend/beaver/skills/drafts/service.py: create plugin update drafts.
  • app-instance/backend/beaver/skills/learning/service.py: synthesize plugin_skill_update.
  • app-instance/backend/beaver/skills/learning/synthesizer.py: three-way plugin merge prompt and result.
  • app-instance/backend/beaver/skills/learning/eval.py: plugin merge preservation report.
  • app-instance/backend/beaver/skills/learning/pipeline.py: acknowledge successful plugin update publication.
  • app-instance/backend/beaver/skills/publisher/service.py: carry draft provenance into published versions.

Modify runtime and management surfaces:

  • app-instance/backend/beaver/foundation/config/schema.py
  • app-instance/backend/beaver/foundation/config/loader.py
  • app-instance/backend/beaver/engine/loader.py
  • app-instance/backend/beaver/interfaces/web/app.py
  • app-instance/frontend/types/index.ts
  • app-instance/frontend/lib/api.ts
  • app-instance/frontend/app/(app)/skills/page.tsx

Add tests:

  • app-instance/backend/tests/unit/test_plugin_manifest.py
  • app-instance/backend/tests/unit/test_plugin_hashing.py
  • app-instance/backend/tests/unit/test_plugin_state.py
  • app-instance/backend/tests/unit/test_workspace_write_lock.py
  • app-instance/backend/tests/unit/test_plugin_skill_storage.py
  • app-instance/backend/tests/unit/test_plugin_skill_sync.py
  • app-instance/backend/tests/unit/test_plugin_skill_learning.py
  • app-instance/backend/tests/unit/test_plugin_runtime.py
  • app-instance/backend/tests/unit/test_plugin_web_api.py
  • app-instance/frontend/lib/plugin-api.test.ts

Task 1: Add Plugin Configuration And Manifest Models

Files:

  • Create: app-instance/backend/beaver/plugins/models.py

  • Create: app-instance/backend/beaver/plugins/manifest.py

  • Create: app-instance/backend/beaver/plugins/hashing.py

  • Create: app-instance/backend/beaver/plugins/__init__.py

  • Modify: app-instance/backend/beaver/foundation/config/schema.py

  • Modify: app-instance/backend/beaver/foundation/config/loader.py

  • Modify: app-instance/backend/beaver/foundation/config/__init__.py

  • Test: app-instance/backend/tests/unit/test_plugin_manifest.py

  • Test: app-instance/backend/tests/unit/test_plugin_hashing.py

  • Test: app-instance/backend/tests/unit/test_config_loader.py

  • Step 1: Write failing manifest validation tests

Create tests covering:

def test_load_plugin_manifest_accepts_declared_skill(tmp_path: Path) -> None:
    root = tmp_path / "comic"
    (root / "skills" / "comic").mkdir(parents=True)
    (root / "skills" / "comic" / "SKILL.md").write_text("# Comic\n", encoding="utf-8")
    (root / "beaver.plugin.json").write_text(
        json.dumps(
            {
                "schema_version": 1,
                "id": "baoyu-comic",
                "name": "Baoyu Comic",
                "version": "1.2.0",
                "skills": [{"name": "baoyu-comic", "path": "skills/comic"}],
            }
        ),
        encoding="utf-8",
    )

    manifest = load_plugin_manifest(root / "beaver.plugin.json")

    assert manifest.plugin_id == "baoyu-comic"
    assert manifest.skills[0].name == "baoyu-comic"
    assert manifest.skills[0].root == root / "skills" / "comic"


@pytest.mark.parametrize("value", ["../outside", "/absolute", "skills/../../outside"])
def test_load_plugin_manifest_rejects_escaping_skill_path(tmp_path: Path, value: str) -> None:
    path = tmp_path / "beaver.plugin.json"
    path.write_text(
        json.dumps(
            {
                "schema_version": 1,
                "id": "unsafe",
                "name": "Unsafe",
                "version": "1.0.0",
                "skills": [{"name": "unsafe", "path": value}],
            }
        ),
        encoding="utf-8",
    )

    with pytest.raises(ValueError, match="contained"):
        load_plugin_manifest(path)

Also test invalid IDs, duplicate skill names, unsupported schema versions, missing SKILL.md, and symlinked skill roots.

Add tree-hash tests:

def test_skill_tree_hash_changes_when_supporting_file_changes(tmp_path: Path) -> None:
    root = tmp_path / "skill"
    root.mkdir()
    (root / "SKILL.md").write_text("# Skill\n", encoding="utf-8")
    (root / "templates").mkdir()
    template = root / "templates" / "report.md"
    template.write_text("v1", encoding="utf-8")

    first = hash_plugin_skill_tree(root)
    template.write_text("v2", encoding="utf-8")
    second = hash_plugin_skill_tree(root)

    assert first.skill_content_hash == second.skill_content_hash
    assert first.skill_tree_hash != second.skill_tree_hash

Also verify path changes and executable-bit changes affect skill_tree_hash, while mtime and non-executable permission changes do not.

  • Step 2: Run tests and verify failure

Run:

cd app-instance/backend
pytest tests/unit/test_plugin_manifest.py tests/unit/test_plugin_hashing.py tests/unit/test_config_loader.py -q

Expected: FAIL because beaver.plugins and PluginsConfig do not exist.

  • Step 3: Implement immutable plugin models and config

Put plugin package models in beaver/plugins/models.py:

@dataclass(frozen=True, slots=True)
class PluginSkillDeclaration:
    name: str
    relative_path: str
    root: Path


@dataclass(frozen=True, slots=True)
class PluginManifest:
    schema_version: int
    plugin_id: str
    name: str
    version: str
    root: Path
    manifest_path: Path
    display_path: str
    skills: tuple[PluginSkillDeclaration, ...]


@dataclass(frozen=True, slots=True)
class PluginSkillFileDigest:
    path: str
    size: int
    executable: bool
    content_hash: str


@dataclass(frozen=True, slots=True)
class PluginSkillTreeDigest:
    skill_content_hash: str
    skill_tree_hash: str
    files: tuple[PluginSkillFileDigest, ...]


Put configuration in beaver/foundation/config/schema.py to preserve the foundation layer and avoid importing plugin runtime modules from config:

@dataclass(slots=True)
class PluginsConfig:
    search_paths: list[str] = field(default_factory=list)
    auto_sync: bool = True

Add plugins: PluginsConfig to BeaverConfig. Parse both camelCase and snake_case:

def _parse_plugins(raw: Any) -> PluginsConfig:
    data = _as_dict(raw)
    return PluginsConfig(
        search_paths=_string_list(data.get("searchPaths") or data.get("search_paths")),
        auto_sync=_bool(data.get("autoSync") if "autoSync" in data else data.get("auto_sync"), default=True),
    )
  • Step 4: Implement strict JSON manifest loading

load_plugin_manifest() must:

  1. parse a JSON object;
  2. require schema version 1;
  3. validate identifiers with ^[a-z0-9][a-z0-9_-]*$;
  4. resolve every skill root and check resolved.is_relative_to(plugin_root);
  5. reject symlinks in the path from plugin root to skill root;
  6. require a regular SKILL.md;
  7. initialize display_path without exposing an absolute path;
  8. return frozen dataclasses.
  • Step 5: Implement deterministic dual hashing

hash_plugin_skill_tree(root) must:

  1. reject symlinks and non-regular files;
  2. enumerate regular files by normalized POSIX relative path;
  3. compute skill_content_hash from normalized SKILL.md;
  4. compute skill_tree_hash from each path, byte length, file bytes, and one normalized executable-bit flag;
  5. include SKILL.md and every supporting file;
  6. exclude Beaver metadata such as version.json and upstream.json;
  7. ignore mtime, uid/gid, and non-executable mode bits.

Use length-prefixed binary fields in the digest input instead of ambiguous string concatenation.

  • Step 6: Run focused tests
cd app-instance/backend
pytest tests/unit/test_plugin_manifest.py tests/unit/test_plugin_hashing.py tests/unit/test_config_loader.py -q

Expected: PASS.

  • Step 7: Commit
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/foundation/config app-instance/backend/tests/unit/test_plugin_manifest.py app-instance/backend/tests/unit/test_plugin_hashing.py app-instance/backend/tests/unit/test_config_loader.py
git commit -m "feat(plugins): add declarative skill manifest"

Task 2: Add Discovery And Atomic Plugin State

Files:

  • Create: app-instance/backend/beaver/plugins/discovery.py

  • Create: app-instance/backend/beaver/plugins/state.py

  • Create: app-instance/backend/beaver/foundation/utils/file_lock.py

  • Modify: app-instance/backend/beaver/plugins/models.py

  • Modify: app-instance/backend/beaver/plugins/__init__.py

  • Test: app-instance/backend/tests/unit/test_plugin_state.py

  • Test: app-instance/backend/tests/unit/test_workspace_write_lock.py

  • Step 1: Write failing discovery and state tests

Cover workspace discovery, configured search paths, duplicate plugin IDs, malformed manifests reported as errors instead of crashing the full scan, and state round trips:

def test_plugin_state_round_trip_is_atomic(tmp_path: Path) -> None:
    store = PluginStateStore(tmp_path)
    store.set_enabled("baoyu-comic", True)
    store.update_skill_binding(
        "baoyu-comic",
        "baoyu-comic",
        PluginSkillBinding(
            accepted_upstream_tree_hash="old",
            observed_upstream_tree_hash="new",
            accepted_beaver_version="v0001",
            current_beaver_version="v0002",
            pending_candidate_id="plugin-update:baoyu-comic:baoyu-comic:new",
            status="update_pending",
        ),
    )

    reloaded = PluginStateStore(tmp_path).get_plugin("baoyu-comic")

    assert reloaded is not None
    assert reloaded.enabled is True
    assert reloaded.skills["baoyu-comic"].accepted_upstream_tree_hash == "old"
    assert not (tmp_path / ".beaver" / "plugins" / "state.json.tmp").exists()

Add a multiprocess lock test in which two processes enter the same workspace lock and assert their critical sections never overlap. Add a reentrancy test in which nested acquisitions in one process complete without deadlock.

  • Step 2: Run tests and verify failure
cd app-instance/backend
pytest tests/unit/test_plugin_state.py tests/unit/test_workspace_write_lock.py -q

Expected: FAIL because discovery and state stores are missing.

  • Step 3: Implement state dataclasses

Add backward-compatible to_dict() and from_dict() methods for:

@dataclass(slots=True)
class PluginSkillBinding:
    accepted_upstream_tree_hash: str | None = None
    observed_upstream_tree_hash: str | None = None
    accepted_beaver_version: str | None = None
    current_beaver_version: str | None = None
    pending_candidate_id: str | None = None
    status: str = "discovered"
    last_error: str | None = None


@dataclass(slots=True)
class PluginState:
    plugin_id: str
    enabled: bool = False
    updates_paused: bool = False
    installed_version: str | None = None
    manifest_path: str | None = None
    status: str = "discovered"
    last_error: str | None = None
    skills: dict[str, PluginSkillBinding] = field(default_factory=dict)
  • Step 4: Implement atomic state persistence

Store data at <workspace>/.beaver/plugins/state.json. Write a complete JSON document to state.json.tmp, flush it, then replace state.json. Public methods:

list_plugins()
get_plugin(plugin_id)
set_enabled(plugin_id, enabled)
upsert_plugin(plugin_state)
update_skill_binding(plugin_id, skill_name, binding)
  • Step 5: Implement the shared workspace write lock

Add:

class WorkspaceWriteLock:
    def __init__(self, workspace: str | Path) -> None:
        self.path = Path(workspace) / ".beaver" / "locks" / "plugin-skill-write.lock"

    @contextmanager
    def acquire(self, *, timeout_seconds: float | None = None, blocking: bool = True):
        ...

Requirements:

  • use fcntl.flock() on POSIX and msvcrt.locking() on Windows, matching memory/curated/store.py;

  • guard with a process-local threading.RLock;

  • track per-thread recursion depth so nested store calls reuse the OS lock;

  • support non-blocking acquisition for Engine boot;

  • raise WorkspaceWriteLockBusy on timeout/contention;

  • keep the lock file separate from atomically replaced data files.

  • Step 6: Implement discovery

Scan:

  1. <workspace>/plugins;
  2. each configured plugins.search_paths.

Only direct child directories containing beaver.plugin.json are plugins. Return a PluginDiscoveryResult containing valid manifests and per-path errors. Duplicate IDs are errors and neither duplicate is activated. Discovery records a workspace-relative manifest display path when possible and a redacted <external>/<plugin-dir>/beaver.plugin.json path otherwise; absolute paths remain internal.

  • Step 7: Run focused tests
cd app-instance/backend
pytest tests/unit/test_plugin_state.py tests/unit/test_workspace_write_lock.py tests/unit/test_plugin_manifest.py -q

Expected: PASS.

  • Step 8: Commit
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/foundation/utils/file_lock.py app-instance/backend/tests/unit/test_plugin_state.py app-instance/backend/tests/unit/test_workspace_write_lock.py
git commit -m "feat(plugins): discover packages and persist state"

Task 3: Persist Immutable Upstream Skill Snapshots

Files:

  • Create: app-instance/backend/beaver/plugins/transaction.py

  • Modify: app-instance/backend/beaver/skills/specs/models.py

  • Modify: app-instance/backend/beaver/skills/specs/storage.py

  • Modify: app-instance/backend/beaver/skills/specs/__init__.py

  • Test: app-instance/backend/tests/unit/test_plugin_skill_storage.py

  • Step 1: Write failing snapshot storage tests

Test exact content, supporting files, idempotence, symlink rejection, and source immutability:

def test_write_upstream_snapshot_copies_skill_without_mutating_source(tmp_path: Path) -> None:
    source = tmp_path / "plugin" / "skills" / "comic"
    source.mkdir(parents=True)
    (source / "SKILL.md").write_text("# Comic\n\nOriginal.\n", encoding="utf-8")
    (source / "templates").mkdir()
    (source / "templates" / "panel.txt").write_text("panel", encoding="utf-8")
    store = SkillSpecStore(tmp_path / "workspace")
    transaction = PluginSkillTransaction(tmp_path / "workspace")

    snapshot = store.stage_upstream_snapshot(
        transaction,
        skill_name="baoyu-comic",
        source_kind="plugin",
        source_id="baoyu-comic",
        source_version="1.0.0",
        source_path="skills/comic",
        source_root=source,
    )
    store.promote_upstream_snapshot(transaction, snapshot)

    loaded = store.read_upstream_snapshot("baoyu-comic", "baoyu-comic", snapshot.skill_tree_hash)
    assert loaded is not None
    assert loaded.content == "# Comic\n\nOriginal.\n"
    assert (loaded.root / "templates" / "panel.txt").read_text(encoding="utf-8") == "panel"
    assert (source / "SKILL.md").read_text(encoding="utf-8") == "# Comic\n\nOriginal.\n"

Also test:

  • changing only templates/panel.txt creates a different snapshot directory;

  • SkillVersion.from_dict() remains compatible without tree_hash;

  • reading a legacy version derives its complete tree hash;

  • staging does not make a snapshot visible to read_upstream_snapshot();

  • promoting a staged snapshot uses os.replace() and is idempotent;

  • a failed metadata write leaves no current pointer to the staged version.

  • Step 2: Run test and verify failure

cd app-instance/backend
pytest tests/unit/test_plugin_skill_storage.py -q

Expected: FAIL because upstream snapshot APIs do not exist.

  • Step 3: Add upstream snapshot models

Add:

@dataclass(slots=True)
class SkillUpstreamSnapshot:
    skill_name: str
    source_kind: str
    source_id: str
    source_version: str
    source_path: str
    skill_content_hash: str
    skill_tree_hash: str
    created_at: str
    frontmatter: dict[str, Any] = field(default_factory=dict)

Add LoadedSkillUpstreamSnapshot(snapshot, content, root) for storage reads. Extend SkillVersion with a backward-compatible tree_hash: str = ""; new versions persist the complete version-tree hash, while read_published_skill() derives it for legacy metadata that lacks the field.

  • Step 4: Add safe tree-copy helper

Refactor a private SkillSpecStore._copy_regular_tree(source_root, target_root) that:

  • rejects any symlink;
  • rejects paths containing empty, ., or .. segments;
  • copies regular files only;
  • creates parents;
  • never writes outside target_root.

Use it for transaction staging now; Task 4 will reuse it for mirrored versions.

  • Step 5: Implement same-filesystem staging and promotion

PluginSkillTransaction creates:

<workspace>/.beaver/staging/plugin-skills/<transaction-id>/

The staging root must be on the same filesystem as <workspace>/skills. It exposes:

stage_upstream_snapshot(...)
stage_skill_version(...)
promote_directory(staged, final)
cleanup()

promote_directory() uses os.replace() and never replaces an existing non-identical immutable directory. Cleanup removes only the transaction's staging root.

  • Step 6: Implement snapshot APIs

Write snapshots to:

skills/<skill>/upstreams/<source-id>/<skill-tree-hash>/

The snapshot metadata stores both hashes. If the directory already exists, verify all stored metadata and return it without rewriting.

Public methods:

stage_upstream_snapshot(transaction, ...)
promote_upstream_snapshot(transaction, snapshot)
read_upstream_snapshot(skill_name, source_id, skill_tree_hash)
  • Step 7: Make JSON/current/index writes atomic

Change SkillSpecStore._write_json() and current/index pointer writes to create a temporary file in the target directory, flush and fsync, then os.replace(). Immutable version directories are promoted first; runtime visibility changes only when current.json, skill.json, and the published index are atomically replaced under the workspace lock.

  • Step 8: Run focused and existing storage tests
cd app-instance/backend
pytest tests/unit/test_plugin_skill_storage.py tests/unit/test_phase5_skills_runtime.py -q

Expected: PASS.

  • Step 9: Commit
git add app-instance/backend/beaver/plugins/transaction.py app-instance/backend/beaver/skills/specs app-instance/backend/tests/unit/test_plugin_skill_storage.py
git commit -m "feat(skills): store immutable plugin upstream snapshots"

Task 4: Mirror Initial Plugin Skills As First-Class Skills

Files:

  • Create: app-instance/backend/beaver/plugins/skills.py

  • Modify: app-instance/backend/beaver/plugins/models.py

  • Modify: app-instance/backend/beaver/plugins/__init__.py

  • Modify: app-instance/backend/beaver/skills/specs/storage.py

  • Test: app-instance/backend/tests/unit/test_plugin_skill_sync.py

  • Step 1: Write failing initial mirror tests

Cover:

  • enabling mirrors SKILL.md and supporting files;
  • mirrored skill is returned by SkillsLoader.list_published_skills();
  • source_kind is plugin, but runtime source is still workspace;
  • existing non-plugin name collision fails without modification;
  • any validation/safety failure in a multi-skill plugin occurs before promotion and leaves every linked skill unchanged;
  • repeated sync is idempotent.
  • supporting files are present in the promoted version;
  • concurrent enable calls allocate only one version.

Core assertion:

result = manager.enable("baoyu-comic")
record = SkillsLoader(workspace).get_skill_record("baoyu-comic")
loaded = SkillSpecStore(workspace).read_published_skill("baoyu-comic")

assert result.status == "synced"
assert record is not None and record.source == "workspace"
assert record.source_kind == "plugin"
assert loaded is not None
assert loaded.version.version == "v0001"
assert loaded.version.provenance["plugin_id"] == "baoyu-comic"
assert loaded.version.provenance["upstream_skill_content_hash"]
assert loaded.version.provenance["upstream_skill_tree_hash"]
  • Step 2: Run tests and verify failure
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py -q

Expected: FAIL because PluginManager does not exist.

  • Step 3: Implement PluginManager constructor and discovery view

Constructor dependencies:

class PluginManager:
    def __init__(
        self,
        *,
        workspace: Path,
        manifests: dict[str, PluginManifest],
        discovery_errors: list[PluginDiscoveryError],
        state_store: PluginStateStore,
        skill_store: SkillSpecStore,
        learning_store: SkillLearningStore,
        publisher: SkillPublisher,
        safety_checker: SkillDraftSafetyChecker,
        write_lock: WorkspaceWriteLock,
    ) -> None:
        ...

Keep all filesystem and lifecycle dependencies injectable for tests.

  • Step 4: Implement exact initial mirror publication

Acquire the workspace write lock before reading state, allocating versions, or writing candidates. For each declared skill:

  1. persist the upstream snapshot;
  2. validate ownership conflict;
  3. parse frontmatter/body and create an in-memory SkillDraft with proposal_kind="plugin_initial_mirror";
  4. run SkillDraftSafetyChecker.check() and reject failed or critical reports;
  5. allocate the next vNNNN while holding the lock;
  6. stage a SkillVersion whose content exactly equals upstream SKILL.md;
  7. stage snapshot supporting files into the version directory;
  8. generate the complete next SkillSpec, current pointer, index, and plugin-state JSON payloads in memory.

Use provenance:

{
    "source_kind": "plugin",
    "plugin_id": manifest.plugin_id,
    "plugin_version": manifest.version,
    "plugin_skill_path": declaration.relative_path,
    "upstream_skill_content_hash": snapshot.skill_content_hash,
    "upstream_skill_tree_hash": snapshot.skill_tree_hash,
    "merge_mode": "initial_mirror",
}
  • Step 5: Promote the complete staged transaction

After every declared skill passes validation:

  1. for a new skill, promote its complete staged skill directory with one os.replace();
  2. for an existing skill, promote immutable upstream/version directories, atomically replace spec/index metadata, and replace current.json last as the visibility switch;
  3. atomically write plugin state last;
  4. clean the staging directory.

Do not implement reverse rollback across already-promoted immutable directories. If a metadata write fails, those directories remain unreferenced and harmless; the previous current pointers remain authoritative. Add startup cleanup for staging directories older than 24 hours.

  • Step 6: Run focused and loader tests
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_phase5_skills_runtime.py -q

Expected: PASS.

  • Step 7: Commit
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/skills/specs/storage.py app-instance/backend/tests/unit/test_plugin_skill_sync.py
git commit -m "feat(plugins): mirror enabled plugin skills"

Task 5: Detect Upgrades And Create Idempotent Learning Candidates

Files:

  • Modify: app-instance/backend/beaver/plugins/skills.py

  • Modify: app-instance/backend/beaver/memory/skills/models.py

  • Modify: app-instance/backend/beaver/memory/skills/store.py

  • Test: app-instance/backend/tests/unit/test_plugin_skill_sync.py

  • Test: app-instance/backend/tests/unit/test_skill_learning_candidate_state.py

  • Step 1: Write failing upgrade classification tests

Create four tree-hash fixtures representing B, L, and U:

@pytest.mark.parametrize(
    ("base", "local", "upstream", "expected"),
    [
        ("A", "A", "A", "unchanged"),
        ("A", "B", "B", "already_applied"),
        ("A", "A", "B", "fast_forward"),
        ("A", "LOCAL", "UPSTREAM", "three_way"),
    ],
)
def test_classify_plugin_skill_update(base: str, local: str, upstream: str, expected: str) -> None:
    assert classify_plugin_skill_update(base, local, upstream) == expected

Also test:

  • a supporting-file-only change returns fast_forward or three_way, never unchanged;

  • candidate ID stability across repeated sync;

  • new upstream supersedes an older pending candidate;

  • candidate evidence contains hashes/version references but no raw skill body;

  • legacy candidate payloads still parse.

  • two processes syncing the same update append only one candidate record.

  • Step 2: Run tests and verify failure

cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_candidate_state.py -q

Expected: FAIL because update classification and candidate kind are missing.

  • Step 3: Add plugin_skill_update candidate support

Do not add a special status. Existing candidate statuses remain sufficient. Ensure SkillLearningCandidate.from_dict() accepts the new kind without changing legacy defaults.

Use evidence:

{
    "plugin_id": plugin_id,
    "plugin_version": manifest.version,
    "skill_name": skill_name,
    "merge_mode": merge_mode,
    "base_upstream_tree_hash": accepted_tree_hash,
    "new_upstream_tree_hash": snapshot.skill_tree_hash,
    "local_version": current.version.version,
}

Set priority=10, confidence=1.0, trigger_reason="plugin_update".

  • Step 4: Implement update classification and candidate creation

Use canonical hashes and deterministic IDs:

candidate_id = (
    f"plugin-update:{plugin_id}:{skill_name}:"
    f"{new_upstream_tree_hash[:12]}"
)

For already_applied, advance state without a candidate. For fast_forward and three_way, record an open candidate. If the same ID exists in any status, do not append another JSONL record.

  • Step 5: Make candidate mutation atomic under the shared lock

Add an optional WorkspaceWriteLock to SkillLearningStore; EngineLoader supplies the shared workspace instance, while isolated unit-test construction falls back to a store-local lock. Add:

record_learning_candidate_if_absent(candidate) -> tuple[SkillLearningCandidate, bool]

Inside one lock acquisition, read current candidates, check the deterministic ID, and atomically rewrite or append the JSONL record. Apply the same lock to candidate update and transition methods. Nested calls from PluginManager reuse the reentrant lock.

  • Step 6: Supersede stale pending updates

When a different pending candidate exists for the same plugin skill:

learning_store.transition_learning_candidate(
    old_candidate_id,
    "superseded",
    event_type="plugin_update_superseded",
    payload={"replacement_candidate_id": new_candidate_id},
)
  • Step 7: Run focused tests
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_candidate_state.py -q

Expected: PASS.

  • Step 8: Commit
git add app-instance/backend/beaver/plugins/skills.py app-instance/backend/beaver/memory/skills/models.py app-instance/backend/beaver/memory/skills/store.py app-instance/backend/tests/unit/test_plugin_skill_sync.py app-instance/backend/tests/unit/test_skill_learning_candidate_state.py
git commit -m "feat(plugins): enqueue skill upgrade candidates"

Task 6: Add Plugin Update Draft Provenance And Fast-Forward Synthesis

Files:

  • Modify: app-instance/backend/beaver/skills/specs/models.py

  • Modify: app-instance/backend/beaver/skills/drafts/service.py

  • Modify: app-instance/backend/beaver/skills/publisher/service.py

  • Modify: app-instance/backend/beaver/skills/learning/service.py

  • Test: app-instance/backend/tests/unit/test_plugin_skill_learning.py

  • Test: app-instance/backend/tests/unit/test_skill_learning_pipeline.py

  • Step 1: Write failing model and fast-forward tests

Test backward-compatible draft parsing and exact upstream fast-forward:

draft = asyncio.run(service.synthesize_draft(candidate.candidate_id, provider_bundle))

assert draft.proposal_kind == "plugin_skill_update"
assert draft.proposed_content == new_upstream.content
assert draft.base_version == "v0001"
assert draft.provenance["merge_mode"] == "fast_forward"
assert draft.provenance["new_upstream_tree_hash"] == new_upstream.snapshot.skill_tree_hash
assert provider.calls == []

After publish, assert the new version contains the new upstream supporting files even when SKILL.md did not change.

  • Step 2: Run tests and verify failure
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_pipeline.py -q

Expected: FAIL because drafts have no provenance and the learning service has no plugin update branch.

  • Step 3: Add backward-compatible draft provenance

Extend SkillDraft:

provenance: dict[str, Any] = field(default_factory=dict)

Include it in to_dict() and parse missing values as {} in from_dict().

  • Step 4: Add a focused draft constructor

Add:

def create_plugin_update_draft(
    self,
    *,
    skill_name: str,
    base_version: str,
    proposed_content: str,
    proposed_frontmatter: dict,
    created_by: str,
    reason: str,
    provenance: dict,
    evidence_refs: list[dict] | None = None,
) -> SkillDraft:

It writes proposal_kind="plugin_skill_update".

  • Step 5: Implement fast-forward synthesis

In SkillLearningService.synthesize_draft(), branch before ordinary revision:

if candidate.kind == "plugin_skill_update":
    return await self._synthesize_plugin_update(candidate, provider_bundle)

For merge_mode == "fast_forward", load U from SkillSpecStore, parse its frontmatter/body, and create a draft exactly equal to U. Do not call the provider.

  • Step 6: Serialize all skill publication

Add an optional WorkspaceWriteLock to SkillPublisher; EngineLoader supplies the shared workspace instance and isolated tests use a publisher-local fallback. Hold it across _next_version(), version staging/promotion, spec/current/index replacement, rollback, and disable. This protects ordinary learned skills as well as plugin-origin skills from racing with boot or explicit plugin sync.

  • Step 7: Materialize referenced supporting files during publish

For proposal_kind="plugin_skill_update", resolve the snapshot and supporting-file plan from draft provenance. Stage the complete next version directory, including SKILL.md and supporting files, before promoting it. Reject missing snapshots, path conflicts, or tree-hash mismatches. Ordinary skill publication keeps its current behavior.

  • Step 8: Preserve draft provenance on publish

Change SkillPublisher.publish() provenance construction to:

provenance={
    **dict(draft.provenance),
    "draft_id": draft_id,
    "proposal_kind": draft.proposal_kind,
    "trigger_run_id": draft.trigger_run_id,
    "trigger_session_id": draft.trigger_session_id,
}
  • Step 9: Run focused tests
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_pipeline.py -q

Expected: PASS.

  • Step 10: Commit
git add app-instance/backend/beaver/skills app-instance/backend/tests/unit/test_plugin_skill_learning.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(skill-learning): create plugin update drafts"

Task 7: Implement Three-Way Plugin Skill Synthesis

Files:

  • Create: app-instance/backend/beaver/plugins/tree_merge.py

  • Modify: app-instance/backend/beaver/skills/learning/synthesizer.py

  • Modify: app-instance/backend/beaver/skills/learning/service.py

  • Test: app-instance/backend/tests/unit/test_plugin_skill_learning.py

  • Test: app-instance/backend/tests/unit/test_skill_learning_synthesizer_preservation.py

  • Step 1: Write failing three-way prompt and parse tests

Assert the prompt contains labeled OLD UPSTREAM, CURRENT LOCAL, and NEW UPSTREAM sections and does not confuse the current local version with the merge base.

Test response parsing for:

{
  "frontmatter": {"name": "baoyu-comic", "description": "Comic workflow", "tools": []},
  "content": "# Baoyu Comic\n...",
  "change_reason": "Adopt upstream layout while preserving learned review step.",
  "preserved_local_sections": ["Review"],
  "adopted_upstream_sections": ["Panel Layout"],
  "resolved_conflicts": ["Output ordering"],
  "dropped_sections": []
}

Add supporting-file merge tests:

def test_supporting_file_merge_adopts_upstream_when_local_is_unchanged() -> None:
    plan = merge_supporting_file_trees(base={"a.txt": "A"}, local={"a.txt": "A"}, upstream={"a.txt": "U"})
    assert plan.files["a.txt"].source == "upstream"
    assert plan.conflicts == []


def test_supporting_file_merge_blocks_divergent_edits() -> None:
    plan = merge_supporting_file_trees(base={"a.txt": "A"}, local={"a.txt": "L"}, upstream={"a.txt": "U"})
    assert plan.conflicts[0].path == "a.txt"
  • Step 2: Run tests and verify failure
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_synthesizer_preservation.py -q

Expected: FAIL because three-way synthesis does not exist.

  • Step 3: Add synthesize_plugin_update()

Signature:

async def synthesize_plugin_update(
    self,
    candidate: SkillLearningCandidate,
    evidence_packet: EvidencePacket,
    provider: LLMProvider,
    model: str,
    *,
    old_upstream: dict[str, Any],
    current_local: dict[str, Any],
    new_upstream: dict[str, Any],
) -> dict[str, Any]:

The system message must require JSON only and state:

  • preserve valid local learning;

  • adopt upstream fixes and safety changes;

  • do not concatenate duplicate sections;

  • list every intentional drop;

  • leave resolved_conflicts empty only when no semantic conflict exists.

  • Step 4: Load all three snapshots in the learning service

Resolve:

  • B using base_upstream_tree_hash;
  • L using local_version;
  • U using new_upstream_tree_hash.

Raise a specific ValueError when any referenced snapshot/version is missing. Do not fallback to a two-way merge.

  • Step 5: Build the deterministic supporting-file merge plan

Compare files by path and content/executable digest:

  • L == B: use U;
  • U == B: use L;
  • L == U: use either;
  • one-sided addition: use the added file;
  • divergent edit, different same-path additions, and delete-versus-edit: conflict.

Exclude SKILL.md because the synthesizer handles it. Store selected source references and conflict records in draft provenance; do not duplicate file bytes in JSON.

  • Step 6: Create the plugin update draft

Store merge decisions in draft provenance:

{
    **plugin_reference_fields,
    "merge_mode": "three_way",
    "preserved_local_sections": payload["preserved_local_sections"],
    "adopted_upstream_sections": payload["adopted_upstream_sections"],
    "resolved_conflicts": payload["resolved_conflicts"],
    "dropped_sections": payload["dropped_sections"],
    "supporting_file_plan": supporting_file_plan.to_dict(),
}

If the supporting-file plan contains conflicts, the draft may be inspected but cannot be published. V1 does not ask the LLM to merge arbitrary or binary files.

  • Step 7: Run focused tests
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_synthesizer_preservation.py -q

Expected: PASS.

  • Step 8: Commit
git add app-instance/backend/beaver/plugins/tree_merge.py app-instance/backend/beaver/skills/learning app-instance/backend/tests/unit/test_plugin_skill_learning.py app-instance/backend/tests/unit/test_skill_learning_synthesizer_preservation.py
git commit -m "feat(skill-learning): synthesize three-way plugin updates"

Task 8: Extend Replay Preservation For Plugin Merges

Files:

  • Modify: app-instance/backend/beaver/skills/learning/preservation.py

  • Modify: app-instance/backend/beaver/skills/learning/eval.py

  • Modify: app-instance/backend/beaver/skills/learning/pipeline.py

  • Test: app-instance/backend/tests/unit/test_skill_learning_preservation.py

  • Test: app-instance/backend/tests/unit/test_skill_learning_eval.py

  • Test: app-instance/backend/tests/unit/test_skill_learning_pipeline.py

  • Step 1: Write failing plugin merge preservation tests

Cover:

  • merged draft preserves local Safety and adopts new upstream Safety;
  • silently dropping either Safety section fails;
  • explicitly resolved non-safety conflicts pass;
  • unresolved conflicts block publish;
  • unresolved supporting-file conflicts block publish;
  • baseline replay remains current local L.

Expected report shape:

assert report.preservation_report == {
    "mode": "plugin_three_way",
    "passed": True,
    "local": {...},
    "upstream": {...},
    "unresolved_conflicts": [],
}
  • Step 2: Run tests and verify failure
cd app-instance/backend
pytest tests/unit/test_skill_learning_preservation.py tests/unit/test_skill_learning_eval.py tests/unit/test_skill_learning_pipeline.py -q

Expected: FAIL because preservation only checks one base skill.

  • Step 3: Add plugin merge preservation helper

Add:

def check_plugin_merge_preservation(
    *,
    local_content: str,
    upstream_content: str,
    draft_content: str,
    merge_decisions: dict[str, Any],
) -> dict[str, Any]:

It calls existing check_preservation() for local and upstream content, gives Safety and Required Tools sections blocking weight, and reports unresolved conflicts separately.

  • Step 4: Use current local as replay baseline

When draft.proposal_kind == "plugin_skill_update", load draft.base_version as the baseline skill. Continue to run the candidate arm with the draft context. Do not use raw upstream B or U as the replay baseline.

  • Step 5: Tighten publish gate

Add:

if draft.proposal_kind == "plugin_skill_update":
    preservation = eval_report.preservation_report or {}
    if preservation.get("mode") != "plugin_three_way" and draft.provenance.get("merge_mode") == "three_way":
        raise ValueError("Plugin update requires a three-way preservation report")
    if preservation.get("unresolved_conflicts"):
        raise ValueError("Plugin update has unresolved merge conflicts")
    if draft.provenance.get("supporting_file_plan", {}).get("conflicts"):
        raise ValueError("Plugin update has unresolved supporting-file conflicts")

The existing passed is False gate remains active.

  • Step 6: Run focused tests
cd app-instance/backend
pytest tests/unit/test_skill_learning_preservation.py tests/unit/test_skill_learning_eval.py tests/unit/test_skill_learning_pipeline.py -q

Expected: PASS.

  • Step 7: Commit
git add app-instance/backend/beaver/skills/learning app-instance/backend/tests/unit/test_skill_learning_preservation.py app-instance/backend/tests/unit/test_skill_learning_eval.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(skill-learning): gate plugin merge preservation"

Task 9: Reconcile Publication And Implement Pause/Disable/Adopt

Files:

  • Modify: app-instance/backend/beaver/plugins/skills.py

  • Modify: app-instance/backend/beaver/skills/learning/pipeline.py

  • Modify: app-instance/backend/beaver/skills/publisher/service.py

  • Test: app-instance/backend/tests/unit/test_plugin_skill_sync.py

  • Test: app-instance/backend/tests/unit/test_skill_learning_pipeline.py

  • Step 1: Write failing lifecycle tests

Test:

  • publishing a plugin update advances accepted upstream tree hash;

  • pending candidate clears;

  • simulated observer failure leaves the published version intact;

  • the next sync reconciles state from current version provenance and does not recreate the candidate;

  • reconciliation never moves accepted_beaver_version backwards after rollback;

  • pause leaves linked skills active and creates no update candidates;

  • resume reconciles and syncs;

  • disabling plugin disables linked skills without deletion;

  • re-enable restores and syncs;

  • missing package sets plugin status missing, suspends sync, and leaves linked skills active;

  • adopt changes source_kind to managed, removes binding, and keeps the skill active.

  • Step 2: Run tests and verify failure

cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_pipeline.py -q

Expected: FAIL because publication has no plugin acknowledgement callback.

  • Step 3: Add a narrow publication observer

Extend pipeline construction with:

publish_observer: Callable[[SkillDraft, SkillVersion | SkillSpec], None] | None = None

After successful publish, call it before returning. Observer failure must be recorded and audited as plugin_publish_ack_failed; it must not delete the already-published version or turn the publish API response into a failure. Mark the learning candidate published before invoking the best-effort observer so clients do not retry a successful publish. The next sync is responsible for reconciliation.

  • Step 4: Implement PluginManager.on_skill_published()

For proposal_kind="plugin_skill_update":

  1. validate plugin ID, skill name, and new upstream tree hash from draft provenance;
  2. set accepted_upstream_tree_hash = new_upstream_tree_hash;
  3. set observed_upstream_tree_hash = new_upstream_tree_hash;
  4. set accepted_beaver_version = published.version;
  5. set current_beaver_version = published.version;
  6. clear pending_candidate_id;
  7. set status synced.
  • Step 5: Implement sync-time reconciliation

At the beginning of sync_enabled(), inspect each linked skill's current published version. When provenance contains:

{
    "proposal_kind": "plugin_skill_update",
    "plugin_id": plugin_id,
    "new_upstream_tree_hash": tree_hash,
}

and the referenced upstream snapshot exists, advance state only if the current version number is newer than accepted_beaver_version. Clear only the matching pending candidate. Never regress state when the runtime current pointer was rolled back to an older version.

  • Step 6: Implement pause, resume, disable, missing, and adopt

pause(plugin_id) sets updates_paused=True and leaves linked skills unchanged. resume(plugin_id) clears the flag and performs reconciliation/sync.

disable(plugin_id, disable_linked_skills=True) rejects calls without the explicit confirmation and calls SkillPublisher.disable() for every still-linked skill. adopt(plugin_id, skill_name):

  • requires an existing binding;
  • changes SkillSpec.source_kind to managed;
  • appends adopted_from_plugin:<plugin-id> to lineage;
  • removes the binding;
  • leaves the current version active.

When discovery cannot find a previously known plugin, set status missing, preserve enabled and updates_paused, skip update generation, and do not disable any linked skill.

  • Step 7: Run focused tests
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_pipeline.py -q

Expected: PASS.

  • Step 8: Commit
git add app-instance/backend/beaver/plugins/skills.py app-instance/backend/beaver/skills/learning/pipeline.py app-instance/backend/beaver/skills/publisher/service.py app-instance/backend/tests/unit/test_plugin_skill_sync.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(plugins): track published updates and ownership"

Task 10: Wire Plugin Sync Into Engine Loading

Files:

  • Modify: app-instance/backend/beaver/engine/loader.py

  • Modify: app-instance/backend/beaver/plugins/__init__.py

  • Test: app-instance/backend/tests/unit/test_plugin_runtime.py

  • Test: app-instance/backend/tests/unit/test_phase5_skills_runtime.py

  • Step 1: Write failing runtime assembly tests

Test:

  • discovered disabled plugins do not mirror;

  • enabled plugin mirrors before EngineLoadResult.skills is calculated;

  • changed plugin creates a candidate but never calls an LLM during boot;

  • repeated boot creates no duplicate versions/candidates;

  • concurrent multi-process boot creates no duplicate versions/candidates;

  • boot skips auto-sync and reports deferred_lock_busy when an explicit sync holds the workspace lock;

  • EngineLoadResult.plugin_manager and plugin summaries are available.

  • Step 2: Run tests and verify failure

cd app-instance/backend
pytest tests/unit/test_plugin_runtime.py tests/unit/test_phase5_skills_runtime.py -q

Expected: FAIL because EngineLoader does not assemble plugin services.

  • Step 3: Extend EngineLoadResult and loader injection

Add:

plugin_manager: PluginManager | None = None
plugins: list[dict] = field(default_factory=list)

Allow plugin_manager injection in EngineLoader.__init__() for tests.

  • Step 4: Assemble in dependency order

Required order:

  1. config/workspace;
  2. SkillSpecStore, learning store, and SkillsLoader;
  3. tool registry and builtins, including skill-view tools using that loader;
  4. draft/review/publisher and a safety checker using the completed tool registry;
  5. discovery and PluginStateStore;
  6. PluginManager;
  7. plugin_manager.sync_enabled(blocking=False) when config.plugins.auto_sync;
  8. learning service/pipeline with publication observer;
  9. result summaries.

Do not use SkillsLoader.extra_dirs for plugin skills. Explicit API enable/sync uses a bounded blocking lock timeout; Engine boot uses a non-blocking attempt and proceeds with the current published skill set if another writer owns the lock.

  • Step 5: Run runtime tests
cd app-instance/backend
pytest tests/unit/test_plugin_runtime.py tests/unit/test_phase5_skills_runtime.py -q

Expected: PASS.

  • Step 6: Commit
git add app-instance/backend/beaver/engine/loader.py app-instance/backend/beaver/plugins app-instance/backend/tests/unit/test_plugin_runtime.py app-instance/backend/tests/unit/test_phase5_skills_runtime.py
git commit -m "feat(runtime): sync declarative plugins at boot"

Task 11: Add Plugin Management API

Files:

  • Modify: app-instance/backend/beaver/interfaces/web/app.py

  • Test: app-instance/backend/tests/unit/test_plugin_web_api.py

  • Step 1: Write failing API tests

Cover:

GET  /api/plugins
POST /api/plugins/sync
POST /api/plugins/{plugin_id}/enable
POST /api/plugins/{plugin_id}/pause
POST /api/plugins/{plugin_id}/resume
POST /api/plugins/{plugin_id}/disable
POST /api/plugins/{plugin_id}/skills/{skill_name}/adopt

Assert 404 for unknown plugin, 409 for skill ownership conflict, and 400 for invalid manifest/sync errors. Assert lock timeout maps to 409 plugin_write_busy. Assert no payload contains the real absolute workspace or external search-root path. Assert disable without {"disable_linked_skills": true} is rejected.

  • Step 2: Run tests and verify failure
cd app-instance/backend
pytest tests/unit/test_plugin_web_api.py -q

Expected: FAIL with missing routes.

  • Step 3: Add normalized plugin payload helper

Return:

{
    "id": manifest.plugin_id,
    "name": manifest.name,
    "discovered_version": manifest.version,
    "installed_version": state.installed_version,
    "enabled": state.enabled,
    "status": state.status,
    "last_error": state.last_error,
    "manifest_path": manifest.display_path,
    "updates_paused": state.updates_paused,
    "skills": [
        {
            "name": declaration.name,
            "status": binding.status,
            "current_beaver_version": binding.current_beaver_version,
            "accepted_upstream_tree_hash": binding.accepted_upstream_tree_hash,
            "observed_upstream_tree_hash": binding.observed_upstream_tree_hash,
            "accepted_beaver_version": binding.accepted_beaver_version,
            "pending_candidate_id": binding.pending_candidate_id,
        }
    ],
}

Never return arbitrary plugin file content, secrets, or absolute server paths.

  • Step 4: Implement routes

Each mutating endpoint boots one runtime, invokes its plugin_manager, and returns the updated plugin payload. Map ValueError messages to stable HTTP status codes.

  • Step 5: Run focused and existing web tests
cd app-instance/backend
pytest tests/unit/test_plugin_web_api.py tests/unit/test_skill_learning_web_api.py -q

Expected: PASS.

  • Step 6: Commit
git add app-instance/backend/beaver/interfaces/web/app.py app-instance/backend/tests/unit/test_plugin_web_api.py
git commit -m "feat(api): manage declarative plugins"

Task 12: Add Plugin Management To The Skills UI

Files:

  • Modify: app-instance/frontend/types/index.ts

  • Modify: app-instance/frontend/lib/api.ts

  • Modify: app-instance/frontend/app/(app)/skills/page.tsx

  • Test: app-instance/frontend/lib/plugin-api.test.ts

  • Step 1: Write failing API client tests

Test URL, method, and response typing for list, sync, enable, pause, resume, disable, and adopt.

  • Step 2: Run frontend test and verify failure

Run the repository's existing frontend test command targeting:

cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts

Expected: FAIL because plugin API functions do not exist.

  • Step 3: Add frontend types

Add:

export interface PluginSkillBinding {
  name: string;
  status: string;
  current_beaver_version?: string | null;
  accepted_upstream_tree_hash?: string | null;
  observed_upstream_tree_hash?: string | null;
  accepted_beaver_version?: string | null;
  pending_candidate_id?: string | null;
}

export interface BeaverPlugin {
  id: string;
  name: string;
  discovered_version?: string | null;
  installed_version?: string | null;
  enabled: boolean;
  updates_paused: boolean;
  status: string;
  last_error?: string | null;
  manifest_path?: string | null;
  skills: PluginSkillBinding[];
}
  • Step 4: Add API functions

Implement:

listPlugins()
syncPlugins()
enablePlugin(pluginId)
pausePlugin(pluginId)
resumePlugin(pluginId)
disablePlugin(pluginId, { disable_linked_skills: true })
adoptPluginSkill(pluginId, skillName)
  • Step 5: Add a plugins Skills tab

Extend SkillsTab and render a compact table with:

  • plugin name and versions;
  • enabled/status badges;
  • linked skills and pending candidate link;
  • icon buttons with tooltips for sync, enable, pause, resume, disable, and adopt;
  • confirmation before disable/adopt;
  • missing-source warning stating that current skills remain active but updates are suspended;
  • existing runAction() and error handling.

Do not add a separate marketing-style page or nested cards.

  • Step 6: Label plugin-origin skills and update candidates

In existing Published/Candidates/Drafts views:

  • show Plugin source badge when source_kind === "plugin";

  • render plugin_skill_update as 插件升级合并 / Plugin update merge;

  • show fast_forward or three_way from candidate evidence/provenance.

  • Step 7: Run frontend tests and type checks

cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts
npm run lint
npx tsc --noEmit

Expected: PASS.

  • Step 8: Commit
git add app-instance/frontend/types/index.ts app-instance/frontend/lib/api.ts app-instance/frontend/lib/plugin-api.test.ts 'app-instance/frontend/app/(app)/skills/page.tsx'
git commit -m "feat(skills-ui): manage plugin skill mirrors"

Task 13: Add End-To-End Lifecycle Coverage And Documentation

Files:

  • Create: app-instance/backend/tests/integration/test_plugin_skill_lifecycle.py

  • Create: docs/plugins/skill-plugins.md

  • Modify: docs/product-discovery/beaver/README.md

  • Step 1: Write the end-to-end lifecycle test

The test must:

  1. create plugin 1.0.0;
  2. enable it and assert mirror v0001;
  3. publish a normal learned local revision v0002;
  4. replace the package with plugin 1.1.0;
  5. sync and assert one three_way candidate;
  6. synthesize with a stub provider;
  7. run safety and replay evaluation with a stub runner;
  8. submit, approve, and publish v0003;
  9. assert accepted upstream tree hash and provenance advanced;
  10. rollback to v0002;
  11. assert plugin source files were never modified;
  12. update only a supporting file and assert a new update candidate is created;
  13. simulate publish-observer failure and assert the next sync reconciles state;
  14. remove the plugin package and assert the plugin is missing while the current skill remains active;
  15. run two sync processes and assert no duplicate version or candidate is created.
  • Step 2: Run the integration test and fix only lifecycle defects
cd app-instance/backend
pytest tests/integration/test_plugin_skill_lifecycle.py -v

Expected: PASS.

  • Step 3: Write operator documentation

Document:

  • package layout and manifest;

  • discovery roots;

  • explicit enable requirement;

  • mirror and three-way merge behavior;

  • dual content/tree hashing and supporting-file merge conflicts;

  • update candidate review flow;

  • pause/resume versus disable/adopt;

  • recovery from missing/invalid plugins;

  • workspace locking, deferred boot sync, and publication reconciliation;

  • why plugin Python code is not executed in V1.

  • Step 4: Run the complete relevant backend suite

cd app-instance/backend
pytest \
  tests/unit/test_plugin_manifest.py \
  tests/unit/test_plugin_hashing.py \
  tests/unit/test_plugin_state.py \
  tests/unit/test_workspace_write_lock.py \
  tests/unit/test_plugin_skill_storage.py \
  tests/unit/test_plugin_skill_sync.py \
  tests/unit/test_plugin_skill_learning.py \
  tests/unit/test_plugin_runtime.py \
  tests/unit/test_plugin_web_api.py \
  tests/unit/test_skill_learning_candidate_state.py \
  tests/unit/test_skill_learning_pipeline.py \
  tests/unit/test_skill_learning_eval.py \
  tests/unit/test_skill_learning_worker.py \
  tests/unit/test_phase5_skills_runtime.py \
  tests/integration/test_plugin_skill_lifecycle.py \
  -q

Expected: PASS.

  • Step 5: Run frontend verification
cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts
npm run lint
npx tsc --noEmit

Expected: PASS.

  • Step 6: Run a dirty-worktree-safe diff review
git status --short
git diff --check
git diff --stat

Expected:

  • no whitespace errors;

  • only plugin/skill lifecycle files and planned docs/tests are included in this feature;

  • unrelated pre-existing user changes remain untouched.

  • Step 7: Commit

git add app-instance/backend/tests/integration/test_plugin_skill_lifecycle.py docs/plugins/skill-plugins.md docs/product-discovery/beaver/README.md
git commit -m "docs(plugins): document skill mirror lifecycle"

Release Sequence

  1. Ship backend manifest, state, snapshots, and initial mirror behind the Plugins API.
  2. Enable update candidate generation after initial mirror tests pass in a real workspace.
  3. Enable three-way synthesis and replay publish gates.
  4. Ship the Plugins UI.
  5. Keep executable plugin code disabled; design it separately with process isolation and permission boundaries.

Rollout Metrics

Track:

  • plugin discovery and manifest error count;
  • initial mirror success/failure count;
  • plugin update candidates created, superseded, rejected, and published;
  • plugin update candidates caused by supporting-file-only changes;
  • fast-forward versus three-way update ratio;
  • write-lock contention and deferred boot sync count;
  • publication reconciliation repair count;
  • replay regression and preservation failure rate;
  • time from upstream discovery to accepted publication;
  • rollback count for plugin-origin versions.

Final Acceptance Test

The feature is complete only when a plugin-origin skill can:

  1. be enabled and used with normal skill priority;
  2. accumulate a normal Beaver-learned revision;
  3. receive a newer upstream plugin version;
  4. produce a three-way update draft without editing the plugin package;
  5. pass the same safety, replay, review, and publish gates as ordinary skills;
  6. retain full upstream and local provenance;
  7. detect and publish supporting-file-only updates;
  8. survive concurrent boot/sync without duplicate versions or candidates;
  9. recover plugin state after observer failure;
  10. remain active when its plugin package is temporarily missing;
  11. be paused, resumed, rolled back, disabled, re-enabled, or adopted without data loss.