# Plugin Skill Mirroring And Upgrade Learning Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add declarative Beaver plugins whose skills are mirrored as normal managed skills, learn normally, and merge plugin upgrades through the existing safety, replay evaluation, review, publish, and rollback lifecycle.

**Architecture:** A new `beaver.plugins` package discovers and validates `beaver.plugin.json`, computes content and full-tree hashes, persists enable/sync state, and stages immutable upstream/version trees before atomic promotion under a workspace write lock. Plugin upgrades become deterministic `plugin_skill_update` learning candidates using old upstream, current local, and new upstream inputs; the existing learning pipeline remains the only path for update publication, with sync-time reconciliation repairing failed state acknowledgements.

**Tech Stack:** Python dataclasses and file-backed JSON stores, existing `SkillSpecStore` and skill-learning pipeline, FastAPI, pytest, Next.js/TypeScript, existing shadcn UI components.

---

## Scope

This plan implements declarative skill plugins only. Do not add Python plugin entrypoints, hooks, providers, channels, dependency installation, or marketplace download support. Plugin-provided tools continue to use MCP.

## File Structure

Create focused plugin modules:

- `app-instance/backend/beaver/plugins/models.py`: manifest, discovery, state, and sync result dataclasses.
- `app-instance/backend/beaver/plugins/manifest.py`: JSON parsing, identifier validation, and contained-path validation.
- `app-instance/backend/beaver/plugins/hashing.py`: canonical skill-content and full-tree hashing.
- `app-instance/backend/beaver/plugins/tree_merge.py`: deterministic three-way supporting-file merge plans.
- `app-instance/backend/beaver/plugins/state.py`: atomic `.beaver/plugins/state.json` persistence.
- `app-instance/backend/beaver/plugins/discovery.py`: scan workspace and configured plugin roots.
- `app-instance/backend/beaver/plugins/transaction.py`: same-filesystem staging and immutable directory promotion.
- `app-instance/backend/beaver/plugins/skills.py`: initial mirror, update classification, candidate creation, reconciliation, pause/resume, disable, and adopt.
- `app-instance/backend/beaver/plugins/__init__.py`: public exports.
- `app-instance/backend/beaver/foundation/utils/file_lock.py`: reentrant cross-process workspace write lock.

Modify skill lifecycle modules:

- `app-instance/backend/beaver/skills/specs/models.py`: add upstream snapshot and draft provenance models.
- `app-instance/backend/beaver/skills/specs/storage.py`: persist immutable upstream snapshots and safely copy supporting files.
- `app-instance/backend/beaver/memory/skills/store.py`: lock candidate existence checks and JSONL mutations.
- `app-instance/backend/beaver/skills/drafts/service.py`: create plugin update drafts.
- `app-instance/backend/beaver/skills/learning/service.py`: synthesize `plugin_skill_update`.
- `app-instance/backend/beaver/skills/learning/synthesizer.py`: three-way plugin merge prompt and result.
- `app-instance/backend/beaver/skills/learning/eval.py`: plugin merge preservation report.
- `app-instance/backend/beaver/skills/learning/pipeline.py`: acknowledge successful plugin update publication.
- `app-instance/backend/beaver/skills/publisher/service.py`: carry draft provenance into published versions.
Modify runtime and management surfaces:

- `app-instance/backend/beaver/foundation/config/schema.py`
- `app-instance/backend/beaver/foundation/config/loader.py`
- `app-instance/backend/beaver/engine/loader.py`
- `app-instance/backend/beaver/interfaces/web/app.py`
- `app-instance/frontend/types/index.ts`
- `app-instance/frontend/lib/api.ts`
- `app-instance/frontend/app/(app)/skills/page.tsx`

Add tests:

- `app-instance/backend/tests/unit/test_plugin_manifest.py`
- `app-instance/backend/tests/unit/test_plugin_hashing.py`
- `app-instance/backend/tests/unit/test_plugin_state.py`
- `app-instance/backend/tests/unit/test_workspace_write_lock.py`
- `app-instance/backend/tests/unit/test_plugin_skill_storage.py`
- `app-instance/backend/tests/unit/test_plugin_skill_sync.py`
- `app-instance/backend/tests/unit/test_plugin_skill_learning.py`
- `app-instance/backend/tests/unit/test_plugin_runtime.py`
- `app-instance/backend/tests/unit/test_plugin_web_api.py`
- `app-instance/frontend/lib/plugin-api.test.ts`

---

### Task 1: Add Plugin Configuration And Manifest Models

**Files:**

- Create: `app-instance/backend/beaver/plugins/models.py`
- Create: `app-instance/backend/beaver/plugins/manifest.py`
- Create: `app-instance/backend/beaver/plugins/hashing.py`
- Create: `app-instance/backend/beaver/plugins/__init__.py`
- Modify: `app-instance/backend/beaver/foundation/config/schema.py`
- Modify: `app-instance/backend/beaver/foundation/config/loader.py`
- Modify: `app-instance/backend/beaver/foundation/config/__init__.py`
- Test: `app-instance/backend/tests/unit/test_plugin_manifest.py`
- Test: `app-instance/backend/tests/unit/test_plugin_hashing.py`
- Test: `app-instance/backend/tests/unit/test_config_loader.py`

- [x] **Step 1: Write failing manifest validation tests**

Create tests covering:

```python
import json
from pathlib import Path

import pytest

# load_plugin_manifest is provided by the new beaver.plugins package.


def test_load_plugin_manifest_accepts_declared_skill(tmp_path: Path) -> None:
    root = tmp_path / "comic"
    (root / "skills" / "comic").mkdir(parents=True)
    (root / "skills" / "comic" / "SKILL.md").write_text("# Comic\n", encoding="utf-8")
    (root / "beaver.plugin.json").write_text(
        json.dumps(
            {
                "schema_version": 1,
                "id": "baoyu-comic",
                "name": "Baoyu Comic",
                "version": "1.2.0",
                "skills": [{"name": "baoyu-comic", "path": "skills/comic"}],
            }
        ),
        encoding="utf-8",
    )

    manifest = load_plugin_manifest(root / "beaver.plugin.json")

    assert manifest.plugin_id == "baoyu-comic"
    assert manifest.skills[0].name == "baoyu-comic"
    assert manifest.skills[0].root == root / "skills" / "comic"


@pytest.mark.parametrize("value", ["../outside", "/absolute", "skills/../../outside"])
def test_load_plugin_manifest_rejects_escaping_skill_path(tmp_path: Path, value: str) -> None:
    path = tmp_path / "beaver.plugin.json"
    path.write_text(
        json.dumps(
            {
                "schema_version": 1,
                "id": "unsafe",
                "name": "Unsafe",
                "version": "1.0.0",
                "skills": [{"name": "unsafe", "path": value}],
            }
        ),
        encoding="utf-8",
    )

    with pytest.raises(ValueError, match="contained"):
        load_plugin_manifest(path)
```

Also test invalid IDs, duplicate skill names, unsupported schema versions, missing `SKILL.md`, and symlinked skill roots.

Add tree-hash tests:

```python
def test_skill_tree_hash_changes_when_supporting_file_changes(tmp_path: Path) -> None:
    root = tmp_path / "skill"
    root.mkdir()
    (root / "SKILL.md").write_text("# Skill\n", encoding="utf-8")
    (root / "templates").mkdir()
    template = root / "templates" / "report.md"
    template.write_text("v1", encoding="utf-8")

    first = hash_plugin_skill_tree(root)
    template.write_text("v2", encoding="utf-8")
    second = hash_plugin_skill_tree(root)

    assert first.skill_content_hash == second.skill_content_hash
    assert first.skill_tree_hash != second.skill_tree_hash
```

Also verify that path changes and executable-bit changes affect `skill_tree_hash`, while mtime and non-executable permission changes do not.
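The hashing tests above imply a digest that depends on relative paths, file bytes, and the executable bit, but not on mtime or other mode bits. The following is a minimal sketch of one way to get that behavior, assuming SHA-256 and 8-byte big-endian length prefixes. `sketch_tree_hash` and `_field` are hypothetical names, and the separate `skill_content_hash` and any metadata exclusions are left out.

```python
import hashlib
import os
import stat
import struct
from pathlib import Path


def _field(data: bytes) -> bytes:
    """Length-prefix one field so adjacent fields cannot run together."""
    return struct.pack(">Q", len(data)) + data


def sketch_tree_hash(root: Path) -> str:
    """Hash path, size, bytes, and one executable flag for every regular file."""
    entries = []
    for current, dirnames, filenames in os.walk(root):  # does not follow symlinks
        for name in dirnames + filenames:
            path = Path(current) / name
            mode = path.lstat().st_mode
            if stat.S_ISLNK(mode):
                raise ValueError(f"symlink not allowed: {path}")
            if stat.S_ISDIR(mode):
                continue
            if not stat.S_ISREG(mode):
                raise ValueError(f"non-regular file: {path}")
            entries.append((path.relative_to(root).as_posix(), path, mode))

    digest = hashlib.sha256()
    for rel, path, mode in sorted(entries, key=lambda entry: entry[0]):
        data = path.read_bytes()
        # Collapse all mode bits into a single flag: any execute bit set or not.
        executable = b"\x01" if mode & 0o111 else b"\x00"
        for part in (rel.encode("utf-8"), struct.pack(">Q", len(data)), data, executable):
            digest.update(_field(part))
    return digest.hexdigest()
```

Sorting by POSIX relative path keeps the result independent of directory enumeration order, and the length prefixes mean that a file named `ab` with content `c` cannot collide with a file named `a` with content `bc`.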
- [x] **Step 2: Run tests and verify failure**

Run:

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_manifest.py tests/unit/test_plugin_hashing.py tests/unit/test_config_loader.py -q
```

Expected: FAIL because `beaver.plugins` and `PluginsConfig` do not exist.

- [x] **Step 3: Implement immutable plugin models and config**

Put plugin package models in `beaver/plugins/models.py`:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True, slots=True)
class PluginSkillDeclaration:
    name: str
    relative_path: str
    root: Path


@dataclass(frozen=True, slots=True)
class PluginManifest:
    schema_version: int
    plugin_id: str
    name: str
    version: str
    root: Path
    manifest_path: Path
    display_path: str
    skills: tuple[PluginSkillDeclaration, ...]


@dataclass(frozen=True, slots=True)
class PluginSkillFileDigest:
    path: str
    size: int
    executable: bool
    content_hash: str


@dataclass(frozen=True, slots=True)
class PluginSkillTreeDigest:
    skill_content_hash: str
    skill_tree_hash: str
    files: tuple[PluginSkillFileDigest, ...]
```

Put configuration in `beaver/foundation/config/schema.py` to preserve the foundation layer and avoid importing plugin runtime modules from config:

```python
@dataclass(slots=True)
class PluginsConfig:
    search_paths: list[str] = field(default_factory=list)
    auto_sync: bool = True
```

Add `plugins: PluginsConfig` to `BeaverConfig`. Parse both camelCase and snake_case:

```python
def _parse_plugins(raw: Any) -> PluginsConfig:
    data = _as_dict(raw)
    return PluginsConfig(
        search_paths=_string_list(data.get("searchPaths") or data.get("search_paths")),
        auto_sync=_bool(data.get("autoSync") if "autoSync" in data else data.get("auto_sync"), default=True),
    )
```

- [x] **Step 4: Implement strict JSON manifest loading**

`load_plugin_manifest()` must:

1. parse a JSON object;
2. require schema version `1`;
3. validate identifiers with `^[a-z0-9][a-z0-9_-]*$`;
4. resolve every skill root and check `resolved.is_relative_to(plugin_root)`;
5. reject symlinks in the path from plugin root to skill root;
6. require a regular `SKILL.md`;
7. initialize `display_path` without exposing an absolute path;
8. return frozen dataclasses.

- [x] **Step 5: Implement deterministic dual hashing**

`hash_plugin_skill_tree(root)` must:

1. reject symlinks and non-regular files;
2. enumerate regular files by normalized POSIX relative path;
3. compute `skill_content_hash` from normalized `SKILL.md`;
4. compute `skill_tree_hash` from each path, byte length, file bytes, and one normalized executable-bit flag;
5. include `SKILL.md` and every supporting file;
6. exclude Beaver metadata such as `version.json` and `upstream.json`;
7. ignore mtime, uid/gid, and non-executable mode bits.

Use length-prefixed binary fields in the digest input instead of ambiguous string concatenation.

- [x] **Step 6: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_manifest.py tests/unit/test_plugin_hashing.py tests/unit/test_config_loader.py -q
```

Expected: PASS.

- [x] **Step 7: Commit**

```bash
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/foundation/config app-instance/backend/tests/unit/test_plugin_manifest.py app-instance/backend/tests/unit/test_plugin_hashing.py app-instance/backend/tests/unit/test_config_loader.py
git commit -m "feat(plugins): add declarative skill manifest"
```

---

### Task 2: Add Discovery And Atomic Plugin State

**Files:**

- Create: `app-instance/backend/beaver/plugins/discovery.py`
- Create: `app-instance/backend/beaver/plugins/state.py`
- Create: `app-instance/backend/beaver/foundation/utils/file_lock.py`
- Modify: `app-instance/backend/beaver/plugins/models.py`
- Modify: `app-instance/backend/beaver/plugins/__init__.py`
- Test: `app-instance/backend/tests/unit/test_plugin_state.py`
- Test: `app-instance/backend/tests/unit/test_workspace_write_lock.py`

- [x] **Step 1: Write failing discovery and state tests**

Cover workspace discovery, configured search paths, duplicate plugin IDs, malformed manifests reported as errors instead of crashing the full scan, and state round trips:

```python
def test_plugin_state_round_trip_is_atomic(tmp_path: Path) -> None:
    store = PluginStateStore(tmp_path)
    store.set_enabled("baoyu-comic", True)
    store.update_skill_binding(
        "baoyu-comic",
        "baoyu-comic",
        PluginSkillBinding(
            accepted_upstream_tree_hash="old",
            observed_upstream_tree_hash="new",
            accepted_beaver_version="v0001",
            current_beaver_version="v0002",
            pending_candidate_id="plugin-update:baoyu-comic:baoyu-comic:new",
            status="update_pending",
        ),
    )

    reloaded = PluginStateStore(tmp_path).get_plugin("baoyu-comic")

    assert reloaded is not None
    assert reloaded.enabled is True
    assert reloaded.skills["baoyu-comic"].accepted_upstream_tree_hash == "old"
    assert not (tmp_path / ".beaver" / "plugins" / "state.json.tmp").exists()
```

Add a multiprocess lock test in which two processes enter the same workspace lock and assert their critical sections never overlap. Add a reentrancy test in which nested acquisitions in one process complete without deadlock.

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_state.py tests/unit/test_workspace_write_lock.py -q
```

Expected: FAIL because discovery and state stores are missing.
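The reentrancy test in Step 1 expects nested acquisitions in one process to share a single OS-level lock rather than deadlock on it. Below is a rough, POSIX-only sketch of that idea using `fcntl.flock()`. `SketchWorkspaceLock` is a hypothetical stand-in, not the plan's lock class, and it omits timeouts, non-blocking mode, and Windows support.

```python
import fcntl  # POSIX only
import os
import threading
from contextlib import contextmanager
from pathlib import Path


class SketchWorkspaceLock:
    """Hypothetical reentrant lock: one OS lock per process, reused by nested calls."""

    def __init__(self, lock_path: Path) -> None:
        self._path = lock_path
        self._guard = threading.RLock()  # serializes threads inside this process
        self._depth = 0                  # nesting depth of the thread holding the guard
        self._fd = None                  # open descriptor while the OS lock is held

    @contextmanager
    def acquire(self):
        # The guard stays held for the whole critical section, so only the
        # owning thread ever touches the depth counter.
        with self._guard:
            if self._depth == 0:
                self._path.parent.mkdir(parents=True, exist_ok=True)
                self._fd = os.open(self._path, os.O_RDWR | os.O_CREAT, 0o644)
                fcntl.flock(self._fd, fcntl.LOCK_EX)  # blocks until other holders release
            self._depth += 1
            try:
                yield
            finally:
                self._depth -= 1
                if self._depth == 0:
                    fcntl.flock(self._fd, fcntl.LOCK_UN)
                    os.close(self._fd)
                    self._fd = None
```

Only the outermost acquisition touches the lock file; inner acquisitions just bump the counter, which is why nested store calls do not block on a lock their own process already holds.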
- [x] **Step 3: Implement state dataclasses**

Add backward-compatible `to_dict()` and `from_dict()` methods for:

```python
@dataclass(slots=True)
class PluginSkillBinding:
    accepted_upstream_tree_hash: str | None = None
    observed_upstream_tree_hash: str | None = None
    accepted_beaver_version: str | None = None
    current_beaver_version: str | None = None
    pending_candidate_id: str | None = None
    status: str = "discovered"
    last_error: str | None = None


@dataclass(slots=True)
class PluginState:
    plugin_id: str
    enabled: bool = False
    updates_paused: bool = False
    installed_version: str | None = None
    manifest_path: str | None = None
    status: str = "discovered"
    last_error: str | None = None
    skills: dict[str, PluginSkillBinding] = field(default_factory=dict)
```

- [x] **Step 4: Implement atomic state persistence**

Store data at `/.beaver/plugins/state.json`. Write a complete JSON document to `state.json.tmp`, flush it, then replace `state.json`.

Public methods:

```python
list_plugins()
get_plugin(plugin_id)
set_enabled(plugin_id, enabled)
upsert_plugin(plugin_state)
update_skill_binding(plugin_id, skill_name, binding)
```

- [x] **Step 5: Implement the shared workspace write lock**

Add:

```python
class WorkspaceWriteLock:
    def __init__(self, workspace: str | Path) -> None:
        self.path = Path(workspace) / ".beaver" / "locks" / "plugin-skill-write.lock"

    @contextmanager
    def acquire(self, *, timeout_seconds: float | None = None, blocking: bool = True):
        ...
```

Requirements:

- use `fcntl.flock()` on POSIX and `msvcrt.locking()` on Windows, matching `memory/curated/store.py`;
- guard with a process-local `threading.RLock`;
- track per-thread recursion depth so nested store calls reuse the OS lock;
- support non-blocking acquisition for Engine boot;
- raise `WorkspaceWriteLockBusy` on timeout/contention;
- keep the lock file separate from atomically replaced data files.

- [x] **Step 6: Implement discovery**

Scan:

1. `/plugins`;
2. each configured `plugins.search_paths`.
Only direct child directories containing `beaver.plugin.json` are plugins. Return a `PluginDiscoveryResult` containing valid manifests and per-path errors. Duplicate IDs are errors and neither duplicate is activated. Discovery records a workspace-relative manifest display path when possible and a redacted `//beaver.plugin.json` path otherwise; absolute paths remain internal.

- [x] **Step 7: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_state.py tests/unit/test_workspace_write_lock.py tests/unit/test_plugin_manifest.py -q
```

Expected: PASS.

- [x] **Step 8: Commit**

```bash
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/foundation/utils/file_lock.py app-instance/backend/tests/unit/test_plugin_state.py app-instance/backend/tests/unit/test_workspace_write_lock.py
git commit -m "feat(plugins): discover packages and persist state"
```

---

### Task 3: Persist Immutable Upstream Skill Snapshots

**Files:**

- Create: `app-instance/backend/beaver/plugins/transaction.py`
- Modify: `app-instance/backend/beaver/skills/specs/models.py`
- Modify: `app-instance/backend/beaver/skills/specs/storage.py`
- Modify: `app-instance/backend/beaver/skills/specs/__init__.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_storage.py`

- [x] **Step 1: Write failing snapshot storage tests**

Test exact content, supporting files, idempotence, symlink rejection, and source immutability:

```python
def test_write_upstream_snapshot_copies_skill_without_mutating_source(tmp_path: Path) -> None:
    source = tmp_path / "plugin" / "skills" / "comic"
    source.mkdir(parents=True)
    (source / "SKILL.md").write_text("# Comic\n\nOriginal.\n", encoding="utf-8")
    (source / "templates").mkdir()
    (source / "templates" / "panel.txt").write_text("panel", encoding="utf-8")
    store = SkillSpecStore(tmp_path / "workspace")
    transaction = PluginSkillTransaction(tmp_path / "workspace")

    snapshot = store.stage_upstream_snapshot(
        transaction,
        skill_name="baoyu-comic",
        source_kind="plugin",
        source_id="baoyu-comic",
        source_version="1.0.0",
        source_path="skills/comic",
        source_root=source,
    )
    store.promote_upstream_snapshot(transaction, snapshot)
    loaded = store.read_upstream_snapshot("baoyu-comic", "baoyu-comic", snapshot.skill_tree_hash)

    assert loaded is not None
    assert loaded.content == "# Comic\n\nOriginal.\n"
    assert (loaded.root / "templates" / "panel.txt").read_text(encoding="utf-8") == "panel"
    assert (source / "SKILL.md").read_text(encoding="utf-8") == "# Comic\n\nOriginal.\n"
```

Also test:

- changing only `templates/panel.txt` creates a different snapshot directory;
- `SkillVersion.from_dict()` remains compatible without `tree_hash`;
- reading a legacy version derives its complete tree hash;
- staging does not make a snapshot visible to `read_upstream_snapshot()`;
- promoting a staged snapshot uses `os.replace()` and is idempotent;
- a failed metadata write leaves no current pointer to the staged version.

- [x] **Step 2: Run test and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_storage.py -q
```

Expected: FAIL because upstream snapshot APIs do not exist.

- [x] **Step 3: Add upstream snapshot models**

Add:

```python
@dataclass(slots=True)
class SkillUpstreamSnapshot:
    skill_name: str
    source_kind: str
    source_id: str
    source_version: str
    source_path: str
    skill_content_hash: str
    skill_tree_hash: str
    created_at: str
    frontmatter: dict[str, Any] = field(default_factory=dict)
```

Add `LoadedSkillUpstreamSnapshot(snapshot, content, root)` for storage reads. Extend `SkillVersion` with a backward-compatible `tree_hash: str = ""`; new versions persist the complete version-tree hash, while `read_published_skill()` derives it for legacy metadata that lacks the field.
- [x] **Step 4: Add safe tree-copy helper**

Extract a private `SkillSpecStore._copy_regular_tree(source_root, target_root)` helper that:

- rejects any symlink;
- rejects paths containing empty, `.`, or `..` segments;
- copies regular files only;
- creates parents;
- never writes outside `target_root`.

Use it for transaction staging now; Task 4 will reuse it for mirrored versions.

- [x] **Step 5: Implement same-filesystem staging and promotion**

`PluginSkillTransaction` creates:

```text
/.beaver/staging/plugin-skills//
```

The staging root must be on the same filesystem as `/skills`. It exposes:

```python
stage_upstream_snapshot(...)
stage_skill_version(...)
promote_directory(staged, final)
cleanup()
```

`promote_directory()` uses `os.replace()` and never replaces an existing non-identical immutable directory. Cleanup removes only the transaction's staging root.

- [x] **Step 6: Implement snapshot APIs**

Write snapshots to:

```text
skills//upstreams///
```

The snapshot metadata stores both hashes. If the directory already exists, verify all stored metadata and return it without rewriting.

Public methods:

```python
stage_upstream_snapshot(transaction, ...)
promote_upstream_snapshot(transaction, snapshot)
read_upstream_snapshot(skill_name, source_id, skill_tree_hash)
```

- [x] **Step 7: Make JSON/current/index writes atomic**

Change `SkillSpecStore._write_json()` and current/index pointer writes to create a temporary file in the target directory, flush and `fsync`, then `os.replace()`.

Immutable version directories are promoted first; runtime visibility changes only when `current.json`, `skill.json`, and the published index are atomically replaced under the workspace lock.

- [x] **Step 8: Run focused and existing storage tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_storage.py tests/unit/test_phase5_skills_runtime.py -q
```

Expected: PASS.
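The write sequence in Step 7 (temporary file in the target directory, flush, `fsync`, `os.replace()`) can be sketched as follows. This is an illustrative helper under the hypothetical name `atomic_write_json_sketch`, not the body of `SkillSpecStore._write_json()`, and it does not `fsync` the parent directory.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any


def atomic_write_json_sketch(path: Path, payload: dict[str, Any]) -> None:
    """Write JSON so readers see either the old file or the complete new one."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creating the temp file next to the target keeps both on one filesystem,
    # which os.replace() needs in order to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file; the previous target stays untouched.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

If serialization or the write fails, the earlier `current.json` is still intact, which matches the plan's rule that a failed metadata write must not leave a pointer to a staged version.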
- [x] **Step 9: Commit**

```bash
git add app-instance/backend/beaver/plugins/transaction.py app-instance/backend/beaver/skills/specs app-instance/backend/tests/unit/test_plugin_skill_storage.py
git commit -m "feat(skills): store immutable plugin upstream snapshots"
```

---

### Task 4: Mirror Initial Plugin Skills As First-Class Skills

**Files:**

- Create: `app-instance/backend/beaver/plugins/skills.py`
- Modify: `app-instance/backend/beaver/plugins/models.py`
- Modify: `app-instance/backend/beaver/plugins/__init__.py`
- Modify: `app-instance/backend/beaver/skills/specs/storage.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_sync.py`

- [x] **Step 1: Write failing initial mirror tests**

Cover:

- enabling mirrors `SKILL.md` and supporting files;
- mirrored skill is returned by `SkillsLoader.list_published_skills()`;
- `source_kind` is `plugin`, but runtime source is still workspace;
- existing non-plugin name collision fails without modification;
- any validation/safety failure in a multi-skill plugin occurs before promotion and leaves every linked skill unchanged;
- repeated sync is idempotent;
- supporting files are present in the promoted version;
- concurrent enable calls allocate only one version.
Core assertion:

```python
result = manager.enable("baoyu-comic")
record = SkillsLoader(workspace).get_skill_record("baoyu-comic")
loaded = SkillSpecStore(workspace).read_published_skill("baoyu-comic")

assert result.status == "synced"
assert record is not None and record.source == "workspace"
assert record.source_kind == "plugin"
assert loaded is not None
assert loaded.version.version == "v0001"
assert loaded.version.provenance["plugin_id"] == "baoyu-comic"
assert loaded.version.provenance["upstream_skill_content_hash"]
assert loaded.version.provenance["upstream_skill_tree_hash"]
```

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py -q
```

Expected: FAIL because `PluginManager` does not exist.

- [x] **Step 3: Implement `PluginManager` constructor and discovery view**

Constructor dependencies:

```python
class PluginManager:
    def __init__(
        self,
        *,
        workspace: Path,
        manifests: dict[str, PluginManifest],
        discovery_errors: list[PluginDiscoveryError],
        state_store: PluginStateStore,
        skill_store: SkillSpecStore,
        learning_store: SkillLearningStore,
        publisher: SkillPublisher,
        safety_checker: SkillDraftSafetyChecker,
        write_lock: WorkspaceWriteLock,
    ) -> None:
        ...
```

Keep all filesystem and lifecycle dependencies injectable for tests.

- [x] **Step 4: Implement exact initial mirror publication**

Acquire the workspace write lock before reading state, allocating versions, or writing candidates.

For each declared skill:

1. persist the upstream snapshot;
2. validate ownership conflict;
3. parse frontmatter/body and create an in-memory `SkillDraft` with `proposal_kind="plugin_initial_mirror"`;
4. run `SkillDraftSafetyChecker.check()` and reject failed or critical reports;
5. allocate the next `vNNNN` while holding the lock;
6. stage a `SkillVersion` whose content exactly equals upstream `SKILL.md`;
7. stage snapshot supporting files into the version directory;
8. generate the complete next `SkillSpec`, current pointer, index, and plugin-state JSON payloads in memory.

Use provenance:

```python
{
    "source_kind": "plugin",
    "plugin_id": manifest.plugin_id,
    "plugin_version": manifest.version,
    "plugin_skill_path": declaration.relative_path,
    "upstream_skill_content_hash": snapshot.skill_content_hash,
    "upstream_skill_tree_hash": snapshot.skill_tree_hash,
    "merge_mode": "initial_mirror",
}
```

- [x] **Step 5: Promote the complete staged transaction**

After every declared skill passes validation:

1. for a new skill, promote its complete staged skill directory with one `os.replace()`;
2. for an existing skill, promote immutable upstream/version directories, atomically replace spec/index metadata, and replace `current.json` last as the visibility switch;
3. atomically write plugin state last;
4. clean the staging directory.

Do not implement reverse rollback across already-promoted immutable directories. If a metadata write fails, those directories remain unreferenced and harmless; the previous current pointers remain authoritative. Add startup cleanup for staging directories older than 24 hours.

- [x] **Step 6: Run focused and loader tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_phase5_skills_runtime.py -q
```

Expected: PASS.
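Step 4 allocates the next `vNNNN` while the workspace write lock is held. A small sketch of that allocation, assuming versions are zero-padded four-digit names as in `v0001`; `next_version_sketch` is a hypothetical helper, and the caller is assumed to already hold the lock so two writers cannot compute the same number.

```python
import re

# Matches names such as "v0001"; anything else in the directory is ignored.
_VERSION_NAME = re.compile(r"^v(\d{4,})$")


def next_version_sketch(existing_names: list[str]) -> str:
    """Return the next zero-padded version name after the highest existing one."""
    numbers = [
        int(match.group(1))
        for name in existing_names
        if (match := _VERSION_NAME.match(name))
    ]
    return f"v{max(numbers, default=0) + 1:04d}"
```

For example, `next_version_sketch(["v0001", "v0002", "current.json"])` skips the non-version entry and yields `v0003`. Taking the maximum rather than the count keeps allocation monotonic even if an unreferenced version directory was left behind by a failed metadata write.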
- [x] **Step 7: Commit**

```bash
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/skills/specs/storage.py app-instance/backend/tests/unit/test_plugin_skill_sync.py
git commit -m "feat(plugins): mirror enabled plugin skills"
```

---

### Task 5: Detect Upgrades And Create Idempotent Learning Candidates

**Files:**

- Modify: `app-instance/backend/beaver/plugins/skills.py`
- Modify: `app-instance/backend/beaver/memory/skills/models.py`
- Modify: `app-instance/backend/beaver/memory/skills/store.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_sync.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_candidate_state.py`

- [x] **Step 1: Write failing upgrade classification tests**

Create four parametrized cases over the base (`B`), local (`L`), and upstream (`U`) tree hashes:

```python
@pytest.mark.parametrize(
    ("base", "local", "upstream", "expected"),
    [
        ("A", "A", "A", "unchanged"),
        ("A", "B", "B", "already_applied"),
        ("A", "A", "B", "fast_forward"),
        ("A", "LOCAL", "UPSTREAM", "three_way"),
    ],
)
def test_classify_plugin_skill_update(base: str, local: str, upstream: str, expected: str) -> None:
    assert classify_plugin_skill_update(base, local, upstream) == expected
```

Also test:

- a supporting-file-only change returns `fast_forward` or `three_way`, never `unchanged`;
- candidate ID stability across repeated sync;
- new upstream supersedes an older pending candidate;
- candidate evidence contains hashes/version references but no raw skill body;
- legacy candidate payloads still parse;
- two processes syncing the same update append only one candidate record.

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_candidate_state.py -q
```

Expected: FAIL because update classification and candidate kind are missing.

- [x] **Step 3: Add `plugin_skill_update` candidate support**

Do not add a special status. Existing candidate statuses remain sufficient.
Ensure `SkillLearningCandidate.from_dict()` accepts the new `kind` without changing legacy defaults. Use evidence:

```python
{
    "plugin_id": plugin_id,
    "plugin_version": manifest.version,
    "skill_name": skill_name,
    "merge_mode": merge_mode,
    "base_upstream_tree_hash": accepted_tree_hash,
    "new_upstream_tree_hash": snapshot.skill_tree_hash,
    "local_version": current.version.version,
}
```

Set `priority=10`, `confidence=1.0`, `trigger_reason="plugin_update"`.

- [x] **Step 4: Implement update classification and candidate creation**

Use canonical hashes and deterministic IDs:

```python
candidate_id = (
    f"plugin-update:{plugin_id}:{skill_name}:"
    f"{new_upstream_tree_hash[:12]}"
)
```

For `already_applied`, advance state without a candidate. For `fast_forward` and `three_way`, record an open candidate. If the same ID exists in any status, do not append another JSONL record.

- [x] **Step 5: Make candidate mutation atomic under the shared lock**

Add an optional `WorkspaceWriteLock` to `SkillLearningStore`; EngineLoader supplies the shared workspace instance, while isolated unit-test construction falls back to a store-local lock.

Add:

```python
record_learning_candidate_if_absent(candidate) -> tuple[SkillLearningCandidate, bool]
```

Inside one lock acquisition, read current candidates, check the deterministic ID, and atomically rewrite or append the JSONL record. Apply the same lock to candidate update and transition methods. Nested calls from `PluginManager` reuse the reentrant lock.

- [x] **Step 6: Supersede stale pending updates**

When a different pending candidate exists for the same plugin skill:

```python
learning_store.transition_learning_candidate(
    old_candidate_id,
    "superseded",
    event_type="plugin_update_superseded",
    payload={"replacement_candidate_id": new_candidate_id},
)
```

- [x] **Step 7: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_candidate_state.py -q
```

Expected: PASS.
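The classification table from Step 1 and the ID scheme from Step 4 fit in a few lines. The sketch below is one ordering of comparisons that satisfies the four parametrized cases. The function names are hypothetical, and treating "local changed, upstream did not" as `unchanged` is an assumption the plan does not spell out.

```python
def classify_update_sketch(base: str, local: str, upstream: str) -> str:
    """Classify an upgrade from base (B), local (L), and upstream (U) tree hashes."""
    if upstream == base:
        # Assumption: with no new upstream there is nothing to merge,
        # even when the local skill has diverged through learning.
        return "unchanged"
    if local == upstream:
        return "already_applied"
    if local == base:
        return "fast_forward"
    return "three_way"


def candidate_id_sketch(plugin_id: str, skill_name: str, new_upstream_tree_hash: str) -> str:
    """Build the deterministic candidate ID from the first 12 hash characters."""
    return f"plugin-update:{plugin_id}:{skill_name}:{new_upstream_tree_hash[:12]}"
```

Because the ID depends only on the plugin, the skill, and the new upstream hash, a repeated sync of the same upgrade produces the same ID, which is what lets the store skip a second JSONL record.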
- [x] **Step 8: Commit**

```bash
git add app-instance/backend/beaver/plugins/skills.py app-instance/backend/beaver/memory/skills/models.py app-instance/backend/beaver/memory/skills/store.py app-instance/backend/tests/unit/test_plugin_skill_sync.py app-instance/backend/tests/unit/test_skill_learning_candidate_state.py
git commit -m "feat(plugins): enqueue skill upgrade candidates"
```

---

### Task 6: Add Plugin Update Draft Provenance And Fast-Forward Synthesis

**Files:**

- Modify: `app-instance/backend/beaver/skills/specs/models.py`
- Modify: `app-instance/backend/beaver/skills/drafts/service.py`
- Modify: `app-instance/backend/beaver/skills/publisher/service.py`
- Modify: `app-instance/backend/beaver/skills/learning/service.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_learning.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_pipeline.py`

- [x] **Step 1: Write failing model and fast-forward tests**

Test backward-compatible draft parsing and exact upstream fast-forward:

```python
draft = asyncio.run(service.synthesize_draft(candidate.candidate_id, provider_bundle))

assert draft.proposal_kind == "plugin_skill_update"
assert draft.proposed_content == new_upstream.content
assert draft.base_version == "v0001"
assert draft.provenance["merge_mode"] == "fast_forward"
assert draft.provenance["new_upstream_tree_hash"] == new_upstream.snapshot.skill_tree_hash
assert provider.calls == []
```

After publish, assert the new version contains the new upstream supporting files even when `SKILL.md` did not change.

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: FAIL because drafts have no provenance and the learning service has no plugin update branch.
- [x] **Step 3: Add backward-compatible draft provenance**

Extend `SkillDraft`:

```python
provenance: dict[str, Any] = field(default_factory=dict)
```

Include it in `to_dict()` and parse missing values as `{}` in `from_dict()`.

- [x] **Step 4: Add a focused draft constructor**

Add:

```python
def create_plugin_update_draft(
    self,
    *,
    skill_name: str,
    base_version: str,
    proposed_content: str,
    proposed_frontmatter: dict,
    created_by: str,
    reason: str,
    provenance: dict,
    evidence_refs: list[dict] | None = None,
) -> SkillDraft:
```

It writes `proposal_kind="plugin_skill_update"`.

- [x] **Step 5: Implement fast-forward synthesis**

In `SkillLearningService.synthesize_draft()`, branch before ordinary revision:

```python
if candidate.kind == "plugin_skill_update":
    return await self._synthesize_plugin_update(candidate, provider_bundle)
```

For `merge_mode == "fast_forward"`, load `U` from `SkillSpecStore`, parse its frontmatter/body, and create a draft exactly equal to `U`. Do not call the provider.

- [x] **Step 6: Serialize all skill publication**

Add an optional `WorkspaceWriteLock` to `SkillPublisher`; EngineLoader supplies the shared workspace instance and isolated tests use a publisher-local fallback. Hold it across `_next_version()`, version staging/promotion, spec/current/index replacement, rollback, and disable. This protects ordinary learned skills as well as plugin-origin skills from racing with boot or explicit plugin sync.

- [x] **Step 7: Materialize referenced supporting files during publish**

For `proposal_kind="plugin_skill_update"`, resolve the snapshot and supporting-file plan from draft provenance. Stage the complete next version directory, including `SKILL.md` and supporting files, before promoting it. Reject missing snapshots, path conflicts, or tree-hash mismatches. Ordinary skill publication keeps its current behavior.
- [x] **Step 8: Preserve draft provenance on publish**

Change `SkillPublisher.publish()` provenance construction to:

```python
provenance={
    **dict(draft.provenance),
    "draft_id": draft_id,
    "proposal_kind": draft.proposal_kind,
    "trigger_run_id": draft.trigger_run_id,
    "trigger_session_id": draft.trigger_session_id,
}
```

- [x] **Step 9: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: PASS.

- [x] **Step 10: Commit**

```bash
git add app-instance/backend/beaver/skills app-instance/backend/tests/unit/test_plugin_skill_learning.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(skill-learning): create plugin update drafts"
```

---

### Task 7: Implement Three-Way Plugin Skill Synthesis

**Files:**
- Create: `app-instance/backend/beaver/plugins/tree_merge.py`
- Modify: `app-instance/backend/beaver/skills/learning/synthesizer.py`
- Modify: `app-instance/backend/beaver/skills/learning/service.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_learning.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_synthesizer_preservation.py`

- [x] **Step 1: Write failing three-way prompt and parse tests**

Assert the prompt contains labeled `OLD UPSTREAM`, `CURRENT LOCAL`, and `NEW UPSTREAM` sections and does not confuse the current local version with the merge base.

Test response parsing for:

```json
{
  "frontmatter": {"name": "baoyu-comic", "description": "Comic workflow", "tools": []},
  "content": "# Baoyu Comic\n...",
  "change_reason": "Adopt upstream layout while preserving learned review step.",
  "preserved_local_sections": ["Review"],
  "adopted_upstream_sections": ["Panel Layout"],
  "resolved_conflicts": ["Output ordering"],
  "dropped_sections": []
}
```

Add supporting-file merge tests:

```python
def test_supporting_file_merge_adopts_upstream_when_local_is_unchanged() -> None:
    plan = merge_supporting_file_trees(base={"a.txt": "A"}, local={"a.txt": "A"}, upstream={"a.txt": "U"})
    assert plan.files["a.txt"].source == "upstream"
    assert plan.conflicts == []


def test_supporting_file_merge_blocks_divergent_edits() -> None:
    plan = merge_supporting_file_trees(base={"a.txt": "A"}, local={"a.txt": "L"}, upstream={"a.txt": "U"})
    assert plan.conflicts[0].path == "a.txt"
```

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_synthesizer_preservation.py -q
```

Expected: FAIL because three-way synthesis does not exist.

- [x] **Step 3: Add `synthesize_plugin_update()`**

Signature:

```python
async def synthesize_plugin_update(
    self,
    candidate: SkillLearningCandidate,
    evidence_packet: EvidencePacket,
    provider: LLMProvider,
    model: str,
    *,
    old_upstream: dict[str, Any],
    current_local: dict[str, Any],
    new_upstream: dict[str, Any],
) -> dict[str, Any]:
```

The system message must require JSON only and state:

- preserve valid local learning;
- adopt upstream fixes and safety changes;
- do not concatenate duplicate sections;
- list every intentional drop;
- leave `resolved_conflicts` empty only when no semantic conflict exists.

- [x] **Step 4: Load all three snapshots in the learning service**

Resolve:

- `B` using `base_upstream_tree_hash`;
- `L` using `local_version`;
- `U` using `new_upstream_tree_hash`.

Raise a specific `ValueError` when any referenced snapshot/version is missing. Do not fall back to a two-way merge.

- [x] **Step 5: Build the deterministic supporting-file merge plan**

Compare files by path and content/executable digest:

- `L == B`: use `U`;
- `U == B`: use `L`;
- `L == U`: use either;
- one-sided addition: use the added file;
- divergent edit, different same-path additions, and delete-versus-edit: conflict.

Exclude `SKILL.md` because the synthesizer handles it. Store selected source references and conflict records in draft provenance; do not duplicate file bytes in JSON.

- [x] **Step 6: Create the plugin update draft**

Store merge decisions in draft provenance:

```python
{
    **plugin_reference_fields,
    "merge_mode": "three_way",
    "preserved_local_sections": payload["preserved_local_sections"],
    "adopted_upstream_sections": payload["adopted_upstream_sections"],
    "resolved_conflicts": payload["resolved_conflicts"],
    "dropped_sections": payload["dropped_sections"],
    "supporting_file_plan": supporting_file_plan.to_dict(),
}
```

If the supporting-file plan contains conflicts, the draft may be inspected but cannot be published. V1 does not ask the LLM to merge arbitrary or binary files.

- [x] **Step 7: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_synthesizer_preservation.py -q
```

Expected: PASS.
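
The merge rules from Step 5 can be sketched as a pure function over `path -> digest` maps. This is a hedged illustration rather than the contents of `tree_merge.py`: the function name and the `files`/`conflicts`/`source`/`path` attributes follow the Step 1 tests, while `FileChoice`, `Conflict`, `MergePlan`, the `reason` field, and the handling of deletions (a path deleted on the winning side is simply left out of the plan) are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FileChoice:
    source: str  # "local" or "upstream"


@dataclass
class Conflict:
    path: str
    reason: str


@dataclass
class MergePlan:
    files: dict[str, FileChoice] = field(default_factory=dict)
    conflicts: list[Conflict] = field(default_factory=list)


def merge_supporting_file_trees(
    *, base: dict[str, str], local: dict[str, str], upstream: dict[str, str]
) -> MergePlan:
    """Three-way merge over path -> digest maps; a missing path reads as None."""
    plan = MergePlan()
    # Sorted iteration keeps the resulting plan deterministic.
    for path in sorted(set(base) | set(local) | set(upstream)):
        if path == "SKILL.md":
            continue  # merged by the synthesizer, not by this plan
        b: Optional[str] = base.get(path)
        loc: Optional[str] = local.get(path)
        ups: Optional[str] = upstream.get(path)
        if loc == ups:
            chosen, source = loc, "local"  # both sides agree; either works
        elif loc == b:
            chosen, source = ups, "upstream"  # only upstream changed
        elif ups == b:
            chosen, source = loc, "local"  # only local changed
        else:
            # Divergent edits, different same-path additions, delete-versus-edit.
            plan.conflicts.append(Conflict(path=path, reason="divergent"))
            continue
        if chosen is not None:
            plan.files[path] = FileChoice(source=source)
    return plan
```

Because the function only compares digests, it never needs file bytes, which matches the requirement to store source references rather than content in draft provenance.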
- [x] **Step 8: Commit**

```bash
git add app-instance/backend/beaver/plugins/tree_merge.py app-instance/backend/beaver/skills/learning app-instance/backend/tests/unit/test_plugin_skill_learning.py app-instance/backend/tests/unit/test_skill_learning_synthesizer_preservation.py
git commit -m "feat(skill-learning): synthesize three-way plugin updates"
```

---

### Task 8: Extend Replay Preservation For Plugin Merges

**Files:**
- Modify: `app-instance/backend/beaver/skills/learning/preservation.py`
- Modify: `app-instance/backend/beaver/skills/learning/eval.py`
- Modify: `app-instance/backend/beaver/skills/learning/pipeline.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_preservation.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_eval.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_pipeline.py`

- [x] **Step 1: Write failing plugin merge preservation tests**

Cover:

- merged draft preserves local Safety and adopts new upstream Safety;
- silently dropping either Safety section fails;
- explicitly resolved non-safety conflicts pass;
- unresolved conflicts block publish;
- unresolved supporting-file conflicts block publish;
- baseline replay remains current local `L`.

Expected report shape:

```python
assert report.preservation_report == {
    "mode": "plugin_three_way",
    "passed": True,
    "local": {...},
    "upstream": {...},
    "unresolved_conflicts": [],
}
```

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_skill_learning_preservation.py tests/unit/test_skill_learning_eval.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: FAIL because preservation only checks one base skill.
- [x] **Step 3: Add plugin merge preservation helper**

Add:

```python
def check_plugin_merge_preservation(
    *,
    local_content: str,
    upstream_content: str,
    draft_content: str,
    merge_decisions: dict[str, Any],
) -> dict[str, Any]:
```

It calls existing `check_preservation()` for local and upstream content, gives Safety and Required Tools sections blocking weight, and reports unresolved conflicts separately.

- [x] **Step 4: Use current local as replay baseline**

When `draft.proposal_kind == "plugin_skill_update"`, load `draft.base_version` as the baseline skill. Continue to run the candidate arm with the draft context. Do not use raw upstream `B` or `U` as the replay baseline.

- [x] **Step 5: Tighten publish gate**

Add:

```python
if draft.proposal_kind == "plugin_skill_update":
    preservation = eval_report.preservation_report or {}
    if preservation.get("mode") != "plugin_three_way" and draft.provenance.get("merge_mode") == "three_way":
        raise ValueError("Plugin update requires a three-way preservation report")
    if preservation.get("unresolved_conflicts"):
        raise ValueError("Plugin update has unresolved merge conflicts")
    if draft.provenance.get("supporting_file_plan", {}).get("conflicts"):
        raise ValueError("Plugin update has unresolved supporting-file conflicts")
```

The existing `passed is False` gate remains active.

- [x] **Step 6: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_skill_learning_preservation.py tests/unit/test_skill_learning_eval.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: PASS.
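
To see which drafts the Step 5 gate lets through, the sketch below wraps the same three checks in a standalone function. `DraftStub`, `EvalReportStub`, `check_plugin_publish_gate`, and `gate_error` are hypothetical names used only for this illustration; the conditions and error messages are copied from Step 5.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class DraftStub:
    """Hypothetical stand-in for SkillDraft."""

    proposal_kind: str
    provenance: dict[str, Any] = field(default_factory=dict)


@dataclass
class EvalReportStub:
    """Hypothetical stand-in for the evaluation report."""

    preservation_report: Optional[dict[str, Any]] = None


def check_plugin_publish_gate(draft: DraftStub, eval_report: EvalReportStub) -> None:
    """Raise ValueError when a plugin update draft must not be published."""
    if draft.proposal_kind != "plugin_skill_update":
        return  # ordinary drafts are not affected by this gate
    preservation = eval_report.preservation_report or {}
    three_way = draft.provenance.get("merge_mode") == "three_way"
    if preservation.get("mode") != "plugin_three_way" and three_way:
        raise ValueError("Plugin update requires a three-way preservation report")
    if preservation.get("unresolved_conflicts"):
        raise ValueError("Plugin update has unresolved merge conflicts")
    if draft.provenance.get("supporting_file_plan", {}).get("conflicts"):
        raise ValueError("Plugin update has unresolved supporting-file conflicts")


def gate_error(draft: DraftStub, eval_report: EvalReportStub) -> Optional[str]:
    """Return the gate's error message, or None when the draft passes."""
    try:
        check_plugin_publish_gate(draft, eval_report)
    except ValueError as exc:
        return str(exc)
    return None


fast_forward = DraftStub("plugin_skill_update", {"merge_mode": "fast_forward"})
three_way = DraftStub("plugin_skill_update", {"merge_mode": "three_way"})
file_conflict = DraftStub(
    "plugin_skill_update",
    {"merge_mode": "fast_forward", "supporting_file_plan": {"conflicts": [{"path": "a.txt"}]}},
)
ok_report = EvalReportStub({"mode": "plugin_three_way", "passed": True, "unresolved_conflicts": []})
```

In this reading, a fast-forward draft passes without a three-way report, while a three-way draft needs one; supporting-file conflicts block publication in either mode.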
- [x] **Step 7: Commit**

```bash
git add app-instance/backend/beaver/skills/learning app-instance/backend/tests/unit/test_skill_learning_preservation.py app-instance/backend/tests/unit/test_skill_learning_eval.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(skill-learning): gate plugin merge preservation"
```

---

### Task 9: Reconcile Publication And Implement Pause/Disable/Adopt

**Files:**
- Modify: `app-instance/backend/beaver/plugins/skills.py`
- Modify: `app-instance/backend/beaver/skills/learning/pipeline.py`
- Modify: `app-instance/backend/beaver/skills/publisher/service.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_sync.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_pipeline.py`

- [x] **Step 1: Write failing lifecycle tests**

Test:

- publishing a plugin update advances accepted upstream tree hash;
- pending candidate clears;
- simulated observer failure leaves the published version intact;
- the next sync reconciles state from current version provenance and does not recreate the candidate;
- reconciliation never moves `accepted_beaver_version` backwards after rollback;
- pause leaves linked skills active and creates no update candidates;
- resume reconciles and syncs;
- disabling plugin disables linked skills without deletion;
- re-enable restores and syncs;
- missing package sets plugin status `missing`, suspends sync, and leaves linked skills active;
- adopt changes `source_kind` to `managed`, removes binding, and keeps the skill active.

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: FAIL because publication has no plugin acknowledgement callback.
- [x] **Step 3: Add a narrow publication observer**

Extend pipeline construction with:

```python
publish_observer: Callable[[SkillDraft, SkillVersion | SkillSpec], None] | None = None
```

After successful publish, call it before returning. Observer failure must be recorded and audited as `plugin_publish_ack_failed`; it must not delete the already-published version or turn the publish API response into a failure. Mark the learning candidate published before invoking the best-effort observer so clients do not retry a successful publish. The next sync is responsible for reconciliation.

- [x] **Step 4: Implement `PluginManager.on_skill_published()`**

For `proposal_kind="plugin_skill_update"`:

1. validate plugin ID, skill name, and new upstream tree hash from draft provenance;
2. set `accepted_upstream_tree_hash = new_upstream_tree_hash`;
3. set `observed_upstream_tree_hash = new_upstream_tree_hash`;
4. set `accepted_beaver_version = published.version`;
5. set `current_beaver_version = published.version`;
6. clear `pending_candidate_id`;
7. set status `synced`.

- [x] **Step 5: Implement sync-time reconciliation**

At the beginning of `sync_enabled()`, inspect each linked skill's current published version. When provenance contains:

```python
{
    "proposal_kind": "plugin_skill_update",
    "plugin_id": plugin_id,
    "new_upstream_tree_hash": tree_hash,
}
```

and the referenced upstream snapshot exists, advance state only if the current version number is newer than `accepted_beaver_version`. Clear only the matching pending candidate. Never regress state when the runtime current pointer was rolled back to an older version.

- [x] **Step 6: Implement pause, resume, disable, missing, and adopt**

`pause(plugin_id)` sets `updates_paused=True` and leaves linked skills unchanged.

`resume(plugin_id)` clears the flag and performs reconciliation/sync.

`disable(plugin_id, disable_linked_skills=True)` rejects calls without the explicit confirmation and calls `SkillPublisher.disable()` for every still-linked skill.

`adopt(plugin_id, skill_name)`:

- requires an existing binding;
- changes `SkillSpec.source_kind` to `managed`;
- appends `adopted_from_plugin:` to lineage;
- removes the binding;
- leaves the current version active.

When discovery cannot find a previously known plugin, set status `missing`, preserve `enabled` and `updates_paused`, skip update generation, and do not disable any linked skill.

- [x] **Step 7: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: PASS.

- [x] **Step 8: Commit**

```bash
git add app-instance/backend/beaver/plugins/skills.py app-instance/backend/beaver/skills/learning/pipeline.py app-instance/backend/beaver/skills/publisher/service.py app-instance/backend/tests/unit/test_plugin_skill_sync.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(plugins): track published updates and ownership"
```

---

### Task 10: Wire Plugin Sync Into Engine Loading

**Files:**
- Modify: `app-instance/backend/beaver/engine/loader.py`
- Modify: `app-instance/backend/beaver/plugins/__init__.py`
- Test: `app-instance/backend/tests/unit/test_plugin_runtime.py`
- Test: `app-instance/backend/tests/unit/test_phase5_skills_runtime.py`

- [x] **Step 1: Write failing runtime assembly tests**

Test:

- discovered disabled plugins do not mirror;
- enabled plugin mirrors before `EngineLoadResult.skills` is calculated;
- changed plugin creates a candidate but never calls an LLM during boot;
- repeated boot creates no duplicate versions/candidates;
- concurrent multi-process boot creates no duplicate versions/candidates;
- boot skips auto-sync and reports `deferred_lock_busy` when an explicit sync holds the workspace lock;
- `EngineLoadResult.plugin_manager` and plugin summaries are available.

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_runtime.py tests/unit/test_phase5_skills_runtime.py -q
```

Expected: FAIL because `EngineLoader` does not assemble plugin services.

- [x] **Step 3: Extend `EngineLoadResult` and loader injection**

Add:

```python
plugin_manager: PluginManager | None = None
plugins: list[dict] = field(default_factory=list)
```

Allow `plugin_manager` injection in `EngineLoader.__init__()` for tests.

- [x] **Step 4: Assemble in dependency order**

Required order:

1. config/workspace;
2. `SkillSpecStore`, learning store, and `SkillsLoader`;
3. tool registry and builtins, including skill-view tools using that loader;
4. draft/review/publisher and a safety checker using the completed tool registry;
5. discovery and `PluginStateStore`;
6. `PluginManager`;
7. `plugin_manager.sync_enabled(blocking=False)` when `config.plugins.auto_sync`;
8. learning service/pipeline with publication observer;
9. result summaries.

Do not use `SkillsLoader.extra_dirs` for plugin skills.

Explicit API enable/sync uses a bounded blocking lock timeout; Engine boot uses a non-blocking attempt and proceeds with the current published skill set if another writer owns the lock.

- [x] **Step 5: Run runtime tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_runtime.py tests/unit/test_phase5_skills_runtime.py -q
```

Expected: PASS.
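
The two locking modes described in Step 4 can be illustrated with a small sketch. It uses an in-process `threading.Lock` purely as a stand-in for the cross-process workspace write lock, which this plan implements separately and makes reentrant. `boot_sync`, `explicit_sync`, and the `"synced"` return value are hypothetical names; `deferred_lock_busy` and `plugin_write_busy` are the labels used elsewhere in this plan.

```python
import threading
from typing import Callable


def boot_sync(lock: threading.Lock, sync: Callable[[], None]) -> str:
    """Boot path: try the lock once without waiting and defer if it is busy."""
    if not lock.acquire(blocking=False):
        # Another writer owns the lock; boot continues with the
        # currently published skill set.
        return "deferred_lock_busy"
    try:
        sync()
        return "synced"
    finally:
        lock.release()


def explicit_sync(lock: threading.Lock, sync: Callable[[], None], timeout_s: float) -> str:
    """API path: wait up to timeout_s, then fail with a busy error."""
    if not lock.acquire(timeout=timeout_s):
        raise TimeoutError("plugin_write_busy")
    try:
        sync()
        return "synced"
    finally:
        lock.release()


calls: list[str] = []
workspace_lock = threading.Lock()

# With the lock free, boot performs the sync.
free_result = boot_sync(workspace_lock, lambda: calls.append("boot"))

# While another writer holds the lock, boot defers instead of waiting.
workspace_lock.acquire()
busy_result = boot_sync(workspace_lock, lambda: calls.append("boot-again"))
workspace_lock.release()
```

Keeping boot non-blocking means a long-running explicit sync cannot stall engine startup; the cost is that a boot may serve slightly stale plugin state until the next sync.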
- [x] **Step 6: Commit**

```bash
git add app-instance/backend/beaver/engine/loader.py app-instance/backend/beaver/plugins app-instance/backend/tests/unit/test_plugin_runtime.py app-instance/backend/tests/unit/test_phase5_skills_runtime.py
git commit -m "feat(runtime): sync declarative plugins at boot"
```

---

### Task 11: Add Plugin Management API

**Files:**
- Modify: `app-instance/backend/beaver/interfaces/web/app.py`
- Test: `app-instance/backend/tests/unit/test_plugin_web_api.py`

- [x] **Step 1: Write failing API tests**

Cover:

```text
GET /api/plugins
POST /api/plugins/sync
POST /api/plugins/{plugin_id}/enable
POST /api/plugins/{plugin_id}/pause
POST /api/plugins/{plugin_id}/resume
POST /api/plugins/{plugin_id}/disable
POST /api/plugins/{plugin_id}/skills/{skill_name}/adopt
```

Assert `404` for unknown plugin, `409` for skill ownership conflict, and `400` for invalid manifest/sync errors. Assert lock timeout maps to `409 plugin_write_busy`. Assert no payload contains the real absolute workspace or external search-root path. Assert disable without `{"disable_linked_skills": true}` is rejected.

- [x] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_web_api.py -q
```

Expected: FAIL with missing routes.
- [x] **Step 3: Add normalized plugin payload helper**

Return:

```python
{
    "id": manifest.plugin_id,
    "name": manifest.name,
    "discovered_version": manifest.version,
    "installed_version": state.installed_version,
    "enabled": state.enabled,
    "status": state.status,
    "last_error": state.last_error,
    "manifest_path": manifest.display_path,
    "updates_paused": state.updates_paused,
    "skills": [
        {
            "name": declaration.name,
            "status": binding.status,
            "current_beaver_version": binding.current_beaver_version,
            "accepted_upstream_tree_hash": binding.accepted_upstream_tree_hash,
            "observed_upstream_tree_hash": binding.observed_upstream_tree_hash,
            "accepted_beaver_version": binding.accepted_beaver_version,
            "pending_candidate_id": binding.pending_candidate_id,
        }
    ],
}
```

Never return arbitrary plugin file content, secrets, or absolute server paths.

- [x] **Step 4: Implement routes**

Each mutating endpoint boots one runtime, invokes its `plugin_manager`, and returns the updated plugin payload. Map `ValueError` messages to stable HTTP status codes.

- [x] **Step 5: Run focused and existing web tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_web_api.py tests/unit/test_skill_learning_web_api.py -q
```

Expected: PASS.

- [x] **Step 6: Commit**

```bash
git add app-instance/backend/beaver/interfaces/web/app.py app-instance/backend/tests/unit/test_plugin_web_api.py
git commit -m "feat(api): manage declarative plugins"
```

---

### Task 12: Add Plugin Management To The Skills UI

**Files:**
- Modify: `app-instance/frontend/types/index.ts`
- Modify: `app-instance/frontend/lib/api.ts`
- Modify: `app-instance/frontend/app/(app)/skills/page.tsx`
- Test: `app-instance/frontend/lib/plugin-api.test.ts`

- [x] **Step 1: Write failing API client tests**

Test URL, method, and response typing for list, sync, enable, pause, resume, disable, and adopt.
- [x] **Step 2: Run frontend test and verify failure**

Run the repository's existing frontend test command targeting:

```bash
cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts
```

Expected: FAIL because plugin API functions do not exist.

- [x] **Step 3: Add frontend types**

Add:

```typescript
export interface PluginSkillBinding {
  name: string;
  status: string;
  current_beaver_version?: string | null;
  accepted_upstream_tree_hash?: string | null;
  observed_upstream_tree_hash?: string | null;
  accepted_beaver_version?: string | null;
  pending_candidate_id?: string | null;
}

export interface BeaverPlugin {
  id: string;
  name: string;
  discovered_version?: string | null;
  installed_version?: string | null;
  enabled: boolean;
  updates_paused: boolean;
  status: string;
  last_error?: string | null;
  manifest_path?: string | null;
  skills: PluginSkillBinding[];
}
```

- [x] **Step 4: Add API functions**

Implement:

```typescript
listPlugins()
syncPlugins()
enablePlugin(pluginId)
pausePlugin(pluginId)
resumePlugin(pluginId)
disablePlugin(pluginId, { disable_linked_skills: true })
adoptPluginSkill(pluginId, skillName)
```

- [x] **Step 5: Add a `plugins` Skills tab**

Extend `SkillsTab` and render a compact table with:

- plugin name and versions;
- enabled/status badges;
- linked skills and pending candidate link;
- icon buttons with tooltips for sync, enable, pause, resume, disable, and adopt;
- confirmation before disable/adopt;
- missing-source warning stating that current skills remain active but updates are suspended;
- existing `runAction()` and error handling.

Do not add a separate marketing-style page or nested cards.

- [x] **Step 6: Label plugin-origin skills and update candidates**

In existing Published/Candidates/Drafts views:

- show `Plugin` source badge when `source_kind === "plugin"`;
- render `plugin_skill_update` as `插件升级合并 / Plugin update merge`;
- show `fast_forward` or `three_way` from candidate evidence/provenance.
- [x] **Step 7: Run frontend tests and type checks**

```bash
cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts
npm run lint
npx tsc --noEmit
```

Expected: PASS.

- [x] **Step 8: Commit**

```bash
git add app-instance/frontend/types/index.ts app-instance/frontend/lib/api.ts app-instance/frontend/lib/plugin-api.test.ts 'app-instance/frontend/app/(app)/skills/page.tsx'
git commit -m "feat(skills-ui): manage plugin skill mirrors"
```

---

### Task 13: Add End-To-End Lifecycle Coverage And Documentation

**Files:**
- Create: `app-instance/backend/tests/integration/test_plugin_skill_lifecycle.py`
- Create: `docs/plugins/skill-plugins.md`
- Modify: `docs/product-discovery/beaver/README.md`

- [x] **Step 1: Write the end-to-end lifecycle test**

The test must:

1. create plugin `1.0.0`;
2. enable it and assert mirror `v0001`;
3. publish a normal learned local revision `v0002`;
4. replace the package with plugin `1.1.0`;
5. sync and assert one `three_way` candidate;
6. synthesize with a stub provider;
7. run safety and replay evaluation with a stub runner;
8. submit, approve, and publish `v0003`;
9. assert accepted upstream tree hash and provenance advanced;
10. rollback to `v0002`;
11. assert plugin source files were never modified;
12. update only a supporting file and assert a new update candidate is created;
13. simulate publish-observer failure and assert the next sync reconciles state;
14. remove the plugin package and assert the plugin is `missing` while the current skill remains active;
15. run two sync processes and assert no duplicate version or candidate is created.

- [x] **Step 2: Run the integration test and fix only lifecycle defects**

```bash
cd app-instance/backend
pytest tests/integration/test_plugin_skill_lifecycle.py -v
```

Expected: PASS.
- [x] **Step 3: Write operator documentation**

Document:

- package layout and manifest;
- discovery roots;
- explicit enable requirement;
- mirror and three-way merge behavior;
- dual content/tree hashing and supporting-file merge conflicts;
- update candidate review flow;
- pause/resume versus disable/adopt;
- recovery from missing/invalid plugins;
- workspace locking, deferred boot sync, and publication reconciliation;
- why plugin Python code is not executed in V1.

- [x] **Step 4: Run the complete relevant backend suite**

```bash
cd app-instance/backend
pytest \
  tests/unit/test_plugin_manifest.py \
  tests/unit/test_plugin_hashing.py \
  tests/unit/test_plugin_state.py \
  tests/unit/test_workspace_write_lock.py \
  tests/unit/test_plugin_skill_storage.py \
  tests/unit/test_plugin_skill_sync.py \
  tests/unit/test_plugin_skill_learning.py \
  tests/unit/test_plugin_runtime.py \
  tests/unit/test_plugin_web_api.py \
  tests/unit/test_skill_learning_candidate_state.py \
  tests/unit/test_skill_learning_pipeline.py \
  tests/unit/test_skill_learning_eval.py \
  tests/unit/test_skill_learning_worker.py \
  tests/unit/test_phase5_skills_runtime.py \
  tests/integration/test_plugin_skill_lifecycle.py \
  -q
```

Expected: PASS.

- [x] **Step 5: Run frontend verification**

```bash
cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts
npm run lint
npx tsc --noEmit
```

Expected: PASS.

- [x] **Step 6: Run a dirty-worktree-safe diff review**

```bash
git status --short
git diff --check
git diff --stat
```

Expected:

- no whitespace errors;
- only plugin/skill lifecycle files and planned docs/tests are included in this feature;
- unrelated pre-existing user changes remain untouched.

- [x] **Step 7: Commit**

```bash
git add app-instance/backend/tests/integration/test_plugin_skill_lifecycle.py docs/plugins/skill-plugins.md docs/product-discovery/beaver/README.md
git commit -m "docs(plugins): document skill mirror lifecycle"
```

---

## Release Sequence

1. Ship backend manifest, state, snapshots, and initial mirror behind the Plugins API.
2. Enable update candidate generation after initial mirror tests pass in a real workspace.
3. Enable three-way synthesis and replay publish gates.
4. Ship the Plugins UI.
5. Keep executable plugin code disabled; design it separately with process isolation and permission boundaries.

## Rollout Metrics

Track:

- plugin discovery and manifest error count;
- initial mirror success/failure count;
- plugin update candidates created, superseded, rejected, and published;
- plugin update candidates caused by supporting-file-only changes;
- fast-forward versus three-way update ratio;
- write-lock contention and deferred boot sync count;
- publication reconciliation repair count;
- replay regression and preservation failure rate;
- time from upstream discovery to accepted publication;
- rollback count for plugin-origin versions.

## Final Acceptance Test

The feature is complete only when a plugin-origin skill can:

1. be enabled and used with normal skill priority;
2. accumulate a normal Beaver-learned revision;
3. receive a newer upstream plugin version;
4. produce a three-way update draft without editing the plugin package;
5. pass the same safety, replay, review, and publish gates as ordinary skills;
6. retain full upstream and local provenance;
7. detect and publish supporting-file-only updates;
8. survive concurrent boot/sync without duplicate versions or candidates;
9. recover plugin state after observer failure;
10. remain active when its plugin package is temporarily missing;
11. be paused, resumed, rolled back, disabled, re-enabled, or adopted without data loss.