beaver_project/docs/superpowers/plans/2026-06-15-plugin-skill-mirroring.md

# Plugin Skill Mirroring And Upgrade Learning Implementation Plan

> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

**Goal:** Add declarative Beaver plugins whose skills are mirrored as normal managed skills, learn normally, and merge plugin upgrades through the existing safety, replay evaluation, review, publish, and rollback lifecycle.

**Architecture:** A new `beaver.plugins` package discovers and validates `beaver.plugin.json`, computes content and full-tree hashes, persists enable/sync state, and stages immutable upstream/version trees before atomic promotion under a workspace write lock. Plugin upgrades become deterministic `plugin_skill_update` learning candidates using old upstream, current local, and new upstream inputs; the existing learning pipeline remains the only path for update publication, with sync-time reconciliation repairing failed state acknowledgements.

**Tech Stack:** Python dataclasses and file-backed JSON stores, existing `SkillSpecStore` and skill-learning pipeline, FastAPI, pytest, Next.js/TypeScript, existing shadcn UI components.

---

## Scope

This plan implements declarative skill plugins only. Do not add Python plugin entrypoints,
hooks, providers, channels, dependency installation, or marketplace download support.
Plugin-provided tools continue to use MCP.

## File Structure

Create focused plugin modules:

- `app-instance/backend/beaver/plugins/models.py`: manifest, discovery, state, and sync result dataclasses.
- `app-instance/backend/beaver/plugins/manifest.py`: JSON parsing, identifier validation, and contained-path validation.
- `app-instance/backend/beaver/plugins/hashing.py`: canonical skill-content and full-tree hashing.
- `app-instance/backend/beaver/plugins/tree_merge.py`: deterministic three-way supporting-file merge plans.
- `app-instance/backend/beaver/plugins/state.py`: atomic `.beaver/plugins/state.json` persistence.
- `app-instance/backend/beaver/plugins/discovery.py`: scan workspace and configured plugin roots.
- `app-instance/backend/beaver/plugins/transaction.py`: same-filesystem staging and immutable directory promotion.
- `app-instance/backend/beaver/plugins/skills.py`: initial mirror, update classification, candidate creation, reconciliation, pause/resume, disable, and adopt.
- `app-instance/backend/beaver/plugins/__init__.py`: public exports.
- `app-instance/backend/beaver/foundation/utils/file_lock.py`: reentrant cross-process workspace write lock.

Modify skill lifecycle modules:

- `app-instance/backend/beaver/skills/specs/models.py`: add upstream snapshot and draft provenance models.
- `app-instance/backend/beaver/skills/specs/storage.py`: persist immutable upstream snapshots and safely copy supporting files.
- `app-instance/backend/beaver/memory/skills/store.py`: lock candidate existence checks and JSONL mutations.
- `app-instance/backend/beaver/skills/drafts/service.py`: create plugin update drafts.
- `app-instance/backend/beaver/skills/learning/service.py`: synthesize `plugin_skill_update`.
- `app-instance/backend/beaver/skills/learning/synthesizer.py`: three-way plugin merge prompt and result.
- `app-instance/backend/beaver/skills/learning/eval.py`: plugin merge preservation report.
- `app-instance/backend/beaver/skills/learning/pipeline.py`: acknowledge successful plugin update publication.
- `app-instance/backend/beaver/skills/publisher/service.py`: carry draft provenance into published versions.

Modify runtime and management surfaces:

- `app-instance/backend/beaver/foundation/config/schema.py`
- `app-instance/backend/beaver/foundation/config/loader.py`
- `app-instance/backend/beaver/engine/loader.py`
- `app-instance/backend/beaver/interfaces/web/app.py`
- `app-instance/frontend/types/index.ts`
- `app-instance/frontend/lib/api.ts`
- `app-instance/frontend/app/(app)/skills/page.tsx`

Add tests:

- `app-instance/backend/tests/unit/test_plugin_manifest.py`
- `app-instance/backend/tests/unit/test_plugin_hashing.py`
- `app-instance/backend/tests/unit/test_plugin_state.py`
- `app-instance/backend/tests/unit/test_workspace_write_lock.py`
- `app-instance/backend/tests/unit/test_plugin_skill_storage.py`
- `app-instance/backend/tests/unit/test_plugin_skill_sync.py`
- `app-instance/backend/tests/unit/test_plugin_skill_learning.py`
- `app-instance/backend/tests/unit/test_plugin_runtime.py`
- `app-instance/backend/tests/unit/test_plugin_web_api.py`
- `app-instance/frontend/lib/plugin-api.test.ts`

---

### Task 1: Add Plugin Configuration And Manifest Models

**Files:**
- Create: `app-instance/backend/beaver/plugins/models.py`
- Create: `app-instance/backend/beaver/plugins/manifest.py`
- Create: `app-instance/backend/beaver/plugins/hashing.py`
- Create: `app-instance/backend/beaver/plugins/__init__.py`
- Modify: `app-instance/backend/beaver/foundation/config/schema.py`
- Modify: `app-instance/backend/beaver/foundation/config/loader.py`
- Modify: `app-instance/backend/beaver/foundation/config/__init__.py`
- Test: `app-instance/backend/tests/unit/test_plugin_manifest.py`
- Test: `app-instance/backend/tests/unit/test_plugin_hashing.py`
- Test: `app-instance/backend/tests/unit/test_config_loader.py`

- [ ] **Step 1: Write failing manifest validation tests**

Create tests covering:

```python
def test_load_plugin_manifest_accepts_declared_skill(tmp_path: Path) -> None:
    root = tmp_path / "comic"
    (root / "skills" / "comic").mkdir(parents=True)
    (root / "skills" / "comic" / "SKILL.md").write_text("# Comic\n", encoding="utf-8")
    (root / "beaver.plugin.json").write_text(
        json.dumps(
            {
                "schema_version": 1,
                "id": "baoyu-comic",
                "name": "Baoyu Comic",
                "version": "1.2.0",
                "skills": [{"name": "baoyu-comic", "path": "skills/comic"}],
            }
        ),
        encoding="utf-8",
    )

    manifest = load_plugin_manifest(root / "beaver.plugin.json")

    assert manifest.plugin_id == "baoyu-comic"
    assert manifest.skills[0].name == "baoyu-comic"
    assert manifest.skills[0].root == root / "skills" / "comic"


@pytest.mark.parametrize("value", ["../outside", "/absolute", "skills/../../outside"])
def test_load_plugin_manifest_rejects_escaping_skill_path(tmp_path: Path, value: str) -> None:
    path = tmp_path / "beaver.plugin.json"
    path.write_text(
        json.dumps(
            {
                "schema_version": 1,
                "id": "unsafe",
                "name": "Unsafe",
                "version": "1.0.0",
                "skills": [{"name": "unsafe", "path": value}],
            }
        ),
        encoding="utf-8",
    )

    with pytest.raises(ValueError, match="contained"):
        load_plugin_manifest(path)
```

Also test invalid IDs, duplicate skill names, unsupported schema versions, missing
`SKILL.md`, and symlinked skill roots.

Add tree-hash tests:

```python
def test_skill_tree_hash_changes_when_supporting_file_changes(tmp_path: Path) -> None:
    root = tmp_path / "skill"
    root.mkdir()
    (root / "SKILL.md").write_text("# Skill\n", encoding="utf-8")
    (root / "templates").mkdir()
    template = root / "templates" / "report.md"
    template.write_text("v1", encoding="utf-8")

    first = hash_plugin_skill_tree(root)
    template.write_text("v2", encoding="utf-8")
    second = hash_plugin_skill_tree(root)

    assert first.skill_content_hash == second.skill_content_hash
    assert first.skill_tree_hash != second.skill_tree_hash
```

Also verify path changes and executable-bit changes affect `skill_tree_hash`, while mtime
and non-executable permission changes do not.

- [ ] **Step 2: Run tests and verify failure**

Run:

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_manifest.py tests/unit/test_plugin_hashing.py tests/unit/test_config_loader.py -q
```

Expected: FAIL because `beaver.plugins` and `PluginsConfig` do not exist.

- [ ] **Step 3: Implement immutable plugin models and config**

Put plugin package models in `beaver/plugins/models.py`:

```python
@dataclass(frozen=True, slots=True)
class PluginSkillDeclaration:
    name: str
    relative_path: str
    root: Path


@dataclass(frozen=True, slots=True)
class PluginManifest:
    schema_version: int
    plugin_id: str
    name: str
    version: str
    root: Path
    manifest_path: Path
    display_path: str
    skills: tuple[PluginSkillDeclaration, ...]


@dataclass(frozen=True, slots=True)
class PluginSkillFileDigest:
    path: str
    size: int
    executable: bool
    content_hash: str


@dataclass(frozen=True, slots=True)
class PluginSkillTreeDigest:
    skill_content_hash: str
    skill_tree_hash: str
    files: tuple[PluginSkillFileDigest, ...]


```

Put configuration in `beaver/foundation/config/schema.py` to preserve the foundation layer
and avoid importing plugin runtime modules from config:

```python
@dataclass(slots=True)
class PluginsConfig:
    search_paths: list[str] = field(default_factory=list)
    auto_sync: bool = True
```

Add `plugins: PluginsConfig` to `BeaverConfig`. Parse both camelCase and snake_case:

```python
def _parse_plugins(raw: Any) -> PluginsConfig:
    data = _as_dict(raw)
    return PluginsConfig(
        search_paths=_string_list(data.get("searchPaths") or data.get("search_paths")),
        auto_sync=_bool(data.get("autoSync") if "autoSync" in data else data.get("auto_sync"), default=True),
    )
```

- [ ] **Step 4: Implement strict JSON manifest loading**

`load_plugin_manifest()` must:

1. parse a JSON object;
2. require schema version `1`;
3. validate identifiers with `^[a-z0-9][a-z0-9_-]*$`;
4. resolve every skill root and check `resolved.is_relative_to(plugin_root)`;
5. reject symlinks in the path from plugin root to skill root;
6. require a regular `SKILL.md`;
7. initialize `display_path` without exposing an absolute path;
8. return frozen dataclasses.

- [ ] **Step 5: Implement deterministic dual hashing**

`hash_plugin_skill_tree(root)` must:

1. reject symlinks and non-regular files;
2. enumerate regular files by normalized POSIX relative path;
3. compute `skill_content_hash` from normalized `SKILL.md`;
4. compute `skill_tree_hash` from each path, byte length, file bytes, and one normalized
   executable-bit flag;
5. include `SKILL.md` and every supporting file;
6. exclude Beaver metadata such as `version.json` and `upstream.json`;
7. ignore mtime, uid/gid, and non-executable mode bits.

Use length-prefixed binary fields in the digest input instead of ambiguous string
concatenation.

- [ ] **Step 6: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_manifest.py tests/unit/test_plugin_hashing.py tests/unit/test_config_loader.py -q
```

Expected: PASS.

- [ ] **Step 7: Commit**

```bash
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/foundation/config app-instance/backend/tests/unit/test_plugin_manifest.py app-instance/backend/tests/unit/test_plugin_hashing.py app-instance/backend/tests/unit/test_config_loader.py
git commit -m "feat(plugins): add declarative skill manifest"
```

---

### Task 2: Add Discovery And Atomic Plugin State

**Files:**
- Create: `app-instance/backend/beaver/plugins/discovery.py`
- Create: `app-instance/backend/beaver/plugins/state.py`
- Create: `app-instance/backend/beaver/foundation/utils/file_lock.py`
- Modify: `app-instance/backend/beaver/plugins/models.py`
- Modify: `app-instance/backend/beaver/plugins/__init__.py`
- Test: `app-instance/backend/tests/unit/test_plugin_state.py`
- Test: `app-instance/backend/tests/unit/test_workspace_write_lock.py`

- [ ] **Step 1: Write failing discovery and state tests**

Cover workspace discovery, configured search paths, duplicate plugin IDs, malformed
manifests reported as errors instead of crashing the full scan, and state round trips:

```python
def test_plugin_state_round_trip_is_atomic(tmp_path: Path) -> None:
    store = PluginStateStore(tmp_path)
    store.set_enabled("baoyu-comic", True)
    store.update_skill_binding(
        "baoyu-comic",
        "baoyu-comic",
        PluginSkillBinding(
            accepted_upstream_tree_hash="old",
            observed_upstream_tree_hash="new",
            accepted_beaver_version="v0001",
            current_beaver_version="v0002",
            pending_candidate_id="plugin-update:baoyu-comic:baoyu-comic:new",
            status="update_pending",
        ),
    )

    reloaded = PluginStateStore(tmp_path).get_plugin("baoyu-comic")

    assert reloaded is not None
    assert reloaded.enabled is True
    assert reloaded.skills["baoyu-comic"].accepted_upstream_tree_hash == "old"
    assert not (tmp_path / ".beaver" / "plugins" / "state.json.tmp").exists()
```

Add a multiprocess lock test in which two processes enter the same workspace lock and
assert their critical sections never overlap. Add a reentrancy test in which nested
acquisitions in one process complete without deadlock.

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_state.py tests/unit/test_workspace_write_lock.py -q
```

Expected: FAIL because discovery and state stores are missing.

- [ ] **Step 3: Implement state dataclasses**

Add backward-compatible `to_dict()` and `from_dict()` methods for:

```python
@dataclass(slots=True)
class PluginSkillBinding:
    accepted_upstream_tree_hash: str | None = None
    observed_upstream_tree_hash: str | None = None
    accepted_beaver_version: str | None = None
    current_beaver_version: str | None = None
    pending_candidate_id: str | None = None
    status: str = "discovered"
    last_error: str | None = None


@dataclass(slots=True)
class PluginState:
    plugin_id: str
    enabled: bool = False
    updates_paused: bool = False
    installed_version: str | None = None
    manifest_path: str | None = None
    status: str = "discovered"
    last_error: str | None = None
    skills: dict[str, PluginSkillBinding] = field(default_factory=dict)
```

- [ ] **Step 4: Implement atomic state persistence**

Store data at `<workspace>/.beaver/plugins/state.json`. Write a complete JSON document to
`state.json.tmp`, flush it, then replace `state.json`. Public methods:

```python
list_plugins()
get_plugin(plugin_id)
set_enabled(plugin_id, enabled)
upsert_plugin(plugin_state)
update_skill_binding(plugin_id, skill_name, binding)
```

- [ ] **Step 5: Implement the shared workspace write lock**

Add:

```python
class WorkspaceWriteLock:
    def __init__(self, workspace: str | Path) -> None:
        self.path = Path(workspace) / ".beaver" / "locks" / "plugin-skill-write.lock"

    @contextmanager
    def acquire(self, *, timeout_seconds: float | None = None, blocking: bool = True):
        ...
```

Requirements:

- use `fcntl.flock()` on POSIX and `msvcrt.locking()` on Windows, matching
  `memory/curated/store.py`;
- guard with a process-local `threading.RLock`;
- track per-thread recursion depth so nested store calls reuse the OS lock;
- support non-blocking acquisition for Engine boot;
- raise `WorkspaceWriteLockBusy` on timeout/contention;
- keep the lock file separate from atomically replaced data files.

- [ ] **Step 6: Implement discovery**

Scan:

1. `<workspace>/plugins`;
2. each configured `plugins.search_paths`.

Only direct child directories containing `beaver.plugin.json` are plugins. Return a
`PluginDiscoveryResult` containing valid manifests and per-path errors. Duplicate IDs are
errors and neither duplicate is activated. Discovery records a workspace-relative
manifest display path when possible and a redacted
`<external>/<plugin-dir>/beaver.plugin.json` path otherwise; absolute paths remain
internal.

- [ ] **Step 7: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_state.py tests/unit/test_workspace_write_lock.py tests/unit/test_plugin_manifest.py -q
```

Expected: PASS.

- [ ] **Step 8: Commit**

```bash
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/foundation/utils/file_lock.py app-instance/backend/tests/unit/test_plugin_state.py app-instance/backend/tests/unit/test_workspace_write_lock.py
git commit -m "feat(plugins): discover packages and persist state"
```

---

### Task 3: Persist Immutable Upstream Skill Snapshots

**Files:**
- Create: `app-instance/backend/beaver/plugins/transaction.py`
- Modify: `app-instance/backend/beaver/skills/specs/models.py`
- Modify: `app-instance/backend/beaver/skills/specs/storage.py`
- Modify: `app-instance/backend/beaver/skills/specs/__init__.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_storage.py`

- [ ] **Step 1: Write failing snapshot storage tests**

Test exact content, supporting files, idempotence, symlink rejection, and source
immutability:

```python
def test_write_upstream_snapshot_copies_skill_without_mutating_source(tmp_path: Path) -> None:
    source = tmp_path / "plugin" / "skills" / "comic"
    source.mkdir(parents=True)
    (source / "SKILL.md").write_text("# Comic\n\nOriginal.\n", encoding="utf-8")
    (source / "templates").mkdir()
    (source / "templates" / "panel.txt").write_text("panel", encoding="utf-8")
    store = SkillSpecStore(tmp_path / "workspace")
    transaction = PluginSkillTransaction(tmp_path / "workspace")

    snapshot = store.stage_upstream_snapshot(
        transaction,
        skill_name="baoyu-comic",
        source_kind="plugin",
        source_id="baoyu-comic",
        source_version="1.0.0",
        source_path="skills/comic",
        source_root=source,
    )
    store.promote_upstream_snapshot(transaction, snapshot)

    loaded = store.read_upstream_snapshot("baoyu-comic", "baoyu-comic", snapshot.skill_tree_hash)
    assert loaded is not None
    assert loaded.content == "# Comic\n\nOriginal.\n"
    assert (loaded.root / "templates" / "panel.txt").read_text(encoding="utf-8") == "panel"
    assert (source / "SKILL.md").read_text(encoding="utf-8") == "# Comic\n\nOriginal.\n"
```

Also test:

- changing only `templates/panel.txt` creates a different snapshot directory;
- `SkillVersion.from_dict()` remains compatible without `tree_hash`;
- reading a legacy version derives its complete tree hash;
- staging does not make a snapshot visible to `read_upstream_snapshot()`;
- promoting a staged snapshot uses `os.replace()` and is idempotent;
- a failed metadata write leaves no current pointer to the staged version.

- [ ] **Step 2: Run test and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_storage.py -q
```

Expected: FAIL because upstream snapshot APIs do not exist.

- [ ] **Step 3: Add upstream snapshot models**

Add:

```python
@dataclass(slots=True)
class SkillUpstreamSnapshot:
    skill_name: str
    source_kind: str
    source_id: str
    source_version: str
    source_path: str
    skill_content_hash: str
    skill_tree_hash: str
    created_at: str
    frontmatter: dict[str, Any] = field(default_factory=dict)
```

Add `LoadedSkillUpstreamSnapshot(snapshot, content, root)` for storage reads. Extend
`SkillVersion` with a backward-compatible `tree_hash: str = ""`; new versions persist the
complete version-tree hash, while `read_published_skill()` derives it for legacy metadata
that lacks the field.

- [ ] **Step 4: Add safe tree-copy helper**

Refactor a private `SkillSpecStore._copy_regular_tree(source_root, target_root)` that:

- rejects any symlink;
- rejects paths containing empty, `.`, or `..` segments;
- copies regular files only;
- creates parents;
- never writes outside `target_root`.

Use it for transaction staging now; Task 4 will reuse it for mirrored versions.

- [ ] **Step 5: Implement same-filesystem staging and promotion**

`PluginSkillTransaction` creates:

```text
<workspace>/.beaver/staging/plugin-skills/<transaction-id>/
```

The staging root must be on the same filesystem as `<workspace>/skills`. It exposes:

```python
stage_upstream_snapshot(...)
stage_skill_version(...)
promote_directory(staged, final)
cleanup()
```

`promote_directory()` uses `os.replace()` and never replaces an existing non-identical
immutable directory. Cleanup removes only the transaction's staging root.

- [ ] **Step 6: Implement snapshot APIs**

Write snapshots to:

```text
skills/<skill>/upstreams/<source-id>/<skill-tree-hash>/
```

The snapshot metadata stores both hashes. If the directory already exists, verify all
stored metadata and return it without rewriting.

Public methods:

```python
stage_upstream_snapshot(transaction, ...)
promote_upstream_snapshot(transaction, snapshot)
read_upstream_snapshot(skill_name, source_id, skill_tree_hash)
```

- [ ] **Step 7: Make JSON/current/index writes atomic**

Change `SkillSpecStore._write_json()` and current/index pointer writes to create a temporary
file in the target directory, flush and `fsync`, then `os.replace()`. Immutable version
directories are promoted first; runtime visibility changes only when `current.json`,
`skill.json`, and the published index are atomically replaced under the workspace lock.

- [ ] **Step 8: Run focused and existing storage tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_storage.py tests/unit/test_phase5_skills_runtime.py -q
```

Expected: PASS.

- [ ] **Step 9: Commit**

```bash
git add app-instance/backend/beaver/plugins/transaction.py app-instance/backend/beaver/skills/specs app-instance/backend/tests/unit/test_plugin_skill_storage.py
git commit -m "feat(skills): store immutable plugin upstream snapshots"
```

---

### Task 4: Mirror Initial Plugin Skills As First-Class Skills

**Files:**
- Create: `app-instance/backend/beaver/plugins/skills.py`
- Modify: `app-instance/backend/beaver/plugins/models.py`
- Modify: `app-instance/backend/beaver/plugins/__init__.py`
- Modify: `app-instance/backend/beaver/skills/specs/storage.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_sync.py`

- [ ] **Step 1: Write failing initial mirror tests**

Cover:

- enabling mirrors `SKILL.md` and supporting files;
- mirrored skill is returned by `SkillsLoader.list_published_skills()`;
- `source_kind` is `plugin`, but runtime source is still workspace;
- existing non-plugin name collision fails without modification;
- any validation/safety failure in a multi-skill plugin occurs before promotion and leaves
  every linked skill unchanged;
- repeated sync is idempotent.
- supporting files are present in the promoted version;
- concurrent enable calls allocate only one version.

Core assertion:

```python
result = manager.enable("baoyu-comic")
record = SkillsLoader(workspace).get_skill_record("baoyu-comic")
loaded = SkillSpecStore(workspace).read_published_skill("baoyu-comic")

assert result.status == "synced"
assert record is not None and record.source == "workspace"
assert record.source_kind == "plugin"
assert loaded is not None
assert loaded.version.version == "v0001"
assert loaded.version.provenance["plugin_id"] == "baoyu-comic"
assert loaded.version.provenance["upstream_skill_content_hash"]
assert loaded.version.provenance["upstream_skill_tree_hash"]
```

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py -q
```

Expected: FAIL because `PluginManager` does not exist.

- [ ] **Step 3: Implement `PluginManager` constructor and discovery view**

Constructor dependencies:

```python
class PluginManager:
    def __init__(
        self,
        *,
        workspace: Path,
        manifests: dict[str, PluginManifest],
        discovery_errors: list[PluginDiscoveryError],
        state_store: PluginStateStore,
        skill_store: SkillSpecStore,
        learning_store: SkillLearningStore,
        publisher: SkillPublisher,
        safety_checker: SkillDraftSafetyChecker,
        write_lock: WorkspaceWriteLock,
    ) -> None:
        ...
```

Keep all filesystem and lifecycle dependencies injectable for tests.

- [ ] **Step 4: Implement exact initial mirror publication**

Acquire the workspace write lock before reading state, allocating versions, or writing
candidates. For each declared skill:

1. persist the upstream snapshot;
2. validate ownership conflict;
3. parse frontmatter/body and create an in-memory `SkillDraft` with
   `proposal_kind="plugin_initial_mirror"`;
4. run `SkillDraftSafetyChecker.check()` and reject failed or critical reports;
5. allocate the next `vNNNN` while holding the lock;
6. stage a `SkillVersion` whose content exactly equals upstream `SKILL.md`;
7. stage snapshot supporting files into the version directory;
8. generate the complete next `SkillSpec`, current pointer, index, and plugin-state JSON
   payloads in memory.

Use provenance:

```python
{
    "source_kind": "plugin",
    "plugin_id": manifest.plugin_id,
    "plugin_version": manifest.version,
    "plugin_skill_path": declaration.relative_path,
    "upstream_skill_content_hash": snapshot.skill_content_hash,
    "upstream_skill_tree_hash": snapshot.skill_tree_hash,
    "merge_mode": "initial_mirror",
}
```

- [ ] **Step 5: Promote the complete staged transaction**

After every declared skill passes validation:

1. for a new skill, promote its complete staged skill directory with one `os.replace()`;
2. for an existing skill, promote immutable upstream/version directories, atomically
   replace spec/index metadata, and replace `current.json` last as the visibility switch;
3. atomically write plugin state last;
4. clean the staging directory.

Do not implement reverse rollback across already-promoted immutable directories. If a
metadata write fails, those directories remain unreferenced and harmless; the previous
current pointers remain authoritative. Add startup cleanup for staging directories older
than 24 hours.

- [ ] **Step 6: Run focused and loader tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_phase5_skills_runtime.py -q
```

Expected: PASS.

- [ ] **Step 7: Commit**

```bash
git add app-instance/backend/beaver/plugins app-instance/backend/beaver/skills/specs/storage.py app-instance/backend/tests/unit/test_plugin_skill_sync.py
git commit -m "feat(plugins): mirror enabled plugin skills"
```

---

### Task 5: Detect Upgrades And Create Idempotent Learning Candidates

**Files:**
- Modify: `app-instance/backend/beaver/plugins/skills.py`
- Modify: `app-instance/backend/beaver/memory/skills/models.py`
- Modify: `app-instance/backend/beaver/memory/skills/store.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_sync.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_candidate_state.py`

- [ ] **Step 1: Write failing upgrade classification tests**

Create four tree-hash fixtures representing `B`, `L`, and `U`:

```python
@pytest.mark.parametrize(
    ("base", "local", "upstream", "expected"),
    [
        ("A", "A", "A", "unchanged"),
        ("A", "B", "B", "already_applied"),
        ("A", "A", "B", "fast_forward"),
        ("A", "LOCAL", "UPSTREAM", "three_way"),
    ],
)
def test_classify_plugin_skill_update(base: str, local: str, upstream: str, expected: str) -> None:
    assert classify_plugin_skill_update(base, local, upstream) == expected
```

Also test:

- a supporting-file-only change returns `fast_forward` or `three_way`, never `unchanged`;
- candidate ID stability across repeated sync;
- new upstream supersedes an older pending candidate;
- candidate evidence contains hashes/version references but no raw skill body;
- legacy candidate payloads still parse.
- two processes syncing the same update append only one candidate record.

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_candidate_state.py -q
```

Expected: FAIL because update classification and candidate kind are missing.

- [ ] **Step 3: Add `plugin_skill_update` candidate support**

Do not add a special status. Existing candidate statuses remain sufficient. Ensure
`SkillLearningCandidate.from_dict()` accepts the new `kind` without changing legacy
defaults.

Use evidence:

```python
{
    "plugin_id": plugin_id,
    "plugin_version": manifest.version,
    "skill_name": skill_name,
    "merge_mode": merge_mode,
    "base_upstream_tree_hash": accepted_tree_hash,
    "new_upstream_tree_hash": snapshot.skill_tree_hash,
    "local_version": current.version.version,
}
```

Set `priority=10`, `confidence=1.0`, `trigger_reason="plugin_update"`.

- [ ] **Step 4: Implement update classification and candidate creation**

Use canonical hashes and deterministic IDs:

```python
candidate_id = (
    f"plugin-update:{plugin_id}:{skill_name}:"
    f"{new_upstream_tree_hash[:12]}"
)
```

For `already_applied`, advance state without a candidate. For `fast_forward` and
`three_way`, record an open candidate. If the same ID exists in any status, do not append
another JSONL record.

- [ ] **Step 5: Make candidate mutation atomic under the shared lock**

Add an optional `WorkspaceWriteLock` to `SkillLearningStore`; EngineLoader supplies the
shared workspace instance, while isolated unit-test construction falls back to a
store-local lock. Add:

```python
record_learning_candidate_if_absent(candidate) -> tuple[SkillLearningCandidate, bool]
```

Inside one lock acquisition, read current candidates, check the deterministic ID, and
atomically rewrite or append the JSONL record. Apply the same lock to candidate update and
transition methods. Nested calls from `PluginManager` reuse the reentrant lock.

- [ ] **Step 6: Supersede stale pending updates**

When a different pending candidate exists for the same plugin skill:

```python
learning_store.transition_learning_candidate(
    old_candidate_id,
    "superseded",
    event_type="plugin_update_superseded",
    payload={"replacement_candidate_id": new_candidate_id},
)
```

- [ ] **Step 7: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_candidate_state.py -q
```

Expected: PASS.

- [ ] **Step 8: Commit**

```bash
git add app-instance/backend/beaver/plugins/skills.py app-instance/backend/beaver/memory/skills/models.py app-instance/backend/beaver/memory/skills/store.py app-instance/backend/tests/unit/test_plugin_skill_sync.py app-instance/backend/tests/unit/test_skill_learning_candidate_state.py
git commit -m "feat(plugins): enqueue skill upgrade candidates"
```

---

### Task 6: Add Plugin Update Draft Provenance And Fast-Forward Synthesis

**Files:**
- Modify: `app-instance/backend/beaver/skills/specs/models.py`
- Modify: `app-instance/backend/beaver/skills/drafts/service.py`
- Modify: `app-instance/backend/beaver/skills/publisher/service.py`
- Modify: `app-instance/backend/beaver/skills/learning/service.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_learning.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_pipeline.py`

- [ ] **Step 1: Write failing model and fast-forward tests**

Test backward-compatible draft parsing and exact upstream fast-forward:

```python
draft = asyncio.run(service.synthesize_draft(candidate.candidate_id, provider_bundle))

assert draft.proposal_kind == "plugin_skill_update"
assert draft.proposed_content == new_upstream.content
assert draft.base_version == "v0001"
assert draft.provenance["merge_mode"] == "fast_forward"
assert draft.provenance["new_upstream_tree_hash"] == new_upstream.snapshot.skill_tree_hash
assert provider.calls == []
```

After publish, assert the new version contains the new upstream supporting files even when
`SKILL.md` did not change.

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: FAIL because drafts have no provenance and the learning service has no plugin
update branch.

- [ ] **Step 3: Add backward-compatible draft provenance**

Extend `SkillDraft`:

```python
provenance: dict[str, Any] = field(default_factory=dict)
```

Include it in `to_dict()` and parse missing values as `{}` in `from_dict()`.

- [ ] **Step 4: Add a focused draft constructor**

Add:

```python
def create_plugin_update_draft(
    self,
    *,
    skill_name: str,
    base_version: str,
    proposed_content: str,
    proposed_frontmatter: dict,
    created_by: str,
    reason: str,
    provenance: dict,
    evidence_refs: list[dict] | None = None,
) -> SkillDraft:
```

It writes `proposal_kind="plugin_skill_update"`.

- [ ] **Step 5: Implement fast-forward synthesis**

In `SkillLearningService.synthesize_draft()`, branch before ordinary revision:

```python
if candidate.kind == "plugin_skill_update":
    return await self._synthesize_plugin_update(candidate, provider_bundle)
```

For `merge_mode == "fast_forward"`, load `U` from `SkillSpecStore`, parse its
frontmatter/body, and create a draft exactly equal to `U`. Do not call the provider.

- [ ] **Step 6: Serialize all skill publication**

Add an optional `WorkspaceWriteLock` to `SkillPublisher`; EngineLoader supplies the shared
workspace instance and isolated tests use a publisher-local fallback. Hold it across
`_next_version()`, version staging/promotion, spec/current/index replacement, rollback,
and disable. This protects ordinary learned skills as well as plugin-origin skills from
racing with boot or explicit plugin sync.

- [ ] **Step 7: Materialize referenced supporting files during publish**

For `proposal_kind="plugin_skill_update"`, resolve the snapshot and supporting-file plan
from draft provenance. Stage the complete next version directory, including `SKILL.md`
and supporting files, before promoting it. Reject missing snapshots, path conflicts, or
tree-hash mismatches. Ordinary skill publication keeps its current behavior.

- [ ] **Step 8: Preserve draft provenance on publish**

Change `SkillPublisher.publish()` provenance construction to:

```python
provenance={
    **dict(draft.provenance),
    "draft_id": draft_id,
    "proposal_kind": draft.proposal_kind,
    "trigger_run_id": draft.trigger_run_id,
    "trigger_session_id": draft.trigger_session_id,
}
```

- [ ] **Step 9: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: PASS.

- [ ] **Step 10: Commit**

```bash
git add app-instance/backend/beaver/skills app-instance/backend/tests/unit/test_plugin_skill_learning.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(skill-learning): create plugin update drafts"
```

---

### Task 7: Implement Three-Way Plugin Skill Synthesis

**Files:**
- Create: `app-instance/backend/beaver/plugins/tree_merge.py`
- Modify: `app-instance/backend/beaver/skills/learning/synthesizer.py`
- Modify: `app-instance/backend/beaver/skills/learning/service.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_learning.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_synthesizer_preservation.py`

- [ ] **Step 1: Write failing three-way prompt and parse tests**

Assert the prompt contains labeled `OLD UPSTREAM`, `CURRENT LOCAL`, and `NEW UPSTREAM`
sections and does not confuse the current local version with the merge base.

Test response parsing for:

```json
{
  "frontmatter": {"name": "baoyu-comic", "description": "Comic workflow", "tools": []},
  "content": "# Baoyu Comic\n...",
  "change_reason": "Adopt upstream layout while preserving learned review step.",
  "preserved_local_sections": ["Review"],
  "adopted_upstream_sections": ["Panel Layout"],
  "resolved_conflicts": ["Output ordering"],
  "dropped_sections": []
}
```

Add supporting-file merge tests:

```python
def test_supporting_file_merge_adopts_upstream_when_local_is_unchanged() -> None:
    plan = merge_supporting_file_trees(base={"a.txt": "A"}, local={"a.txt": "A"}, upstream={"a.txt": "U"})
    assert plan.files["a.txt"].source == "upstream"
    assert plan.conflicts == []


def test_supporting_file_merge_blocks_divergent_edits() -> None:
    plan = merge_supporting_file_trees(base={"a.txt": "A"}, local={"a.txt": "L"}, upstream={"a.txt": "U"})
    assert plan.conflicts[0].path == "a.txt"
```

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_synthesizer_preservation.py -q
```

Expected: FAIL because three-way synthesis does not exist.

- [ ] **Step 3: Add `synthesize_plugin_update()`**

Signature:

```python
async def synthesize_plugin_update(
    self,
    candidate: SkillLearningCandidate,
    evidence_packet: EvidencePacket,
    provider: LLMProvider,
    model: str,
    *,
    old_upstream: dict[str, Any],
    current_local: dict[str, Any],
    new_upstream: dict[str, Any],
) -> dict[str, Any]:
```

The system message must require JSON only and state:

- preserve valid local learning;
- adopt upstream fixes and safety changes;
- do not concatenate duplicate sections;
- list every intentional drop;
- leave `resolved_conflicts` empty only when no semantic conflict exists.

- [ ] **Step 4: Load all three snapshots in the learning service**

Resolve:

- `B` using `base_upstream_tree_hash`;
- `L` using `local_version`;
- `U` using `new_upstream_tree_hash`.

Raise a specific `ValueError` when any referenced snapshot/version is missing. Do not
fallback to a two-way merge.

- [ ] **Step 5: Build the deterministic supporting-file merge plan**

Compare files by path and content/executable digest:

- `L == B`: use `U`;
- `U == B`: use `L`;
- `L == U`: use either;
- one-sided addition: use the added file;
- divergent edit, different same-path additions, and delete-versus-edit: conflict.

Exclude `SKILL.md` because the synthesizer handles it. Store selected source references
and conflict records in draft provenance; do not duplicate file bytes in JSON.

- [ ] **Step 6: Create the plugin update draft**

Store merge decisions in draft provenance:

```python
{
    **plugin_reference_fields,
    "merge_mode": "three_way",
    "preserved_local_sections": payload["preserved_local_sections"],
    "adopted_upstream_sections": payload["adopted_upstream_sections"],
    "resolved_conflicts": payload["resolved_conflicts"],
    "dropped_sections": payload["dropped_sections"],
    "supporting_file_plan": supporting_file_plan.to_dict(),
}
```

If the supporting-file plan contains conflicts, the draft may be inspected but cannot be
published. V1 does not ask the LLM to merge arbitrary or binary files.

- [ ] **Step 7: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_learning.py tests/unit/test_skill_learning_synthesizer_preservation.py -q
```

Expected: PASS.

- [ ] **Step 8: Commit**

```bash
git add app-instance/backend/beaver/plugins/tree_merge.py app-instance/backend/beaver/skills/learning app-instance/backend/tests/unit/test_plugin_skill_learning.py app-instance/backend/tests/unit/test_skill_learning_synthesizer_preservation.py
git commit -m "feat(skill-learning): synthesize three-way plugin updates"
```

---

### Task 8: Extend Replay Preservation For Plugin Merges

**Files:**
- Modify: `app-instance/backend/beaver/skills/learning/preservation.py`
- Modify: `app-instance/backend/beaver/skills/learning/eval.py`
- Modify: `app-instance/backend/beaver/skills/learning/pipeline.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_preservation.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_eval.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_pipeline.py`

- [ ] **Step 1: Write failing plugin merge preservation tests**

Cover:

- merged draft preserves local Safety and adopts new upstream Safety;
- silently dropping either Safety section fails;
- explicitly resolved non-safety conflicts pass;
- unresolved conflicts block publish;
- unresolved supporting-file conflicts block publish;
- baseline replay remains current local `L`.

Expected report shape:

```python
assert report.preservation_report == {
    "mode": "plugin_three_way",
    "passed": True,
    "local": {...},
    "upstream": {...},
    "unresolved_conflicts": [],
}
```

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_skill_learning_preservation.py tests/unit/test_skill_learning_eval.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: FAIL because preservation only checks one base skill.

- [ ] **Step 3: Add plugin merge preservation helper**

Add:

```python
def check_plugin_merge_preservation(
    *,
    local_content: str,
    upstream_content: str,
    draft_content: str,
    merge_decisions: dict[str, Any],
) -> dict[str, Any]:
```

It calls existing `check_preservation()` for local and upstream content, gives Safety and
Required Tools sections blocking weight, and reports unresolved conflicts separately.

- [ ] **Step 4: Use current local as replay baseline**

When `draft.proposal_kind == "plugin_skill_update"`, load `draft.base_version` as the
baseline skill. Continue to run the candidate arm with the draft context. Do not use raw
upstream `B` or `U` as the replay baseline.

- [ ] **Step 5: Tighten publish gate**

Add:

```python
if draft.proposal_kind == "plugin_skill_update":
    preservation = eval_report.preservation_report or {}
    if preservation.get("mode") != "plugin_three_way" and draft.provenance.get("merge_mode") == "three_way":
        raise ValueError("Plugin update requires a three-way preservation report")
    if preservation.get("unresolved_conflicts"):
        raise ValueError("Plugin update has unresolved merge conflicts")
    if draft.provenance.get("supporting_file_plan", {}).get("conflicts"):
        raise ValueError("Plugin update has unresolved supporting-file conflicts")
```

The existing `passed is False` gate remains active.

- [ ] **Step 6: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_skill_learning_preservation.py tests/unit/test_skill_learning_eval.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: PASS.

- [ ] **Step 7: Commit**

```bash
git add app-instance/backend/beaver/skills/learning app-instance/backend/tests/unit/test_skill_learning_preservation.py app-instance/backend/tests/unit/test_skill_learning_eval.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(skill-learning): gate plugin merge preservation"
```

---

### Task 9: Reconcile Publication And Implement Pause/Disable/Adopt

**Files:**
- Modify: `app-instance/backend/beaver/plugins/skills.py`
- Modify: `app-instance/backend/beaver/skills/learning/pipeline.py`
- Modify: `app-instance/backend/beaver/skills/publisher/service.py`
- Test: `app-instance/backend/tests/unit/test_plugin_skill_sync.py`
- Test: `app-instance/backend/tests/unit/test_skill_learning_pipeline.py`

- [ ] **Step 1: Write failing lifecycle tests**

Test:

- publishing a plugin update advances accepted upstream tree hash;
- pending candidate clears;
- simulated observer failure leaves the published version intact;
- the next sync reconciles state from current version provenance and does not recreate the
  candidate;
- reconciliation never moves `accepted_beaver_version` backwards after rollback;
- pause leaves linked skills active and creates no update candidates;
- resume reconciles and syncs;
- disabling plugin disables linked skills without deletion;
- re-enable restores and syncs;
- missing package sets plugin status `missing`, suspends sync, and leaves linked skills
  active;
- adopt changes `source_kind` to `managed`, removes binding, and keeps the skill active.

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: FAIL because publication has no plugin acknowledgement callback.

- [ ] **Step 3: Add a narrow publication observer**

Extend pipeline construction with:

```python
publish_observer: Callable[[SkillDraft, SkillVersion | SkillSpec], None] | None = None
```

After successful publish, call it before returning. Observer failure must be recorded and
audited as `plugin_publish_ack_failed`; it must not delete the already-published version
or turn the publish API response into a failure. Mark the learning candidate published
before invoking the best-effort observer so clients do not retry a successful publish.
The next sync is responsible for reconciliation.

- [ ] **Step 4: Implement `PluginManager.on_skill_published()`**

For `proposal_kind="plugin_skill_update"`:

1. validate plugin ID, skill name, and new upstream tree hash from draft provenance;
2. set `accepted_upstream_tree_hash = new_upstream_tree_hash`;
3. set `observed_upstream_tree_hash = new_upstream_tree_hash`;
4. set `accepted_beaver_version = published.version`;
5. set `current_beaver_version = published.version`;
6. clear `pending_candidate_id`;
7. set status `synced`.

- [ ] **Step 5: Implement sync-time reconciliation**

At the beginning of `sync_enabled()`, inspect each linked skill's current published
version. When provenance contains:

```python
{
    "proposal_kind": "plugin_skill_update",
    "plugin_id": plugin_id,
    "new_upstream_tree_hash": tree_hash,
}
```

and the referenced upstream snapshot exists, advance state only if the current version
number is newer than `accepted_beaver_version`. Clear only the matching pending candidate.
Never regress state when the runtime current pointer was rolled back to an older version.

- [ ] **Step 6: Implement pause, resume, disable, missing, and adopt**

`pause(plugin_id)` sets `updates_paused=True` and leaves linked skills unchanged.
`resume(plugin_id)` clears the flag and performs reconciliation/sync.

`disable(plugin_id, disable_linked_skills=True)` rejects calls without the explicit
confirmation and calls `SkillPublisher.disable()` for every still-linked skill.
`adopt(plugin_id, skill_name)`:

- requires an existing binding;
- changes `SkillSpec.source_kind` to `managed`;
- appends `adopted_from_plugin:<plugin-id>` to lineage;
- removes the binding;
- leaves the current version active.

When discovery cannot find a previously known plugin, set status `missing`, preserve
`enabled` and `updates_paused`, skip update generation, and do not disable any linked
skill.

- [ ] **Step 7: Run focused tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_skill_sync.py tests/unit/test_skill_learning_pipeline.py -q
```

Expected: PASS.

- [ ] **Step 8: Commit**

```bash
git add app-instance/backend/beaver/plugins/skills.py app-instance/backend/beaver/skills/learning/pipeline.py app-instance/backend/beaver/skills/publisher/service.py app-instance/backend/tests/unit/test_plugin_skill_sync.py app-instance/backend/tests/unit/test_skill_learning_pipeline.py
git commit -m "feat(plugins): track published updates and ownership"
```

---

### Task 10: Wire Plugin Sync Into Engine Loading

**Files:**
- Modify: `app-instance/backend/beaver/engine/loader.py`
- Modify: `app-instance/backend/beaver/plugins/__init__.py`
- Test: `app-instance/backend/tests/unit/test_plugin_runtime.py`
- Test: `app-instance/backend/tests/unit/test_phase5_skills_runtime.py`

- [ ] **Step 1: Write failing runtime assembly tests**

Test:

- discovered disabled plugins do not mirror;
- enabled plugin mirrors before `EngineLoadResult.skills` is calculated;
- changed plugin creates a candidate but never calls an LLM during boot;
- repeated boot creates no duplicate versions/candidates;
- concurrent multi-process boot creates no duplicate versions/candidates;
- boot skips auto-sync and reports `deferred_lock_busy` when an explicit sync holds the
  workspace lock;
- `EngineLoadResult.plugin_manager` and plugin summaries are available.

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_runtime.py tests/unit/test_phase5_skills_runtime.py -q
```

Expected: FAIL because `EngineLoader` does not assemble plugin services.

- [ ] **Step 3: Extend `EngineLoadResult` and loader injection**

Add:

```python
plugin_manager: PluginManager | None = None
plugins: list[dict] = field(default_factory=list)
```

Allow `plugin_manager` injection in `EngineLoader.__init__()` for tests.

- [ ] **Step 4: Assemble in dependency order**

Required order:

1. config/workspace;
2. `SkillSpecStore`, learning store, and `SkillsLoader`;
3. tool registry and builtins, including skill-view tools using that loader;
4. draft/review/publisher and a safety checker using the completed tool registry;
5. discovery and `PluginStateStore`;
6. `PluginManager`;
7. `plugin_manager.sync_enabled(blocking=False)` when `config.plugins.auto_sync`;
8. learning service/pipeline with publication observer;
9. result summaries.

Do not use `SkillsLoader.extra_dirs` for plugin skills. Explicit API enable/sync uses a
bounded blocking lock timeout; Engine boot uses a non-blocking attempt and proceeds with
the current published skill set if another writer owns the lock.

- [ ] **Step 5: Run runtime tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_runtime.py tests/unit/test_phase5_skills_runtime.py -q
```

Expected: PASS.

- [ ] **Step 6: Commit**

```bash
git add app-instance/backend/beaver/engine/loader.py app-instance/backend/beaver/plugins app-instance/backend/tests/unit/test_plugin_runtime.py app-instance/backend/tests/unit/test_phase5_skills_runtime.py
git commit -m "feat(runtime): sync declarative plugins at boot"
```

---

### Task 11: Add Plugin Management API

**Files:**
- Modify: `app-instance/backend/beaver/interfaces/web/app.py`
- Test: `app-instance/backend/tests/unit/test_plugin_web_api.py`

- [ ] **Step 1: Write failing API tests**

Cover:

```text
GET  /api/plugins
POST /api/plugins/sync
POST /api/plugins/{plugin_id}/enable
POST /api/plugins/{plugin_id}/pause
POST /api/plugins/{plugin_id}/resume
POST /api/plugins/{plugin_id}/disable
POST /api/plugins/{plugin_id}/skills/{skill_name}/adopt
```

Assert `404` for unknown plugin, `409` for skill ownership conflict, and `400` for invalid
manifest/sync errors. Assert lock timeout maps to `409 plugin_write_busy`. Assert no
payload contains the real absolute workspace or external search-root path. Assert disable
without `{"disable_linked_skills": true}` is rejected.

- [ ] **Step 2: Run tests and verify failure**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_web_api.py -q
```

Expected: FAIL with missing routes.

- [ ] **Step 3: Add normalized plugin payload helper**

Return:

```python
{
    "id": manifest.plugin_id,
    "name": manifest.name,
    "discovered_version": manifest.version,
    "installed_version": state.installed_version,
    "enabled": state.enabled,
    "status": state.status,
    "last_error": state.last_error,
    "manifest_path": manifest.display_path,
    "updates_paused": state.updates_paused,
    "skills": [
        {
            "name": declaration.name,
            "status": binding.status,
            "current_beaver_version": binding.current_beaver_version,
            "accepted_upstream_tree_hash": binding.accepted_upstream_tree_hash,
            "observed_upstream_tree_hash": binding.observed_upstream_tree_hash,
            "accepted_beaver_version": binding.accepted_beaver_version,
            "pending_candidate_id": binding.pending_candidate_id,
        }
    ],
}
```

Never return arbitrary plugin file content, secrets, or absolute server paths.

- [ ] **Step 4: Implement routes**

Each mutating endpoint boots one runtime, invokes its `plugin_manager`, and returns the
updated plugin payload. Map `ValueError` messages to stable HTTP status codes.

- [ ] **Step 5: Run focused and existing web tests**

```bash
cd app-instance/backend
pytest tests/unit/test_plugin_web_api.py tests/unit/test_skill_learning_web_api.py -q
```

Expected: PASS.

- [ ] **Step 6: Commit**

```bash
git add app-instance/backend/beaver/interfaces/web/app.py app-instance/backend/tests/unit/test_plugin_web_api.py
git commit -m "feat(api): manage declarative plugins"
```

---

### Task 12: Add Plugin Management To The Skills UI

**Files:**
- Modify: `app-instance/frontend/types/index.ts`
- Modify: `app-instance/frontend/lib/api.ts`
- Modify: `app-instance/frontend/app/(app)/skills/page.tsx`
- Test: `app-instance/frontend/lib/plugin-api.test.ts`

- [ ] **Step 1: Write failing API client tests**

Test URL, method, and response typing for list, sync, enable, pause, resume, disable, and
adopt.

- [ ] **Step 2: Run frontend test and verify failure**

Run the repository's existing frontend test command targeting:

```bash
cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts
```

Expected: FAIL because plugin API functions do not exist.

- [ ] **Step 3: Add frontend types**

Add:

```typescript
export interface PluginSkillBinding {
  name: string;
  status: string;
  current_beaver_version?: string | null;
  accepted_upstream_tree_hash?: string | null;
  observed_upstream_tree_hash?: string | null;
  accepted_beaver_version?: string | null;
  pending_candidate_id?: string | null;
}

export interface BeaverPlugin {
  id: string;
  name: string;
  discovered_version?: string | null;
  installed_version?: string | null;
  enabled: boolean;
  updates_paused: boolean;
  status: string;
  last_error?: string | null;
  manifest_path?: string | null;
  skills: PluginSkillBinding[];
}
```

- [ ] **Step 4: Add API functions**

Implement:

```typescript
listPlugins()
syncPlugins()
enablePlugin(pluginId)
pausePlugin(pluginId)
resumePlugin(pluginId)
disablePlugin(pluginId, { disable_linked_skills: true })
adoptPluginSkill(pluginId, skillName)
```

- [ ] **Step 5: Add a `plugins` Skills tab**

Extend `SkillsTab` and render a compact table with:

- plugin name and versions;
- enabled/status badges;
- linked skills and pending candidate link;
- icon buttons with tooltips for sync, enable, pause, resume, disable, and adopt;
- confirmation before disable/adopt;
- missing-source warning stating that current skills remain active but updates are
  suspended;
- existing `runAction()` and error handling.

Do not add a separate marketing-style page or nested cards.

- [ ] **Step 6: Label plugin-origin skills and update candidates**

In existing Published/Candidates/Drafts views:

- show `Plugin` source badge when `source_kind === "plugin"`;
- render `plugin_skill_update` as `插件升级合并 / Plugin update merge`;
- show `fast_forward` or `three_way` from candidate evidence/provenance.

- [ ] **Step 7: Run frontend tests and type checks**

```bash
cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts
npm run lint
npx tsc --noEmit
```

Expected: PASS.

- [ ] **Step 8: Commit**

```bash
git add app-instance/frontend/types/index.ts app-instance/frontend/lib/api.ts app-instance/frontend/lib/plugin-api.test.ts 'app-instance/frontend/app/(app)/skills/page.tsx'
git commit -m "feat(skills-ui): manage plugin skill mirrors"
```

---

### Task 13: Add End-To-End Lifecycle Coverage And Documentation

**Files:**
- Create: `app-instance/backend/tests/integration/test_plugin_skill_lifecycle.py`
- Create: `docs/plugins/skill-plugins.md`
- Modify: `docs/product-discovery/beaver/README.md`

- [ ] **Step 1: Write the end-to-end lifecycle test**

The test must:

1. create plugin `1.0.0`;
2. enable it and assert mirror `v0001`;
3. publish a normal learned local revision `v0002`;
4. replace the package with plugin `1.1.0`;
5. sync and assert one `three_way` candidate;
6. synthesize with a stub provider;
7. run safety and replay evaluation with a stub runner;
8. submit, approve, and publish `v0003`;
9. assert accepted upstream tree hash and provenance advanced;
10. rollback to `v0002`;
11. assert plugin source files were never modified;
12. update only a supporting file and assert a new update candidate is created;
13. simulate publish-observer failure and assert the next sync reconciles state;
14. remove the plugin package and assert the plugin is `missing` while the current skill
    remains active;
15. run two sync processes and assert no duplicate version or candidate is created.

- [ ] **Step 2: Run the integration test and fix only lifecycle defects**

```bash
cd app-instance/backend
pytest tests/integration/test_plugin_skill_lifecycle.py -v
```

Expected: PASS.

- [ ] **Step 3: Write operator documentation**

Document:

- package layout and manifest;
- discovery roots;
- explicit enable requirement;
- mirror and three-way merge behavior;
- dual content/tree hashing and supporting-file merge conflicts;
- update candidate review flow;
- pause/resume versus disable/adopt;
- recovery from missing/invalid plugins;
- workspace locking, deferred boot sync, and publication reconciliation;
- why plugin Python code is not executed in V1.

- [ ] **Step 4: Run the complete relevant backend suite**

```bash
cd app-instance/backend
pytest \
  tests/unit/test_plugin_manifest.py \
  tests/unit/test_plugin_hashing.py \
  tests/unit/test_plugin_state.py \
  tests/unit/test_workspace_write_lock.py \
  tests/unit/test_plugin_skill_storage.py \
  tests/unit/test_plugin_skill_sync.py \
  tests/unit/test_plugin_skill_learning.py \
  tests/unit/test_plugin_runtime.py \
  tests/unit/test_plugin_web_api.py \
  tests/unit/test_skill_learning_candidate_state.py \
  tests/unit/test_skill_learning_pipeline.py \
  tests/unit/test_skill_learning_eval.py \
  tests/unit/test_skill_learning_worker.py \
  tests/unit/test_phase5_skills_runtime.py \
  tests/integration/test_plugin_skill_lifecycle.py \
  -q
```

Expected: PASS.

- [ ] **Step 5: Run frontend verification**

```bash
cd app-instance/frontend
npx vitest run lib/plugin-api.test.ts
npm run lint
npx tsc --noEmit
```

Expected: PASS.

- [ ] **Step 6: Run a dirty-worktree-safe diff review**

```bash
git status --short
git diff --check
git diff --stat
```

Expected:

- no whitespace errors;
- only plugin/skill lifecycle files and planned docs/tests are included in this feature;
- unrelated pre-existing user changes remain untouched.

- [ ] **Step 7: Commit**

```bash
git add app-instance/backend/tests/integration/test_plugin_skill_lifecycle.py docs/plugins/skill-plugins.md docs/product-discovery/beaver/README.md
git commit -m "docs(plugins): document skill mirror lifecycle"
```

---

## Release Sequence

1. Ship backend manifest, state, snapshots, and initial mirror behind the Plugins API.
2. Enable update candidate generation after initial mirror tests pass in a real workspace.
3. Enable three-way synthesis and replay publish gates.
4. Ship the Plugins UI.
5. Keep executable plugin code disabled; design it separately with process isolation and
   permission boundaries.

## Rollout Metrics

Track:

- plugin discovery and manifest error count;
- initial mirror success/failure count;
- plugin update candidates created, superseded, rejected, and published;
- plugin update candidates caused by supporting-file-only changes;
- fast-forward versus three-way update ratio;
- write-lock contention and deferred boot sync count;
- publication reconciliation repair count;
- replay regression and preservation failure rate;
- time from upstream discovery to accepted publication;
- rollback count for plugin-origin versions.

## Final Acceptance Test

The feature is complete only when a plugin-origin skill can:

1. be enabled and used with normal skill priority;
2. accumulate a normal Beaver-learned revision;
3. receive a newer upstream plugin version;
4. produce a three-way update draft without editing the plugin package;
5. pass the same safety, replay, review, and publish gates as ordinary skills;
6. retain full upstream and local provenance;
7. detect and publish supporting-file-only updates;
8. survive concurrent boot/sync without duplicate versions or candidates;
9. recover plugin state after observer failure;
10. remain active when its plugin package is temporarily missing;
11. be paused, resumed, rolled back, disabled, re-enabled, or adopted without data loss.