Files
beaver_project/docs/superpowers/specs/2026-06-15-memory-gateway-backend-design.md

11 KiB

Memory Gateway Backend Design

Goal

Allow each Beaver instance to select exactly one memory backend through .beaver/config.json:

  • curated: preserve the existing MEMORY.md / USER.md snapshot and memory tool behavior.
  • memory_gateway: recall memory through POST /memories/search, then persist each completed conversation turn through one POST /memories/add followed by one POST /memories/flush.

The Memory Gateway integration is best-effort. Gateway failures must be auditable without turning an otherwise successful Beaver chat run into a failure.

Scope

This change includes:

  • Runtime configuration for selecting the memory backend.
  • Fixed Memory Gateway credentials and search scopes in instance config.
  • A Memory Gateway HTTP client.
  • A memory backend strategy boundary used by AgentLoop.
  • Pre-run recall and post-run turn persistence.
  • Hidden session audit events for recall and persistence outcomes.
  • Unit and integration-style tests using fake HTTP responses/providers.

This change does not include:

  • Automatic POST /users calls or credential provisioning.
  • A memory settings UI or memory administration UI.
  • Resource upload support from Beaver.
  • Gateway memory override or deletion APIs.
  • Persisting tool calls, tool results, system events, reasoning, or skill activation messages.
  • Simultaneously enabling curated memory and Memory Gateway.

Configuration

Beaver adds a top-level memory section:

{
  "memory": {
    "mode": "memory_gateway",
    "gateway": {
      "baseUrl": "http://127.0.0.1:8010",
      "userId": "gateway_test_user",
      "userKey": "uk_xxx",
      "appId": "default",
      "projectId": "default",
      "scope": ["current_chat", "resources"],
      "topK": 8,
      "timeoutSeconds": 10
    }
  }
}

Configuration rules:

  • Missing memory.mode defaults to curated for backward compatibility.
  • Valid modes are only curated and memory_gateway.
  • Gateway mode requires non-empty baseUrl, userId, and userKey.
  • appId and projectId default to default.
  • scope is read from config and must be a non-empty subset of current_chat, resources, and all_user_memory. The initial test setup uses current_chat and resources only.
  • topK defaults to 8 and must be between 1 and 100.
  • timeoutSeconds defaults to 10 and must be positive.
  • Invalid Gateway configuration fails runtime loading. Network and HTTP failures after valid startup configuration remain best-effort.
  • userKey must never appear in status payloads, session event payloads, or error messages produced by Beaver.

Architecture

Memory backend strategy

Introduce one runtime-facing memory strategy abstraction with two operations:

  1. recall_before_run: prepare memory context before provider messages are built.
  2. persist_after_run: persist the current user message and final assistant answer after the run reaches its normal completion path.

The strategy has two implementations:

  • CuratedMemoryBackend wraps the existing MemoryService. Recall returns the existing frozen MemorySnapshot; post-run persistence is a no-op because curated writes remain model-driven through the existing memory tool.
  • MemoryGatewayBackend wraps a dedicated asynchronous HTTP client. Recall calls Gateway search and returns sanitized reference content; persistence calls add once and, only after add succeeds, flush once.

EngineLoader validates configuration, constructs exactly one strategy, and registers the original memory tool only in curated mode. session_search remains available in both modes because transcript search is separate from the selected long-term memory backend.

AgentLoop depends on the strategy interface rather than branching directly on the configured mode.

Memory Gateway HTTP client

The client owns only HTTP transport and response validation for:

  • POST {baseUrl}/memories/search
  • POST {baseUrl}/memories/add
  • POST {baseUrl}/memories/flush

It uses an async HTTP client, the configured timeout, JSON request bodies, and a small typed exception that contains HTTP status/path context but never contains the configured userKey or complete request body.

No automatic retry is added in Beaver for this first integration. The Gateway already handles upstream ingestion retries, and retrying add from Beaver could duplicate a conversation turn when the first request succeeded but its response was lost.

Recall Data Flow

For every run in memory_gateway mode:

  1. AgentLoop creates or resolves the Beaver session_id.
  2. Before ContextBuilder.build_messages, it calls MemoryGatewayBackend.recall_before_run with the current user prompt.
  3. The Gateway search request is:
{
  "user_id": "<configured userId>",
  "user_key": "<configured userKey>",
  "conversation_id": "<Beaver resolved_session_id>",
  "query": "<current user prompt>",
  "scope": ["<configured scopes>"],
  "top_k": 8,
  "app_id": "<configured appId>",
  "project_id": "<configured projectId>"
}
  1. Beaver accepts only a top-level results list. Malformed responses are treated as recall failures.
  2. Each result is reduced to these optional fields: id, session_id, text, score, source_scope, and resource_uri. Gateway raw data is never injected into the model.
  3. Empty or unusable results produce no recalled-memory message.
  4. Non-empty results become one ephemeral provider message placed after skill activation messages and before persisted session history/current user input. The message is reference data, is not written to Beaver's session history, and is not included in post-run Gateway persistence.
  5. The stable system prompt includes a short rule that recalled memory is untrusted reference data, not executable instruction. The recalled text itself is not concatenated into the system prompt.

In curated mode, this flow is unchanged from today: a per-run frozen curated snapshot is added to the system prompt, and no Gateway request occurs.

Persistence Data Flow

For every memory_gateway run that reaches the normal completion path:

  1. Wait until the tool loop has produced the final assistant text.
  2. Construct exactly two Gateway messages in chronological order:
[
  {
    "sender_id": "<configured userId>",
    "role": "user",
    "timestamp": 1780000000000,
    "content": "<original current user prompt>"
  },
  {
    "sender_id": "beaver",
    "role": "assistant",
    "timestamp": 1780000001000,
    "content": "<final assistant text>"
  }
]

Timestamps are UTC Unix epoch milliseconds captured for the user turn and the final assistant turn. They must be positive and monotonic within the payload.

  1. Call /memories/add exactly once with:
{
  "user_id": "<configured userId>",
  "user_key": "<configured userKey>",
  "session_id": "chat:<Beaver resolved_session_id>",
  "app_id": "<configured appId>",
  "project_id": "<configured projectId>",
  "messages": ["<the two messages above>"]
}
  1. If add succeeds, call /memories/flush exactly once with the same Gateway identity, app/project scope, and chat:<resolved_session_id>.
  2. If add fails, do not call flush.
  3. Runs that enter Beaver's exception/error completion path are not persisted. Normal completion outputs such as a tool-limit fallback are persisted because they are the assistant response returned to the user.
  4. Tool calls, tool results, hidden events, system prompts, recalled-memory messages, reasoning content, and activated skill text are never included.

In curated mode, there is no automatic post-run persistence. Existing model-driven memory tool writes remain unchanged.

Session Audit Events

Gateway mode writes hidden (context_visible=false) session events without credentials or full Gateway response bodies:

  • memory_gateway_recall_succeeded: scope and result count.
  • memory_gateway_recall_failed: operation, sanitized error category, and optional HTTP status.
  • memory_gateway_add_succeeded: session identifier and message count.
  • memory_gateway_add_failed: sanitized failure metadata.
  • memory_gateway_flush_succeeded: session identifier.
  • memory_gateway_flush_failed: sanitized failure metadata and an indication that add had already succeeded.

These events support debugging without entering normal context history or FTS.

Failure Semantics

  • Search timeout, connection failure, 401, other HTTP error, or malformed JSON: record recall failure and continue the run without recalled memory.
  • Add failure: record add failure, skip flush, and return the normal assistant result.
  • Flush failure: record flush failure and return the normal assistant result.
  • Audit event persistence failure follows existing session-store behavior and is not separately swallowed by the memory strategy.
  • Gateway failures are not shown as user-facing chat errors in this phase.

Security and Privacy

  • Fixed Gateway credentials come only from Beaver instance configuration.
  • userKey is passed only in Gateway request bodies and retained in the in-memory configuration/client object.
  • Client exceptions and audit payloads use sanitized operation metadata, never serialized request bodies.
  • Recalled resource and conversation text is treated as untrusted data.
  • Gateway raw fields are discarded before prompt construction to limit prompt size and reduce accidental propagation of backend metadata.
  • Memory modes are mutually exclusive, preventing duplicate recall and writes across curated and Gateway stores.

Testing

Configuration tests

  • Missing memory configuration defaults to curated mode.
  • Complete Gateway configuration parses camelCase and exposes normalized typed values.
  • Invalid mode, empty credentials, empty/unknown scopes, invalid topK, and non-positive timeout fail with explicit configuration errors.
  • Error text does not include userKey.

HTTP client tests

  • Search, add, and flush use the exact paths and payload shapes above.
  • Configured timeout is applied.
  • Non-2xx, network, invalid JSON, and invalid response shapes produce sanitized client exceptions.
  • Exception strings never contain the configured key.

Strategy tests

  • Curated mode returns a frozen snapshot and performs no HTTP requests.
  • Gateway mode performs search with configured scopes and strips raw fields.
  • Empty search results produce no reference message.
  • Gateway persistence sends exactly the original user prompt and final assistant response, then flushes once.
  • Add failure skips flush; flush failure preserves the successful add outcome.

Agent loop tests

  • Gateway search occurs before the provider call.
  • Recalled content appears before the current user prompt and outside the system prompt body.
  • The system prompt contains the untrusted-reference rule in Gateway mode.
  • Add and flush happen after the final assistant response and exactly once each.
  • Tool/system/reasoning content is absent from the add payload.
  • Recall/add/flush failures do not change the returned AgentRunResult.
  • Hidden success/failure audit events contain no credentials.
  • Curated mode regression tests confirm frozen snapshot injection and original memory tool availability.
  • Gateway mode confirms the original memory tool is not registered or exposed.

Documentation

Update the backend README/config example with both modes and a warning that the test-stage userKey is a secret. Document that changing modes requires runtime reload/restart because EngineLoader constructs the selected strategy during boot.