chore: initialize EverOS 1.0.0

md-first memory extraction framework for AI agents. Markdown is the single source of truth; SQLite holds state and LanceDB provides the rebuildable vector + BM25 + scalar index. The codebase follows a single-direction DDD layering (entrypoints -> service -> memory -> infra, with component / core / config cross-cutting) enforced by import-linter. Engineering surface: - Coding conventions in .claude/rules/ (path-scoped) and workflows in .claude/skills/ (/commit, /new-branch, /pr). - GitHub Actions CI runs make lint + test + integration; pre-commit mirrors the gates locally (ruff, hygiene hooks, gitlint commit-msg). - Commit messages follow Conventional Commits, enforced by gitlint. - make lint also enforces datetime two-zone discipline and OpenAPI drift.
2026-06-05 22:35:51 +08:00
commit 518b8eca85
636 changed files with 160553 additions and 0 deletions
--- a/scripts/e2e_memorize/README.md
+++ b/scripts/e2e_memorize/README.md
@ -0,0 +1,123 @@
+# End-to-end memorize test
+
+In-process driver that pushes a realistic fixture through `service.memorize`,
+batching by 6 messages per `/add` call and then `/flush` at the end.
+
+## What's here
+
+| File | Purpose |
+|---|---|
+| `fixtures/chat_session.json` | 22 messages · 3 topic shifts · multi-user (Alice → Bob) — chat-mode fixture |
+| `fixtures/agent_session.json` | 21 items · 2 task threads · interleaved `tool_calls` / `tool` results — agent-mode fixture |
+| `run.py` | In-process runner (no HTTP) |
+
+## Prereqs
+
+1. **LLM client configured** in `.env`:
+   - `EVEROS_LLM__API_KEY=...`
+   - `EVEROS_LLM__BASE_URL=...` (OpenAI-compatible)
+   - `EVEROS_LLM__MODEL=...` (defaults to `gpt-4o-mini`)
+   - Without these, the boundary stage logs `memorize_no_llm_client` and skips the run.
+2. **Memory root**: defaults to `~/.everos`; override with `EVEROS_MEMORY__ROOT=...`.
+3. **Mode** is read from `settings.memorize.mode` (toml/env) before the first `memorize()` call.
+
+## Run
+
+```bash
+# Chat mode — boundary uses everalgo.boundary.detect_boundaries
+EVEROS_MEMORIZE__MODE=chat uv run python scripts/e2e_memorize/run.py \
+    --fixture scripts/e2e_memorize/fixtures/chat_session.json \
+    --expected-mode chat
+
+# Agent mode — boundary uses everalgo.agent_memory.AgentBoundaryDetector
+# (filter→detect→remap; tool items preserved in cells)
+EVEROS_MEMORIZE__MODE=agent uv run python scripts/e2e_memorize/run.py \
+    --fixture scripts/e2e_memorize/fixtures/agent_session.json \
+    --expected-mode agent
+
+# Dry run (print batch plan, no LLM calls)
+uv run python scripts/e2e_memorize/run.py \
+    --fixture scripts/e2e_memorize/fixtures/chat_session.json --dry-run
+```
+
+## What to verify after a run
+
+### 1. Console output
+
+Each batch prints `status=` (`accumulated` while buffering, `extracted` when
+cells got cut). Final `flush` should be `extracted` if any cell remained
+in the tail. The trailing file walker lists md / sqlite files modified
+in the last 10 minutes.
+
+### 2. Episode md (sync — 4A)
+
+```
+~/.everos/users/<owner_id>/episodes/episode-YYYY-MM-DD.md
+```
+
+- Chat fixture: 2 owners (`u_alice`, `u_bob`) — expect Episodes split into
+  ~3-4 cells aligned with topic shifts (Python bug → weekend ramen → Q3
+  review → SRE handoff/ramen wrap).
+- Agent fixture: 1 user (`u_alice`) — expect ~2 Episodes aligned with the
+  two task threads (latency rollback → DB index fix).
+
+### 3. SQLite memcell rows
+
+```bash
+sqlite3 ~/.everos/.index/sqlite/system.db \
+    "select memcell_id, track, owner_id, owner_type, json_array_length(sender_ids_json) as senders
+     from memcell order by timestamp"
+```
+
+- Chat run: rows with `track=user_memory`, `owner_type=user`.
+- Agent run: parallel rows for both tracks (`user_memory` **and**
+  `agent_memory`) since agent mode dispatches both pipelines.
+
+### 4. Unprocessed buffer
+
+```bash
+sqlite3 ~/.everos/.index/sqlite/system.db \
+    "select session_id, count(*) from unprocessed_buffer
+     where track='memorize' group by session_id"
+```
+
+After `flush` the buffer should be empty for the test session.
+
+### 5. OME async output (only if subscribers exist)
+
+- `users/<owner>/atomic_facts/atomic_fact-YYYY-MM-DD.md` (always; `extract_atomic_facts` is registered)
+- `users/<owner>/foresights/foresight-YYYY-MM-DD.md` (always; `extract_foresight` is registered)
+- `agents/<agent>/agent_cases/agent_case-YYYY-MM-DD.md` (**only after `extract_agent_cases` strategy is written + registered** — currently absent, the emit is a no-op)
+
+### 6. Reset between runs
+
+The fixture's session_id is randomised per invocation, so previous runs
+don't pollute the new one. To wipe everything:
+
+```bash
+rm -rf ~/.everos/users ~/.everos/agents ~/.everos/.index/sqlite/system.db
+```
+
+## Boundary expectations cheat sheet
+
+### Chat fixture topic shifts (timestamps ms)
+
+| Range | Topic |
+|---|---|
+| msgs 1-6  (`1747396800–1747397010`) | Python KeyError debugging |
+| msgs 7-12 (`1747400400–1747400610`) | Weekend ramen plans |
+| msgs 13-16 (`1747407600–1747407720`) | Q3 revenue review meeting prep |
+| msgs 17-22 (`1747411200–1747411410`) | Bob joins, SRE handoff + ramen + Q3 deck deadline |
+
+Boundary detector should cut on topic gaps; 3 cuts → 4 cells is the most likely outcome.
+
+### Agent fixture task threads
+
+| Range | Task |
+|---|---|
+| items 1-13 (`1747396800–1747397140`) | API latency spike → identify keepalive pool regression → rollback |
+| items 14-21 (`1747400400–1747400720`) | DB connection pool exhaustion → find unindexed query → CREATE INDEX CONCURRENTLY |
+
+Boundary detector should cut between item 13 and item 14 (timestamp jump
+~55 minutes, topic flip). Tool items inside each cell stay attached to
+their initiating chat turn.
--- a/scripts/e2e_memorize/fixtures/agent_session.json
+++ b/scripts/e2e_memorize/fixtures/agent_session.json
@ -0,0 +1,33 @@
+{
+  "_doc": "Agent-mode end-to-end fixture. 23 items across 2 task threads: (1) production latency diagnosis with rollback, (2) DB connection-pool exhaustion investigation. Includes ToolCallRequest + ToolCallResult interleaved with user/assistant text turns, parallel tool calls, one assistant turn with empty content (only tool_calls), one extended chain-of-thought reasoning, and one tool failure + retry in task 2. Run with EVEROS_MEMORIZE__MODE=agent. Expected boundary detector cuts ~2 cells aligned with the two task threads; tool items fold into the cell containing the preceding chat turn.",
+  "session_id_hint": "e2e_agent",
+  "messages": [
+    {"sender_id": "u_alice", "sender_name": "Alice",   "role": "user",      "timestamp": 1747396800000, "content": "Production API p99 latency spiked from 280ms to 2.5s starting around 21:00 UTC. Customer reports rolling in. Can you investigate?"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747396820000, "content": "I'll pull the recent deploys and check for correlation with the spike timestamp.", "tool_calls": [{"id": "call_001", "type": "function", "function": {"name": "list_recent_deploys", "arguments": "{\"hours_back\": 4, \"service\": \"api-gateway\"}"}}]},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747396825000, "tool_call_id": "call_001", "content": "[{\"deploy_id\": \"d-9f3a\", \"service\": \"api-gateway\", \"deployed_at\": \"2026-05-16T20:47:12Z\", \"author\": \"bob\", \"commit_sha\": \"a1b2c3d\", \"diff_summary\": \"refactor: switch upstream client to keepalive pool\"}, {\"deploy_id\": \"d-9f29\", \"service\": \"api-gateway\", \"deployed_at\": \"2026-05-16T15:12:03Z\", \"author\": \"carol\", \"commit_sha\": \"e4f5g6h\", \"diff_summary\": \"chore: bump otel-sdk to 1.31\"}]"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747396870000, "content": "", "tool_calls": [{"id": "call_002", "type": "function", "function": {"name": "get_deploy_diff", "arguments": "{\"deploy_id\": \"d-9f3a\", \"files\": [\"upstream/client.go\", \"config/pool.yaml\"]}"}}]},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747396880000, "tool_call_id": "call_002", "content": "config/pool.yaml: max_idle_conns_per_host changed from 32 → 4. upstream/client.go: new transport with IdleConnTimeout=10s (was 90s). The default Go transport now reuses far fewer connections."},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747396930000, "content": "Two changes in d-9f3a stand out, and the interaction between them likely explains the cliff. First, max_idle_conns_per_host went from 32 down to 4 — that alone caps how many warm sockets the upstream client can pin per backend. Second, IdleConnTimeout dropped from 90s to 10s, so even the few warm sockets we do keep get torn down between requests during normal traffic gaps. Combined effect: under sustained load p99 of ~250rps × ~6 backends, the pool is permanently saturated and every spillover request pays a TCP+TLS handshake (~800ms-1.5s on the affected region) instead of reusing a connection. That maps directly to the 280ms → 2.5s latency jump. Before I file a rollback recommendation I want to confirm the prediction against live pool metrics — if reuse_ratio collapsed and new-connection rate is up >10x, that's enough signal.", "tool_calls": [{"id": "call_003", "type": "function", "function": {"name": "query_pool_metrics", "arguments": "{\"service\": \"api-gateway\", \"window_min\": 15}"}}]},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747396938000, "tool_call_id": "call_003", "content": "{\"idle_conns_avg\": 1.2, \"new_conn_rate_per_s\": 184, \"new_conn_rate_baseline\": 6, \"reuse_ratio\": 0.04, \"reuse_ratio_baseline\": 0.91}"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747396980000, "content": "Confirmed: reuse_ratio collapsed from 0.91 to 0.04, new-connection rate is 30x baseline. This is the root cause. Recommend rollback of d-9f3a — the keepalive refactor needs a config review before re-deploying. Want me to initiate rollback?"},
+    {"sender_id": "u_alice", "sender_name": "Alice",   "role": "user",      "timestamp": 1747397040000, "content": "Yes, roll it back. Page Bob async so he sees the rollback when he comes online tomorrow — no need to wake him up."},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747397070000, "content": "Initiating rollback to d-9f29 and queuing a non-paging Slack DM for Bob.", "tool_calls": [{"id": "call_004", "type": "function", "function": {"name": "rollback_deploy", "arguments": "{\"service\": \"api-gateway\", \"to_deploy_id\": \"d-9f29\", \"reason\": \"p99 latency regression from keepalive pool refactor (max_idle_conns 32->4)\"}"}}, {"id": "call_005", "type": "function", "function": {"name": "slack_dm", "arguments": "{\"user\": \"bob\", \"channel\": \"async\", \"message\": \"Rolled back api-gateway d-9f3a at 21:04 UTC — keepalive pool change caused p99 spike. Logs in #incident-2026-05-16.\"}"}}]},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747397115000, "tool_call_id": "call_004", "content": "{\"status\": \"ok\", \"rolled_back_to\": \"d-9f29\", \"completed_at\": \"2026-05-16T21:05:01Z\", \"verification\": {\"p99_5min_after\": \"312ms\", \"reuse_ratio_after\": 0.88}}"},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747397116000, "tool_call_id": "call_005", "content": "{\"status\": \"queued\", \"message_id\": \"slack_msg_xyz\"}"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747397140000, "content": "Rollback complete. p99 dropped back to 312ms within 5 minutes, reuse ratio recovered to 0.88. Slack DM queued for Bob. Incident-2026-05-16 is the channel I logged it to."},
+
+    {"sender_id": "u_alice", "sender_name": "Alice",   "role": "user",      "timestamp": 1747400400000, "content": "Different issue — we're seeing connection pool exhaustion warnings on the auth-service Postgres pool the last hour. Can you check whether it's a leak or just a sustained traffic spike?"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400420000, "content": "Pulling pool stats and recent traffic shape.", "tool_calls": [{"id": "call_006", "type": "function", "function": {"name": "query_db_pool", "arguments": "{\"service\": \"auth-service\", \"window_min\": 60}"}}, {"id": "call_007", "type": "function", "function": {"name": "query_request_rate", "arguments": "{\"service\": \"auth-service\", \"window_min\": 60}"}}]},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747400430000, "tool_call_id": "call_006", "content": "{\"max_conns\": 50, \"in_use_now\": 49, \"in_use_p99_1h\": 50, \"idle_p99_1h\": 1, \"wait_queue_depth_p99\": 18, \"long_running_queries_above_30s\": 12}"},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747400431000, "tool_call_id": "call_007", "content": "{\"rps_now\": 142, \"rps_baseline\": 138, \"rps_p99_1h\": 151}"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400470000, "content": "Traffic is flat (rps 142 vs baseline 138), but 12 long-running queries above 30s and wait queue depth p99 of 18. This is a leak / unbounded query, not a traffic surge. Let me find the offending queries.", "tool_calls": [{"id": "call_008", "type": "function", "function": {"name": "list_long_running_queries", "arguments": "{\"service\": \"auth-service\", \"min_duration_s\": 30, \"window_min\": 60}"}}]},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747400478000, "tool_call_id": "call_008", "content": "{\"error\": \"timeout\", \"detail\": \"query against pg_stat_activity exceeded 15s while pool was saturated; retry with a narrower window\"}"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400490000, "content": "Tool itself is starved by the same pool exhaustion — fair enough. Retrying against the last 10 minutes only so pg_stat_activity has a tighter slice to scan.", "tool_calls": [{"id": "call_008b", "type": "function", "function": {"name": "list_long_running_queries", "arguments": "{\"service\": \"auth-service\", \"min_duration_s\": 30, \"window_min\": 10}"}}]},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747400499000, "tool_call_id": "call_008b", "content": "[{\"query_sha\": \"q-71f\", \"sql_truncated\": \"SELECT u.* FROM users u JOIN sessions s ON s.user_id = u.id WHERE s.expired_at > now() - interval '30 day'\", \"avg_duration_s\": 84, \"calls_last_hour\": 11, \"missing_index\": \"sessions.expired_at\"}]"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400510000, "content": "Single offender — query q-71f is doing a full sessions-table scan because sessions.expired_at is unindexed. Each call takes ~84s and holds a connection. 11 calls in the last hour saturate the pool. Recommend: (1) add the index now (concurrent), (2) add a query timeout cap on this endpoint long-term. Want me to add the index?"},
+    {"sender_id": "u_alice", "sender_name": "Alice",   "role": "user",      "timestamp": 1747400560000, "content": "Yes please add the index. Use CREATE INDEX CONCURRENTLY so we don't block writes. Then post in #incident-2026-05-16 with the runbook entry."},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400590000, "content": "On it. Creating the concurrent index and posting the runbook update.", "tool_calls": [{"id": "call_009", "type": "function", "function": {"name": "execute_sql", "arguments": "{\"db\": \"auth-service\", \"sql\": \"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_sessions_expired_at ON sessions(expired_at)\", \"timeout_s\": 600}"}}]},
+    {"sender_id": "tool_runner", "sender_name": null, "role": "tool",      "timestamp": 1747400650000, "tool_call_id": "call_009", "content": "{\"status\": \"ok\", \"duration_s\": 47, \"rows_indexed\": 1841229}"},
+    {"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400720000, "content": "Index created in 47s over 1.84M rows. Query q-71f should drop from ~84s to sub-second. Pool will drain in the next minute as long-running queries finish. Posted runbook to #incident-2026-05-16."}
+  ]
+}
--- a/scripts/e2e_memorize/fixtures/chat_session.json
+++ b/scripts/e2e_memorize/fixtures/chat_session.json
@ -0,0 +1,31 @@
+{
+  "_doc": "Chat-mode end-to-end fixture. 22 messages across 3 topics + a multi-user handoff. Run with EVEROS_MEMORIZE__MODE=chat. Expected boundary detector cuts ~3-4 cells (Python bug → weekend plans → cross-team meeting).",
+  "session_id_hint": "e2e_chat",
+  "messages": [
+    {"sender_id": "u_alice", "sender_name": "Alice",     "role": "user",      "timestamp": 1747396800000, "content": "I'm getting a weird KeyError on a dict comprehension when the source list has None entries. The traceback points at the dict literal, not the comprehension itself."},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747396830000, "content": "Can you share a minimal repro? Something like the comprehension expression plus 2-3 sample input items. I want to see whether the None is on the key side or the value side."},
+    {"sender_id": "u_alice", "sender_name": "Alice",     "role": "user",      "timestamp": 1747396890000, "content": "Sure: `{r['id']: r['name'] for r in rows}` where rows comes from a left-join and sometimes the right side returns None. So r is None for those rows."},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747396920000, "content": "Got it. The None is the row itself, not a field. `None['id']` raises TypeError actually, but if you preprocess the rows to dicts upstream, the missing key on a partial dict would be KeyError. Either way, filter before the comprehension: `{r['id']: r['name'] for r in rows if r is not None and 'id' in r}`."},
+    {"sender_id": "u_alice", "sender_name": "Alice",     "role": "user",      "timestamp": 1747396980000, "content": "That worked, thanks. Also occasionally the SQL itself wraps the row in a tuple instead of dict — pyodbc behaviour. Got bitten by it last quarter too."},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747397010000, "content": "pyodbc returns Row objects by default; you'd need `cursor.execute(...).fetchall()` then iterate. Or set `cursor.row_factory = lambda c, r: dict(zip([col[0] for col in c.description], r))` to normalise upstream."},
+
+    {"sender_id": "u_alice", "sender_name": "Alice",     "role": "user",      "timestamp": 1747400400000, "content": "Switching topics — any weekend plans? I'm thinking of trying the new ramen place on Carmine Street, supposedly the tonkotsu is decent."},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747400430000, "content": "Tonkotsu sounds good. If you go Saturday before 1pm you'll skip the queue. There's also a coffee place two doors down that does decent cold brew if you want a pre-meal stop."},
+    {"sender_id": "u_alice", "sender_name": "Alice",     "role": "user",      "timestamp": 1747400490000, "content": "Cold brew yes. Bob mentioned wanting to come too, do you think the ramen place takes reservations for three?"},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747400520000, "content": "Their site says walk-in only on weekends. For three people the wait is usually 25-40 minutes around noon — manageable but worth knowing."},
+    {"sender_id": "u_alice", "sender_name": "Alice",     "role": "user",      "timestamp": 1747400580000, "content": "OK we'll show up at 11:30 then. Will send Bob the address."},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747400610000, "content": "Sensible. Reminder it's cash-only for the first 6 months apparently — saw that on a Reddit thread last week."},
+
+    {"sender_id": "u_alice", "sender_name": "Alice",     "role": "user",      "timestamp": 1747407600000, "content": "One more thing — the Q3 revenue review is next Thursday at 2pm, who from engineering needs to attend? I want to make sure platform team is represented since we're presenting the infra cost reductions."},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747407630000, "content": "Standard attendees are eng director + relevant TL. For platform cost reductions, your eng lead Carol should be there since she owns the runbook. Add Dan as a backup in case Carol's on PTO."},
+    {"sender_id": "u_alice", "sender_name": "Alice",     "role": "user",      "timestamp": 1747407690000, "content": "Good call. Carol mentioned PTO that week actually. Let me ping Dan to confirm he can do the slides handoff."},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747407720000, "content": "If Dan has bandwidth, also ask him to pre-share the deck Monday so finance has time to review the cost numbers before the meeting. Saves a lot of meeting time clarifying figures."},
+
+    {"sender_id": "u_bob", "sender_name": "Bob",         "role": "user",      "timestamp": 1747411200000, "content": "Hey, Alice forwarded me about the ramen plan and the Thursday meeting. I can do both but the Thursday slot conflicts with the SRE oncall handoff at 1:45 — anyone we can swap with?"},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747411230000, "content": "Erin usually does the Wednesday → Thursday handoff for SRE. She owes you one from last cycle. Ask her to take the 1:45 slot and you can ack the comms async at 1:30."},
+    {"sender_id": "u_bob", "sender_name": "Bob",         "role": "user",      "timestamp": 1747411290000, "content": "Erin's good. I'll ping her on Slack today. About the ramen — Alice said 11:30 Saturday, that works for me, walking distance from my place."},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747411320000, "content": "Great. Just remind Alice it's cash-only for now — she'll forget otherwise based on past experience."},
+    {"sender_id": "u_bob", "sender_name": "Bob",         "role": "user",      "timestamp": 1747411380000, "content": "Ha, true. I'll grab some cash on the way. Also can you remind me about the Q3 revenue deck — when does it need to be done by, EOD Tuesday?"},
+    {"sender_id": "assistant", "sender_name": "Bot",     "role": "assistant", "timestamp": 1747411410000, "content": "Yes — Dan agreed to pre-share Monday, so practical deadline is EOD Tuesday for finance to review Wednesday. After Wednesday review they may ask for last-minute number tweaks; build slack into Thursday morning for that."}
+  ]
+}
--- a/scripts/e2e_memorize/run.py
+++ b/scripts/e2e_memorize/run.py
@ -0,0 +1,186 @@
+"""End-to-end memorize runner — in-process call into ``service.memorize``.
+
+Calls ``service.memorize.memorize()`` directly (not via HTTP) so this works
+without ``everos server start``. Drives a fixture through ``/add`` in
+N-sized batches, then triggers ``/flush`` to drain the tail.
+
+Reads ``settings.memorize.mode`` from current env / toml — set the mode via
+``EVEROS_MEMORIZE__MODE=chat|agent`` *before* invoking this script (the
+config is cached after the first ``load_settings()`` call).
+
+Usage:
+    EVEROS_MEMORIZE__MODE=chat  uv run python scripts/e2e_memorize/run.py \\
+        --fixture scripts/e2e_memorize/fixtures/chat_session.json
+
+    EVEROS_MEMORIZE__MODE=agent uv run python scripts/e2e_memorize/run.py \\
+        --fixture scripts/e2e_memorize/fixtures/agent_session.json --batch-size 5
+
+After it finishes, check:
+    ~/.everos/users/<owner>/episodes/<date>.md      (written sync by 4A)
+    ~/.everos/.index/sqlite/system.db memcell rows  (written by boundary)
+    ~/.everos/agents/<agent>/agent_cases/<date>.md  (written async by OME
+        - only if a consumer of AgentMemCellWritten is registered)
+"""
+
+from __future__ import annotations
+
+import argparse
+import asyncio
+import json
+import sys
+import time
+import uuid
+from pathlib import Path
+
+from sqlmodel import SQLModel
+
+from everos.component.llm import get_llm_client
+from everos.config import load_settings
+from everos.core.persistence import MemoryRoot
+from everos.infra.persistence.sqlite import dispose_engine, get_engine
+from everos.service.memorize import _get_engine as _get_ome_engine
+from everos.service.memorize import memorize
+
+
+def _chunks(items: list[dict], n: int) -> list[list[dict]]:
+    return [items[i : i + n] for i in range(0, len(items), n)]
+
+
+def _print_header(mode: str, fixture_path: Path, session_id: str) -> None:
+    print("=" * 72)
+    print(f"  everos e2e memorize  ·  mode={mode}")
+    print(f"  fixture     : {fixture_path.name}")
+    print(f"  session_id  : {session_id}")
+    print(f"  memory root : {MemoryRoot.default().root}")
+    llm_state = "<configured>" if get_llm_client() else "<None — pipeline will skip>"
+    print(f"  llm_client  : {llm_state}")
+    print("=" * 72)
+
+
+def _list_written_files(session_id: str, mode: str) -> None:
+    """Walk memory root and print files touched in this run."""
+    root = MemoryRoot.default().root
+    cutoff = time.time() - 600  # files modified in the last 10 min
+    print()
+    print("─── files modified within the last 10 minutes under memory root ───")
+    interesting = ("users", "agents", "knowledge", ".index")
+    for sub in interesting:
+        base = Path(root) / sub
+        if not base.is_dir():
+            continue
+        for p in sorted(base.rglob("*")):
+            if p.is_file() and p.stat().st_mtime >= cutoff:
+                rel = p.relative_to(root)
+                size = p.stat().st_size
+                print(f"  {rel}  ({size}b)")
+    print()
+    print(f"Tip: grep '{session_id}' in any episode md to find this run's entries.")
+
+
+async def _setup() -> None:
+    """Create sqlite schema + start OME engine — the bits the HTTP lifespan
+    normally handles. LanceDB is not needed for the memorize sync path
+    (only cascade reads it), so we skip it.
+    """
+    engine = get_engine()
+    async with engine.begin() as conn:
+        await conn.run_sync(SQLModel.metadata.create_all)
+    ome = _get_ome_engine()
+    await ome.start()
+
+
+async def _teardown() -> None:
+    ome = _get_ome_engine()
+    await ome.stop()
+    await dispose_engine()
+
+
+async def _run(args: argparse.Namespace) -> None:
+    settings = load_settings()
+    mode = settings.memorize.mode
+    if args.expected_mode and args.expected_mode != mode:
+        print(
+            f"!! expected mode={args.expected_mode!r} but "
+            f"settings.memorize.mode={mode!r}. "
+            "Set EVEROS_MEMORIZE__MODE before launching."
+        )
+        sys.exit(2)
+
+    fixture_path = Path(args.fixture).resolve()  # noqa: ASYNC240
+    fixture = json.loads(fixture_path.read_text())  # noqa: ASYNC230
+    messages: list[dict] = fixture["messages"]
+    session_id = f"{fixture.get('session_id_hint', 'e2e')}_{uuid.uuid4().hex[:8]}"
+
+    _print_header(mode, fixture_path, session_id)
+
+    if args.dry_run:
+        for i, batch in enumerate(_chunks(messages, args.batch_size), start=1):
+            print(
+                f"[dry] batch {i}: {len(batch)} msgs "
+                f"(first content: {batch[0]['content'][:60]!r})"
+            )
+        print("[dry] would flush at the end")
+        return
+
+    await _setup()
+    try:
+        batches = _chunks(messages, args.batch_size)
+        for i, batch in enumerate(batches, start=1):
+            result = await memorize(
+                {"session_id": session_id, "messages": batch}, is_final=False
+            )
+            print(
+                f"add batch {i}/{len(batches)} ({len(batch)} msgs)  →  "
+                f"status={result.status:<11s}  message_count={result.message_count}"
+            )
+
+        print()
+        print("flushing residual tail...")
+        flush_result = await memorize(
+            {"session_id": session_id, "messages": []}, is_final=True
+        )
+        print(
+            f"flush  →  status={flush_result.status:<11s}  "
+            f"message_count={flush_result.message_count}"
+        )
+
+        # OME strategies are fire-and-forget; each cell fires 2 strategies
+        # (atomic_facts + foresight), each ~5-10s on a real LLM. Sleep long
+        # enough for ~8-10 invocations to finish before engine.stop() drains
+        # the scheduler — otherwise APS cancels in-flight LLM calls.
+        await asyncio.sleep(30)
+
+        _list_written_files(session_id, mode)
+    finally:
+        await _teardown()
+
+
+def main() -> None:
+    parser = argparse.ArgumentParser(description=__doc__)
+    parser.add_argument(
+        "--fixture",
+        required=True,
+        help="path to fixture JSON (e.g. fixtures/chat_session.json)",
+    )
+    parser.add_argument(
+        "--batch-size",
+        type=int,
+        default=6,
+        help="how many messages per /add call (default 6 — 20 msgs across ~4 batches)",
+    )
+    parser.add_argument(
+        "--expected-mode",
+        choices=["chat", "agent"],
+        help="sanity check: fail fast if settings.memorize.mode mismatches",
+    )
+    parser.add_argument(
+        "--dry-run",
+        action="store_true",
+        help="print the batch plan without calling memorize",
+    )
+    args = parser.parse_args()
+    asyncio.run(_run(args))
+
+
+if __name__ == "__main__":
+    main()