Files
EverOS/scripts/e2e_memorize/README.md
Elliot Chen 518b8eca85 chore: initialize EverOS 1.0.0
md-first memory extraction framework for AI agents.

Markdown is the single source of truth; SQLite holds state and LanceDB
provides the rebuildable vector + BM25 + scalar index. The codebase follows
a single-direction DDD layering (entrypoints -> service -> memory -> infra,
with component / core / config cross-cutting) enforced by import-linter.

Engineering surface:
- Coding conventions in .claude/rules/ (path-scoped) and workflows in
  .claude/skills/ (/commit, /new-branch, /pr).
- GitHub Actions CI runs make lint + test + integration; pre-commit mirrors
  the gates locally (ruff, hygiene hooks, gitlint commit-msg).
- Commit messages follow Conventional Commits, enforced by gitlint.
- make lint also enforces datetime two-zone discipline and OpenAPI drift.
2026-06-06 07:33:17 +08:00

4.6 KiB
Raw Blame History

End-to-end memorize test

In-process driver that pushes a realistic fixture through service.memorize, batching by 6 messages per /add call and then /flush at the end.

What's here

File Purpose
fixtures/chat_session.json 22 messages · 3 topic shifts · multi-user (Alice → Bob) — chat-mode fixture
fixtures/agent_session.json 21 items · 2 task threads · interleaved tool_calls / tool results — agent-mode fixture
run.py In-process runner (no HTTP)

Prereqs

  1. LLM client configured in .env:
    • EVEROS_LLM__API_KEY=...
    • EVEROS_LLM__BASE_URL=... (OpenAI-compatible)
    • EVEROS_LLM__MODEL=... (defaults to gpt-4o-mini)
    • Without these, the boundary stage logs memorize_no_llm_client and skips the run.
  2. Memory root: defaults to ~/.everos; override with EVEROS_MEMORY__ROOT=....
  3. Mode is read from settings.memorize.mode (toml/env) before the first memorize() call.

Run

# Chat mode — boundary uses everalgo.boundary.detect_boundaries
EVEROS_MEMORIZE__MODE=chat uv run python scripts/e2e_memorize/run.py \
    --fixture scripts/e2e_memorize/fixtures/chat_session.json \
    --expected-mode chat

# Agent mode — boundary uses everalgo.agent_memory.AgentBoundaryDetector
# (filter→detect→remap; tool items preserved in cells)
EVEROS_MEMORIZE__MODE=agent uv run python scripts/e2e_memorize/run.py \
    --fixture scripts/e2e_memorize/fixtures/agent_session.json \
    --expected-mode agent

# Dry run (print batch plan, no LLM calls)
uv run python scripts/e2e_memorize/run.py \
    --fixture scripts/e2e_memorize/fixtures/chat_session.json --dry-run

What to verify after a run

1. Console output

Each batch prints status= (accumulated while buffering, extracted when cells got cut). Final flush should be extracted if any cell remained in the tail. The trailing file walker lists md / sqlite files modified in the last 10 minutes.

2. Episode md (sync — 4A)

~/.everos/users/<owner_id>/episodes/episode-YYYY-MM-DD.md
  • Chat fixture: 2 owners (u_alice, u_bob) — expect Episodes split into ~3-4 cells aligned with topic shifts (Python bug → weekend ramen → Q3 review → SRE handoff/ramen wrap).
  • Agent fixture: 1 user (u_alice) — expect ~2 Episodes aligned with the two task threads (latency rollback → DB index fix).

3. SQLite memcell rows

sqlite3 ~/.everos/.index/sqlite/system.db \
    "select memcell_id, track, owner_id, owner_type, json_array_length(sender_ids_json) as senders
     from memcell order by timestamp"
  • Chat run: rows with track=user_memory, owner_type=user.
  • Agent run: parallel rows for both tracks (user_memory and agent_memory) since agent mode dispatches both pipelines.

4. Unprocessed buffer

sqlite3 ~/.everos/.index/sqlite/system.db \
    "select session_id, count(*) from unprocessed_buffer
     where track='memorize' group by session_id"

After flush the buffer should be empty for the test session.

5. OME async output (only if subscribers exist)

  • users/<owner>/atomic_facts/atomic_fact-YYYY-MM-DD.md (always; extract_atomic_facts is registered)
  • users/<owner>/foresights/foresight-YYYY-MM-DD.md (always; extract_foresight is registered)
  • agents/<agent>/agent_cases/agent_case-YYYY-MM-DD.md (only after extract_agent_cases strategy is written + registered — currently absent, the emit is a no-op)

6. Reset between runs

The fixture's session_id is randomised per invocation, so previous runs don't pollute the new one. To wipe everything:

rm -rf ~/.everos/users ~/.everos/agents ~/.everos/.index/sqlite/system.db

Boundary expectations cheat sheet

Chat fixture topic shifts (timestamps ms)

Range Topic
msgs 1-6 (17473968001747397010) Python KeyError debugging
msgs 7-12 (17474004001747400610) Weekend ramen plans
msgs 13-16 (17474076001747407720) Q3 revenue review meeting prep
msgs 17-22 (17474112001747411410) Bob joins, SRE handoff + ramen + Q3 deck deadline

Boundary detector should cut on topic gaps; 3 cuts → 4 cells is the most likely outcome.

Agent fixture task threads

Range Task
items 1-13 (17473968001747397140) API latency spike → identify keepalive pool regression → rollback
items 14-21 (17474004001747400720) DB connection pool exhaustion → find unindexed query → CREATE INDEX CONCURRENTLY

Boundary detector should cut between item 13 and item 14 (timestamp jump ~55 minutes, topic flip). Tool items inside each cell stay attached to their initiating chat turn.