chore: initialize EverOS 1.0.0

md-first memory extraction framework for AI agents. Markdown is the single source of truth; SQLite holds state and LanceDB provides the rebuildable vector + BM25 + scalar index. The codebase follows a single-direction DDD layering (entrypoints -> service -> memory -> infra, with component / core / config cross-cutting) enforced by import-linter.

Engineering surface:
- Coding conventions in .claude/rules/ (path-scoped) and workflows in .claude/skills/ (/commit, /new-branch, /pr).
- GitHub Actions CI runs make lint + test + integration; pre-commit mirrors the gates locally (ruff, hygiene hooks, gitlint commit-msg).
- Commit messages follow Conventional Commits, enforced by gitlint.
- make lint also enforces the datetime two-zone discipline and checks for OpenAPI drift.
2026-06-05 22:35:51 +08:00
commit 518b8eca85
636 changed files with 160553 additions and 0 deletions
--- a/docs/api.md
+++ b/docs/api.md
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -0,0 +1,213 @@
+# Architecture
+
+> Companion: [.claude/rules/architecture.md](../.claude/rules/architecture.md) (auto-loaded coding rules)
+
+## DDD layered architecture
+
+```
+┌──────────────────────────────────────────────────────┐
+│  entrypoints/  (Presentation)                         │
+│    cli + api                                          │
+├──────────────────────────────────────────────────────┤
+│  service/      (Application — Use Case orchestration) │
+│    memorize / retrieve / evolve / manage              │
+├──────────────────────────────────────────────────────┤
+│  memory/       (Domain — Business core)               │
+│    models + extract + search + cascade + prompt_slots │
+├──────────────────────────────────────────────────────┤
+│  infra/persistence  (Storage adapters; infra/ may host other adapter types)    │
+│    markdown + sqlite + lancedb                        │
+└──────────────────────────────────────────────────────┘
+
+Cross-cutting (used by all layers, depends on none):
+  component/  ← Injectable providers (LLM / Embedding / config / utils)
+  core/       ← Runtime base (observability / lifespan / context)
+  config/     ← Configuration data (Settings schema + default.toml)
+```
+
+## Dependency direction (single-direction, enforced)
+
+```
+entrypoints → service → memory → infra
+```
+
+| from → to | Allowed? |
+|---|---|
+| entrypoints → service | ✅ |
+| entrypoints → memory / infra | ❌ (must go through service) |
+| service → memory | ✅ |
+| memory → infra | ✅ |
+| memory → service | ❌ |
+| infra → memory | ❌ |
+| infra cross-subpackage (e.g. lancedb → markdown within persistence/) | ❌ (use service to orchestrate) |
+| any → component / core / config | ✅ (cross-cutting) |
+
+Enforced via `import-linter` in CI:
+
+```toml
+[tool.importlinter]
+root_packages = ["everos"]
+
+[[tool.importlinter.contracts]]
+name = "Layered architecture"
+type = "layers"
+layers = [
+    "everos.entrypoints",
+    "everos.service",
+    "everos.memory",
+    "everos.infra",
+]
+```
+
+## Storage three-piece set
+
+```
+┌────────────────────────────────────────────────────────────────┐
+│             md-first storage stack                              │
+└────────────────────────────────────────────────────────────────┘
+
+   ┌──────────────┐   ┌──────────────┐   ┌─────────────────┐
+   │   Markdown   │   │   SQLite     │   │    LanceDB      │
+   │  (truth)     │   │  (state)     │   │  (index)        │
+   ├──────────────┤   ├──────────────┤   ├─────────────────┤
+   │ entries +    │   │ change queue │   │ vector ANN      │
+   │ frontmatter  │   │ + state/LSN  │   │ BM25 (Tantivy)  │
+   │ Git friendly │   │ buffer /     │   │ scalar filter   │
+   │ Obsidian OK  │   │   audit      │   │ multi-modal     │
+   └──────────────┘   └──────────────┘   └─────────────────┘
+          │                  │                    │
+          ▼                  ▼                    ▼
+    memory-root/         .index/sqlite/      .index/lancedb/
+   (truth source)       (system data)       (rebuildable)
+```
+
+## Write path
+
+```
+External message
+       │
+       ▼
+1. service.memorize           (entrypoint of write path)
+       │
+       ▼
+2. memory.extract.pipeline    (calls everalgo)
+       │
+       ▼
+3. infra.persistence.markdown.write       (atomic: tmp + fsync + rename)
+       │  ✅ md write success → return immediately
+       │
+   ┌───┴────┐
+   │        │
+   ▼        ▼
+4a. SQLite   4b. memory.cascade  (async daemon)
+    audit        watches md → diff entries → LanceDB sync
+```
+
+**Key guarantee**: the md write is strongly consistent (fsync); LanceDB is eventually consistent. LanceDB unavailability does not block the response: changes buffer in the SQLite `md_change_state` queue and are replayed on recovery.
+
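A minimal sketch of the "tmp + fsync + rename" pattern named in step 3, assuming a POSIX filesystem. The function name `atomic_write_text` and the `.tmp-` prefix are illustrative, not taken from the EverOS code.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str) -> None:
    """Write text so readers see either the old file or the new one, never a partial."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the target directory so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=".tmp-", suffix=".md")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # force the file contents to disk before the rename
        os.replace(tmp_name, path)  # atomic swap when source and target share a filesystem
    except BaseException:
        # Best-effort cleanup of the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

This sketch does not fsync the parent directory; a stricter variant might do so after the rename.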
+## Read path
+
+```
+User query
+   │
+   ▼
+1. service.retrieve
+   │
+   ▼
+2. memory.search.hybrid       single LanceDB query =
+                                BM25 + vector ANN + scalar filter
+   │
+   ▼
+3. (optional) read md         original markdown for context
+   │
+   ▼
+   Return
+```
+
+## Key components
+
+### `memory/extract/`
+
+```
+extract/
+├── ingest/      Standardized message intake + multi-modal parser dispatch
+├── pipeline/    Main extraction pipeline (calls everalgo + dual-track split + writes store)
+└── evolution/   Async memory evolution (event/counter/cron triggers)
+```
+
+### `memory/cascade/`
+
+Daemon that watches markdown changes and syncs to LanceDB:
+
+- inotify / FSEvents file watcher (cross-platform via `watchdog`)
+- 500ms debounce
+- Entry-level diff (added / changed / removed)
+- LanceDB single-transaction update (text + vector columns atomic)
+- LSN-based crash recovery via the SQLite `md_change_state` queue
+
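The entry-level diff in the list above can be sketched as a comparison of per-entry content hashes. This is an illustration only: the function name `diff_entries` and the `entry id -> body` mapping are assumptions, not the daemon's actual data model.

```python
import hashlib


def diff_entries(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify entry ids as added / changed / removed by comparing content hashes."""

    def digest(text: str) -> str:
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    old_hashes = {entry_id: digest(body) for entry_id, body in old.items()}
    new_hashes = {entry_id: digest(body) for entry_id, body in new.items()}
    return {
        "added": sorted(new_hashes.keys() - old_hashes.keys()),
        "removed": sorted(old_hashes.keys() - new_hashes.keys()),
        "changed": sorted(
            entry_id
            for entry_id in old_hashes.keys() & new_hashes.keys()
            if old_hashes[entry_id] != new_hashes[entry_id]
        ),
    }
```

Only the ids in `added` and `changed` would need re-embedding; `removed` ids map to index deletes.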
+### `memory/prompt_slots/`
+
+Three-layer prompt overlay:
+
+```
+config/prompt_slots/*.yaml          (Layer 1: defaults, ships with package)
+       ↓
+~/.everos/prompt_slots/*.yaml       (Layer 2: app-level override)
+       ↓
+runtime override                    (Layer 3: per-call override)
+```
+
+everalgo receives PromptSlot as parameter — no hardcoded prompts in algorithm code.
+
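One way to picture the three-layer overlay, assuming each layer has already been loaded into a plain dict keyed by slot name. The function name `resolve_prompt_slots` is hypothetical and the YAML loading step is left out.

```python
from collections import ChainMap


def resolve_prompt_slots(defaults: dict, app_override: dict, runtime_override: dict) -> dict:
    """Later layers win: runtime > app-level > packaged defaults."""
    # ChainMap looks keys up left to right, so the highest-priority layer goes first.
    return dict(ChainMap(runtime_override, app_override, defaults))
```

A slot missing from the runtime and app layers falls through to the packaged default.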
+### `core/observability/`
+
+Three-piece observability:
+
+- `metrics/` — Prometheus counter / gauge / histogram + global registry
+- `logging/` — structlog with context processor (trace_id propagation)
+- `tracing/` — OpenTelemetry tracer + span helpers
+
+## Markdown layout
+
+```
+~/.everos/                                  # memory root (default; EVEROS_MEMORY__ROOT)
+└── <app_id>/<project_id>/                  # scope ("default" → default_app/default_project)
+    ├── users/<user_id>/
+    │   ├── user.md                                     # profile (single-file rewrite)
+    │   ├── episodes/episode-<YYYY-MM-DD>.md            # daily-log append
+    │   ├── .atomic_facts/atomic_fact-<YYYY-MM-DD>.md   # hidden, framework-derived
+    │   └── .foresights/foresight-<YYYY-MM-DD>.md       # hidden, framework-derived
+    ├── agents/<agent_id>/
+    │   ├── .cases/agent_case-<YYYY-MM-DD>.md           # hidden, framework-derived
+    │   └── skills/skill_<name>/SKILL.md                # named-dir
+    └── knowledge/                                      # global shared knowledge
+```
+
+System-managed entries (`.index/`, `.tmp/`) and `ome.toml` live directly
+under the memory root.
+Full tree + frontmatter chassis: [storage_layout.md](storage_layout.md) and
+[how-memory-works.md](how-memory-works.md). Frontmatter has 4-tier field
+protection (L1 read-only / L2 system / L3 business / L4 user).
+
+## everalgo boundary
+
+[`everalgo`](https://github.com/EverMind-AI/EverAlgo) is a separate Python library (published as the `everalgo-*` PyPI packages) holding **only memory extraction algorithms**:
+
+- `everalgo.parser` — multi-modal parsing
+- `everalgo.user_memory` — ConvMemCell / Episode / Foresight / AtomicFact / Profile extractors
+- `everalgo.agent_memory` — AgentMemCell / Case / Skill extractors
+- `everalgo.knowledge` — file-to-knowledge
+
+everalgo is:
+
+- **Stateless** — pure functions, no class hierarchy
+- **No I/O** — does not touch md files / LanceDB / SQLite
+- **No prompts inline** — receives `PromptSlot` parameter, project supplies defaults
+
+This boundary lets everalgo be reused across product forms (this open-source build, EverOS Cloud, OpenClaw plugins, etc.).
+
+## Further reading
+
+- [docs/overview.md](overview.md) — vision and scope
+- [docs/engineering.md](engineering.md) — engineering tooling and CI / CD
+- [.claude/rules/architecture.md](../.claude/rules/architecture.md) — short-form rules for Claude Code
--- a/docs/cascade_runbook.md
+++ b/docs/cascade_runbook.md
@@ -0,0 +1,271 @@
+# Cascade Runbook
+
+The cascade daemon keeps LanceDB in sync with the markdown files under
+the memory root. Service / entry points only ever write markdown; the
+daemon is the **sole** writer of the LanceDB index. This runbook covers
+the recurring operational questions.
+
+## What runs where
+
+When `everos server start` boots, the FastAPI lifespan wires four
+providers in order:
+
+1. **Metrics** — Prometheus collector.
+2. **SQLite** — system DB + schema (`SQLModel.metadata.create_all`).
+3. **LanceDB** — async connection + schema verification + FTS indexes.
+4. **Cascade** — watcher + scanner + worker, all in-process tasks.
+
+The cascade subsystem itself is three independent loops:
+
+| Loop | Source signal | Effect |
+|---|---|---|
+| Watcher | `watchdog` filesystem events (sync thread) | `md_change_state.upsert` per registered kind |
+| Scanner | Periodic walk (`scan_interval_seconds`, default 30 s) | Same — catches changes the watcher missed |
+| Worker | `claim_pending_batch` polling (default 1 s when idle) | Handler dispatch → LanceDB upsert / delete |
+
+Every loop talks to the same `md_change_state` SQLite table. The
+worker's claim mode (`pending → processing → done/failed`) keeps
+concurrent workers honest.
+
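A rough sketch of the `pending → processing` claim step using the standard-library `sqlite3` module. The three-column schema (`path`, `status`, `lsn`) is an assumption made for the example; the real table and repository method are not shown in this diff.

```python
import sqlite3


def claim_pending_batch(conn: sqlite3.Connection, limit: int) -> list[str]:
    """Flip up to `limit` pending rows to processing and return the claimed paths."""
    claimed: list[str] = []
    with conn:  # commit on success, roll back on error
        rows = conn.execute(
            "SELECT path FROM md_change_state "
            "WHERE status = 'pending' ORDER BY lsn LIMIT ?",
            (limit,),
        ).fetchall()
        for (path,) in rows:
            cursor = conn.execute(
                "UPDATE md_change_state SET status = 'processing' "
                "WHERE path = ? AND status = 'pending'",
                (path,),
            )
            # The status guard means a row already taken by someone else updates 0 rows.
            if cursor.rowcount == 1:
                claimed.append(path)
    return claimed
```

Calling it twice in a row returns the batch once and then an empty list, since claimed rows no longer match `status = 'pending'`.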
+## Health: `everos cascade status`
+
+```
+queue:
+  pending:                   3
+  done:                      1247
+  failed (retryable=TRUE):   1     (eligible for `cascade fix --apply`)
+  failed (retryable=FALSE):  1     (fix md and re-save to recover)
+lsn:
+  max:           1252
+  last_processed: 1250
+  lag:            2
+```
+
+- `lag > 0` means the worker is behind. Steady state should hover near
+  zero; sustained lag points at a slow handler or a stuck retry.
+- `failed (retryable=FALSE)` is always user-actionable. Cascade will
+  never auto-clear these — they represent malformed md the user must
+  edit.
+
+## Recovering from failures: `everos cascade fix`
+
+`cascade fix` (no flag) lists every failed row. With `--apply`:
+
+1. `UPDATE md_change_state SET status='pending', retry_count=0
+   WHERE status='failed' AND retryable=TRUE` (the partial index
+   `idx_md_change_retryable` makes this O(retryable)).
+2. Drain the worker once so the retry runs synchronously.
+
+Retryable failures cover transient embedding / HTTP errors (5xx, 429,
+network resets) after the inline `MAX_RETRY=3` was exhausted. The
+fix command resets the counter so a working backend gets a clean
+start.
+
+`retryable=FALSE` rows require the user to edit the md (typically a
+YAML frontmatter issue) and re-save; the watcher picks the change up
+naturally.
+
+## One-shot replay: `everos cascade sync [PATH]`
+
+Use this when the watcher missed an event (WSL mount, network share,
+external editor with no inotify) or when you want a deterministic
+flush before, say, a smoke test:
+
+```bash
+everos cascade sync                           # drain everything pending
+everos cascade sync users/u1/episodes/X.md    # re-enqueue + drain
+```
+
+The CLI builds the same `CascadeOrchestrator` as the daemon but only
+calls `sync_once` / `drain_once` — no watcher / scanner background
+task. So it's safe to run in parallel with a live `everos server`.
+
+## Recovery paths
+
+### LanceDB schema drift on startup
+
+`LanceDBLifespanProvider.startup` calls `verify_business_schemas`. If
+an on-disk table has columns the current Pydantic schema does not
+declare (or vice versa), the boot fails with:
+
+```
+LanceDB table 'episode' schema drift: missing=[...], extra=[...].
+The index is rebuildable from md — recover with
+`rm -rf ~/.everos/.index/lancedb` and restart.
+```
+
+This is the documented recovery: delete the index, restart the
+server, the scanner will pick up every md file on its first sweep and
+the worker repopulates LanceDB. Markdown is the source of truth, so
+no data is lost.
+
+### inotify watch-limit exhaustion (Linux)
+
+The kernel's default limit can be as low as 8192 watches per user. On
+a sizeable memory root the watcher may silently miss events. Symptoms:
+
+- Scanner catches the file changes but the watcher never logs an
+  event for the same path.
+- `cat /proc/sys/fs/inotify/max_user_watches` reports a limit that is
+  small relative to the number of directories being watched.
+
+Fix by bumping the kernel parameter:
+
+```bash
+echo fs.inotify.max_user_watches=524288 | sudo tee -a /etc/sysctl.conf
+sudo sysctl -p
+```
+
+### WSL2 / network mounts
+
+Filesystem events do not propagate from the Windows host into WSL2
+(or across most SMB / NFS shares). The watcher will start without
+error and silently see nothing.
+
+Workarounds:
+
+- Rely on the scanner: at the default 30 s interval a change can wait
+  up to one sweep before it is enqueued, but the index stays
+  eventually consistent.
+- Drop the scan interval to ~5 s if the memory root is small.
+- Run `everos cascade sync` explicitly after batch edits.
+
+### Daemon process crash mid-batch
+
+`claim_pending_batch` flips rows to `processing` *atomically*. If the
+process dies before `mark_done` / `mark_failed`, those rows stay in
+`processing` until the next boot. **The orchestrator auto-recovers**
+on startup: `CascadeOrchestrator.start` calls
+`md_change_state_repo.recover_orphan_processing()` before launching
+the watcher / scanner / worker, which resets every `processing` row
+back to `pending`. Single-process cascade means no race — at boot
+time no other worker could legitimately own a `processing` row.
+
+No operator action required; the structured log line
+`cascade_recovered_orphan_processing` reports the count when it
+fires.
+
+### FD exhaustion (`os error 24` / EMFILE)
+
+Symptoms (any of these on a long-running daemon):
+
+- LanceDB query / index build fails with `lance error: ... Too many
+  open files (os error 24)`.
+- `lsof -p <pid> | wc -l` grows monotonically over hours / days.
+- Health log lines like `cascade_lancedb_optimize_failed` /
+  `cascade_lancedb_rebuild_failed` carrying `OSError: [Errno 24]`.
+
+Cause (verified against `lance crate 4.0`): the LanceDB *index* cache
+(`GlobalIndexCache`) holds one reader object per opened FTS / vector
+/ scalar index, and each reader pins the file descriptors of its
+`_indices/<uuid>/...` files. With a long-running daemon and steady-
+state cascade ingest, every `optimize()` call adds new readers; with
+LanceDB's own default (`index_cache_size_bytes=None`, unbounded), they
+**are never evicted** and the FDs leak monotonically.
+
+`drop_index` does **not** help — it is a manifest-only operation and
+leaves the on-disk UUID directories untouched. Even an explicit
+`optimize(cleanup_older_than=0)` `unlink()`-ing the files does not
+release FDs: POSIX keeps the inode alive as long as a process holds
+an open FD on it (the entries show as `(deleted)` in `lsof`). Only an
+LRU eviction inside the cache (or a connection close) actually closes
+the FDs.
+
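The POSIX behaviour described above (an unlinked file stays alive while a descriptor is open) can be reproduced in a few lines. This is a standalone demonstration that does not touch LanceDB, and it assumes a POSIX system; the function name is illustrative.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Open a file, unlink it, and read it back through the still-open descriptor."""
    fd, name = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(name)  # the directory entry is gone
        assert not os.path.exists(name)
        os.lseek(fd, 0, os.SEEK_SET)
        return os.read(fd, len(payload))  # the inode is still readable through fd
    finally:
        os.close(fd)  # only after this can the kernel reclaim the inode
```

While the descriptor is open, `lsof` lists such a file with the `(deleted)` marker mentioned above.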
+Fix (already wired in `LanceDBSettings.index_cache_size_bytes` —
+default 16 MB, ~290 FD ceiling): see
+[Tuning knobs § LanceDB index cache](#lancedb-index-cache-index_cache_size_bytes)
+for the sizing table and the env-var override path.
+
+If you have already hit EMFILE in a running process, the cleanest
+recovery is a daemon restart — the open connection closes, every FD
+is released, and the next start comes up with the capped Session in
+place.
+
+## Tuning knobs
+
+### Cascade scheduler knobs
+
+All defaults live in `everos.memory.cascade.orchestrator.CascadeConfig`
+and `everos.memory.cascade.worker.CascadeWorker`:
+
+| Knob | Default | Effect |
+|---|---|---|
+| `scan_interval_seconds` | 30 | Scanner sweep cadence |
+| `worker_batch_size` | 50 | Rows claimed per worker cycle |
+| `worker_max_retry` | 3 | Inline retries before `mark_failed(retryable=TRUE)` |
+| `worker_poll_interval_seconds` | 1 | Idle wait between empty drain attempts |
+| `worker_retry_backoff_seconds` | 2 | Exponential backoff seed; doubles per attempt |
+
+Tuning surface is intentionally not in `Settings` yet — once we have
+wall-clock numbers from real workloads, the values that need
+operator override will surface there.
+
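Read together, the retry knobs imply a delay schedule along these lines. This is a sketch under the assumption that the first retry waits the seed value and each later retry doubles it; the worker's exact formula is not shown in this diff.

```python
def backoff_schedule(seed_seconds: float = 2.0, max_retry: int = 3) -> list[float]:
    """Delay before each retry: the seed, doubled on every further attempt."""
    return [seed_seconds * (2**attempt) for attempt in range(max_retry)]
```

With the defaults in the table this gives delays of 2, 4 and 8 seconds before the row is marked failed.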
+### LanceDB index cache (`index_cache_size_bytes`)
+
+Lives in `LanceDBSettings`; overridable via the
+`EVEROS_LANCEDB__INDEX_CACHE_SIZE_BYTES` environment variable. This
+is the only knob that bounds the steady-state file-descriptor count
+of a long-running EverOS daemon — see
+[Recovery paths § FD exhaustion](#fd-exhaustion-os-error-24-emfile)
+for why nothing else (prune, rebuild, `drop_index`) helps.
+
+Measured cap → FD ceiling (30 add+optimize cycles + 100-query stress
+on the real `Episode` schema):
+
+| Cap | FD ceiling | Query latency (p50) | Safe under `ulimit -n` |
+|---|---|---|---|
+| `2 MB` | ~45 | ~5 ms | macOS default 256 (5× headroom) |
+| `4 MB` | ~52 | ~3 ms | macOS default 256 |
+| `8 MB` | ~140 | ~2.4 ms | macOS default 256 (1.8× headroom) |
+| **`16 MB`** (default) | **~290** | **~2.3 ms** | **Linux default 1024 (3.5× headroom); macOS needs `ulimit -n 1024`** |
+| `32 MB` | ~630 | ~1.4 ms | Linux default 1024 (1.6× headroom) |
+| `unbounded` | grows forever | ~1.3 ms | NEVER use in a daemon |
+
+EverOS's measured steady-state working set after a `rebuild_indexes`
+cycle is roughly **50-100 readers / 3-6 MB resident** (5 tables × ~7
+BM25 columns × ~10 `part_N` reader entries each), so the 16 MB default
+provides ~3× headroom for burst traffic and stale-but-not-yet-evicted
+readers.
+
+When to override:
+
+- **Tight `ulimit -n` environments** (containers; macOS dev boxes
+  that haven't bumped the default 256) → drop to `4 MB` or `8 MB`.
+  Per the table above, p50 query latency rises by under 1 ms and
+  correctness is unaffected.
+- **Larger working sets** (many more tables or much wider FTS
+  indexes than the default schema set) → bump to `32-64 MB`. Verify
+  your platform's `ulimit -n` covers the corresponding FD ceiling
+  with at least 2× headroom.
+- **Diagnostic-only**: set to a tiny value (e.g. `1 MB`) to
+  *force* LRU thrashing and reproduce cache-miss latency in tests.
+
+Do **not** set `metadata_cache_size_bytes` — it is intentionally left
+at LanceDB's default (unbounded) because the metadata cache holds
+parsed manifests / fragment stats and has zero effect on FD count;
+capping it just thrashes parsing work without solving anything.
+
+## Concurrency
+
+The worker is async, not multi-process. Inside one drain cycle,
+`asyncio.gather(*[_process_one(row) for row in batch])` runs every
+claimed row concurrently — cascade is IO-bound (embedding HTTP calls
+dominate wall time) so single-process coroutine concurrency saturates
+the bottleneck. The `worker_batch_size` knob (default 50) caps
+in-flight rows.
+
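The drain cycle described above can be approximated as follows. `process_one` is a stand-in whose sleep models an embedding HTTP call; the names are illustrative and the real handler dispatch is not shown here.

```python
import asyncio


async def process_one(row: str) -> str:
    """Stand-in for a handler: the sleep models an IO-bound embedding call."""
    await asyncio.sleep(0.01)
    return f"indexed:{row}"


async def drain_once(batch: list[str]) -> list[str]:
    """Run every claimed row concurrently; results come back in batch order."""
    return await asyncio.gather(*[process_one(row) for row in batch])
```

Because the rows overlap on IO, the wall time of one cycle is close to the slowest row rather than the sum of all rows.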
+Multi-process workers are a scaling axis we'd reach for only if a
+single process becomes CPU-bound, which the current design does not
+anticipate. `claim_pending_batch` is already race-safe (the
+`WHERE status='pending'` filter ensures each row lands in exactly
+one batch even if multiple workers raced), so adding processes later
+is a deployment-side change with no schema work.
+
+## What cascade does NOT do (yet)
+
+- **Schema migration**: LanceDB column changes require deleting the
+  index (`rm -rf ~/.everos/.index/lancedb`) and restarting so it
+  rebuilds from md.
+- **Parent-id back-link**: Episode rows currently carry
+  `parent_id=None`; the writer doesn't preserve the source memcell id
+  in the entry inline. Tracked separately.
+- **Reference-file change detection (agent_skill)**: edits to
+  `references/*.md` siblings won't trigger a re-index — only changes
+  to `SKILL.md` itself fire the watcher. Workaround: run
+  `everos cascade sync agents/<a>/skills/skill_<n>/SKILL.md` after
+  editing references.
--- a/docs/cli.md
+++ b/docs/cli.md
@@ -0,0 +1,116 @@
+# CLI
+
+The `everos` command-line entry point covers **setup and operations** —
+generate a starter `.env` (`init`), run the HTTP API server (`server
+start`), and operate the md → LanceDB index queue (`cascade`). Hot-path
+business (`/add` `/flush` `/search` `/get`) is the **HTTP API**, not the
+CLI.
+
+CLI commands run **in-process** — they call into the `service/` /
+infrastructure layers directly rather than the HTTP loopback.
+
+## Installation
+
+The script is exposed via `pyproject.toml`:
+
+```toml
+[project.scripts]
+everos = "everos.entrypoints.cli.main:app"
+```
+
+After `uv sync` (or `pip install -e .`) the `everos` command resolves
+to [`src/everos/entrypoints/cli/main.py`](../src/everos/entrypoints/cli/main.py),
+a [Typer](https://typer.tiangolo.com/) app.
+
+## Subcommand layout
+
+```
+everos
+├── init                            Generate a starter .env from the packaged template
+├── server
+│   └── start                       Start the HTTP API server (uvicorn)
+└── cascade                         Inspect / operate the md → LanceDB sync queue
+    ├── status                      Queue / LSN summary
+    ├── sync                        Drain the queue now (force md → LanceDB)
+    └── fix                         List failed rows / re-enqueue retryable ones
+```
+
+Each subcommand lives in its own module under
+[`entrypoints/cli/commands/`](../src/everos/entrypoints/cli/commands/) and is
+registered in `cli/main.py`. The CLI is intentionally small. There is
+no `reindex` command: rebuild by deleting `<root>/.index/lancedb` and
+restarting, or run `everos cascade sync`.
+
+## `everos server start`
+
+Wraps `uvicorn` to launch the FastAPI app from
+[`entrypoints/api/app.py`](../src/everos/entrypoints/api/app.py)
+in *factory* mode.
+
+```bash
+everos server start \
+    --host 127.0.0.1 \
+    --port 8000 \
+    --log-level info \
+    --env-file .env
+```
+
+| Flag | Env var | Default |
+|---|---|---|
+| `--host` | `EVEROS_API__HOST` | `127.0.0.1` (loopback only; binding `0.0.0.0` logs a warning — EverOS ships no auth) |
+| `--port` | `EVEROS_API__PORT` | `8000` |
+| `--log-level` | `EVEROS_LOG_LEVEL` | `INFO` |
+| `--env-file` | — | searched: `./.env` → `$XDG_CONFIG_HOME/everos/.env` → `~/.everos/.env` |
+| `--reload` | — | off (use in development) |
+
+Lifespan startup wires the storage backends (SQLite engine + LanceDB
+connection) on app boot; see
+[`entrypoints/api/lifespans/`](../src/everos/entrypoints/api/lifespans/).
+
+## Configuration via env vars
+
+Both the CLI and the HTTP server load configuration through `pydantic-settings`:
+
+| Env var | Settings field |
+|---|---|
+| `EVEROS_MEMORY__ROOT` | `Settings.memory.root` (memory-root path) |
+| `EVEROS_MEMORY__TIMEZONE` | `Settings.memory.timezone` (e.g. `Asia/Shanghai`) |
+| `EVEROS_SQLITE__BUSY_TIMEOUT_MS` | `Settings.sqlite.busy_timeout_ms` |
+| `EVEROS_LANCEDB__READ_CONSISTENCY_SECONDS` | `Settings.lancedb.read_consistency_seconds` |
+
+Pattern: `EVEROS_<SECTION>__<KEY>` (double underscore = nesting). See
+[`config/settings.py`](../src/everos/config/settings.py).
+
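To make the `EVEROS_<SECTION>__<KEY>` naming convention concrete, here is a standard-library sketch that folds such variables into a nested dict. It only illustrates the naming rule; the project itself relies on `pydantic-settings`, and `nest_env` is a hypothetical helper.

```python
def nest_env(environ: dict[str, str], prefix: str = "EVEROS_") -> dict:
    """Turn EVEROS_<SECTION>__<KEY> variables into a nested, lower-cased dict."""
    nested: dict = {}
    for name, value in environ.items():
        if not name.startswith(prefix):
            continue  # unrelated variables are ignored
        parts = name[len(prefix):].lower().split("__")
        cursor = nested
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})  # descend, creating sections on demand
        cursor[parts[-1]] = value
    return nested
```

Values stay as strings here; type coercion (for example to `int` for `busy_timeout_ms`) is left to the settings schema.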
+## Logging
+
+`configure_logging` runs at CLI startup and configures `structlog` with
+the resolved log level. All in-process logs (CLI command bodies +
+service / infra layers) flow through the same handler.
+
+```bash
+everos server start --log-level debug   # see all sql / lance traffic
+```
+
+## API ↔ CLI division of labour
+
+| Responsibility | API | CLI |
+|---|---|---|
+| Hot-path business (`/add` `/flush` `/search` `/get`) | ✅ | — (HTTP only) |
+| Setup (generate `.env`) | — | `everos init` |
+| Run the server | — | `everos server start` |
+| Index ops (drain / inspect / fix the cascade queue) | — | `everos cascade {status,sync,fix}` |
+| Health probe | `GET /health` | (use HTTP) |
+| Metrics scrape | `GET /metrics` | (use HTTP) |
+
+The CLI is the **shell-friendly** surface for ops + scripting; the
+HTTP API is the **process-friendly** surface for clients (web UIs,
+agents, automation).
+
+## See also
+
+- [architecture.md](architecture.md) — DDD layering between
+  entrypoints / service / memory / infra
+- [`entrypoints/cli/main.py`](../src/everos/entrypoints/cli/main.py)
+- [`entrypoints/cli/commands/server.py`](../src/everos/entrypoints/cli/commands/server.py)
--- a/docs/datetime.md
+++ b/docs/datetime.md
@@ -0,0 +1,263 @@
+# Datetime & Timezones
+
+> Audience: contributors. Read this once before touching any code that
+> records a moment in time.
+
+## Table of contents
+
+- [The two-zone discipline](#the-two-zone-discipline)
+- [Why two zones](#why-two-zones)
+- [Helper reference](#helper-reference)
+- [Field-type rules](#field-type-rules)
+- [End-to-end data flow](#end-to-end-data-flow)
+- [Common pitfalls](#common-pitfalls)
+- [Testing guidance](#testing-guidance)
+
+## The two-zone discipline
+
+EverOS treats datetimes on **two separate rails**:
+
+| Rail | Where it lives | Helper |
+|---|---|---|
+| **UTC** (storage) | SQLite, LanceDB, OME events — anything persisted to disk | `get_utc_now`, `ensure_utc`, `UtcDatetime` |
+| **Display tz** | Markdown frontmatter, HTTP API responses, daily-log filename buckets, fallback zone for naive caller input | `get_now_with_timezone`, `today_with_timezone`, `to_display_tz` |
+
+The display timezone is set by the `EVEROS_MEMORY__TIMEZONE`
+environment variable (or `[memory] timezone` in TOML). Default `UTC`.
+
+**Inviolable rule**: the display tz must **never** reach storage. Once
+the user switches `EVEROS_MEMORY__TIMEZONE`, existing on-disk rows
+must not misalign.
+
+## Why two zones
+
+### What goes wrong with a single "configured" zone
+
+The naive design — "use one configured timezone everywhere" — has two
+failure modes, both subtle:
+
+1. **Configuration drift.** Day 1 the user configures
+   `EVEROS_MEMORY__TIMEZONE=Asia/Shanghai`. Everything stores
+   Shanghai-local datetimes. On Day 30 they switch to
+   `UTC`. SQLite (which strips tz on write and returns naive on read)
+   silently reinterprets the old Shanghai values as UTC — every old
+   row jumps eight hours into the future.
+2. **Cross-region replication.** If two deployments share storage
+   but configure different display zones, both interpret the same
+   naive bytes against their own local zone and diverge by the
+   offset delta. There is no "true" reading.
+
+UTC-only storage forecloses both: bytes on disk are zone-independent.
+
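The configuration-drift failure can be reproduced with the standard library alone. The sketch below uses a fixed +08:00 offset as a stand-in for `Asia/Shanghai` and plain `replace(tzinfo=None)` to model SQLite dropping the offset; none of it is EverOS code.

```python
from datetime import datetime, timedelta, timezone

SHANGHAI = timezone(timedelta(hours=8))  # fixed-offset stand-in for Asia/Shanghai

true_instant = datetime(2026, 5, 29, 14, 0, tzinfo=SHANGHAI)

# Day 1: the display-tz value is stored and the offset is dropped on write.
stored_naive = true_instant.replace(tzinfo=None)

# Day 30: the display tz is now UTC, so the naive value is read back as UTC.
reinterpreted = stored_naive.replace(tzinfo=timezone.utc)
drift = reinterpreted - true_instant  # how far the row moved

# UTC-only storage: convert before dropping the offset, re-attach UTC on read.
stored_utc = true_instant.astimezone(timezone.utc).replace(tzinfo=None)
round_tripped = stored_utc.replace(tzinfo=timezone.utc)
```

Here `drift` is eight hours, while `round_tripped` compares equal to `true_instant` whatever the display tz is later set to.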
+### Why not UTC everywhere then?
+
+Users want to read timestamps in their wall-clock zone. Markdown
+frontmatter that says `2026-05-29T06:00:00Z` for a meeting that
+happened locally at 14:00 is jarring. The display rail solves this
+without polluting storage: render UTC bytes through `to_display_tz`
+at the boundary.
+
+## Helper reference
+
+All helpers live in [`everos.component.utils.datetime`](../src/everos/component/utils/datetime.py).
+
+### Storage rail
+
+| Helper | Behaviour |
+|---|---|
+| `get_utc_now() -> datetime` | Current UTC instant, `tzinfo=UTC`. Independent of any setting. Use as `default_factory` on any storage field. |
+| `ensure_utc(d) -> datetime` | Naive → attach display tz → convert to UTC. Aware → `astimezone(UTC)`. Use at the storage boundary if you receive a datetime you didn't construct. |
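+| `UtcDatetime` | `Annotated[datetime, AfterValidator(ensure_utc)]`. Apply to any SQLite field. The validator runs when a row object is constructed; rows hydrated by the ORM bypass it and are covered by the `load` listener described under [SQLite tables](#sqlite-tables). |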
+
+### Display rail
+
+| Helper | Behaviour |
+|---|---|
+| `get_now_with_timezone() -> datetime` | Current instant in the configured display tz. `.isoformat()` produces e.g. `2026-05-29T14:00:00+08:00`. |
+| `today_with_timezone() -> date` | Today's date in the display tz. Use for daily-log filename buckets. |
+| `to_display_tz(d) -> datetime` | Convert any datetime to the display tz. Naive input is treated as already display-tz local. |
+
+### Parsing & rendering
+
+| Helper | Behaviour |
+|---|---|
+| `from_iso_format(value)` | Parse an ISO string / datetime / epoch. Naive input attaches **display tz** (the "if you didn't say a zone, assume your zone" rule). |
+| `from_timestamp(ts)` | Parse epoch seconds / milliseconds (auto-detects). Returns display-tz aware. |
+| `to_iso_format(d)` | `.isoformat()` after light validation. |
+| `to_timestamp_ms(d)` | Milliseconds epoch (`int`). |
+
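The documented behaviour of `ensure_utc` and `to_display_tz` can be approximated as below. These are illustrative re-implementations written from the tables above, not the project's code: the `sketch_` names, the explicit `display_tz` parameter and the fixed +08:00 default are all assumptions.

```python
from datetime import datetime, timedelta, timezone, tzinfo

DISPLAY_TZ = timezone(timedelta(hours=8))  # assumed display tz, fixed offset for the example


def sketch_ensure_utc(value: datetime, display_tz: tzinfo = DISPLAY_TZ) -> datetime:
    """Naive input is read as display-tz local; the result is always aware UTC."""
    if value.tzinfo is None:
        value = value.replace(tzinfo=display_tz)
    return value.astimezone(timezone.utc)


def sketch_to_display_tz(value: datetime, display_tz: tzinfo = DISPLAY_TZ) -> datetime:
    """Naive input is treated as already display-tz local; aware input is converted."""
    if value.tzinfo is None:
        return value.replace(tzinfo=display_tz)
    return value.astimezone(display_tz)
```

A naive 14:00 passed through `sketch_ensure_utc` is stored as 06:00 UTC, and `sketch_to_display_tz` renders it back as 14:00 at the response boundary.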
+## Field-type rules
+
+### SQLite tables
+
+```python
+from everos.component.utils.datetime import UtcDatetime, get_utc_now
+from everos.core.persistence.sqlite import BaseTable, Field
+
+class MyRow(BaseTable, table=True):
+    happened_at: UtcDatetime = Field(default_factory=get_utc_now)
+```
+
+Why `UtcDatetime` and not plain `datetime`? SQLAlchemy silently strips
+tz on SQLite writes. `UtcDatetime`'s `AfterValidator` runs on
+**construction** to make sure whatever the caller hands in gets
+normalised to UTC before persistence.
+
+SQLModel's ORM hydrate path (rows from `select(...)`) **bypasses**
+the Pydantic validator — SQLAlchemy assigns column values straight
+to instance attributes. To close that gap,
+[core/persistence/sqlite/base.py](../src/everos/core/persistence/sqlite/base.py)
+registers a SQLAlchemy `load` event listener that re-attaches
+`tzinfo=UTC` to every `UtcDatetime` column after hydrate. Net effect:
+**callers never see a naive datetime from a SQLite repo**, whatever
+the code path.
+
+`BaseTable.created_at` / `updated_at` already use `UtcDatetime` and
+`get_utc_now` — any subclass inherits both the construction-time
+validator **and** the load-time hook for free.
+
+### LanceDB tables — zero configuration
+
+```python
+import datetime as _dt
+
+class MyLanceRow(BaseLanceTable):
+    ts: _dt.datetime   # automatically tz=UTC in the Arrow schema
+```
+
+LanceDB's Pydantic → PyArrow converter does not understand
+`typing.Annotated` metadata; using `UtcDatetime` as the annotation
+would raise `TypeError: Converting Pydantic type to Arrow Type`.
+Instead, `BaseLanceTable.to_arrow_schema()` walks the inferred schema
+and rewrites **every** naive `timestamp[us]` column to
+`timestamp[us, tz=UTC]`. PyArrow then:
+
+* **on write**: converts any aware input to UTC (`astimezone(UTC)`) automatically.
+* **on read**: returns aware UTC datetimes (not naive).
+
+No caller-side coercion needed, no per-table declaration. The
+response shapers only run `to_display_tz(...)` to convert UTC to the
+configured display zone.
+
+If a future schema genuinely needs a naive datetime column (project
+convention says storage is always UTC, so this would be unusual),
+override `to_arrow_schema` on that subclass and skip the patch for
+that one column.
+
+### OME events / in-memory state
+
+OME events are persisted-adjacent (the `run_record` / `counter` stores
+serialise them). Use `get_utc_now()` for any `default_factory` on the
+event payload.
+
+## Two centralised defenses
+
+| Backend | Defense | Where |
+|---|---|---|
+| **SQLite** | SQLAlchemy `load` event listener on `BaseTable` re-attaches `tzinfo=UTC` after every ORM hydrate | [core/persistence/sqlite/base.py](../src/everos/core/persistence/sqlite/base.py) |
+| **LanceDB** | `BaseLanceTable.to_arrow_schema()` rewrites `UTC_DATETIME_FIELDS` columns to `timestamp[us, tz=UTC]`; PyArrow handles UTC end-to-end | [core/persistence/lancedb/base.py](../src/everos/core/persistence/lancedb/base.py) |
+| **CI gate** | `scripts/check_datetime_discipline.py` fails the build on any code that bypasses `component/utils/datetime` | wired into `make lint` |
+
+The two backend defenses replace what used to be an "every consumer
+must call `ensure_utc()`" shotgun discipline. With both in place,
+callers never observe a naive datetime from either backend. The CI
+gate keeps new code from bypassing the helpers.
+
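What the SQLite-side defense amounts to can be sketched with the standard library alone. `reattach_utc` is a hypothetical name, not the listener in `core/persistence/sqlite/base.py`; it assumes, as the storage convention states, that every naive value coming back from SQLite was written as UTC:

```python
from datetime import datetime, timezone
from typing import Optional


def reattach_utc(value: Optional[datetime]) -> Optional[datetime]:
    """Label a naive datetime as UTC without shifting its wall-clock time."""
    if value is None or value.tzinfo is not None:
        return value  # nothing to repair
    return value.replace(tzinfo=timezone.utc)


naive_from_sqlite = datetime(2026, 5, 29, 6, 0, 0)
hydrated = reattach_utc(naive_from_sqlite)
```

`replace(tzinfo=...)` only labels the value and performs no conversion, which is correct precisely because the stored value is already UTC.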
+## End-to-end data flow
+
+```
+User input (any zone)
+        │
+        ▼
+   from_iso_format     ←  naive → attach display tz
+        │
+        ▼
+   ensure_utc          ←  storage boundary: → UTC
+        │
+        ▼
+┌────────────────┬────────────────┐
+│   SQLite       │   LanceDB      │
+│  (UtcDatetime  │   (Arrow       │
+│   re-attaches  │   stripped to  │
+│   UTC on read) │   UTC bytes)   │
+└────────────────┴────────────────┘
+        │
+        ▼
+   from_iso_format    ←  read path normalises naive → display tz
+        │
+        ▼
+   to_display_tz      ←  response boundary: → display tz
+        │
+        ▼
+   Pydantic .isoformat()  →  "2026-05-29T14:00:00+08:00"
+        │
+        ▼
+   HTTP API response / markdown frontmatter
+```
+
+The storage boundary and response boundary are the two points where
+the zone discipline is enforced. Everything in between just passes
+datetimes through.
+
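The flow above can be traced end to end with a few lines of standard-library Python. The three functions are simplified stand-ins that merely share the names used in the diagram, and a fixed `+08:00` offset is assumed in place of the configured display zone:

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +08:00 display zone instead of the configured one.
DISPLAY_TZ = timezone(timedelta(hours=8))


def from_iso_format(text: str) -> datetime:
    """Parse ISO text; a naive result is interpreted in the display zone."""
    value = datetime.fromisoformat(text)
    if value.tzinfo is None:
        value = value.replace(tzinfo=DISPLAY_TZ)
    return value


def ensure_utc(value: datetime) -> datetime:
    """Storage boundary: convert an aware datetime to UTC."""
    return value.astimezone(timezone.utc)


def to_display_tz(value: datetime) -> datetime:
    """Response boundary: convert an aware datetime to the display zone."""
    return value.astimezone(DISPLAY_TZ)


stored = ensure_utc(from_iso_format("2026-05-29T14:00:00"))
shown = to_display_tz(stored).isoformat()
```

Here `stored` carries the UTC instant that would be persisted, and `shown` is the string a response would serialise.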
+## Common pitfalls
+
+> [!WARNING]
+> **`datetime.now()` without `tz=`.** Forbidden. Always use
+> `get_utc_now()` (storage) or `get_now_with_timezone()` (display).
+> Documented in `.claude/rules/datetime-handling.md` and enforced by
+> the datetime-discipline check that `make lint` runs in CI.
+
+> [!WARNING]
+> **Calling `astimezone()` on a value just read from SQLite.** If the
+> field isn't typed `UtcDatetime`, SQLite returns naive — and
+> `astimezone()` on a naive datetime silently interprets it as
+> **local process time**, not UTC. Always use `UtcDatetime` on SQLite
+> fields.
+
+> [!WARNING]
+> **Storing `get_now_with_timezone()` directly.** That returns
+> display-tz time. If the display tz later changes, your stored values
+> are stranded. Use `get_utc_now()` for any persisted field.
+
+> [!NOTE]
+> **Migrating existing rows.** Q2 was rolled out on a clean codebase
+> with no production data. If you operate an instance whose SQLite
+> rows were written as display-tz values (pre-Q2), you must either
+> drop the database or write a one-time migration that reinterprets
+> each row's naive value against the old display tz before rewriting
+> it as UTC. The project does not ship such a migration.
+
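The second pitfall is easy to reproduce with the standard library. The sketch below uses no project code, and the size of the error depends on the local zone of the machine it runs on:

```python
from datetime import datetime, timezone

naive = datetime(2026, 5, 29, 6, 0, 0)  # pretend this came back from SQLite

# Pitfall: astimezone() on a naive value assumes *local process time*.
wrong = naive.astimezone(timezone.utc)

# Safe: label the value as UTC, which is what it was stored as.
right = naive.replace(tzinfo=timezone.utc)

# The two results differ by exactly the local UTC offset of the process.
skew = right - wrong
```

On a host whose local zone is UTC, `skew` is zero and the bug stays hidden, which is why it tends to surface only in some environments.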
+## Testing guidance
+
+For unit tests that depend on display-tz behaviour, both caches must
+be cleared:
+
+```python
+import pytest
+from everos.component.utils import datetime as dt_module
+from everos.config import load_settings
+
+@pytest.fixture(autouse=True)
+def _isolate_tz(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.delenv("EVEROS_MEMORY__TIMEZONE", raising=False)
+    load_settings.cache_clear()
+    dt_module._display_tz.cache_clear()
+```
+
+The autouse fixture in [tests/conftest.py](../tests/conftest.py) does
+exactly this — it runs for every test by default. If you write a
+locally-scoped test that needs a non-default zone, monkeypatch the env
+var **and** clear both caches:
+
+```python
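+import pytest
+from everos.component.utils import datetime as dt_module
+from everos.config import load_settings
+
+
+def test_my_thing(monkeypatch: pytest.MonkeyPatch) -> None:
+    monkeypatch.setenv("EVEROS_MEMORY__TIMEZONE", "Asia/Shanghai")
+    load_settings.cache_clear()          # drop the cached settings
+    dt_module._display_tz.cache_clear()  # drop the cached display zone
+    ...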
+```
+
+The full invariant set is covered in
+[tests/unit/test_component/test_utils/test_datetime.py](../tests/unit/test_component/test_utils/test_datetime.py)
+under the "Q2 two-zone discipline invariants" section. If you change
+the storage / display contract, those tests are the first line of
+defense — update them in lockstep.
--- a/docs/engineering.md
+++ b/docs/engineering.md
@ -0,0 +1,553 @@
+# Engineering & Dev-Efficiency Infrastructure
+
+> Companions: business architecture lives in [architecture.md](architecture.md);
+> hard coding constraints live in [../.claude/rules/](../.claude/rules/).
+> This document covers the surrounding tooling, configuration, and processes
+> — what we adopted, what role each piece plays, and how they fit together.
+> CI runs on GitHub Actions; all checks are invoked through the `Makefile`.
+
+---
+
+## 1. Scope
+
+Engineering / dev-efficiency infrastructure does not solve business problems —
+it solves **team + code + time** problems:
+
+```
+┌──────────────────────────────────────────────────────────┐
+│                                                          │
+│   Business architecture (docs/architecture.md)           │
+│      — answers "how to build the system"                 │
+│                                                          │
+│   Engineering rules (.claude/rules/)                     │
+│      — answers "how to write the code"                   │
+│                                                          │
+│   Engineering / dev-efficiency infrastructure (this doc) │
+│      — answers "how the team collaborates,               │
+│         how code is auto-checked,                        │
+│         how releases are automated,                      │
+│         how tools land in the project"                   │
+│                                                          │
+└──────────────────────────────────────────────────────────┘
+```
+
+Reasons this is documented separately:
+
+- **Cross-project reusable** — `CLAUDE.md` / rules / `pyproject.toml` are
+  patterns, not content. The next project can adopt them as-is.
+- **Decoupled from business** — business architecture changes do not affect
+  these; upgrading these does not affect business.
+- **Onboarding-oriented** — new contributors read this first to understand
+  what the tooling looks like.
+
+---
+
+## 2. Infrastructure overview
+
+```
+┌─────────────────────────────────────────────────────────────────────┐
+│            Team collaboration / Code quality / CI/CD                 │
+├─────────────────────────────────────────────────────────────────────┤
+│                                                                     │
+│   ┌─ Claude Code engineering layer ────────────────────────────┐    │
+│   │                                                            │    │
+│   │   CLAUDE.md  ←  team-shared context (auto loaded into     │    │
+│   │                 system prompt)                             │    │
+│   │   .claude/                                                 │    │
+│   │   ├── CLAUDE.md          subdir context (optional)        │    │
+│   │   ├── rules/  (10)       path-scoped hard coding rules    │    │
+│   │   ├── skills/ (3)        slash command workflows          │    │
+│   │   └── settings.json      permissions allowlist            │    │
+│   │                                                            │    │
+│   └────────────────────────────────────────────────────────────┘    │
+│                                                                     │
+│   ┌─ Code quality gates ───────────────────────────────────────┐    │
+│   │                                                            │    │
+│   │   pre-commit          runs locally before commit           │    │
+│   │     ├ ruff (lint+fmt)                                      │    │
+│   │     ├ trailing-whitespace / end-of-file-fixer              │    │
+│   │     ├ check-yaml / check-toml                              │    │
+│   │     ├ check-added-large-files (≥1MB fail)                  │    │
+│   │     ├ detect-private-key                                   │    │
+│   │     └ gitlint (commit-msg stage)                           │    │
+│   │                                                            │    │
+│   │   ruff                lint + format                        │    │
+│   │                       (replaces black / isort / flake8)    │    │
+│   │   import-linter       DDD layer-direction enforcement      │    │
+│   │   pytest              unit / integration                   │    │
+│   │                                                            │    │
+│   └────────────────────────────────────────────────────────────┘    │
+│                                                                     │
+│   ┌─ Dependencies & build ─────────────────────────────────────┐    │
+│   │                                                            │    │
+│   │   uv                  sole package manager                 │    │
+│   │                       (no `pip install`)                   │    │
+│   │   pyproject.toml      src layout + extras + groups         │    │
+│   │   uv.lock             checked in; CI uses --frozen         │    │
+│   │   hatchling           wheel build backend                  │    │
+│   │   Makefile            unified entry; CI calls it           │    │
+│   │   src/everos/templates/env.template                       │    │
+│   │                       environment variable template        │    │
+│   │                                                            │    │
+│   └────────────────────────────────────────────────────────────┘    │
+│                                                                     │
+│   ┌─ CI/CD (GitHub Actions) ───────────────────────────────────┐    │
+│   │                                                            │    │
+│   │   CI:    .github/workflows/ci.yml    lint / test / integ   │    │
+│   │   Docs:  .github/workflows/docs.yml  Markdown link check   │    │
+│   │   Both invoke Makefile targets; the Makefile is the        │    │
+│   │   single source of truth for commands.                     │    │
+│   │                                                            │    │
+│   └────────────────────────────────────────────────────────────┘    │
+│                                                                     │
+│   ┌─ Collaboration workflow ───────────────────────────────────┐    │
+│   │                                                            │    │
+│   │   Branch model: dev / master (GitFlow Lite)                │    │
+│   │   PR template: .github/PULL_REQUEST_TEMPLATE.md            │    │
+│   │   ISSUE_TEMPLATE: bug / feature / use-case / docs / config │    │
+│   │   CONTRIBUTING.md: contributor onboarding                  │    │
+│   │                                                            │    │
+│   └────────────────────────────────────────────────────────────┘    │
+│                                                                     │
+└─────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## 3. Claude Code engineering layer
+
+### 3.1 Loading mechanism
+
+Claude Code picks up the following files automatically at session start
+(no manual import). Unless noted otherwise, they are loaded into the
+system prompt:
+
+```
+┌────────────────────────┬──────────────────────────────────────────┐
+│  File                   │  Purpose                                 │
+├────────────────────────┼──────────────────────────────────────────┤
+│  CLAUDE.md (repo root)  │  Team-shared context: architecture       │
+│                         │  overview, commands, convention index    │
+│  .claude/rules/*.md     │  Hard coding constraints                 │
+│                         │  (path-scoped on-demand load)            │
+│  .claude/settings.json  │  Permissions allowlist (not in prompt)   │
+│  ~/.claude/CLAUDE.md    │  User-level (personal preferences)       │
+│  CLAUDE.local.md        │  Project-local personal (gitignored)     │
+└────────────────────────┴──────────────────────────────────────────┘
+```
+
+### 3.2 Rules (10 files, path-scoped)
+
+| File | Paths (auto-load condition) |
+|---|---|
+| architecture.md | always loaded (no paths) |
+| code-style.md | always loaded (no paths) |
+| language-policy.md | always loaded (no paths) |
+| imports.md | `src/**/*.py`, `tests/**/*.py` |
+| init-py-and-reexport.md | `src/**/__init__.py`, `src/**/*.py` |
+| module-docstring.md | `src/{infra,memory,service,component,core}/**/*.py` |
+| async-programming.md | `src/**/*.py`, `tests/**/*.py` |
+| datetime-handling.md | `src/**/*.py`, `tests/**/*.py` |
+| logging-observability.md | `src/**/*.py` |
+| testing.md | `tests/**/*.py` |
+
+**Why path-scoped**: avoid loading 1000+ lines of rules every session
+(~5–8K tokens). At startup only architecture + code-style + language-policy
+load (~1.5–2K tokens); the rest load on demand when Claude Code reads a
+matching `.py` file.
+
+### 3.3 Skills (3 slash commands)
+
+| Command | Purpose | When to use |
+|---|---|---|
+| `/commit` | Generate a Conventional Commits message | After a focused change, ready to commit |
+| `/new-branch` | Create branch under dev/master strategy | Starting a new feat / fix / hotfix |
+| `/pr` | Open a GitHub PR with the repo template | Ready to merge |
+
+Skills and rules use **independent loading mechanisms**: rules auto-load
+into the system prompt, whereas skills trigger only when the user types `/<name>`.
+
+### 3.4 settings.json
+
+```json
+{
+  "permissions": {
+    "allow": ["Bash(uv sync*)", "Bash(make*)", "Bash(uv run pytest*)", ...]
+  }
+}
+```
+
+**Purpose**: reduce permission prompts. Team-shared config goes into
+`settings.json` (in git); personal preferences go into `settings.local.json`
+(gitignored).
+
+---
+
+## 4. Code quality gates
+
+```
+        ┌──────────────────────────────────────────────────────┐
+        │     Each stage can independently fail the change      │
+        └──────────────────────────────────────────────────────┘
+
+[Local editor]
+     │
+     ▼
+Stage 1: editor real-time feedback
+     ├ ruff (lint + format) on save
+     └ path-relevant .claude/rules guide Claude Code
+
+     │
+     ▼
+Stage 2: pre-commit (triggered by `git commit`)
+     ├ ruff fix + format
+     ├ trailing-whitespace, end-of-file-fixer
+     ├ check-yaml, check-toml
+     ├ check-added-large-files (≥1MB)
+     ├ detect-private-key
+     └ gitlint  (commit-msg stage; rejects malformed messages)
+
+     │
+     ▼
+Stage 3: local `make ci` (manual, before push)
+     ├ make lint        (ruff check + ruff format --check + import-linter
+     │                   + datetime discipline + OpenAPI drift)
+     ├ make test        (pytest tests/unit)
+     └ make integration (pytest tests/integration)
+
+     │
+     ▼
+Stage 4: CI (GitHub Actions, push + PR triggered)
+     └ re-runs the same `make lint / test / integration` targets
+
+     │
+     ▼
+Stage 5: PR review
+     ├ ≥ 1 approval
+     └ all threads resolved + all CI green
+```
+
+**Key design**: when any stage fails, **never merge** — there is no
+`--no-verify` / `--allow-failure` escape hatch.
+
+---
+
+## 5. Dependencies & build
+
+### 5.1 pyproject.toml overview
+
+```toml
+[project]
+name = "everos"
+requires-python = ">=3.12"
+dependencies = [...]               # runtime deps (minimal set)
+
+[project.optional-dependencies]
+multimodal = [...]                 # extras (install on demand)
+
+[build-system]
+requires = ["hatchling"]
+build-backend = "hatchling.build"
+
+[tool.hatch.build.targets.wheel]
+packages = ["src/everos"]          # src layout
+
+[project.scripts]
+everos = "everos.entrypoints.cli.main:app"  # exposes CLI command
+
+[tool.ruff]                        # code style
+[tool.pytest.ini_options]          # tests
+[tool.coverage.run]                # coverage config (gate lives in `make cov`)
+[tool.importlinter]                # dependency direction
+
+[dependency-groups]
+dev = ["ruff", "pytest", "pytest-asyncio", "pytest-cov",
+       "import-linter", "pre-commit", "ipdb"]
+```
+
+**Single-file principle**: configuration that used to live in `pylintrc`,
+`pytest.ini`, `.isort.cfg` is **all consolidated into `pyproject.toml`**.
+
+### 5.2 Makefile commands
+
+```
+make help          list all targets
+make install       uv sync --frozen
+make format        ruff fix + format
+make lint          ruff + import-linter + datetime discipline + openapi drift
+make test          pytest tests/unit
+make integration   pytest tests/integration
+make cov           pytest unit + integration, coverage gate (fail under 80%)
+make ci            lint + test + integration   ← CI invokes these targets
+make clean         clear caches
+```
+
+**Single source of truth**: CI only invokes `make <target>`, so local and CI
+run identical commands and cannot drift.
+
+### 5.3 env.template (slimmed down)
+
+The template lives at `src/everos/templates/env.template` (bundled
+inside the wheel as package data, copied to `./.env` via `everos init`).
+It groups settings by provider, each block sharing the OpenAI-protocol
+`MODEL` / `API_KEY` / `BASE_URL` triple:
+
+```
+EVEROS_LLM__*           # text model (model / api_key / base_url)
+EVEROS_MULTIMODAL__*    # vision model for image/office inputs
+EVEROS_EMBEDDING__*     # embedding model (vector index)
+EVEROS_RERANK__*        # cross-encoder reranker
+EVEROS_MEMORY__ROOT     # memory-root (md files + .index/{sqlite,lancedb}/)
+EVEROS_LOG_LEVEL        # DEBUG | INFO | WARNING | ERROR
+EVEROS_LOG_FORMAT       # json | text
+TZ                      # display timezone (storage is always UTC)
+```
+
+Every key has a sensible default except the `API_KEY` fields, which you fill in.
+
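The double underscore in keys such as `EVEROS_LLM__MODEL` suggests a nested-settings delimiter. As a rough illustration only (the project's actual settings loader is not shown here, and `nest_env` is a hypothetical helper), such keys could map onto nested sections like this:

```python
from typing import Any, Dict, Mapping


def nest_env(environ: Mapping[str, str], prefix: str = "EVEROS_",
             delimiter: str = "__") -> Dict[str, Any]:
    """Group PREFIX_SECTION__FIELD variables into a nested dict."""
    tree: Dict[str, Any] = {}
    for key, value in environ.items():
        if not key.startswith(prefix):
            continue  # unrelated variable, e.g. PATH
        parts = key[len(prefix):].lower().split(delimiter)
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return tree


settings = nest_env({
    "EVEROS_LLM__MODEL": "example-model",  # placeholder value
    "EVEROS_MEMORY__ROOT": "~/.everos",
    "EVEROS_LOG_LEVEL": "INFO",
    "PATH": "/usr/bin",
})
```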
+---
+
+## 6. CI/CD (GitHub Actions)
+
+### 6.1 Strategy
+
+```
+┌──────────────────────────────────────────────────────────┐
+│                                                          │
+│   GitHub Actions   (.github/workflows/)                  │
+│     ci.yml    push (main/dev/master) + PR                │
+│       ├ make install        (uv sync --frozen)           │
+│       ├ make lint           (ruff + import-linter +      │
+│       │                      datetime + openapi drift)   │
+│       ├ make test           (pytest tests/unit)          │
+│       └ make integration    (pytest tests/integration)   │
+│     docs.yml  Markdown link check + issue-template YAML  │
+│                                                          │
+│   Consistency:                                           │
+│     ├ astral-sh/setup-uv (cache keyed by uv.lock)        │
+│     ├ Makefile is the single source of CI commands       │
+│     └ pre-commit runs locally first to reduce CI churn   │
+│                                                          │
+└──────────────────────────────────────────────────────────┘
+```
+
+### 6.2 CI checklist
+
+| Check | Tool | Failure condition |
+|---|---|---|
+| Lint | `make lint` (ruff check + ruff format --check) | any error |
+| Layer direction | `make lint` (lint-imports inside) | layer violation |
+| Datetime discipline | `make lint` (check_datetime_discipline.py) | bypasses helper module |
+| OpenAPI drift | `make lint` (dump_openapi.py --check) | schema ≠ committed openapi.json |
+| Unit | `make test` (pytest tests/unit) | any failure |
+| Integration | `make integration` (pytest tests/integration) | any failure |
+
+Integration tests run with a `FakeLLMClient` — no live credentials are needed in CI.
+Commit message format is enforced **locally** via `gitlint` in the `commit-msg`
+pre-commit stage; it does not run in CI.
+
+### 6.3 Branch protection
+
+| Branch | Rule |
+|---|---|
+| **master** | branch protection: PR + 1 review + green CI; no direct push |
+| **dev** | same as above |
+| feat / fix / hotfix | free push; rebase onto the parent branch before merging |
+
+---
+
+## 7. Collaboration workflow
+
+### 7.1 Branch model (GitFlow Lite)
+
+```
+                              v0.1                              v0.2                                v1.0
+                                ▲                                 ▲                                   ▲
+                                │ release PR                      │ release PR                        │ release PR
+                                │ (dev→master+tag)                │ (dev→master+tag)                  │ (dev→master+tag)
+master   ●──────────────────────●─────────────●──────────────────●──────────────────────────────────●────►  stable / released
+                                │             ▲                  │                                  │
+                                │             │ merge hotfix     │                                  │
+                                │             │                  │                                  │
+                                │       ●──●──┘                  │                                  │
+                                │       │ hotfix branch          │                                  │
+                                │       │ (cut from master)      │                                  │
+                                │       │                        │                                  │
+                                │       ▼ sync to dev            │                                  │
+                                │       │                        │                                  │
+dev   ●──●──●──●──●──●──●──●──●─●──●──●─●──●──●──●──●──●──●──●──●─●──●──●──●──●──●──●──●──●──●──●──●─────►  integration
+            ▲                   ↑                                ↑                                  ↑
+            │             release point                   release point                       release point
+       feat/A             (dev HEAD →                     (dev HEAD →                         (dev HEAD →
+       ●──●──●             master + v0.1)                  master + v0.2)                      master + v1.0)
+
+
+  feat/*   : cut from dev → PR → merge into dev
+  hotfix/* : cut from master → merge into master + sync into dev (double merge)
+  release  : dev → master + tag on master (no separate release branch)
+
+  Vertical │ in the diagram = "dev HEAD merged into master via release PR + v0.x tag"
+```
+
+Details in [../.claude/skills/new-branch/SKILL.md](../.claude/skills/new-branch/SKILL.md).
+
+### 7.2 PR template
+
+A single PR template at [`.github/PULL_REQUEST_TEMPLATE.md`](../.github/PULL_REQUEST_TEMPLATE.md)
+with five sections: **Summary / Area / Verification / Checklist / Notes for
+Reviewers**. The `/pr` skill fills it in (see
+[../.claude/skills/pr/SKILL.md](../.claude/skills/pr/SKILL.md)).
+
+### 7.3 Commit convention (Conventional Commits)
+
+Format: `<type>[(scope)][!]: <description>` per
+[Conventional Commits](https://www.conventionalcommits.org).
+
+```
+feat:     new feature
+fix:      bug fix
+refactor: restructuring (no behavior change)
+test:     add / update tests
+docs:     documentation
+style:    formatting
+perf:     performance optimization
+chore:    configuration / build / tooling
+build:    build system or dependencies
+ci:       CI configuration
+revert:   revert a previous commit
+```
+
+`gitlint` enforces the format **locally** via its `contrib-title-conventional-commits`
+rule in the commit-msg pre-commit stage. See
+[../.claude/skills/commit/SKILL.md](../.claude/skills/commit/SKILL.md).
+
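For a quick sanity check outside the hook, the title format can be approximated with a regular expression. This is a sketch of the pattern described above, not gitlint's own rule, which may be stricter or looser; `is_conventional` is a hypothetical helper:

```python
import re

TYPES = ("feat", "fix", "refactor", "test", "docs", "style",
         "perf", "chore", "build", "ci", "revert")

# <type>[(scope)][!]: <description>
TITLE_RE = re.compile(
    r"^(?:" + "|".join(TYPES) + r")(?:\([^()\s]+\))?!?: \S.*$"
)


def is_conventional(title: str) -> bool:
    """Return True if the commit title matches the format above."""
    return TITLE_RE.match(title) is not None
```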
+---
+
+## 8. Issue templates / user support
+
+```
+.github/ISSUE_TEMPLATE/
+├── bug_report.yml           structured bug report (form)
+├── feature_request.yml      feature proposal (form)
+├── use_case.yml             share a use case / integration
+├── docs.yml                 documentation issue
+└── config.yml               disable blank issues + community links
+
+CONTRIBUTING.md              contributor onboarding: setup / code style /
+                             branch / commit / PR / testing
+```
+
+---
+
+## 9. Infrastructure summary table
+
+```
+┌─────────────────────┬──────────────────────────────────────┬─────────────┐
+│  Facility            │  Location / file                      │  Failure    │
+│                      │                                       │  impact     │
+├─────────────────────┼──────────────────────────────────────┼─────────────┤
+│  CLAUDE.md           │  /CLAUDE.md                          │  cc loses   │
+│                      │                                      │  context    │
+│  Team rules          │  /.claude/rules/ (10)                │  cc unaware │
+│                      │                                      │  of conv.   │
+│  Team skills         │  /.claude/skills/ (3)                │  no slash   │
+│                      │                                      │  workflows  │
+│  Permissions         │  /.claude/settings.json              │  cc prompts │
+│                      │                                      │  on each op │
+├─────────────────────┼──────────────────────────────────────┼─────────────┤
+│  pyproject           │  /pyproject.toml                     │  build fail │
+│  Lock file           │  /uv.lock                            │  dep drift  │
+│  Makefile            │  /Makefile                           │  no unified │
+│                      │                                      │  entry      │
+│  pre-commit          │  /.pre-commit-config.yaml            │  no local   │
+│                      │                                      │  gate       │
+│  env template        │  /src/everos/templates/env.template  │  newcomers  │
+│                      │                                      │  lost on env│
+├─────────────────────┼──────────────────────────────────────┼─────────────┤
+│  CI                  │  /.github/workflows/ci.yml           │  PR cannot  │
+│                      │                                      │  merge      │
+│  Docs CI             │  /.github/workflows/docs.yml         │  broken     │
+│                      │                                      │  doc links  │
+│  PR template         │  /.github/PULL_REQUEST_TEMPLATE.md   │  no PR temp │
+│  Issue templates     │  /.github/ISSUE_TEMPLATE/ (5)        │  scattered  │
+│  CONTRIBUTING        │  /CONTRIBUTING.md                    │  contrib.   │
+│                      │                                      │  confused   │
+└─────────────────────┴──────────────────────────────────────┴─────────────┘
+```
+
+---
+
+## 10. Future extensions
+
+```
+Near-term
+  □ /new-module    skill: scaffold a subpackage that complies with rules
+  □ ruff rule sets: add D (docstring), ANN (annotations)
+  □ Static type checking (pyright or mypy) once hot paths stabilize
+
+Mid-term
+  □ release-please / Conventional Commits → automated changelog
+  □ Automated PyPI wheel upload on tag
+  □ Multi-Python version matrix (3.12 / 3.13)
+  □ Performance benchmark CI with historical comparison
+
+Long-term
+  □ Mutation testing (mutmut)
+  □ Coverage ratchet (raise the 80% gate as the suite matures)
+```
+
+---
+
+## 11. On investing in engineering infrastructure
+
+```
+┌──────────────────────────────────────────────────────────┐
+│                                                          │
+│   Plain business code ≠ an engineering project            │
+│                                                          │
+│   Engineering project = business code +                   │
+│                         coding rules +                    │
+│                         quality gates (pre-commit + CI) + │
+│                         automation (Makefile + skills) +  │
+│                         collaboration (branch + PR) +     │
+│                         knowledge base (CLAUDE.md +       │
+│                                         rules + docs)     │
+│                                                          │
+│   The earlier this infrastructure lands, the faster and   │
+│   farther the team can run.                               │
+│                                                          │
+└──────────────────────────────────────────────────────────┘
+```
+
+Old project vs. new project after this rewrite:
+
+| Dimension | Old project | New project |
+|---|---|---|
+| Lint tools | black + isort + pylint | ruff (single tool) |
+| Config files | pyproject + pylintrc + pyrightconfig + pytest.ini | unified pyproject.toml |
+| pre-commit | basic | adds gitlint commit-msg + yaml / toml / large-file / private-key checks |
+| Layer direction | not enforced | import-linter enforced in CI |
+| Commit format | freeform | gitlint pre-commit hook (Conventional Commits) |
+| Claude Code integration | partial rules | rules + skills + settings (full) |
+| CI platform | ad hoc | GitHub Actions calling Makefile targets |
+| Tests | basic | unit + integration + e2e + coverage report |
+
+None of this is perfectionism; these are baseline requirements for
+**multi-person collaboration, long-term maintenance, and sustainable
+evolution**.
+
+---
+
+## 12. References
+
+- Hard coding rules: [../.claude/rules/](../.claude/rules/) (auto-loaded by Claude Code)
+- Slash command workflows: [../.claude/skills/](../.claude/skills/)
+- Contributor onboarding: [../CONTRIBUTING.md](../CONTRIBUTING.md)
+- Architecture: [architecture.md](architecture.md)
+- Claude Code memory mechanism: [code.claude.com/docs/en/memory.md](https://code.claude.com/docs/en/memory.md)
+- Claude Code skills: [code.claude.com/docs/en/skills.md](https://code.claude.com/docs/en/skills.md)
+- ruff: [docs.astral.sh/ruff](https://docs.astral.sh/ruff/)
+- import-linter: [import-linter.readthedocs.io](https://import-linter.readthedocs.io/)
+- gitlint: [jorisroovers.com/gitlint](https://jorisroovers.com/gitlint/)
+- uv: [docs.astral.sh/uv](https://docs.astral.sh/uv/)
+- pre-commit: [pre-commit.com](https://pre-commit.com/)
+- Conventional Commits: [conventionalcommits.org](https://www.conventionalcommits.org/)
+- GitHub Actions: [docs.github.com/en/actions](https://docs.github.com/en/actions)
--- a/docs/how-memory-works.md
+++ b/docs/how-memory-works.md
@ -0,0 +1,294 @@
+# How Memory Works
+
+How EverOS turns a stream of messages into durable, searchable memory —
+the storage stack, the path layout on disk, the write→index→read
+pipeline, and the consistency guarantees.
+
+This is the narrative companion to the reference docs: see
+[storage_layout.md](storage_layout.md) for the exact file encoding,
+[architecture.md](architecture.md) for the layer boundaries, and
+[api.md](api.md) for the HTTP contract.
+
+## Table of contents
+
+- [The storage stack](#the-storage-stack)
+- [Storage paths](#storage-paths)
+- [How a memory is born](#how-a-memory-is-born)
+- [Memory types & storage strategies](#memory-types--storage-strategies)
+- [The cascade daemon](#the-cascade-daemon)
+- [The Offline Memory Engine (OME)](#the-offline-memory-engine-ome)
+- [Consistency model](#consistency-model)
+- [Zero external services](#zero-external-services)
+- [Operating it](#operating-it)
+
+## The storage stack
+
+Three embedded pieces, each owning what it is best at. Markdown is the
+**source of truth**; the other two are **derived and rebuildable**.
+
+| Layer | Backed by | Holds | Rebuildable? |
+|---|---|---|---|
+| **Markdown + YAML frontmatter** | plain `.md` files | the memory content itself — the only portable, human-editable asset | — (it *is* the truth) |
+| **SQLite** (`aiosqlite`) | `.index/sqlite/*.db` | system state, audit log, the cascade queue, the boundary buffer, OME engine state | ✅ from markdown |
+| **LanceDB** (Arrow) | `.index/lancedb/*.lance` | vector + BM25 + scalar columns for retrieval | ✅ from markdown |
+
+!!! note "The one rule that follows from this"
+    Delete the entire `.index/` directory and **no memory is lost** — it
+    rebuilds from the `.md` tree. There is no separate "export"; the
+    markdown *is* the export. (How to trigger a rebuild:
+    [Operating it](#operating-it).)
+
+## Storage paths
+
+The default memory root is **`~/.everos/`** (override with
+`EVEROS_MEMORY__ROOT` or `[memory] root` in TOML). Configuration (the
+`.env` file) is separate from data (the memory root): the server searches
+`./.env` → `$XDG_CONFIG_HOME/everos/.env` → `~/.everos/.env`.
+
+Memory is partitioned by **`<app_id>/<project_id>`** *before* the
+user-visible directories, so different `(app, project)` spaces never share
+a directory and never mix in search results. The reserved id `"default"`
+materialises as `default_app` / `default_project` on disk, so a default
+space stays visually distinct from a user-named one.
+
+```
+~/.everos/                                  ← memory root (EVEROS_MEMORY__ROOT)
+├── default_app/                            ← <app_id>   ("default" → default_app)
+│   └── default_project/                    ← <project_id> ("default" → default_project)
+│       ├── users/                          ← user-visible (source of truth)
+│       │   └── <user_id>/
+│       │       ├── user.md                 single-file   (profile)
+│       │       ├── episodes/
+│       │       │   └── episode-<YYYY-MM-DD>.md      daily-log append
+│       │       ├── .atomic_facts/                   daily-log (hidden)
+│       │       │   └── atomic_fact-<YYYY-MM-DD>.md
+│       │       └── .foresights/                     daily-log (hidden)
+│       │           └── foresight-<YYYY-MM-DD>.md
+│       ├── agents/
+│       │   └── <agent_id>/
+│       │       ├── .cases/                          daily-log (hidden)
+│       │       │   └── agent_case-<YYYY-MM-DD>.md
+│       │       └── skills/                          skill-named dir
+│       │           └── skill_<name>/SKILL.md        (+ references/ scripts/)
+│       └── knowledge/                      ← shared / global (reserved)
+│
+├── .index/                                 ← system-managed, rebuildable (gitignore)
+│   ├── sqlite/
+│   │   ├── system.db                       state / audit / cascade queue (md_change_state) / buffer / LSN
+│   │   ├── ome.db                          Offline Memory Engine state
+│   │   ├── ome.aps.db                      APScheduler jobstore (split to avoid lock contention)
+│   │   └── ome.db.lock                     OME single-engine guard (portalocker)
+│   └── lancedb/
+│       └── <kind>.lance/                   one Arrow table per kind
+│
+├── ome.toml                                ← user-editable OME strategy overrides (hot-reloaded)
+└── .tmp/                                   atomic-write staging
+```
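+
+The partitioning rule in the tree above can be sketched as a small path
+builder. The function names here are hypothetical; the real logic lives on
+`MemoryRoot`, whose property names are not shown in this document.
+
+```python
+from pathlib import Path
+
+RESERVED_DEFAULT = "default"
+
+
+def scope_segment(raw_id: str, kind: str) -> str:
+    """Map an id to its directory name; kind is "app" or "project"."""
+    # The reserved id "default" becomes default_app / default_project on disk.
+    return f"default_{kind}" if raw_id == RESERVED_DEFAULT else raw_id
+
+
+def episode_path(root: Path, app_id: str, project_id: str,
+                 user_id: str, day: str) -> Path:
+    """Build the daily episode file path for one user and one day."""
+    scope = root / scope_segment(app_id, "app") / scope_segment(project_id, "project")
+    return scope / "users" / user_id / "episodes" / f"episode-{day}.md"
+```
+
+Because the scope is the first path component, two `(app, project)` pairs
+can never resolve to the same file even when the `user_id` matches.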
+
+!!! warning "Differences from older PRD-era docs"
+    The index dir is **`.index/`** (dot-prefixed), not `_index/`. The
+    cascade queue and LSN/audit state live in **SQLite** (`system.db`,
+    table `md_change_state`) — there is no `.cascade.log` / `.manifest.json`
+    file in the current implementation. The `<app>/<project>` nesting is
+    real and always present (`default_app/default_project` for the default
+    scope). There is **no `everos reindex` command** (see
+    [Operating it](#operating-it)).
+
+The path manager is
+[`MemoryRoot`](../src/everos/core/persistence/memory_root.py); every path
+above is a property on it. `MemoryRoot.ensure()` creates the runtime dirs
+(`.index/{sqlite,lancedb}/`, `.tmp/`) and copies the OME template to
+`ome.toml`; user-visible dirs appear on first write.
+
+## How a memory is born
+
+A message does not become memory immediately — it accumulates, a boundary
+is detected, an LLM extracts a cell, writers persist markdown, and the
+index catches up asynchronously.
+
+```
+ POST /add  ──▶  unprocessed_buffer (SQLite)        ← messages accumulate per (session, app, project)
+                      │
+                      ├─ boundary detector trips  ─┐
+ POST /flush ─────────┤  (or you force it)          │  one LLM call
+                      │                             ▼
+                      │                       extract MemCell  ──▶  memcell row (SQLite)
+                      │                             │
+                      │              ┌──────────────┴───────────────┐
+                      │              ▼                              ▼
+                      │   UserMemoryPipeline (sync)      AgentMemoryPipeline (fire-and-forget)
+                      │   writes episode .md NOW          emits AgentPipelineStarted
+                      ▼              │                              │
+   (response returns once md is on disk)                           │
+                                     ▼                              ▼
+                          ┌───────────────────  Offline Memory Engine (OME)  ───────────────────┐
+                          │  async strategies write derived .md:                                 │
+                          │  atomic_facts · foresight · user profile · agent cases · agent skills │
+                          └───────────────────────────────┬──────────────────────────────────────┘
+                                                           ▼
+                                            cascade daemon watches the .md tree
+                                                           ▼
+                                       md_change_state queue (SQLite, durable)
+                                                           ▼
+                                            rebuild LanceDB rows  ──▶  searchable
+```
+
+- **`/add`** appends messages to a per-`(session_id, app_id, project_id)`
+  buffer and returns `accumulated` (or `extracted` if the boundary tripped
+  on this call). See [api.md](api.md).
+- **`/flush`** forces the boundary now (one extraction LLM call), used at
+  the end of a chat/agent run.
+- Episode markdown is written **synchronously** — when `/flush` returns
+  `extracted`, the episode file is already on disk.
+- Everything else (atomic facts, foresight, profile, agent cases/skills)
+  is produced **asynchronously** by the OME — see
+  [the OME section](#the-offline-memory-engine-ome).
+- The **cascade daemon** turns every `.md` write into LanceDB rows so the
+  content becomes searchable.
+
+## Memory types & storage strategies
+
+Six business memory kinds today, each user- or agent-owned, each picking
+one of three on-disk patterns:
+
+| Kind | Owner | Dir / file | Strategy | Produced by |
+|---|---|---|---|---|
+| **episode** | user | `episodes/episode-<date>.md` | daily-log | extraction (sync) |
+| **atomic_fact** | user | `.atomic_facts/atomic_fact-<date>.md` (hidden) | daily-log | OME |
+| **foresight** | user | `.foresights/foresight-<date>.md` (hidden) | daily-log | OME |
+| **profile** | user | `user.md` | single-file rewrite | OME |
+| **agent_case** | agent | `.cases/agent_case-<date>.md` (hidden) | daily-log | OME |
+| **agent_skill** | agent | `skills/skill_<name>/SKILL.md` | skill-named dir | OME (clustering) |
+
+The three strategies:
+
+| Strategy | Shape | Why |
+|---|---|---|
+| **Daily-log append** | `<prefix>-<YYYY-MM-DD>.md`, one entry appended per memory | collapses thousands of per-entry files into one file per day |
+| **Single-file rewrite** | a fixed filename overwritten in place | for a single evolving document (a user/agent profile) |
+| **Skill-named dir** | one directory per skill | a skill is a richer unit (body + optional `references/` `scripts/`) |
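+
+As a rough illustration of the daily-log append pattern, the sketch below
+appends one entry, bracketed by the HTML-comment entry markers described in
+[storage_layout.md](storage_layout.md), to the file for a given day.
+`append_daily_entry` is a hypothetical name; the shipped writers also
+maintain the frontmatter and use atomic-write staging, both of which this
+sketch deliberately omits.
+
+```python
+from datetime import date
+from pathlib import Path
+
+
+def append_daily_entry(directory: Path, prefix: str, day: date,
+                       entry_id: str, body: str) -> Path:
+    """Append one marker-bracketed entry to <prefix>-<YYYY-MM-DD>.md."""
+    directory.mkdir(parents=True, exist_ok=True)
+    path = directory / f"{prefix}-{day.isoformat()}.md"
+    block = (
+        f"<!-- entry:{entry_id} -->\n"
+        f"{body.rstrip()}\n"
+        f"<!-- /entry:{entry_id} -->\n"
+    )
+    # Append mode: earlier entries in the same day's file are left untouched.
+    with path.open("a", encoding="utf-8") as handle:
+        handle.write(block)
+    return path
+```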
+
+!!! note
+    The single-file writer also supports `agent.md` / `soul.md` /
+    `tools.md` / `behaviors.md`, but no shipped OME strategy produces those
+    yet — today only `user.md` is written. Detailed frontmatter and
+    entry-id encoding live in [storage_layout.md](storage_layout.md).
+
+## The cascade daemon
+
+The cascade subsystem keeps LanceDB in sync with the markdown tree. It runs
+**in-process** with the server (a coroutine started by the app lifespan),
+not as a separate OS daemon.
+
+1. A native filesystem watcher (`watchdog`: FSEvents on macOS, inotify on
+   Linux) sees a `.md` create/modify.
+2. The change is enqueued in the **`md_change_state`** table (SQLite) —
+   durable, so a crash mid-sync replays on restart.
+3. A worker drains the queue at **entry-level** granularity: it diffs the
+   file, re-embeds only changed entries (keyed by `content_sha256`), and
+   upserts the LanceDB rows.
+
+Because markdown is the source of truth, **editing a file directly is
+fully supported** — open an episode in VSCode / Obsidian / Vim, change an
+entry, save, and the daemon re-indexes just that entry. Operate the queue
+with `everos cascade` ([Operating it](#operating-it)); deeper runbook in
+[cascade_runbook.md](cascade_runbook.md).
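+
+Step 3 (entry-level diffing) can be sketched as follows. This is a minimal
+illustration, not the worker's code: it assumes the hash covers exactly the
+text between an entry's open and close markers, and the function names are
+hypothetical.
+
+```python
+import hashlib
+import re
+
+# Matches one <!-- entry:ID --> ... <!-- /entry:ID --> block.
+ENTRY_RE = re.compile(
+    r"<!-- entry:(?P<id>\S+) -->\n(?P<body>.*?)\n<!-- /entry:(?P=id) -->",
+    re.DOTALL,
+)
+
+
+def entry_hashes(markdown: str) -> dict[str, str]:
+    """Map each entry id in a daily-log file to the SHA-256 of its body."""
+    return {
+        m["id"]: hashlib.sha256(m["body"].encode("utf-8")).hexdigest()
+        for m in ENTRY_RE.finditer(markdown)
+    }
+
+
+def changed_entries(old: str, new: str) -> set[str]:
+    """Ids that are new or whose content hash differs from the old file."""
+    before, after = entry_hashes(old), entry_hashes(new)
+    return {eid for eid, digest in after.items() if before.get(eid) != digest}
+```
+
+Only the ids returned by `changed_entries` would need re-embedding, which
+is why a one-entry edit in a large daily file stays cheap.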
+
+## The Offline Memory Engine (OME)
+
+Most memory kinds are **not** extracted on the request path — they are
+derived later by the OME, an in-process async strategy engine. When
+extraction carves a MemCell, it emits an event; OME strategies pick it up
+and write their markdown when ready:
+
+- `extract_atomic_facts` — single-sentence facts from an episode
+- `extract_foresight` — anticipatory notes
+- `extract_user_profile` — the aggregated `user.md`
+- `extract_agent_case` — a reusable agent trajectory (only when the cell is
+  substantive enough; thin trajectories are skipped by design)
+- `extract_agent_skill` — clusters related cases into a named skill
+
+Strategies are configurable without a code change via **`ome.toml`** at the
+memory root (hot-reloaded within ~2 s). Example — turn two off:
+
+```toml
+[strategies.extract_foresight]
+enabled = false
+
+[strategies.extract_user_profile]
+enabled = false
+```
+
+OME keeps its own state in `.index/sqlite/ome.db` (run records, counters)
+and its scheduler jobstore in `.index/sqlite/ome.aps.db` (split so the sync
+APScheduler writer and the async OME writer never contend for one file
+lock).
+
+!!! tip "Implication for clients"
+    After `/flush` returns `extracted`, the **episode** is queryable soon
+    (once cascade indexes it), but **atomic facts / profile / agent cases**
+    appear only after their OME strategy runs — typically seconds later.
+    Poll / retry if you need them immediately.
+
+## Consistency model
+
+Two paths, two guarantees:
+
+| Path | Guarantee | Detail |
+|---|---|---|
+| **Write** (`/add`, `/flush`) | **strong** | the episode `.md` is on disk before the call returns `extracted`; never blocks on LanceDB |
+| **Read** (`/search`, `/get`) | **eventual** | reads LanceDB, which lags md by the cascade processing time — sub-second typically, up to ~10–15 s under load |
+
+So a `/search` immediately after the `/flush` that produced a record may
+miss it. The markdown is durable regardless; index lag never loses data. If
+you need read-your-write, retry with backoff, or force the queue with
+`everos cascade sync`.
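+
+A client-side retry with backoff might look like the sketch below. It is
+deliberately transport-agnostic: `retry_until` is a hypothetical helper,
+and `fetch` stands in for whatever call your client uses to issue
+`/search`, so no HTTP client API is assumed.
+
+```python
+import time
+from typing import Callable, TypeVar
+
+T = TypeVar("T")
+
+
+def retry_until(fetch: Callable[[], T], accept: Callable[[T], bool],
+                attempts: int = 6, first_delay: float = 0.5,
+                sleep: Callable[[float], None] = time.sleep) -> T:
+    """Call fetch() until accept(result) is true, doubling the delay each time."""
+    delay = first_delay
+    result = fetch()
+    for _ in range(attempts - 1):
+        if accept(result):
+            break
+        sleep(delay)
+        delay *= 2
+        result = fetch()
+    # Returns the last result even if it was never accepted.
+    return result
+```
+
+With these defaults the helper sleeps 0.5 + 1 + 2 + 4 + 8 = 15.5 s in total
+before giving up, which covers the ~10–15 s worst-case lag quoted above.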
+
+Integrity is anchored by a few invariants (details in
+[storage_layout.md](storage_layout.md)): the frontmatter `id` /
+`entry_id` is the immutable join key; `content_sha256` decides whether an
+entry needs re-embedding; an LSN watermark (in `system.db`) orders
+rebuilds; the durable `md_change_state` queue is the replayable audit
+trail.
+
+## Zero external services
+
+No database server, message broker, or vector service to run. Vector ANN,
+full-text BM25, and scalar filtering all execute inside the **embedded
+LanceDB** engine in one query; SQLite is a local file. The whole stack is a
+single directory that you can copy or back up, and whose user-visible parts
+you can check into git.
+
+!!! note
+    There is no automatic "grep over markdown" search fallback today — if
+    the LanceDB index is unavailable, rebuild it from markdown (it is
+    derived and disposable) rather than relying on a degraded search path.
+
+## Operating it
+
+The CLI ([cli.md](cli.md)) is intentionally small:
+
+| Command | What it does |
+|---|---|
+| `everos init` | write a starter `.env` |
+| `everos server start` | run the HTTP API (cascade + OME start with it) |
+| `everos cascade status` | queue / LSN summary |
+| `everos cascade sync` | drain the cascade queue now (force md → LanceDB) |
+| `everos cascade fix` | list failed rows / re-enqueue retryable ones |
+
+!!! warning "There is no `everos reindex` or `everos flush`"
+    - **Reindex** = the index is rebuildable: stop the server,
+      `rm -rf <memory-root>/.index/lancedb`, restart — the cascade
+      rebuilds from markdown. For an incremental catch-up, use
+      `everos cascade sync`.
+    - **Flush** is an HTTP endpoint (`POST /api/v1/memory/flush`), not a
+      CLI command — it forces *extraction* of the session buffer, which is
+      a different thing from forcing *index sync* (`cascade sync`).
+
+## References
+
+- [storage_layout.md](storage_layout.md) — exact file encoding, frontmatter
+  chassis, entry-id format, atomic-write semantics
+- [architecture.md](architecture.md) — DDD layers and dependency rules
+- [api.md](api.md) — the HTTP contract (`/add` `/flush` `/search` `/get`)
+- [cascade_runbook.md](cascade_runbook.md) — operating the sync queue
--- a/docs/index.md
+++ b/docs/index.md
@ -0,0 +1,63 @@
+# EverOS Documentation
+
+Documentation for [EverOS](../README.md), an md-first memory extraction
+framework. The docs are organised by [Diátaxis](https://diataxis.fr/): the
+kind of question you have determines which section to read.
+
+## Reference
+
+Technical reference: contracts, commands, schemas — read these when you
+already know what you want to do and need to know exactly how.
+
+| Doc | Purpose |
+|---|---|
+| [api.md](api.md) | HTTP API v1 reference — endpoints, request / response, error contracts |
+| [cli.md](cli.md) | `everos` CLI subcommands + env var conventions |
+| [storage_layout.md](storage_layout.md) | Memory-root tree + frontmatter chassis + EntryId encoding |
+| [prompt_slots.md](prompt_slots.md) | YamlConfigLoader + three-layer prompt override |
+
+## Explanation
+
+Design decisions and architectural concepts — read these to understand
+why the system is shaped the way it is.
+
+| Doc | Purpose |
+|---|---|
+| [overview.md](overview.md) | Project vision, scope, design philosophy |
+| [how-memory-works.md](how-memory-works.md) | Storage stack + on-disk paths + write→index→read pipeline + consistency |
+| [architecture.md](architecture.md) | DDD layered architecture + dependency rules |
+| [datetime.md](datetime.md) | Two-zone discipline — UTC at storage, display tz at boundaries |
+
+## How-to
+
+Task-driven operational guides — read these when you need to do a
+specific thing (drain a queue, recover from a stuck row, etc.).
+
+| Doc | Purpose |
+|---|---|
+| [cascade_runbook.md](cascade_runbook.md) | Cascade subsystem ops — drain queue, recover stuck rows |
+
+## Engineering / Internal
+
+For maintainers and contributors working on the framework itself,
+not for using it.
+
+| Doc | Purpose |
+|---|---|
+| [engineering.md](engineering.md) | Engineering & dev-efficiency infrastructure (CI / tooling / Claude Code) |
+
+## See also
+
+Top-level project files live at the repo root:
+
+- [README.md](../README.md) — quick start & feature overview
+- [QUICKSTART.md](../QUICKSTART.md) — 5-minute walkthrough (install → service → search)
+- [CONTRIBUTING.md](../CONTRIBUTING.md) — how to contribute (issue-only model)
+- [CHANGELOG.md](../CHANGELOG.md) — release notes
+- [SECURITY.md](../SECURITY.md) — security policy & private vulnerability reporting
+- [CITATION.md](../CITATION.md) — academic citation info
+- [ACKNOWLEDGMENTS.md](../ACKNOWLEDGMENTS.md) — third-party acknowledgments
+
+Coding conventions and slash command workflows are auto-loaded by
+Claude Code from [.claude/rules/](../.claude/rules/) and
+[.claude/skills/](../.claude/skills/).
--- a/docs/locomo_benchmark.md
+++ b/docs/locomo_benchmark.md
@ -0,0 +1,126 @@
+# Running the LoCoMo Benchmark
+
+This guide walks through reproducing EverOS's LoCoMo retrieval scores
+locally using the `hybrid` and `agentic` search methods.
+
+## Contents
+
+- [Prerequisites](#prerequisites)
+- [1. Prepare the dataset](#1-prepare-the-dataset)
+- [2. Start the server](#2-start-the-server)
+- [3. Run `hybrid`](#3-run-hybrid)
+- [4. Run `agentic`](#4-run-agentic)
+- [5. Where the results land](#5-where-the-results-land)
+- [Notes](#notes)
+
+---
+
+## Prerequisites
+
+- Python **3.12**, [uv](https://docs.astral.sh/uv/)
+- A `.env` at the repo root with the LLM / embedding credentials EverOS
+  needs:
+  - `EVEROS_LLM__MODEL`, `EVEROS_LLM__API_KEY`, `EVEROS_LLM__BASE_URL`
+  - `EVEROS_EMBEDDING__*`
+  - `EVEROS_RERANK__*`
+  - The benchmark driver also reads `LLM_API_KEY` / `ANSWER_MODEL` /
+    `JUDGE_MODEL` for the answer + judge passes.
+
+Install the project:
+
+```bash
+uv sync
+```
+
+## 1. Prepare the dataset
+
+Place the LoCoMo file at `data/locomo10.json` (the dataset is
+distributed by the LoCoMo authors, not this repo). Override the path
+later with `--data-path` if you keep it elsewhere.
+
+## 2. Start the server
+
+```bash
+EVEROS_MEMORY__ROOT=~/.everos \
+uv run python -m everos.entrypoints.cli.main server start --port 8000
+```
+
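+`EVEROS_MEMORY__ROOT` isolates one benchmark's corpus from another:
+point it at a fresh directory (or `rm -rf` the directory it names) whenever
+you want a clean run. The command above uses the default root `~/.everos`,
+so deleting that directory also removes any non-benchmark memory stored
+there.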
+
+Leave the server running in one terminal; run the benchmark from
+another.
+
+## 3. Run `hybrid`
+
+Single conversation:
+
+```bash
+bash tests/run_locomo_batch.sh \
+  --conv-indices 0 \
+  --methods hybrid \
+  --base-url http://localhost:8000 \
+  --top-k 10
+```
+
+All 10 conversations, 2-way parallel:
+
+```bash
+bash tests/run_locomo_batch.sh \
+  --conv-indices 0-9 \
+  --methods hybrid \
+  --base-url http://localhost:8000 \
+  --top-k 10 \
+  --concurrency 2
+```
+
+The wrapper picks up `EVEROS_MEMORY__ROOT` from the environment so the
+cascade poll path matches the server's data root. If the wrapper and the
+server run with different values, pass `--corpus-path` explicitly.
+
+## 4. Run `agentic`
+
+Same wrapper, swap `--methods`:
+
+```bash
+bash tests/run_locomo_batch.sh \
+  --conv-indices 0-9 \
+  --methods agentic \
+  --base-url http://localhost:8000 \
+  --top-k 10 \
+  --concurrency 2
+```
+
+You can also benchmark multiple methods in one go — they share the
+same ingested corpus:
+
+```bash
+bash tests/run_locomo_batch.sh \
+  --conv-indices 0-9 \
+  --methods hybrid,agentic \
+  --base-url http://localhost:8000 \
+  --top-k 10 \
+  --concurrency 2
+```
+
+## 5. Where the results land
+
+Default output root is `benchmark_results/run_<timestamp>/`. Override
+with `--output-root`:
+
+```
+<output_root>/
+├── conv0.json … conv9.json          # per-conv summary + per-question details
+├── conv0.log  … conv9.log           # per-conv stdout (only in --concurrency >1 mode)
+└── conv0_checkpoints/ …             # incremental search/answer/eval JSON
+```
+
+An aggregate accuracy table prints at the end of the wrapper run.
+
+## Notes
+
+- **Re-running on the same corpus**: add `--skip-add` to skip ingest and
+  reuse what's already in `~/.everos`. Useful when comparing methods
+  side by side.
+- **Judge variance**: `--judge-runs 3` runs the judge three times per
+  question and majority-votes; slower but reduces LLM-judge noise.
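+
+For intuition, majority voting over repeated judge runs can be sketched as
+below. How the benchmark driver actually aggregates verdicts is not shown
+here; the sketch assumes binary correct / incorrect verdicts, and
+`majority_vote` is a hypothetical name.
+
+```python
+from collections import Counter
+
+
+def majority_vote(verdicts: list[bool]) -> bool:
+    """Return True when more than half of the judge runs said "correct"."""
+    if not verdicts:
+        raise ValueError("at least one judge verdict is required")
+    counts = Counter(verdicts)
+    # Assumption: a tie (possible only with an even run count) counts as incorrect.
+    return counts[True] > counts[False]
+```
+
+An odd run count such as 3 avoids ties altogether.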
--- a/docs/openapi.json
+++ b/docs/openapi.json
--- a/docs/overview.md
+++ b/docs/overview.md
@ -0,0 +1,85 @@
+# EverOS — Project Overview
+
+## Vision
+
+Build an open-source Python memory framework where **AI agents' long-term memory is plain Markdown files on the user's disk**, not opaque rows in a hosted database.
+
+## Scope
+
+**In scope (v1)**:
+
+- Local deployment for personal agents or small teams
+- Conversation, workflow, agent-trace, file-knowledge → structured memory
+- Hybrid retrieval (BM25 + vector + scalar filter)
+- Cascade index sync (md edit → LanceDB sub-second)
+- Dual-track memory (user-track / agent-track)
+- Offline memory evolution (Foresight / AtomicFact / Profile / Skill)
+- CLI + HTTP API
+
+**Out of scope (v1, future v2)**:
+
+- Multi-tenant / group / community deployment (10K+ users)
+- Edge-to-cloud sync (planned for v2)
+- Distributed deployment / sharding
+
+## Design philosophy
+
+### 1. Markdown as Source of Truth
+
+```
+delete all LanceDB / SQLite files → can rebuild from md
+delete any md file               → memory is gone
+```
+
+User trust comes from physical visibility — the user can `cat` / `vim` / `grep` their own memory at any time.
+
+### 2. Three-piece storage with clear job boundaries
+
+| Component | Role | Does NOT do |
+|---|---|---|
+| Markdown files | Truth source — entries, frontmatter | Search (grep is degraded fallback only) |
+| SQLite | Queue, cascade audit log, sensitive data isolation | Vector / full-text |
+| LanceDB | Vector ANN + BM25 + scalar filter, single-query hybrid | Be the source of truth (loss = rebuild from md) |
+
+### 3. Algorithm-orchestration separation
+
+[`everalgo`](https://github.com/EverMind-AI/EverAlgo) (a separate library, published as the `everalgo-*` PyPI packages) holds the extraction algorithms (MemCell extraction, Episode generation, Profile evolution). EverOS calls everalgo via the PromptSlot interface; everalgo knows nothing about storage.
+
+This boundary lets the same algorithm power both this open-source lightweight version and other product forms.
+
+### 4. DDD layered architecture
+
+```
+entrypoints  →  service  →  memory  →  infra
+                              ↓
+                        component / core / config
+```
+
+Strict single-direction dependency, enforced by `import-linter` in CI.
+
+## Why src layout (`src/everos/`)
+
+- Standard PyPA project structure used when shipping to PyPI
+- Avoid namespace collision with system packages named `memory`, `infra`, etc.
+- Avoid accidental import of working-tree code in dev (PyPA recommendation)
+
+## Comparable projects (where EverOS differs)
+
+| Project | Position | Difference |
+|---|---|---|
+| [mem0](https://github.com/mem0ai/mem0) | API-first memory service | mem0 stores in vector DB; we store in md files |
+| [Letta](https://github.com/letta-ai/letta) | Agent OS w/ Core/Recall/Archival | Letta uses Postgres; we use markdown filesystem |
+| [MemOS](https://github.com/MemTensor/MemOS) | Multi-classification memory | MemOS targets enterprise; we target lightweight (single-user / small team) |
+| [memsearch](https://github.com/zilliztech/memsearch) | md-first search engine | Closest to us; we add memory extraction (not just search) |
+
+## Roadmap
+
+- **v0.1 (MVP)** — Phase 1 core loop: markdown + lancedb + cascade + episode extraction
+- **v0.2** — Full extraction pipeline (workspace / agent / knowledge), evolution framework
+- **v0.3** — Production hardening, full CLI, HTTP API, Obsidian demo
+- **v1.0** — Stable API, PyPI release, comprehensive docs
+- **v2** (future) — Edge-to-cloud sync via EverMe (separate project)
+
+## Status
+
+**Alpha — v0.1.0 in active development**. Core API may change before v1.0.
--- a/docs/prompt_slots.md
+++ b/docs/prompt_slots.md
@ -0,0 +1,111 @@
+# PromptSlot
+
+PromptSlot is the layer between the algorithm code (`everalgo`) and
+the prompts it sends to LLMs. Algorithm code receives a `PromptSlot`
+parameter; the *project* (EverOS) supplies defaults and lets operators
+override.
+
+> **Status (2026-05-07)**: the YAML loader is implemented; the higher-
+> level `PromptSlot` model + sandbox dry-run + three-layer overlay
+> resolution arrive when the memory layer ships (see Stage 2).
+
+## Three-layer overlay
+
+```
+config/prompt_slots/<name>.yaml          (Layer 1: defaults shipped with the package)
+       ↓
+~/.everos/prompt_slots/<name>.yaml       (Layer 2: app-level override; per-deployment)
+       ↓
+runtime override                         (Layer 3: per-call override; e.g. "force model X")
+```
+
+Precedence: layer 3 overrides layer 2, which overrides layer 1. Layer 1 is
+loaded eagerly at startup; layer 2 is loaded lazily on first reference;
+layer 3 is supplied at the call site.
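+
+Since the overlay resolution is not implemented yet (see the status note
+above), the following is only a sketch of one plausible reading of the
+three layers. `resolve_slot` is a hypothetical name, and the recursive
+merge of nested mappings is an assumption; the shipped resolver may instead
+replace whole files or whole top-level keys.
+
+```python
+from __future__ import annotations
+
+from collections.abc import Mapping
+from typing import Any
+
+
+def resolve_slot(defaults: Mapping[str, Any],
+                 deployment: Mapping[str, Any] | None = None,
+                 runtime: Mapping[str, Any] | None = None) -> dict[str, Any]:
+    """Overlay layer 2 and layer 3 on layer 1; later layers win."""
+    merged: dict[str, Any] = {}
+    for layer in (defaults, deployment or {}, runtime or {}):
+        for key, value in layer.items():
+            if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
+                # Assumption: nested mappings (e.g. `llm`) merge key by key.
+                merged[key] = resolve_slot(merged[key], value)
+            else:
+                merged[key] = value
+    return merged
+```
+
+Under this reading, a deployment file that sets only `llm.temperature`
+keeps the packaged `template` and `llm.model` unchanged.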
+
+## Loader
+
+The category loader lives at
+[`src/everos/component/config/loader.py`](../src/everos/component/config/loader.py)
+as `YamlConfigLoader`:
+
+```python
+from pathlib import Path
+from everos.component.config import YamlConfigLoader
+
+loader = YamlConfigLoader(
+    root=Path("src/everos/config"),
+    categories={"prompt_slots": None},   # subdir == category name
+)
+
+# Reads <root>/prompt_slots/episode_extract.yaml → dict
+slot = loader.find("prompt_slots", "episode_extract")
+
+# Refresh after on-disk edits.
+loader.refresh()                         # drop the entire cache
+loader.refresh("prompt_slots")           # drop one category
+loader.refresh("prompt_slots", "episode_extract")  # drop one entry
+```
+
+The top-level YAML node must be a mapping; a list or scalar root raises
+`TypeError`, so a malformed file fails fast instead of being silently
+accepted.
+
+## YAML format (proposed; subject to change)
+
+```yaml
+# config/prompt_slots/episode_extract.yaml
+template: |
+  Extract a single episode from this conversation:
+  {{ memcell.text }}
+
+variables:
+  memcell: input memcell
+
+output_schema:
+  type: object
+  properties:
+    summary: { type: string }
+    participants: { type: array }
+
+llm:
+  model: gpt-4o-mini
+  temperature: 0.3
+  max_tokens: 2000
+
+validation:
+  test_cases:
+    - input: { memcell: { text: "Hi" } }
+      expected: { summary: "...", participants: [] }
+```
+
+When layer 2 supplies an override the loader will be re-pointed at
+`~/.everos/prompt_slots/`; the runtime resolution logic (currently TBD)
+sandbox-runs the merged slot before returning it.
+
+## Why YAML (not TOML)
+
+Two reasons:
+
+1. **Multiline templates** — TOML's basic-string grammar fights
+   prompt content (no easy `{{ jinja }}` variables, awkward escaping).
+   YAML's literal block scalar (`|`) preserves prompts as-is.
+2. **Comment + reference ergonomics** — operators frequently inherit
+   slots, tweak a few keys, and leave inline notes. YAML is more
+   forgiving for hand-editing.
+
+The Pydantic Settings file (`config/default.toml`) stays TOML — it's
+machine-managed and type-validated; YAML's flexibility costs more
+than it pays for that case.
+
+## Why a separate loader (not Pydantic Settings)
+
+Settings = **one** structured tree, validated at load time, tied to a
+single source of truth. PromptSlots = **many** separate templates
+discovered by name, layered per-deployment. They're different shapes;
+forcing one model on the other gets clunky.
+
+## See also
+
+- [`src/everos/component/config/loader.py`](../src/everos/component/config/loader.py)
+- [`tests/unit/test_component/test_config/test_loader.py`](../tests/unit/test_component/test_config/test_loader.py)
+- [`docs/architecture.md`](architecture.md) — layer placement
--- a/docs/storage_layout.md
+++ b/docs/storage_layout.md
@ -0,0 +1,222 @@
+# Storage Layout
+
+How `everos` lays out a memory-root on disk: directory tree, file
+naming, frontmatter chassis, and entry-id encoding.
+
+The markdown files are the **source of truth**; SQLite and LanceDB are
+derived indexes that can be rebuilt from markdown alone.
+
+## 1. Memory-root tree
+
+A memory-root is a single directory holding all persisted memory. The
+default location is `~/.everos/`; override via `EVEROS_MEMORY__ROOT`
+env var or `[memory] root` in the TOML config.
+
+Memory is partitioned by **`<app_id>/<project_id>`** *before* the
+user-visible scope dirs, so different `(app, project)` spaces never share
+a directory. The reserved id `"default"` materialises as `default_app` /
+`default_project` on disk. The scope is encoded **in the path**, not in
+the frontmatter (see [§3](#3-frontmatter-chassis-yaml)).
+
+```
+<memory-root>/                              default ~/.everos
+│
+├── <app_id>/                               user-visible; "default" → default_app
+│   └── <project_id>/                       "default" → default_project
+│       ├── users/
+│       │   └── <user_id>/
+│       │       ├── user.md                          single-file rewrite (profile)
+│       │       ├── episodes/                         daily-log append
+│       │       │   └── episode-<YYYY-MM-DD>.md
+│       │       ├── .atomic_facts/                    daily-log append (hidden)
+│       │       │   └── atomic_fact-<YYYY-MM-DD>.md
+│       │       └── .foresights/                      daily-log append (hidden)
+│       │           └── foresight-<YYYY-MM-DD>.md
+│       ├── agents/
+│       │   └── <agent_id>/
+│       │       ├── .cases/                           daily-log append (hidden)
+│       │       │   └── agent_case-<YYYY-MM-DD>.md
+│       │       └── skills/                           skill-named dir
+│       │           └── skill_<name>/
+│       │               ├── SKILL.md
+│       │               ├── references/               (optional)
+│       │               └── scripts/                  (optional)
+│       └── knowledge/                                user-visible (shared / global, reserved)
+│
+├── .index/                              system-managed, rebuildable (gitignore)
+│   ├── sqlite/
+│   │   ├── system.db                    state / cascade queue (md_change_state) / buffer / audit / LSN  (+ -wal / -shm)
+│   │   ├── ome.db                        Offline Memory Engine state
+│   │   ├── ome.aps.db                    APScheduler jobstore (split to avoid lock contention)
+│   │   └── ome.db.lock                   OME single-engine guard (portalocker)
+│   └── lancedb/
+│       └── <kind>.lance/                one directory per LanceDB table
+│
+├── ome.toml                             user-editable OME strategy overrides (hot-reloaded)
+└── .tmp/                                staging dir for batch / multi-step writes
+```
+
+> Cascade queue state, the LSN watermark, and the change audit all live in
+> SQLite (`system.db`, table `md_change_state`) — crash-recovery replays
+> from that durable queue, not a log file. (`MemoryRoot` also exposes a
+> `.lock` anchor for the `memory_root_lock` primitive; there is no
+> `.cascade.log` / `.manifest.json`.)
+
+The path manager is [`MemoryRoot`](../src/everos/core/persistence/memory_root.py),
+exposing every path as a property. `MemoryRoot.ensure()` creates the
+runtime-required dirs (`.index/{sqlite,lancedb}/`, `.tmp/`) and copies the
+OME template to `ome.toml`; the user-visible dirs are *not* pre-created —
+they appear on first write.
+
+> The single-file writer also supports `agent.md` / `soul.md` / `tools.md`
+> / `behaviors.md`, but no shipped strategy produces those today — only
+> `user.md` is written. `memcell` is a SQLite-only kind (the boundary
+> ledger); it has no markdown file.
+
+## 2. Three storage strategies
+
+Each business memory kind picks one of three on-disk patterns:
+
+| Strategy | Filename | Mutation | Examples |
+|---|---|---|---|
+| **Daily-log append** | `<FILE_PREFIX>-<YYYY-MM-DD>.md` under `<DIR_NAME>/` | append entries | episode / atomic_fact / foresight / agent_case |
+| **Skill-named dir** | `skills/skill_<name>/SKILL.md` (+ `references/` `scripts/`) | overwrite the file | agent skills (procedural memory) |
+| **Single-file rewrite** | `user.md` (writer also supports `agent.md` / `soul.md` / `tools.md` / `behaviors.md`, not yet produced) | overwrite the file | user profile |
+
+Markdown IO primitives live in
+[`core/persistence/markdown/`](../src/everos/core/persistence/markdown/);
+business-aware writers live in
+[`infra/persistence/markdown/writers/`](../src/everos/infra/persistence/markdown/writers/)
+and pick the right strategy via a base class.
+
+To add a new memory kind, define its per-kind frontmatter schema under
+[`infra/persistence/markdown/mds/`](../src/everos/infra/persistence/markdown/mds/)
+and add a matching writer/reader pair under
+[`writers/`](../src/everos/infra/persistence/markdown/writers/) and
+[`readers/`](../src/everos/infra/persistence/markdown/readers/).
+
+## 3. Frontmatter chassis (YAML)
+
+Every markdown file carries a YAML frontmatter block at the top:
+
+```markdown
+---
+id: episode_log_alice_2026-06-01
+type: episode_daily
+file_type: episode_daily
+schema_version: 1
+user_id: alice
+track: user
+date: '2026-06-01'
+entry_count: 11
+last_appended_at: '2026-06-01T09:12:13+00:00'
+---
+<!-- entry:ep_20260601_00000001 -->
+...content...
+<!-- /entry:ep_20260601_00000001 -->
+```
+
+Scope (`app_id` / `project_id`) is **not** a frontmatter field — it is
+carried by the `<app>/<project>` path segments and recovered by the
+cascade path parser. The frontmatter only holds the file-level owner
+(`user_id` / `agent_id`) and `track`.
+
+The chassis lives in [`core/persistence/markdown/frontmatter.py`](../src/everos/core/persistence/markdown/frontmatter.py)
+(Pydantic v2):
+
+```
+BaseFrontmatter            id / type / schema_version + SCOPE_DIR ClassVar
+   ├─ UserScopedFrontmatter   + user_id / track="user" + SCOPE_DIR="users"
+   └─ AgentScopedFrontmatter  + agent_id / track="agent" + SCOPE_DIR="agents"
+```
+
+Concrete business schemas subclass one of the scope mixins and add
+per-kind fields plus three more ClassVars that drive path resolution
+and entry-id assembly:
+
+```python
+class EpisodeDailyFrontmatter(DailyLogPathMixin, UserScopedFrontmatter):
+    ENTRY_ID_PREFIX: ClassVar[str] = "ep"
+    DIR_NAME: ClassVar[str] = "episodes"
+    FILE_PREFIX: ClassVar[str] = "episode"
+    type: Literal["episode_daily"] = "episode_daily"
+    date: dt.date
+    entry_count: int = 0
+    last_appended_at: dt.datetime | None = None
+```
+
+## 4. Entry-id encoding
+
+Inside daily-log files each entry is bracketed by HTML-comment markers
+so the raw markdown stays clean for human readers:
+
+```
+<!-- entry:<entry_id> -->
+...content...
+<!-- /entry:<entry_id> -->
+```
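+
+A minimal sketch of how a reader could recover entry blocks from these
+markers, using only the standard library. `ENTRY_BLOCK` and
+`iter_entries` are hypothetical names for illustration, not the
+project's reader API, and the pattern assumes each marker sits on its
+own line:
+
+```python
+import re
+
+# Hypothetical pattern: the closing marker must repeat the opening id.
+ENTRY_BLOCK = re.compile(
+    r"<!-- entry:(?P<id>[A-Za-z0-9_]+) -->\n"
+    r"(?P<body>.*?)\n"
+    r"<!-- /entry:(?P=id) -->",
+    re.DOTALL,
+)
+
+
+def iter_entries(text: str):
+    """Yield (entry_id, body) for every well-formed entry block."""
+    for match in ENTRY_BLOCK.finditer(text):
+        yield match.group("id"), match.group("body")
+
+
+sample = (
+    "<!-- entry:ep_20260601_00000001 -->\n"
+    "first entry\n"
+    "<!-- /entry:ep_20260601_00000001 -->\n"
+    "<!-- entry:ep_20260601_00000002 -->\n"
+    "second entry\n"
+    "<!-- /entry:ep_20260601_00000002 -->\n"
+)
+entries = list(iter_entries(sample))
+```
+
+Because the closing marker back-references the opening id, a block
+whose markers do not match is skipped rather than mis-paired.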
+
+`<entry_id>` is `<prefix>_<YYYYMMDD>_<NNNNNNNN>` (8-digit sequence),
+e.g. `ep_20260601_00000001`:
+
+| Segment | Source |
+|---|---|
+| `<prefix>` | `Frontmatter.ENTRY_ID_PREFIX` (declared by the schema subclass) |
+| `<YYYYMMDD>` | The daily-log file's date bucket |
+| `<NNNNNNNN>` | Per-file sequence, 8-digit zero-padded, restarts at `00000001` each day per scope |
+
+Implementation: [`core/persistence/markdown/entries.py`](../src/everos/core/persistence/markdown/entries.py)
+(`EntryId.parse / format / next_for`).
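+
+The encoding can be sketched as follows. `EntryIdSketch` is an
+illustrative stand-in, not the real `EntryId`: the method names follow
+the list above, but the field layout, the lowercase-only prefix
+pattern, and the "next sequence = current count + 1" rule are
+assumptions:
+
+```python
+import datetime as dt
+import re
+from dataclasses import dataclass
+
+# Assumption: prefixes are lowercase letters only.
+_ENTRY_ID = re.compile(r"(?P<prefix>[a-z]+)_(?P<date>\d{8})_(?P<seq>\d{8})")
+
+
+@dataclass(frozen=True)
+class EntryIdSketch:
+    """Illustrative stand-in for EntryId; the real class may differ."""
+
+    prefix: str
+    date: dt.date
+    seq: int
+
+    @classmethod
+    def parse(cls, raw: str) -> "EntryIdSketch":
+        match = _ENTRY_ID.fullmatch(raw)
+        if match is None:
+            raise ValueError(f"not an entry id: {raw!r}")
+        date = dt.datetime.strptime(match["date"], "%Y%m%d").date()
+        return cls(match["prefix"], date, int(match["seq"]))
+
+    def format(self) -> str:
+        # 8-digit zero-padded sequence, as in the table above.
+        return f"{self.prefix}_{self.date:%Y%m%d}_{self.seq:08d}"
+
+    @classmethod
+    def next_for(
+        cls, prefix: str, date: dt.date, current_count: int
+    ) -> "EntryIdSketch":
+        # Assumption: the next sequence number is the current count + 1.
+        return cls(prefix, date, current_count + 1)
+
+
+first = EntryIdSketch.next_for("ep", dt.date(2026, 6, 1), 0)
+```
+
+Under these assumptions, `first.format()` gives `ep_20260601_00000001`
+and `EntryIdSketch.parse` round-trips it.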
+
+> **File-level seq, not global**: the same `ep_20260601_00000001` may
+> appear under two different `user_id`s (each user has its own daily file).
+> Cross-table joins must therefore key on **`(scope_id, entry_id)`**
+> rather than `entry_id` alone; see the SQLite and LanceDB tables in
+> the next section.
+
+## 5. SQLite + LanceDB derived indexes
+
+```
+.index/
+├── sqlite/
+│   └── system.db          state / audit log / task queue / LSN watermark
+│                           + per-kind business state tables (composite key)
+└── lancedb/
+    └── <kind>.lance/      one Arrow-based table per kind
+                            stores text / vector / tags / metadata
+```
+
+- **SQLite** schema lives in
+  [`infra/persistence/sqlite/tables/`](../src/everos/infra/persistence/sqlite/tables/);
+  every business table that joins back to markdown declares a
+  `UniqueConstraint("user_id", "entry_id")` (or the symmetric
+  `agent_id` variant for agent-scoped tables).
+- **LanceDB** schemas live in
+  [`infra/persistence/lancedb/tables/`](../src/everos/infra/persistence/lancedb/tables/);
+  `Vector(N)` dimension matches the embedding model output.
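+
+To illustrate why the composite key matters, here is a sketch using
+the standard-library `sqlite3` module directly instead of the
+project's table declarations. The `episode_state` table and its
+columns are hypothetical:
+
+```python
+import sqlite3
+
+conn = sqlite3.connect(":memory:")
+# Hypothetical table; real schemas live under infra/persistence/sqlite/tables/.
+conn.execute(
+    """
+    CREATE TABLE episode_state (
+        user_id  TEXT NOT NULL,
+        entry_id TEXT NOT NULL,
+        status   TEXT NOT NULL,
+        UNIQUE (user_id, entry_id)
+    )
+    """
+)
+insert = "INSERT INTO episode_state VALUES (?, ?, ?)"
+entry_id = "ep_20260601_00000001"
+
+# The same entry_id under two different users is legal.
+conn.execute(insert, ("alice", entry_id, "indexed"))
+conn.execute(insert, ("bob", entry_id, "indexed"))
+
+# Repeating the (user_id, entry_id) pair violates the constraint.
+try:
+    conn.execute(insert, ("alice", entry_id, "stale"))
+    duplicate_rejected = False
+except sqlite3.IntegrityError:
+    duplicate_rejected = True
+
+row_count = conn.execute("SELECT COUNT(*) FROM episode_state").fetchone()[0]
+```
+
+A constraint on `entry_id` alone would have rejected the second
+insert, even though the two rows point at different daily files.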
+
+Both layers are **fully derivable from markdown** — wipe `.index/`
+and the in-process cascade subsystem re-builds everything by scanning the
+user-visible tree (the durable `md_change_state` SQLite queue covers
+crash-recovery replay).
+
+## 6. Atomic write semantics
+
+`MarkdownWriter` uses a same-directory temp file
+(`.<name>.tmp.<uuid>`) + `os.replace` for atomicity. Keeping the temp
+file in the same directory keeps it on the same filesystem as the
+target, which `os.replace` needs in order to be an atomic rename on
+POSIX (a rename across filesystems is not atomic).
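+
+A minimal sketch of the temp-file-plus-rename pattern, assuming
+nothing about `MarkdownWriter` internals. `atomic_write_text` and the
+sample file name are hypothetical, and the `os.fsync` call is an added
+durability assumption, not something the text above states:
+
+```python
+import os
+import tempfile
+import uuid
+from pathlib import Path
+
+
+def atomic_write_text(path: Path, text: str) -> None:
+    """Write text via a same-directory temp file, then os.replace."""
+    tmp = path.with_name(f".{path.name}.tmp.{uuid.uuid4().hex}")
+    try:
+        with open(tmp, "w", encoding="utf-8") as handle:
+            handle.write(text)
+            handle.flush()
+            os.fsync(handle.fileno())  # assumption: flush before renaming
+        os.replace(tmp, path)  # readers see the old or the new file
+    finally:
+        # After a successful replace the temp name no longer exists.
+        tmp.unlink(missing_ok=True)
+
+
+with tempfile.TemporaryDirectory() as root:
+    target = Path(root) / "episode-2026-06-01.md"  # hypothetical name
+    atomic_write_text(target, "first\n")
+    atomic_write_text(target, "second\n")
+    final_text = target.read_text(encoding="utf-8")
+    leftovers = sorted(p.name for p in Path(root).iterdir())
+```
+
+If the write fails before the rename, the `finally` clause removes the
+temp file and the target keeps its previous contents.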
+
+`MarkdownWriter.append_entry` reads → merges frontmatter →
+appends an entry block → atomic write back. The caller passes a full
+`EntryId` (built via `EntryId.next_for(prefix, date, current_count)`);
+this primitive is **schema-agnostic** — field-level semantics
+(`entry_count` / `last_appended_at`) are a business writer's job
+(see `BaseDailyAppender._frontmatter_updates` in
+[`infra/persistence/markdown/writers/base.py`](../src/everos/infra/persistence/markdown/writers/base.py)).
+
+## 7. References
+
+- Code:
+  - [`core/persistence/memory_root.py`](../src/everos/core/persistence/memory_root.py) — memory-root resolution
+  - [`core/persistence/markdown/`](../src/everos/core/persistence/markdown/) — schema-agnostic read/write chassis
+  - [`infra/persistence/markdown/mds/`](../src/everos/infra/persistence/markdown/mds/) — per-kind frontmatter schemas
+  - [`infra/persistence/{markdown,sqlite,lancedb}/`](../src/everos/infra/persistence/) — business-aware adapters