chore: initialize EverOS 1.0.0

md-first memory extraction framework for AI agents.

Markdown is the single source of truth; SQLite holds state and LanceDB
provides the rebuildable vector + BM25 + scalar index. The codebase follows
a single-direction DDD layering (entrypoints -> service -> memory -> infra,
with component / core / config cross-cutting) enforced by import-linter.

Engineering surface:
- Coding conventions in .claude/rules/ (path-scoped) and workflows in
  .claude/skills/ (/commit, /new-branch, /pr).
- GitHub Actions CI runs make lint + test + integration; pre-commit mirrors
  the gates locally (ruff, hygiene hooks, gitlint commit-msg).
- Commit messages follow Conventional Commits, enforced by gitlint.
- make lint also enforces datetime two-zone discipline and OpenAPI drift.
This commit is contained in:
Elliot Chen
2026-06-05 22:35:51 +08:00
commit 518b8eca85
636 changed files with 160553 additions and 0 deletions

View File

@ -0,0 +1,33 @@
{
"_doc": "Agent-mode end-to-end fixture. 23 items across 2 task threads: (1) production latency diagnosis with rollback, (2) DB connection-pool exhaustion investigation. Includes ToolCallRequest + ToolCallResult interleaved with user/assistant text turns, parallel tool calls, one assistant turn with empty content (only tool_calls), one extended chain-of-thought reasoning, and one tool failure + retry in task 2. Run with EVEROS_MEMORIZE__MODE=agent. Expected boundary detector cuts ~2 cells aligned with the two task threads; tool items fold into the cell containing the preceding chat turn.",
"session_id_hint": "e2e_agent",
"messages": [
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747396800000, "content": "Production API p99 latency spiked from 280ms to 2.5s starting around 21:00 UTC. Customer reports rolling in. Can you investigate?"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747396820000, "content": "I'll pull the recent deploys and check for correlation with the spike timestamp.", "tool_calls": [{"id": "call_001", "type": "function", "function": {"name": "list_recent_deploys", "arguments": "{\"hours_back\": 4, \"service\": \"api-gateway\"}"}}]},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747396825000, "tool_call_id": "call_001", "content": "[{\"deploy_id\": \"d-9f3a\", \"service\": \"api-gateway\", \"deployed_at\": \"2026-05-16T20:47:12Z\", \"author\": \"bob\", \"commit_sha\": \"a1b2c3d\", \"diff_summary\": \"refactor: switch upstream client to keepalive pool\"}, {\"deploy_id\": \"d-9f29\", \"service\": \"api-gateway\", \"deployed_at\": \"2026-05-16T15:12:03Z\", \"author\": \"carol\", \"commit_sha\": \"e4f5g6h\", \"diff_summary\": \"chore: bump otel-sdk to 1.31\"}]"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747396870000, "content": "", "tool_calls": [{"id": "call_002", "type": "function", "function": {"name": "get_deploy_diff", "arguments": "{\"deploy_id\": \"d-9f3a\", \"files\": [\"upstream/client.go\", \"config/pool.yaml\"]}"}}]},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747396880000, "tool_call_id": "call_002", "content": "config/pool.yaml: max_idle_conns_per_host changed from 32 → 4. upstream/client.go: new transport with IdleConnTimeout=10s (was 90s). The default Go transport now reuses far fewer connections."},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747396930000, "content": "Two changes in d-9f3a stand out, and the interaction between them likely explains the cliff. First, max_idle_conns_per_host went from 32 down to 4 — that alone caps how many warm sockets the upstream client can pin per backend. Second, IdleConnTimeout dropped from 90s to 10s, so even the few warm sockets we do keep get torn down between requests during normal traffic gaps. Combined effect: under sustained load p99 of ~250rps × ~6 backends, the pool is permanently saturated and every spillover request pays a TCP+TLS handshake (~800ms-1.5s on the affected region) instead of reusing a connection. That maps directly to the 280ms → 2.5s latency jump. Before I file a rollback recommendation I want to confirm the prediction against live pool metrics — if reuse_ratio collapsed and new-connection rate is up >10x, that's enough signal.", "tool_calls": [{"id": "call_003", "type": "function", "function": {"name": "query_pool_metrics", "arguments": "{\"service\": \"api-gateway\", \"window_min\": 15}"}}]},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747396938000, "tool_call_id": "call_003", "content": "{\"idle_conns_avg\": 1.2, \"new_conn_rate_per_s\": 184, \"new_conn_rate_baseline\": 6, \"reuse_ratio\": 0.04, \"reuse_ratio_baseline\": 0.91}"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747396980000, "content": "Confirmed: reuse_ratio collapsed from 0.91 to 0.04, new-connection rate is 30x baseline. This is the root cause. Recommend rollback of d-9f3a — the keepalive refactor needs a config review before re-deploying. Want me to initiate rollback?"},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747397040000, "content": "Yes, roll it back. Page Bob async so he sees the rollback when he comes online tomorrow — no need to wake him up."},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747397070000, "content": "Initiating rollback to d-9f29 and queuing a non-paging Slack DM for Bob.", "tool_calls": [{"id": "call_004", "type": "function", "function": {"name": "rollback_deploy", "arguments": "{\"service\": \"api-gateway\", \"to_deploy_id\": \"d-9f29\", \"reason\": \"p99 latency regression from keepalive pool refactor (max_idle_conns 32->4)\"}"}}, {"id": "call_005", "type": "function", "function": {"name": "slack_dm", "arguments": "{\"user\": \"bob\", \"channel\": \"async\", \"message\": \"Rolled back api-gateway d-9f3a at 21:04 UTC — keepalive pool change caused p99 spike. Logs in #incident-2026-05-16.\"}"}}]},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747397115000, "tool_call_id": "call_004", "content": "{\"status\": \"ok\", \"rolled_back_to\": \"d-9f29\", \"completed_at\": \"2026-05-16T21:05:01Z\", \"verification\": {\"p99_5min_after\": \"312ms\", \"reuse_ratio_after\": 0.88}}"},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747397116000, "tool_call_id": "call_005", "content": "{\"status\": \"queued\", \"message_id\": \"slack_msg_xyz\"}"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747397140000, "content": "Rollback complete. p99 dropped back to 312ms within 5 minutes, reuse ratio recovered to 0.88. Slack DM queued for Bob. Incident-2026-05-16 is the channel I logged it to."},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747400400000, "content": "Different issue — we're seeing connection pool exhaustion warnings on the auth-service Postgres pool the last hour. Can you check whether it's a leak or just a sustained traffic spike?"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400420000, "content": "Pulling pool stats and recent traffic shape.", "tool_calls": [{"id": "call_006", "type": "function", "function": {"name": "query_db_pool", "arguments": "{\"service\": \"auth-service\", \"window_min\": 60}"}}, {"id": "call_007", "type": "function", "function": {"name": "query_request_rate", "arguments": "{\"service\": \"auth-service\", \"window_min\": 60}"}}]},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747400430000, "tool_call_id": "call_006", "content": "{\"max_conns\": 50, \"in_use_now\": 49, \"in_use_p99_1h\": 50, \"idle_p99_1h\": 1, \"wait_queue_depth_p99\": 18, \"long_running_queries_above_30s\": 12}"},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747400431000, "tool_call_id": "call_007", "content": "{\"rps_now\": 142, \"rps_baseline\": 138, \"rps_p99_1h\": 151}"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400470000, "content": "Traffic is flat (rps 142 vs baseline 138), but 12 long-running queries above 30s and wait queue depth p99 of 18. This is a leak / unbounded query, not a traffic surge. Let me find the offending queries.", "tool_calls": [{"id": "call_008", "type": "function", "function": {"name": "list_long_running_queries", "arguments": "{\"service\": \"auth-service\", \"min_duration_s\": 30, \"window_min\": 60}"}}]},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747400478000, "tool_call_id": "call_008", "content": "{\"error\": \"timeout\", \"detail\": \"query against pg_stat_activity exceeded 15s while pool was saturated; retry with a narrower window\"}"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400490000, "content": "Tool itself is starved by the same pool exhaustion — fair enough. Retrying against the last 10 minutes only so pg_stat_activity has a tighter slice to scan.", "tool_calls": [{"id": "call_008b", "type": "function", "function": {"name": "list_long_running_queries", "arguments": "{\"service\": \"auth-service\", \"min_duration_s\": 30, \"window_min\": 10}"}}]},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747400499000, "tool_call_id": "call_008b", "content": "[{\"query_sha\": \"q-71f\", \"sql_truncated\": \"SELECT u.* FROM users u JOIN sessions s ON s.user_id = u.id WHERE s.expired_at > now() - interval '30 day'\", \"avg_duration_s\": 84, \"calls_last_hour\": 11, \"missing_index\": \"sessions.expired_at\"}]"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400510000, "content": "Single offender — query q-71f is doing a full sessions-table scan because sessions.expired_at is unindexed. Each call takes ~84s and holds a connection. 11 calls in the last hour saturate the pool. Recommend: (1) add the index now (concurrent), (2) add a query timeout cap on this endpoint long-term. Want me to add the index?"},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747400560000, "content": "Yes please add the index. Use CREATE INDEX CONCURRENTLY so we don't block writes. Then post in #incident-2026-05-16 with the runbook entry."},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400590000, "content": "On it. Creating the concurrent index and posting the runbook update.", "tool_calls": [{"id": "call_009", "type": "function", "function": {"name": "execute_sql", "arguments": "{\"db\": \"auth-service\", \"sql\": \"CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_sessions_expired_at ON sessions(expired_at)\", \"timeout_s\": 600}"}}]},
{"sender_id": "tool_runner", "sender_name": null, "role": "tool", "timestamp": 1747400650000, "tool_call_id": "call_009", "content": "{\"status\": \"ok\", \"duration_s\": 47, \"rows_indexed\": 1841229}"},
{"sender_id": "ops_agent", "sender_name": "OpsAgent", "role": "assistant", "timestamp": 1747400720000, "content": "Index created in 47s over 1.84M rows. Query q-71f should drop from ~84s to sub-second. Pool will drain in the next minute as long-running queries finish. Posted runbook to #incident-2026-05-16."}
]
}

View File

@ -0,0 +1,31 @@
{
"_doc": "Chat-mode end-to-end fixture. 22 messages across 3 topics + a multi-user handoff. Run with EVEROS_MEMORIZE__MODE=chat. Expected boundary detector cuts ~3-4 cells (Python bug → weekend plans → cross-team meeting).",
"session_id_hint": "e2e_chat",
"messages": [
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747396800000, "content": "I'm getting a weird KeyError on a dict comprehension when the source list has None entries. The traceback points at the dict literal, not the comprehension itself."},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747396830000, "content": "Can you share a minimal repro? Something like the comprehension expression plus 2-3 sample input items. I want to see whether the None is on the key side or the value side."},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747396890000, "content": "Sure: `{r['id']: r['name'] for r in rows}` where rows comes from a left-join and sometimes the right side returns None. So r is None for those rows."},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747396920000, "content": "Got it. The None is the row itself, not a field. `None['id']` raises TypeError actually, but if you preprocess the rows to dicts upstream, the missing key on a partial dict would be KeyError. Either way, filter before the comprehension: `{r['id']: r['name'] for r in rows if r is not None and 'id' in r}`."},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747396980000, "content": "That worked, thanks. Also occasionally the SQL itself wraps the row in a tuple instead of dict — pyodbc behaviour. Got bitten by it last quarter too."},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747397010000, "content": "pyodbc returns Row objects by default; you'd need `cursor.execute(...).fetchall()` then iterate. Or set `cursor.row_factory = lambda c, r: dict(zip([col[0] for col in c.description], r))` to normalise upstream."},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747400400000, "content": "Switching topics — any weekend plans? I'm thinking of trying the new ramen place on Carmine Street, supposedly the tonkotsu is decent."},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747400430000, "content": "Tonkotsu sounds good. If you go Saturday before 1pm you'll skip the queue. There's also a coffee place two doors down that does decent cold brew if you want a pre-meal stop."},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747400490000, "content": "Cold brew yes. Bob mentioned wanting to come too, do you think the ramen place takes reservations for three?"},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747400520000, "content": "Their site says walk-in only on weekends. For three people the wait is usually 25-40 minutes around noon — manageable but worth knowing."},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747400580000, "content": "OK we'll show up at 11:30 then. Will send Bob the address."},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747400610000, "content": "Sensible. Reminder it's cash-only for the first 6 months apparently — saw that on a Reddit thread last week."},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747407600000, "content": "One more thing — the Q3 revenue review is next Thursday at 2pm, who from engineering needs to attend? I want to make sure platform team is represented since we're presenting the infra cost reductions."},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747407630000, "content": "Standard attendees are eng director + relevant TL. For platform cost reductions, your eng lead Carol should be there since she owns the runbook. Add Dan as a backup in case Carol's on PTO."},
{"sender_id": "u_alice", "sender_name": "Alice", "role": "user", "timestamp": 1747407690000, "content": "Good call. Carol mentioned PTO that week actually. Let me ping Dan to confirm he can do the slides handoff."},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747407720000, "content": "If Dan has bandwidth, also ask him to pre-share the deck Monday so finance has time to review the cost numbers before the meeting. Saves a lot of meeting time clarifying figures."},
{"sender_id": "u_bob", "sender_name": "Bob", "role": "user", "timestamp": 1747411200000, "content": "Hey, Alice forwarded me about the ramen plan and the Thursday meeting. I can do both but the Thursday slot conflicts with the SRE oncall handoff at 1:45 — anyone we can swap with?"},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747411230000, "content": "Erin usually does the Wednesday → Thursday handoff for SRE. She owes you one from last cycle. Ask her to take the 1:45 slot and you can ack the comms async at 1:30."},
{"sender_id": "u_bob", "sender_name": "Bob", "role": "user", "timestamp": 1747411290000, "content": "Erin's good. I'll ping her on Slack today. About the ramen — Alice said 11:30 Saturday, that works for me, walking distance from my place."},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747411320000, "content": "Great. Just remind Alice it's cash-only for now — she'll forget otherwise based on past experience."},
{"sender_id": "u_bob", "sender_name": "Bob", "role": "user", "timestamp": 1747411380000, "content": "Ha, true. I'll grab some cash on the way. Also can you remind me about the Q3 revenue deck — when does it need to be done by, EOD Tuesday?"},
{"sender_id": "assistant", "sender_name": "Bot", "role": "assistant", "timestamp": 1747411410000, "content": "Yes — Dan agreed to pre-share Monday, so practical deadline is EOD Tuesday for finance to review Wednesday. After Wednesday review they may ask for last-minute number tweaks; build slack into Thursday morning for that."}
]
}