update memory gateway

2026-04-30 16:09:28 +08:00
parent e6b1520bce
commit ba84b1ddb3
98 changed files with 1341 additions and 6783 deletions

README.md

Old version:

# SOC Memory POC

A memory-system POC for the SOC case triage assistance scenario. This project is not a general-purpose memory platform; it verifies whether an AI agent, while handling alerts such as phishing emails and O365 suspicious logins, can reliably get better historical cases, KB / Playbook entries, Obsidian triage notes, and conclusions worth persisting.

Current project stage: **minimal runnable POC / Hermes integration validation**.

## What is done so far

### 1. Overall design and documents

The core design documents of the SOC memory system are complete:
- [docs/architecture.md](/home/tom/soc_memory_poc/docs/architecture.md): overall architecture, module boundaries, data flow
- [docs/poc-scope.md](/home/tom/soc_memory_poc/docs/poc-scope.md): phase-1 POC scope
- [docs/data-model.md](/home/tom/soc_memory_poc/docs/data-model.md): SOC memory data model
- [docs/namespaces.md](/home/tom/soc_memory_poc/docs/namespaces.md): OpenViking namespace / URI design
- [docs/sample-data-spec.md](/home/tom/soc_memory_poc/docs/sample-data-spec.md): mock case / KB sample specification
- [docs/hermes-demo-prompts.md](/home/tom/soc_memory_poc/docs/hermes-demo-prompts.md): Hermes demo input samples

### 2. Mock dataset

There is no real SOC data yet, so the project first built data for two typical scenarios:
- Phishing email: 4 historical cases
- O365 suspicious login: 5 historical cases
- KB / Playbook: 7 entries
- Normalized cases: 9
- Normalized KB / Playbook: 7

Data directories:
```text
evaluation/datasets/
├── mock_cases/
├── mock_kb/
├── normalized_cases/
└── normalized_kb/
```
This data validates the full chain of retrieval, triage, Obsidian note generation, and the Hermes skill.

### 3. Normalize Pipeline

The basic data normalization scripts are done:
- [pipeline/transforms/normalize_case.py](/home/tom/soc_memory_poc/pipeline/transforms/normalize_case.py): converts mock / raw cases into the unified case memory format
- [pipeline/transforms/normalize_kb.py](/home/tom/soc_memory_poc/pipeline/transforms/normalize_kb.py): converts KB / Playbook entries into the unified knowledge memory format
- [pipeline/jobs/ingest_case.py](/home/tom/soc_memory_poc/pipeline/jobs/ingest_case.py): batch-generates normalized cases
- [pipeline/jobs/ingest_kb.py](/home/tom/soc_memory_poc/pipeline/jobs/ingest_kb.py): batch-generates normalized KB / Playbook entries

### 4. Memory Gateway + OpenViking integration

[Memory Gateway](/home/tom/soc_memory_poc/memory_gateway/server.py) already works as the unified entry point to OpenViking:
- REST `/health`
- REST `/api/search`
- REST `/api/memories`
- REST `/api/resources`
- MCP `tools/list`
- MCP `search`
- MCP `add_memory`
- MCP `add_resource`
- API key checking is in effect
- The FastAPI lifespan is wired up and checks OpenViking health at startup

OpenViking resource URIs currently follow:
```text
viking://resources/soc-memory-poc/case/<scenario>/<case_id>.json
viking://resources/soc-memory-poc/knowledge/<doc_type>/<doc_id>.json
```

### 5. Skills

Three in-project skills are implemented:
- [retrieve_context_skill](/home/tom/soc_memory_poc/skills/retrieve_context_skill/retrieve_context.py): supports retrieval over local normalized data and over OpenViking
- [commit_memory_skill](/home/tom/soc_memory_poc/skills/commit_memory_skill/commit_to_openviking.py): writes normalized cases / KB entries into OpenViking resources
- [summarize_case_skill](/home/tom/soc_memory_poc/skills/summarize_case_skill/generate_case_note.py): generates an Obsidian case note from a normalized case, optionally enriching related content with OpenViking retrieval results

### 6. Obsidian Vault

The Obsidian vault skeleton and templates exist:
```text
obsidian-vault/
├── 02_Cases/
│   ├── phishing/
│   └── o365_suspicious_login/
└── 05_Templates/
```
Nine case notes have been generated, covering the phishing and O365 suspicious-login scenarios.

Templates include:
- [case-note-template.md](/home/tom/soc_memory_poc/obsidian-vault/05_Templates/case-note-template.md)
- [playbook-template.md](/home/tom/soc_memory_poc/obsidian-vault/05_Templates/playbook-template.md)
- [report-summary-template.md](/home/tom/soc_memory_poc/obsidian-vault/05_Templates/report-summary-template.md)

### 7. Hermes agent integration

A `soc-memory-poc` skill was created in the local Hermes skill directory, with a version-controlled copy kept in the repository:
- Path Hermes actually loads: `/home/tom/.hermes/skills/soc-memory-poc/`
- Repository copy: `integrations/hermes/soc-memory-poc/`

Local Hermes skill layout:
```text
/home/tom/.hermes/skills/soc-memory-poc/
├── SKILL.md
└── scripts/
    ├── search_context.py
    ├── search_obsidian_docs.py
    ├── triage_alert.py
    ├── triage_email.py
    ├── triage_from_text.py
    ├── generate_case_note.py
    └── commit_case_memory.py
```

Through this skill Hermes can currently:
- Take a structured alert or raw email text as input
- Automatically extract key fields such as sender, subject, attachment, URL, IP, host, and user
- Query OpenViking for similar historical cases
- Query OpenViking for related KB / Playbook entries
- Query Obsidian for related case notes
- Output a triage report containing the sections `研判结果` (verdict), `关键证据` (key evidence), `关联 Memory Retrieval` (related memory retrieval), `关联 Obsidian 文档` (related Obsidian documents), and `建议动作` (recommended actions)
- Generate an Obsidian case note from a normalized case

## What is not done yet

The project is not production-ready; the main gaps are:
- No real SOC data sources are connected yet, e.g. SIEM, EDR, mail gateway, ticket system, intel platform, monthly reports, POs, historical reports.
- No automatic Obsidian sync: new or modified Markdown in Obsidian is not yet summarized and written into OpenViking automatically.
- No EverMemOS worker: long-term memory extraction, merging, decay, and evolution are still design docs and directory placeholders.
- Retrieval ranking is POC-grade: usable, but no reranking, false-positive pattern recognition, or field-weight tuning on real SOC data yet.
- The evaluation loop is incomplete: mock data exists, but batch evaluation scripts, hit-rate statistics, human-satisfaction records, and comparison experiments are still missing.
- Security and governance are not production-grade: no permission isolation, audit logging, sensitive-field masking, tenant isolation, or data-retention policy as a real environment would need.
- Agent write-back is still conservative: normalized artifacts can be committed, but there is no mature "high-value memory detection -> review -> write-back -> dedupe" workflow.

## How to start

Use the existing environment:
```bash
cd /home/tom/soc_memory_poc
source /home/tom/OpenViking/.venv/bin/activate
```
Start Memory Gateway:
```bash
python -m memory_gateway.server --config /home/tom/soc_memory_poc/config.yaml
```
Health check:
```bash
curl http://127.0.0.1:1934/health
```
If port `1934` is taken, check for an existing process first instead of starting a duplicate:
```bash
ss -ltnp | grep 1934
```

## How to regenerate sample data

```bash
cd /home/tom/soc_memory_poc
source /home/tom/OpenViking/.venv/bin/activate

PYTHONPATH=/home/tom/soc_memory_poc python pipeline/jobs/ingest_case.py \
  --input-dir evaluation/datasets/mock_cases \
  --output-dir evaluation/datasets/normalized_cases

PYTHONPATH=/home/tom/soc_memory_poc python pipeline/jobs/ingest_kb.py \
  --input-dir evaluation/datasets/mock_kb \
  --output-dir evaluation/datasets/normalized_kb
```

## How to write to OpenViking

Write cases:
```bash
cd /home/tom/soc_memory_poc
source /home/tom/OpenViking/.venv/bin/activate

PYTHONPATH=/home/tom/soc_memory_poc python skills/commit_memory_skill/commit_to_openviking.py \
  --directory evaluation/datasets/normalized_cases
```
Write KB / Playbook entries:
```bash
PYTHONPATH=/home/tom/soc_memory_poc python skills/commit_memory_skill/commit_to_openviking.py \
  --directory evaluation/datasets/normalized_kb
```

## How to test

Run the unit tests:
```bash
cd /home/tom/soc_memory_poc
source /home/tom/OpenViking/.venv/bin/activate
PYTHONPATH=/home/tom/soc_memory_poc pytest -q
```
Run the Python compile checks:
```bash
python -m compileall -q memory_gateway pipeline skills tests
python -m py_compile /home/tom/.hermes/skills/soc-memory-poc/scripts/*.py
```
Run a core triage smoke test:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --text "From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12."
```
The expected output should contain:
- `研判结果`
- `关键证据`
- `关联 Memory Retrieval`
- `关联 Obsidian 文档`
- `建议动作`

## How to demo with Hermes

Make sure Memory Gateway is running, then run:
```bash
/home/tom/.local/bin/hermes chat --quiet --skills soc-memory-poc -q "Use the soc-memory-poc skill. Triage this email alert and include Memory Retrieval and Obsidian references.
From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12."
```
The point of the demo is not that Hermes triages from thin air, but that it first calls the SOC Memory POC skill to retrieve historical cases, KB / Playbook entries, and Obsidian notes, and then brings that evidence into the final verdict.

## How to generate an Obsidian case note

```bash
cd /home/tom/soc_memory_poc
source /home/tom/OpenViking/.venv/bin/activate
PYTHONPATH=/home/tom/soc_memory_poc python skills/summarize_case_skill/generate_case_note.py \
  --input evaluation/datasets/normalized_cases/CASE-2026-0001.json \
  --enrich-from-openviking \
  --top-k 3
```
Output files are written to:
```text
obsidian-vault/02_Cases/<scenario>/
```

## Current results

The POC already completes a basic SOC triage-assistance chain:
1. An analyst or Hermes inputs an alert / email.
2. The Hermes skill extracts key fields and determines the scenario.
3. Memory Gateway queries OpenViking for similar historical cases.
4. Memory Gateway queries related KB / Playbook entries.
5. Obsidian case notes are searched locally.
6. A triage report with evidence sources is produced.
7. For mature cases, an Obsidian case note can be generated and the normalized artifact written back to OpenViking.

In a real SOC, the current version is suitable for demos, POC validation, and offline evaluation; it should not be wired directly into a production alert loop.

## Next development plan

### P0: real inputs and the evaluation loop
- Design a minimal field mapping for real tickets / alerts; do not ingest full logs first.
- Add `evaluation/scripts/` to batch-run mock cases and report case hit rate, KB hit rate, and output completeness.
- Define human-labeled answers for the phishing / O365 scenarios to evaluate retrieval quality.

### P1: stronger Obsidian / OpenViking sync
- Add an Obsidian Markdown scanning script that reads new / modified notes.
- Extract title, tags, scenario, case_id, summary, IOCs, and verdict from Obsidian notes.
- Generate normalized knowledge / case artifacts.
- Write to OpenViking after deduplication and quality thresholds.

### P1: better Hermes triage workflow
- Make Hermes output cite concrete `case_id`, `doc_id`, and Obsidian relative paths more reliably.
- Add an O365 suspicious-login-specific triage prompt and demo.
- Add a post-case "should this be persisted as memory" decision template.

### P2: EverMemOS long-term memory curation layer
- Extract long-term reusable memories from case notes, process summaries, and agent final reports.
- Merge repeated case patterns.
- Decay or clean up low-value, outdated, or misleading memories.
- Feed high-value patterns back into OpenViking, generating Obsidian summary notes when needed.

### P2: production security and governance
- Add API key / token deployment guidelines.
- Add data-masking policies.
- Add write audit logs.
- Add namespace / workspace / agent-level isolation.

## Recommended rollout order

1. Freeze the normalized schema for the phishing and O365 scenarios.
2. Build 20-50 sanitized samples covering the real high-frequency alerts first.
3. Finish batch evaluation to establish retrieval hit-rate and output-quality baselines.
4. Wire up semi-automatic Obsidian -> OpenViking sync.
5. Plug in one real agent workflow, e.g. Hermes triage.
6. Only then implement EverMemOS long-term curation; do not build a complex long-term memory platform too early.

## License

MIT

New version:

# Memory Gateway

Memory Gateway is a general-purpose memory gateway that gives AI agents / harnesses unified memory retrieval, document upload, LLM summarization, and knowledge persistence.

It is not a vertical application for one business scenario but a reusable local memory/context gateway: upstream agents call it via REST, MCP, or the Hermes skill; underneath, OpenViking stores memory / resources and Obsidian holds human-maintainable Markdown knowledge.

## Current capabilities

- Search OpenViking memory / resources.
- Write plain memories.
- Write structured resources.
- Summarize arbitrary text with the LLM and optionally persist it to OpenViking.
- Upload documents and convert them to Markdown with MarkItDown.
- Save uploaded documents into the Obsidian vault.
- Write document summaries and structured artifacts into OpenViking knowledge.
- Provide a generic `memory-gateway` skill for Hermes.

## Architecture

```text
Agent / Harness / CLI
  -> Memory Gateway REST / MCP
  -> OpenViking memory / resource
  -> Obsidian markdown vault
  -> OpenAI-compatible LLM summary
```

## Directory layout

```text
memory-gateway/
├── memory_gateway/            # gateway service code
│   ├── server.py              # REST / MCP endpoints
│   ├── openviking_client.py   # OpenViking client
│   ├── llm.py                 # OpenAI-compatible LLM summary
│   ├── document_ingest.py     # MarkItDown + Obsidian write helpers
│   ├── config.py
│   └── types.py
├── integrations/hermes/
│   └── memory-gateway/        # generic Hermes skill
├── obsidian-vault/
│   ├── 01_Knowledge/Uploaded/ # Markdown converted from uploads
│   └── 05_Templates/          # generic knowledge templates
├── tests/
├── config.example.yaml
└── pyproject.toml
```

## Environment

Use the current virtual environment:
```bash
cd /home/tom/memory-gateway
source /home/tom/OpenViking/.venv/bin/activate
```
Local configuration file:
```text
/home/tom/memory-gateway/config.yaml
```
Key configuration:
```yaml
memory:
  default_namespace: memory-gateway
llm:
  base_url: https://oai.bwgdi.com/v1
  api_key: <local secret, git ignored>
  model: MiniMaxAI
obsidian:
  vault_path: /home/tom/memory-gateway/obsidian-vault
  knowledge_dir: 01_Knowledge/Uploaded
```
`config.yaml` is covered by `.gitignore`, so secrets are never committed.

## Startup

OpenViking must already be running on `127.0.0.1:1933`:
```bash
source /home/tom/OpenViking/.venv/bin/activate
openviking-server --host 127.0.0.1 --port 1933
```
Start Memory Gateway:
```bash
cd /home/tom/memory-gateway
source /home/tom/OpenViking/.venv/bin/activate
python -m memory_gateway.server --config /home/tom/memory-gateway/config.yaml
```
Default listen address:
```text
http://127.0.0.1:1934
```
Health check:
```bash
curl http://127.0.0.1:1934/health
```

## REST API

### `GET /health`
Checks gateway and OpenViking status.

### `POST /api/search`
Searches OpenViking memory / resources.
```bash
curl -X POST http://127.0.0.1:1934/api/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "memory gateway document upload summary",
    "uri": "viking://resources",
    "limit": 5
  }'
```

### `POST /api/memory`
Writes a plain memory.
```bash
curl -X POST http://127.0.0.1:1934/api/memory \
  -H "Content-Type: application/json" \
  -d '{
    "namespace": "memory-gateway",
    "memory_type": "preference",
    "content": "The user prefers concise technical summaries."
  }'
```

### `POST /api/resource`
Writes a structured resource.
```bash
curl -X POST http://127.0.0.1:1934/api/resource \
  -H "Content-Type: application/json" \
  -d '{
    "uri": "viking://resources/memory-gateway/knowledge/example.json",
    "resource_type": "json",
    "content": "{\"title\":\"example\"}"
  }'
```

### `POST /api/summary`
Summarizes arbitrary text with the LLM and optionally persists the result to OpenViking.
```bash
curl -X POST http://127.0.0.1:1934/api/summary \
  -H "Content-Type: application/json" \
  -d '{
    "title": "Project decision summary",
    "content": "Content to summarize and persist...",
    "namespace": "memory-gateway",
    "memory_type": "decision",
    "tags": ["project", "decision"],
    "persist_as": "resource"
  }'
```
`persist_as` accepts `none`, `memory`, `resource`, or `both`.

### `POST /api/knowledge/upload`
Uploads a document: MarkItDown converts it to Markdown, it is saved to Obsidian, and after LLM summarization it is written into OpenViking knowledge.
```bash
curl -X POST http://127.0.0.1:1934/api/knowledge/upload \
  -F "file=@/path/to/document.pdf" \
  -F "title=Design Notes" \
  -F "namespace=memory-gateway" \
  -F "knowledge_type=design_doc" \
  -F "tags=project,design,reference" \
  -F "persist_as=resource"
```
Uploads are saved by default to:
```text
obsidian-vault/01_Knowledge/Uploaded/
```

## MCP tools

`POST /mcp/rpc` supports:
- `search`
- `add_memory`
- `add_resource`
- `commit_summary`
- `get_status`
- `list_memories`
- `list_resources`

## Hermes skill

Generic Hermes skill:
```text
/home/tom/.hermes/skills/memory-gateway/
```
Repository copy:
```text
integrations/hermes/memory-gateway/
```
Main scripts:
```text
scripts/retrieve_memory.py
scripts/commit_summary.py
scripts/upload_knowledge.py
scripts/search_obsidian.py
```
Retrieve memories:
```bash
python /home/tom/.hermes/skills/memory-gateway/scripts/retrieve_memory.py \
  --query "document upload summary memory gateway" \
  --uri viking://resources \
  --limit 5
```
Summarize and persist:
```bash
python /home/tom/.hermes/skills/memory-gateway/scripts/commit_summary.py \
  --title "Reusable conclusion" \
  --namespace memory-gateway \
  --memory-type decision \
  --tag project \
  --persist-as resource \
  --text "Final conclusion or reusable knowledge..."
```
Upload knowledge:
```bash
python /home/tom/.hermes/skills/memory-gateway/scripts/upload_knowledge.py \
  --file /path/to/document.md \
  --title "Knowledge note" \
  --namespace memory-gateway \
  --knowledge-type reference \
  --tags project,reference \
  --persist-as resource
```

## Tests

```bash
cd /home/tom/memory-gateway
source /home/tom/OpenViking/.venv/bin/activate
PYTHONPATH=/home/tom/memory-gateway pytest -q
```
Current test coverage:
- API key checking.
- MCP tools/list.
- OpenViking search pass-through.
- LLM summary artifact construction.
- Document upload -> Markdown -> Obsidian -> OpenViking resource.

## Next steps

- Strengthen URI-prefix filtering and de-duplication for `/api/search` in the gateway.
- Add a file-size limit, type allowlist, and dry-run mode to `/api/knowledge/upload`.
- Add an incremental Obsidian -> OpenViking sync script.
- Add more stable output constraints so the Memory Gateway skill cites memories / resources / Obsidian notes when answering.
- Add more document-parsing formats and error-handling tests.
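The `persist_as` switch can be thought of as selecting which OpenViking write-backs happen after summarization. A sketch of that dispatch; the function is illustrative, not the gateway's actual internals:

```python
VALID_PERSIST_MODES = {"none", "memory", "resource", "both"}


def persist_targets(persist_as: str) -> list[str]:
    """Map a persist_as value to the list of OpenViking writes to perform."""
    if persist_as not in VALID_PERSIST_MODES:
        raise ValueError(f"unsupported persist_as: {persist_as!r}")
    if persist_as == "none":
        return []
    if persist_as == "both":
        return ["memory", "resource"]
    return [persist_as]
```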
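The save location above is derived from the `obsidian.vault_path` and `obsidian.knowledge_dir` settings. A sketch of how such a target path might be computed; the slug rules are an assumption, not the gateway's documented behavior:

```python
import re
from pathlib import Path


def obsidian_target_path(vault_path: str, knowledge_dir: str, title: str) -> Path:
    """Build <vault>/<knowledge_dir>/<slug>.md for an uploaded document."""
    # Hypothetical slug rule: lowercase, non-alphanumerics collapsed to hyphens.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "untitled"
    return Path(vault_path) / knowledge_dir / f"{slug}.md"
```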
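The first next-step item, stronger URI-prefix filtering and de-duplication for `/api/search`, can be sketched as a post-processing pass over raw results; the result shape (`uri` key) is an assumption:

```python
def filter_and_dedupe(results: list[dict], uri_prefix: str) -> list[dict]:
    """Keep results whose URI starts with the prefix, dropping duplicate URIs."""
    seen: set[str] = set()
    out: list[dict] = []
    for r in results:
        uri = r.get("uri", "")
        if not uri.startswith(uri_prefix) or uri in seen:
            continue
        seen.add(uri)
        out.append(r)
    return out
```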

File diff suppressed because it is too large.


@@ -22,7 +22,7 @@ openviking:
 # Memory configuration
 memory:
   # Default namespace
-  default_namespace: "soc"
+  default_namespace: "memory-gateway"
   # Default number of search results
   search_limit: 10
@@ -30,3 +30,17 @@ memory:
 logging:
   level: "INFO"
   format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
+# LLM configuration: used by /api/summary and /api/knowledge/upload.
+# Compatible with the OpenAI Chat Completions API; can also point at a local
+# vLLM / Ollama OpenAI-compatible endpoint.
+llm:
+  base_url: "https://api.openai.com/v1"
+  api_key: ""
+  model: ""
+  timeout: 60
+  max_input_chars: 24000
+# Obsidian configuration: used by /api/knowledge/upload to save Markdown notes.
+obsidian:
+  vault_path: "/home/tom/memory-gateway/obsidian-vault"
+  knowledge_dir: "01_Knowledge/Uploaded"
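The `max_input_chars` setting implies the gateway clamps text before sending it to the LLM. A minimal sketch of that guard; the exact truncation behavior is an assumption:

```python
def clamp_llm_input(text: str, max_input_chars: int = 24000) -> str:
    """Truncate text to the configured limit before calling the LLM."""
    if max_input_chars <= 0 or len(text) <= max_input_chars:
        return text
    return text[:max_input_chars]
```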


@@ -1,190 +0,0 @@
# Architecture

## Overall goal

Build a memory-system POC for SOC case triage assistance that improves an AI agent's effectiveness in:
- Alert triage
- Historical case retrieval
- Context completion
- Conclusion generation
- High-value memory persistence

## Overall architecture diagram

```text
┌────────────────────────────────────┐
│ Knowledge / data sources           │
│ KB / Playbook / monthly reports    │
│ Tickets / intel / historical cases │
└─────────────┬──────────────────────┘
              │ ingest / normalize
              ▼
┌───────────────────────────────┐
│ Pipeline layer                │
│ connectors / transforms / jobs│
└─────────────┬─────────────────┘
              │ extracted inputs
              ▼
┌──────────────────────────────┐
│ Skills layer                 │
│ ingest / classify / retrieve │
│ summarize / commit / prune   │
└───────┬─────────────┬────────┘
        │             │
query / │             │ write notes / long-term
write   ▼             ▼
┌────────────────────┐ ┌────────────────────┐
│ Memory Gateway     │ │ Obsidian Vault     │
│ MCP / REST / Auth  │ │ Human-maintained   │
└─────────┬──────────┘ └────────────────────┘
          ▼
┌────────────────────┐
│ OpenViking         │
│ context / memory   │
│ resources / skills │
└─────────┬──────────┘
┌─────────┴──────────┐
▼                    ▼
┌──────────────────┐ ┌──────────────────┐
│ Session / Online │ │ EverMemOS        │
│ retrieval        │ │ long-term memory │
└──────────────────┘ └──────────────────┘
┌────────────────────┐
│ AI Agent / Harness │
│ Nanobot / Hermes   │
│ OpenClaw / others  │
└────────────────────┘
```

## Layers

### 1. Knowledge source layer
External systems and existing material:
- KB
- Playbooks
- Monthly reports
- Reports
- Ticket system
- Intelligence systems
- Historical cases

Characteristics:
- Diverse sources
- Inconsistent structure
- Cannot all be used directly as memory

### 2. Pipeline layer
Responsible for:
- Data ingestion
- Format normalization
- Metadata extraction
- Noise filtering

Boundaries:
- Does not do final retrieval
- Does not make the final long-term persistence decision

### 3. Skills layer
Responsible for:
- Extracting high-value memories
- Classifying them as knowledge / case / process / session
- Retrieving relevant context
- Generating case summaries
- Writing back to OpenViking / Obsidian / EverMemOS

This is the orchestration layer of the whole system.

### 4. Memory Gateway layer
Responsible for:
- Giving AI agents a unified entry point
- Hiding OpenViking details
- Providing MCP / REST interfaces
- Handling auth and protocol compatibility

### 5. OpenViking unified context layer
Responsible for:
- Storing memories
- Storing resources
- Organizing skills
- Managing different context types by namespace

### 6. Obsidian layer
Human-maintainable knowledge persistence:
- High-quality case notes
- Playbooks
- Monthly report / report summaries
- Key entity descriptions

### 7. EverMemOS layer
Background long-term memory curation:
- episode -> long-term memory
- Deduplication
- Merging
- Updating
- Decay

## Multi-agent sharing

Agents do not share transient memory with each other directly; they collaborate through the unified context layer:
- Shared stable knowledge goes to `soc/knowledge`
- Historical cases go to `soc/case`
- The current task goes to `session/<session_id>`
- Agent-private preferences go to `agent/<agent_id>`

This achieves:
- Shared common knowledge
- Isolation of the current session
- Reuse of the same system across different agent frameworks

## Retrieval quality control principles

To avoid "stuff everything in" degrading retrieval quality, the system must stick to:
- Raw material does not all go directly into long-term memory
- Keep only high-value summaries, patterns, conclusions, and evidence
- Session / process memory is short-lived by default
- Historical cases and playbooks take priority over generic knowledge
- Obsidian holds only human-maintained content, not full original texts

## Phase-1 default plan

Recommended phase-1 combination:
- OpenViking: unified context / memory layer
- Memory Gateway: unified access entry point
- Skills: retrieval, summarization, persistence
- Obsidian: human-maintainable knowledge persistence
- EverMemOS: background long-term memory curation

Why this combination:
- Clear module boundaries
- Best fit for small, fast POC iterations
- Easiest way to control system complexity
- Easiest to reuse across different agent frameworks


@@ -1,138 +0,0 @@
# Data Model

## Goal
This data model targets the SOC case triage assistance scenario; it does not aim for full archiving but emphasizes high-value memory extraction.

## Data layers

### 1. Knowledge Memory
Applicable content:
- KB
- Playbooks
- Monthly report summaries
- Report summaries
- POs
- Detection rule descriptions

Characteristics:
- Stable and reusable
- Oriented to methods, knowledge, and patterns
- Suitable for long-term storage

Suggested fields:
- `id`
- `title`
- `source_type`
- `summary`
- `tags`
- `entities`
- `ttp`
- `confidence`
- `updated_at`

### 2. Case Memory
Applicable content:
- Historical cases
- Final triage verdicts
- Key evidence
- False-positive / true-positive patterns
- Response recommendations

Characteristics:
- Oriented to concrete cases
- Suitable for similar-case retrieval
- The most important data layer in the POC phase

Suggested fields:
- `case_id`
- `title`
- `alert_type`
- `verdict`
- `summary`
- `key_evidence`
- `entities`
- `detection_logic`
- `lessons_learned`
- `source_links`

### 3. Process Memory
Applicable content:
- Agent intermediate steps
- Tool call results
- Reasoning paths
- Interim analysis conclusions

Characteristics:
- Short lifecycle
- Uneven value
- Only high-value parts should be extracted and promoted to long-term memory

Suggested fields:
- `session_id`
- `step_id`
- `tool_name`
- `observation`
- `intermediate_conclusion`
- `value_score`
- `timestamp`

### 4. Profile / Preference Memory
Applicable content:
- Analyst preferences
- Default output style
- Commonly used triage paths

Characteristics:
- Small volume
- Used for personalized assistance

Suggested fields:
- `user_id`
- `preference_type`
- `value`
- `scope`

### 5. Session Memory
Applicable content:
- Context of the current case
- Temporary cache for the current conversation turn / task

Characteristics:
- Strongly time-bound
- Not retained long-term by default

Suggested fields:
- `session_id`
- `task_id`
- `active_entities`
- `active_hypotheses`
- `recent_observations`
- `expires_at`

## Design principles
- Raw material is not used directly as memory
- Only persist high-value information that helps later triage
- Process memory is short-lived by default and is promoted to long-term memory only after extraction
- Knowledge and Case are the two layers to build first in the POC phase
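The suggested case-memory fields can be captured in a small schema; a sketch where the types are assumptions inferred from the field names:

```python
from dataclasses import dataclass, field


@dataclass
class CaseMemory:
    """Sketch of the suggested Case Memory fields; types are assumptions."""
    case_id: str
    title: str
    alert_type: str
    verdict: str
    summary: str
    key_evidence: list[str] = field(default_factory=list)
    entities: dict = field(default_factory=dict)
    detection_logic: str = ""
    lessons_learned: list[str] = field(default_factory=list)
    source_links: list[str] = field(default_factory=list)
```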


@@ -1,91 +0,0 @@
# Hermes Demo Prompts
## Recommended: Raw Email / Freeform Alert
Use this when you want to show that Hermes does not need a rigid input schema. The `soc-memory-poc` skill should route the content through `triage_email.py`, extract useful fields, retrieve memory, search Obsidian, and return the fixed SOC triage sections.
```text
Use the soc-memory-poc skill. Triage this email alert and include Memory Retrieval and Obsidian references.
From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12.
Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```
Equivalent direct script check:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --text "From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12."
```
## Structured Phishing Alert
Use this when you want maximum repeatability with explicit fields.
```text
Use the soc-memory-poc skill. Treat the following as a structured SOC alert and use the preferred Scheme A path.
Scenario: phishing
Alert type: mail_suspicious_attachment
User: alice@corp.example
Host: FIN-LAPTOP-12
Sender: billing@vendor-payments.com
Subject: Invoice overdue notice
Attachment: invoice_review.html
URL: https://vendor-payments-login.com/review
IP: 198.51.100.20
Known facts:
- DMARC failed
- User may have clicked the link
Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```
## Structured O365 Alert
```text
Use the soc-memory-poc skill. Treat the following as a structured SOC alert and use the preferred Scheme A path.
Scenario: o365_suspicious_login
Alert type: azuread_impossible_travel
User: david@corp.example
Host: WS-DAVID-01
IP: 203.0.113.150
Known facts:
- Impossible travel observed between Shanghai and Amsterdam within 15 minutes
- MFA fatigue occurred before final success
- User denied initiating the overseas login
- Inbox rule creation was observed after login
Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```
## Generate Case Note
```text
Use the soc-memory-poc skill. Generate an Obsidian case note for /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0003.json with OpenViking enrichment, then tell me the output path and confirm whether the note was written successfully.
```


@@ -1,120 +0,0 @@
# OpenViking Namespaces

## Goal
Use explicit namespaces and URI organization to make OpenViking the unified context / memory gateway.

## Recommended namespaces

### 1. `soc/knowledge`
For stable knowledge:
- KB
- Playbooks
- Monthly report summaries
- Report summaries
- POs

Examples:
- `viking://soc/knowledge/kb/phishing-mail-header-analysis`
- `viking://soc/knowledge/playbook/o365-suspicious-login`

### 2. `soc/case`
For historical cases and case conclusions:
- Historical cases
- True-positive / false-positive patterns
- Key evidence

Examples:
- `viking://soc/case/true-positive/case-2026-00128`
- `viking://soc/case/false-positive/case-2026-00072`

### 3. `soc/process`
For process-level memory:
- Agent intermediate analysis
- Tool output summaries
- Reusable intermediate judgment patterns

Example:
- `viking://soc/process/session-abc123/step-04`

### 4. `session/<session_id>`
For temporary context of the current task.

Examples:
- `viking://session/incident-20260421-001/context`
- `viking://session/incident-20260421-001/tools`

### 5. `agent/<agent_id>`
For agent-level private or semi-private context.

Examples:
- `viking://agent/hermes-soc/default`
- `viking://agent/nanobot-soc/preferences`

### 6. `user/<user_id>`
For small profile data such as analyst preferences and display habits.

Example:
- `viking://user/alice/preferences`

## Resource organization suggestions

### memory
Suited to:
- High-value summaries
- Case conclusions
- Patterns
- Lessons learned

### resources
Suited to:
- Raw attachment links
- External document references
- Obsidian note paths
- Ticket / report / intel references

### skills
Suited to:
- Retrieval skills
- Memory extraction skills
- Case persistence skills

## Recommended retrieval order
When retrieving for the current case, recall in this order:
1. `session/<session_id>`
2. `soc/case`
3. `soc/knowledge`
4. `agent/<agent_id>`
5. `user/<user_id>`

This prioritizes the relevance of the current context and of similar historical cases, so generic knowledge does not drown out case signals.

## Constraints
- Do not write all raw material directly into `soc/knowledge`
- `soc/process` should have a cleanup policy by default
- Only long-term stable content goes into `soc/knowledge` and `soc/case`
- Obsidian stores only human-maintainable summaries and structured distillations, not a full raw-text archive
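The recall order above can be implemented as a priority-ordered merge over per-namespace searches; a sketch where the result shape and the search callable are illustrative:

```python
from typing import Callable

# Priority order from the section above; placeholders are filled per request.
RECALL_ORDER = [
    "session/{session_id}",
    "soc/case",
    "soc/knowledge",
    "agent/{agent_id}",
    "user/{user_id}",
]


def recall(search: Callable[[str, str], list[dict]], query: str, ids: dict, limit: int = 10) -> list[dict]:
    """Query namespaces in priority order and keep the first `limit` unique hits."""
    out: list[dict] = []
    seen: set = set()
    for template in RECALL_ORDER:
        namespace = template.format(**ids)
        for hit in search(namespace, query):
            key = hit.get("uri") or hit.get("id")
            if key in seen:
                continue
            seen.add(key)
            out.append(hit)
            if len(out) >= limit:
                return out
    return out
```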


@@ -1,130 +0,0 @@
# POC Scope

## Goal
The first POC phase validates exactly one thing:

**Can high-value memory extraction plus similar-case / knowledge recall effectively improve the efficiency and quality of SOC case triage?**

## POC scope

### Case types in focus
Pick only one or two typical scenarios:
1. Phishing emails / malicious attachments
2. O365 suspicious logins / suspected account takeover

Why:
- Data is relatively easy to obtain
- Historical cases have high reuse value
- Playbooks / KB entries are usually fairly complete
- Easy to define a "similar-case hit rate"

## Data ingested in phase 1

### Must ingest
- Historical cases
- KB
- Playbooks

### Optional
- Monthly report summaries
- Report summaries

### Not yet
- Two-way ticket-system sync
- Automated full intel-system pulls
- Full report texts
- Large-scale process-trace persistence
- Analyst preference personalization

## Capabilities to build in phase 1

### Must do
- Historical case import
- KB / Playbook import
- High-value information extraction
- Context retrieval based on the current case
- Case summary persistence
- Structured write-back to OpenViking
- Obsidian case note generation

### Phase 2
- Automated EverMemOS long-term curation
- More sophisticated deduplication and decay
- Automatic multi-source sync
- Multi-agent collaboration strategy tuning

## Out of scope
To keep the POC deliverable, phase 1 explicitly does not build:
- A generalized enterprise memory platform
- Full ingestion of all raw data
- A full-text search system rebuild
- Coverage of every SOC alert type
- A complex permission system
- A complete online annotation platform

## Deliverables
Phase 1 should deliver:
1. A runnable memory gateway
2. A batch of importable historical cases and KB / Playbook samples
3. A minimal ingest / retrieve / summarize / commit loop
4. Obsidian templates and sample notes
5. A baseline-vs-POC comparison evaluation

## 2-4 week plan

### Week 1
- Freeze the POC scope
- Prepare sample data
- Finish the data model and namespace conventions
- Build the Obsidian templates

### Week 2
- Finish the historical case / KB import scripts
- Finish `retrieve_context_skill`
- Connect OpenViking `soc/case` and `soc/knowledge`

### Week 3
- Finish `summarize_case_skill`
- Finish `commit_memory_skill`
- Output standard case notes to Obsidian

### Week 4
- Run the evaluation scripts
- Do human review
- Converge next-phase requirements

## Evaluation metrics
Track at least:
- Similar-case hit rate
- Retrieved-context relevance
- Average triage time
- Final verdict accuracy
- Human satisfaction

## Acceptance criteria
Phase 1 of the POC can be considered successful when all of the following hold:
- Relevant historical cases or knowledge are recalled reliably
- Structured case notes can be generated with assistance
- Human review finds the context quality clearly improved
- Retrieval has not degraded noticeably from stuffing in too much material


@@ -1,188 +0,0 @@
# Sample Data Spec

## Goal
This document defines the mock data formats the SOC Memory POC uses before real data is available, to validate:
- the ingestion pipeline
- the normalization scripts
- context retrieval
- the case summary and memory commit flow

Only two scenarios are covered for now:
- phishing emails
- O365 suspicious login / suspected account takeover

## Directory conventions
```text
evaluation/datasets/
├── mock_cases/
│   ├── phishing/
│   └── o365_suspicious_login/
└── mock_kb/
    ├── playbooks/
    ├── kb/
    └── reports/
```

## Mock case raw format
Each case is one JSON file; suggested file name:
```text
<case_id>.json
```

### Field definitions

| Field | Type | Required | Description |
|---|---|---:|---|
| `case_id` | string | yes | unique case ID |
| `title` | string | yes | short title |
| `scenario` | string | yes | `phishing` or `o365_suspicious_login` |
| `alert_type` | string | yes | alert type |
| `severity` | string | yes | `low` / `medium` / `high` / `critical` |
| `status` | string | yes | `confirmed` / `false_positive` / `pending` |
| `time_window` | object | yes | start and end time |
| `summary` | string | yes | one-line summary |
| `alert_source` | string | yes | alerting source system |
| `entities` | object | yes | key entities |
| `observables` | object | no | IOCs / observables |
| `evidence` | array | yes | key evidence list |
| `investigation_steps` | array | yes | key investigation steps |
| `conclusion` | object | yes | triage conclusion |
| `related_refs` | object | no | related KB / playbooks / cases |
| `lessons_learned` | array | no | reusable lessons |
| `tags` | array | no | tags |

### Example skeleton
```json
{
"case_id": "CASE-2026-0001",
"title": "Potential phishing email targeting finance user",
"scenario": "phishing",
"alert_type": "mail_suspicious_attachment",
"severity": "high",
"status": "confirmed",
"time_window": {
"start": "2026-04-01T09:10:00+08:00",
"end": "2026-04-01T11:30:00+08:00"
},
"summary": "Finance user received an invoice-themed phishing email with a malicious HTML attachment.",
"alert_source": "Secure Email Gateway",
"entities": {
"users": ["alice@corp.example"],
"hosts": ["FIN-LAPTOP-12"],
"mailboxes": ["alice@corp.example"]
},
"observables": {
"sender_emails": ["billing@vendor-payments.com"],
"domains": ["vendor-payments.com"],
"urls": ["https://vendor-payments-login.com/review"],
"hashes": ["sha256:..."],
"ips": ["198.51.100.20"]
},
"evidence": [
"The sender domain was newly observed and failed DMARC.",
"The attachment redirected the user to a credential harvesting page."
],
"investigation_steps": [
"Validate sender reputation and authentication results.",
"Detonate attachment in sandbox.",
"Check click telemetry and account sign-in logs."
],
"conclusion": {
"verdict": "true_positive",
"reason": "Multiple aligned phishing indicators and confirmed click behavior.",
"recommended_actions": [
"Reset the impacted account password.",
"Block the sender domain and landing URL."
]
},
"related_refs": {
"playbooks": ["PB-PHISH-001"],
"kb": ["KB-PHISH-HEADER-CHECK"],
"cases": []
},
"lessons_learned": [
"Invoice-themed phishing remains effective against finance users."
],
"tags": ["phishing", "email", "credential-harvest"]
}
```
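Since the field table marks some fields as required, ingestion can validate each mock case before normalization; a minimal sketch:

```python
REQUIRED_CASE_FIELDS = [
    "case_id", "title", "scenario", "alert_type", "severity", "status",
    "time_window", "summary", "alert_source", "entities", "evidence",
    "investigation_steps", "conclusion",
]


def missing_case_fields(case: dict) -> list[str]:
    """Return the required fields absent from a mock case record."""
    return [f for f in REQUIRED_CASE_FIELDS if f not in case]
```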
## Mock KB / Playbook raw format
Each knowledge entry is one JSON file; suggested file name:
```text
<doc_id>.json
```

### Field definitions

| Field | Type | Required | Description |
|---|---|---:|---|
| `doc_id` | string | yes | unique document ID |
| `doc_type` | string | yes | `kb` / `playbook` / `report_summary` |
| `title` | string | yes | title |
| `scenario` | string | yes | applicable scenario |
| `summary` | string | yes | core summary |
| `applicability` | array | no | applicability conditions |
| `key_points` | array | yes | core knowledge points |
| `investigation_guidance` | array | no | investigation guidance |
| `decision_points` | array | no | key decision points |
| `related_entities` | object | no | related entities / TTPs / IOCs |
| `related_refs` | object | no | related documents |
| `tags` | array | no | tags |
| `updated_at` | string | no | update time |

## Normalization output targets

### Normalized case structure
Suggested output fields of the normalization script:
- `id`
- `memory_type` = `case`
- `scenario`
- `title`
- `abstract`
- `verdict`
- `severity`
- `entities`
- `observables`
- `evidence`
- `patterns`
- `related_refs`
- `source_path`
- `tags`

### Normalized KB structure
Suggested output fields of the normalization script:
- `id`
- `memory_type` = `knowledge`
- `doc_type`
- `scenario`
- `title`
- `abstract`
- `key_points`
- `investigation_guidance`
- `decision_points`
- `related_refs`
- `source_path`
- `tags`

## Retrieval test suggestions
At the mock-data stage, validate first that:
- phishing cases recall the phishing playbook and similar phishing cases
- O365 suspicious-login cases recall login-anomaly KB entries and similar cases
- true-positive and false-positive cases are distinguishable and keep distinct patterns
- recall results include key evidence / decision points


@@ -1,68 +0,0 @@
# System Positioning

## Current positioning
`memory_gateway` is not the complete SOC memory system; it is the unified context entry layer of the overall design.

Its current responsibilities:
- Provide a unified MCP / REST entry point for AI agents
- Forward retrieval and write requests to OpenViking
- Provide basic auth, protocol compatibility, and gateway capabilities
- Serve as the thinnest access layer of the multi-agent shared-memory system

It does not directly own:
- Batch import of raw knowledge sources
- High-value memory extraction and filtering
- Human knowledge curation in the Obsidian vault
- EverMemOS long-term memory curation and evolution
- Evaluation datasets and experiment workflow management

## Position in the overall SOC memory system

```text
SOC data sources
  KB / Playbook / monthly reports / reports / tickets / intel / historical cases
        |
        v
Skills / Pipeline
  ingest / extract / classify / summarize / commit / prune
        |
        v
memory_gateway
  unified entry layer: MCP / REST / auth / routing
        |
        v
OpenViking
  unified context / memory / resource / skill layer
    |                  |
    v                  v
Obsidian Vault       EverMemOS
  human curation       long-term curation
```

## Suggested next-phase modules
Split upcoming POC work into these modules:
- `docs/`
  System design, data model, namespace conventions
- `poc/skills/`
  Retrieval, extraction, and persistence skills
- `poc/pipeline/`
  Import flows for tickets, intel, and historical cases
- `poc/obsidian-vault/`
  Human-maintained knowledge and case-note templates
- `poc/evermemos/`
  Long-term memory curation logic and policies
- `poc/evaluation/`
  Datasets, evaluation scripts, results

## Repository boundary
Keep this repository within the "gateway project" boundary:
- Keep the service entry point, OpenViking integration, config, protocols, and tests
- Add system design docs and the POC skeleton directories
- Avoid piling up business rules, bulk import scripts, or the vault content itself


@@ -1,12 +0,0 @@
# Evaluation

This directory holds POC evaluation material.

Suggested metrics:
- Similar-case hit rate
- Triage time reduction
- Verdict accuracy
- Human satisfaction

The POC should focus on one or two SOC case types first.
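The first metric, similar-case hit rate, can be computed per query against human-labeled expected cases; a sketch (data shapes are illustrative):

```python
def hit_rate(results_by_query: dict[str, list[str]], expected_by_query: dict[str, set[str]]) -> float:
    """Fraction of queries whose retrieved case IDs contain at least one expected ID."""
    if not expected_by_query:
        return 0.0
    hits = sum(
        1
        for query, expected in expected_by_query.items()
        if expected & set(results_by_query.get(query, []))
    )
    return hits / len(expected_by_query)
```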


@@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-1001",
"title": "Impossible travel login followed by MFA prompt fatigue",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_impossible_travel",
"severity": "high",
"status": "confirmed",
"time_window": {"start": "2026-04-02T22:10:00+08:00", "end": "2026-04-02T23:30:00+08:00"},
"summary": "User account showed impossible travel between Shanghai and Amsterdam, followed by repeated MFA prompts and successful sign-in.",
"alert_source": "Microsoft Entra ID",
"entities": {"users": ["david@corp.example"], "hosts": ["WS-DAVID-01"], "mailboxes": ["david@corp.example"]},
"observables": {"ips": ["203.0.113.150", "198.51.100.61"], "domains": [], "urls": [], "hashes": []},
"evidence": ["Two successful sign-ins from geographically impossible locations within 15 minutes.", "MFA challenge volume increased abnormally before final success.", "User confirmed they did not initiate overseas login."],
"investigation_steps": ["Review sign-in logs and device IDs.", "Check MFA event sequence.", "Validate user travel status with manager."],
"conclusion": {"verdict": "true_positive", "reason": "Impossible travel plus user denial and MFA fatigue pattern.", "recommended_actions": ["Revoke sessions and reset credentials.", "Review mailbox rules and app consent."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-IMPOSSIBLE-TRAVEL", "KB-O365-MFA-FATIGUE"], "cases": []},
"lessons_learned": ["Impossible travel needs to be combined with user confirmation and MFA telemetry."],
"tags": ["o365", "login", "impossible-travel", "mfa-fatigue"]
}


@@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-1002",
"title": "Legacy protocol sign-in from unfamiliar IP blocked by policy",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_legacy_auth_attempt",
"severity": "medium",
"status": "false_positive",
"time_window": {"start": "2026-04-04T07:50:00+08:00", "end": "2026-04-04T08:10:00+08:00"},
"summary": "Legacy authentication attempt from a cloud IP was blocked; investigation tied it to an approved migration tool test.",
"alert_source": "Microsoft Entra ID",
"entities": {"users": ["svc-migration@corp.example"], "hosts": [], "mailboxes": ["svc-migration@corp.example"]},
"observables": {"ips": ["192.0.2.24"], "domains": [], "urls": [], "hashes": []},
"evidence": ["The account is a known migration service account.", "Source IP matched approved cloud migration vendor range.", "No successful sign-in occurred due to policy block."],
"investigation_steps": ["Review service account inventory.", "Check change ticket for migration activity.", "Validate source IP against vendor allowlist."],
"conclusion": {"verdict": "false_positive", "reason": "Expected migration tool behavior with policy block and approved change window.", "recommended_actions": ["Tune alert suppression for approved migration windows."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-LEGACY-AUTH"], "cases": []},
"lessons_learned": ["Service account context is essential before escalating legacy auth alerts."],
"tags": ["o365", "login", "false-positive", "legacy-auth"]
}


@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-1003",
"title": "Suspicious inbox rule creation after successful foreign login",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_suspicious_inbox_rule_after_login",
"severity": "high",
"status": "confirmed",
"time_window": {"start": "2026-04-06T19:20:00+08:00", "end": "2026-04-06T20:45:00+08:00"},
"summary": "An overseas sign-in to Microsoft 365 was followed by inbox rule creation to hide finance-related emails.",
"alert_source": "Microsoft Defender for Cloud Apps",
"entities": {"users": ["emma@corp.example"], "hosts": ["WS-EMMA-07"], "mailboxes": ["emma@corp.example"]},
"observables": {"ips": ["198.51.100.98"], "domains": [], "urls": [], "hashes": []},
"evidence": ["Successful sign-in from untrusted ASN.", "Inbox rule moved wire transfer emails to RSS Feeds folder.", "Mailbox audit showed rule creation minutes after login."],
"investigation_steps": ["Review mailbox audit logs.", "Export suspicious inbox rules.", "Check for OAuth app consent and forwarding settings."],
"conclusion": {"verdict": "true_positive", "reason": "Account compromise indicators plus malicious inbox rule persistence.", "recommended_actions": ["Remove malicious rules.", "Reset account and revoke refresh tokens."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-INBOX-RULE-ABUSE", "KB-O365-IMPOSSIBLE-TRAVEL"], "cases": []},
"lessons_learned": ["Mailbox rule inspection should be default for suspicious O365 login cases."],
"tags": ["o365", "login", "inbox-rule", "account-compromise"]
}


@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-1004",
"title": "Multiple failed logins from residential proxy but no successful access",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_password_spray_attempt",
"severity": "medium",
"status": "pending",
"time_window": {"start": "2026-04-08T02:00:00+08:00", "end": "2026-04-08T03:10:00+08:00"},
"summary": "Repeated failed Microsoft 365 sign-in attempts targeted one user from a residential proxy network, with no successful authentication observed.",
"alert_source": "Microsoft Entra ID",
"entities": {"users": ["frank@corp.example"], "hosts": [], "mailboxes": ["frank@corp.example"]},
"observables": {"ips": ["203.0.113.201"], "domains": [], "urls": [], "hashes": []},
"evidence": ["High-volume failed attempts over a short period.", "Source IP attributed to a residential proxy provider.", "No matching successful sign-in or MFA event found."],
"investigation_steps": ["Check password spray pattern across tenant.", "Confirm user recent password reset history.", "Review conditional access outcomes."],
"conclusion": {"verdict": "uncertain", "reason": "Suspicious authentication pattern but no confirmed access or downstream activity.", "recommended_actions": ["Monitor account closely.", "Consider temporary sign-in risk remediation."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-IMPOSSIBLE-TRAVEL"], "cases": []},
"lessons_learned": ["Pending cases should still capture reusable spray indicators without overcommitting verdict."],
"tags": ["o365", "login", "password-spray", "pending"]
}


@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-1005",
"title": "Traveling executive triggered impossible travel but activity was legitimate",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_impossible_travel",
"severity": "medium",
"status": "false_positive",
"time_window": {"start": "2026-04-09T09:00:00+08:00", "end": "2026-04-09T09:40:00+08:00"},
"summary": "Executive account triggered impossible travel due to corporate VPN exit node while the user was on an approved overseas trip.",
"alert_source": "Microsoft Entra ID",
"entities": {"users": ["grace@corp.example"], "hosts": ["VIP-LAPTOP-01"], "mailboxes": ["grace@corp.example"]},
"observables": {"ips": ["192.0.2.90", "203.0.113.77"], "domains": [], "urls": [], "hashes": []},
"evidence": ["Approved travel request existed.", "One login originated from corporate VPN exit node.", "Device and user agent were consistent with known user profile."],
"investigation_steps": ["Check travel approval and itinerary.", "Review VPN egress mapping.", "Compare user agent and managed device posture."],
"conclusion": {"verdict": "false_positive", "reason": "Legitimate travel combined with VPN routing caused impossible travel signal.", "recommended_actions": ["Document travel context and improve analyst checklist."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-IMPOSSIBLE-TRAVEL"], "cases": []},
"lessons_learned": ["Impossible travel should consider approved travel and VPN topology before escalation."],
"tags": ["o365", "login", "false-positive", "travel"]
}


@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-0001",
"title": "Finance user received invoice-themed phishing email",
"scenario": "phishing",
"alert_type": "mail_suspicious_attachment",
"severity": "high",
"status": "confirmed",
"time_window": {"start": "2026-04-01T09:10:00+08:00", "end": "2026-04-01T11:30:00+08:00"},
"summary": "Finance user received an invoice-themed phishing email containing a malicious HTML attachment that redirected to a credential harvesting page.",
"alert_source": "Secure Email Gateway",
"entities": {"users": ["alice@corp.example"], "hosts": ["FIN-LAPTOP-12"], "mailboxes": ["alice@corp.example"]},
"observables": {"sender_emails": ["billing@vendor-payments.com"], "domains": ["vendor-payments.com", "vendor-payments-login.com"], "urls": ["https://vendor-payments-login.com/review"], "ips": ["198.51.100.20"], "hashes": ["sha256:phish0001"]},
"evidence": ["Sender domain was newly observed and failed DMARC.", "Attachment redirected to a fake Microsoft 365 login page.", "User clicked the link before mail quarantine completed."],
"investigation_steps": ["Validate sender authentication results.", "Detonate HTML attachment in sandbox.", "Check mailbox click telemetry and account sign-in logs."],
"conclusion": {"verdict": "true_positive", "reason": "Aligned phishing indicators and confirmed click behavior.", "recommended_actions": ["Reset impacted account password.", "Block sender domain and landing URL.", "Hunt for similar emails in tenant."]},
"related_refs": {"playbooks": ["PB-PHISH-001"], "kb": ["KB-PHISH-HEADER-CHECK", "KB-CRED-HARVEST-PATTERNS"], "cases": []},
"lessons_learned": ["Invoice lure remains effective against finance users."],
"tags": ["phishing", "email", "credential-harvest", "finance"]
}


@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-0002",
"title": "Payroll notification email flagged but determined benign",
"scenario": "phishing",
"alert_type": "mail_suspicious_link",
"severity": "medium",
"status": "false_positive",
"time_window": {"start": "2026-04-03T08:40:00+08:00", "end": "2026-04-03T09:20:00+08:00"},
"summary": "Payroll update email was flagged due to a shortened URL, but the destination was the approved HR vendor portal.",
"alert_source": "Secure Email Gateway",
"entities": {"users": ["bob@corp.example"], "hosts": ["HR-LAPTOP-03"], "mailboxes": ["bob@corp.example"]},
"observables": {"sender_emails": ["notify@hr-vendor.example"], "domains": ["hr-vendor.example"], "urls": ["https://bit.ly/hr-portal-example"], "ips": [], "hashes": []},
"evidence": ["Sender domain aligned with SPF and DKIM.", "Destination domain matched approved supplier inventory.", "No credential prompt anomaly observed."],
"investigation_steps": ["Expand shortened URL.", "Validate vendor domain against allowlist.", "Review prior communication pattern with HR users."],
"conclusion": {"verdict": "false_positive", "reason": "Trusted vendor communication with expected destination.", "recommended_actions": ["Tune mail rule to reduce noisy alerts for approved HR vendor."]},
"related_refs": {"playbooks": ["PB-PHISH-001"], "kb": ["KB-PHISH-HEADER-CHECK"], "cases": []},
"lessons_learned": ["Short URLs alone should not drive phishing conclusion without destination validation."],
"tags": ["phishing", "email", "false-positive", "vendor"]
}


@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-0003",
"title": "Executive impersonation email requested urgent wire transfer",
"scenario": "phishing",
"alert_type": "mail_bec_impersonation",
"severity": "high",
"status": "confirmed",
"time_window": {"start": "2026-04-05T13:15:00+08:00", "end": "2026-04-05T15:00:00+08:00"},
"summary": "An executive impersonation email targeted finance staff with an urgent wire transfer request from a lookalike domain.",
"alert_source": "Secure Email Gateway",
"entities": {"users": ["carol@corp.example"], "hosts": ["FIN-LAPTOP-08"], "mailboxes": ["carol@corp.example"]},
"observables": {"sender_emails": ["ceo@c0rp-example.com"], "domains": ["c0rp-example.com"], "urls": [], "ips": ["203.0.113.45"], "hashes": []},
"evidence": ["Lookalike domain used numeric substitution.", "Language pressure matched prior BEC pattern.", "No historical communication from sender domain."],
"investigation_steps": ["Compare sender domain with corporate domain.", "Review historical communication graph.", "Confirm with executive assistant out of band."],
"conclusion": {"verdict": "true_positive", "reason": "Strong BEC indicators and confirmed spoofed sender identity.", "recommended_actions": ["Block sender domain.", "Notify finance team and update awareness content."]},
"related_refs": {"playbooks": ["PB-PHISH-001"], "kb": ["KB-CRED-HARVEST-PATTERNS"], "cases": []},
"lessons_learned": ["Lookalike domains need strong entity normalization in retrieval and detection logic."],
"tags": ["phishing", "bec", "executive-impersonation"]
}


@ -1,19 +0,0 @@
{
"case_id": "CASE-2026-0004",
"title": "Shared mailbox received OneDrive lure with HTML attachment",
"scenario": "phishing",
"alert_type": "mail_suspicious_attachment",
"severity": "medium",
"status": "confirmed",
"time_window": {"start": "2026-04-07T10:00:00+08:00", "end": "2026-04-07T12:05:00+08:00"},
"summary": "Shared finance mailbox received a fake OneDrive notification with an HTML attachment that led to credential collection.",
"alert_source": "Secure Email Gateway",
"entities": {"users": ["shared-finance@corp.example"], "hosts": [], "mailboxes": ["shared-finance@corp.example"]},
"observables": {"sender_emails": ["noreply@sharepoint-notify.com"], "domains": ["sharepoint-notify.com"], "urls": ["https://onedrive-review-login.example"], "ips": ["198.51.100.87"], "hashes": ["sha256:phish0004"]},
"evidence": ["Attachment rendered a fake Microsoft sign-in page.", "Landing page hosted outside Microsoft IP space.", "Mail body reused branding from previous phishing campaign."],
"investigation_steps": ["Render attachment safely.", "Review URL hosting provider reputation.", "Search tenant for same subject and sender."],
"conclusion": {"verdict": "true_positive", "reason": "Credential harvesting lure with campaign reuse indicators.", "recommended_actions": ["Block sender and URL.", "Search and purge duplicate emails."]},
"related_refs": {"playbooks": ["PB-PHISH-001"], "kb": ["KB-CRED-HARVEST-PATTERNS"], "cases": ["CASE-2026-0001"]},
"lessons_learned": ["Campaign reuse makes historical phishing similarity especially valuable."],
"tags": ["phishing", "email", "onedrive-lure"]
}


@ -1,15 +0,0 @@
{
"doc_id": "KB-CRED-HARVEST-PATTERNS",
"doc_type": "kb",
"title": "Credential Harvesting Indicators",
"scenario": "phishing",
"summary": "Common indicators that a phishing case involves credential harvesting rather than simple spam or benign mail.",
"applicability": ["mail_suspicious_attachment", "mail_suspicious_link"],
"key_points": ["Landing page mimics Microsoft 365 or common SaaS login pages.", "HTML attachment often acts as a redirector rather than containing malware.", "Credential harvest campaigns frequently reuse branding and lures across tenants."],
"investigation_guidance": ["Capture full redirect chain.", "Look for post-click login anomalies in identity logs.", "Search for same lure across multiple mailboxes."],
"decision_points": ["User click plus sign-in anomaly greatly increases confidence.", "Branding reuse can help link separate phishing cases into one campaign."],
"related_entities": {"ttps": ["T1566.002"], "iocs": []},
"related_refs": {"playbooks": ["PB-PHISH-001"], "cases": []},
"tags": ["kb", "phishing", "credential-harvest"],
"updated_at": "2026-04-10T09:25:00+08:00"
}


@ -1,15 +0,0 @@
{
"doc_id": "KB-O365-IMPOSSIBLE-TRAVEL",
"doc_type": "kb",
"title": "Interpreting O365 Impossible Travel Alerts",
"scenario": "o365_suspicious_login",
"summary": "Guidance for validating impossible travel alerts, including VPN, proxy, and approved travel false-positive conditions.",
"applicability": ["azuread_impossible_travel"],
"key_points": ["Impossible travel must be validated against user travel context.", "VPN egress and cloud proxy routing are common false-positive sources.", "Pair sign-in anomaly with MFA, mailbox, or device anomalies before concluding compromise."],
"investigation_guidance": ["Validate source ASN and IP history.", "Check user-approved travel or remote work context.", "Compare device ID and user agent consistency."],
"decision_points": ["User denial of travel plus new device strongly increases confidence.", "Approved travel and trusted VPN topology reduce confidence."],
"related_entities": {"ttps": ["T1078"], "iocs": []},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "cases": []},
"tags": ["kb", "o365", "impossible-travel"],
"updated_at": "2026-04-10T09:30:00+08:00"
}


@ -1,15 +0,0 @@
{
"doc_id": "KB-O365-INBOX-RULE-ABUSE",
"doc_type": "kb",
"title": "Inbox Rule Abuse After Account Compromise",
"scenario": "o365_suspicious_login",
"summary": "Common mailbox persistence behaviors after O365 account compromise, especially rule creation to hide or forward finance emails.",
"applicability": ["azuread_suspicious_inbox_rule_after_login"],
"key_points": ["Attackers often hide financial emails using move-to-folder rules.", "Forwarding and delete rules are strong post-compromise indicators.", "Mailbox audit logs should be reviewed immediately after suspicious login confirmation."],
"investigation_guidance": ["Enumerate all inbox rules and forwarding settings.", "Check mailbox audit timeline around suspicious sign-in.", "Review OAuth consents if inbox rules are absent but suspicious mail actions continue."],
"decision_points": ["Inbox rule creation shortly after suspicious login strongly supports compromise verdict."],
"related_entities": {"ttps": ["T1114"], "iocs": []},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "cases": []},
"tags": ["kb", "o365", "inbox-rule"],
"updated_at": "2026-04-10T09:40:00+08:00"
}


@ -1,15 +0,0 @@
{
"doc_id": "KB-O365-MFA-FATIGUE",
"doc_type": "kb",
"title": "MFA Fatigue Detection Notes",
"scenario": "o365_suspicious_login",
"summary": "Patterns for identifying MFA fatigue / push bombing during account compromise attempts.",
"applicability": ["azuread_impossible_travel", "azuread_suspicious_login"],
"key_points": ["Repeated MFA prompts preceding one successful prompt is suspicious.", "User-reported prompt fatigue is strong supporting evidence.", "MFA fatigue is often coupled with credential theft rather than password spray alone."],
"investigation_guidance": ["Review MFA event counts and timing.", "Check if the user acknowledged unexpected prompts.", "Look for subsequent session hijacking or mailbox abuse."],
"decision_points": ["Prompt flood plus user denial usually warrants immediate containment."],
"related_entities": {"ttps": ["T1621"], "iocs": []},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "cases": []},
"tags": ["kb", "o365", "mfa-fatigue"],
"updated_at": "2026-04-10T09:35:00+08:00"
}


@ -1,15 +0,0 @@
{
"doc_id": "KB-PHISH-HEADER-CHECK",
"doc_type": "kb",
"title": "Phishing Header Validation Checklist",
"scenario": "phishing",
"summary": "Checklist for validating sender identity, domain reputation, and authentication results in suspected phishing emails.",
"applicability": ["mail_suspicious_attachment", "mail_suspicious_link", "mail_bec_impersonation"],
"key_points": ["Review SPF, DKIM, and DMARC alignment.", "Compare display name, envelope sender, and reply-to anomalies.", "Check domain age and known-good communication history."],
"investigation_guidance": ["Use message trace and header parser.", "Compare sender domain with vendor allowlist.", "Escalate lookalike domains even when content appears business-relevant."],
"decision_points": ["Newly observed domains with failed auth are high-risk.", "Benign vendor mail often has consistent historical sending patterns."],
"related_entities": {"ttps": ["T1566.001"], "iocs": []},
"related_refs": {"playbooks": ["PB-PHISH-001"], "cases": []},
"tags": ["kb", "phishing", "email-header"],
"updated_at": "2026-04-10T09:20:00+08:00"
}


@ -1,15 +0,0 @@
{
"doc_id": "PB-O365-LOGIN-001",
"doc_type": "playbook",
"title": "O365 Suspicious Login Investigation Playbook",
"scenario": "o365_suspicious_login",
"summary": "Standard investigation steps for suspicious Entra ID sign-ins, impossible travel, MFA abuse, and follow-on mailbox abuse.",
"applicability": ["azuread_impossible_travel", "azuread_legacy_auth_attempt", "azuread_suspicious_inbox_rule_after_login", "azuread_password_spray_attempt"],
"key_points": ["Confirm user travel and business context.", "Review sign-in logs, device IDs, and user agents.", "Inspect downstream actions such as inbox rules, app consent, and forwarding."],
"investigation_guidance": ["Correlate MFA telemetry with sign-in sequence.", "Check risky sign-ins and risky users views.", "Revoke sessions and reset credentials when compromise is confirmed."],
"decision_points": ["Impossible travel alone is insufficient without corroborating evidence.", "Inbox rule creation after foreign login strongly increases confidence of compromise."],
"related_entities": {"ttps": ["T1078"], "iocs": []},
"related_refs": {"kb": ["KB-O365-IMPOSSIBLE-TRAVEL", "KB-O365-MFA-FATIGUE", "KB-O365-INBOX-RULE-ABUSE"], "cases": []},
"tags": ["playbook", "o365", "login"],
"updated_at": "2026-04-10T09:10:00+08:00"
}


@ -1,15 +0,0 @@
{
"doc_id": "PB-PHISH-001",
"doc_type": "playbook",
"title": "Phishing Email Investigation Playbook",
"scenario": "phishing",
"summary": "Standard investigation steps for suspicious email, credential harvesting, and BEC-like cases.",
"applicability": ["mail_suspicious_attachment", "mail_suspicious_link", "mail_bec_impersonation"],
"key_points": ["Validate sender authentication results.", "Inspect landing URL and attachment behavior.", "Check whether the user clicked or submitted credentials."],
"investigation_guidance": ["Query email telemetry for same sender, subject, or URL.", "Review mailbox click logs and endpoint browser artifacts.", "Reset credentials if submission is suspected."],
"decision_points": ["If sender auth fails and user interaction exists, treat as likely phishing.", "If destination is allowlisted and communication pattern is expected, investigate false positive path."],
"related_entities": {"ttps": ["T1566"], "iocs": []},
"related_refs": {"kb": ["KB-PHISH-HEADER-CHECK", "KB-CRED-HARVEST-PATTERNS"], "cases": []},
"tags": ["playbook", "phishing", "email"],
"updated_at": "2026-04-10T09:00:00+08:00"
}


@ -1,65 +0,0 @@
{
"id": "CASE-2026-0001",
"memory_type": "case",
"scenario": "phishing",
"title": "Finance user received invoice-themed phishing email",
"abstract": "Finance user received an invoice-themed phishing email containing a malicious HTML attachment that redirected to a credential harvesting page.",
"verdict": "true_positive",
"severity": "high",
"entities": {
"users": [
"alice@corp.example"
],
"hosts": [
"FIN-LAPTOP-12"
],
"mailboxes": [
"alice@corp.example"
]
},
"observables": {
"sender_emails": [
"billing@vendor-payments.com"
],
"domains": [
"vendor-payments.com",
"vendor-payments-login.com"
],
"urls": [
"https://vendor-payments-login.com/review"
],
"ips": [
"198.51.100.20"
],
"hashes": [
"sha256:phish0001"
]
},
"evidence": [
"Sender domain was newly observed and failed DMARC.",
"Attachment redirected to a fake Microsoft 365 login page.",
"User clicked the link before mail quarantine completed."
],
"patterns": [
"verdict:true_positive",
"scenario:phishing",
"alert_type:mail_suspicious_attachment"
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"kb": [
"KB-PHISH-HEADER-CHECK",
"KB-CRED-HARVEST-PATTERNS"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/phishing/CASE-2026-0001.json",
"tags": [
"phishing",
"email",
"credential-harvest",
"finance"
]
}


@ -1,59 +0,0 @@
{
"id": "CASE-2026-0002",
"memory_type": "case",
"scenario": "phishing",
"title": "Payroll notification email flagged but determined benign",
"abstract": "Payroll update email was flagged due to a shortened URL, but the destination was the approved HR vendor portal.",
"verdict": "false_positive",
"severity": "medium",
"entities": {
"users": [
"bob@corp.example"
],
"hosts": [
"HR-LAPTOP-03"
],
"mailboxes": [
"bob@corp.example"
]
},
"observables": {
"sender_emails": [
"notify@hr-vendor.example"
],
"domains": [
"hr-vendor.example"
],
"urls": [
"https://bit.ly/hr-portal-example"
],
"ips": [],
"hashes": []
},
"evidence": [
"Sender domain aligned with SPF and DKIM.",
"Destination domain matched approved supplier inventory.",
"No credential prompt anomaly observed."
],
"patterns": [
"verdict:false_positive",
"scenario:phishing",
"alert_type:mail_suspicious_link"
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"kb": [
"KB-PHISH-HEADER-CHECK"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/phishing/CASE-2026-0002.json",
"tags": [
"phishing",
"email",
"false-positive",
"vendor"
]
}


@ -1,58 +0,0 @@
{
"id": "CASE-2026-0003",
"memory_type": "case",
"scenario": "phishing",
"title": "Executive impersonation email requested urgent wire transfer",
"abstract": "An executive impersonation email targeted finance staff with an urgent wire transfer request from a lookalike domain.",
"verdict": "true_positive",
"severity": "high",
"entities": {
"users": [
"carol@corp.example"
],
"hosts": [
"FIN-LAPTOP-08"
],
"mailboxes": [
"carol@corp.example"
]
},
"observables": {
"sender_emails": [
"ceo@c0rp-example.com"
],
"domains": [
"c0rp-example.com"
],
"urls": [],
"ips": [
"203.0.113.45"
],
"hashes": []
},
"evidence": [
"Lookalike domain used numeric substitution.",
"Language pressure matched prior BEC pattern.",
"No historical communication from sender domain."
],
"patterns": [
"verdict:true_positive",
"scenario:phishing",
"alert_type:mail_bec_impersonation"
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"kb": [
"KB-CRED-HARVEST-PATTERNS"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/phishing/CASE-2026-0003.json",
"tags": [
"phishing",
"bec",
"executive-impersonation"
]
}


@ -1,62 +0,0 @@
{
"id": "CASE-2026-0004",
"memory_type": "case",
"scenario": "phishing",
"title": "Shared mailbox received OneDrive lure with HTML attachment",
"abstract": "Shared finance mailbox received a fake OneDrive notification with an HTML attachment that led to credential collection.",
"verdict": "true_positive",
"severity": "medium",
"entities": {
"users": [
"shared-finance@corp.example"
],
"hosts": [],
"mailboxes": [
"shared-finance@corp.example"
]
},
"observables": {
"sender_emails": [
"noreply@sharepoint-notify.com"
],
"domains": [
"sharepoint-notify.com"
],
"urls": [
"https://onedrive-review-login.example"
],
"ips": [
"198.51.100.87"
],
"hashes": [
"sha256:phish0004"
]
},
"evidence": [
"Attachment rendered a fake Microsoft sign-in page.",
"Landing page hosted outside Microsoft IP space.",
"Mail body reused branding from previous phishing campaign."
],
"patterns": [
"verdict:true_positive",
"scenario:phishing",
"alert_type:mail_suspicious_attachment"
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"kb": [
"KB-CRED-HARVEST-PATTERNS"
],
"cases": [
"CASE-2026-0001"
]
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/phishing/CASE-2026-0004.json",
"tags": [
"phishing",
"email",
"onedrive-lure"
]
}


@ -1,56 +0,0 @@
{
"id": "CASE-2026-1001",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Impossible travel login followed by MFA prompt fatigue",
"abstract": "User account showed impossible travel between Shanghai and Amsterdam, followed by repeated MFA prompts and successful sign-in.",
"verdict": "true_positive",
"severity": "high",
"entities": {
"users": [
"david@corp.example"
],
"hosts": [
"WS-DAVID-01"
],
"mailboxes": [
"david@corp.example"
]
},
"observables": {
"ips": [
"203.0.113.150",
"198.51.100.61"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"Two successful sign-ins from geographically impossible locations within 15 minutes.",
"MFA challenge volume increased abnormally before final success.",
"User confirmed they did not initiate overseas login."
],
"patterns": [
"verdict:true_positive",
"scenario:o365_suspicious_login",
"alert_type:azuread_impossible_travel"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-IMPOSSIBLE-TRAVEL",
"KB-O365-MFA-FATIGUE"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1001.json",
"tags": [
"o365",
"login",
"impossible-travel",
"mfa-fatigue"
]
}


@ -1,52 +0,0 @@
{
"id": "CASE-2026-1002",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Legacy protocol sign-in from unfamiliar IP blocked by policy",
"abstract": "Legacy authentication attempt from a cloud IP was blocked; investigation tied it to an approved migration tool test.",
"verdict": "false_positive",
"severity": "medium",
"entities": {
"users": [
"svc-migration@corp.example"
],
"hosts": [],
"mailboxes": [
"svc-migration@corp.example"
]
},
"observables": {
"ips": [
"192.0.2.24"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"The account is a known migration service account.",
"Source IP matched approved cloud migration vendor range.",
"No successful sign-in occurred due to policy block."
],
"patterns": [
"verdict:false_positive",
"scenario:o365_suspicious_login",
"alert_type:azuread_legacy_auth_attempt"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-LEGACY-AUTH"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1002.json",
"tags": [
"o365",
"login",
"false-positive",
"legacy-auth"
]
}


@ -1,55 +0,0 @@
{
"id": "CASE-2026-1003",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Suspicious inbox rule creation after successful foreign login",
"abstract": "An overseas sign-in to Microsoft 365 was followed by inbox rule creation to hide finance-related emails.",
"verdict": "true_positive",
"severity": "high",
"entities": {
"users": [
"emma@corp.example"
],
"hosts": [
"WS-EMMA-07"
],
"mailboxes": [
"emma@corp.example"
]
},
"observables": {
"ips": [
"198.51.100.98"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"Successful sign-in from untrusted ASN.",
"Inbox rule moved wire transfer emails to RSS Feeds folder.",
"Mailbox audit showed rule creation minutes after login."
],
"patterns": [
"verdict:true_positive",
"scenario:o365_suspicious_login",
"alert_type:azuread_suspicious_inbox_rule_after_login"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-INBOX-RULE-ABUSE",
"KB-O365-IMPOSSIBLE-TRAVEL"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1003.json",
"tags": [
"o365",
"login",
"inbox-rule",
"account-compromise"
]
}


@ -1,52 +0,0 @@
{
"id": "CASE-2026-1004",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Multiple failed logins from residential proxy but no successful access",
"abstract": "Repeated failed Microsoft 365 sign-in attempts targeted one user from a residential proxy network, with no successful authentication observed.",
"verdict": "uncertain",
"severity": "medium",
"entities": {
"users": [
"frank@corp.example"
],
"hosts": [],
"mailboxes": [
"frank@corp.example"
]
},
"observables": {
"ips": [
"203.0.113.201"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"High-volume failed attempts over a short period.",
"Source IP attributed to a residential proxy provider.",
"No matching successful sign-in or MFA event found."
],
"patterns": [
"verdict:uncertain",
"scenario:o365_suspicious_login",
"alert_type:azuread_password_spray_attempt"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-IMPOSSIBLE-TRAVEL"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1004.json",
"tags": [
"o365",
"login",
"password-spray",
"pending"
]
}


@ -1,55 +0,0 @@
{
"id": "CASE-2026-1005",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Traveling executive triggered impossible travel but activity was legitimate",
"abstract": "Executive account triggered impossible travel due to corporate VPN exit node while the user was on an approved overseas trip.",
"verdict": "false_positive",
"severity": "medium",
"entities": {
"users": [
"grace@corp.example"
],
"hosts": [
"VIP-LAPTOP-01"
],
"mailboxes": [
"grace@corp.example"
]
},
"observables": {
"ips": [
"192.0.2.90",
"203.0.113.77"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"Approved travel request existed.",
"One login originated from corporate VPN exit node.",
"Device and user agent were consistent with known user profile."
],
"patterns": [
"verdict:false_positive",
"scenario:o365_suspicious_login",
"alert_type:azuread_impossible_travel"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-IMPOSSIBLE-TRAVEL"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1005.json",
"tags": [
"o365",
"login",
"false-positive",
"travel"
]
}


@ -1,34 +0,0 @@
{
"id": "KB-CRED-HARVEST-PATTERNS",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "phishing",
"title": "Credential Harvesting Indicators",
"abstract": "Common indicators that a phishing case involves credential harvesting rather than simple spam or benign mail.",
"key_points": [
"Landing page mimics Microsoft 365 or common SaaS login pages.",
"HTML attachment often acts as a redirector rather than containing malware.",
"Credential harvest campaigns frequently reuse branding and lures across tenants."
],
"investigation_guidance": [
"Capture full redirect chain.",
"Look for post-click login anomalies in identity logs.",
"Search for same lure across multiple mailboxes."
],
"decision_points": [
"User click plus sign-in anomaly greatly increases confidence.",
"Branding reuse can help link separate phishing cases into one campaign."
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-CRED-HARVEST-PATTERNS.json",
"tags": [
"kb",
"phishing",
"credential-harvest"
]
}

View File

@ -1,34 +0,0 @@
{
"id": "KB-O365-IMPOSSIBLE-TRAVEL",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "o365_suspicious_login",
"title": "Interpreting O365 Impossible Travel Alerts",
"abstract": "Guidance for validating impossible travel alerts, including VPN, proxy, and approved travel false-positive conditions.",
"key_points": [
"Impossible travel must be validated against user travel context.",
"VPN egress and cloud proxy routing are common false-positive sources.",
"Pair sign-in anomaly with MFA, mailbox, or device anomalies before concluding compromise."
],
"investigation_guidance": [
"Validate source ASN and IP history.",
"Check user-approved travel or remote work context.",
"Compare device ID and user agent consistency."
],
"decision_points": [
"User denial of travel plus new device strongly increases confidence.",
"Approved travel and trusted VPN topology reduce confidence."
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-O365-IMPOSSIBLE-TRAVEL.json",
"tags": [
"kb",
"o365",
"impossible-travel"
]
}

View File

@ -1,33 +0,0 @@
{
"id": "KB-O365-INBOX-RULE-ABUSE",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "o365_suspicious_login",
"title": "Inbox Rule Abuse After Account Compromise",
"abstract": "Common mailbox persistence behaviors after O365 account compromise, especially rule creation to hide or forward finance emails.",
"key_points": [
"Attackers often hide financial emails using move-to-folder rules.",
"Forwarding and delete rules are strong post-compromise indicators.",
"Mailbox audit logs should be reviewed immediately after suspicious login confirmation."
],
"investigation_guidance": [
"Enumerate all inbox rules and forwarding settings.",
"Check mailbox audit timeline around suspicious sign-in.",
"Review OAuth consents if inbox rules are absent but suspicious mail actions continue."
],
"decision_points": [
"Inbox rule creation shortly after suspicious login strongly supports compromise verdict."
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-O365-INBOX-RULE-ABUSE.json",
"tags": [
"kb",
"o365",
"inbox-rule"
]
}

View File

@ -1,33 +0,0 @@
{
"id": "KB-O365-MFA-FATIGUE",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "o365_suspicious_login",
"title": "MFA Fatigue Detection Notes",
"abstract": "Patterns for identifying MFA fatigue / push bombing during account compromise attempts.",
"key_points": [
"Repeated MFA prompts preceding one successful prompt is suspicious.",
"User-reported prompt fatigue is strong supporting evidence.",
"MFA fatigue is often coupled with credential theft rather than password spray alone."
],
"investigation_guidance": [
"Review MFA event counts and timing.",
"Check if the user acknowledged unexpected prompts.",
"Look for subsequent session hijacking or mailbox abuse."
],
"decision_points": [
"Prompt flood plus user denial usually warrants immediate containment."
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-O365-MFA-FATIGUE.json",
"tags": [
"kb",
"o365",
"mfa-fatigue"
]
}

View File

@ -1,34 +0,0 @@
{
"id": "KB-PHISH-HEADER-CHECK",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "phishing",
"title": "Phishing Header Validation Checklist",
"abstract": "Checklist for validating sender identity, domain reputation, and authentication results in suspected phishing emails.",
"key_points": [
"Review SPF, DKIM, and DMARC alignment.",
"Compare display name, envelope sender, and reply-to anomalies.",
"Check domain age and known-good communication history."
],
"investigation_guidance": [
"Use message trace and header parser.",
"Compare sender domain with vendor allowlist.",
"Escalate lookalike domains even when content appears business-relevant."
],
"decision_points": [
"Newly observed domains with failed auth are high-risk.",
"Benign vendor mail often has consistent historical sending patterns."
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-PHISH-HEADER-CHECK.json",
"tags": [
"kb",
"phishing",
"email-header"
]
}

View File

@ -1,36 +0,0 @@
{
"id": "PB-O365-LOGIN-001",
"memory_type": "knowledge",
"doc_type": "playbook",
"scenario": "o365_suspicious_login",
"title": "O365 Suspicious Login Investigation Playbook",
"abstract": "Standard investigation steps for suspicious Entra ID sign-ins, impossible travel, MFA abuse, and follow-on mailbox abuse.",
"key_points": [
"Confirm user travel and business context.",
"Review sign-in logs, device IDs, and user agents.",
"Inspect downstream actions such as inbox rules, app consent, and forwarding."
],
"investigation_guidance": [
"Correlate MFA telemetry with sign-in sequence.",
"Check risky sign-ins and risky users views.",
"Revoke sessions and reset credentials when compromise is confirmed."
],
"decision_points": [
"Impossible travel alone is insufficient without corroborating evidence.",
"Inbox rule creation after foreign login strongly increases confidence of compromise."
],
"related_refs": {
"kb": [
"KB-O365-IMPOSSIBLE-TRAVEL",
"KB-O365-MFA-FATIGUE",
"KB-O365-INBOX-RULE-ABUSE"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/playbooks/PB-O365-LOGIN-001.json",
"tags": [
"playbook",
"o365",
"login"
]
}

View File

@ -1,35 +0,0 @@
{
"id": "PB-PHISH-001",
"memory_type": "knowledge",
"doc_type": "playbook",
"scenario": "phishing",
"title": "Phishing Email Investigation Playbook",
"abstract": "Standard investigation steps for suspicious email, credential harvesting, and BEC-like cases.",
"key_points": [
"Validate sender authentication results.",
"Inspect landing URL and attachment behavior.",
"Check whether the user clicked or submitted credentials."
],
"investigation_guidance": [
"Query email telemetry for same sender, subject, or URL.",
"Review mailbox click logs and endpoint browser artifacts.",
"Reset credentials if submission is suspected."
],
"decision_points": [
"If sender auth fails and user interaction exists, treat as likely phishing.",
"If destination is allowlisted and communication pattern is expected, investigate false positive path."
],
"related_refs": {
"kb": [
"KB-PHISH-HEADER-CHECK",
"KB-CRED-HARVEST-PATTERNS"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/playbooks/PB-PHISH-001.json",
"tags": [
"playbook",
"phishing",
"email"
]
}

View File

@ -1,9 +0,0 @@
# EverMemOS Layer
This directory holds the working logic of the long-term memory consolidation layer.
Key responsibilities:
- Extract long-term memories from episode / process memory
- Deduplicate, merge, update, and decay entries
- Feed results back into OpenViking and Obsidian

View File

@ -0,0 +1,116 @@
---
name: memory-gateway
description: Use this skill when an agent or harness needs reusable memory: search prior context, retrieve OpenViking resources, upload documents into knowledge, summarize arbitrary content with the Memory Gateway LLM, commit final conclusions, or cite related Obsidian notes. This skill is domain-neutral.
version: 2.0.0
metadata:
hermes:
tags: [memory, openviking, obsidian, knowledge, retrieval, summarization, document-ingestion, agent-context]
---
# Memory Gateway
Use this skill as a generic memory layer for any agent / harness. It connects Hermes to the local Memory Gateway at `http://127.0.0.1:1934`, which fronts OpenViking and an Obsidian vault.
## Trigger Rule
Use this skill when the user asks to:
- search prior memory or retrieve related context
- upload a document and make it reusable knowledge
- summarize content and store it as memory/resource
- commit final conclusions, decisions, lessons learned, or research notes
- cite related OpenViking resources or Obsidian notes
- prepare context for another agent or workflow
Do not assume any domain-specific workflow. Treat Memory Gateway as a reusable memory and knowledge entrypoint.
## Environment
Defaults:
- Memory Gateway URL: `http://127.0.0.1:1934`
- Obsidian vault: `/home/tom/memory-gateway/obsidian-vault`
- Default namespace: `memory-gateway`
Optional env vars:
- `MEMORY_GATEWAY_URL`
- `MEMORY_GATEWAY_API_KEY`
- `MEMORY_GATEWAY_OBSIDIAN_VAULT`
## Core Workflows
### 1. Retrieve Context
```bash
python /home/tom/.hermes/skills/memory-gateway/scripts/retrieve_memory.py \
--query "project decision memory gateway LLM summary" \
--uri viking://resources \
--limit 5
```
Use retrieval before answering when prior context may materially improve correctness.
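Under the hood this posts a small JSON body to `POST /api/search`. A minimal sketch of that request body, with field names taken from `retrieve_memory.py` (the query and scope values are illustrative):

```python
import json

# Request body retrieve_memory.py builds for POST /api/search.
# "uri" and "namespace" are optional and only sent when provided.
payload = {
    "query": "project decision memory gateway LLM summary",
    "limit": 5,
    "uri": "viking://resources",
}
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
print(json.loads(body)["query"])
```

Agents that cannot shell out to the helper script can send the same body directly with any HTTP client.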
### 2. Summarize And Commit
```bash
python /home/tom/.hermes/skills/memory-gateway/scripts/commit_summary.py \
--title "Project decision summary" \
--namespace memory-gateway \
--memory-type decision \
--tag project --tag decision \
--persist-as resource \
--text "<final conclusion or reusable knowledge>"
```
This calls `POST /api/summary`, which uses the configured LLM and writes to OpenViking when `persist-as` is not `none`.
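The request body behind that call mirrors the fields `commit_summary.py` sends; the values below are illustrative, and `persist_as` selects whether the summary is stored as a memory, a resource, both, or not persisted at all:

```python
import json

# Body commit_summary.py sends to POST /api/summary (illustrative values).
payload = {
    "content": "<final conclusion or reusable knowledge>",
    "title": "Project decision summary",
    "summary": None,               # optional summary hint
    "namespace": "memory-gateway",
    "memory_type": "decision",
    "tags": ["project", "decision"],
    "source": "hermes:memory-gateway",
    "resource_uri": None,
    "persist_as": "resource",      # memory | resource | both | none
}
body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
print(len(body) > 0)
```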
### 3. Upload Document As Knowledge
```bash
python /home/tom/.hermes/skills/memory-gateway/scripts/upload_knowledge.py \
--file /path/to/document.pdf \
--title "Design Notes" \
--namespace memory-gateway \
--knowledge-type design_doc \
--tags project,design,reference \
--persist-as resource
```
This calls `POST /api/knowledge/upload`: document -> MarkItDown Markdown -> Obsidian note -> LLM summary -> OpenViking resource.
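`upload_knowledge.py` attaches these values as multipart form fields next to the file; its uploader drops empty fields before the request is built, so only populated fields reach the gateway. A standalone sketch of that filtering (field values are illustrative):

```python
# Form fields upload_knowledge.py sends with the multipart upload.
# Empty values are skipped, so only non-empty fields are transmitted.
fields = {
    "title": "Design Notes",
    "namespace": "memory-gateway",
    "knowledge_type": "design_doc",
    "tags": "project,design,reference",  # comma-separated string, not a list
    "source": "",
    "obsidian_dir": "",
    "resource_uri": "",
    "persist_as": "resource",
}
sent = {k: v for k, v in fields.items() if v != ""}
print(sorted(sent))  # → ['knowledge_type', 'namespace', 'persist_as', 'tags', 'title']
```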
### 4. Search Obsidian Notes
```bash
python /home/tom/.hermes/skills/memory-gateway/scripts/search_obsidian.py \
--query "design notes memory gateway" \
--limit 5
```
## Output Template
When using this skill, answer with:
```markdown
## Answer
<direct answer or synthesis>
## Memory / Resource References
- `<title or URI>``<viking://...>` — why it matters
## Obsidian References
- `<note.md>``<relative path>` — why it matters
## Suggested Memory Commit
- commit: yes/no
- namespace:
- memory_type:
- tags:
- resource_uri: if committed
```
## Guardrails
- Do not store raw noisy data as long-term memory when a concise summary is enough.
- Prefer LLM summaries and structured artifacts over full chat transcripts.
- Do not commit secrets, credentials, tokens, private keys, or unnecessary personal data.
- If content is sensitive, summarize and redact before committing.
- If retrieval quality looks noisy, state that and cite only useful results.
- Always report whether a commit/upload actually succeeded and include the returned resource URI when available.

View File

@ -0,0 +1,19 @@
from __future__ import annotations
import json
import os
import urllib.request
from typing import Any
DEFAULT_GATEWAY_URL = os.environ.get("MEMORY_GATEWAY_URL", "http://127.0.0.1:1934")
DEFAULT_GATEWAY_API_KEY = os.environ.get("MEMORY_GATEWAY_API_KEY", "")
def post_json(path: str, payload: dict[str, Any], gateway_url: str = DEFAULT_GATEWAY_URL, api_key: str = DEFAULT_GATEWAY_API_KEY, timeout: int = 120) -> dict[str, Any]:
data = json.dumps(payload, ensure_ascii=False).encode("utf-8")
req = urllib.request.Request(gateway_url.rstrip("/") + path, data=data, method="POST")
req.add_header("Content-Type", "application/json")
if api_key:
req.add_header("X-API-Key", api_key)
with urllib.request.urlopen(req, timeout=timeout) as resp:
return json.loads(resp.read().decode("utf-8"))

View File

@ -0,0 +1,53 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import sys
from pathlib import Path
from _client import DEFAULT_GATEWAY_API_KEY, DEFAULT_GATEWAY_URL, post_json
def load_text(args: argparse.Namespace) -> str:
if args.file:
return Path(args.file).read_text(encoding="utf-8")
if args.text:
return args.text
return sys.stdin.read().strip()
def main() -> None:
parser = argparse.ArgumentParser(description="Summarize arbitrary content with the Gateway LLM and commit it as memory/resource.")
parser.add_argument("--text", help="Text to summarize; stdin is used if omitted")
parser.add_argument("--file", help="File containing text to summarize")
parser.add_argument("--title", default="")
parser.add_argument("--summary", default="", help="Optional summary hint")
parser.add_argument("--namespace", default="memory-gateway")
parser.add_argument("--memory-type", default="summary")
parser.add_argument("--tag", action="append", default=[])
parser.add_argument("--source", default="hermes:memory-gateway")
parser.add_argument("--resource-uri", default="")
parser.add_argument("--persist-as", choices=["memory", "resource", "both", "none"], default="resource")
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL)
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY)
args = parser.parse_args()
content = load_text(args)
if not content:
parser.error("No content provided via --text, --file, or stdin")
payload = {
"content": content,
"title": args.title or None,
"summary": args.summary or None,
"namespace": args.namespace,
"memory_type": args.memory_type,
"tags": args.tag,
"source": args.source,
"resource_uri": args.resource_uri or None,
"persist_as": args.persist_as,
}
print(json.dumps(post_json("/api/summary", payload, args.gateway_url, args.api_key), ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()

View File

@ -0,0 +1,29 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
from _client import DEFAULT_GATEWAY_API_KEY, DEFAULT_GATEWAY_URL, post_json
def main() -> None:
parser = argparse.ArgumentParser(description="Retrieve memory/resources from Memory Gateway.")
parser.add_argument("--query", required=True, help="Search query")
parser.add_argument("--uri", default="", help="Optional OpenViking URI scope, e.g. viking://resources/project")
parser.add_argument("--namespace", default="", help="Optional namespace if URI is not provided")
parser.add_argument("--limit", type=int, default=5)
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL)
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY)
args = parser.parse_args()
payload = {"query": args.query, "limit": args.limit}
if args.uri:
payload["uri"] = args.uri
if args.namespace:
payload["namespace"] = args.namespace
result = post_json("/api/search", payload, args.gateway_url, args.api_key)
print(json.dumps(result, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()

View File

@ -0,0 +1,55 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import re
from pathlib import Path
DEFAULT_VAULT = os.environ.get("MEMORY_GATEWAY_OBSIDIAN_VAULT", "/home/tom/memory-gateway/obsidian-vault")
def tokenize(query: str) -> list[str]:
return [t.lower() for t in re.split(r"[^\w\u4e00-\u9fff.-]+", query) if len(t.strip()) > 1]
def main() -> None:
parser = argparse.ArgumentParser(description="Search local Obsidian Markdown notes by keyword.")
parser.add_argument("--query", required=True)
parser.add_argument("--vault-root", default=DEFAULT_VAULT)
parser.add_argument("--limit", type=int, default=5)
args = parser.parse_args()
root = Path(args.vault_root)
tokens = tokenize(args.query)
results = []
for file in root.rglob("*.md"):
try:
text = file.read_text(encoding="utf-8")
except UnicodeDecodeError:
continue
haystack = (file.name + "\n" + text).lower()
matched = [token for token in tokens if token in haystack]
if not matched:
continue
summary = ""
for line in text.splitlines():
line = line.strip("# -\t")
if len(line) > 30:
summary = line[:240]
break
results.append({
"score": len(matched) * 10 + min(len(matched), 10),
"file_name": file.name,
"relative_path": str(file.relative_to(root)),
"absolute_path": str(file),
"matched_terms": matched,
"summary": summary,
})
results.sort(key=lambda item: item["score"], reverse=True)
print(json.dumps({"query": args.query, "vault_root": str(root), "matched_docs": results[:args.limit]}, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()
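The `tokenize` helper above splits the query on anything that is not a word character, a CJK character, a dot, or a hyphen, then drops single-character tokens and lowercases the rest. A quick standalone check of that behavior (the sample query is made up):

```python
import re

# Same tokenizer as search_obsidian.py: keep word/CJK characters, dots, and
# hyphens; discard tokens of length <= 1; lowercase everything kept.
def tokenize(query: str) -> list[str]:
    return [t.lower() for t in re.split(r"[^\w\u4e00-\u9fff.-]+", query) if len(t.strip()) > 1]

print(tokenize("Memory-Gateway design notes v1.2"))
# → ['memory-gateway', 'design', 'notes', 'v1.2']
```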

View File

@ -0,0 +1,65 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import mimetypes
import urllib.request
from pathlib import Path
from _client import DEFAULT_GATEWAY_API_KEY, DEFAULT_GATEWAY_URL
def multipart_upload(url: str, fields: dict[str, str], file_path: Path, api_key: str = "") -> dict:
boundary = "----memorygatewayboundary"
body = bytearray()
for name, value in fields.items():
if value == "":
continue
body.extend(f"--{boundary}\r\n".encode())
body.extend(f'Content-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode())
body.extend(f"--{boundary}\r\n".encode())
mime = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
body.extend(f'Content-Disposition: form-data; name="file"; filename="{file_path.name}"\r\n'.encode())
body.extend(f"Content-Type: {mime}\r\n\r\n".encode())
body.extend(file_path.read_bytes())
body.extend(b"\r\n")
body.extend(f"--{boundary}--\r\n".encode())
req = urllib.request.Request(url, data=bytes(body), method="POST")
req.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
if api_key:
req.add_header("X-API-Key", api_key)
with urllib.request.urlopen(req, timeout=180) as resp:
return json.loads(resp.read().decode("utf-8"))
def main() -> None:
parser = argparse.ArgumentParser(description="Upload a document, convert to Markdown, save to Obsidian, summarize with LLM, and commit to OpenViking.")
parser.add_argument("--file", required=True)
parser.add_argument("--title", default="")
parser.add_argument("--namespace", default="memory-gateway")
parser.add_argument("--knowledge-type", default="knowledge")
parser.add_argument("--tags", default="")
parser.add_argument("--source", default="")
parser.add_argument("--obsidian-dir", default="")
parser.add_argument("--resource-uri", default="")
parser.add_argument("--persist-as", choices=["memory", "resource", "both", "none"], default="resource")
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL)
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY)
args = parser.parse_args()
fields = {
"title": args.title,
"namespace": args.namespace,
"knowledge_type": args.knowledge_type,
"tags": args.tags,
"source": args.source,
"obsidian_dir": args.obsidian_dir,
"resource_uri": args.resource_uri,
"persist_as": args.persist_as,
}
result = multipart_upload(args.gateway_url.rstrip("/") + "/api/knowledge/upload", fields, Path(args.file), args.api_key)
print(json.dumps(result, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()

View File

@ -1,314 +0,0 @@
---
name: soc-memory-poc
description: Load this skill whenever Hermes is handling SOC alert triage, phishing investigation, suspicious O365 login analysis, historical case lookup, Obsidian note lookup, case-note generation, or committing high-value SOC findings into the SOC Memory POC. It provides a strict triage workflow using the SOC Memory Gateway for search/write operations, local Obsidian vault search, and local SOC Memory POC scripts for Obsidian case note generation.
version: 1.3.0
metadata:
hermes:
tags: [soc, memory, openviking, obsidian, incident-response, case-triage, phishing, o365]
related_skills: [hermes-agent]
---
# SOC Memory POC
Use this skill for SOC case workflows only. It is the default procedure for phishing-style alerts, suspicious O365 / Entra ID login cases, historical case comparison, Obsidian knowledge lookup, and case-note generation.
## Mandatory Trigger Rule
Load this skill immediately when the user asks Hermes to do any of the following:
- investigate or triage a SOC alert
- find similar phishing or O365 suspicious-login cases
- retrieve related KB or playbook context before concluding a case
- check whether Obsidian already has a related case note or knowledge note
- generate an Obsidian case note from a normalized case
- commit a normalized case or knowledge artifact into the SOC memory system
If the task is clearly SOC triage related, do not proceed without using this skill.
## What This Skill Connects To
This skill assumes:
- SOC Memory POC root: `/home/tom/soc_memory_poc`
- Memory Gateway URL: `http://127.0.0.1:1934`
- Gateway API key: empty by default unless configured otherwise
- Obsidian vault root: `/home/tom/soc_memory_poc/obsidian-vault`
Override with environment variables when needed:
- `SOC_MEMORY_POC_ROOT`
- `SOC_MEMORY_GATEWAY_URL`
- `SOC_MEMORY_GATEWAY_API_KEY`
Capabilities:
- search SOC case / knowledge context through the Memory Gateway
- search existing Obsidian notes by case ID, scenario, keywords, or tags
- commit normalized case / knowledge JSON through the Memory Gateway
- generate Obsidian case notes from normalized case JSON
## Triage Workflow
Follow this order unless the user explicitly asks for something narrower.
### Preferred Path For Structured Alerts (Scheme A)
If the user provides a structured alert summary with fields like user, host, sender, subject, attachment, URL, IP, alert type, or known facts, do **not** manually improvise the final answer from memory search results alone.
Use the deterministic triage helper first:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_alert.py \
--scenario phishing \
--alert-type mail_suspicious_attachment \
--user alice@corp.example \
--host FIN-LAPTOP-12 \
--sender billing@vendor-payments.com \
--subject "Invoice overdue notice" \
--attachment invoice_review.html \
--url https://vendor-payments-login.com/review \
--ip 198.51.100.20 \
--summary "Invoice-themed phishing email with HTML attachment and credential harvesting link" \
--fact "DMARC failed" \
--fact "User may have clicked the link"
```
This script performs:
- case retrieval from the SOC Memory Gateway
- knowledge retrieval from the SOC Memory Gateway
- Obsidian note lookup from the local vault
- final markdown rendering with all required sections populated
For Scheme A, prefer returning the script output with only light cleanup. Do not drop the `关联 Memory Retrieval` or `关联 Obsidian 文档` sections.
### Preferred Path For Freeform Alerts Or Raw Email Content
If the user does **not** provide neatly separated fields, or pastes raw email content / ticket text / freeform alert text, do not force them into Scheme A manually.
Use the unified triage helper:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --text "From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12."
```
Or point it at a file:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --file /path/to/raw_email.txt
```
This helper will:
- infer the most likely scenario and alert type
- extract sender, user, subject, attachment, URL, IP, and host when possible
- carry over important facts like DMARC failure, user click, MFA fatigue, inbox rule, or OAuth consent
- run the deterministic triage pipeline so the final answer still contains `关联 Memory Retrieval` and `关联 Obsidian 文档`
For non-structured input, prefer this helper over freehand reasoning.
For all SOC triage inputs, `triage_email.py` is the preferred single entrypoint. It accepts raw text, a file, or optional structured overrides, then calls the deterministic retrieval pipeline.
### Phase 1: Ground The Case
First identify:
- scenario: `phishing`, `o365_suspicious_login`, or another SOC scenario
- likely alert type
- short case summary in one sentence
- key observables if available: sender, URL, domain, IP, mailbox, user, hash
Do not start by writing memory. Start by grounding the case.
### Phase 2: Retrieve Memory Context Before Judging
Before concluding the case, search both related history and related knowledge.
1. Search similar historical cases.
2. Search KB / playbook context.
3. Compare the current case against what comes back.
Run these separately for better precision.
Case search example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
--query "invoice phishing html attachment credential harvesting" \
--kind case --limit 5
```
Knowledge search example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
--query "invoice phishing html attachment credential harvesting" \
--kind knowledge --limit 5
```
O365 example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
--query "impossible travel MFA fatigue inbox rule oauth consent" \
--kind knowledge --limit 5
```
Search scopes:
- `case` -> `viking://resources/soc-memory-poc/case`
- `knowledge` -> `viking://resources/soc-memory-poc/knowledge`
- `all` -> `viking://resources/soc-memory-poc`
### Phase 3: Retrieve Obsidian References
After memory retrieval, look for related notes in the Obsidian vault so the final answer can reference existing human-readable documentation.
Example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_obsidian_docs.py \
--query "invoice phishing html attachment credential harvesting" \
--scenario phishing \
--limit 5
```
Use this to surface:
- existing case notes
- related scenario notes
- notes whose names, tags, or content closely match the current case
When reporting Obsidian references, include at least:
- note title or file name
- relative path under `obsidian-vault/`
- why the note is relevant
### Phase 4: Produce The Triage Output
After retrieval, synthesize a result that includes:
- likely verdict or current assessment
- strongest evidence
- closest matching historical cases
- most relevant KB / playbook guidance
- related Obsidian notes
- recommended next investigation or response actions
Do not just paste raw search output. Summarize why the returned items matter.
## Final Output Template
Unless the user asks for a different format, use this structure for final SOC triage answers:
### 研判结果
- one short paragraph with the likely verdict / current assessment
### 关键证据
- 2 to 5 flat bullets with the strongest evidence
### 关联 Memory Retrieval
- one flat bullet per retrieved case / knowledge item
- include: ID + short relevance reason
- example: `CASE-2026-0001`: same invoice lure + HTML attachment + credential harvesting flow
### 关联 Obsidian 文档
- one flat bullet per note
- include: note name + relative path + one-line relevance reason
- example: `CASE-2026-0001 - Finance user ...md``02_Cases/phishing/...` — already documents a near-identical phishing pattern
### 建议动作
- 2 to 5 flat bullets with next investigation or response steps
If no Obsidian note matches, explicitly say `未找到直接关联的 Obsidian 文档`.
### Phase 5: Generate Case Note When The Case Is Mature Enough
If the task includes documenting the result, or the case already has a normalized JSON artifact, generate an Obsidian case note.
Example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/generate_case_note.py \
--input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json \
--enrich-from-openviking \
--top-k 3
```
This writes under `obsidian-vault/02_Cases/<scenario>/`.
Use `--enrich-from-openviking` by default when the gateway is available.
### Phase 6: Commit Only High-Value Artifacts
If Hermes has a normalized case or knowledge JSON that is worth preserving, commit it through the Gateway.
Example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/commit_case_memory.py \
--input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json
```
Only commit normalized, reusable artifacts. Do not commit raw logs, raw tool traces, or ad hoc chat text.
## Recommended Defaults By Scenario
### Phishing
Default order:
1. search `case`
2. search `knowledge`
3. search related Obsidian notes
4. assess sender auth, lure type, landing page, user interaction
5. generate case note if the case is already structured
6. commit only if the case artifact is normalized and high value
Good query ingredients:
- lure theme
- attachment type
- credential harvesting
- fake M365 login
- sender domain
- landing URL pattern
### O365 Suspicious Login
Default order:
1. search `case`
2. search `knowledge`
3. search related Obsidian notes
4. assess impossible travel, MFA fatigue, inbox rule abuse, OAuth consent, legacy auth
5. generate case note if the case is already structured
6. commit only if the case artifact is normalized and high value
Good query ingredients:
- impossible travel
- MFA fatigue
- inbox rule
- foreign login
- OAuth consent
- legacy protocol
## Failure Handling
If Gateway search fails:
- say explicitly that the SOC Memory Gateway is unavailable
- do not pretend retrieval succeeded
- continue with local reasoning only if the user still wants that
If Obsidian search fails:
- say explicitly that Obsidian references could not be retrieved
- do not invent note names or paths
If note generation fails:
- report the failing path or command
- do not claim the note was written
If commit fails:
- report the URI or file that failed
- do not claim the memory was stored
## Guardrails
- Search `case` and `knowledge` separately before concluding a triage result.
- Search Obsidian notes after memory retrieval so final output can point to human-readable references.
- Prefer narrow, scenario-specific queries over vague long prompts.
- Do not dump raw investigative process into memory.
- Generate case notes from normalized case JSON, not from freeform chat.
- Commit only high-value, reusable artifacts.
- When Gateway results look noisy, explain that retrieval quality may still need SOC-specific reranking.

View File

@ -1,66 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
DEFAULT_GATEWAY_URL = os.environ.get("SOC_MEMORY_GATEWAY_URL", "http://127.0.0.1:1934")
DEFAULT_GATEWAY_API_KEY = os.environ.get("SOC_MEMORY_GATEWAY_API_KEY", "")
def load_item(path: str | Path) -> dict[str, Any]:
with Path(path).open("r", encoding="utf-8") as f:
return json.load(f)
def build_resource_uri(item: dict[str, Any]) -> str:
memory_type = item.get("memory_type")
item_id = item["id"]
if memory_type == "case":
scenario = item.get("scenario", "general")
return f"viking://resources/soc-memory-poc/case/{scenario}/{item_id}.json"
if memory_type == "knowledge":
doc_type = item.get("doc_type", "general")
return f"viking://resources/soc-memory-poc/knowledge/{doc_type}/{item_id}.json"
raise SystemExit(f"Unsupported memory_type: {memory_type}")
def post_json(url: str, payload: dict[str, Any], api_key: str = "") -> dict[str, Any]:
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(url, data=data, method="POST")
req.add_header("Content-Type", "application/json")
if api_key:
req.add_header("X-API-Key", api_key)
with urllib.request.urlopen(req, timeout=60) as resp:
return json.loads(resp.read().decode("utf-8"))
def main() -> None:
parser = argparse.ArgumentParser(description="Commit a normalized SOC case / knowledge JSON through the Memory Gateway.")
parser.add_argument("--input", required=True, help="Normalized JSON file path")
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL, help="Memory Gateway base URL")
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY, help="Gateway API key if required")
args = parser.parse_args()
item = load_item(args.input)
payload = {
"uri": build_resource_uri(item),
"content": json.dumps(item, ensure_ascii=False, indent=2),
"resource_type": "json",
}
try:
result = post_json(args.gateway_url.rstrip("/") + "/api/resource", payload, api_key=args.api_key)
except urllib.error.URLError as exc:
raise SystemExit(f"Gateway resource commit failed: {exc}") from exc
print(json.dumps(result, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()
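As a reference, here is a minimal standalone sketch of the `viking://` URI mapping the commit script above applies before posting to `/api/resource`; the sample `item` dicts are fabricated for illustration:

```python
from typing import Any


def build_resource_uri(item: dict[str, Any]) -> str:
    # Mirrors the commit script: case and knowledge items land under
    # scenario- / doc_type-scoped paths, defaulting to "general".
    memory_type = item.get("memory_type")
    item_id = item["id"]
    if memory_type == "case":
        return f"viking://resources/soc-memory-poc/case/{item.get('scenario', 'general')}/{item_id}.json"
    if memory_type == "knowledge":
        return f"viking://resources/soc-memory-poc/knowledge/{item.get('doc_type', 'general')}/{item_id}.json"
    raise ValueError(f"Unsupported memory_type: {memory_type}")


case = {"memory_type": "case", "id": "case-0001", "scenario": "phishing"}
print(build_resource_uri(case))
# viking://resources/soc-memory-poc/case/phishing/case-0001.json
```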


@ -1,48 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import os
import subprocess
import sys
from pathlib import Path
DEFAULT_POC_ROOT = os.environ.get("SOC_MEMORY_POC_ROOT", "/home/tom/soc_memory_poc")
def main() -> None:
parser = argparse.ArgumentParser(description="Generate an Obsidian case note from a normalized SOC case JSON file.")
parser.add_argument("--input", required=True, help="Normalized case JSON path")
parser.add_argument("--output-dir", default=None, help="Override Obsidian output directory")
parser.add_argument("--enrich-from-openviking", action="store_true", help="Enrich with OpenViking recommendations")
parser.add_argument("--top-k", type=int, default=3, help="Recommendation count per type")
parser.add_argument("--poc-root", default=DEFAULT_POC_ROOT, help="SOC Memory POC root")
args = parser.parse_args()
poc_root = Path(args.poc_root)
script_path = poc_root / "skills" / "summarize_case_skill" / "generate_case_note.py"
if not script_path.exists():
raise SystemExit(f"SOC Memory POC summarize script not found: {script_path}")
output_dir = args.output_dir or str(poc_root / "obsidian-vault" / "02_Cases")
cmd = [
sys.executable,
str(script_path),
"--input",
args.input,
"--output-dir",
output_dir,
"--top-k",
str(args.top_k),
]
if args.enrich_from_openviking:
cmd.append("--enrich-from-openviking")
env = os.environ.copy()
existing = env.get("PYTHONPATH", "")
env["PYTHONPATH"] = str(poc_root) + (os.pathsep + existing if existing else "")
subprocess.run(cmd, check=True, env=env)
if __name__ == "__main__":
main()
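The only subtle part of the wrapper above is how it builds the child environment. A small sketch of the same PYTHONPATH-prepending logic, with fabricated paths:

```python
import os


def prepend_pythonpath(env: dict[str, str], poc_root: str) -> dict[str, str]:
    # Put the POC root first so imports resolve from it, while
    # preserving any PYTHONPATH entries already present.
    env = dict(env)
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = poc_root + (os.pathsep + existing if existing else "")
    return env


env = prepend_pythonpath({"PYTHONPATH": "/opt/lib"}, "/home/tom/soc_memory_poc")
print(env["PYTHONPATH"])
```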


@ -1,85 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import urllib.error
import urllib.request
from typing import Any
DEFAULT_GATEWAY_URL = os.environ.get("SOC_MEMORY_GATEWAY_URL", "http://127.0.0.1:1934")
DEFAULT_GATEWAY_API_KEY = os.environ.get("SOC_MEMORY_GATEWAY_API_KEY", "")
URI_PREFIXES = {
"case": "viking://resources/soc-memory-poc/case",
"knowledge": "viking://resources/soc-memory-poc/knowledge",
"all": "viking://resources/soc-memory-poc",
}
def post_json(url: str, payload: dict[str, Any], api_key: str = "") -> dict[str, Any]:
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(url, data=data, method="POST")
req.add_header("Content-Type", "application/json")
if api_key:
req.add_header("X-API-Key", api_key)
with urllib.request.urlopen(req, timeout=30) as resp:
return json.loads(resp.read().decode("utf-8"))
def canonicalize_uri(uri: str) -> str:
if ".json/" in uri:
return uri.split(".json/", 1)[0] + ".json"
return uri
def filter_results(results: list[dict[str, Any]], prefix: str) -> list[dict[str, Any]]:
deduped: dict[str, dict[str, Any]] = {}
for item in results:
uri = item.get("uri") or ""
canonical = canonicalize_uri(uri)
if not canonical.startswith(prefix):
continue
score = item.get("score") or 0
payload = dict(item)
payload["uri"] = canonical
if canonical not in deduped or score > (deduped[canonical].get("score") or 0):
deduped[canonical] = payload
return sorted(deduped.values(), key=lambda entry: entry.get("score") or 0, reverse=True)
def main() -> None:
parser = argparse.ArgumentParser(description="Search SOC Memory Gateway for case / knowledge context.")
parser.add_argument("--query", required=True, help="Search query")
parser.add_argument("--kind", choices=["case", "knowledge", "all"], default="all", help="SOC resource scope")
parser.add_argument("--limit", type=int, default=5, help="Max results")
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL, help="Memory Gateway base URL")
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY, help="Gateway API key if required")
args = parser.parse_args()
prefix = URI_PREFIXES[args.kind]
payload = {
"query": args.query,
"limit": max(args.limit * 5, 10),
"uri": prefix,
}
try:
result = post_json(args.gateway_url.rstrip("/") + "/api/search", payload, api_key=args.api_key)
except urllib.error.URLError as exc:
raise SystemExit(f"Gateway search failed: {exc}") from exc
raw_results = result.get("results", [])
filtered = filter_results(raw_results, prefix)
output = {
"query": args.query,
"kind": args.kind,
"uri_prefix": prefix,
"results": filtered[: args.limit],
"total": len(filtered),
}
print(json.dumps(output, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()
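A standalone sketch of the canonicalization and dedup step the search script above applies to raw gateway hits: chunk-level URIs of the form `<resource>.json/<chunk>` collapse to the parent resource, and the highest score per resource wins. The sample hits are fabricated:

```python
def canonicalize_uri(uri: str) -> str:
    # Chunk hits come back as <resource>.json/<chunk>; collapse to the resource.
    if ".json/" in uri:
        return uri.split(".json/", 1)[0] + ".json"
    return uri


def dedupe_by_uri(results: list[dict]) -> list[dict]:
    deduped: dict[str, dict] = {}
    for item in results:
        canonical = canonicalize_uri(item.get("uri") or "")
        score = item.get("score") or 0
        if canonical not in deduped or score > (deduped[canonical].get("score") or 0):
            deduped[canonical] = {**item, "uri": canonical}
    return sorted(deduped.values(), key=lambda e: e.get("score") or 0, reverse=True)


hits = [
    {"uri": "viking://resources/soc-memory-poc/case/phishing/c1.json/chunk-2", "score": 0.6},
    {"uri": "viking://resources/soc-memory-poc/case/phishing/c1.json", "score": 0.9},
]
best = dedupe_by_uri(hits)
print(len(best), best[0]["score"])
# 1 0.9
```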


@ -1,205 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import re
from pathlib import Path
from typing import Any
DEFAULT_POC_ROOT = os.environ.get("SOC_MEMORY_POC_ROOT", "/home/tom/soc_memory_poc")
DEFAULT_VAULT_ROOT = str(Path(DEFAULT_POC_ROOT) / "obsidian-vault")
TOKEN_RE = re.compile(r"[A-Za-z0-9_./:-]+")
SKIP_DIRS = {"05_Templates"}
SKIP_FILES = {"README.md"}
def tokenize(text: str) -> list[str]:
lowered = (text or "").lower()
tokens = TOKEN_RE.findall(lowered)
return [token for token in tokens if len(token) >= 3]
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
if not text.startswith("---\n"):
return {}, text
parts = text.split("\n---\n", 1)
if len(parts) != 2:
return {}, text
raw_frontmatter = parts[0].splitlines()[1:]
body = parts[1]
data: dict[str, str] = {}
for line in raw_frontmatter:
if ":" not in line:
continue
key, value = line.split(":", 1)
data[key.strip()] = value.strip()
return data, body
def extract_title(body: str, fallback: str) -> str:
for line in body.splitlines():
if line.startswith("# "):
return line[2:].strip()
return fallback
def extract_section_text(body: str, heading: str) -> str:
lines = body.splitlines()
marker = f"## {heading}"
collecting = False
collected: list[str] = []
for line in lines:
if line.strip() == marker:
collecting = True
continue
if collecting and line.startswith("## "):
break
if collecting:
stripped = line.strip()
if stripped:
collected.append(stripped)
return " ".join(collected[:4]).strip()
def extract_tags(body: str) -> list[str]:
tags: list[str] = []
in_tag_section = False
for line in body.splitlines():
if line.strip() == "## 标签":
in_tag_section = True
continue
if in_tag_section and line.startswith("## "):
break
if in_tag_section:
for token in re.findall(r"#[^\s,]+", line):
tags.append(token)
return tags
def score_doc(query: str, tokens: list[str], doc: dict[str, Any]) -> tuple[int, list[str]]:
score = 0
matched: list[str] = []
path_text = f"{doc['relative_path']} {doc['file_name']}".lower()
title_text = doc["title"].lower()
summary_text = doc.get("summary", "").lower()
body_text = doc.get("body", "").lower()
frontmatter_text = " ".join(f"{k}:{v}" for k, v in doc.get("frontmatter", {}).items()).lower()
tags_text = " ".join(doc.get("tags", [])).lower()
if query and query.lower() in body_text:
score += 8
matched.append(query.lower())
case_id = doc.get("frontmatter", {}).get("case_id", "")
if case_id and case_id.lower() in query.lower():
score += 80
matched.append(case_id.lower())
scenario = doc.get("frontmatter", {}).get("scenario", "")
if scenario and scenario.lower() in query.lower():
score += 20
matched.append(scenario.lower())
for token in tokens:
token_hit = False
if token in title_text:
score += 12
token_hit = True
elif token in summary_text:
score += 7
token_hit = True
elif token in path_text:
score += 6
token_hit = True
elif token in frontmatter_text:
score += 5
token_hit = True
elif token in tags_text:
score += 4
token_hit = True
elif token in body_text:
score += 1
token_hit = True
if token_hit and token not in matched:
matched.append(token)
return score, matched[:8]
def load_docs(vault_root: str | Path) -> list[dict[str, Any]]:
vault_root = Path(vault_root)
docs: list[dict[str, Any]] = []
for path in sorted(vault_root.rglob("*.md")):
rel = path.relative_to(vault_root)
if any(part in SKIP_DIRS for part in rel.parts):
continue
if path.name in SKIP_FILES:
continue
text = path.read_text(encoding="utf-8")
frontmatter, body = parse_frontmatter(text)
docs.append(
{
"file_name": path.name,
"relative_path": str(rel),
"absolute_path": str(path),
"category": rel.parts[0] if rel.parts else "",
"directory": str(rel.parent),
"frontmatter": frontmatter,
"title": extract_title(body, path.stem),
"summary": extract_section_text(body, "告警摘要") or extract_section_text(body, "Summary"),
"tags": extract_tags(body),
"body": body,
}
)
return docs
def main() -> None:
parser = argparse.ArgumentParser(description="Search Obsidian SOC notes and return matching document references.")
parser.add_argument("--query", required=True, help="Search query")
parser.add_argument("--vault-root", default=DEFAULT_VAULT_ROOT, help="Obsidian vault root")
parser.add_argument("--limit", type=int, default=5, help="Maximum results")
parser.add_argument("--scenario", default="", help="Optional scenario filter")
args = parser.parse_args()
docs = load_docs(args.vault_root)
tokens = tokenize(args.query)
results: list[dict[str, Any]] = []
for doc in docs:
scenario = doc.get("frontmatter", {}).get("scenario", "")
if args.scenario and scenario != args.scenario:
continue
score, matched_terms = score_doc(args.query, tokens, doc)
if score <= 0:
continue
results.append(
{
"score": score,
"title": doc["title"],
"file_name": doc["file_name"],
"relative_path": doc["relative_path"],
"directory": doc["directory"],
"category": doc["category"],
"scenario": scenario,
"summary": doc.get("summary", ""),
"tags": doc.get("tags", []),
"matched_terms": matched_terms,
}
)
results.sort(key=lambda item: item["score"], reverse=True)
payload = {
"query": args.query,
"vault_root": str(Path(args.vault_root)),
"matched_docs": results[: args.limit],
}
print(json.dumps(payload, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()
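The frontmatter handling above is intentionally minimal: only flat `key: value` pairs are recognized, not full YAML. A self-contained sketch of that parser against a fabricated note:

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    # Minimal YAML-ish frontmatter reader: splits on the closing "---"
    # and keeps only flat `key: value` lines.
    if not text.startswith("---\n"):
        return {}, text
    parts = text.split("\n---\n", 1)
    if len(parts) != 2:
        return {}, text
    data: dict[str, str] = {}
    for line in parts[0].splitlines()[1:]:
        if ":" in line:
            key, value = line.split(":", 1)
            data[key.strip()] = value.strip()
    return data, parts[1]


note = "---\ncase_id: PH-2024-001\nscenario: phishing\n---\n# Phishing case\n"
fm, body = parse_frontmatter(note)
print(fm["case_id"], body.startswith("# "))
```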


@ -1,282 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
DEFAULT_GATEWAY_URL = os.environ.get("SOC_MEMORY_GATEWAY_URL", "http://127.0.0.1:1934")
DEFAULT_GATEWAY_API_KEY = os.environ.get("SOC_MEMORY_GATEWAY_API_KEY", "")
DEFAULT_POC_ROOT = os.environ.get("SOC_MEMORY_POC_ROOT", "/home/tom/soc_memory_poc")
DEFAULT_VAULT_ROOT = str(Path(DEFAULT_POC_ROOT) / "obsidian-vault")
CASE_URI = "viking://resources/soc-memory-poc/case"
KNOWLEDGE_URI = "viking://resources/soc-memory-poc/knowledge"
def post_json(url: str, payload: dict[str, Any], api_key: str = "") -> dict[str, Any]:
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(url, data=data, method="POST")
req.add_header("Content-Type", "application/json")
if api_key:
req.add_header("X-API-Key", api_key)
with urllib.request.urlopen(req, timeout=30) as resp:
return json.loads(resp.read().decode("utf-8"))
def canonicalize_uri(uri: str) -> str:
if ".json/" in uri:
return uri.split(".json/", 1)[0] + ".json"
return uri
def filter_results(results: list[dict[str, Any]], prefix: str) -> list[dict[str, Any]]:
deduped: dict[str, dict[str, Any]] = {}
for item in results:
uri = item.get("uri") or ""
canonical = canonicalize_uri(uri)
if not canonical.startswith(prefix):
continue
score = item.get("score") or 0
payload = dict(item)
payload["uri"] = canonical
if canonical not in deduped or score > (deduped[canonical].get("score") or 0):
deduped[canonical] = payload
return sorted(deduped.values(), key=lambda entry: entry.get("score") or 0, reverse=True)
def gateway_search(query: str, uri: str, limit: int, gateway_url: str, api_key: str) -> list[dict[str, Any]]:
payload = {"query": query, "limit": max(limit * 5, 10), "uri": uri}
raw = post_json(gateway_url.rstrip("/") + "/api/search", payload, api_key=api_key)
return filter_results(raw.get("results", []), uri)[:limit]
def obsidian_search(query: str, scenario: str, limit: int, vault_root: str) -> dict[str, Any]:
from search_obsidian_docs import load_docs, score_doc, tokenize
docs = load_docs(vault_root)
tokens = tokenize(query)
results: list[dict[str, Any]] = []
for doc in docs:
doc_scenario = doc.get("frontmatter", {}).get("scenario", "")
if scenario and doc_scenario != scenario:
continue
score, matched_terms = score_doc(query, tokens, doc)
if score <= 0:
continue
results.append(
{
"score": score,
"title": doc["title"],
"file_name": doc["file_name"],
"relative_path": doc["relative_path"],
"directory": doc["directory"],
"absolute_path": str(Path(vault_root) / doc["relative_path"]),
"summary": doc.get("summary", ""),
"matched_terms": matched_terms,
}
)
results.sort(key=lambda item: item["score"], reverse=True)
return {"matched_docs": results[:limit]}
def build_query(args: argparse.Namespace) -> str:
parts = [
args.scenario,
args.alert_type,
args.user,
args.host,
args.sender,
args.subject,
args.attachment,
args.url,
args.ip,
args.summary,
]
parts.extend(args.fact)
return " ".join(part.strip() for part in parts if part and part.strip())
def bullet(lines: list[str], fallback: str) -> str:
if not lines:
return f"- {fallback}"
return "\n".join(f"- {line}" for line in lines)
def top_results(items: list[dict[str, Any]], limit: int = 3) -> list[dict[str, Any]]:
return items[:limit]
def has_fact(args: argparse.Namespace, needle: str) -> bool:
haystacks = [args.summary, args.subject, args.alert_type, *args.fact]
lowered = needle.lower()
return any(lowered in (item or "").lower() for item in haystacks)
def summarize_evidence(args: argparse.Namespace) -> list[str]:
evidence: list[str] = []
if args.subject:
evidence.append(f"邮件主题/诱饵:{args.subject}")
if args.attachment:
evidence.append(f"恶意附件:{args.attachment}")
if args.url:
evidence.append(f"可疑链接:{args.url}")
if args.sender:
evidence.append(f"发件人:{args.sender}")
if args.ip:
evidence.append(f"相关 IP{args.ip}")
for fact in args.fact[:4]:
evidence.append(fact)
return evidence[:6]
def uri_to_id(uri: str) -> str:
return uri.rsplit('/', 1)[-1].replace('.json', '')
def infer_assessment(args: argparse.Namespace, case_results: list[dict[str, Any]]) -> str:
top_case = case_results[0] if case_results else None
if args.scenario == "phishing":
if args.url and args.attachment and (has_fact(args, "dmarc failed") or has_fact(args, "clicked")):
base = "当前告警高度符合凭证收割型钓鱼攻击特征,属于高可信 True Positive且存在凭证泄露风险。"
elif args.url or args.attachment:
base = "当前告警具备明显钓鱼迹象,尤其是附件与落地页组合,倾向于高风险钓鱼事件。"
else:
base = "当前告警呈现出邮件钓鱼模式,但仍需补充落地页、附件和用户交互证据进一步确认。"
elif args.scenario == "o365_suspicious_login":
if has_fact(args, "impossible travel") and (has_fact(args, "mfa fatigue") or has_fact(args, "inbox rule") or has_fact(args, "oauth")):
base = "当前告警高度符合 O365 账号接管链路,属于高可信身份威胁事件。"
else:
base = "当前告警表现为异常身份登录需要结合登录轨迹、MFA 和邮箱规则进一步确认是否账号接管。"
else:
base = "当前告警具备明显的可疑特征,需要结合历史案例和关联知识继续判断。"
if top_case:
return base + f" 最相近的历史案例为 `{uri_to_id(top_case.get('uri', ''))}`,说明当前 case 与既有攻击模式存在明显重合。"
return base
def format_memory_results(case_results: list[dict[str, Any]], knowledge_results: list[dict[str, Any]]) -> str:
lines: list[str] = []
for item in top_results(case_results, 2):
uri = item.get("uri", "")
abstract = (item.get("abstract") or "").strip()
snippet = abstract[:140] + "..." if len(abstract) > 140 else abstract
        lines.append(f"`{uri_to_id(uri)}`({uri})— {snippet}")
for item in top_results(knowledge_results, 2):
uri = item.get("uri", "")
abstract = (item.get("abstract") or "").strip()
snippet = abstract[:140] + "..." if len(abstract) > 140 else abstract
        lines.append(f"`{uri_to_id(uri)}`({uri})— {snippet}")
return bullet(lines, "未检索到直接关联的 Memory 条目")
def format_obsidian_results(obsidian_docs: list[dict[str, Any]]) -> str:
lines = []
for doc in top_results(obsidian_docs, 3):
reason = doc.get("summary") or ", ".join(doc.get("matched_terms", [])) or "与当前场景相关"
lines.append(
            f"`{doc['file_name']}` — `obsidian-vault/{doc['relative_path']}` "
            f"(absolute: `{doc['absolute_path']}`)— {reason}"
)
return bullet(lines, "未找到直接关联的 Obsidian 文档")
def recommend_actions(args: argparse.Namespace, case_results: list[dict[str, Any]]) -> list[str]:
actions: list[str] = []
if args.scenario == "phishing":
actions.extend([
"检查用户是否已点击链接或提交凭据,必要时立即重置账号并撤销会话。",
"搜索同主题、同发件人、同 URL 或同附件的邮件是否已投递给其他用户。",
"封锁相关域名、URL 和可疑 IP并保留附件样本用于沙箱分析。",
"如邮件面向财务或高价值角色,优先排查是否存在 BEC 或后续横向利用。",
])
elif args.scenario == "o365_suspicious_login":
actions.extend([
"复核登录日志、MFA 记录和后续邮箱规则 / OAuth 变更。",
"若确认账号接管迹象,立即重置凭据并撤销所有活跃会话。",
"检查同源 IP、同设备指纹和同时间窗口内的其他用户活动。",
"对邮箱转发、隐藏规则、恶意 OAuth 授权进行专项排查。",
])
else:
actions.append("基于当前高风险迹象继续扩充调查和处置。")
if case_results:
actions.append("对照最相近历史案例,复用已有 IOC 和调查路径。")
return actions[:5]
def main() -> None:
parser = argparse.ArgumentParser(description="Run a structured SOC triage using memory retrieval and Obsidian lookup.")
parser.add_argument("--scenario", required=True, help="Scenario, e.g. phishing or o365_suspicious_login")
parser.add_argument("--alert-type", default="", help="Alert type")
parser.add_argument("--user", default="", help="Target user")
parser.add_argument("--host", default="", help="Target host")
parser.add_argument("--sender", default="", help="Sender email")
parser.add_argument("--subject", default="", help="Email subject or short title")
parser.add_argument("--attachment", default="", help="Attachment name")
parser.add_argument("--url", default="", help="Suspicious URL")
parser.add_argument("--ip", default="", help="Relevant IP")
parser.add_argument("--summary", default="", help="One-sentence alert summary")
parser.add_argument("--fact", action="append", default=[], help="Additional known fact; repeatable")
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL, help="Memory Gateway URL")
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY, help="Memory Gateway API key")
parser.add_argument("--vault-root", default=DEFAULT_VAULT_ROOT, help="Obsidian vault root")
parser.add_argument("--limit", type=int, default=5, help="Search limit")
args = parser.parse_args()
query = build_query(args)
case_results: list[dict[str, Any]] = []
knowledge_results: list[dict[str, Any]] = []
obsidian_docs: list[dict[str, Any]] = []
memory_error = ""
obsidian_error = ""
try:
case_results = gateway_search(query, CASE_URI, args.limit, args.gateway_url, args.api_key)
knowledge_results = gateway_search(query, KNOWLEDGE_URI, args.limit, args.gateway_url, args.api_key)
except urllib.error.URLError as exc:
memory_error = f"Memory Gateway 不可用:{exc}"
try:
obsidian_resp = obsidian_search(query, args.scenario, args.limit, args.vault_root)
obsidian_docs = obsidian_resp.get("matched_docs", [])
except Exception as exc: # noqa: BLE001
obsidian_error = f"Obsidian 检索失败:{exc}"
lines = [
"## 研判结果",
infer_assessment(args, case_results),
"",
"## 关键证据",
bullet(summarize_evidence(args), "当前输入只提供了有限证据,需要继续补充调查信息"),
"",
"## 关联 Memory Retrieval",
]
if memory_error:
lines.append(f"- {memory_error}")
else:
lines.append(format_memory_results(case_results, knowledge_results))
lines.extend([
"",
"## 关联 Obsidian 文档",
])
if obsidian_error:
lines.append(f"- {obsidian_error}")
else:
lines.append(format_obsidian_results(obsidian_docs))
lines.extend([
"",
"## 建议动作",
bullet(recommend_actions(args, case_results), "继续补充告警细节后再执行更精确的响应动作"),
])
print("\n".join(lines))
if __name__ == "__main__":
main()
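Two formatting helpers carry most of the report layout above. A minimal sketch of their behavior, with a fabricated URI:

```python
def uri_to_id(uri: str) -> str:
    # Last path segment, with the .json suffix stripped.
    return uri.rsplit("/", 1)[-1].replace(".json", "")


def bullet(lines: list[str], fallback: str) -> str:
    # Render a Markdown bullet list, or a single fallback bullet when empty.
    if not lines:
        return f"- {fallback}"
    return "\n".join(f"- {line}" for line in lines)


uri = "viking://resources/soc-memory-poc/case/phishing/case-0003.json"
print(uri_to_id(uri))
# case-0003
print(bullet([], "no evidence yet"))
# - no evidence yet
```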


@ -1,201 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import os
import re
import subprocess
import sys
from pathlib import Path
SCRIPT_DIR = Path(__file__).resolve().parent
TRIAGE_ALERT = SCRIPT_DIR / "triage_alert.py"
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"]+")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HOST_RE = re.compile(r"\b[A-Z]{2,}(?:-[A-Z0-9]+)+\b")
ATTACHMENT_RE = re.compile(r"\b[\w.-]+\.(?:html|htm|pdf|zip|docx|xlsx|eml)\b", re.IGNORECASE)
HEADER_RE = re.compile(
r"^(From|To|Subject|Attachment|URL|IP|Host|User|Alert type|Scenario)\s*:\s*(.+)$",
re.IGNORECASE | re.MULTILINE,
)
def first_nonempty(*values: str) -> str:
for value in values:
if value and value.strip():
return value.strip()
return ""
def load_text(args: argparse.Namespace) -> str:
if args.file:
return Path(args.file).read_text(encoding="utf-8")
if args.text:
return args.text
data = sys.stdin.read()
if data.strip():
return data
return ""
def find_header(text: str, name: str) -> str:
for key, value in HEADER_RE.findall(text):
if key.lower() == name.lower():
return value.strip()
return ""
def unique_matches(pattern: re.Pattern[str], text: str) -> list[str]:
seen: list[str] = []
for match in pattern.findall(text):
if match not in seen:
seen.append(match)
return seen
def infer_scenario(text: str, explicit_scenario: str = "", explicit_alert_type: str = "") -> tuple[str, str]:
if explicit_scenario:
return explicit_scenario, explicit_alert_type
lowered = text.lower()
if any(token in lowered for token in ["impossible travel", "mfa fatigue", "oauth consent", "inbox rule", "entra", "azuread", "sign-in", "signin"]):
alert_type = explicit_alert_type or ("azuread_impossible_travel" if "impossible travel" in lowered else "o365_suspicious_login")
return "o365_suspicious_login", alert_type
if any(token in lowered for token in ["phishing", "invoice", "attachment", "credential harvest", "fake microsoft 365", "dmarc", "mail_suspicious", "wire transfer"]):
if explicit_alert_type:
return "phishing", explicit_alert_type
if "wire transfer" in lowered or "executive impersonation" in lowered or "bec" in lowered:
return "phishing", "mail_bec_impersonation"
if "link" in lowered and "attachment" not in lowered:
return "phishing", "mail_suspicious_link"
return "phishing", "mail_suspicious_attachment"
return "phishing", explicit_alert_type
def collect_facts(text: str, provided: list[str]) -> list[str]:
facts: list[str] = []
for fact in provided:
if fact and fact not in facts:
facts.append(fact)
lowered = text.lower()
fact_patterns = [
("DMARC failed", ["dmarc failed"]),
("SPF failed", ["spf failed"]),
("User may have clicked the link", ["clicked", "user clicked"]),
("Credential submission suspected", ["submitted credentials", "credential submission", "entered credentials"]),
("Impossible travel observed", ["impossible travel"]),
("MFA fatigue observed", ["mfa fatigue", "repeated mfa"]),
("Inbox rule creation observed", ["inbox rule"]),
("OAuth consent activity observed", ["oauth consent"]),
]
for label, needles in fact_patterns:
if any(needle in lowered for needle in needles) and label not in facts:
facts.append(label)
for line in text.splitlines():
stripped = line.strip("-* \t")
if not stripped or len(stripped) > 160:
continue
lower = stripped.lower()
if any(word in lower for word in ["dmarc", "spf", "clicked", "credential", "impossible travel", "mfa", "inbox rule", "oauth"]):
if stripped not in facts:
facts.append(stripped)
return facts[:8]
def build_summary(text: str, subject: str, provided_summary: str = "") -> str:
if provided_summary:
return provided_summary[:240]
if subject:
return subject[:180]
for line in text.splitlines():
stripped = line.strip()
if len(stripped) >= 20 and ":" not in stripped[:20]:
return stripped[:240]
return text.strip()[:240]
def parse_input(args: argparse.Namespace) -> dict[str, str | list[str]]:
text = load_text(args)
scenario, alert_type = infer_scenario(text, args.scenario, args.alert_type)
emails = unique_matches(EMAIL_RE, text)
urls = unique_matches(URL_RE, text)
ips = unique_matches(IP_RE, text)
hosts = unique_matches(HOST_RE, text)
attachments = unique_matches(ATTACHMENT_RE, text)
sender = first_nonempty(args.sender, find_header(text, "From"), emails[0] if emails else "")
user = first_nonempty(args.user, find_header(text, "User"), find_header(text, "To"), emails[1] if len(emails) > 1 else "")
subject = first_nonempty(args.subject, find_header(text, "Subject"))
attachment = first_nonempty(args.attachment, find_header(text, "Attachment"), attachments[0] if attachments else "")
url = first_nonempty(args.url, find_header(text, "URL"), urls[0] if urls else "")
ip = first_nonempty(args.ip, find_header(text, "IP"), ips[0] if ips else "")
host = first_nonempty(args.host, find_header(text, "Host"), hosts[0] if hosts else "")
summary = build_summary(text, subject, args.summary)
facts = collect_facts(text, args.fact)
return {
"scenario": scenario,
"alert_type": alert_type,
"user": user,
"host": host,
"sender": sender,
"subject": subject,
"attachment": attachment,
"url": url,
"ip": ip,
"summary": summary,
"facts": facts,
}
def run_triage(parsed: dict[str, str | list[str]], limit: int) -> None:
cmd = [
sys.executable,
str(TRIAGE_ALERT),
"--scenario", str(parsed["scenario"]),
"--alert-type", str(parsed["alert_type"]),
"--user", str(parsed["user"]),
"--host", str(parsed["host"]),
"--sender", str(parsed["sender"]),
"--subject", str(parsed["subject"]),
"--attachment", str(parsed["attachment"]),
"--url", str(parsed["url"]),
"--ip", str(parsed["ip"]),
"--summary", str(parsed["summary"]),
"--limit", str(limit),
]
for fact in parsed["facts"]:
cmd.extend(["--fact", str(fact)])
subprocess.run(cmd, check=True, env=os.environ.copy())
def main() -> None:
parser = argparse.ArgumentParser(description="Unified SOC alert/email triage entrypoint with memory and Obsidian retrieval.")
parser.add_argument("--text", help="Raw email, ticket text, or freeform alert text")
parser.add_argument("--file", help="Path to a raw email/ticket/alert text file")
parser.add_argument("--scenario", default="", help="Optional scenario override")
parser.add_argument("--alert-type", default="", help="Optional alert type override")
parser.add_argument("--user", default="", help="Optional user override")
parser.add_argument("--host", default="", help="Optional host override")
parser.add_argument("--sender", default="", help="Optional sender override")
parser.add_argument("--subject", default="", help="Optional subject override")
parser.add_argument("--attachment", default="", help="Optional attachment override")
parser.add_argument("--url", default="", help="Optional URL override")
parser.add_argument("--ip", default="", help="Optional IP override")
parser.add_argument("--summary", default="", help="Optional summary override")
parser.add_argument("--fact", action="append", default=[], help="Additional known fact; repeatable")
parser.add_argument("--limit", type=int, default=5, help="Search limit")
args = parser.parse_args()
parsed = parse_input(args)
run_triage(parsed, args.limit)
if __name__ == "__main__":
main()
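The entrypoint above leans on a line-oriented header regex before falling back to pattern scans. A self-contained sketch of that header extraction, with a fabricated email:

```python
import re

HEADER_RE = re.compile(
    r"^(From|To|Subject|Attachment|URL|IP|Host|User|Alert type|Scenario)\s*:\s*(.+)$",
    re.IGNORECASE | re.MULTILINE,
)


def find_header(text: str, name: str) -> str:
    # Case-insensitive lookup of the first matching header line.
    for key, value in HEADER_RE.findall(text):
        if key.lower() == name.lower():
            return value.strip()
    return ""


raw = "From: billing@contoso-pay.example\nSubject: Overdue invoice\nBody text here"
print(find_header(raw, "subject"))
# Overdue invoice
```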


@ -1,13 +0,0 @@
#!/usr/bin/env python3
from __future__ import annotations
import os
import subprocess
import sys
from pathlib import Path
SCRIPT_DIR = Path(__file__).resolve().parent
TRIAGE_EMAIL = SCRIPT_DIR / "triage_email.py"
if __name__ == "__main__":
subprocess.run([sys.executable, str(TRIAGE_EMAIL), *sys.argv[1:]], check=True, env=os.environ.copy())


@ -6,7 +6,7 @@ from typing import Optional
 import yaml
 from pydantic import ValidationError
-from .types import Config, ServerConfig, OpenVikingConfig, MemoryConfig, LoggingConfig
+from .types import Config, ServerConfig, OpenVikingConfig, MemoryConfig, LoggingConfig, LLMConfig, ObsidianConfig

 def load_config(config_path: Optional[str] = None) -> Config:
@ -32,6 +32,8 @@ def load_config(config_path: Optional[str] = None) -> Config:
             openviking=OpenVikingConfig(**data.get("openviking", {})),
             memory=MemoryConfig(**data.get("memory", {})),
             logging=LoggingConfig(**data.get("logging", {})),
+            llm=LLMConfig(**data.get("llm", {})),
+            obsidian=ObsidianConfig(**data.get("obsidian", {})),
         )
     except (ValidationError, yaml.YAMLError) as e:
         print(f"配置文件解析错误: {e}")


@ -0,0 +1,87 @@
"""Document ingestion helpers for Memory Gateway."""
from __future__ import annotations
import re
from datetime import datetime, timezone
from pathlib import Path
def slugify(value: str, fallback: str = "document") -> str:
slug = re.sub(r"[^a-zA-Z0-9\u4e00-\u9fff_-]+", "-", value.lower()).strip("-")
slug = re.sub(r"-+", "-", slug)[:100].strip("-")
return slug or fallback
def convert_file_to_markdown(file_path: str | Path) -> str:
"""Convert a local document to Markdown using Microsoft MarkItDown."""
try:
from markitdown import MarkItDown
except ModuleNotFoundError as exc:
raise RuntimeError("markitdown is not installed. Install with: pip install 'markitdown[all]'") from exc
file_path = Path(file_path)
converter = MarkItDown(enable_plugins=False)
if hasattr(converter, "convert_local"):
result = converter.convert_local(str(file_path))
else:
result = converter.convert(str(file_path))
markdown = getattr(result, "text_content", "") or ""
if not markdown.strip():
raise RuntimeError("Document conversion produced empty Markdown")
return markdown
def build_markdown_note(
*,
title: str,
markdown: str,
source_filename: str,
tags: list[str],
knowledge_type: str,
summary: str | None = None,
) -> str:
tag_text = ", ".join(tags)
frontmatter = [
"---",
f"title: {title}",
f"knowledge_type: {knowledge_type}",
f"source_filename: {source_filename}",
f"created_at: {datetime.now(timezone.utc).isoformat()}",
f"tags: [{tag_text}]" if tag_text else "tags: []",
]
if summary:
escaped = summary.replace('"', '\\"')
frontmatter.append(f'summary: "{escaped}"')
frontmatter.extend(["---", "", f"# {title}", "", markdown.strip(), ""])
return "\n".join(frontmatter)
def save_markdown_to_obsidian(
*,
vault_path: str | Path,
relative_dir: str,
title: str,
markdown: str,
source_filename: str,
tags: list[str],
knowledge_type: str,
summary: str | None = None,
) -> Path:
vault = Path(vault_path)
target_dir = vault / relative_dir.strip("/")
target_dir.mkdir(parents=True, exist_ok=True)
digest = slugify(source_filename.rsplit(".", 1)[0] or title)
note_name = f"{slugify(title, digest)}.md"
target = target_dir / note_name
target.write_text(
build_markdown_note(
title=title,
markdown=markdown,
source_filename=source_filename,
tags=tags,
knowledge_type=knowledge_type,
summary=summary,
),
encoding="utf-8",
)
return target
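Note names for the Obsidian vault come from `slugify`, which keeps ASCII alphanumerics, CJK characters, `_`, and `-`, and collapses everything else into single dashes. A standalone sketch with fabricated inputs:

```python
import re


def slugify(value: str, fallback: str = "document") -> str:
    # Replace runs of disallowed characters with "-", trim edge dashes,
    # cap at 100 characters, and fall back when nothing survives.
    slug = re.sub(r"[^a-zA-Z0-9\u4e00-\u9fff_-]+", "-", value.lower()).strip("-")
    slug = re.sub(r"-+", "-", slug)[:100].strip("-")
    return slug or fallback


print(slugify("Quarterly Report (v2).pdf"))
# quarterly-report-v2-pdf
print(slugify("***"))
# document
```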

memory_gateway/llm.py (new file, 158 lines)

@ -0,0 +1,158 @@
"""LLM helpers for Memory Gateway summaries."""
from __future__ import annotations
import json
import os
import re
from typing import Any
import httpx
from .config import get_config
class LLMConfigurationError(RuntimeError):
"""Raised when LLM summarization is requested but not configured."""
class LLMSummaryError(RuntimeError):
"""Raised when the LLM response cannot be used."""
def _llm_settings() -> dict[str, Any]:
config = get_config()
llm_config = getattr(config, "llm", None)
base_url = (
os.environ.get("MEMORY_GATEWAY_LLM_BASE_URL")
or os.environ.get("OPENAI_BASE_URL")
or getattr(llm_config, "base_url", "")
or "https://api.openai.com/v1"
).rstrip("/")
api_key = (
os.environ.get("MEMORY_GATEWAY_LLM_API_KEY")
or os.environ.get("OPENAI_API_KEY")
or getattr(llm_config, "api_key", "")
)
model = (
os.environ.get("MEMORY_GATEWAY_LLM_MODEL")
or os.environ.get("OPENAI_MODEL")
or getattr(llm_config, "model", "")
)
timeout = int(os.environ.get("MEMORY_GATEWAY_LLM_TIMEOUT") or getattr(llm_config, "timeout", 60))
max_input_chars = int(os.environ.get("MEMORY_GATEWAY_LLM_MAX_INPUT_CHARS") or getattr(llm_config, "max_input_chars", 24000))
return {
"base_url": base_url,
"api_key": api_key,
"model": model,
"timeout": timeout,
"max_input_chars": max_input_chars,
}
def _extract_json(text: str) -> dict[str, Any]:
text = text.strip()
if text.startswith("```"):
text = re.sub(r"^```(?:json)?\s*", "", text)
text = re.sub(r"\s*```$", "", text)
try:
return json.loads(text)
except json.JSONDecodeError:
match = re.search(r"\{.*\}", text, flags=re.S)
if not match:
raise LLMSummaryError("LLM did not return JSON") from None
return json.loads(match.group(0))
def _coerce_string_list(value: Any, limit: int = 12) -> list[str]:
if not isinstance(value, list):
return []
items: list[str] = []
for item in value:
if item is None:
continue
text = str(item).strip()
if text and text not in items:
items.append(text[:300])
if len(items) >= limit:
break
return items
async def summarize_with_llm(
content: str,
*,
title: str | None = None,
summary_hint: str | None = None,
tags: list[str] | None = None,
max_summary_chars: int = 800,
purpose: str = "generic knowledge memory",
) -> dict[str, Any]:
"""Summarize content using an OpenAI-compatible chat completions API."""
settings = _llm_settings()
if not settings["model"]:
raise LLMConfigurationError("LLM model is not configured. Set MEMORY_GATEWAY_LLM_MODEL or llm.model.")
if not settings["api_key"] and not settings["base_url"].startswith(("http://127.0.0.1", "http://localhost")):
raise LLMConfigurationError("LLM API key is not configured. Set MEMORY_GATEWAY_LLM_API_KEY or OPENAI_API_KEY.")
trimmed = content[: settings["max_input_chars"]]
tag_text = ", ".join(tags or [])
system_prompt = (
"You are a precise knowledge curator. Summarize input into reusable memory. "
"Return only valid JSON with these keys: title, summary, key_points, tags. "
"summary must be concise but specific; key_points must be reusable, evidence-based bullets. "
"Do not invent facts not present in the input. Preserve important identifiers, paths, URLs, IPs, IDs, and verdicts."
)
user_prompt = f"""
Purpose: {purpose}
Provided title: {title or ''}
Provided summary hint: {summary_hint or ''}
Provided tags: {tag_text}
Max summary characters: {max_summary_chars}
Content:
{trimmed}
""".strip()
headers = {"Content-Type": "application/json"}
if settings["api_key"]:
headers["Authorization"] = f"Bearer {settings['api_key']}"
payload = {
"model": settings["model"],
"messages": [
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt},
],
"temperature": 0.2,
"response_format": {"type": "json_object"},
}
async with httpx.AsyncClient(timeout=settings["timeout"]) as client:
response = await client.post(f"{settings['base_url']}/chat/completions", headers=headers, json=payload)
response.raise_for_status()
data = response.json()
try:
content_text = data["choices"][0]["message"]["content"]
except (KeyError, IndexError, TypeError) as exc:
raise LLMSummaryError(f"Unexpected LLM response shape: {data}") from exc
parsed = _extract_json(content_text)
merged_tags = []
for tag in [*(tags or []), *_coerce_string_list(parsed.get("tags"), limit=8)]:
tag = str(tag).strip()
if tag and tag not in merged_tags:
merged_tags.append(tag)
summary = str(parsed.get("summary") or "").strip()
return {
"title": str(parsed.get("title") or title or "Untitled summary").strip()[:160],
"summary": summary[:max(120, max_summary_chars)],
"key_points": _coerce_string_list(parsed.get("key_points"), limit=10),
"tags": merged_tags,
"llm": {
"provider": "openai-compatible",
"base_url": settings["base_url"],
"model": settings["model"],
},
}
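`_extract_json` above tolerates models that wrap their JSON in a Markdown fence or surround it with chatter. A self-contained sketch of the same fallback logic with concrete inputs (names here are illustrative, not the module's API):

```python
import json
import re

def extract_json(text: str) -> dict:
    # Strip an optional ```json fence, then fall back to the first {...} span.
    text = text.strip()
    if text.startswith("```"):
        text = re.sub(r"^```(?:json)?\s*", "", text)
        text = re.sub(r"\s*```$", "", text)
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, flags=re.S)
        if not match:
            raise ValueError("no JSON found")
        return json.loads(match.group(0))

fenced = "```json\n{\"title\": \"t\", \"tags\": [\"a\"]}\n```"
noisy = "Sure! Here is the result: {\"summary\": \"ok\"} Hope it helps."
```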

memory_gateway/openviking_client.py

@@ -1,4 +1,4 @@
-"""OpenViking client wrapper used by the SOC Memory POC."""
+"""OpenViking client wrapper used by Memory Gateway."""
from __future__ import annotations
import json

@@ -1,14 +1,19 @@
"""Memory Gateway MCP Server.
-A memory gateway service based on the Model Context Protocol, giving AI agents on the LAN a unified OpenViking access entry point.
+A general-purpose Memory Gateway service giving AI agents / harnesses a unified entry point for OpenViking memory search, summarization, and knowledge persistence.
"""
import asyncio
import hashlib
import json
import logging
import re
import tempfile
from datetime import datetime, timezone
from contextlib import asynccontextmanager
from pathlib import Path
from typing import Any, Optional
-from fastapi import APIRouter, Depends, FastAPI, Header, HTTPException, Request, status
+from fastapi import APIRouter, Depends, FastAPI, File, Form, Header, HTTPException, Request, UploadFile, status
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
from mcp.server import Server
@@ -17,7 +22,9 @@ from sse_starlette import EventSourceResponse
from .config import get_config, set_config, Config
from .openviking_client import get_openviking_client, close_openviking_client
-from .types import SearchRequest, AddMemoryRequest, AddResourceRequest
+from .document_ingest import convert_file_to_markdown, save_markdown_to_obsidian, slugify
+from .llm import LLMConfigurationError, LLMSummaryError, summarize_with_llm
+from .types import SearchRequest, AddMemoryRequest, AddResourceRequest, CommitSummaryRequest

# Configure logging
logging.basicConfig(
@@ -75,6 +82,27 @@ async def list_tools() -> list[Tool]:
                "required": ["uri", "content"],
            },
        ),
        Tool(
            name="commit_summary",
            description="Summarize generic content and optionally persist it as OpenViking memory/resource",
            inputSchema={
                "type": "object",
                "properties": {
                    "content": {"type": "string", "description": "Original content to summarize and persist"},
                    "title": {"type": "string", "description": "Title (optional)"},
                    "summary": {"type": "string", "description": "Manually provided summary (optional)"},
                    "namespace": {"type": "string", "description": "OpenViking memory namespace (optional)"},
                    "memory_type": {"type": "string", "description": "Memory type, defaults to summary"},
                    "tags": {"type": "array", "items": {"type": "string"}, "description": "List of tags"},
                    "source": {"type": "string", "description": "Source note or external link"},
                    "resource_uri": {"type": "string", "description": "URI for the resource write (optional)"},
                    "resource_type": {"type": "string", "description": "Resource type, defaults to json"},
                    "persist_as": {"type": "string", "enum": ["memory", "resource", "both", "none"], "description": "Persistence mode"},
                    "max_summary_chars": {"type": "integer", "description": "Maximum summary length"},
                },
                "required": ["content"],
            },
        ),
        Tool(
            name="get_status",
            description="Check system status",
@@ -140,6 +168,11 @@ async def call_tool(name: str, arguments: Any) -> list[TextContent]:
        )
        return [TextContent(type="text", text=str(result))]
    elif name == "commit_summary":
        request = CommitSummaryRequest(**arguments)
        result = await commit_summary_to_openviking(request)
        return [TextContent(type="text", text=json.dumps(result, ensure_ascii=False))]
    elif name == "get_status":
        ov_status = await ov_client.health_check()
        return [TextContent(type="text", text=f"Memory Gateway: OK\nOpenViking: {ov_status}")]
@@ -201,6 +234,155 @@ def verify_api_key(x_api_key: Optional[str] = Header(default=None)) -> None:
    )
_SENTENCE_RE = re.compile(r"(?<=[。!?.!?])\s+")
_WORD_RE = re.compile(r"[^a-zA-Z0-9\u4e00-\u9fff_-]+")


def _normalize_whitespace(value: str) -> str:
    return re.sub(r"\s+", " ", value).strip()


def _slugify(value: str, fallback: str) -> str:
    slug = _WORD_RE.sub("-", value.lower()).strip("-")
    slug = re.sub(r"-+", "-", slug)[:80].strip("-")
    return slug or fallback


def _derive_title(content: str, title: Optional[str]) -> str:
    if title and title.strip():
        return title.strip()
    for line in content.splitlines():
        line = line.strip("# -*\t")
        if line:
            return line[:120]
    return "Untitled summary"


def _derive_summary(content: str, provided: Optional[str], max_chars: int) -> str:
    if provided and provided.strip():
        return provided.strip()[:max_chars]
    normalized = _normalize_whitespace(content)
    if not normalized:
        return ""
    sentences = [part.strip() for part in _SENTENCE_RE.split(normalized) if part.strip()]
    if not sentences:
        return normalized[:max_chars]
    summary = " ".join(sentences[:3])
    return summary[:max_chars]


def _extract_key_points(content: str, limit: int = 8) -> list[str]:
    points: list[str] = []
    for raw_line in content.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        stripped = re.sub(r"^(?:[-*•]\s*|\d+[.、)]\s*)", "", line).strip()
        if not stripped:
            continue
        is_structured = line.startswith(("-", "*", "•")) or re.match(r"^\d+[.、)]\s+", line)
        has_signal = any(token in stripped.lower() for token in [
            "verdict", "result", "finding", "evidence", "action", "risk", "ioc",
            "结论", "结果", "证据", "建议", "动作", "风险", "命中", "关联",
        ])
        if is_structured or has_signal:
            point = _normalize_whitespace(stripped)
            if point and point not in points:
                points.append(point[:240])
        if len(points) >= limit:
            break
    if points:
        return points
    summary = _derive_summary(content, None, 500)
    return [summary] if summary else []


def _render_memory_text(artifact: dict[str, Any]) -> str:
    lines = [
        f"Title: {artifact['title']}",
        f"Summary: {artifact['summary']}",
    ]
    if artifact.get("tags"):
        lines.append("Tags: " + ", ".join(artifact["tags"]))
    if artifact.get("source"):
        lines.append("Source: " + artifact["source"])
    if artifact.get("key_points"):
        lines.append("Key points:")
        lines.extend(f"- {point}" for point in artifact["key_points"])
    return "\n".join(lines)


def _default_summary_resource_uri(request: CommitSummaryRequest, title: str) -> str:
    namespace = (request.namespace or get_config().memory.default_namespace or "general").strip("/")
    memory_type = (request.memory_type or "summary").strip("/")
    digest = hashlib.sha1(request.content.encode("utf-8")).hexdigest()[:12]
    slug = _slugify(title, digest)
    return f"viking://resources/{namespace}/{memory_type}/{slug}-{digest}.json"


async def build_summary_artifact(request: CommitSummaryRequest) -> dict[str, Any]:
    max_chars = max(120, min(request.max_summary_chars, 4000))
    llm_result = await summarize_with_llm(
        request.content,
        title=request.title,
        summary_hint=request.summary,
        tags=request.tags,
        max_summary_chars=max_chars,
        purpose=request.purpose or "generic knowledge memory",
    )
    title = llm_result.get("title") or _derive_title(request.content, request.title)
    return {
        "schema_version": "memory-gateway.summary.v1",
        "id": hashlib.sha1(request.content.encode("utf-8")).hexdigest()[:16],
        "title": title,
        "summary": llm_result.get("summary", ""),
        "key_points": llm_result.get("key_points", []),
        "tags": llm_result.get("tags", request.tags),
        "source": request.source,
        "namespace": request.namespace or get_config().memory.default_namespace,
        "memory_type": request.memory_type or "summary",
        "created_at": datetime.now(timezone.utc).isoformat(),
        "content": request.content,
        "llm": llm_result.get("llm"),
    }


async def commit_summary_to_openviking(request: CommitSummaryRequest) -> dict[str, Any]:
    artifact = await build_summary_artifact(request)
    ov_client = await get_openviking_client()
    memory_result: Optional[dict[str, Any]] = None
    resource_result: Optional[dict[str, Any]] = None
    if request.persist_as in {"memory", "both"}:
        memory_result = await ov_client.add_memory(
            content=_render_memory_text(artifact),
            namespace=artifact["namespace"],
            memory_type=artifact["memory_type"],
        )
    if request.persist_as in {"resource", "both"}:
        resource_uri = request.resource_uri or _default_summary_resource_uri(request, artifact["title"])
        artifact["resource_uri"] = resource_uri
        resource_result = await ov_client.add_resource(
            uri=resource_uri,
            content=json.dumps(artifact, ensure_ascii=False, indent=2),
            resource_type=request.resource_type or "json",
        )
    return {
        "status": "ok",
        "artifact": artifact,
        "memory_result": memory_result,
        "resource_result": resource_result,
    }
# FastAPI application
app = FastAPI(title="Memory Gateway", version="0.1.0", lifespan=lifespan)
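Default resource URIs combine a slug of the title with a short SHA-1 digest of the content, so repeated commits of the same content map to the same URI. A standalone sketch of that derivation (helper names are illustrative; the real helpers live in the module above):

```python
import hashlib
import re

def slugify(value: str, fallback: str) -> str:
    # Lowercase, collapse non-slug characters to "-", cap length at 80.
    slug = re.sub(r"[^a-z0-9_-]+", "-", value.lower()).strip("-")
    slug = re.sub(r"-+", "-", slug)[:80].strip("-")
    return slug or fallback

def default_resource_uri(namespace: str, memory_type: str, title: str, content: str) -> str:
    # Short content digest makes the URI deterministic per content.
    digest = hashlib.sha1(content.encode("utf-8")).hexdigest()[:12]
    return (
        f"viking://resources/{namespace.strip('/')}/{memory_type.strip('/')}/"
        f"{slugify(title, digest)}-{digest}.json"
    )

uri = default_resource_uri("memory-gateway", "summary", "Quarterly Review Notes!", "hello world")
```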
@@ -346,6 +528,136 @@ async def api_add_resource(request: AddResourceRequest):
    return result
@app.post("/api/summary", dependencies=[Depends(verify_api_key)])
async def api_commit_summary(request: CommitSummaryRequest):
"""REST API: 通用内容 LLM 总结与记忆沉淀。"""
try:
return await commit_summary_to_openviking(request)
except LLMConfigurationError as exc:
raise HTTPException(status_code=503, detail=str(exc)) from exc
except (LLMSummaryError, Exception) as exc:
if isinstance(exc, HTTPException):
raise
raise HTTPException(status_code=502, detail=f"LLM summary failed: {exc}") from exc
def _parse_tags(tags: str | None) -> list[str]:
if not tags:
return []
return [tag.strip() for tag in re.split(r"[,\n]", tags) if tag.strip()]
def _default_knowledge_uri(namespace: str, knowledge_type: str, title: str, content: str) -> str:
digest = hashlib.sha1(content.encode("utf-8")).hexdigest()[:12]
return f"viking://resources/{namespace.strip('/')}/knowledge/{knowledge_type.strip('/')}/{slugify(title, digest)}-{digest}.json"
@app.post("/api/knowledge/upload", dependencies=[Depends(verify_api_key)])
async def api_upload_knowledge(
file: UploadFile = File(...),
title: Optional[str] = Form(default=None),
namespace: str = Form(default="memory-gateway"),
knowledge_type: str = Form(default="knowledge"),
tags: str = Form(default=""),
source: Optional[str] = Form(default=None),
obsidian_dir: Optional[str] = Form(default=None),
resource_uri: Optional[str] = Form(default=None),
persist_as: str = Form(default="resource"),
max_summary_chars: int = Form(default=1000),
):
"""Upload a document, convert it to Markdown, save to Obsidian, summarize with LLM, and commit to OpenViking."""
if persist_as not in {"memory", "resource", "both", "none"}:
raise HTTPException(status_code=422, detail="persist_as must be one of memory/resource/both/none")
original_name = file.filename or "uploaded-document"
suffix = Path(original_name).suffix or ".bin"
with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
tmp.write(await file.read())
tmp_path = Path(tmp.name)
try:
markdown = await asyncio.to_thread(convert_file_to_markdown, tmp_path)
except RuntimeError as exc:
tmp_path.unlink(missing_ok=True)
raise HTTPException(status_code=500, detail=str(exc)) from exc
except Exception as exc: # noqa: BLE001
tmp_path.unlink(missing_ok=True)
raise HTTPException(status_code=500, detail=f"Document conversion failed: {exc}") from exc
finally:
tmp_path.unlink(missing_ok=True)
parsed_tags = _parse_tags(tags)
effective_title = title or Path(original_name).stem or "Uploaded knowledge"
request = CommitSummaryRequest(
content=markdown,
title=effective_title,
namespace=namespace,
memory_type=knowledge_type,
tags=parsed_tags,
source=source or original_name,
persist_as="none",
max_summary_chars=max_summary_chars,
purpose=f"knowledge upload: {knowledge_type}",
)
try:
artifact = await build_summary_artifact(request)
except LLMConfigurationError as exc:
raise HTTPException(status_code=503, detail=str(exc)) from exc
except Exception as exc: # noqa: BLE001
raise HTTPException(status_code=502, detail=f"LLM summary failed: {exc}") from exc
config = get_config()
relative_dir = obsidian_dir or getattr(config.obsidian, "knowledge_dir", "01_Knowledge/Uploaded")
obsidian_path = save_markdown_to_obsidian(
vault_path=config.obsidian.vault_path,
relative_dir=relative_dir,
title=artifact["title"],
markdown=markdown,
source_filename=original_name,
tags=artifact.get("tags", []),
knowledge_type=knowledge_type,
summary=artifact.get("summary"),
)
artifact.update(
{
"schema_version": "memory-gateway.knowledge_upload.v1",
"knowledge_type": knowledge_type,
"source_filename": original_name,
"obsidian_path": str(obsidian_path),
"obsidian_relative_path": str(obsidian_path.relative_to(config.obsidian.vault_path)),
"markdown_content": markdown,
}
)
ov_client = await get_openviking_client()
memory_result: Optional[dict[str, Any]] = None
resource_result: Optional[dict[str, Any]] = None
if persist_as in {"memory", "both"}:
memory_result = await ov_client.add_memory(
content=_render_memory_text(artifact),
namespace=namespace,
memory_type=knowledge_type,
)
if persist_as in {"resource", "both"}:
final_uri = resource_uri or _default_knowledge_uri(namespace, knowledge_type, artifact["title"], markdown)
artifact["resource_uri"] = final_uri
resource_result = await ov_client.add_resource(
uri=final_uri,
content=json.dumps(artifact, ensure_ascii=False, indent=2),
resource_type="json",
)
return {
"status": "ok",
"artifact": artifact,
"markdown_chars": len(markdown),
"obsidian_path": str(obsidian_path),
"memory_result": memory_result,
"resource_result": resource_result,
}
def create_app(config: Optional[Config] = None) -> FastAPI:
    """Create the FastAPI application."""
    if config:

memory_gateway/types.py

@@ -1,5 +1,5 @@
"""Type definitions."""
-from typing import Optional, Any
+from typing import Optional, Any, Literal
from pydantic import BaseModel, Field
@@ -19,10 +19,25 @@ class OpenVikingConfig(BaseModel):
class MemoryConfig(BaseModel):
    """Memory configuration."""
-    default_namespace: str = "soc"
+    default_namespace: str = "memory-gateway"
    search_limit: int = 10


class LLMConfig(BaseModel):
    """LLM configuration for generic summarization and knowledge persistence."""
    base_url: str = "https://api.openai.com/v1"
    api_key: str = ""
    model: str = ""
    timeout: int = 60
    max_input_chars: int = 24000


class ObsidianConfig(BaseModel):
    """Obsidian vault configuration."""
    vault_path: str = "/home/tom/memory-gateway/obsidian-vault"
    knowledge_dir: str = "01_Knowledge/Uploaded"


class LoggingConfig(BaseModel):
    """Logging configuration."""
    level: str = "INFO"
@@ -35,6 +50,8 @@ class Config(BaseModel):
    openviking: OpenVikingConfig = Field(default_factory=OpenVikingConfig)
    memory: MemoryConfig = Field(default_factory=MemoryConfig)
    logging: LoggingConfig = Field(default_factory=LoggingConfig)
    llm: LLMConfig = Field(default_factory=LLMConfig)
    obsidian: ObsidianConfig = Field(default_factory=ObsidianConfig)


class SearchRequest(BaseModel):
@@ -59,6 +76,36 @@ class AddResourceRequest(BaseModel):
    resource_type: Optional[str] = "text"


class CommitSummaryRequest(BaseModel):
    """Generic summarize-and-persist request.

    Used in any scenario to summarize a piece of high-value content and
    write it to OpenViking as a memory, a resource, or both.
    """
    content: str
    title: Optional[str] = None
    summary: Optional[str] = None
    purpose: Optional[str] = "generic knowledge memory"
    namespace: Optional[str] = None
    memory_type: Optional[str] = "summary"
    tags: list[str] = Field(default_factory=list)
    source: Optional[str] = None
    resource_uri: Optional[str] = None
    resource_type: Optional[str] = "json"
    persist_as: Literal["memory", "resource", "both", "none"] = "both"
    max_summary_chars: int = 600


class CommitSummaryResponse(BaseModel):
    """Generic summarize-and-persist response."""
    status: str
    artifact: dict[str, Any]
    memory_result: Optional[dict[str, Any]] = None
    resource_result: Optional[dict[str, Any]] = None


class SearchResult(BaseModel):
    """Search results."""
    results: list[dict[str, Any]]

@@ -0,0 +1,18 @@
---
title: Generic Memory Gateway Upload Test
knowledge_type: reference
source_filename: memory-gateway-generic-upload.txt
created_at: 2026-04-29T09:53:46.674688+00:00
tags: [generic, memory-gateway, upload-test, agent-workflow, knowledge-management, openviking]
summary: "A domain-neutral document describing a generic agent memory workflow for uploading reference documents. The workflow involves retrieving relevant context, summarizing final conclusions, uploading reference documents, and committing reusable knowledge to OpenViking."
---
# Generic Memory Gateway Upload Test
This document describes a generic agent memory workflow:
- retrieve relevant context
- summarize final conclusions
- upload reference documents
- commit reusable knowledge to OpenViking

@@ -0,0 +1,14 @@
---
title: Memory Gateway Migration Upload Script Retry
knowledge_type: migration_note
source_filename: memory-gateway-migration-upload.txt
created_at: 2026-04-29T10:01:47.830369+00:00
tags: [migration, memory-gateway, script-retry]
summary: "Verification document confirming that document upload functionality works correctly after migrating the Memory Gateway project to /home/tom/memory-gateway. This serves as a test of the upload script following the project relocation."
---
# Memory Gateway Migration Upload Script Retry
# Memory Gateway Migration Upload
This document verifies that document upload works after moving the project to /home/tom/memory-gateway.

@@ -0,0 +1,14 @@
---
title: Memory Gateway Migration Upload
knowledge_type: migration_note
source_filename: memory-gateway-migration-upload.txt
created_at: 2026-04-29T10:01:29.858006+00:00
tags: [migration, memory-gateway, verification, document-upload]
summary: "Document upload functionality verified as working after migrating the Memory Gateway project to /home/tom/memory-gateway. This serves as a verification test confirming the migration was successful."
---
# Memory Gateway Migration Upload
This document verifies that document upload works after moving the project to /home/tom/memory-gateway.

@@ -1,101 +0,0 @@
---
case_id: CASE-2026-1001
scenario: o365_suspicious_login
alert_type: azuread_impossible_travel
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-1001 Impossible travel login followed by MFA prompt fatigue
## Basic Information
- Case ID: CASE-2026-1001
- Title: Impossible travel login followed by MFA prompt fatigue
- Alert type: azuread_impossible_travel
- Source system: SOC Memory POC Mock Dataset
- Time range: TBD
- Analyst / Agent: AI Agent Draft
- Final verdict: true positive
- Severity: high
## Alert Summary
User account showed impossible travel between Shanghai and Amsterdam, followed by repeated MFA prompts and successful sign-in.
## Key Entities
- User: david@corp.example
- Host: WS-DAVID-01
- Mailbox: david@corp.example
- IP: 203.0.113.150, 198.51.100.61
- Domain: none
- File hash: none
- Other IOCs: none
## Key Evidence
- Two successful sign-ins from geographically impossible locations within 15 minutes.
- MFA challenge volume increased abnormally before final success.
- User confirmed they did not initiate overseas login.
## Investigation Summary
1. Confirmed the alert scenario and core risk: User account showed impossible travel between Shanghai and Amsterdam, followed by repeated MFA prompts and successful sign-in.
2. Extracted and cross-validated key evidence: Two successful sign-ins from geographically impossible locations within 15 minutes.
3. Reviewed the alert pattern and response path against the linked playbooks / KB entries.
4. Reached the verdict from the key evidence and scenario pattern: true positive.
## Verdict Rationale
- Verdict: true positive.
- Primary evidence: Two successful sign-ins from geographically impossible locations within 15 minutes.
- Supporting evidence: MFA challenge volume increased abnormally before final success.
## Response Recommendations
- Review the sign-in source, MFA events, and any subsequent mailbox-rule or OAuth changes.
- If there are signs of account takeover, immediately revoke sessions and reset credentials.
## Reusable Patterns
- Matched patterns: scenario:o365_suspicious_login, alert_type:azuread_impossible_travel
- False-positive traits: none
- Variants to watch: related tags o365, login, impossible-travel, mfa-fatigue
## Linked Knowledge
- Linked playbooks: [[PB-O365-LOGIN-001]]
- Linked KB: [[KB-O365-IMPOSSIBLE-TRAVEL]], [[KB-O365-MFA-FATIGUE]]
- Linked historical cases: [[CASE-2026-1005]], [[CASE-2026-1004]]
- Linked entities: [[david@corp.example]], [[WS-DAVID-01]]
## Automatic Association Recommendations
### Recommended Historical Cases
- [[CASE-2026-1005]] (case score=0.687) This directory contains a single case record documenting a false positive alert triggered by Microsoft 365's impossible travel detection sys...
- [[CASE-2026-1004]] (case score=0.636) This directory contains a single incident case file related to a suspicious Microsoft 365 login attempt, identified as CASE-2026-1004. The c...
### Recommended Knowledge Entries
- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.69) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...
- [[PB-O365-LOGIN-001]] (knowledge score=0.63) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...
## Lessons Learned
- This case can serve as a quick-triage reference for similar future alerts.
- If the same lure, a similar login pattern, or the same key evidence appears again, recall this case and its linked knowledge first.
## Tags
- #case
- #scenario/o365_suspicious_login
- #alert/azuread_impossible_travel
- #verdict/true-positive
- #o365
- #login
- #impossible-travel
- #mfa-fatigue

@@ -1,100 +0,0 @@
---
case_id: CASE-2026-1002
scenario: o365_suspicious_login
alert_type: azuread_legacy_auth_attempt
severity: medium
verdict: false_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-1002 Legacy protocol sign-in from unfamiliar IP blocked by policy
## Basic Information
- Case ID: CASE-2026-1002
- Title: Legacy protocol sign-in from unfamiliar IP blocked by policy
- Alert type: azuread_legacy_auth_attempt
- Source system: SOC Memory POC Mock Dataset
- Time range: TBD
- Analyst / Agent: AI Agent Draft
- Final verdict: false positive
- Severity: medium
## Alert Summary
Legacy authentication attempt from a cloud IP was blocked; investigation tied it to an approved migration tool test.
## Key Entities
- User: svc-migration@corp.example
- Host: none
- Mailbox: svc-migration@corp.example
- IP: 192.0.2.24
- Domain: none
- File hash: none
- Other IOCs: none
## Key Evidence
- The account is a known migration service account.
- Source IP matched approved cloud migration vendor range.
- No successful sign-in occurred due to policy block.
## Investigation Summary
1. Confirmed the alert scenario and core risk: Legacy authentication attempt from a cloud IP was blocked; investigation tied it to an approved migration tool test.
2. Extracted and cross-validated key evidence: The account is a known migration service account.
3. Reviewed the alert pattern and response path against the linked playbooks / KB entries.
4. Reached the verdict from the key evidence and scenario pattern: false positive.
## Verdict Rationale
- Verdict: false positive.
- Primary evidence: The account is a known migration service account.
- Supporting evidence: Source IP matched approved cloud migration vendor range.
## Response Recommendations
- Document the false-positive cause and update detection exceptions or suppression conditions.
## Reusable Patterns
- Matched patterns: scenario:o365_suspicious_login, alert_type:azuread_legacy_auth_attempt
- False-positive traits: confirmed false positive; can inform additional suppression conditions.
- Variants to watch: related tags o365, login, false-positive, legacy-auth
## Linked Knowledge
- Linked playbooks: [[PB-O365-LOGIN-001]]
- Linked KB: [[KB-O365-LEGACY-AUTH]], [[KB-O365-IMPOSSIBLE-TRAVEL]]
- Linked historical cases: [[CASE-2026-1001]], [[CASE-2026-1004]]
- Linked entities: [[svc-migration@corp.example]]
## Automatic Association Recommendations
### Recommended Historical Cases
- [[CASE-2026-1001]] (case score=0.651) This directory contains a structured security incident case report related to a high-severity event in an Office 365 environment, identified...
- [[CASE-2026-1004]] (case score=0.634) This directory contains a single incident case file related to a suspicious Microsoft 365 login attempt, identified as CASE-2026-1004. The c...
### Recommended Knowledge Entries
- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.626) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...
- [[PB-O365-LOGIN-001]] (knowledge score=0.61) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...
## Lessons Learned
- This case can serve as a quick-triage reference for similar future alerts.
- If the same lure, a similar login pattern, or the same key evidence appears again, recall this case and its linked knowledge first.
## Tags
- #case
- #scenario/o365_suspicious_login
- #alert/azuread_legacy_auth_attempt
- #verdict/false-positive
- #o365
- #login
- #false-positive
- #legacy-auth

@@ -1,101 +0,0 @@
---
case_id: CASE-2026-1003
scenario: o365_suspicious_login
alert_type: azuread_suspicious_inbox_rule_after_login
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-1003 Suspicious inbox rule creation after successful foreign login
## Basic Information
- Case ID: CASE-2026-1003
- Title: Suspicious inbox rule creation after successful foreign login
- Alert type: azuread_suspicious_inbox_rule_after_login
- Source system: SOC Memory POC Mock Dataset
- Time range: TBD
- Analyst / Agent: AI Agent Draft
- Final verdict: true positive
- Severity: high
## Alert Summary
An overseas sign-in to Microsoft 365 was followed by inbox rule creation to hide finance-related emails.
## Key Entities
- User: emma@corp.example
- Host: WS-EMMA-07
- Mailbox: emma@corp.example
- IP: 198.51.100.98
- Domain: none
- File hash: none
- Other IOCs: none
## Key Evidence
- Successful sign-in from untrusted ASN.
- Inbox rule moved wire transfer emails to RSS Feeds folder.
- Mailbox audit showed rule creation minutes after login.
## Investigation Summary
1. Confirmed the alert scenario and core risk: An overseas sign-in to Microsoft 365 was followed by inbox rule creation to hide finance-related emails.
2. Extracted and cross-validated key evidence: Successful sign-in from untrusted ASN.
3. Reviewed the alert pattern and response path against the linked playbooks / KB entries.
4. Reached the verdict from the key evidence and scenario pattern: true positive.
## Verdict Rationale
- Verdict: true positive.
- Primary evidence: Successful sign-in from untrusted ASN.
- Supporting evidence: Inbox rule moved wire transfer emails to RSS Feeds folder.
## Response Recommendations
- Review the sign-in source, MFA events, and any subsequent mailbox-rule or OAuth changes.
- If there are signs of account takeover, immediately revoke sessions and reset credentials.
## Reusable Patterns
- Matched patterns: scenario:o365_suspicious_login, alert_type:azuread_suspicious_inbox_rule_after_login
- False-positive traits: none
- Variants to watch: related tags o365, login, inbox-rule, account-compromise
## Linked Knowledge
- Linked playbooks: [[PB-O365-LOGIN-001]]
- Linked KB: [[KB-O365-INBOX-RULE-ABUSE]], [[KB-O365-IMPOSSIBLE-TRAVEL]]
- Linked historical cases: [[CASE-2026-1005]], [[CASE-2026-1001]]
- Linked entities: [[emma@corp.example]], [[WS-EMMA-07]]
## Automatic Association Recommendations
### Recommended Historical Cases
- [[CASE-2026-1005]] (case score=0.667) This directory contains a single case record documenting a false positive alert triggered by Microsoft 365's impossible travel detection sys...
- [[CASE-2026-1001]] (case score=0.666) This document is a structured case report detailing a high-severity security incident involving suspicious login activity in an Office 365 e...
### Recommended Knowledge Entries
- [[PB-O365-LOGIN-001]] (knowledge score=0.653) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...
- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.645) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...
## Lessons Learned
- This case can serve as a quick-triage reference for similar future alerts.
- If the same lure, a similar login pattern, or the same key evidence appears again, recall this case and its linked knowledge first.
## Tags
- #case
- #scenario/o365_suspicious_login
- #alert/azuread_suspicious_inbox_rule_after_login
- #verdict/true-positive
- #o365
- #login
- #inbox-rule
- #account-compromise

@@ -1,101 +0,0 @@
---
case_id: CASE-2026-1004
scenario: o365_suspicious_login
alert_type: azuread_password_spray_attempt
severity: medium
verdict: uncertain
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-1004 Multiple failed logins from residential proxy but no successful access
## Basic Information
- Case ID: CASE-2026-1004
- Title: Multiple failed logins from residential proxy but no successful access
- Alert type: azuread_password_spray_attempt
- Source system: SOC Memory POC Mock Dataset
- Time range: TBD
- Analyst / Agent: AI Agent Draft
- Final verdict: uncertain
- Severity: medium
## Alert Summary
Repeated failed Microsoft 365 sign-in attempts targeted one user from a residential proxy network, with no successful authentication observed.
## Key Entities
- User: frank@corp.example
- Host: none
- Mailbox: frank@corp.example
- IP: 203.0.113.201
- Domain: none
- File hash: none
- Other IOCs: none
## Key Evidence
- High-volume failed attempts over a short period.
- Source IP attributed to a residential proxy provider.
- No matching successful sign-in or MFA event found.
## Investigation Summary
1. Confirmed the alert scenario and core risk: Repeated failed Microsoft 365 sign-in attempts targeted one user from a residential proxy network, with no successful authentication observed.
2. Extracted and cross-validated key evidence: High-volume failed attempts over a short period.
3. Reviewed the alert pattern and response path against the linked playbooks / KB entries.
4. Reached the verdict from the key evidence and scenario pattern: uncertain.
## Verdict Rationale
- Verdict: uncertain.
- Primary evidence: High-volume failed attempts over a short period.
- Supporting evidence: Source IP attributed to a residential proxy provider.
## Response Recommendations
- Review the sign-in source, MFA events, and any subsequent mailbox-rule or OAuth changes.
- If there are signs of account takeover, immediately revoke sessions and reset credentials.
## Reusable Patterns
- Matched patterns: scenario:o365_suspicious_login, alert_type:azuread_password_spray_attempt
- False-positive traits: none
- Variants to watch: related tags o365, login, password-spray, pending
## Linked Knowledge
- Linked playbooks: [[PB-O365-LOGIN-001]]
- Linked KB: [[KB-O365-IMPOSSIBLE-TRAVEL]]
- Linked historical cases: [[CASE-2026-1001]], [[CASE-2026-1003]]
- Linked entities: [[frank@corp.example]]
## Automatic Association Recommendations
### Recommended Historical Cases
- [[CASE-2026-1001]] (case score=0.665) This directory contains a structured security incident case report related to a high-severity event in an Office 365 environment, identified...
- [[CASE-2026-1003]] (case score=0.627) This directory contains a structured incident case report focused on a confirmed Microsoft 365 account compromise involving suspicious login...
### Recommended Knowledge Entries
- [[PB-O365-LOGIN-001]] (knowledge score=0.614) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...
- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.609) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...
## Lessons Learned
- This case can serve as a quick-triage reference for similar future alerts.
- If the same lure, a similar login pattern, or the same key evidence appears again, recall this case and its linked knowledge first.
## Tags
- #case
- #scenario/o365_suspicious_login
- #alert/azuread_password_spray_attempt
- #verdict/uncertain
- #o365
- #login
- #password-spray
- #pending


@ -1,100 +0,0 @@
---
case_id: CASE-2026-1005
scenario: o365_suspicious_login
alert_type: azuread_impossible_travel
severity: medium
verdict: false_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-1005 Traveling executive triggered impossible travel but activity was legitimate
## 基本信息
- Case ID: CASE-2026-1005
- 标题: Traveling executive triggered impossible travel but activity was legitimate
- 告警类型: azuread_impossible_travel
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 误报
- 严重等级: medium
## 告警摘要
Executive account triggered impossible travel due to corporate VPN exit node while the user was on an approved overseas trip.
## 关键实体
- 用户: grace@corp.example
- 主机: VIP-LAPTOP-01
- 邮箱: grace@corp.example
- IP: 192.0.2.90, 203.0.113.77
- 域名: 无
- 文件 Hash: 无
- 其他 IOC: 无
## 关键证据
- Approved travel request existed.
- One login originated from corporate VPN exit node.
- Device and user agent were consistent with known user profile.
## 研判过程摘要
1. 确认告警场景与核心风险:Executive account triggered impossible travel due to corporate VPN exit node while the user was on an approved overseas trip.
2. 提取关键证据并交叉验证:Approved travel request existed.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:误报。
## 结论依据
- 结论为误报。
- 最关键依据:Approved travel request existed.
- 补充依据:One login originated from corporate VPN exit node.
## 处置建议
- 记录误报原因,并更新检测例外或抑制条件。
## 可复用模式
- 命中模式: scenario:o365_suspicious_login, alert_type:azuread_impossible_travel
- 误报特征: 本案最终确认为误报,可用于补充抑制条件。
- 需关注的变体: 相关标签:o365, login, false-positive, travel
## 关联知识
- 关联 Playbook: [[PB-O365-LOGIN-001]]
- 关联 KB: [[KB-O365-IMPOSSIBLE-TRAVEL]]
- 关联历史 Case: [[CASE-2026-1001]], [[CASE-2026-1004]]
- 关联实体: [[grace@corp.example]], [[VIP-LAPTOP-01]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-1001]] (case score=0.684) This directory contains a structured security incident case report related to a high-severity event in an Office 365 environment, identified...
- [[CASE-2026-1004]] (case score=0.63) This directory contains a single incident case file related to a suspicious Microsoft 365 login attempt, identified as CASE-2026-1004. The c...
### 推荐知识条目
- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.703) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...
- [[PB-O365-LOGIN-001]] (knowledge score=0.626) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/o365_suspicious_login
- #alert/azuread_impossible_travel
- #verdict/false-positive
- #o365
- #login
- #false-positive
- #travel


@ -1,101 +0,0 @@
---
case_id: CASE-2026-0001
scenario: phishing
alert_type: mail_suspicious_attachment
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-0001 Finance user received invoice-themed phishing email
## 基本信息
- Case ID: CASE-2026-0001
- 标题: Finance user received invoice-themed phishing email
- 告警类型: mail_suspicious_attachment
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 真报
- 严重等级: high
## 告警摘要
Finance user received an invoice-themed phishing email containing a malicious HTML attachment that redirected to a credential harvesting page.
## 关键实体
- 用户: alice@corp.example
- 主机: FIN-LAPTOP-12
- 邮箱: alice@corp.example
- IP: 198.51.100.20
- 域名: vendor-payments.com, vendor-payments-login.com
- 文件 Hash: sha256:phish0001
- 其他 IOC: https://vendor-payments-login.com/review, billing@vendor-payments.com
## 关键证据
- Sender domain was newly observed and failed DMARC.
- Attachment redirected to a fake Microsoft 365 login page.
- User clicked the link before mail quarantine completed.
## 研判过程摘要
1. 确认告警场景与核心风险:Finance user received an invoice-themed phishing email containing a malicious HTML attachment that redirected to a credential harvesting page.
2. 提取关键证据并交叉验证:Sender domain was newly observed and failed DMARC.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:真报。
## 结论依据
- 结论为真报。
- 最关键依据:Sender domain was newly observed and failed DMARC.
- 补充依据:Attachment redirected to a fake Microsoft 365 login page.
## 处置建议
- 隔离相同主题、发件人或 URL 的邮件样本。
- 核查用户是否点击或提交凭据,并按需执行凭据重置。
## 可复用模式
- 命中模式: scenario:phishing, alert_type:mail_suspicious_attachment
- 误报特征: 无
- 需关注的变体: 相关标签:phishing, email, credential-harvest, finance
## 关联知识
- 关联 Playbook: [[PB-PHISH-001]]
- 关联 KB: [[KB-PHISH-HEADER-CHECK]], [[KB-CRED-HARVEST-PATTERNS]]
- 关联历史 Case: [[CASE-2026-0004]], [[CASE-2026-0002]]
- 关联实体: [[alice@corp.example]], [[FIN-LAPTOP-12]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-0004]] (case score=0.662) This directory contains a structured incident case report related to a phishing attack targeting a shared mailbox via a spoofed OneDrive not...
- [[CASE-2026-0002]] (case score=0.631) This directory contains a single case record detailing the investigation of a suspicious payroll notification email flagged due to a shorten...
### 推荐知识条目
- [[KB-CRED-HARVEST-PATTERNS]] (knowledge score=0.656) This directory contains a structured knowledge base artifact focused on identifying and investigating credential harvesting campaigns, parti...
- [[PB-PHISH-001]] (knowledge score=0.639) This directory contains a phishing email investigation playbook designed to standardize incident response procedures for suspicious emails, ...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/phishing
- #alert/mail_suspicious_attachment
- #verdict/true-positive
- #phishing
- #email
- #credential-harvest
- #finance


@ -1,100 +0,0 @@
---
case_id: CASE-2026-0002
scenario: phishing
alert_type: mail_suspicious_link
severity: medium
verdict: false_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-0002 Payroll notification email flagged but determined benign
## 基本信息
- Case ID: CASE-2026-0002
- 标题: Payroll notification email flagged but determined benign
- 告警类型: mail_suspicious_link
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 误报
- 严重等级: medium
## 告警摘要
Payroll update email was flagged due to a shortened URL, but the destination was the approved HR vendor portal.
## 关键实体
- 用户: bob@corp.example
- 主机: HR-LAPTOP-03
- 邮箱: bob@corp.example
- IP: 无
- 域名: hr-vendor.example
- 文件 Hash: 无
- 其他 IOC: https://bit.ly/hr-portal-example, notify@hr-vendor.example
## 关键证据
- Sender domain aligned with SPF and DKIM.
- Destination domain matched approved supplier inventory.
- No credential prompt anomaly observed.
## 研判过程摘要
1. 确认告警场景与核心风险:Payroll update email was flagged due to a shortened URL, but the destination was the approved HR vendor portal.
2. 提取关键证据并交叉验证:Sender domain aligned with SPF and DKIM.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:误报。
## 结论依据
- 结论为误报。
- 最关键依据:Sender domain aligned with SPF and DKIM.
- 补充依据:Destination domain matched approved supplier inventory.
## 处置建议
- 记录误报原因,并更新检测例外或抑制条件。
## 可复用模式
- 命中模式: scenario:phishing, alert_type:mail_suspicious_link
- 误报特征: 本案最终确认为误报,可用于补充抑制条件。
- 需关注的变体: 相关标签:phishing, email, false-positive, vendor
## 关联知识
- 关联 Playbook: [[PB-PHISH-001]]
- 关联 KB: [[KB-PHISH-HEADER-CHECK]], [[KB-CRED-HARVEST-PATTERNS]]
- 关联历史 Case: [[CASE-2026-0004]], [[CASE-2026-0001]]
- 关联实体: [[bob@corp.example]], [[HR-LAPTOP-03]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-0004]] (case score=0.549) This directory contains a structured incident case report related to a phishing attack targeting a shared mailbox via a spoofed OneDrive not...
- [[CASE-2026-0001]] (case score=0.532) This directory contains a structured case report detailing a high-severity phishing incident targeting a finance user via a malicious invoic...
### 推荐知识条目
- [[PB-PHISH-001]] (knowledge score=0.514) This directory contains a phishing email investigation playbook designed to standardize incident response procedures for suspicious emails, ...
- [[KB-CRED-HARVEST-PATTERNS]] (knowledge score=0.494) This directory contains a structured knowledge base artifact focused on identifying and investigating credential harvesting campaigns, parti...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/phishing
- #alert/mail_suspicious_link
- #verdict/false-positive
- #phishing
- #email
- #false-positive
- #vendor


@ -1,101 +0,0 @@
---
case_id: CASE-2026-0003
scenario: phishing
alert_type: mail_bec_impersonation
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-0003 Executive impersonation email requested urgent wire transfer
## 基本信息
- Case ID: CASE-2026-0003
- 标题: Executive impersonation email requested urgent wire transfer
- 告警类型: mail_bec_impersonation
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 真报
- 严重等级: high
## 告警摘要
An executive impersonation email targeted finance staff with an urgent wire transfer request from a lookalike domain.
## 关键实体
- 用户: carol@corp.example
- 主机: FIN-LAPTOP-08
- 邮箱: carol@corp.example
- IP: 203.0.113.45
- 域名: c0rp-example.com
- 文件 Hash: 无
- 其他 IOC: ceo@c0rp-example.com
## 关键证据
- Lookalike domain used numeric substitution.
- Language pressure matched prior BEC pattern.
- No historical communication from sender domain.
## 研判过程摘要
1. 确认告警场景与核心风险:An executive impersonation email targeted finance staff with an urgent wire transfer request from a lookalike domain.
2. 提取关键证据并交叉验证:Lookalike domain used numeric substitution.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:真报。
## 结论依据
- 结论为真报。
- 最关键依据:Lookalike domain used numeric substitution.
- 补充依据:Language pressure matched prior BEC pattern.
## 处置建议
- 隔离相同主题、发件人或 URL 的邮件样本。
- 核查用户是否点击或提交凭据,并按需执行凭据重置。
## 可复用模式
- 命中模式: scenario:phishing, alert_type:mail_bec_impersonation
- 误报特征: 无
- 需关注的变体: 相关标签:phishing, bec, executive-impersonation
## 关联知识
- 关联 Playbook: [[PB-PHISH-001]]
- 关联 KB: [[KB-CRED-HARVEST-PATTERNS]], [[KB-PHISH-HEADER-CHECK]]
- 关联历史 Case: [[CASE-2026-0001]], [[CASE-2026-0004]]
- 关联实体: [[carol@corp.example]], [[FIN-LAPTOP-08]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-0001]] (case score=0.572) This directory contains a structured case report detailing a high-severity phishing incident targeting a finance user via a malicious invoic...
- [[CASE-2026-0004]] (case score=0.566) This directory contains a structured incident case report related to a phishing attack targeting a shared mailbox via a spoofed OneDrive not...
### 推荐知识条目
- [[PB-PHISH-001]] (knowledge score=0.538) This directory contains a phishing email investigation playbook designed to standardize incident response procedures for suspicious emails, ...
- [[KB-CRED-HARVEST-PATTERNS]] (knowledge score=0.522) This directory contains a structured knowledge base artifact focused on identifying and investigating credential harvesting campaigns, parti...
- [[KB-PHISH-HEADER-CHECK]] (knowledge score=0.512) This directory contains a structured knowledge base document focused on validating phishing emails through detailed analysis of email header...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/phishing
- #alert/mail_bec_impersonation
- #verdict/true-positive
- #phishing
- #bec
- #executive-impersonation


@ -1,100 +0,0 @@
---
case_id: CASE-2026-0004
scenario: phishing
alert_type: mail_suspicious_attachment
severity: medium
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-0004 Shared mailbox received OneDrive lure with HTML attachment
## 基本信息
- Case ID: CASE-2026-0004
- 标题: Shared mailbox received OneDrive lure with HTML attachment
- 告警类型: mail_suspicious_attachment
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 真报
- 严重等级: medium
## 告警摘要
Shared finance mailbox received a fake OneDrive notification with an HTML attachment that led to credential collection.
## 关键实体
- 用户: shared-finance@corp.example
- 主机: 无
- 邮箱: shared-finance@corp.example
- IP: 198.51.100.87
- 域名: sharepoint-notify.com
- 文件 Hash: sha256:phish0004
- 其他 IOC: https://onedrive-review-login.example, noreply@sharepoint-notify.com
## 关键证据
- Attachment rendered a fake Microsoft sign-in page.
- Landing page hosted outside Microsoft IP space.
- Mail body reused branding from previous phishing campaign.
## 研判过程摘要
1. 确认告警场景与核心风险:Shared finance mailbox received a fake OneDrive notification with an HTML attachment that led to credential collection.
2. 提取关键证据并交叉验证:Attachment rendered a fake Microsoft sign-in page.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:真报。
## 结论依据
- 结论为真报。
- 最关键依据:Attachment rendered a fake Microsoft sign-in page.
- 补充依据:Landing page hosted outside Microsoft IP space.
## 处置建议
- 隔离相同主题、发件人或 URL 的邮件样本。
- 核查用户是否点击或提交凭据,并按需执行凭据重置。
## 可复用模式
- 命中模式: scenario:phishing, alert_type:mail_suspicious_attachment
- 误报特征: 无
- 需关注的变体: 相关标签:phishing, email, onedrive-lure
## 关联知识
- 关联 Playbook: [[PB-PHISH-001]]
- 关联 KB: [[KB-CRED-HARVEST-PATTERNS]]
- 关联历史 Case: [[CASE-2026-0001]], [[CASE-2026-0003]]
- 关联实体: [[shared-finance@corp.example]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-0001]] (case score=0.675) This directory contains a structured case report detailing a high-severity phishing incident targeting a finance user via a malicious invoic...
- [[CASE-2026-0003]] (case score=0.606) This directory contains a structured incident report for a high-severity phishing attack involving executive impersonation, classified under...
### 推荐知识条目
- [[KB-CRED-HARVEST-PATTERNS]] (knowledge score=0.652) This directory contains a structured knowledge base artifact focused on identifying and investigating credential harvesting campaigns, parti...
- [[PB-PHISH-001]] (knowledge score=0.608) This directory contains a phishing email investigation playbook designed to standardize incident response procedures for suspicious emails, ...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/phishing
- #alert/mail_suspicious_attachment
- #verdict/true-positive
- #phishing
- #email
- #onedrive-lure


@ -1,76 +0,0 @@
# Case Note Template
## 基本信息
- Case ID:
- 标题:
- 告警类型:
- 来源系统:
- 时间范围:
- 研判人 / Agent:
- 最终结论:
- 严重等级:
## 告警摘要
一句话概述这次 case 的核心问题。
## 关键实体
- 用户:
- 主机:
- 邮箱:
- IP:
- 域名:
- 文件 Hash:
- 其他 IOC:
## 关键证据
- 证据 1:
- 证据 2:
- 证据 3:
## 研判过程摘要
只保留对后续复用有价值的关键步骤,不记录所有原始过程。
1.
2.
3.
## 结论依据
- 为什么判定为真报 / 误报 / 可疑待定
- 哪些信号最关键
## 处置建议
-
-
## 可复用模式
- 命中模式:
- 误报特征:
- 需关注的变体:
## 关联知识
- 关联 Playbook:
- 关联 KB:
- 关联历史 Case:
- 关联实体:
## Lessons Learned
- 本案新增了什么可复用经验
- 哪些规则、知识或流程应更新
## 标签
- `#case`
- `#alert/...`
- `#verdict/true-positive`
- `#verdict/false-positive`
- `#ttp/...`


@ -0,0 +1,26 @@
---
type: knowledge
source: {{source}}
created: {{date}}
tags:
  - memory-gateway
---
# {{title}}
## Summary
简要说明这份知识对后续 agent / harness 的可复用价值。
## Key Points
-
## Usage Notes
说明 agent 在什么场景下应该检索或引用这份知识。
## Source
- 原始来源:
- OpenViking URI:
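上面的模板使用 `{{title}}`、`{{source}}`、`{{date}}` 这类占位符。下面是一个最朴素的填充示意(仅作说明,`render_note` 为假设的辅助函数名,实际网关可以换用任何模板引擎):

```python
def render_note(template: str, values: dict[str, str]) -> str:
    """用朴素的 {{placeholder}} 字符串替换填充笔记模板(示意实现)。"""
    rendered = template
    for key, value in values.items():
        # 逐个把 {{key}} 替换为对应值;未提供的占位符原样保留
        rendered = rendered.replace("{{" + key + "}}", value)
    return rendered


note = render_note(
    "# {{title}}\nsource: {{source}}\ncreated: {{date}}",
    {"title": "Demo Report", "source": "upload", "date": "2026-04-30"},
)
print(note)
```

未提供的占位符会原样保留,便于人工在 Obsidian 里继续补全。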


@ -1,59 +0,0 @@
# Playbook Template
## 基本信息
- 名称:
- 适用告警类型:
- 场景:
- 最近更新时间:
- 负责人:
## 场景描述
这个 playbook 解决什么问题,适用于哪些前置条件。
## 输入信号
- 必要信号:
- 可选信号:
- 常见数据源:
## 调查步骤
1.
2.
3.
## 关键判断点
- 什么情况下倾向真报
- 什么情况下倾向误报
- 哪些证据最关键
## 常见误报模式
-
-
## 常见真报模式
-
-
## 升级 / 处置建议
-
-
## 关联内容
- 相关 Case:
- 相关 KB:
- 相关 IOC:
- 相关 TTP:
## 标签
- `#playbook`
- `#alert/...`
- `#ttp/...`


@ -1,52 +0,0 @@
# Report Summary Template
## 基本信息
- 标题:
- 来源:
- 日期:
- 作者 / 团队:
- 类型:
## 核心摘要
用 3 到 5 句话总结对 SOC 研判最有帮助的内容。
## 关键发现
- 发现 1:
- 发现 2:
- 发现 3:
## 关键实体
- 攻击者:
- 工具:
- 域名 / IP:
- Hash:
- 邮件主题 / 发件特征:
## 对 SOC 的实际价值
- 对哪些告警类型有帮助
- 对哪些 playbook 需要更新
- 对哪些规则或研判路径有启发
## 可沉淀记忆
- 哪些内容适合作为 Knowledge Memory
- 哪些内容适合作为 Case Pattern
## 关联内容
- 关联 KB:
- 关联 Playbook:
- 关联 Case:
- 关联 TTP:
## 标签
- `#report`
- `#intel`
- `#ttp/...`
- `#campaign/...`


@ -1,15 +1,15 @@
 # Obsidian Vault
-这个目录用于保存 Obsidian Vault 的推荐骨架
+这个目录用于保存 Memory Gateway 的 Markdown 知识沉淀
 原则:
-- 只存高价值、可人工维护的沉淀
+- 只存高价值、可人工维护的知识和总结。
 - 不存全量原始资料
-- 不把 ticket 原文、报告全文直接塞进 Vault
+- 不存密钥、凭证、私人敏感信息或无需长期保留的聊天流水。
+- 上传文档默认进入 `01_Knowledge/Uploaded/`,再由 Memory Gateway 总结并写入 OpenViking。
-建议优先建设:
-- `01_Knowledge/`
-- `02_Cases/`
-- `05_Templates/`
+当前结构:
+- `01_Knowledge/Uploaded/`:上传文档转换后的 Markdown。
+- `05_Templates/`:通用知识笔记模板。


@ -1,14 +0,0 @@
# Pipeline
这个目录用于保存知识源接入和数据清洗流程。
建议优先接入:
- 历史 case
- KB / Playbook
后续再逐步扩展:
- ticket system
- intel system
- 月报 / 报告


@ -1,41 +0,0 @@
"""Batch-ingest mock case files and emit normalized case JSON documents."""
from __future__ import annotations
import json
from dataclasses import asdict
from pathlib import Path
from pipeline.transforms.normalize_case import load_and_normalize_case
def ingest_cases(input_dir: str | Path, output_dir: str | Path) -> list[Path]:
input_dir = Path(input_dir)
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
written: list[Path] = []
for src in sorted(input_dir.rglob("*.json")):
normalized = load_and_normalize_case(src)
dest = output_dir / f"{normalized.id}.json"
with dest.open("w", encoding="utf-8") as f:
json.dump(asdict(normalized), f, ensure_ascii=False, indent=2)
written.append(dest)
return written
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Normalize a directory of mock case JSON files.")
parser.add_argument("--input-dir", default="evaluation/datasets/mock_cases", help="Directory containing raw mock case files")
parser.add_argument("--output-dir", default="evaluation/datasets/normalized_cases", help="Directory to write normalized case files")
args = parser.parse_args()
written = ingest_cases(args.input_dir, args.output_dir)
print(f"normalized_cases={len(written)}")
for path in written:
print(path)
if __name__ == "__main__":
main()


@ -1,41 +0,0 @@
"""Batch-ingest mock KB/playbook files and emit normalized knowledge JSON documents."""
from __future__ import annotations
import json
from dataclasses import asdict
from pathlib import Path
from pipeline.transforms.normalize_kb import load_and_normalize_kb
def ingest_kb(input_dir: str | Path, output_dir: str | Path) -> list[Path]:
input_dir = Path(input_dir)
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
written: list[Path] = []
for src in sorted(input_dir.rglob("*.json")):
normalized = load_and_normalize_kb(src)
dest = output_dir / f"{normalized.id}.json"
with dest.open("w", encoding="utf-8") as f:
json.dump(asdict(normalized), f, ensure_ascii=False, indent=2)
written.append(dest)
return written
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Normalize a directory of mock KB/playbook JSON files.")
parser.add_argument("--input-dir", default="evaluation/datasets/mock_kb", help="Directory containing raw mock KB/playbook files")
parser.add_argument("--output-dir", default="evaluation/datasets/normalized_kb", help="Directory to write normalized KB/playbook files")
args = parser.parse_args()
written = ingest_kb(args.input_dir, args.output_dir)
print(f"normalized_kb={len(written)}")
for path in written:
print(path)
if __name__ == "__main__":
main()


@ -1,91 +0,0 @@
"""Normalize raw mock SOC cases into a retrieval-friendly structure.
This module is intentionally small and deterministic so it can be used with
mock data before real connectors are available.
"""
from __future__ import annotations
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Any
@dataclass
class NormalizedCase:
id: str
memory_type: str
scenario: str
title: str
abstract: str
verdict: str
severity: str
entities: dict[str, list[str]]
observables: dict[str, list[str]]
evidence: list[str]
patterns: list[str]
related_refs: dict[str, list[str]]
source_path: str
tags: list[str]
def _derive_patterns(raw_case: dict[str, Any]) -> list[str]:
"""Derive a small set of reusable patterns from the case payload."""
patterns: list[str] = []
verdict = raw_case.get("conclusion", {}).get("verdict")
if verdict:
patterns.append(f"verdict:{verdict}")
scenario = raw_case.get("scenario")
if scenario:
patterns.append(f"scenario:{scenario}")
alert_type = raw_case.get("alert_type")
if alert_type:
patterns.append(f"alert_type:{alert_type}")
return patterns
def normalize_case(raw_case: dict[str, Any], source_path: str = "") -> NormalizedCase:
"""Convert a raw case document into the internal normalized case model."""
conclusion = raw_case.get("conclusion", {})
return NormalizedCase(
id=raw_case["case_id"],
memory_type="case",
scenario=raw_case["scenario"],
title=raw_case["title"],
abstract=raw_case.get("summary", ""),
verdict=conclusion.get("verdict", raw_case.get("status", "unknown")),
severity=raw_case.get("severity", "unknown"),
entities=raw_case.get("entities", {}),
observables=raw_case.get("observables", {}),
evidence=raw_case.get("evidence", []),
patterns=_derive_patterns(raw_case),
related_refs=raw_case.get("related_refs", {}),
source_path=source_path,
tags=raw_case.get("tags", []),
)
def load_and_normalize_case(path: str | Path) -> NormalizedCase:
path = Path(path)
with path.open("r", encoding="utf-8") as f:
raw_case = json.load(f)
return normalize_case(raw_case, source_path=str(path))
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Normalize a mock SOC case JSON file.")
parser.add_argument("path", help="Path to a raw case JSON file")
args = parser.parse_args()
normalized = load_and_normalize_case(args.path)
print(json.dumps(asdict(normalized), ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()


@ -1,63 +0,0 @@
"""Normalize raw mock KB/playbook documents into a retrieval-friendly structure."""
from __future__ import annotations
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Any
@dataclass
class NormalizedKnowledge:
id: str
memory_type: str
doc_type: str
scenario: str
title: str
abstract: str
key_points: list[str]
investigation_guidance: list[str]
decision_points: list[str]
related_refs: dict[str, list[str]]
source_path: str
tags: list[str]
def normalize_kb(raw_doc: dict[str, Any], source_path: str = "") -> NormalizedKnowledge:
"""Convert a raw KB or playbook document into the normalized knowledge model."""
return NormalizedKnowledge(
id=raw_doc["doc_id"],
memory_type="knowledge",
doc_type=raw_doc["doc_type"],
scenario=raw_doc["scenario"],
title=raw_doc["title"],
abstract=raw_doc.get("summary", ""),
key_points=raw_doc.get("key_points", []),
investigation_guidance=raw_doc.get("investigation_guidance", []),
decision_points=raw_doc.get("decision_points", []),
related_refs=raw_doc.get("related_refs", {}),
source_path=source_path,
tags=raw_doc.get("tags", []),
)
def load_and_normalize_kb(path: str | Path) -> NormalizedKnowledge:
path = Path(path)
with path.open("r", encoding="utf-8") as f:
raw_doc = json.load(f)
return normalize_kb(raw_doc, source_path=str(path))
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Normalize a mock KB or playbook JSON file.")
parser.add_argument("path", help="Path to a raw KB/playbook JSON file")
args = parser.parse_args()
normalized = load_and_normalize_kb(args.path)
print(json.dumps(asdict(normalized), ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()


@ -1,7 +1,7 @@
 [project]
 name = "memory-gateway"
 version = "0.1.0"
-description = "基于 OpenViking 的统一记忆入口 MCP Server"
+description = "Generic Memory Gateway for OpenViking, Obsidian, LLM summarization, and agent memory workflows"
 readme = "README.md"
 requires-python = ">=3.10"
 dependencies = [
@ -13,6 +13,8 @@ dependencies = [
     "pyyaml>=6.0",
     "uvicorn>=0.27.0",
     "tenacity>=8.2.0",
+    "markitdown[all]>=0.1.5",
+    "python-multipart>=0.0.9",
 ]
 [project.optional-dependencies]


@ -1,17 +0,0 @@
# Skills
建议优先落地的 skills:
- `ingest_skill`
- `extract_memory_skill`
- `classify_memory_skill`
- `retrieve_context_skill`
- `summarize_case_skill`
- `commit_memory_skill`
- `prune_memory_skill`
POC 第一阶段建议先做:
- `retrieve_context_skill`
- `summarize_case_skill`
- `commit_memory_skill`


@ -1,36 +0,0 @@
# commit_memory_skill
这个 skill 负责把标准化后的高价值记忆写回 OpenViking。
## 当前阶段职责
第一阶段优先把标准化后的 `case` 和 `knowledge` 以 resource 形式写入 OpenViking。
原因:
- 结构化数据适合用 URI 明确组织
- 相比通过会话提交 `add_memory`,resource 写入更可控
- 便于后续按 namespace 和 URI 组织 case / knowledge / report
## 第一阶段输入
- 标准化后的 case JSON
- 标准化后的 KB / Playbook JSON
## 第一阶段输出
- OpenViking resource 写入结果
- 统一 URI 组织的资源
## 默认 URI 约定
- case: `viking://soc/case/<scenario>/<id>`
- knowledge: `viking://soc/knowledge/<doc_type>/<id>`
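按上面的默认 URI 约定,可以用一个极简的构造函数来说明(仅作示意;`memory_type`、`id`、`scenario`、`doc_type` 沿用 normalized JSON 的字段名):

```python
def build_memory_uri(item: dict) -> str:
    """按默认 URI 约定为 case / knowledge 构造目标 URI(示意实现)。"""
    if item["memory_type"] == "case":
        # case 按 scenario 分目录组织
        return f"viking://soc/case/{item.get('scenario', 'general')}/{item['id']}"
    if item["memory_type"] == "knowledge":
        # knowledge 按 doc_type(kb / playbook 等)分目录组织
        return f"viking://soc/knowledge/{item.get('doc_type', 'general')}/{item['id']}"
    raise ValueError(f"unsupported memory_type: {item.get('memory_type')!r}")


print(build_memory_uri({"memory_type": "case", "scenario": "phishing", "id": "CASE-2026-0001"}))
# viking://soc/case/phishing/CASE-2026-0001
```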
## 后续扩展
后续可以在 resource 写入稳定后,再增加:
- 高价值 summary 写入 `memory`
- EverMemOS 提炼结果回灌
- Obsidian / OpenViking 双写策略


@ -1,29 +0,0 @@
# commit_memory_skill
## 用途
把已经过标准化和筛选的 case / knowledge 内容写入 OpenViking。
## 当前默认策略
第一阶段只做 resource 写入,不强行做复杂 memory 演化。
- `case` -> `viking://soc/case/<scenario>/<id>`
- `knowledge` -> `viking://soc/knowledge/<doc_type>/<id>`
## 输入
- 标准化后的 case / knowledge JSON 文件
- OpenViking 配置(URL / API Key)
## 输出
- 写入结果
- 目标 URI
- 成功 / 失败状态
## 成功标准
- 可以把本地标准化样本成功写入 OpenViking
- URI 组织符合 namespace 设计
- 后续可以被检索和引用


@ -1,89 +0,0 @@
"""Commit normalized SOC memory items to OpenViking as structured resources."""
from __future__ import annotations
import argparse
import asyncio
import json
from pathlib import Path
from typing import Any
from memory_gateway.openviking_client import OpenVikingClient
def build_resource_uri(item: dict[str, Any]) -> str:
memory_type = item.get("memory_type")
item_id = item["id"]
if memory_type == "case":
scenario = item.get("scenario", "general")
return f"viking://resources/soc-memory-poc/case/{scenario}/{item_id}.json"
if memory_type == "knowledge":
doc_type = item.get("doc_type", "general")
return f"viking://resources/soc-memory-poc/knowledge/{doc_type}/{item_id}.json"
raise ValueError(f"Unsupported memory_type for commit: {memory_type}")
def load_item(path: str | Path) -> dict[str, Any]:
path = Path(path)
with path.open("r", encoding="utf-8") as f:
return json.load(f)
async def commit_file(path: str | Path, client: OpenVikingClient) -> dict[str, Any]:
item = load_item(path)
uri = build_resource_uri(item)
result = await client.add_resource(
uri=uri,
content=json.dumps(item, ensure_ascii=False, indent=2),
resource_type="json",
wait=False,
)
return {
"path": str(path),
"uri": uri,
"result": result,
}
async def commit_directory(directory: str | Path, client: OpenVikingClient, limit: int | None = None) -> list[dict[str, Any]]:
directory = Path(directory)
paths = sorted(directory.rglob("*.json"))
if limit is not None:
paths = paths[:limit]
results: list[dict[str, Any]] = []
for path in paths:
results.append(await commit_file(path, client))
return results
async def main_async(args: argparse.Namespace) -> None:
client = OpenVikingClient()
try:
if args.path:
result = await commit_file(args.path, client)
print(json.dumps(result, ensure_ascii=False, indent=2))
else:
results = await commit_directory(args.directory, client, limit=args.limit)
print(json.dumps(results, ensure_ascii=False, indent=2))
finally:
await client.close()
def main() -> None:
parser = argparse.ArgumentParser(description="Commit normalized SOC items to OpenViking.")
parser.add_argument("--path", help="Single normalized JSON file to commit")
parser.add_argument("--directory", help="Directory of normalized JSON files to commit")
parser.add_argument("--limit", type=int, default=None, help="Optional limit for directory commits")
args = parser.parse_args()
if not args.path and not args.directory:
parser.error("Either --path or --directory is required")
asyncio.run(main_async(args))
if __name__ == "__main__":
main()


@ -1,42 +0,0 @@
# retrieve_context_skill
这个 skill 用于根据当前 case 的关键信号,从 OpenViking 或 mock dataset 中召回最相关的上下文。
## 目标
输入当前 case 的场景、告警类型、IOC、描述,输出一组排序后的相关内容:
- 相似历史 case
- 相关 KB
- 相关 Playbook
- 关键 decision points
## 第一阶段输入
- `scenario`
- `alert_type`
- `summary`
- `entities`
- `observables`
- `top_k`
## 第一阶段输出
- `matched_cases`
- `matched_knowledge`
- `decision_points`
- `next_actions`
## 第一阶段检索策略
1. 先按 `scenario` 过滤
2. 再按 `alert_type`、IOC、关键词做匹配
3. 再按 evidence / tags 做轻量重排序
4. 输出 top-k
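上面四步可以用如下极简打分示意来表达(字段名沿用 normalized case,权重数值仅为示例,并非最终实现):

```python
def rank_cases(query: dict, cases: list[dict], top_k: int = 3) -> list[dict]:
    """按「场景过滤 -> 信号匹配 -> 轻量重排序 -> top-k」给历史 case 排序(示意实现)。"""

    def score(case: dict) -> int:
        s = 0
        if case.get("scenario") == query.get("scenario"):  # 1. 先按 scenario 过滤(强加权)
            s += 50
        if query.get("alert_type") and case.get("alert_type") == query.get("alert_type"):
            s += 20  # 2. alert_type 命中
        overlap = set(query.get("observables", [])) & set(case.get("observables", []))
        s += 8 * len(overlap)  # 2. IOC 重叠
        s += 2 * sum(1 for tag in case.get("tags", []) if tag in query.get("summary", "").lower())
        return s  # 3. tags / summary 做轻量重排序

    ranked = sorted(cases, key=score, reverse=True)
    return [c for c in ranked if score(c) > 0][:top_k]  # 4. 输出 top-k
```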
## 第一阶段不做
- 向量检索
- 图检索
- 个性化排序
- 多源复杂重排


@ -1,39 +0,0 @@
# retrieve_context_skill
## 用途
在 SOC case 研判时,为 agent 检索最相关的历史 case 和知识上下文。
## 输入
- `scenario`: 场景,如 `phishing` 或 `o365_suspicious_login`
- `alert_type`: 告警类型
- `summary`: 当前 case 摘要
- `entities`: 用户、主机、邮箱等
- `observables`: 域名、IP、URL、Hash 等
- `top_k`: 期望返回条数
## 输出
- 相关历史 case 列表
- 相关 KB / Playbook 列表
- 关键 evidence / decision points
- 推荐下一步调查动作
## 默认检索顺序
1. `session/<session_id>`
2. `soc/case`
3. `soc/knowledge`
4. `agent/<agent_id>`
5. `user/<user_id>`
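上面的默认检索顺序可以用一个简单的列表构造来表达(仅为示意,参数名为假设):

```python
def default_search_order(session_id: str, agent_id: str, user_id: str) -> list[str]:
    """返回与默认检索顺序一致的 namespace 序列:会话上下文优先,共享 SOC 记忆其次,最后是 agent / user 私有记忆。"""
    return [
        f"session/{session_id}",
        "soc/case",
        "soc/knowledge",
        f"agent/{agent_id}",
        f"user/{user_id}",
    ]
```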
## Mock 阶段工作方式
在没有真实数据和完整 OpenViking 检索链路时,先使用 `evaluation/datasets/mock_cases/` 和 `evaluation/datasets/mock_kb/` 做本地检索验证。
## 成功标准
- 钓鱼 case 能召回钓鱼 playbook 和相似 phishing case
- O365 异常登录 case 能召回登录异常 KB 和相似 case
- 返回结果对人工 reviewer 看起来是“有帮助的上下文”,而不是泛资料堆积


@ -1,216 +0,0 @@
"""Retrieval entrypoint for SOC Memory POC.
Supports two modes:
- local: retrieve from normalized mock datasets
- openviking: retrieve from OpenViking resource namespaces and filter results
"""
from __future__ import annotations
import asyncio
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any
from memory_gateway.openviking_client import OpenVikingClient
CASE_URI_PREFIX = "viking://resources/soc-memory-poc/case"
KNOWLEDGE_URI_PREFIX = "viking://resources/soc-memory-poc/knowledge"
def _load_json_dir(path: str | Path) -> list[dict[str, Any]]:
path = Path(path)
items: list[dict[str, Any]] = []
for file in sorted(path.rglob("*.json")):
with file.open("r", encoding="utf-8") as f:
items.append(json.load(f))
return items
@dataclass
class RetrievalQuery:
scenario: str
alert_type: str = ""
summary: str = ""
entities: dict[str, list[str]] | None = None
observables: dict[str, list[str]] | None = None
top_k: int = 3
def _flatten_values(data: dict[str, list[str]] | None) -> set[str]:
if not data:
return set()
values: set[str] = set()
for items in data.values():
values.update(str(item).lower() for item in items)
return values
def _score_case(query: RetrievalQuery, item: dict[str, Any]) -> int:
score = 0
if item.get("scenario") == query.scenario:
score += 50
for pattern in item.get("patterns", []):
if query.alert_type and pattern == f"alert_type:{query.alert_type}":
score += 20
query_observables = _flatten_values(query.observables)
item_observables = _flatten_values(item.get("observables"))
score += 8 * len(query_observables & item_observables)
summary = query.summary.lower()
haystacks = [item.get("title", "").lower(), item.get("abstract", "").lower()]
for token in [t for t in summary.split() if len(t) > 4]:
if any(token in text for text in haystacks):
score += 2
return score
def _score_knowledge(query: RetrievalQuery, item: dict[str, Any]) -> int:
score = 0
if item.get("scenario") == query.scenario:
score += 40
title = item.get("title", "").lower()
abstract = item.get("abstract", "").lower()
for token in [t for t in query.summary.lower().split() if len(t) > 4]:
if token in title or token in abstract:
score += 2
if query.alert_type and query.alert_type in " ".join(item.get("related_refs", {}).get("cases", [])).lower():
score += 5
return score
def retrieve_context_local(
query: RetrievalQuery,
cases_dir: str | Path = "evaluation/datasets/normalized_cases",
knowledge_dir: str | Path = "evaluation/datasets/normalized_kb",
) -> dict[str, Any]:
cases = _load_json_dir(cases_dir)
knowledge = _load_json_dir(knowledge_dir)
ranked_cases = sorted(
({"score": _score_case(query, item), "item": item} for item in cases),
key=lambda x: x["score"],
reverse=True,
)
ranked_knowledge = sorted(
({"score": _score_knowledge(query, item), "item": item} for item in knowledge),
key=lambda x: x["score"],
reverse=True,
)
matched_cases = [entry for entry in ranked_cases if entry["score"] > 0][: query.top_k]
matched_knowledge = [entry for entry in ranked_knowledge if entry["score"] > 0][: query.top_k]
decision_points: list[str] = []
next_actions: list[str] = []
for entry in matched_knowledge:
item = entry["item"]
decision_points.extend(item.get("decision_points", []))
next_actions.extend(item.get("investigation_guidance", []))
return {
"backend": "local",
"query": asdict(query),
"matched_cases": matched_cases,
"matched_knowledge": matched_knowledge,
"decision_points": decision_points[: query.top_k],
"next_actions": next_actions[: query.top_k],
}
def _canonicalize_resource_uri(uri: str) -> str:
if ".json/" in uri:
return uri.split(".json/", 1)[0] + ".json"
return uri
def _query_text(query: RetrievalQuery) -> str:
parts = [query.scenario, query.alert_type, query.summary]
parts.extend(sorted(_flatten_values(query.observables)))
return " ".join(part for part in parts if part).strip()
def _dedupe_openviking_results(results: list[dict[str, Any]], prefix: str) -> list[dict[str, Any]]:
deduped: dict[str, dict[str, Any]] = {}
for item in results:
uri = item.get("uri") or ""
if not uri.startswith(prefix):
continue
canonical_uri = _canonicalize_resource_uri(uri)
score = item.get("score") or 0
existing = deduped.get(canonical_uri)
payload = {
"uri": canonical_uri,
"abstract": item.get("abstract", ""),
"score": score,
"context_type": item.get("context_type"),
"source_uri": uri,
}
if existing is None or score > existing.get("score", 0):
deduped[canonical_uri] = payload
return sorted(deduped.values(), key=lambda x: x["score"], reverse=True)
async def retrieve_context_openviking(
query: RetrievalQuery,
case_uri: str = CASE_URI_PREFIX,
knowledge_uri: str = KNOWLEDGE_URI_PREFIX,
) -> dict[str, Any]:
client = OpenVikingClient()
try:
query_text = _query_text(query)
case_result = await client.search(query=query_text, uri=case_uri, limit=max(query.top_k * 5, 10))
knowledge_result = await client.search(query=query_text, uri=knowledge_uri, limit=max(query.top_k * 5, 10))
matched_cases = _dedupe_openviking_results(case_result.results, case_uri)[: query.top_k]
matched_knowledge = _dedupe_openviking_results(knowledge_result.results, knowledge_uri)[: query.top_k]
return {
"backend": "openviking",
"query": asdict(query),
"matched_cases": matched_cases,
"matched_knowledge": matched_knowledge,
"decision_points": [],
"next_actions": [],
}
finally:
await client.close()
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Retrieve SOC context from local datasets or OpenViking.")
parser.add_argument("--backend", choices=["local", "openviking"], default="local", help="Retrieval backend")
parser.add_argument("--scenario", required=True, help="Scenario, e.g. phishing or o365_suspicious_login")
parser.add_argument("--alert-type", default="", help="Alert type")
parser.add_argument("--summary", default="", help="Short case summary")
parser.add_argument("--top-k", type=int, default=3, help="Number of results to return")
parser.add_argument("--cases-dir", default="evaluation/datasets/normalized_cases", help="Normalized case dataset directory")
parser.add_argument("--knowledge-dir", default="evaluation/datasets/normalized_kb", help="Normalized knowledge dataset directory")
parser.add_argument("--case-uri", default=CASE_URI_PREFIX, help="OpenViking case URI prefix")
parser.add_argument("--knowledge-uri", default=KNOWLEDGE_URI_PREFIX, help="OpenViking knowledge URI prefix")
args = parser.parse_args()
query = RetrievalQuery(
scenario=args.scenario,
alert_type=args.alert_type,
summary=args.summary,
top_k=args.top_k,
)
if args.backend == "openviking":
result = asyncio.run(retrieve_context_openviking(query, case_uri=args.case_uri, knowledge_uri=args.knowledge_uri))
else:
result = retrieve_context_local(query, cases_dir=args.cases_dir, knowledge_dir=args.knowledge_dir)
print(json.dumps(result, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()



@@ -1,17 +0,0 @@
# summarize_case_skill
This skill turns a normalized SOC case record into a reusable Obsidian case note.
Current scope:
- input: normalized case JSON from `evaluation/datasets/normalized_cases/`
- output: markdown case note under `obsidian-vault/02_Cases/`
- goal: produce a clean analyst-facing note, not a raw process dump
Typical usage:
```bash
source /home/tom/OpenViking/.venv/bin/activate
PYTHONPATH=/home/tom/soc_memory_poc python /home/tom/soc_memory_poc/skills/summarize_case_skill/generate_case_note.py \
--input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json \
--output-dir /home/tom/soc_memory_poc/obsidian-vault/02_Cases
```
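The command above writes one note per case; the destination follows `build_output_path` in the implementation, i.e. `<output-dir>/<scenario>/<case_id> - <title>.md` with `/` in titles sanitized. A standalone sketch (the title is hypothetical):

```python
from pathlib import Path

# Mirrors the skill's output naming convention:
# <output_dir>/<scenario>/<case_id> - <title>.md, with "/" replaced by "-".
def build_output_path(item: dict, output_dir: str) -> Path:
    case_id = item["id"]
    safe_title = item.get("title", case_id).replace("/", "-")
    return Path(output_dir) / item.get("scenario", "general") / f"{case_id} - {safe_title}.md"

note_path = build_output_path(
    {"id": "CASE-2026-0001", "scenario": "phishing", "title": "O365 lure / credential theft"},
    "obsidian-vault/02_Cases",
)
print(note_path)
# obsidian-vault/02_Cases/phishing/CASE-2026-0001 - O365 lure - credential theft.md
```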


@@ -1,21 +0,0 @@
# summarize_case_skill
## Purpose
Summarize one normalized SOC case into a high-quality Obsidian case note that can be reviewed and maintained by analysts.
## Inputs
- A normalized case JSON document
- Optional output directory for Obsidian notes
## Outputs
- One markdown case note per case
- Stable structure aligned with the vault template
## Guardrails
- Do not dump raw logs or full tool traces
- Keep only reusable evidence, conclusions, and response guidance
- Prefer linked references to playbooks, KBs, and related cases
- Preserve case identifiers and observable values exactly
## Current implementation
Use `generate_case_note.py` to render a local markdown note from a normalized case.
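The generator reads a normalized case document roughly shaped like this (field names are the ones the implementation consumes; values are illustrative):

```json
{
  "id": "CASE-2026-0001",
  "title": "Example phishing case",
  "scenario": "phishing",
  "severity": "high",
  "verdict": "true_positive",
  "abstract": "Short case summary.",
  "patterns": ["alert_type:phishing_email"],
  "tags": ["phishing"],
  "entities": {"users": [], "hosts": [], "mailboxes": []},
  "observables": {"ips": [], "domains": [], "urls": [], "hashes": [], "sender_emails": []},
  "evidence": ["Key evidence item."],
  "related_refs": {"cases": [], "playbooks": [], "kb": []}
}
```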

View File

@@ -1,346 +0,0 @@
"""Generate an Obsidian case note from a normalized SOC case JSON file."""
from __future__ import annotations
import argparse
import asyncio
import json
from pathlib import Path
from typing import Any
from skills.retrieve_context_skill.retrieve_context import RetrievalQuery, retrieve_context_openviking
def _load_case(path: str | Path) -> dict[str, Any]:
with Path(path).open("r", encoding="utf-8") as f:
return json.load(f)
def _extract_alert_type(patterns: list[str]) -> str:
for pattern in patterns:
if pattern.startswith("alert_type:"):
return pattern.split(":", 1)[1]
return "unknown"
def _verdict_label(verdict: str) -> str:
mapping = {
"true_positive": "真报",
"false_positive": "误报",
"suspicious": "可疑待定",
}
return mapping.get(verdict, verdict or "未知")
def _join_values(values: list[str]) -> str:
return ", ".join(values) if values else ""
def _bullet_lines(values: list[str], default: str = "- 无") -> str:
if not values:
return default
return "\n".join(f"- {value}" for value in values)
def _wikilinks(values: list[str]) -> str:
if not values:
return ""
return ", ".join(f"[[{value}]]" for value in values)
def _uri_to_id(uri: str) -> str:
name = uri.rstrip("/").rsplit("/", 1)[-1]
if name.endswith(".json"):
name = name[:-5]
return name
def _derive_process_summary(item: dict[str, Any]) -> list[str]:
steps: list[str] = []
if item.get("abstract"):
steps.append(f"确认告警场景与核心风险:{item['abstract']}")
if item.get("evidence"):
steps.append(f"提取关键证据并交叉验证:{item['evidence'][0]}")
related = item.get("related_refs", {})
if related.get("playbooks") or related.get("kb"):
steps.append("对照关联 playbook / KB 复核告警模式与处置路径。")
if item.get("verdict"):
steps.append(f"基于关键证据与场景模式完成结论判定:{_verdict_label(item['verdict'])}")
return steps[:4]
def _derive_disposition(item: dict[str, Any]) -> list[str]:
verdict = item.get("verdict", "")
evidence = item.get("evidence", [])
lines: list[str] = []
if verdict:
lines.append(f"结论为{_verdict_label(verdict)}")
if evidence:
lines.append(f"最关键依据:{evidence[0]}")
if len(evidence) > 1:
lines.append(f"补充依据:{evidence[1]}")
return lines
def _derive_actions(item: dict[str, Any]) -> list[str]:
scenario = item.get("scenario", "")
verdict = item.get("verdict", "")
actions: list[str] = []
if scenario == "phishing":
actions.extend([
"隔离相同主题、发件人或 URL 的邮件样本。",
"核查用户是否点击或提交凭据,并按需执行凭据重置。",
])
elif scenario == "o365_suspicious_login":
actions.extend([
"复核登录来源、MFA 事件和后续邮箱规则或 OAuth 变更。",
"若存在账号接管迹象,立即执行会话失效和凭据重置。",
])
else:
actions.append("结合关联 playbook 执行后续处置。")
if verdict == "false_positive":
actions = ["记录误报原因,并更新检测例外或抑制条件。"]
return actions
def _derive_reusable_patterns(item: dict[str, Any]) -> tuple[list[str], list[str], list[str]]:
patterns = item.get("patterns", [])
tags = item.get("tags", [])
hit_patterns = [pattern for pattern in patterns if not pattern.startswith("verdict:")]
false_positive_traits = []
variants = []
if item.get("verdict") == "false_positive":
false_positive_traits.append("本案最终确认为误报,可用于补充抑制条件。")
if tags:
variants.append("相关标签:" + ", ".join(tags))
    return hit_patterns or ["无"], false_positive_traits or ["无"], variants or ["无"]
async def _fetch_openviking_recommendations(item: dict[str, Any], top_k: int = 3) -> dict[str, list[dict[str, Any]]]:
query = RetrievalQuery(
scenario=item.get("scenario", "general"),
alert_type=_extract_alert_type(item.get("patterns", [])),
summary=item.get("abstract", ""),
observables=item.get("observables"),
top_k=top_k + 1,
)
result = await retrieve_context_openviking(query)
case_entries: list[dict[str, Any]] = []
for entry in result.get("matched_cases", []):
candidate_id = _uri_to_id(entry.get("uri", ""))
if candidate_id == item.get("id"):
continue
case_entries.append(
{
"id": candidate_id,
"score": round(float(entry.get("score") or 0), 3),
"abstract": entry.get("abstract", ""),
}
)
if len(case_entries) >= top_k:
break
knowledge_entries: list[dict[str, Any]] = []
for entry in result.get("matched_knowledge", []):
knowledge_entries.append(
{
"id": _uri_to_id(entry.get("uri", "")),
"score": round(float(entry.get("score") or 0), 3),
"abstract": entry.get("abstract", ""),
}
)
if len(knowledge_entries) >= top_k:
break
return {
"cases": case_entries,
"knowledge": knowledge_entries,
}
def _merge_unique(primary: list[str], secondary: list[str]) -> list[str]:
merged: list[str] = []
for value in primary + secondary:
if value and value not in merged:
merged.append(value)
return merged
def _recommendation_lines(entries: list[dict[str, Any]], prefix: str) -> list[str]:
lines: list[str] = []
for entry in entries:
abstract = entry.get("abstract", "")
abstract = abstract[:140] + "..." if len(abstract) > 140 else abstract
lines.append(f"[[{entry['id']}]] ({prefix} score={entry['score']}) {abstract}")
return lines
def render_case_note(item: dict[str, Any], recommendations: dict[str, list[dict[str, Any]]] | None = None) -> str:
case_id = item["id"]
title = item.get("title", case_id)
alert_type = _extract_alert_type(item.get("patterns", []))
severity = item.get("severity", "unknown")
verdict = _verdict_label(item.get("verdict", ""))
entities = item.get("entities", {})
observables = item.get("observables", {})
related = item.get("related_refs", {})
recommendations = recommendations or {"cases": [], "knowledge": []}
recommended_cases = [entry["id"] for entry in recommendations.get("cases", [])]
recommended_knowledge = [entry["id"] for entry in recommendations.get("knowledge", [])]
merged_cases = _merge_unique(related.get("cases", []), recommended_cases)
playbooks = related.get("playbooks", [])
kb_items = related.get("kb", [])
for knowledge_id in recommended_knowledge:
if knowledge_id.startswith("PB-"):
playbooks = _merge_unique(playbooks, [knowledge_id])
else:
kb_items = _merge_unique(kb_items, [knowledge_id])
process_summary = _derive_process_summary(item)
disposition = _derive_disposition(item)
actions = _derive_actions(item)
hit_patterns, false_positive_traits, variants = _derive_reusable_patterns(item)
tags = ["#case", f"#scenario/{item.get('scenario', 'general')}", f"#alert/{alert_type}"]
if item.get("verdict"):
tags.append(f"#verdict/{item['verdict'].replace('_', '-')}")
tags.extend(f"#{tag}" for tag in item.get("tags", []))
recommendation_case_lines = _recommendation_lines(recommendations.get("cases", []), "case")
recommendation_knowledge_lines = _recommendation_lines(recommendations.get("knowledge", []), "knowledge")
lines = [
"---",
f"case_id: {case_id}",
f"scenario: {item.get('scenario', 'general')}",
f"alert_type: {alert_type}",
f"severity: {severity}",
f"verdict: {item.get('verdict', 'unknown')}",
"source: soc-memory-poc",
f"openviking_enriched: {'true' if recommendation_case_lines or recommendation_knowledge_lines else 'false'}",
"---",
"",
f"# {case_id} {title}",
"",
"## 基本信息",
"",
f"- Case ID: {case_id}",
f"- 标题: {title}",
f"- 告警类型: {alert_type}",
        "- 来源系统: SOC Memory POC Mock Dataset",
        "- 时间范围: 待补充",
        "- 研判人 / Agent: AI Agent Draft",
f"- 最终结论: {verdict}",
f"- 严重等级: {severity}",
"",
"## 告警摘要",
"",
item.get("abstract", ""),
"",
"## 关键实体",
"",
f"- 用户: {_join_values(entities.get('users', []))}",
f"- 主机: {_join_values(entities.get('hosts', []))}",
f"- 邮箱: {_join_values(entities.get('mailboxes', []))}",
f"- IP: {_join_values(observables.get('ips', []))}",
f"- 域名: {_join_values(observables.get('domains', []))}",
f"- 文件 Hash: {_join_values(observables.get('hashes', []))}",
f"- 其他 IOC: {_join_values(observables.get('urls', []) + observables.get('sender_emails', []))}",
"",
"## 关键证据",
"",
_bullet_lines(item.get("evidence", [])),
"",
"## 研判过程摘要",
"",
"\n".join(f"{index}. {step}" for index, step in enumerate(process_summary, start=1)),
"",
"## 结论依据",
"",
_bullet_lines(disposition),
"",
"## 处置建议",
"",
_bullet_lines(actions),
"",
"## 可复用模式",
"",
f"- 命中模式: {_join_values(hit_patterns)}",
f"- 误报特征: {_join_values(false_positive_traits)}",
f"- 需关注的变体: {_join_values(variants)}",
"",
"## 关联知识",
"",
f"- 关联 Playbook: {_wikilinks(playbooks)}",
f"- 关联 KB: {_wikilinks(kb_items)}",
f"- 关联历史 Case: {_wikilinks(merged_cases)}",
f"- 关联实体: {_wikilinks(entities.get('users', []) + entities.get('hosts', []))}",
"",
"## 自动关联推荐",
"",
"### 推荐历史 Case",
"",
_bullet_lines(recommendation_case_lines),
"",
"### 推荐知识条目",
"",
_bullet_lines(recommendation_knowledge_lines),
"",
"## Lessons Learned",
"",
"- 本案可沉淀为后续同类告警的快速判定参考。",
"- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。",
"",
"## 标签",
"",
_bullet_lines(tags),
"",
]
return "\n".join(lines)
def build_output_path(item: dict[str, Any], output_dir: str | Path) -> Path:
scenario = item.get("scenario", "general")
case_id = item["id"]
safe_title = item.get("title", case_id).replace("/", "-")
return Path(output_dir) / scenario / f"{case_id} - {safe_title}.md"
async def generate_case_note_async(
input_path: str | Path,
output_dir: str | Path,
enrich_from_openviking: bool = False,
top_k: int = 3,
) -> Path:
item = _load_case(input_path)
recommendations: dict[str, list[dict[str, Any]]] | None = None
if enrich_from_openviking:
recommendations = await _fetch_openviking_recommendations(item, top_k=top_k)
output_path = build_output_path(item, output_dir)
output_path.parent.mkdir(parents=True, exist_ok=True)
output_path.write_text(render_case_note(item, recommendations=recommendations), encoding="utf-8")
return output_path
def main() -> None:
parser = argparse.ArgumentParser(description="Generate an Obsidian case note from a normalized case JSON file.")
parser.add_argument("--input", required=True, help="Normalized case JSON path")
parser.add_argument("--output-dir", default="obsidian-vault/02_Cases", help="Obsidian cases output directory")
parser.add_argument("--enrich-from-openviking", action="store_true", help="Retrieve related cases and knowledge from OpenViking")
parser.add_argument("--top-k", type=int, default=3, help="Number of OpenViking recommendations per type")
args = parser.parse_args()
output_path = asyncio.run(
generate_case_note_async(
args.input,
args.output_dir,
enrich_from_openviking=args.enrich_from_openviking,
top_k=args.top_k,
)
)
print(output_path)
if __name__ == "__main__":
main()
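For reference, the YAML front matter that `render_case_note` emits looks like this for a hypothetical enriched phishing case (keys are taken from the code above; values are illustrative):

```yaml
---
case_id: CASE-2026-0001
scenario: phishing
alert_type: phishing_email
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
```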


@@ -60,7 +60,7 @@ def install_test_stubs() -> None:
install_test_stubs()
from memory_gateway.server import app
-from memory_gateway.types import Config, SearchResult, ServerConfig
+from memory_gateway.types import Config, ObsidianConfig, SearchResult, ServerConfig
class FakeOVClient:
@@ -71,7 +71,7 @@ class FakeOVClient:
        return SearchResult(
            results=[
                {
-                    "uri": "viking://soc/test",
+                    "uri": "viking://memory-gateway/test",
                    "abstract": query,
                    "score": 1.0,
                    "context_type": "memory",
@@ -107,6 +107,16 @@ async def fake_get_openviking_client():
    return FakeOVClient()
async def fake_summarize_with_llm(content, **kwargs):
return {
"title": kwargs.get("title") or "Fake LLM title",
"summary": f"LLM summary: {content[:80]}",
"key_points": ["LLM key point", "Preserve IP 198.51.100.20"],
"tags": kwargs.get("tags") or ["fake"],
"llm": {"provider": "fake", "model": "fake-model"},
}
def build_headers(api_key: str | None):
    return {"x-api-key": api_key} if api_key is not None else {}
@@ -120,6 +130,7 @@ def test_health_requires_api_key(monkeypatch):
        "memory_gateway.server.get_openviking_client",
        fake_get_openviking_client,
    )
monkeypatch.setattr("memory_gateway.server.summarize_with_llm", fake_summarize_with_llm)
    with TestClient(app) as client:
        response = client.get("/health")
@@ -149,7 +160,8 @@ def test_mcp_rpc_lists_tools_with_api_key(monkeypatch):
    assert response.status_code == 200
    payload = response.json()
    assert payload["jsonrpc"] == "2.0"
-    assert len(payload["result"]["tools"]) == 6
+    assert len(payload["result"]["tools"]) == 7
assert any(tool["name"] == "commit_summary" for tool in payload["result"]["tools"])
def test_search_passes_through_gateway(monkeypatch):
@@ -168,3 +180,73 @@ def test_search_passes_through_gateway(monkeypatch):
    payload = response.json()
    assert payload["total"] == 1
    assert payload["results"][0]["abstract"] == "phishing"
def test_summary_endpoint_builds_generic_artifact(monkeypatch):
monkeypatch.setattr(
"memory_gateway.server.get_config",
lambda: Config(server=ServerConfig(api_key="")),
)
monkeypatch.setattr(
"memory_gateway.server.get_openviking_client",
fake_get_openviking_client,
)
monkeypatch.setattr("memory_gateway.server.summarize_with_llm", fake_summarize_with_llm)
with TestClient(app) as client:
response = client.post(
"/api/summary",
json={
"title": "Demo investigation summary",
"content": "结论:这是一次高价值沉淀。\n- 证据:命中历史 case。\n- 建议:后续复用该处置路径。",
"namespace": "demo",
"memory_type": "knowledge",
"tags": ["demo", "summary"],
"persist_as": "none",
},
)
assert response.status_code == 200
payload = response.json()
assert payload["status"] == "ok"
assert payload["artifact"]["title"] == "Demo investigation summary"
assert payload["artifact"]["namespace"] == "demo"
assert payload["artifact"]["memory_type"] == "knowledge"
assert payload["artifact"]["summary"].startswith("LLM summary:")
assert payload["artifact"]["llm"]["provider"] == "fake"
assert payload["memory_result"] is None
assert payload["resource_result"] is None
def test_knowledge_upload_converts_saves_and_commits(monkeypatch, tmp_path):
monkeypatch.setattr(
"memory_gateway.server.get_config",
lambda: Config(
server=ServerConfig(api_key=""),
obsidian=ObsidianConfig(vault_path=str(tmp_path / "vault"), knowledge_dir="01_Knowledge/Uploaded"),
),
)
monkeypatch.setattr("memory_gateway.server.get_openviking_client", fake_get_openviking_client)
monkeypatch.setattr("memory_gateway.server.summarize_with_llm", fake_summarize_with_llm)
monkeypatch.setattr("memory_gateway.server.convert_file_to_markdown", lambda path: "# Uploaded Doc\n\nImportant uploaded knowledge.")
with TestClient(app) as client:
response = client.post(
"/api/knowledge/upload",
data={
"title": "Uploaded Knowledge",
"namespace": "demo",
"knowledge_type": "playbook",
"tags": "demo,upload",
"persist_as": "resource",
},
files={"file": ("sample.txt", b"hello", "text/plain")},
)
assert response.status_code == 200
payload = response.json()
assert payload["status"] == "ok"
assert payload["artifact"]["schema_version"] == "memory-gateway.knowledge_upload.v1"
assert payload["artifact"]["knowledge_type"] == "playbook"
assert payload["artifact"]["markdown_content"].startswith("# Uploaded Doc")
assert payload["resource_result"]["status"] == "ok"
assert (tmp_path / "vault" / payload["artifact"]["obsidian_relative_path"]).exists()
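The MCP endpoint exercised by these tests speaks plain JSON-RPC 2.0; a `tools/list` call is a request like the following, and per the assertions above the response's `result.tools` carries seven tools, including `commit_summary`:

```json
{"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
```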