Initial SOC memory POC implementation

This commit is contained in:
2026-04-27 17:13:06 +08:00
parent fc68581198
commit e6b1520bce
89 changed files with 7610 additions and 1 deletions

.gitignore (vendored, new file, +27 lines)

@@ -0,0 +1,27 @@
# Local runtime configuration
config.yaml
*.local.yaml
*.secret.yaml
.env
.env.*
# Python cache / test artifacts
__pycache__/
*.py[cod]
.pytest_cache/
.ruff_cache/
.mypy_cache/
.coverage
htmlcov/
# Virtual environments
.venv/
venv/
# Local editor / agent metadata
.codex
.DS_Store
# Runtime output
*.log
*.tmp

@@ -100,7 +100,12 @@ obsidian-vault/
### 7. Hermes Agent Integration
Created the `soc-memory-poc` skill in the local Hermes skill directory
Created the `soc-memory-poc` skill in the local Hermes skill directory, and kept a version-controlled copy in the repository
- Actual Hermes load path on this machine: `/home/tom/.hermes/skills/soc-memory-poc/`
- Repository copy path: `integrations/hermes/soc-memory-poc/`
Local Hermes skill file structure:
```text
/home/tom/.hermes/skills/soc-memory-poc/
```

SOC-Memory-POC-Design.md (new file, +1113 lines)

File diff suppressed because it is too large

config.example.yaml (new file, +32 lines)

@@ -0,0 +1,32 @@
# Memory Gateway configuration example
# Copy to config.yaml and adjust for your environment
# Memory Gateway service configuration
server:
  # Listen address; 0.0.0.0 accepts connections on all interfaces (reachable from the LAN)
  host: "0.0.0.0"
  # MCP Server port
  port: 1934
  # Optional: API key authentication; clients must present the same key
  api_key: ""
# OpenViking backend configuration
openviking:
  # OpenViking server address
  url: "http://localhost:1933"
  # OpenViking API key (if any)
  api_key: ""
  # Request timeout in seconds
  timeout: 30
# Memory configuration
memory:
  # Default namespace
  default_namespace: "soc"
  # Default number of search results
  search_limit: 10
# Logging configuration
logging:
  level: "INFO"
  format: "%(asctime)s - %(name)s - %(levelname)s - %(message)s"
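A minimal sketch of loading this config, assuming PyYAML is installed. The section and key names mirror config.example.yaml; `load_config` and its defaulting behavior are illustrative, not part of the actual gateway code.

```python
import yaml  # PyYAML; assumed available in the gateway environment

# Defaults mirror config.example.yaml above.
DEFAULTS = {
    "server": {"host": "0.0.0.0", "port": 1934, "api_key": ""},
    "openviking": {"url": "http://localhost:1933", "api_key": "", "timeout": 30},
    "memory": {"default_namespace": "soc", "search_limit": 10},
}

def load_config(path: str) -> dict:
    """Read a config.yaml and fill any missing sections/keys with defaults."""
    with open(path, encoding="utf-8") as fh:
        raw = yaml.safe_load(fh) or {}
    merged = {}
    for section, defaults in DEFAULTS.items():
        merged[section] = {**defaults, **raw.get(section, {})}
    return merged
```

Merging per-section keeps a partial config.yaml valid: anything not overridden falls back to the example's defaults.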

docs/architecture.md (new file, +190 lines)

@@ -0,0 +1,190 @@
# Architecture
## Overall Goal
Build a memory system POC for SOC case triage assistance, improving AI agent performance in:
- Alert triage
- Historical case retrieval
- Context completion
- Verdict generation
- Distilling high-value memory
## Overall Architecture
```text
┌────────────────────────────────┐
│   Knowledge / Data Sources     │
│ KB / Playbook / Monthly Report │
│ Report / Ticket / Intel /      │
│ Historical Case                │
└───────────────┬────────────────┘
                │ ingest / normalize
┌───────────────┴───────────────┐
│         Pipeline Layer        │
│ connectors / transforms / jobs│
└───────────────┬───────────────┘
                │ extracted inputs
┌───────────────┴──────────────┐
│         Skills Layer         │
│ ingest / classify / retrieve │
│ summarize / commit / prune   │
└───────┬─────────────┬────────┘
        │             │
query/write │   │ write notes / long-term
        ▼             ▼
┌────────────────────┐  ┌────────────────────┐
│   Memory Gateway   │  │   Obsidian Vault   │
│  MCP / REST / Auth │  │  Human-maintained  │
└─────────┬──────────┘  └────────────────────┘
          │
┌─────────┴──────────┐
│     OpenViking     │
│  context / memory  │
│ resources / skills │
└─────────┬──────────┘
┌─────────┴──────────┐
▼                    ▼
┌──────────────────┐  ┌──────────────────┐
│ Session / Online │  │    EverMemOS     │
│    retrieval     │  │ long-term memory │
└──────────────────┘  └──────────────────┘
┌────────────────────┐
│ AI Agent / Harness │
│  Nanobot / Hermes  │
│ OpenClaw / others  │
└────────────────────┘
```
## Layer Descriptions
### 1. Knowledge Source Layer
External systems and existing material:
- KB
- Playbook
- Monthly reports
- Reports
- Ticket system
- Intel systems
- Historical cases
Characteristics:
- Diverse sources
- Inconsistent structure
- Cannot all be used directly as memory
### 2. Pipeline Layer
Responsible for:
- Data ingestion
- Format normalization
- Metadata extraction
- Noise filtering
Boundaries:
- Does not perform final retrieval
- Does not make final long-term retention decisions
### 3. Skills Layer
Responsible for:
- Extracting high-value memory
- Classifying it as knowledge / case / process / session
- Retrieving relevant context
- Generating case summaries
- Writing back to OpenViking / Obsidian / EverMemOS
This is the orchestration layer of the whole system.
### 4. Memory Gateway Layer
Responsible for:
- Providing a unified entry point for AI agents
- Hiding OpenViking details
- Exposing MCP / REST interfaces
- Handling authentication and protocol compatibility
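As a stdlib-only sketch of the gateway pattern (attach the configured API key, forward to the OpenViking backend): the `/api/search` path and payload shape are assumptions for illustration, not the real OpenViking API; only the auth-and-forward pattern reflects the design above.

```python
import json
import urllib.request

def build_search_request(base_url: str, api_key: str, query: str, limit: int = 10):
    """Build an authenticated POST request for the backend; the caller executes it
    (e.g. with urllib.request.urlopen and the configured timeout)."""
    payload = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/search",  # hypothetical backend endpoint
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if api_key:  # optional auth, mirroring the api_key config field
        req.add_header("Authorization", f"Bearer {api_key}")
    return req
```

Keeping request construction separate from execution makes the auth/routing layer easy to unit-test without a live backend.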
### 5. OpenViking Unified Context Layer
Responsible for:
- Storing memory
- Storing resources
- Organizing skills
- Managing different context types by namespace
### 6. Obsidian Layer
Holds human-maintainable curated knowledge:
- High-quality case notes
- Playbooks
- Monthly report / report summaries
- Key entity descriptions
### 7. EverMemOS Layer
Handles background long-term memory consolidation:
- episode -> long-term memory
- Deduplication
- Merging
- Updating
- Decay
## Multi-Agent Sharing
Agents do not share ephemeral memory with each other directly; they collaborate through the unified context layer:
- Shared stable knowledge lives in `soc/knowledge`
- Historical cases live in `soc/case`
- The current task lives in `session/<session_id>`
- Agent-private preferences live in `agent/<agent_id>`
This gives us:
- Shared common knowledge
- Isolation between sessions
- Reuse of the same system across different agent frameworks
## Retrieval Quality Principles
To keep "stuff everything in" from degrading retrieval quality, we must insist that:
- Raw material does not all flow directly into long-term memory
- Only high-value summaries, patterns, conclusions, and evidence are kept
- session / process memory is short-lived by default
- Historical cases and playbooks take priority over generic knowledge
- Obsidian holds only human-maintained content, never full source texts
## Default Plan for Phase One
Recommended phase-one combination:
- OpenViking: unified context / memory layer
- Memory Gateway: unified access entry point
- Skills: retrieval, summarization, consolidation
- Obsidian: human-maintainable knowledge base
- EverMemOS: background long-term memory consolidation
Why this combination:
- Clear module boundaries
- Best fit for small, fast POC iterations
- Easiest to control system complexity
- Easiest to reuse across different agent frameworks

docs/data-model.md (new file, +138 lines)

@@ -0,0 +1,138 @@
# Data Model
## Goal
This data model targets the SOC case triage assistance scenario; it does not aim for full archival and instead emphasizes high-value memory extraction.
## Data Layers
### 1. Knowledge Memory
Applicable content:
- KB
- Playbook
- Monthly report summaries
- Report summaries
- PO
- Detection rule descriptions
Characteristics:
- Stable and reusable
- Oriented toward methods, knowledge, and patterns
- Suitable for long-term retention
Suggested fields:
- `id`
- `title`
- `source_type`
- `summary`
- `tags`
- `entities`
- `ttp`
- `confidence`
- `updated_at`
### 2. Case Memory
Applicable content:
- Historical cases
- Final triage verdicts
- Key evidence
- False-positive / true-positive patterns
- Remediation recommendations
Characteristics:
- Centered on concrete cases
- Well suited to retrieving similar cases
- The most important data layer during the POC
Suggested fields:
- `case_id`
- `title`
- `alert_type`
- `verdict`
- `summary`
- `key_evidence`
- `entities`
- `detection_logic`
- `lessons_learned`
- `source_links`
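The suggested Case Memory fields above can be sketched as a dataclass; the field names follow the list, while the types and defaults are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class CaseMemory:
    """Illustrative record for one case-layer memory entry."""
    case_id: str
    title: str
    alert_type: str
    verdict: str            # e.g. "true_positive" / "false_positive"
    summary: str
    key_evidence: list = field(default_factory=list)
    entities: dict = field(default_factory=dict)
    detection_logic: str = ""
    lessons_learned: list = field(default_factory=list)
    source_links: list = field(default_factory=list)
```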
### 3. Process Memory
Applicable content:
- Agent intermediate steps
- Tool call results
- Reasoning paths
- Interim analysis conclusions
Characteristics:
- Short lifecycle
- Uneven value
- Only high-value parts should be extracted into long-term memory
Suggested fields:
- `session_id`
- `step_id`
- `tool_name`
- `observation`
- `intermediate_conclusion`
- `value_score`
- `timestamp`
### 4. Profile / Preference Memory
Applicable content:
- Analyst preferences
- Default output style
- Commonly used triage paths
Characteristics:
- Small in volume
- Used for personalized assistance
Suggested fields:
- `user_id`
- `preference_type`
- `value`
- `scope`
### 5. Session Memory
Applicable content:
- Context of the current case
- Temporary cache for the current conversation turn or task
Characteristics:
- Highly time-sensitive
- Not retained long-term by default
Suggested fields:
- `session_id`
- `task_id`
- `active_entities`
- `active_hypotheses`
- `recent_observations`
- `expires_at`
## Design Principles
- Raw material is never used directly as memory
- Only high-value information that helps future triage is retained
- Process Memory is short-lived by default and is promoted to long-term memory only after extraction
- Knowledge and Case are the two layers to build first during the POC
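The "promote only high-value process memory" principle can be sketched as a filter on `value_score`; the 0.7 cutoff and the promoted field subset are assumptions for illustration:

```python
def promote_process_steps(steps: list, threshold: float = 0.7) -> list:
    """Return the process-memory steps worth converting into long-term memory.

    steps: dicts using the suggested Process Memory fields (session_id,
    step_id, value_score, intermediate_conclusion, ...).
    """
    return [
        {
            "session_id": s["session_id"],
            "step_id": s["step_id"],
            "conclusion": s.get("intermediate_conclusion", ""),
        }
        for s in steps
        if s.get("value_score", 0.0) >= threshold  # drop low-value noise
    ]
```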

@@ -0,0 +1,91 @@
# Hermes Demo Prompts
## Recommended: Raw Email / Freeform Alert
Use this when you want to show that Hermes does not need a rigid input schema. The `soc-memory-poc` skill should route the content through `triage_email.py`, extract useful fields, retrieve memory, search Obsidian, and return the fixed SOC triage sections.
```text
Use the soc-memory-poc skill. Triage this email alert and include Memory Retrieval and Obsidian references.
From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12.
Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```
Equivalent direct script check:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --text "From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12."
```
## Structured Phishing Alert
Use this when you want maximum repeatability with explicit fields.
```text
Use the soc-memory-poc skill. Treat the following as a structured SOC alert and use the preferred Scheme A path.
Scenario: phishing
Alert type: mail_suspicious_attachment
User: alice@corp.example
Host: FIN-LAPTOP-12
Sender: billing@vendor-payments.com
Subject: Invoice overdue notice
Attachment: invoice_review.html
URL: https://vendor-payments-login.com/review
IP: 198.51.100.20
Known facts:
- DMARC failed
- User may have clicked the link
Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```
## Structured O365 Alert
```text
Use the soc-memory-poc skill. Treat the following as a structured SOC alert and use the preferred Scheme A path.
Scenario: o365_suspicious_login
Alert type: azuread_impossible_travel
User: david@corp.example
Host: WS-DAVID-01
IP: 203.0.113.150
Known facts:
- Impossible travel observed between Shanghai and Amsterdam within 15 minutes
- MFA fatigue occurred before final success
- User denied initiating the overseas login
- Inbox rule creation was observed after login
Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```
## Generate Case Note
```text
Use the soc-memory-poc skill. Generate an Obsidian case note for /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0003.json with OpenViking enrichment, then tell me the output path and confirm whether the note was written successfully.
```

docs/namespaces.md (new file, +120 lines)

@@ -0,0 +1,120 @@
# OpenViking Namespaces
## Goal
Use explicit namespaces and URI conventions to operate OpenViking as a unified context / memory gateway.
## Recommended Namespaces
### 1. `soc/knowledge`
For stable knowledge:
- KB
- Playbook
- Monthly report summaries
- Report summaries
- PO
Examples:
- `viking://soc/knowledge/kb/phishing-mail-header-analysis`
- `viking://soc/knowledge/playbook/o365-suspicious-login`
### 2. `soc/case`
For historical cases and case conclusions:
- Historical cases
- True-positive / false-positive patterns
- Key evidence
Examples:
- `viking://soc/case/true-positive/case-2026-00128`
- `viking://soc/case/false-positive/case-2026-00072`
### 3. `soc/process`
For process-level memory:
- Agent intermediate analysis
- Tool output summaries
- Reusable intermediate judgment patterns
Example:
- `viking://soc/process/session-abc123/step-04`
### 4. `session/<session_id>`
Temporary context for the current task.
Examples:
- `viking://session/incident-20260421-001/context`
- `viking://session/incident-20260421-001/tools`
### 5. `agent/<agent_id>`
Private or semi-private context at the agent level.
Examples:
- `viking://agent/hermes-soc/default`
- `viking://agent/nanobot-soc/preferences`
### 6. `user/<user_id>`
Small-scale profile information such as analyst preferences and display habits.
Example:
- `viking://user/alice/preferences`
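The URI examples above all follow one convention, which a small helper can enforce; `make_uri` is a hypothetical utility, not an OpenViking API:

```python
def make_uri(namespace: str, *parts: str) -> str:
    """Join a namespace and path segments into a viking:// URI,
    normalizing stray slashes so callers cannot produce double slashes."""
    segments = [namespace.strip("/")] + [p.strip("/") for p in parts if p]
    return "viking://" + "/".join(segments)
```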
## Resource Organization Suggestions
### memory
Suitable for:
- High-value summaries
- Case conclusions
- Patterns
- Lessons learned
### resources
Suitable for:
- Links to original attachments
- External document references
- Obsidian note paths
- Ticket / report / intel references
### skills
Suitable for:
- Retrieval skills
- Memory extraction skills
- Case consolidation skills
## Suggested Retrieval Order
When retrieval runs for the current case, recall in this order:
1. `session/<session_id>`
2. `soc/case`
3. `soc/knowledge`
4. `agent/<agent_id>`
5. `user/<user_id>`
This prioritizes the relevance of the current context and similar historical cases, so that generic knowledge does not drown out case signals.
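The recall order above can be sketched as a priority-ordered, deduplicated merge; `search` is a caller-supplied function standing in for the real gateway retrieval call, which is an assumption:

```python
from typing import Callable, List

# Namespaces in the recommended recall order.
PRIORITY = ["session/{session_id}", "soc/case", "soc/knowledge",
            "agent/{agent_id}", "user/{user_id}"]

def recall(search: Callable[[str], List[str]], session_id: str,
           agent_id: str, user_id: str, limit: int = 10) -> List[str]:
    """Query each namespace in priority order, keep the first `limit` unique hits."""
    results, seen = [], set()
    for tmpl in PRIORITY:
        ns = tmpl.format(session_id=session_id, agent_id=agent_id, user_id=user_id)
        for hit in search(ns):
            if hit not in seen:
                seen.add(hit)
                results.append(hit)
            if len(results) >= limit:
                return results
    return results
```

Because higher-priority namespaces are queried first, session context and similar cases occupy the result budget before generic knowledge can.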
## Constraints
- Do not write all raw material directly into `soc/knowledge`
- `soc/process` should have a cleanup policy by default
- Only content that has proven stable long-term gets written into `soc/knowledge` or `soc/case`
- Obsidian stores only human-maintainable summaries and structured distillations; it is not a full-text archive

docs/poc-scope.md (new file, +130 lines)

@@ -0,0 +1,130 @@
# POC Scope
## Goal
The phase-one POC validates exactly one thing:
**Whether high-value memory extraction plus similar-case / knowledge recall can measurably improve SOC case triage efficiency and quality.**
## POC Scope
### Focus case types
Pick only 1 to 2 typical scenarios:
1. Phishing email / malicious attachment
2. Anomalous O365 sign-in / suspected account takeover
Reasons:
- Data is relatively easy to obtain
- Historical cases have high reuse value
- Playbooks / KB are usually fairly complete
- A "similar-case hit rate" is easy to define
## Data to Ingest in Phase One
### Must ingest
- Historical cases
- KB
- Playbook
### Optional
- Monthly report summaries
- Report summaries
### Not yet
- Two-way ticket system sync
- Automated full-scale intel system pulls
- Full original report texts
- Large-scale process trace persistence
- Analyst preference personalization
## Capabilities for Phase One
### Must build
- Historical case import
- KB / Playbook import
- High-value information extraction
- Context retrieval based on the current case
- Case summary consolidation
- Structured write-back to OpenViking
- Obsidian case note generation
### Deferred to phase two
- Automated EverMemOS long-term consolidation
- More sophisticated deduplication and decay
- Automated multi-source sync
- Multi-agent collaboration strategy tuning
## Explicit Non-Goals
To keep the POC deliverable, phase one deliberately does not build:
- A general-purpose enterprise memory platform
- Full ingestion of all raw data
- A rebuilt full-text search system
- Coverage of all SOC alert types
- A complex permission system
- A complete online labeling platform
## Deliverables
Phase one should deliver:
1. A runnable memory gateway
2. A batch of importable historical cases and KB / Playbook samples
3. A minimal ingest / retrieve / summarize / commit loop
4. Obsidian templates and sample notes
5. A baseline-vs-POC evaluation result
## 2-to-4-Week Plan
### Week 1
- Freeze the POC scope
- Prepare sample data
- Finalize the data model and namespace conventions
- Set up Obsidian templates
### Week 2
- Finish the historical case / KB import scripts
- Finish `retrieve_context_skill`
- Wire up OpenViking `soc/case` and `soc/knowledge`
### Week 3
- Finish `summarize_case_skill`
- Finish `commit_memory_skill`
- Output standard case notes to Obsidian
### Week 4
- Run the evaluation scripts
- Do manual review
- Converge on next-phase requirements
## Evaluation Metrics
Track at least:
- Similar-case hit rate
- Retrieved-context relevance
- Average triage time
- Final verdict accuracy
- Analyst satisfaction
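The "similar-case hit rate" metric can be sketched as the fraction of evaluation queries whose expected case IDs appear in the top-k recalled results; the sample field names (`expected`, `retrieved`) are assumptions about the evaluation dataset:

```python
def similar_case_hit_rate(samples: list, k: int = 5) -> float:
    """samples: dicts like {'expected': [case ids], 'retrieved': [case ids]}.
    A sample counts as a hit if any expected ID appears in the top-k retrieved."""
    if not samples:
        return 0.0
    hits = sum(
        1 for s in samples
        if set(s["expected"]) & set(s["retrieved"][:k])
    )
    return hits / len(samples)
```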
## Acceptance Criteria
The phase-one POC counts as successful when all of the following hold:
- It reliably recalls relevant historical cases or knowledge
- It helps generate structured case notes
- Manual evaluation finds clear improvement in context quality
- Retrieval has not visibly degraded from "stuffing in too much material"

docs/sample-data-spec.md (new file, +188 lines)

@@ -0,0 +1,188 @@
# Sample Data Spec
## Goal
This document defines the mock data format the SOC Memory POC uses before real data is available, in order to:
- Validate the ingestion pipeline
- Validate the normalization scripts
- Validate context retrieval
- Validate the case summary and memory commit flow
It currently covers only two scenarios:
- Phishing email
- Anomalous O365 sign-in / suspected account takeover
## Directory Layout
```text
evaluation/datasets/
├── mock_cases/
│   ├── phishing/
│   └── o365_suspicious_login/
└── mock_kb/
    ├── playbooks/
    ├── kb/
    └── reports/
```
## Raw Mock Case Format
Each case is a single JSON file; suggested file name:
```text
<case_id>.json
```
### Field Definitions
| Field | Type | Required | Description |
|---|---|---:|---|
| `case_id` | string | yes | Unique case ID |
| `title` | string | yes | Short title |
| `scenario` | string | yes | `phishing` or `o365_suspicious_login` |
| `alert_type` | string | yes | Alert type |
| `severity` | string | yes | `low` / `medium` / `high` / `critical` |
| `status` | string | yes | `confirmed` / `false_positive` / `pending` |
| `time_window` | object | yes | Start and end time |
| `summary` | string | yes | One-sentence summary |
| `alert_source` | string | yes | Source alerting system |
| `entities` | object | yes | Key entities |
| `observables` | object | no | IOCs / observables |
| `evidence` | array | yes | Key evidence list |
| `investigation_steps` | array | yes | Key investigation steps |
| `conclusion` | object | yes | Triage verdict |
| `related_refs` | object | no | Related KB / playbooks / cases |
| `lessons_learned` | array | no | Reusable lessons |
| `tags` | array | no | Tags |
### Example Skeleton
```json
{
"case_id": "CASE-2026-0001",
"title": "Potential phishing email targeting finance user",
"scenario": "phishing",
"alert_type": "mail_suspicious_attachment",
"severity": "high",
"status": "confirmed",
"time_window": {
"start": "2026-04-01T09:10:00+08:00",
"end": "2026-04-01T11:30:00+08:00"
},
"summary": "Finance user received an invoice-themed phishing email with a malicious HTML attachment.",
"alert_source": "Secure Email Gateway",
"entities": {
"users": ["alice@corp.example"],
"hosts": ["FIN-LAPTOP-12"],
"mailboxes": ["alice@corp.example"]
},
"observables": {
"sender_emails": ["billing@vendor-payments.com"],
"domains": ["vendor-payments.com"],
"urls": ["https://vendor-payments-login.com/review"],
"hashes": ["sha256:..."],
"ips": ["198.51.100.20"]
},
"evidence": [
"The sender domain was newly observed and failed DMARC.",
"The attachment redirected the user to a credential harvesting page."
],
"investigation_steps": [
"Validate sender reputation and authentication results.",
"Detonate attachment in sandbox.",
"Check click telemetry and account sign-in logs."
],
"conclusion": {
"verdict": "true_positive",
"reason": "Multiple aligned phishing indicators and confirmed click behavior.",
"recommended_actions": [
"Reset the impacted account password.",
"Block the sender domain and landing URL."
]
},
"related_refs": {
"playbooks": ["PB-PHISH-001"],
"kb": ["KB-PHISH-HEADER-CHECK"],
"cases": []
},
"lessons_learned": [
"Invoice-themed phishing remains effective against finance users."
],
"tags": ["phishing", "email", "credential-harvest"]
}
```
## Raw Mock KB / Playbook Format
Each knowledge entry is a single JSON file; suggested file name:
```text
<doc_id>.json
```
### Field Definitions
| Field | Type | Required | Description |
|---|---|---:|---|
| `doc_id` | string | yes | Unique document ID |
| `doc_type` | string | yes | `kb` / `playbook` / `report_summary` |
| `title` | string | yes | Title |
| `scenario` | string | yes | Applicable scenario |
| `summary` | string | yes | Core summary |
| `applicability` | array | no | Applicability conditions |
| `key_points` | array | yes | Core knowledge points |
| `investigation_guidance` | array | no | Investigation guidance |
| `decision_points` | array | no | Key decision points |
| `related_entities` | object | no | Related entities / TTPs / IOCs |
| `related_refs` | object | no | Related documents |
| `tags` | array | no | Tags |
| `updated_at` | string | no | Last updated time |
## Normalization Targets
### Normalized Case Structure
Suggested output fields for the normalization script:
- `id`
- `memory_type` = `case`
- `scenario`
- `title`
- `abstract`
- `verdict`
- `severity`
- `entities`
- `observables`
- `evidence`
- `patterns`
- `related_refs`
- `source_path`
- `tags`
### Normalized KB Structure
Suggested output fields for the normalization script:
- `id`
- `memory_type` = `knowledge`
- `doc_type`
- `scenario`
- `title`
- `abstract`
- `key_points`
- `investigation_guidance`
- `decision_points`
- `related_refs`
- `source_path`
- `tags`
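A minimal sketch of the raw-to-normalized case mapping: the field names follow this spec, while the function itself and choices like mapping `summary` to `abstract` and `lessons_learned` to `patterns` are illustrative assumptions, not the actual normalization script:

```python
def normalize_case(raw: dict) -> dict:
    """Map a raw mock case JSON object to the normalized case structure."""
    return {
        "id": raw["case_id"],
        "memory_type": "case",
        "scenario": raw["scenario"],
        "title": raw["title"],
        "abstract": raw["summary"],                 # assumed summary -> abstract
        "verdict": raw.get("conclusion", {}).get("verdict", "uncertain"),
        "severity": raw["severity"],
        "entities": raw.get("entities", {}),
        "observables": raw.get("observables", {}),
        "evidence": raw.get("evidence", []),
        "patterns": raw.get("lessons_learned", []), # assumed lessons -> patterns
        "related_refs": raw.get("related_refs", {}),
        "source_path": raw.get("source_path", ""),
        "tags": raw.get("tags", []),
    }
```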
## Retrieval Test Suggestions
During the mock-data phase, verify first that:
- Phishing cases recall the phishing playbook and similar phishing cases
- O365 sign-in anomaly cases recall the sign-in anomaly KB and similar cases
- True-positive and false-positive cases are distinguished and retain distinct patterns
- Recalled results include the key evidence / decision points

@@ -0,0 +1,68 @@
# System Positioning
## Current Project Positioning
`memory_gateway` is not a complete SOC memory system; it is the unified context entry layer of the overall design.
Its current responsibilities:
- Provide a unified MCP / REST access entry point for AI agents
- Forward retrieval and write requests to OpenViking
- Provide basic authentication, protocol compatibility, and gateway capabilities
- Act as the thinnest access layer of the multi-agent shared memory system
It does not directly take on:
- Bulk import of raw knowledge sources
- High-value memory extraction and filtering
- Human knowledge curation in the Obsidian Vault
- Long-term memory consolidation and evolution in EverMemOS
- Evaluation dataset and experiment workflow management
## Position in the Overall SOC Memory System
```text
SOC data sources
KB / Playbook / Monthly Reports / Reports / Ticket / Intel / Historical Cases
        |
        v
Skills / Pipeline
ingest / extract / classify / summarize / commit / prune
        |
        v
memory_gateway
Unified entry layer (MCP / REST / Auth / Routing)
        |
        v
OpenViking
Unified context / memory / resource / skill layer
        |                    |
        v                    v
Obsidian Vault           EverMemOS
Human curation layer     Long-term consolidation layer
```
## Suggested Next-Phase Modules
Organize upcoming POC capabilities into these modules:
- `docs/`
  System design, data model, and namespace conventions
- `poc/skills/`
  Retrieval, extraction, and consolidation skills
- `poc/pipeline/`
  Import flows for tickets, intel, and historical cases
- `poc/obsidian-vault/`
  Human-maintained knowledge and case note templates
- `poc/evermemos/`
  Long-term memory consolidation logic and policies
- `poc/evaluation/`
  Datasets, evaluation scripts, and results
## Suggested Repository Boundary
Keep this repository within the "gateway project" boundary:
- Keep the service entry point, OpenViking integration, configuration, protocols, and tests
- Add the system design docs and POC skeleton directories
- Avoid accumulating large volumes of business rules, bulk import scripts, or the Vault content itself

evaluation/README.md (new file, +12 lines)

@@ -0,0 +1,12 @@
# Evaluation
This directory holds POC evaluation material.
Suggested evaluation metrics:
- Similar-case hit rate
- Triage time reduction ratio
- Verdict accuracy
- Analyst satisfaction
The POC should focus on 1 to 2 SOC case types first.

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-1001",
"title": "Impossible travel login followed by MFA prompt fatigue",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_impossible_travel",
"severity": "high",
"status": "confirmed",
"time_window": {"start": "2026-04-02T22:10:00+08:00", "end": "2026-04-02T23:30:00+08:00"},
"summary": "User account showed impossible travel between Shanghai and Amsterdam, followed by repeated MFA prompts and successful sign-in.",
"alert_source": "Microsoft Entra ID",
"entities": {"users": ["david@corp.example"], "hosts": ["WS-DAVID-01"], "mailboxes": ["david@corp.example"]},
"observables": {"ips": ["203.0.113.150", "198.51.100.61"], "domains": [], "urls": [], "hashes": []},
"evidence": ["Two successful sign-ins from geographically impossible locations within 15 minutes.", "MFA challenge volume increased abnormally before final success.", "User confirmed they did not initiate overseas login."],
"investigation_steps": ["Review sign-in logs and device IDs.", "Check MFA event sequence.", "Validate user travel status with manager."],
"conclusion": {"verdict": "true_positive", "reason": "Impossible travel plus user denial and MFA fatigue pattern.", "recommended_actions": ["Revoke sessions and reset credentials.", "Review mailbox rules and app consent."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-IMPOSSIBLE-TRAVEL", "KB-O365-MFA-FATIGUE"], "cases": []},
"lessons_learned": ["Impossible travel needs to be combined with user confirmation and MFA telemetry."],
"tags": ["o365", "login", "impossible-travel", "mfa-fatigue"]
}

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-1002",
"title": "Legacy protocol sign-in from unfamiliar IP blocked by policy",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_legacy_auth_attempt",
"severity": "medium",
"status": "false_positive",
"time_window": {"start": "2026-04-04T07:50:00+08:00", "end": "2026-04-04T08:10:00+08:00"},
"summary": "Legacy authentication attempt from a cloud IP was blocked; investigation tied it to an approved migration tool test.",
"alert_source": "Microsoft Entra ID",
"entities": {"users": ["svc-migration@corp.example"], "hosts": [], "mailboxes": ["svc-migration@corp.example"]},
"observables": {"ips": ["192.0.2.24"], "domains": [], "urls": [], "hashes": []},
"evidence": ["The account is a known migration service account.", "Source IP matched approved cloud migration vendor range.", "No successful sign-in occurred due to policy block."],
"investigation_steps": ["Review service account inventory.", "Check change ticket for migration activity.", "Validate source IP against vendor allowlist."],
"conclusion": {"verdict": "false_positive", "reason": "Expected migration tool behavior with policy block and approved change window.", "recommended_actions": ["Tune alert suppression for approved migration windows."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-LEGACY-AUTH"], "cases": []},
"lessons_learned": ["Service account context is essential before escalating legacy auth alerts."],
"tags": ["o365", "login", "false-positive", "legacy-auth"]
}

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-1003",
"title": "Suspicious inbox rule creation after successful foreign login",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_suspicious_inbox_rule_after_login",
"severity": "high",
"status": "confirmed",
"time_window": {"start": "2026-04-06T19:20:00+08:00", "end": "2026-04-06T20:45:00+08:00"},
"summary": "An overseas sign-in to Microsoft 365 was followed by inbox rule creation to hide finance-related emails.",
"alert_source": "Microsoft Defender for Cloud Apps",
"entities": {"users": ["emma@corp.example"], "hosts": ["WS-EMMA-07"], "mailboxes": ["emma@corp.example"]},
"observables": {"ips": ["198.51.100.98"], "domains": [], "urls": [], "hashes": []},
"evidence": ["Successful sign-in from untrusted ASN.", "Inbox rule moved wire transfer emails to RSS Feeds folder.", "Mailbox audit showed rule creation minutes after login."],
"investigation_steps": ["Review mailbox audit logs.", "Export suspicious inbox rules.", "Check for OAuth app consent and forwarding settings."],
"conclusion": {"verdict": "true_positive", "reason": "Account compromise indicators plus malicious inbox rule persistence.", "recommended_actions": ["Remove malicious rules.", "Reset account and revoke refresh tokens."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-INBOX-RULE-ABUSE", "KB-O365-IMPOSSIBLE-TRAVEL"], "cases": []},
"lessons_learned": ["Mailbox rule inspection should be default for suspicious O365 login cases."],
"tags": ["o365", "login", "inbox-rule", "account-compromise"]
}

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-1004",
"title": "Multiple failed logins from residential proxy but no successful access",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_password_spray_attempt",
"severity": "medium",
"status": "pending",
"time_window": {"start": "2026-04-08T02:00:00+08:00", "end": "2026-04-08T03:10:00+08:00"},
"summary": "Repeated failed Microsoft 365 sign-in attempts targeted one user from a residential proxy network, with no successful authentication observed.",
"alert_source": "Microsoft Entra ID",
"entities": {"users": ["frank@corp.example"], "hosts": [], "mailboxes": ["frank@corp.example"]},
"observables": {"ips": ["203.0.113.201"], "domains": [], "urls": [], "hashes": []},
"evidence": ["High-volume failed attempts over a short period.", "Source IP attributed to a residential proxy provider.", "No matching successful sign-in or MFA event found."],
"investigation_steps": ["Check password spray pattern across tenant.", "Confirm user recent password reset history.", "Review conditional access outcomes."],
"conclusion": {"verdict": "uncertain", "reason": "Suspicious authentication pattern but no confirmed access or downstream activity.", "recommended_actions": ["Monitor account closely.", "Consider temporary sign-in risk remediation."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-IMPOSSIBLE-TRAVEL"], "cases": []},
"lessons_learned": ["Pending cases should still capture reusable spray indicators without overcommitting verdict."],
"tags": ["o365", "login", "password-spray", "pending"]
}

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-1005",
"title": "Traveling executive triggered impossible travel but activity was legitimate",
"scenario": "o365_suspicious_login",
"alert_type": "azuread_impossible_travel",
"severity": "medium",
"status": "false_positive",
"time_window": {"start": "2026-04-09T09:00:00+08:00", "end": "2026-04-09T09:40:00+08:00"},
"summary": "Executive account triggered impossible travel due to corporate VPN exit node while the user was on an approved overseas trip.",
"alert_source": "Microsoft Entra ID",
"entities": {"users": ["grace@corp.example"], "hosts": ["VIP-LAPTOP-01"], "mailboxes": ["grace@corp.example"]},
"observables": {"ips": ["192.0.2.90", "203.0.113.77"], "domains": [], "urls": [], "hashes": []},
"evidence": ["Approved travel request existed.", "One login originated from corporate VPN exit node.", "Device and user agent were consistent with known user profile."],
"investigation_steps": ["Check travel approval and itinerary.", "Review VPN egress mapping.", "Compare user agent and managed device posture."],
"conclusion": {"verdict": "false_positive", "reason": "Legitimate travel combined with VPN routing caused impossible travel signal.", "recommended_actions": ["Document travel context and improve analyst checklist."]},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "kb": ["KB-O365-IMPOSSIBLE-TRAVEL"], "cases": []},
"lessons_learned": ["Impossible travel should consider approved travel and VPN topology before escalation."],
"tags": ["o365", "login", "false-positive", "travel"]
}

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-0001",
"title": "Finance user received invoice-themed phishing email",
"scenario": "phishing",
"alert_type": "mail_suspicious_attachment",
"severity": "high",
"status": "confirmed",
"time_window": {"start": "2026-04-01T09:10:00+08:00", "end": "2026-04-01T11:30:00+08:00"},
"summary": "Finance user received an invoice-themed phishing email containing a malicious HTML attachment that redirected to a credential harvesting page.",
"alert_source": "Secure Email Gateway",
"entities": {"users": ["alice@corp.example"], "hosts": ["FIN-LAPTOP-12"], "mailboxes": ["alice@corp.example"]},
"observables": {"sender_emails": ["billing@vendor-payments.com"], "domains": ["vendor-payments.com", "vendor-payments-login.com"], "urls": ["https://vendor-payments-login.com/review"], "ips": ["198.51.100.20"], "hashes": ["sha256:phish0001"]},
"evidence": ["Sender domain was newly observed and failed DMARC.", "Attachment redirected to a fake Microsoft 365 login page.", "User clicked the link before mail quarantine completed."],
"investigation_steps": ["Validate sender authentication results.", "Detonate HTML attachment in sandbox.", "Check mailbox click telemetry and account sign-in logs."],
"conclusion": {"verdict": "true_positive", "reason": "Aligned phishing indicators and confirmed click behavior.", "recommended_actions": ["Reset impacted account password.", "Block sender domain and landing URL.", "Hunt for similar emails in tenant."]},
"related_refs": {"playbooks": ["PB-PHISH-001"], "kb": ["KB-PHISH-HEADER-CHECK", "KB-CRED-HARVEST-PATTERNS"], "cases": []},
"lessons_learned": ["Invoice lure remains effective against finance users."],
"tags": ["phishing", "email", "credential-harvest", "finance"]
}

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-0002",
"title": "Payroll notification email flagged but determined benign",
"scenario": "phishing",
"alert_type": "mail_suspicious_link",
"severity": "medium",
"status": "false_positive",
"time_window": {"start": "2026-04-03T08:40:00+08:00", "end": "2026-04-03T09:20:00+08:00"},
"summary": "Payroll update email was flagged due to a shortened URL, but the destination was the approved HR vendor portal.",
"alert_source": "Secure Email Gateway",
"entities": {"users": ["bob@corp.example"], "hosts": ["HR-LAPTOP-03"], "mailboxes": ["bob@corp.example"]},
"observables": {"sender_emails": ["notify@hr-vendor.example"], "domains": ["hr-vendor.example"], "urls": ["https://bit.ly/hr-portal-example"], "ips": [], "hashes": []},
"evidence": ["Sender domain aligned with SPF and DKIM.", "Destination domain matched approved supplier inventory.", "No credential prompt anomaly observed."],
"investigation_steps": ["Expand shortened URL.", "Validate vendor domain against allowlist.", "Review prior communication pattern with HR users."],
"conclusion": {"verdict": "false_positive", "reason": "Trusted vendor communication with expected destination.", "recommended_actions": ["Tune mail rule to reduce noisy alerts for approved HR vendor."]},
"related_refs": {"playbooks": ["PB-PHISH-001"], "kb": ["KB-PHISH-HEADER-CHECK"], "cases": []},
"lessons_learned": ["Short URLs alone should not drive phishing conclusion without destination validation."],
"tags": ["phishing", "email", "false-positive", "vendor"]
}

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-0003",
"title": "Executive impersonation email requested urgent wire transfer",
"scenario": "phishing",
"alert_type": "mail_bec_impersonation",
"severity": "high",
"status": "confirmed",
"time_window": {"start": "2026-04-05T13:15:00+08:00", "end": "2026-04-05T15:00:00+08:00"},
"summary": "An executive impersonation email targeted finance staff with an urgent wire transfer request from a lookalike domain.",
"alert_source": "Secure Email Gateway",
"entities": {"users": ["carol@corp.example"], "hosts": ["FIN-LAPTOP-08"], "mailboxes": ["carol@corp.example"]},
"observables": {"sender_emails": ["ceo@c0rp-example.com"], "domains": ["c0rp-example.com"], "urls": [], "ips": ["203.0.113.45"], "hashes": []},
"evidence": ["Lookalike domain used numeric substitution.", "Language pressure matched prior BEC pattern.", "No historical communication from sender domain."],
"investigation_steps": ["Compare sender domain with corporate domain.", "Review historical communication graph.", "Confirm with executive assistant out of band."],
"conclusion": {"verdict": "true_positive", "reason": "Strong BEC indicators and confirmed spoofed sender identity.", "recommended_actions": ["Block sender domain.", "Notify finance team and update awareness content."]},
"related_refs": {"playbooks": ["PB-PHISH-001"], "kb": ["KB-CRED-HARVEST-PATTERNS"], "cases": []},
"lessons_learned": ["Lookalike domains need strong entity normalization in retrieval and detection logic."],
"tags": ["phishing", "bec", "executive-impersonation"]
}

@@ -0,0 +1,19 @@
{
"case_id": "CASE-2026-0004",
"title": "Shared mailbox received OneDrive lure with HTML attachment",
"scenario": "phishing",
"alert_type": "mail_suspicious_attachment",
"severity": "medium",
"status": "confirmed",
"time_window": {"start": "2026-04-07T10:00:00+08:00", "end": "2026-04-07T12:05:00+08:00"},
"summary": "Shared finance mailbox received a fake OneDrive notification with an HTML attachment that led to credential collection.",
"alert_source": "Secure Email Gateway",
"entities": {"users": ["shared-finance@corp.example"], "hosts": [], "mailboxes": ["shared-finance@corp.example"]},
"observables": {"sender_emails": ["noreply@sharepoint-notify.com"], "domains": ["sharepoint-notify.com"], "urls": ["https://onedrive-review-login.example"], "ips": ["198.51.100.87"], "hashes": ["sha256:phish0004"]},
"evidence": ["Attachment rendered a fake Microsoft sign-in page.", "Landing page hosted outside Microsoft IP space.", "Mail body reused branding from previous phishing campaign."],
"investigation_steps": ["Render attachment safely.", "Review URL hosting provider reputation.", "Search tenant for same subject and sender."],
"conclusion": {"verdict": "true_positive", "reason": "Credential harvesting lure with campaign reuse indicators.", "recommended_actions": ["Block sender and URL.", "Search and purge duplicate emails."]},
"related_refs": {"playbooks": ["PB-PHISH-001"], "kb": ["KB-CRED-HARVEST-PATTERNS"], "cases": ["CASE-2026-0001"]},
"lessons_learned": ["Campaign reuse makes historical phishing similarity especially valuable."],
"tags": ["phishing", "email", "onedrive-lure"]
}

@@ -0,0 +1,15 @@
{
"doc_id": "KB-CRED-HARVEST-PATTERNS",
"doc_type": "kb",
"title": "Credential Harvesting Indicators",
"scenario": "phishing",
"summary": "Common indicators that a phishing case involves credential harvesting rather than simple spam or benign mail.",
"applicability": ["mail_suspicious_attachment", "mail_suspicious_link"],
"key_points": ["Landing page mimics Microsoft 365 or common SaaS login pages.", "HTML attachment often acts as a redirector rather than containing malware.", "Credential harvest campaigns frequently reuse branding and lures across tenants."],
"investigation_guidance": ["Capture full redirect chain.", "Look for post-click login anomalies in identity logs.", "Search for same lure across multiple mailboxes."],
"decision_points": ["User click plus sign-in anomaly greatly increases confidence.", "Branding reuse can help link separate phishing cases into one campaign."],
"related_entities": {"ttps": ["T1566.002"], "iocs": []},
"related_refs": {"playbooks": ["PB-PHISH-001"], "cases": []},
"tags": ["kb", "phishing", "credential-harvest"],
"updated_at": "2026-04-10T09:25:00+08:00"
}

View File

@ -0,0 +1,15 @@
{
"doc_id": "KB-O365-IMPOSSIBLE-TRAVEL",
"doc_type": "kb",
"title": "Interpreting O365 Impossible Travel Alerts",
"scenario": "o365_suspicious_login",
"summary": "Guidance for validating impossible travel alerts, including VPN, proxy, and approved travel false-positive conditions.",
"applicability": ["azuread_impossible_travel"],
"key_points": ["Impossible travel must be validated against user travel context.", "VPN egress and cloud proxy routing are common false-positive sources.", "Pair sign-in anomaly with MFA, mailbox, or device anomalies before concluding compromise."],
"investigation_guidance": ["Validate source ASN and IP history.", "Check user-approved travel or remote work context.", "Compare device ID and user agent consistency."],
"decision_points": ["User denial of travel plus new device strongly increases confidence.", "Approved travel and trusted VPN topology reduce confidence."],
"related_entities": {"ttps": ["T1078"], "iocs": []},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "cases": []},
"tags": ["kb", "o365", "impossible-travel"],
"updated_at": "2026-04-10T09:30:00+08:00"
}

View File

@ -0,0 +1,15 @@
{
"doc_id": "KB-O365-INBOX-RULE-ABUSE",
"doc_type": "kb",
"title": "Inbox Rule Abuse After Account Compromise",
"scenario": "o365_suspicious_login",
"summary": "Common mailbox persistence behaviors after O365 account compromise, especially rule creation to hide or forward finance emails.",
"applicability": ["azuread_suspicious_inbox_rule_after_login"],
"key_points": ["Attackers often hide financial emails using move-to-folder rules.", "Forwarding and delete rules are strong post-compromise indicators.", "Mailbox audit logs should be reviewed immediately after suspicious login confirmation."],
"investigation_guidance": ["Enumerate all inbox rules and forwarding settings.", "Check mailbox audit timeline around suspicious sign-in.", "Review OAuth consents if inbox rules are absent but suspicious mail actions continue."],
"decision_points": ["Inbox rule creation shortly after suspicious login strongly supports compromise verdict."],
"related_entities": {"ttps": ["T1114"], "iocs": []},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "cases": []},
"tags": ["kb", "o365", "inbox-rule"],
"updated_at": "2026-04-10T09:40:00+08:00"
}

View File

@ -0,0 +1,15 @@
{
"doc_id": "KB-O365-MFA-FATIGUE",
"doc_type": "kb",
"title": "MFA Fatigue Detection Notes",
"scenario": "o365_suspicious_login",
"summary": "Patterns for identifying MFA fatigue / push bombing during account compromise attempts.",
"applicability": ["azuread_impossible_travel", "azuread_suspicious_login"],
"key_points": ["Repeated MFA prompts preceding one successful prompt is suspicious.", "User-reported prompt fatigue is strong supporting evidence.", "MFA fatigue is often coupled with credential theft rather than password spray alone."],
"investigation_guidance": ["Review MFA event counts and timing.", "Check if the user acknowledged unexpected prompts.", "Look for subsequent session hijacking or mailbox abuse."],
"decision_points": ["Prompt flood plus user denial usually warrants immediate containment."],
"related_entities": {"ttps": ["T1621"], "iocs": []},
"related_refs": {"playbooks": ["PB-O365-LOGIN-001"], "cases": []},
"tags": ["kb", "o365", "mfa-fatigue"],
"updated_at": "2026-04-10T09:35:00+08:00"
}

View File

@ -0,0 +1,15 @@
{
"doc_id": "KB-PHISH-HEADER-CHECK",
"doc_type": "kb",
"title": "Phishing Header Validation Checklist",
"scenario": "phishing",
"summary": "Checklist for validating sender identity, domain reputation, and authentication results in suspected phishing emails.",
"applicability": ["mail_suspicious_attachment", "mail_suspicious_link", "mail_bec_impersonation"],
"key_points": ["Review SPF, DKIM, and DMARC alignment.", "Compare display name, envelope sender, and reply-to anomalies.", "Check domain age and known-good communication history."],
"investigation_guidance": ["Use message trace and header parser.", "Compare sender domain with vendor allowlist.", "Escalate lookalike domains even when content appears business-relevant."],
"decision_points": ["Newly observed domains with failed auth are high-risk.", "Benign vendor mail often has consistent historical sending patterns."],
"related_entities": {"ttps": ["T1566.001"], "iocs": []},
"related_refs": {"playbooks": ["PB-PHISH-001"], "cases": []},
"tags": ["kb", "phishing", "email-header"],
"updated_at": "2026-04-10T09:20:00+08:00"
}

View File

@ -0,0 +1,15 @@
{
"doc_id": "PB-O365-LOGIN-001",
"doc_type": "playbook",
"title": "O365 Suspicious Login Investigation Playbook",
"scenario": "o365_suspicious_login",
"summary": "Standard investigation steps for suspicious Entra ID sign-ins, impossible travel, MFA abuse, and follow-on mailbox abuse.",
"applicability": ["azuread_impossible_travel", "azuread_legacy_auth_attempt", "azuread_suspicious_inbox_rule_after_login", "azuread_password_spray_attempt"],
"key_points": ["Confirm user travel and business context.", "Review sign-in logs, device IDs, and user agents.", "Inspect downstream actions such as inbox rules, app consent, and forwarding."],
"investigation_guidance": ["Correlate MFA telemetry with sign-in sequence.", "Check risky sign-ins and risky users views.", "Revoke sessions and reset credentials when compromise is confirmed."],
"decision_points": ["Impossible travel alone is insufficient without corroborating evidence.", "Inbox rule creation after foreign login strongly increases confidence of compromise."],
"related_entities": {"ttps": ["T1078"], "iocs": []},
"related_refs": {"kb": ["KB-O365-IMPOSSIBLE-TRAVEL", "KB-O365-MFA-FATIGUE", "KB-O365-INBOX-RULE-ABUSE"], "cases": []},
"tags": ["playbook", "o365", "login"],
"updated_at": "2026-04-10T09:10:00+08:00"
}

View File

@ -0,0 +1,15 @@
{
"doc_id": "PB-PHISH-001",
"doc_type": "playbook",
"title": "Phishing Email Investigation Playbook",
"scenario": "phishing",
"summary": "Standard investigation steps for suspicious email, credential harvesting, and BEC-like cases.",
"applicability": ["mail_suspicious_attachment", "mail_suspicious_link", "mail_bec_impersonation"],
"key_points": ["Validate sender authentication results.", "Inspect landing URL and attachment behavior.", "Check whether the user clicked or submitted credentials."],
"investigation_guidance": ["Query email telemetry for same sender, subject, or URL.", "Review mailbox click logs and endpoint browser artifacts.", "Reset credentials if submission is suspected."],
"decision_points": ["If sender auth fails and user interaction exists, treat as likely phishing.", "If destination is allowlisted and communication pattern is expected, investigate false positive path."],
"related_entities": {"ttps": ["T1566"], "iocs": []},
"related_refs": {"kb": ["KB-PHISH-HEADER-CHECK", "KB-CRED-HARVEST-PATTERNS"], "cases": []},
"tags": ["playbook", "phishing", "email"],
"updated_at": "2026-04-10T09:00:00+08:00"
}

View File

@ -0,0 +1,65 @@
{
"id": "CASE-2026-0001",
"memory_type": "case",
"scenario": "phishing",
"title": "Finance user received invoice-themed phishing email",
"abstract": "Finance user received an invoice-themed phishing email containing a malicious HTML attachment that redirected to a credential harvesting page.",
"verdict": "true_positive",
"severity": "high",
"entities": {
"users": [
"alice@corp.example"
],
"hosts": [
"FIN-LAPTOP-12"
],
"mailboxes": [
"alice@corp.example"
]
},
"observables": {
"sender_emails": [
"billing@vendor-payments.com"
],
"domains": [
"vendor-payments.com",
"vendor-payments-login.com"
],
"urls": [
"https://vendor-payments-login.com/review"
],
"ips": [
"198.51.100.20"
],
"hashes": [
"sha256:phish0001"
]
},
"evidence": [
"Sender domain was newly observed and failed DMARC.",
"Attachment redirected to a fake Microsoft 365 login page.",
"User clicked the link before mail quarantine completed."
],
"patterns": [
"verdict:true_positive",
"scenario:phishing",
"alert_type:mail_suspicious_attachment"
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"kb": [
"KB-PHISH-HEADER-CHECK",
"KB-CRED-HARVEST-PATTERNS"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/phishing/CASE-2026-0001.json",
"tags": [
"phishing",
"email",
"credential-harvest",
"finance"
]
}

View File

@ -0,0 +1,59 @@
{
"id": "CASE-2026-0002",
"memory_type": "case",
"scenario": "phishing",
"title": "Payroll notification email flagged but determined benign",
"abstract": "Payroll update email was flagged due to a shortened URL, but the destination was the approved HR vendor portal.",
"verdict": "false_positive",
"severity": "medium",
"entities": {
"users": [
"bob@corp.example"
],
"hosts": [
"HR-LAPTOP-03"
],
"mailboxes": [
"bob@corp.example"
]
},
"observables": {
"sender_emails": [
"notify@hr-vendor.example"
],
"domains": [
"hr-vendor.example"
],
"urls": [
"https://bit.ly/hr-portal-example"
],
"ips": [],
"hashes": []
},
"evidence": [
"Sender domain aligned with SPF and DKIM.",
"Destination domain matched approved supplier inventory.",
"No credential prompt anomaly observed."
],
"patterns": [
"verdict:false_positive",
"scenario:phishing",
"alert_type:mail_suspicious_link"
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"kb": [
"KB-PHISH-HEADER-CHECK"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/phishing/CASE-2026-0002.json",
"tags": [
"phishing",
"email",
"false-positive",
"vendor"
]
}

View File

@ -0,0 +1,58 @@
{
"id": "CASE-2026-0003",
"memory_type": "case",
"scenario": "phishing",
"title": "Executive impersonation email requested urgent wire transfer",
"abstract": "An executive impersonation email targeted finance staff with an urgent wire transfer request from a lookalike domain.",
"verdict": "true_positive",
"severity": "high",
"entities": {
"users": [
"carol@corp.example"
],
"hosts": [
"FIN-LAPTOP-08"
],
"mailboxes": [
"carol@corp.example"
]
},
"observables": {
"sender_emails": [
"ceo@c0rp-example.com"
],
"domains": [
"c0rp-example.com"
],
"urls": [],
"ips": [
"203.0.113.45"
],
"hashes": []
},
"evidence": [
"Lookalike domain used numeric substitution.",
"Language pressure matched prior BEC pattern.",
"No historical communication from sender domain."
],
"patterns": [
"verdict:true_positive",
"scenario:phishing",
"alert_type:mail_bec_impersonation"
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"kb": [
"KB-CRED-HARVEST-PATTERNS"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/phishing/CASE-2026-0003.json",
"tags": [
"phishing",
"bec",
"executive-impersonation"
]
}

View File

@ -0,0 +1,62 @@
{
"id": "CASE-2026-0004",
"memory_type": "case",
"scenario": "phishing",
"title": "Shared mailbox received OneDrive lure with HTML attachment",
"abstract": "Shared finance mailbox received a fake OneDrive notification with an HTML attachment that led to credential collection.",
"verdict": "true_positive",
"severity": "medium",
"entities": {
"users": [
"shared-finance@corp.example"
],
"hosts": [],
"mailboxes": [
"shared-finance@corp.example"
]
},
"observables": {
"sender_emails": [
"noreply@sharepoint-notify.com"
],
"domains": [
"sharepoint-notify.com"
],
"urls": [
"https://onedrive-review-login.example"
],
"ips": [
"198.51.100.87"
],
"hashes": [
"sha256:phish0004"
]
},
"evidence": [
"Attachment rendered a fake Microsoft sign-in page.",
"Landing page hosted outside Microsoft IP space.",
"Mail body reused branding from previous phishing campaign."
],
"patterns": [
"verdict:true_positive",
"scenario:phishing",
"alert_type:mail_suspicious_attachment"
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"kb": [
"KB-CRED-HARVEST-PATTERNS"
],
"cases": [
"CASE-2026-0001"
]
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/phishing/CASE-2026-0004.json",
"tags": [
"phishing",
"email",
"onedrive-lure"
]
}

View File

@ -0,0 +1,56 @@
{
"id": "CASE-2026-1001",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Impossible travel login followed by MFA prompt fatigue",
"abstract": "User account showed impossible travel between Shanghai and Amsterdam, followed by repeated MFA prompts and successful sign-in.",
"verdict": "true_positive",
"severity": "high",
"entities": {
"users": [
"david@corp.example"
],
"hosts": [
"WS-DAVID-01"
],
"mailboxes": [
"david@corp.example"
]
},
"observables": {
"ips": [
"203.0.113.150",
"198.51.100.61"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"Two successful sign-ins from geographically impossible locations within 15 minutes.",
"MFA challenge volume increased abnormally before final success.",
"User confirmed they did not initiate overseas login."
],
"patterns": [
"verdict:true_positive",
"scenario:o365_suspicious_login",
"alert_type:azuread_impossible_travel"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-IMPOSSIBLE-TRAVEL",
"KB-O365-MFA-FATIGUE"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1001.json",
"tags": [
"o365",
"login",
"impossible-travel",
"mfa-fatigue"
]
}

View File

@ -0,0 +1,52 @@
{
"id": "CASE-2026-1002",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Legacy protocol sign-in from unfamiliar IP blocked by policy",
"abstract": "Legacy authentication attempt from a cloud IP was blocked; investigation tied it to an approved migration tool test.",
"verdict": "false_positive",
"severity": "medium",
"entities": {
"users": [
"svc-migration@corp.example"
],
"hosts": [],
"mailboxes": [
"svc-migration@corp.example"
]
},
"observables": {
"ips": [
"192.0.2.24"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"The account is a known migration service account.",
"Source IP matched approved cloud migration vendor range.",
"No successful sign-in occurred due to policy block."
],
"patterns": [
"verdict:false_positive",
"scenario:o365_suspicious_login",
"alert_type:azuread_legacy_auth_attempt"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-LEGACY-AUTH"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1002.json",
"tags": [
"o365",
"login",
"false-positive",
"legacy-auth"
]
}

View File

@ -0,0 +1,55 @@
{
"id": "CASE-2026-1003",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Suspicious inbox rule creation after successful foreign login",
"abstract": "An overseas sign-in to Microsoft 365 was followed by inbox rule creation to hide finance-related emails.",
"verdict": "true_positive",
"severity": "high",
"entities": {
"users": [
"emma@corp.example"
],
"hosts": [
"WS-EMMA-07"
],
"mailboxes": [
"emma@corp.example"
]
},
"observables": {
"ips": [
"198.51.100.98"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"Successful sign-in from untrusted ASN.",
"Inbox rule moved wire transfer emails to RSS Feeds folder.",
"Mailbox audit showed rule creation minutes after login."
],
"patterns": [
"verdict:true_positive",
"scenario:o365_suspicious_login",
"alert_type:azuread_suspicious_inbox_rule_after_login"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-INBOX-RULE-ABUSE",
"KB-O365-IMPOSSIBLE-TRAVEL"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1003.json",
"tags": [
"o365",
"login",
"inbox-rule",
"account-compromise"
]
}

View File

@ -0,0 +1,52 @@
{
"id": "CASE-2026-1004",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Multiple failed logins from residential proxy but no successful access",
"abstract": "Repeated failed Microsoft 365 sign-in attempts targeted one user from a residential proxy network, with no successful authentication observed.",
"verdict": "uncertain",
"severity": "medium",
"entities": {
"users": [
"frank@corp.example"
],
"hosts": [],
"mailboxes": [
"frank@corp.example"
]
},
"observables": {
"ips": [
"203.0.113.201"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"High-volume failed attempts over a short period.",
"Source IP attributed to a residential proxy provider.",
"No matching successful sign-in or MFA event found."
],
"patterns": [
"verdict:uncertain",
"scenario:o365_suspicious_login",
"alert_type:azuread_password_spray_attempt"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-IMPOSSIBLE-TRAVEL"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1004.json",
"tags": [
"o365",
"login",
"password-spray",
"pending"
]
}

View File

@ -0,0 +1,55 @@
{
"id": "CASE-2026-1005",
"memory_type": "case",
"scenario": "o365_suspicious_login",
"title": "Traveling executive triggered impossible travel but activity was legitimate",
"abstract": "Executive account triggered impossible travel due to corporate VPN exit node while the user was on an approved overseas trip.",
"verdict": "false_positive",
"severity": "medium",
"entities": {
"users": [
"grace@corp.example"
],
"hosts": [
"VIP-LAPTOP-01"
],
"mailboxes": [
"grace@corp.example"
]
},
"observables": {
"ips": [
"192.0.2.90",
"203.0.113.77"
],
"domains": [],
"urls": [],
"hashes": []
},
"evidence": [
"Approved travel request existed.",
"One login originated from corporate VPN exit node.",
"Device and user agent were consistent with known user profile."
],
"patterns": [
"verdict:false_positive",
"scenario:o365_suspicious_login",
"alert_type:azuread_impossible_travel"
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"kb": [
"KB-O365-IMPOSSIBLE-TRAVEL"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_cases/o365_suspicious_login/CASE-2026-1005.json",
"tags": [
"o365",
"login",
"false-positive",
"travel"
]
}

View File

@ -0,0 +1,34 @@
{
"id": "KB-CRED-HARVEST-PATTERNS",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "phishing",
"title": "Credential Harvesting Indicators",
"abstract": "Common indicators that a phishing case involves credential harvesting rather than simple spam or benign mail.",
"key_points": [
"Landing page mimics Microsoft 365 or common SaaS login pages.",
"HTML attachment often acts as a redirector rather than containing malware.",
"Credential harvest campaigns frequently reuse branding and lures across tenants."
],
"investigation_guidance": [
"Capture full redirect chain.",
"Look for post-click login anomalies in identity logs.",
"Search for same lure across multiple mailboxes."
],
"decision_points": [
"User click plus sign-in anomaly greatly increases confidence.",
"Branding reuse can help link separate phishing cases into one campaign."
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-CRED-HARVEST-PATTERNS.json",
"tags": [
"kb",
"phishing",
"credential-harvest"
]
}

View File

@ -0,0 +1,34 @@
{
"id": "KB-O365-IMPOSSIBLE-TRAVEL",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "o365_suspicious_login",
"title": "Interpreting O365 Impossible Travel Alerts",
"abstract": "Guidance for validating impossible travel alerts, including VPN, proxy, and approved travel false-positive conditions.",
"key_points": [
"Impossible travel must be validated against user travel context.",
"VPN egress and cloud proxy routing are common false-positive sources.",
"Pair sign-in anomaly with MFA, mailbox, or device anomalies before concluding compromise."
],
"investigation_guidance": [
"Validate source ASN and IP history.",
"Check user-approved travel or remote work context.",
"Compare device ID and user agent consistency."
],
"decision_points": [
"User denial of travel plus new device strongly increases confidence.",
"Approved travel and trusted VPN topology reduce confidence."
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-O365-IMPOSSIBLE-TRAVEL.json",
"tags": [
"kb",
"o365",
"impossible-travel"
]
}

View File

@ -0,0 +1,33 @@
{
"id": "KB-O365-INBOX-RULE-ABUSE",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "o365_suspicious_login",
"title": "Inbox Rule Abuse After Account Compromise",
"abstract": "Common mailbox persistence behaviors after O365 account compromise, especially rule creation to hide or forward finance emails.",
"key_points": [
"Attackers often hide financial emails using move-to-folder rules.",
"Forwarding and delete rules are strong post-compromise indicators.",
"Mailbox audit logs should be reviewed immediately after suspicious login confirmation."
],
"investigation_guidance": [
"Enumerate all inbox rules and forwarding settings.",
"Check mailbox audit timeline around suspicious sign-in.",
"Review OAuth consents if inbox rules are absent but suspicious mail actions continue."
],
"decision_points": [
"Inbox rule creation shortly after suspicious login strongly supports compromise verdict."
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-O365-INBOX-RULE-ABUSE.json",
"tags": [
"kb",
"o365",
"inbox-rule"
]
}

View File

@ -0,0 +1,33 @@
{
"id": "KB-O365-MFA-FATIGUE",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "o365_suspicious_login",
"title": "MFA Fatigue Detection Notes",
"abstract": "Patterns for identifying MFA fatigue / push bombing during account compromise attempts.",
"key_points": [
"Repeated MFA prompts preceding one successful prompt is suspicious.",
"User-reported prompt fatigue is strong supporting evidence.",
"MFA fatigue is often coupled with credential theft rather than password spray alone."
],
"investigation_guidance": [
"Review MFA event counts and timing.",
"Check if the user acknowledged unexpected prompts.",
"Look for subsequent session hijacking or mailbox abuse."
],
"decision_points": [
"Prompt flood plus user denial usually warrants immediate containment."
],
"related_refs": {
"playbooks": [
"PB-O365-LOGIN-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-O365-MFA-FATIGUE.json",
"tags": [
"kb",
"o365",
"mfa-fatigue"
]
}

View File

@ -0,0 +1,34 @@
{
"id": "KB-PHISH-HEADER-CHECK",
"memory_type": "knowledge",
"doc_type": "kb",
"scenario": "phishing",
"title": "Phishing Header Validation Checklist",
"abstract": "Checklist for validating sender identity, domain reputation, and authentication results in suspected phishing emails.",
"key_points": [
"Review SPF, DKIM, and DMARC alignment.",
"Compare display name, envelope sender, and reply-to anomalies.",
"Check domain age and known-good communication history."
],
"investigation_guidance": [
"Use message trace and header parser.",
"Compare sender domain with vendor allowlist.",
"Escalate lookalike domains even when content appears business-relevant."
],
"decision_points": [
"Newly observed domains with failed auth are high-risk.",
"Benign vendor mail often has consistent historical sending patterns."
],
"related_refs": {
"playbooks": [
"PB-PHISH-001"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/kb/KB-PHISH-HEADER-CHECK.json",
"tags": [
"kb",
"phishing",
"email-header"
]
}

View File

@ -0,0 +1,36 @@
{
"id": "PB-O365-LOGIN-001",
"memory_type": "knowledge",
"doc_type": "playbook",
"scenario": "o365_suspicious_login",
"title": "O365 Suspicious Login Investigation Playbook",
"abstract": "Standard investigation steps for suspicious Entra ID sign-ins, impossible travel, MFA abuse, and follow-on mailbox abuse.",
"key_points": [
"Confirm user travel and business context.",
"Review sign-in logs, device IDs, and user agents.",
"Inspect downstream actions such as inbox rules, app consent, and forwarding."
],
"investigation_guidance": [
"Correlate MFA telemetry with sign-in sequence.",
"Check risky sign-ins and risky users views.",
"Revoke sessions and reset credentials when compromise is confirmed."
],
"decision_points": [
"Impossible travel alone is insufficient without corroborating evidence.",
"Inbox rule creation after foreign login strongly increases confidence of compromise."
],
"related_refs": {
"kb": [
"KB-O365-IMPOSSIBLE-TRAVEL",
"KB-O365-MFA-FATIGUE",
"KB-O365-INBOX-RULE-ABUSE"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/playbooks/PB-O365-LOGIN-001.json",
"tags": [
"playbook",
"o365",
"login"
]
}

View File

@ -0,0 +1,35 @@
{
"id": "PB-PHISH-001",
"memory_type": "knowledge",
"doc_type": "playbook",
"scenario": "phishing",
"title": "Phishing Email Investigation Playbook",
"abstract": "Standard investigation steps for suspicious email, credential harvesting, and BEC-like cases.",
"key_points": [
"Validate sender authentication results.",
"Inspect landing URL and attachment behavior.",
"Check whether the user clicked or submitted credentials."
],
"investigation_guidance": [
"Query email telemetry for same sender, subject, or URL.",
"Review mailbox click logs and endpoint browser artifacts.",
"Reset credentials if submission is suspected."
],
"decision_points": [
"If sender auth fails and user interaction exists, treat as likely phishing.",
"If destination is allowlisted and communication pattern is expected, investigate false positive path."
],
"related_refs": {
"kb": [
"KB-PHISH-HEADER-CHECK",
"KB-CRED-HARVEST-PATTERNS"
],
"cases": []
},
"source_path": "/home/tom/soc_memory_poc/evaluation/datasets/mock_kb/playbooks/PB-PHISH-001.json",
"tags": [
"playbook",
"phishing",
"email"
]
}

9
evermemos/README.md Normal file
View File

@ -0,0 +1,9 @@
# EverMemOS Layer
This directory holds the working logic of the long-term memory curation layer.
Main responsibilities:
- Extract long-term memories from episode / process memory
- Deduplicate, merge, update, and apply decay
- Feed results back into OpenViking and Obsidian
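
The merge/update/decay responsibilities above could be sketched as follows. This is purely illustrative: the function names, the half-life value, and the record shape (`tags`, `abstract`, `updated_at`) are assumptions, not the actual EverMemOS implementation.

```python
import math
import time

def decayed_score(base_score: float, last_used_ts: float, half_life_days: float = 30.0) -> float:
    """Exponential time decay: the score halves for every half_life_days of disuse."""
    age_days = (time.time() - last_used_ts) / 86400.0
    return base_score * math.pow(0.5, age_days / half_life_days)

def merge_memories(existing: dict, incoming: dict) -> dict:
    """Merge two records for the same memory id: union the tags, keep the newer abstract."""
    merged = dict(existing)
    merged["tags"] = sorted(set(existing.get("tags", [])) | set(incoming.get("tags", [])))
    if incoming.get("updated_at", "") > existing.get("updated_at", ""):
        merged["abstract"] = incoming.get("abstract", existing.get("abstract"))
        merged["updated_at"] = incoming["updated_at"]
    return merged
```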

View File

@ -0,0 +1,314 @@
---
name: soc-memory-poc
description: Load this skill whenever Hermes is handling SOC alert triage, phishing investigation, suspicious O365 login analysis, historical case lookup, Obsidian note lookup, case-note generation, or committing high-value SOC findings into the SOC Memory POC. It provides a strict triage workflow using the SOC Memory Gateway for search/write operations, local Obsidian vault search, and local SOC Memory POC scripts for Obsidian case note generation.
version: 1.3.0
metadata:
hermes:
tags: [soc, memory, openviking, obsidian, incident-response, case-triage, phishing, o365]
related_skills: [hermes-agent]
---
# SOC Memory POC
Use this skill for SOC case workflows only. It is the default procedure for phishing-style alerts, suspicious O365 / Entra ID login cases, historical case comparison, Obsidian knowledge lookup, and case-note generation.
## Mandatory Trigger Rule
Load this skill immediately when the user asks Hermes to do any of the following:
- investigate or triage a SOC alert
- find similar phishing or O365 suspicious-login cases
- retrieve related KB or playbook context before concluding a case
- check whether Obsidian already has a related case note or knowledge note
- generate an Obsidian case note from a normalized case
- commit a normalized case or knowledge artifact into the SOC memory system
If the task is clearly SOC triage related, do not proceed without using this skill.
## What This Skill Connects To
This skill assumes:
- SOC Memory POC root: `/home/tom/soc_memory_poc`
- Memory Gateway URL: `http://127.0.0.1:1934`
- Gateway API key: empty by default unless configured otherwise
- Obsidian vault root: `/home/tom/soc_memory_poc/obsidian-vault`
Override with environment variables when needed:
- `SOC_MEMORY_POC_ROOT`
- `SOC_MEMORY_GATEWAY_URL`
- `SOC_MEMORY_GATEWAY_API_KEY`
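For reference, the defaults and overrides above can be resolved with a pattern like this (a minimal sketch; the actual skill scripts may resolve them differently):

```python
import os

# Defaults documented in this skill; each can be overridden via environment variables.
POC_ROOT = os.environ.get("SOC_MEMORY_POC_ROOT", "/home/tom/soc_memory_poc")
GATEWAY_URL = os.environ.get("SOC_MEMORY_GATEWAY_URL", "http://127.0.0.1:1934")
GATEWAY_API_KEY = os.environ.get("SOC_MEMORY_GATEWAY_API_KEY", "")  # empty unless configured
VAULT_ROOT = os.path.join(POC_ROOT, "obsidian-vault")
```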
Capabilities:
- search SOC case / knowledge context through the Memory Gateway
- search existing Obsidian notes by case ID, scenario, keywords, or tags
- commit normalized case / knowledge JSON through the Memory Gateway
- generate Obsidian case notes from normalized case JSON
## Triage Workflow
Follow this order unless the user explicitly asks for something narrower.
### Preferred Path For Structured Alerts (Scheme A)
If the user provides a structured alert summary with fields like user, host, sender, subject, attachment, URL, IP, alert type, or known facts, do **not** manually improvise the final answer from memory search results alone.
Use the deterministic triage helper first:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_alert.py \
--scenario phishing \
--alert-type mail_suspicious_attachment \
--user alice@corp.example \
--host FIN-LAPTOP-12 \
--sender billing@vendor-payments.com \
--subject "Invoice overdue notice" \
--attachment invoice_review.html \
--url https://vendor-payments-login.com/review \
--ip 198.51.100.20 \
--summary "Invoice-themed phishing email with HTML attachment and credential harvesting link" \
--fact "DMARC failed" \
--fact "User may have clicked the link"
```
This script performs:
- case retrieval from the SOC Memory Gateway
- knowledge retrieval from the SOC Memory Gateway
- Obsidian note lookup from the local vault
- final markdown rendering with all required sections populated
For Scheme A, prefer returning the script output with only light cleanup. Do not drop the `关联 Memory Retrieval` or `关联 Obsidian 文档` sections.
### Preferred Path For Freeform Alerts Or Raw Email Content
If the user does **not** provide neatly separated fields, or pastes raw email content / ticket text / freeform alert text, do not force them into Scheme A manually.
Use the unified triage helper:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --text "From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12."
```
Or point it at a file:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --file /path/to/raw_email.txt
```
This helper will:
- infer the most likely scenario and alert type
- extract sender, user, subject, attachment, URL, IP, and host when possible
- carry over important facts like DMARC failure, user click, MFA fatigue, inbox rule, or OAuth consent
- run the deterministic triage pipeline so the final answer still contains `关联 Memory Retrieval` and `关联 Obsidian 文档`
For non-structured input, prefer this helper over freehand reasoning.
For all SOC triage inputs, `triage_email.py` is the preferred single entrypoint. It accepts raw text, a file, or optional structured overrides, then calls the deterministic retrieval pipeline.
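To make the extraction step concrete, a heavily simplified version of what `triage_email.py` does with raw text might look like this. The function name and regexes are illustrative only; the real helper performs richer scenario and alert-type inference.

```python
import re

def extract_observables(text: str) -> dict:
    """Very simplified sketch of field extraction from raw email / alert text."""
    sender = re.search(r"From:\s*(\S+@\S+)", text)
    urls = re.findall(r"https?://[^\s\"']+", text)
    ips = re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", text)
    facts = []
    if re.search(r"DMARC fail", text, re.I):
        facts.append("DMARC failed")
    if re.search(r"click", text, re.I):
        facts.append("User clicked the link")
    return {
        "sender": sender.group(1) if sender else None,
        "urls": urls,
        "ips": ips,
        "facts": facts,
    }
```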
### Phase 1: Ground The Case
First identify:
- scenario: `phishing`, `o365_suspicious_login`, or another SOC scenario
- likely alert type
- short case summary in one sentence
- key observables if available: sender, URL, domain, IP, mailbox, user, hash
Do not start by writing memory. Start by grounding the case.
### Phase 2: Retrieve Memory Context Before Judging
Before concluding the case, search both related history and related knowledge.
1. Search similar historical cases.
2. Search KB / playbook context.
3. Compare the current case against what comes back.
Run these separately for better precision.
Case search example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
--query "invoice phishing html attachment credential harvesting" \
--kind case --limit 5
```
Knowledge search example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
--query "invoice phishing html attachment credential harvesting" \
--kind knowledge --limit 5
```
O365 example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
--query "impossible travel MFA fatigue inbox rule oauth consent" \
--kind knowledge --limit 5
```
Search scopes:
- `case` -> `viking://resources/soc-memory-poc/case`
- `knowledge` -> `viking://resources/soc-memory-poc/knowledge`
- `all` -> `viking://resources/soc-memory-poc`
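Under the hood, `search_context.py` maps the scope to a URI prefix and over-fetches before filtering client-side. A sketch of the payload it sends to the gateway's `/api/search` endpoint:

```python
URI_PREFIXES = {
    "case": "viking://resources/soc-memory-poc/case",
    "knowledge": "viking://resources/soc-memory-poc/knowledge",
    "all": "viking://resources/soc-memory-poc",
}

def build_search_payload(query: str, kind: str, limit: int) -> dict:
    """Mirror the gateway search request built by search_context.py."""
    return {
        "query": query,
        # Over-fetch so post-filtering by URI prefix can still fill the limit.
        "limit": max(limit * 5, 10),
        "uri": URI_PREFIXES[kind],
    }
```

Results outside the chosen prefix are dropped and deduplicated by canonical URI before the top `limit` entries are printed.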
### Phase 3: Retrieve Obsidian References
After memory retrieval, look for related notes in the Obsidian vault so the final answer can reference existing human-readable documentation.
Example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_obsidian_docs.py \
--query "invoice phishing html attachment credential harvesting" \
--scenario phishing \
--limit 5
```
Use this to surface:
- existing case notes
- related scenario notes
- notes whose names, tags, or content closely match the current case
When reporting Obsidian references, include at least:
- note title or file name
- relative path under `obsidian-vault/`
- why the note is relevant
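A reference bullet meeting these requirements can be assembled from the fields the Obsidian search script returns (`file_name`, `relative_path`, `summary`). A sketch:

```python
def format_obsidian_reference(doc: dict) -> str:
    """Build one reference bullet from a matched Obsidian doc record."""
    # Fall back to a generic reason when the note has no extracted summary.
    reason = doc.get("summary") or "related to the current scenario"
    return f"- `{doc['file_name']}` — `obsidian-vault/{doc['relative_path']}` — {reason}"
```

The separator style follows the final output template used later in this skill.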
### Phase 4: Produce The Triage Output
After retrieval, synthesize a result that includes:
- likely verdict or current assessment
- strongest evidence
- closest matching historical cases
- most relevant KB / playbook guidance
- related Obsidian notes
- recommended next investigation or response actions
Do not just paste raw search output. Summarize why the returned items matter.
## Final Output Template
Unless the user asks for a different format, use this structure for final SOC triage answers:
### 研判结果
- one short paragraph with the likely verdict / current assessment
### 关键证据
- 2 to 5 flat bullets with the strongest evidence
### 关联 Memory Retrieval
- one flat bullet per retrieved case / knowledge item
- include: ID + short relevance reason
- example: `CASE-2026-0001`: same invoice lure + HTML attachment + credential harvesting flow
### 关联 Obsidian 文档
- one flat bullet per note
- include: note name + relative path + one-line relevance reason
- example: `CASE-2026-0001 - Finance user ...md``02_Cases/phishing/...` — already documents a near-identical phishing pattern
### 建议动作
- 2 to 5 flat bullets with next investigation or response steps
If no Obsidian note matches, explicitly say `未找到直接关联的 Obsidian 文档`.
### Phase 5: Generate Case Note When The Case Is Mature Enough
If the task includes documenting the result, or the case already has a normalized JSON artifact, generate an Obsidian case note.
Example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/generate_case_note.py \
--input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json \
--enrich-from-openviking \
--top-k 3
```
This writes under `obsidian-vault/02_Cases/<scenario>/`.
Use `--enrich-from-openviking` by default when the gateway is available.
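The skill-side wrapper shells out to the POC's summarize script and prepends the POC root to `PYTHONPATH`. A sketch of the argv list it assembles (paths are the wrapper's defaults):

```python
import sys

def build_note_command(poc_root: str, input_path: str, top_k: int, enrich: bool) -> list[str]:
    """Mirror the command list generate_case_note.py assembles."""
    script = f"{poc_root}/skills/summarize_case_skill/generate_case_note.py"
    output_dir = f"{poc_root}/obsidian-vault/02_Cases"
    cmd = [sys.executable, script,
           "--input", input_path,
           "--output-dir", output_dir,
           "--top-k", str(top_k)]
    if enrich:
        cmd.append("--enrich-from-openviking")
    return cmd
```

This is a sketch of the delegation only; the real wrapper also validates that the target script exists before running it.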
### Phase 6: Commit Only High-Value Artifacts
If Hermes has a normalized case or knowledge JSON that is worth preserving, commit it through the Gateway.
Example:
```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/commit_case_memory.py \
--input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json
```
Only commit normalized, reusable artifacts. Do not commit raw logs, raw tool traces, or ad hoc chat text.
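For reference, the commit helper derives the storage URI from two fields of the normalized artifact (`memory_type` plus `scenario` or `doc_type`). A sketch, using a hypothetical case artifact:

```python
def build_resource_uri(item: dict) -> str:
    """Derive the gateway resource URI the commit script uses."""
    if item["memory_type"] == "case":
        return f"viking://resources/soc-memory-poc/case/{item.get('scenario', 'general')}/{item['id']}.json"
    if item["memory_type"] == "knowledge":
        return f"viking://resources/soc-memory-poc/knowledge/{item.get('doc_type', 'general')}/{item['id']}.json"
    raise ValueError(f"Unsupported memory_type: {item['memory_type']}")

# Hypothetical normalized case artifact (values are illustrative only).
example = {"memory_type": "case", "scenario": "phishing", "id": "CASE-2026-0001"}
```

An artifact missing `memory_type`, `id`, or the scenario/doc-type field will not land in the right place, which is another reason to commit only normalized JSON.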
## Recommended Defaults By Scenario
### Phishing
Default order:
1. search `case`
2. search `knowledge`
3. search related Obsidian notes
4. assess sender auth, lure type, landing page, user interaction
5. generate case note if the case is already structured
6. commit only if the case artifact is normalized and high value
Good query ingredients:
- lure theme
- attachment type
- credential harvesting
- fake M365 login
- sender domain
- landing URL pattern
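These ingredients are meant to be joined into one compact query string, the same way the triage pipeline builds its query from whatever non-empty fields it has. A sketch:

```python
def build_query(parts: list[str]) -> str:
    """Join non-empty query ingredients into a single search string."""
    return " ".join(part.strip() for part in parts if part and part.strip())
```

Empty or whitespace-only ingredients are dropped, so missing observables simply narrow the query instead of polluting it.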
### O365 Suspicious Login
Default order:
1. search `case`
2. search `knowledge`
3. search related Obsidian notes
4. assess impossible travel, MFA fatigue, inbox rule abuse, OAuth consent, legacy auth
5. generate case note if the case is already structured
6. commit only if the case artifact is normalized and high value
Good query ingredients:
- impossible travel
- MFA fatigue
- inbox rule
- foreign login
- OAuth consent
- legacy protocol
## Failure Handling
If Gateway search fails:
- say explicitly that the SOC Memory Gateway is unavailable
- do not pretend retrieval succeeded
- continue with local reasoning only if the user still wants that
If Obsidian search fails:
- say explicitly that Obsidian references could not be retrieved
- do not invent note names or paths
If note generation fails:
- report the failing path or command
- do not claim the note was written
If commit fails:
- report the URI or file that failed
- do not claim the memory was stored
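The rules above amount to: catch the transport error, report it explicitly, and never fake results. A sketch of that pattern (`failing_search` is a hypothetical stand-in for the gateway call):

```python
import urllib.error

def safe_search(search_fn, query: str) -> dict:
    """Run a gateway search, reporting failure explicitly instead of faking results."""
    try:
        return {"ok": True, "results": search_fn(query)}
    except urllib.error.URLError as exc:
        # Surface the failure; never pretend retrieval succeeded.
        return {"ok": False, "error": f"SOC Memory Gateway unavailable: {exc}"}

def failing_search(query: str):
    raise urllib.error.URLError("connection refused")

result = safe_search(failing_search, "invoice phishing")
```

The triage pipeline applies the same pattern per retrieval source, so an Obsidian failure is reported independently of a gateway failure.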
## Guardrails
- Search `case` and `knowledge` separately before concluding a triage result.
- Search Obsidian notes after memory retrieval so final output can point to human-readable references.
- Prefer narrow, scenario-specific queries over vague long prompts.
- Do not dump raw investigative process into memory.
- Generate case notes from normalized case JSON, not from freeform chat.
- Commit only high-value, reusable artifacts.
- When Gateway results look noisy, explain that retrieval quality may still need SOC-specific reranking.


@ -0,0 +1,66 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
DEFAULT_GATEWAY_URL = os.environ.get("SOC_MEMORY_GATEWAY_URL", "http://127.0.0.1:1934")
DEFAULT_GATEWAY_API_KEY = os.environ.get("SOC_MEMORY_GATEWAY_API_KEY", "")
def load_item(path: str | Path) -> dict[str, Any]:
with Path(path).open("r", encoding="utf-8") as f:
return json.load(f)
def build_resource_uri(item: dict[str, Any]) -> str:
memory_type = item.get("memory_type")
item_id = item["id"]
if memory_type == "case":
scenario = item.get("scenario", "general")
return f"viking://resources/soc-memory-poc/case/{scenario}/{item_id}.json"
if memory_type == "knowledge":
doc_type = item.get("doc_type", "general")
return f"viking://resources/soc-memory-poc/knowledge/{doc_type}/{item_id}.json"
raise SystemExit(f"Unsupported memory_type: {memory_type}")
def post_json(url: str, payload: dict[str, Any], api_key: str = "") -> dict[str, Any]:
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(url, data=data, method="POST")
req.add_header("Content-Type", "application/json")
if api_key:
req.add_header("X-API-Key", api_key)
with urllib.request.urlopen(req, timeout=60) as resp:
return json.loads(resp.read().decode("utf-8"))
def main() -> None:
parser = argparse.ArgumentParser(description="Commit a normalized SOC case / knowledge JSON through the Memory Gateway.")
parser.add_argument("--input", required=True, help="Normalized JSON file path")
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL, help="Memory Gateway base URL")
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY, help="Gateway API key if required")
args = parser.parse_args()
item = load_item(args.input)
payload = {
"uri": build_resource_uri(item),
"content": json.dumps(item, ensure_ascii=False, indent=2),
"resource_type": "json",
}
try:
result = post_json(args.gateway_url.rstrip("/") + "/api/resource", payload, api_key=args.api_key)
except urllib.error.URLError as exc:
raise SystemExit(f"Gateway resource commit failed: {exc}") from exc
print(json.dumps(result, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()


@ -0,0 +1,48 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import os
import subprocess
import sys
from pathlib import Path
DEFAULT_POC_ROOT = os.environ.get("SOC_MEMORY_POC_ROOT", "/home/tom/soc_memory_poc")
def main() -> None:
parser = argparse.ArgumentParser(description="Generate an Obsidian case note from a normalized SOC case JSON file.")
parser.add_argument("--input", required=True, help="Normalized case JSON path")
parser.add_argument("--output-dir", default=None, help="Override Obsidian output directory")
parser.add_argument("--enrich-from-openviking", action="store_true", help="Enrich with OpenViking recommendations")
parser.add_argument("--top-k", type=int, default=3, help="Recommendation count per type")
parser.add_argument("--poc-root", default=DEFAULT_POC_ROOT, help="SOC Memory POC root")
args = parser.parse_args()
poc_root = Path(args.poc_root)
script_path = poc_root / "skills" / "summarize_case_skill" / "generate_case_note.py"
if not script_path.exists():
raise SystemExit(f"SOC Memory POC summarize script not found: {script_path}")
output_dir = args.output_dir or str(poc_root / "obsidian-vault" / "02_Cases")
cmd = [
sys.executable,
str(script_path),
"--input",
args.input,
"--output-dir",
output_dir,
"--top-k",
str(args.top_k),
]
if args.enrich_from_openviking:
cmd.append("--enrich-from-openviking")
env = os.environ.copy()
existing = env.get("PYTHONPATH", "")
env["PYTHONPATH"] = str(poc_root) + (os.pathsep + existing if existing else "")
subprocess.run(cmd, check=True, env=env)
if __name__ == "__main__":
main()


@ -0,0 +1,85 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import urllib.error
import urllib.request
from typing import Any
DEFAULT_GATEWAY_URL = os.environ.get("SOC_MEMORY_GATEWAY_URL", "http://127.0.0.1:1934")
DEFAULT_GATEWAY_API_KEY = os.environ.get("SOC_MEMORY_GATEWAY_API_KEY", "")
URI_PREFIXES = {
"case": "viking://resources/soc-memory-poc/case",
"knowledge": "viking://resources/soc-memory-poc/knowledge",
"all": "viking://resources/soc-memory-poc",
}
def post_json(url: str, payload: dict[str, Any], api_key: str = "") -> dict[str, Any]:
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(url, data=data, method="POST")
req.add_header("Content-Type", "application/json")
if api_key:
req.add_header("X-API-Key", api_key)
with urllib.request.urlopen(req, timeout=30) as resp:
return json.loads(resp.read().decode("utf-8"))
def canonicalize_uri(uri: str) -> str:
if ".json/" in uri:
return uri.split(".json/", 1)[0] + ".json"
return uri
def filter_results(results: list[dict[str, Any]], prefix: str) -> list[dict[str, Any]]:
deduped: dict[str, dict[str, Any]] = {}
for item in results:
uri = item.get("uri") or ""
canonical = canonicalize_uri(uri)
if not canonical.startswith(prefix):
continue
score = item.get("score") or 0
payload = dict(item)
payload["uri"] = canonical
if canonical not in deduped or score > (deduped[canonical].get("score") or 0):
deduped[canonical] = payload
return sorted(deduped.values(), key=lambda entry: entry.get("score") or 0, reverse=True)
def main() -> None:
parser = argparse.ArgumentParser(description="Search SOC Memory Gateway for case / knowledge context.")
parser.add_argument("--query", required=True, help="Search query")
parser.add_argument("--kind", choices=["case", "knowledge", "all"], default="all", help="SOC resource scope")
parser.add_argument("--limit", type=int, default=5, help="Max results")
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL, help="Memory Gateway base URL")
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY, help="Gateway API key if required")
args = parser.parse_args()
prefix = URI_PREFIXES[args.kind]
payload = {
"query": args.query,
"limit": max(args.limit * 5, 10),
"uri": prefix,
}
try:
result = post_json(args.gateway_url.rstrip("/") + "/api/search", payload, api_key=args.api_key)
except urllib.error.URLError as exc:
raise SystemExit(f"Gateway search failed: {exc}") from exc
raw_results = result.get("results", [])
filtered = filter_results(raw_results, prefix)
output = {
"query": args.query,
"kind": args.kind,
"uri_prefix": prefix,
"results": filtered[: args.limit],
"total": len(filtered),
}
print(json.dumps(output, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()


@ -0,0 +1,205 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import re
from pathlib import Path
from typing import Any
DEFAULT_POC_ROOT = os.environ.get("SOC_MEMORY_POC_ROOT", "/home/tom/soc_memory_poc")
DEFAULT_VAULT_ROOT = str(Path(DEFAULT_POC_ROOT) / "obsidian-vault")
TOKEN_RE = re.compile(r"[A-Za-z0-9_./:-]+")
SKIP_DIRS = {"05_Templates"}
SKIP_FILES = {"README.md"}
def tokenize(text: str) -> list[str]:
lowered = (text or "").lower()
tokens = TOKEN_RE.findall(lowered)
return [token for token in tokens if len(token) >= 3]
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
if not text.startswith("---\n"):
return {}, text
parts = text.split("\n---\n", 1)
if len(parts) != 2:
return {}, text
raw_frontmatter = parts[0].splitlines()[1:]
body = parts[1]
data: dict[str, str] = {}
for line in raw_frontmatter:
if ":" not in line:
continue
key, value = line.split(":", 1)
data[key.strip()] = value.strip()
return data, body
def extract_title(body: str, fallback: str) -> str:
for line in body.splitlines():
if line.startswith("# "):
return line[2:].strip()
return fallback
def extract_section_text(body: str, heading: str) -> str:
lines = body.splitlines()
marker = f"## {heading}"
collecting = False
collected: list[str] = []
for line in lines:
if line.strip() == marker:
collecting = True
continue
if collecting and line.startswith("## "):
break
if collecting:
stripped = line.strip()
if stripped:
collected.append(stripped)
return " ".join(collected[:4]).strip()
def extract_tags(body: str) -> list[str]:
tags: list[str] = []
in_tag_section = False
for line in body.splitlines():
if line.strip() == "## 标签":
in_tag_section = True
continue
if in_tag_section and line.startswith("## "):
break
if in_tag_section:
for token in re.findall(r"#[^\s,]+", line):
tags.append(token)
return tags
def score_doc(query: str, tokens: list[str], doc: dict[str, Any]) -> tuple[int, list[str]]:
score = 0
matched: list[str] = []
path_text = f"{doc['relative_path']} {doc['file_name']}".lower()
title_text = doc["title"].lower()
summary_text = doc.get("summary", "").lower()
body_text = doc.get("body", "").lower()
frontmatter_text = " ".join(f"{k}:{v}" for k, v in doc.get("frontmatter", {}).items()).lower()
tags_text = " ".join(doc.get("tags", [])).lower()
if query and query.lower() in body_text:
score += 8
matched.append(query.lower())
case_id = doc.get("frontmatter", {}).get("case_id", "")
if case_id and case_id.lower() in query.lower():
score += 80
matched.append(case_id.lower())
scenario = doc.get("frontmatter", {}).get("scenario", "")
if scenario and scenario.lower() in query.lower():
score += 20
matched.append(scenario.lower())
for token in tokens:
token_hit = False
if token in title_text:
score += 12
token_hit = True
elif token in summary_text:
score += 7
token_hit = True
elif token in path_text:
score += 6
token_hit = True
elif token in frontmatter_text:
score += 5
token_hit = True
elif token in tags_text:
score += 4
token_hit = True
elif token in body_text:
score += 1
token_hit = True
if token_hit and token not in matched:
matched.append(token)
return score, matched[:8]
def load_docs(vault_root: str | Path) -> list[dict[str, Any]]:
vault_root = Path(vault_root)
docs: list[dict[str, Any]] = []
for path in sorted(vault_root.rglob("*.md")):
rel = path.relative_to(vault_root)
if any(part in SKIP_DIRS for part in rel.parts):
continue
if path.name in SKIP_FILES:
continue
text = path.read_text(encoding="utf-8")
frontmatter, body = parse_frontmatter(text)
docs.append(
{
"file_name": path.name,
"relative_path": str(rel),
"absolute_path": str(path),
"category": rel.parts[0] if rel.parts else "",
"directory": str(rel.parent),
"frontmatter": frontmatter,
"title": extract_title(body, path.stem),
"summary": extract_section_text(body, "告警摘要") or extract_section_text(body, "Summary"),
"tags": extract_tags(body),
"body": body,
}
)
return docs
def main() -> None:
parser = argparse.ArgumentParser(description="Search Obsidian SOC notes and return matching document references.")
parser.add_argument("--query", required=True, help="Search query")
parser.add_argument("--vault-root", default=DEFAULT_VAULT_ROOT, help="Obsidian vault root")
parser.add_argument("--limit", type=int, default=5, help="Maximum results")
parser.add_argument("--scenario", default="", help="Optional scenario filter")
args = parser.parse_args()
docs = load_docs(args.vault_root)
tokens = tokenize(args.query)
results: list[dict[str, Any]] = []
for doc in docs:
scenario = doc.get("frontmatter", {}).get("scenario", "")
if args.scenario and scenario != args.scenario:
continue
score, matched_terms = score_doc(args.query, tokens, doc)
if score <= 0:
continue
results.append(
{
"score": score,
"title": doc["title"],
"file_name": doc["file_name"],
"relative_path": doc["relative_path"],
"directory": doc["directory"],
"category": doc["category"],
"scenario": scenario,
"summary": doc.get("summary", ""),
"tags": doc.get("tags", []),
"matched_terms": matched_terms,
}
)
results.sort(key=lambda item: item["score"], reverse=True)
payload = {
"query": args.query,
"vault_root": str(Path(args.vault_root)),
"matched_docs": results[: args.limit],
}
print(json.dumps(payload, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()


@ -0,0 +1,282 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import json
import os
import urllib.error
import urllib.request
from pathlib import Path
from typing import Any
DEFAULT_GATEWAY_URL = os.environ.get("SOC_MEMORY_GATEWAY_URL", "http://127.0.0.1:1934")
DEFAULT_GATEWAY_API_KEY = os.environ.get("SOC_MEMORY_GATEWAY_API_KEY", "")
DEFAULT_POC_ROOT = os.environ.get("SOC_MEMORY_POC_ROOT", "/home/tom/soc_memory_poc")
DEFAULT_VAULT_ROOT = str(Path(DEFAULT_POC_ROOT) / "obsidian-vault")
CASE_URI = "viking://resources/soc-memory-poc/case"
KNOWLEDGE_URI = "viking://resources/soc-memory-poc/knowledge"
def post_json(url: str, payload: dict[str, Any], api_key: str = "") -> dict[str, Any]:
data = json.dumps(payload).encode("utf-8")
req = urllib.request.Request(url, data=data, method="POST")
req.add_header("Content-Type", "application/json")
if api_key:
req.add_header("X-API-Key", api_key)
with urllib.request.urlopen(req, timeout=30) as resp:
return json.loads(resp.read().decode("utf-8"))
def canonicalize_uri(uri: str) -> str:
if ".json/" in uri:
return uri.split(".json/", 1)[0] + ".json"
return uri
def filter_results(results: list[dict[str, Any]], prefix: str) -> list[dict[str, Any]]:
deduped: dict[str, dict[str, Any]] = {}
for item in results:
uri = item.get("uri") or ""
canonical = canonicalize_uri(uri)
if not canonical.startswith(prefix):
continue
score = item.get("score") or 0
payload = dict(item)
payload["uri"] = canonical
if canonical not in deduped or score > (deduped[canonical].get("score") or 0):
deduped[canonical] = payload
return sorted(deduped.values(), key=lambda entry: entry.get("score") or 0, reverse=True)
def gateway_search(query: str, uri: str, limit: int, gateway_url: str, api_key: str) -> list[dict[str, Any]]:
payload = {"query": query, "limit": max(limit * 5, 10), "uri": uri}
raw = post_json(gateway_url.rstrip("/") + "/api/search", payload, api_key=api_key)
return filter_results(raw.get("results", []), uri)[:limit]
def obsidian_search(query: str, scenario: str, limit: int, vault_root: str) -> dict[str, Any]:
from search_obsidian_docs import load_docs, score_doc, tokenize
docs = load_docs(vault_root)
tokens = tokenize(query)
results: list[dict[str, Any]] = []
for doc in docs:
doc_scenario = doc.get("frontmatter", {}).get("scenario", "")
if scenario and doc_scenario != scenario:
continue
score, matched_terms = score_doc(query, tokens, doc)
if score <= 0:
continue
results.append(
{
"score": score,
"title": doc["title"],
"file_name": doc["file_name"],
"relative_path": doc["relative_path"],
"directory": doc["directory"],
"absolute_path": str(Path(vault_root) / doc["relative_path"]),
"summary": doc.get("summary", ""),
"matched_terms": matched_terms,
}
)
results.sort(key=lambda item: item["score"], reverse=True)
return {"matched_docs": results[:limit]}
def build_query(args: argparse.Namespace) -> str:
parts = [
args.scenario,
args.alert_type,
args.user,
args.host,
args.sender,
args.subject,
args.attachment,
args.url,
args.ip,
args.summary,
]
parts.extend(args.fact)
return " ".join(part.strip() for part in parts if part and part.strip())
def bullet(lines: list[str], fallback: str) -> str:
if not lines:
return f"- {fallback}"
return "\n".join(f"- {line}" for line in lines)
def top_results(items: list[dict[str, Any]], limit: int = 3) -> list[dict[str, Any]]:
return items[:limit]
def has_fact(args: argparse.Namespace, needle: str) -> bool:
haystacks = [args.summary, args.subject, args.alert_type, *args.fact]
lowered = needle.lower()
return any(lowered in (item or "").lower() for item in haystacks)
def summarize_evidence(args: argparse.Namespace) -> list[str]:
evidence: list[str] = []
if args.subject:
evidence.append(f"邮件主题/诱饵:{args.subject}")
if args.attachment:
evidence.append(f"恶意附件:{args.attachment}")
if args.url:
evidence.append(f"可疑链接:{args.url}")
if args.sender:
evidence.append(f"发件人:{args.sender}")
if args.ip:
evidence.append(f"相关 IP{args.ip}")
for fact in args.fact[:4]:
evidence.append(fact)
return evidence[:6]
def uri_to_id(uri: str) -> str:
return uri.rsplit('/', 1)[-1].replace('.json', '')
def infer_assessment(args: argparse.Namespace, case_results: list[dict[str, Any]]) -> str:
top_case = case_results[0] if case_results else None
if args.scenario == "phishing":
if args.url and args.attachment and (has_fact(args, "dmarc failed") or has_fact(args, "clicked")):
base = "当前告警高度符合凭证收割型钓鱼攻击特征,属于高可信 True Positive且存在凭证泄露风险。"
elif args.url or args.attachment:
base = "当前告警具备明显钓鱼迹象,尤其是附件与落地页组合,倾向于高风险钓鱼事件。"
else:
base = "当前告警呈现出邮件钓鱼模式,但仍需补充落地页、附件和用户交互证据进一步确认。"
elif args.scenario == "o365_suspicious_login":
if has_fact(args, "impossible travel") and (has_fact(args, "mfa fatigue") or has_fact(args, "inbox rule") or has_fact(args, "oauth")):
base = "当前告警高度符合 O365 账号接管链路,属于高可信身份威胁事件。"
else:
base = "当前告警表现为异常身份登录需要结合登录轨迹、MFA 和邮箱规则进一步确认是否账号接管。"
else:
base = "当前告警具备明显的可疑特征,需要结合历史案例和关联知识继续判断。"
if top_case:
return base + f" 最相近的历史案例为 `{uri_to_id(top_case.get('uri', ''))}`,说明当前 case 与既有攻击模式存在明显重合。"
return base
def format_memory_results(case_results: list[dict[str, Any]], knowledge_results: list[dict[str, Any]]) -> str:
lines: list[str] = []
for item in top_results(case_results, 2):
uri = item.get("uri", "")
abstract = (item.get("abstract") or "").strip()
snippet = abstract[:140] + "..." if len(abstract) > 140 else abstract
        lines.append(f"`{uri_to_id(uri)}`({uri})— {snippet}")
for item in top_results(knowledge_results, 2):
uri = item.get("uri", "")
abstract = (item.get("abstract") or "").strip()
snippet = abstract[:140] + "..." if len(abstract) > 140 else abstract
        lines.append(f"`{uri_to_id(uri)}`({uri})— {snippet}")
return bullet(lines, "未检索到直接关联的 Memory 条目")
def format_obsidian_results(obsidian_docs: list[dict[str, Any]]) -> str:
lines = []
for doc in top_results(obsidian_docs, 3):
reason = doc.get("summary") or ", ".join(doc.get("matched_terms", [])) or "与当前场景相关"
lines.append(
f"`{doc['file_name']}` — `obsidian-vault/{doc['relative_path']}` "
f"absolute: `{doc['absolute_path']}`)— {reason}"
)
return bullet(lines, "未找到直接关联的 Obsidian 文档")
def recommend_actions(args: argparse.Namespace, case_results: list[dict[str, Any]]) -> list[str]:
actions: list[str] = []
if args.scenario == "phishing":
actions.extend([
"检查用户是否已点击链接或提交凭据,必要时立即重置账号并撤销会话。",
"搜索同主题、同发件人、同 URL 或同附件的邮件是否已投递给其他用户。",
"封锁相关域名、URL 和可疑 IP并保留附件样本用于沙箱分析。",
"如邮件面向财务或高价值角色,优先排查是否存在 BEC 或后续横向利用。",
])
elif args.scenario == "o365_suspicious_login":
actions.extend([
"复核登录日志、MFA 记录和后续邮箱规则 / OAuth 变更。",
"若确认账号接管迹象,立即重置凭据并撤销所有活跃会话。",
"检查同源 IP、同设备指纹和同时间窗口内的其他用户活动。",
"对邮箱转发、隐藏规则、恶意 OAuth 授权进行专项排查。",
])
else:
actions.append("基于当前高风险迹象继续扩充调查和处置。")
if case_results:
actions.append("对照最相近历史案例,复用已有 IOC 和调查路径。")
return actions[:5]
def main() -> None:
parser = argparse.ArgumentParser(description="Run a structured SOC triage using memory retrieval and Obsidian lookup.")
parser.add_argument("--scenario", required=True, help="Scenario, e.g. phishing or o365_suspicious_login")
parser.add_argument("--alert-type", default="", help="Alert type")
parser.add_argument("--user", default="", help="Target user")
parser.add_argument("--host", default="", help="Target host")
parser.add_argument("--sender", default="", help="Sender email")
parser.add_argument("--subject", default="", help="Email subject or short title")
parser.add_argument("--attachment", default="", help="Attachment name")
parser.add_argument("--url", default="", help="Suspicious URL")
parser.add_argument("--ip", default="", help="Relevant IP")
parser.add_argument("--summary", default="", help="One-sentence alert summary")
parser.add_argument("--fact", action="append", default=[], help="Additional known fact; repeatable")
parser.add_argument("--gateway-url", default=DEFAULT_GATEWAY_URL, help="Memory Gateway URL")
parser.add_argument("--api-key", default=DEFAULT_GATEWAY_API_KEY, help="Memory Gateway API key")
parser.add_argument("--vault-root", default=DEFAULT_VAULT_ROOT, help="Obsidian vault root")
parser.add_argument("--limit", type=int, default=5, help="Search limit")
args = parser.parse_args()
query = build_query(args)
case_results: list[dict[str, Any]] = []
knowledge_results: list[dict[str, Any]] = []
obsidian_docs: list[dict[str, Any]] = []
memory_error = ""
obsidian_error = ""
try:
case_results = gateway_search(query, CASE_URI, args.limit, args.gateway_url, args.api_key)
knowledge_results = gateway_search(query, KNOWLEDGE_URI, args.limit, args.gateway_url, args.api_key)
except urllib.error.URLError as exc:
memory_error = f"Memory Gateway 不可用:{exc}"
try:
obsidian_resp = obsidian_search(query, args.scenario, args.limit, args.vault_root)
obsidian_docs = obsidian_resp.get("matched_docs", [])
except Exception as exc: # noqa: BLE001
obsidian_error = f"Obsidian 检索失败:{exc}"
lines = [
"## 研判结果",
infer_assessment(args, case_results),
"",
"## 关键证据",
bullet(summarize_evidence(args), "当前输入只提供了有限证据,需要继续补充调查信息"),
"",
"## 关联 Memory Retrieval",
]
if memory_error:
lines.append(f"- {memory_error}")
else:
lines.append(format_memory_results(case_results, knowledge_results))
lines.extend([
"",
"## 关联 Obsidian 文档",
])
if obsidian_error:
lines.append(f"- {obsidian_error}")
else:
lines.append(format_obsidian_results(obsidian_docs))
lines.extend([
"",
"## 建议动作",
bullet(recommend_actions(args, case_results), "继续补充告警细节后再执行更精确的响应动作"),
])
print("\n".join(lines))
if __name__ == "__main__":
main()


@ -0,0 +1,201 @@
#!/usr/bin/env python3
from __future__ import annotations
import argparse
import os
import re
import subprocess
import sys
from pathlib import Path
SCRIPT_DIR = Path(__file__).resolve().parent
TRIAGE_ALERT = SCRIPT_DIR / "triage_alert.py"
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"]+")
IP_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
HOST_RE = re.compile(r"\b[A-Z]{2,}(?:-[A-Z0-9]+)+\b")
ATTACHMENT_RE = re.compile(r"\b[\w.-]+\.(?:html|htm|pdf|zip|docx|xlsx|eml)\b", re.IGNORECASE)
HEADER_RE = re.compile(
r"^(From|To|Subject|Attachment|URL|IP|Host|User|Alert type|Scenario)\s*:\s*(.+)$",
re.IGNORECASE | re.MULTILINE,
)
def first_nonempty(*values: str) -> str:
for value in values:
if value and value.strip():
return value.strip()
return ""
def load_text(args: argparse.Namespace) -> str:
if args.file:
return Path(args.file).read_text(encoding="utf-8")
if args.text:
return args.text
data = sys.stdin.read()
if data.strip():
return data
return ""
def find_header(text: str, name: str) -> str:
for key, value in HEADER_RE.findall(text):
if key.lower() == name.lower():
return value.strip()
return ""
def unique_matches(pattern: re.Pattern[str], text: str) -> list[str]:
seen: list[str] = []
for match in pattern.findall(text):
if match not in seen:
seen.append(match)
return seen
def infer_scenario(text: str, explicit_scenario: str = "", explicit_alert_type: str = "") -> tuple[str, str]:
if explicit_scenario:
return explicit_scenario, explicit_alert_type
lowered = text.lower()
if any(token in lowered for token in ["impossible travel", "mfa fatigue", "oauth consent", "inbox rule", "entra", "azuread", "sign-in", "signin"]):
alert_type = explicit_alert_type or ("azuread_impossible_travel" if "impossible travel" in lowered else "o365_suspicious_login")
return "o365_suspicious_login", alert_type
if any(token in lowered for token in ["phishing", "invoice", "attachment", "credential harvest", "fake microsoft 365", "dmarc", "mail_suspicious", "wire transfer"]):
if explicit_alert_type:
return "phishing", explicit_alert_type
if "wire transfer" in lowered or "executive impersonation" in lowered or "bec" in lowered:
return "phishing", "mail_bec_impersonation"
if "link" in lowered and "attachment" not in lowered:
return "phishing", "mail_suspicious_link"
return "phishing", "mail_suspicious_attachment"
return "phishing", explicit_alert_type
def collect_facts(text: str, provided: list[str]) -> list[str]:
facts: list[str] = []
for fact in provided:
if fact and fact not in facts:
facts.append(fact)
lowered = text.lower()
fact_patterns = [
("DMARC failed", ["dmarc failed"]),
("SPF failed", ["spf failed"]),
("User may have clicked the link", ["clicked", "user clicked"]),
("Credential submission suspected", ["submitted credentials", "credential submission", "entered credentials"]),
("Impossible travel observed", ["impossible travel"]),
("MFA fatigue observed", ["mfa fatigue", "repeated mfa"]),
("Inbox rule creation observed", ["inbox rule"]),
("OAuth consent activity observed", ["oauth consent"]),
]
for label, needles in fact_patterns:
if any(needle in lowered for needle in needles) and label not in facts:
facts.append(label)
for line in text.splitlines():
stripped = line.strip("-* \t")
if not stripped or len(stripped) > 160:
continue
lower = stripped.lower()
if any(word in lower for word in ["dmarc", "spf", "clicked", "credential", "impossible travel", "mfa", "inbox rule", "oauth"]):
if stripped not in facts:
facts.append(stripped)
return facts[:8]
def build_summary(text: str, subject: str, provided_summary: str = "") -> str:
if provided_summary:
return provided_summary[:240]
if subject:
        return subject[:180]
    for line in text.splitlines():
        stripped = line.strip()
        if len(stripped) >= 20 and ":" not in stripped[:20]:
            return stripped[:240]
    return text.strip()[:240]


def parse_input(args: argparse.Namespace) -> dict[str, str | list[str]]:
    text = load_text(args)
    scenario, alert_type = infer_scenario(text, args.scenario, args.alert_type)
    emails = unique_matches(EMAIL_RE, text)
    urls = unique_matches(URL_RE, text)
    ips = unique_matches(IP_RE, text)
    hosts = unique_matches(HOST_RE, text)
    attachments = unique_matches(ATTACHMENT_RE, text)
    sender = first_nonempty(args.sender, find_header(text, "From"), emails[0] if emails else "")
    user = first_nonempty(args.user, find_header(text, "User"), find_header(text, "To"), emails[1] if len(emails) > 1 else "")
    subject = first_nonempty(args.subject, find_header(text, "Subject"))
    attachment = first_nonempty(args.attachment, find_header(text, "Attachment"), attachments[0] if attachments else "")
    url = first_nonempty(args.url, find_header(text, "URL"), urls[0] if urls else "")
    ip = first_nonempty(args.ip, find_header(text, "IP"), ips[0] if ips else "")
    host = first_nonempty(args.host, find_header(text, "Host"), hosts[0] if hosts else "")
    summary = build_summary(text, subject, args.summary)
    facts = collect_facts(text, args.fact)
    return {
        "scenario": scenario,
        "alert_type": alert_type,
        "user": user,
        "host": host,
        "sender": sender,
        "subject": subject,
        "attachment": attachment,
        "url": url,
        "ip": ip,
        "summary": summary,
        "facts": facts,
    }


def run_triage(parsed: dict[str, str | list[str]], limit: int) -> None:
    cmd = [
        sys.executable,
        str(TRIAGE_ALERT),
        "--scenario", str(parsed["scenario"]),
        "--alert-type", str(parsed["alert_type"]),
        "--user", str(parsed["user"]),
        "--host", str(parsed["host"]),
        "--sender", str(parsed["sender"]),
        "--subject", str(parsed["subject"]),
        "--attachment", str(parsed["attachment"]),
        "--url", str(parsed["url"]),
        "--ip", str(parsed["ip"]),
        "--summary", str(parsed["summary"]),
        "--limit", str(limit),
    ]
    for fact in parsed["facts"]:
        cmd.extend(["--fact", str(fact)])
    subprocess.run(cmd, check=True, env=os.environ.copy())


def main() -> None:
    parser = argparse.ArgumentParser(description="Unified SOC alert/email triage entrypoint with memory and Obsidian retrieval.")
    parser.add_argument("--text", help="Raw email, ticket text, or freeform alert text")
    parser.add_argument("--file", help="Path to a raw email/ticket/alert text file")
    parser.add_argument("--scenario", default="", help="Optional scenario override")
    parser.add_argument("--alert-type", default="", help="Optional alert type override")
    parser.add_argument("--user", default="", help="Optional user override")
    parser.add_argument("--host", default="", help="Optional host override")
    parser.add_argument("--sender", default="", help="Optional sender override")
    parser.add_argument("--subject", default="", help="Optional subject override")
    parser.add_argument("--attachment", default="", help="Optional attachment override")
    parser.add_argument("--url", default="", help="Optional URL override")
    parser.add_argument("--ip", default="", help="Optional IP override")
    parser.add_argument("--summary", default="", help="Optional summary override")
    parser.add_argument("--fact", action="append", default=[], help="Additional known fact; repeatable")
    parser.add_argument("--limit", type=int, default=5, help="Search limit")
    args = parser.parse_args()
    parsed = parse_input(args)
    run_triage(parsed, args.limit)


if __name__ == "__main__":
    main()
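The entrypoint above leans on small header-parsing helpers (`find_header`, `first_nonempty`) defined earlier in the script. A minimal standalone sketch of that idea, as a hypothetical re-implementation rather than the script's exact code:

```python
import re

def find_header(text: str, name: str) -> str:
    """Return the value of the first 'Name: value' header line, or '' (illustrative sketch)."""
    pattern = re.compile(rf"^{re.escape(name)}\s*:\s*(.+)$", re.IGNORECASE | re.MULTILINE)
    match = pattern.search(text)
    return match.group(1).strip() if match else ""

def first_nonempty(*values: str) -> str:
    """Return the first truthy value, or ''."""
    for value in values:
        if value:
            return value
    return ""

raw = "From: alice@corp.example\nSubject: Invoice overdue\n\nPlease see attachment."
sender = first_nonempty("", find_header(raw, "From"))
print(sender)  # alice@corp.example
```

The same fallback chain (CLI override first, then parsed header, then regex match) is what lets the script accept either raw email text or explicit flags.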

@@ -0,0 +1,13 @@
#!/usr/bin/env python3
from __future__ import annotations

import os
import subprocess
import sys
from pathlib import Path

SCRIPT_DIR = Path(__file__).resolve().parent
TRIAGE_EMAIL = SCRIPT_DIR / "triage_email.py"

if __name__ == "__main__":
    subprocess.run([sys.executable, str(TRIAGE_EMAIL), *sys.argv[1:]], check=True, env=os.environ.copy())

@@ -0,0 +1 @@
"""Memory Gateway core package."""

memory_gateway/config.py Normal file
@@ -0,0 +1,55 @@
"""Configuration loading."""
import os
from pathlib import Path
from typing import Optional

import yaml
from pydantic import ValidationError

from .types import Config, ServerConfig, OpenVikingConfig, MemoryConfig, LoggingConfig

_config: Optional[Config] = None


def load_config(config_path: Optional[str] = None) -> Config:
    """Load the configuration file, falling back to defaults when it is missing or invalid."""
    if config_path is None:
        config_path = os.environ.get("MEMORY_GATEWAY_CONFIG", "config.yaml")
    config_file = Path(config_path)
    if not config_file.exists():
        # Fall back to the default configuration
        return Config()
    try:
        with open(config_file, "r", encoding="utf-8") as f:
            data = yaml.safe_load(f)
        if data is None:
            return Config()
        return Config(
            server=ServerConfig(**data.get("server", {})),
            openviking=OpenVikingConfig(**data.get("openviking", {})),
            memory=MemoryConfig(**data.get("memory", {})),
            logging=LoggingConfig(**data.get("logging", {})),
        )
    except (ValidationError, yaml.YAMLError) as e:
        print(f"Failed to parse config file: {e}")
        return Config()


def get_config() -> Config:
    """Return the global configuration (singleton)."""
    global _config
    if _config is None:
        _config = load_config()
    return _config


def set_config(config: Config) -> None:
    """Set the global configuration."""
    global _config
    _config = config
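`load_config` overlays whatever `config.yaml` provides on top of the pydantic defaults, section by section. The same layered-default idea in a dependency-free sketch (`merge_config` is a hypothetical helper for illustration, not part of this module):

```python
# Defaults mirror the documented config.example.yaml values.
DEFAULTS = {
    "server": {"host": "0.0.0.0", "port": 1934, "api_key": ""},
    "openviking": {"url": "http://localhost:1933", "timeout": 30},
}

def merge_config(user: dict) -> dict:
    """Overlay user-supplied keys on the defaults, one section at a time."""
    merged = {}
    for section, defaults in DEFAULTS.items():
        merged[section] = {**defaults, **user.get(section, {})}
    return merged

cfg = merge_config({"server": {"port": 8080}})
print(cfg["server"]["port"])  # 8080
```

A partial `config.yaml` (say, only `server.port`) therefore still yields a complete configuration object.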

@@ -0,0 +1,302 @@
"""OpenViking client wrapper used by the SOC Memory POC."""
from __future__ import annotations

import logging
import mimetypes
import tempfile
from pathlib import Path
from typing import Any, Optional

import httpx

from .config import get_config
from .types import MemoryEntry, ResourceEntry, SearchResult

logger = logging.getLogger(__name__)


class OpenVikingClient:
    """Thin async client for the OpenViking HTTP API."""

    def __init__(
        self,
        base_url: Optional[str] = None,
        api_key: Optional[str] = None,
        timeout: Optional[int] = None,
        account: str = "default",
        user: str = "default",
    ):
        self.config = get_config()
        self.base_url = base_url or self.config.openviking.url
        self.api_key = api_key or self.config.openviking.api_key or "your-secret-root-key"
        # Fall back to the configured timeout so the config.yaml value is honored.
        self.timeout = timeout or self.config.openviking.timeout
        self.account = account
        self.user = user
        self._client: Optional[httpx.AsyncClient] = None

    def _get_headers(self) -> dict[str, str]:
        headers = {}
        if self.api_key:
            headers["X-API-Key"] = self.api_key
        headers["X-OpenViking-Account"] = self.account
        headers["X-OpenViking-User"] = self.user
        return headers

    async def _get_client(self) -> httpx.AsyncClient:
        if self._client is None:
            self._client = httpx.AsyncClient(
                base_url=self.base_url,
                headers=self._get_headers(),
                timeout=self.timeout,
            )
        return self._client

    async def close(self):
        if self._client:
            await self._client.aclose()
            self._client = None

    async def health_check(self) -> dict[str, Any]:
        client = await self._get_client()
        try:
            response = await client.get("/health")
            response.raise_for_status()
            return response.json()
        except httpx.HTTPError as e:
            logger.error(f"OpenViking health check failed: {e}")
            return {"status": "error", "message": str(e)}

    async def search(
        self,
        query: str,
        namespace: Optional[str] = None,
        limit: Optional[int] = None,
        uri: Optional[str] = None,
    ) -> SearchResult:
        """Semantic search against OpenViking resources/memories."""
        client = await self._get_client()
        payload: dict[str, Any] = {"query": query}
        if limit:
            payload["limit"] = limit
        if uri:
            payload["uri"] = uri
        elif namespace:
            payload["uri"] = f"viking://{namespace}"
        try:
            response = await client.post("/api/v1/search/search", json=payload)
            response.raise_for_status()
            data = response.json()
            if data.get("status") != "ok":
                logger.warning(f"Search returned an error: {data.get('error')}")
                return SearchResult(results=[], total=0)
            result = data.get("result", {})
            memories = result.get("memories", [])
            resources = result.get("resources", [])
            all_results = []
            for m in memories + resources:
                all_results.append(
                    {
                        "uri": m.get("uri"),
                        "abstract": m.get("abstract"),
                        "score": m.get("score"),
                        "context_type": m.get("context_type"),
                    }
                )
            return SearchResult(results=all_results, total=result.get("total", len(all_results)))
        except httpx.HTTPError as e:
            logger.error(f"Search failed: {e}")
            return SearchResult(results=[], total=0)

    async def add_memory(
        self,
        content: str,
        namespace: Optional[str] = None,
        memory_type: str = "general",
    ) -> dict[str, Any]:
        """Add memory via the session commit flow."""
        client = await self._get_client()
        ns = namespace or self.config.memory.default_namespace or "user/default/memories"
        try:
            response = await client.post("/api/v1/sessions", json={"mode": "interactive"})
            response.raise_for_status()
            session_data = response.json()
            if session_data.get("status") != "ok":
                return session_data
            session_id = session_data["result"]["session_id"]
            commit_response = await client.post(
                f"/api/v1/sessions/{session_id}/commit",
                json={
                    "messages": [
                        {
                            "role": "user",
                            "content": f"[{ns}/{memory_type}] {content}",
                        }
                    ]
                },
            )
            commit_response.raise_for_status()
            return commit_response.json()
        except httpx.HTTPError as e:
            logger.error(f"Failed to add memory: {e}")
            raise

    async def _upload_temp_file(self, file_path: str | Path) -> str:
        client = await self._get_client()
        file_path = Path(file_path)
        mime_type = mimetypes.guess_type(file_path.name)[0] or "application/octet-stream"
        with file_path.open("rb") as f:
            response = await client.post(
                "/api/v1/resources/temp_upload",
                files={"file": (file_path.name, f, mime_type)},
            )
        response.raise_for_status()
        data = response.json()
        result = data.get("result", {})
        if "temp_path" in result:
            return result["temp_path"]
        if "temp_file_id" in result:
            return result["temp_file_id"]
        raise KeyError(f"Unexpected temp upload response: {data}")

    async def add_resource(
        self,
        uri: str,
        content: str,
        resource_type: str = "text",
        wait: bool = False,
    ) -> dict[str, Any]:
        """Add a text/json resource by uploading a temporary file first.

        The OpenViking HTTP API does not accept raw `uri + content` directly. The
        client must upload a temp file and then create the resource with `to`.
        """
        client = await self._get_client()
        suffix_map = {
            "json": ".json",
            "text": ".txt",
            "markdown": ".md",
            "md": ".md",
        }
        suffix = suffix_map.get(resource_type, ".txt")
        with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=suffix, delete=False) as tmp:
            tmp.write(content)
            tmp_path = Path(tmp.name)
        try:
            temp_ref = await self._upload_temp_file(tmp_path)
            payload = {
                "temp_path": temp_ref,
                "to": uri,
                "wait": wait,
                "source_name": Path(uri).name or tmp_path.name,
                "strict": False,
            }
            response = await client.post("/api/v1/resources", json=payload)
            if response.status_code >= 400:
                logger.error("Failed to add resource, response: %s", response.text)
            response.raise_for_status()
            return response.json()
        except httpx.HTTPError as e:
            logger.error(f"Failed to add resource: {e}")
            raise
        finally:
            tmp_path.unlink(missing_ok=True)

    async def list_memories(
        self,
        namespace: Optional[str] = None,
        memory_type: Optional[str] = None,
        limit: Optional[int] = None,
    ) -> list[MemoryEntry]:
        client = await self._get_client()
        ns = namespace or "user/default/memories"
        if memory_type:
            ns = f"{ns}/{memory_type}"
        try:
            response = await client.post(
                "/api/v1/search/search",
                json={"query": "", "uri": f"viking://{ns}", "limit": limit or 10},
            )
            response.raise_for_status()
            data = response.json()
            if data.get("status") == "ok":
                result = data.get("result", {})
                memories = result.get("memories", [])
                return [
                    MemoryEntry(
                        id=m.get("uri", ""),
                        content=m.get("abstract", ""),
                        namespace=ns,
                        memory_type=memory_type or "general",
                    )
                    for m in memories
                ]
            return []
        except httpx.HTTPError as e:
            logger.error(f"Failed to list memories: {e}")
            return []

    async def list_resources(
        self,
        namespace: Optional[str] = None,
        limit: Optional[int] = None,
    ) -> list[ResourceEntry]:
        client = await self._get_client()
        uri = f"viking://{namespace}" if namespace else "viking://resources"
        try:
            response = await client.post(
                "/api/v1/search/search",
                json={"query": "", "uri": uri, "limit": limit or 10},
            )
            response.raise_for_status()
            data = response.json()
            if data.get("status") == "ok":
                result = data.get("result", {})
                resources = result.get("resources", [])
                return [
                    ResourceEntry(
                        uri=r.get("uri", ""),
                        content=r.get("abstract", ""),
                        resource_type="text",
                    )
                    for r in resources
                ]
            return []
        except httpx.HTTPError as e:
            logger.error(f"Failed to list resources: {e}")
            return []


_client: Optional[OpenVikingClient] = None


async def get_openviking_client() -> OpenVikingClient:
    global _client
    if _client is None:
        _client = OpenVikingClient()
    return _client


async def close_openviking_client():
    global _client
    if _client:
        await _client.close()
        _client = None
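Note how `search()` falls back from an explicit `uri` to a `viking://` URI derived from the namespace. That payload shaping, isolated as a hypothetical helper for illustration:

```python
from typing import Any, Optional

def build_search_payload(query: str, namespace: Optional[str] = None,
                         limit: Optional[int] = None, uri: Optional[str] = None) -> dict[str, Any]:
    """Mirror the request-body shaping done inside OpenVikingClient.search."""
    payload: dict[str, Any] = {"query": query}
    if limit:
        payload["limit"] = limit
    if uri:
        # An explicit URI always wins over the namespace shortcut.
        payload["uri"] = uri
    elif namespace:
        payload["uri"] = f"viking://{namespace}"
    return payload

print(build_search_payload("impossible travel", namespace="soc", limit=5))
```

Callers can therefore scope a search to a namespace (`soc`) without knowing the `viking://` URI scheme.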

memory_gateway/server.py Normal file
@@ -0,0 +1,387 @@
"""Memory Gateway MCP Server.

A Model Context Protocol based memory gateway that gives AI agents on the local
network a unified entry point to OpenViking.
"""
import asyncio
import json
import logging
from contextlib import asynccontextmanager
from typing import Any, Optional

from fastapi import APIRouter, Depends, FastAPI, Header, HTTPException, Request, status
from fastapi.responses import JSONResponse
from fastapi.middleware.cors import CORSMiddleware
from mcp.server import Server
from mcp.types import TextContent, Tool
from sse_starlette import EventSourceResponse

from .config import get_config, set_config, Config
from .openviking_client import get_openviking_client, close_openviking_client
from .types import SearchRequest, AddMemoryRequest, AddResourceRequest

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)
logger = logging.getLogger(__name__)

# Create the MCP server
mcp_server = Server("memory-gateway")


@mcp_server.list_tools()
async def list_tools() -> list[Tool]:
    """List the available MCP tools."""
    return [
        Tool(
            name="search",
            description="Semantic search over memories and resources",
            inputSchema={
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "namespace": {"type": "string", "description": "Namespace (optional)"},
                    "limit": {"type": "integer", "description": "Number of results to return (default 10)"},
                    "uri": {"type": "string", "description": "Resource URI (optional)"},
                },
                "required": ["query"],
            },
        ),
        Tool(
            name="add_memory",
            description="Add a new memory",
            inputSchema={
                "type": "object",
                "properties": {
                    "content": {"type": "string", "description": "Memory content"},
                    "namespace": {"type": "string", "description": "Namespace (optional)"},
                    "memory_type": {"type": "string", "description": "Memory type (default: general)"},
                },
                "required": ["content"],
            },
        ),
        Tool(
            name="add_resource",
            description="Add a resource",
            inputSchema={
                "type": "object",
                "properties": {
                    "uri": {"type": "string", "description": "Resource URI"},
                    "content": {"type": "string", "description": "Resource content"},
                    "resource_type": {"type": "string", "description": "Resource type (default: text)"},
                },
                "required": ["uri", "content"],
            },
        ),
        Tool(
            name="get_status",
            description="Check system status",
            inputSchema={
                "type": "object",
                "properties": {},
            },
        ),
        Tool(
            name="list_memories",
            description="List stored memories",
            inputSchema={
                "type": "object",
                "properties": {
                    "namespace": {"type": "string", "description": "Namespace (optional)"},
                    "memory_type": {"type": "string", "description": "Memory type (optional)"},
                    "limit": {"type": "integer", "description": "Number of results to return (default 10)"},
                },
            },
        ),
        Tool(
            name="list_resources",
            description="List stored resources",
            inputSchema={
                "type": "object",
                "properties": {
                    "namespace": {"type": "string", "description": "Namespace (optional)"},
                    "limit": {"type": "integer", "description": "Number of results to return (default 10)"},
                },
            },
        ),
    ]


@mcp_server.call_tool()
async def call_tool(name: str, arguments: Any) -> list[TextContent]:
    """Dispatch an MCP tool call."""
    try:
        ov_client = await get_openviking_client()
        if name == "search":
            result = await ov_client.search(
                query=arguments.get("query"),
                namespace=arguments.get("namespace"),
                limit=arguments.get("limit"),
                uri=arguments.get("uri"),
            )
            return [TextContent(type="text", text=str(result.results))]
        elif name == "add_memory":
            result = await ov_client.add_memory(
                content=arguments.get("content"),
                namespace=arguments.get("namespace"),
                memory_type=arguments.get("memory_type", "general"),
            )
            return [TextContent(type="text", text=str(result))]
        elif name == "add_resource":
            result = await ov_client.add_resource(
                uri=arguments.get("uri"),
                content=arguments.get("content"),
                resource_type=arguments.get("resource_type", "text"),
            )
            return [TextContent(type="text", text=str(result))]
        elif name == "get_status":
            ov_status = await ov_client.health_check()
            return [TextContent(type="text", text=f"Memory Gateway: OK\nOpenViking: {ov_status}")]
        elif name == "list_memories":
            memories = await ov_client.list_memories(
                namespace=arguments.get("namespace"),
                memory_type=arguments.get("memory_type"),
                limit=arguments.get("limit"),
            )
            return [TextContent(type="text", text=str([m.model_dump() for m in memories]))]
        elif name == "list_resources":
            resources = await ov_client.list_resources(
                namespace=arguments.get("namespace"),
                limit=arguments.get("limit"),
            )
            return [TextContent(type="text", text=str([r.model_dump() for r in resources]))]
        else:
            raise ValueError(f"Unknown tool: {name}")
    except Exception as e:
        logger.error(f"Tool execution failed: {e}")
        return [TextContent(type="text", text=f"Error: {str(e)}")]


@asynccontextmanager
async def lifespan(app: FastAPI):
    """Application lifecycle management."""
    logger.info("Memory Gateway starting...")
    config = get_config()
    logger.info(f"Configuration loaded: {config.server.host}:{config.server.port}")
    logger.info(f"OpenViking backend: {config.openviking.url}")
    # Verify the OpenViking connection
    try:
        ov_client = await get_openviking_client()
        status = await ov_client.health_check()
        logger.info(f"OpenViking connection status: {status}")
    except Exception as e:
        logger.warning(f"OpenViking connection failed: {e}")
    yield
    logger.info("Memory Gateway shutting down...")
    await close_openviking_client()


def verify_api_key(x_api_key: Optional[str] = Header(default=None)) -> None:
    """Validate the request header when an API key is configured."""
    expected_key = get_config().server.api_key
    if not expected_key:
        return
    if x_api_key != expected_key:
        raise HTTPException(
            status_code=status.HTTP_401_UNAUTHORIZED,
            detail="Invalid or missing API key",
        )


# FastAPI application
app = FastAPI(title="Memory Gateway", version="0.1.0", lifespan=lifespan)
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)


@app.get("/health", dependencies=[Depends(verify_api_key)])
async def health_check():
    """Health check."""
    try:
        ov_client = await get_openviking_client()
        ov_status = await ov_client.health_check()
        return {
            "status": "ok",
            "gateway": "memory-gateway",
            "openviking": ov_status,
        }
    except Exception as e:
        return {
            "status": "degraded",
            "gateway": "memory-gateway",
            "error": str(e),
        }


mcp_router = APIRouter()


async def mcp_server_events(request: Request, _: None = Depends(verify_api_key)):
    """MCP Server-Sent Events endpoint, simulated in stdio mode."""
    async def event_generator():
        # Send the initialization message
        yield {"event": "initialize", "data": json.dumps({"protocolVersion": "2024-11-05"})}
        # Keep the connection alive
        try:
            while True:
                await asyncio.sleep(30)
                yield {"event": "ping", "data": ""}
        except asyncio.CancelledError:
            pass
    return EventSourceResponse(event_generator())


mcp_router.add_api_route("/sse", mcp_server_events, methods=["GET"])


async def dispatch_tool(name: str, arguments: dict) -> list[TextContent]:
    """Internal helper that invokes the registered tool handler."""
    return await call_tool(name, arguments)


# MCP JSON-RPC endpoint (simplified implementation)
async def mcp_rpc(request: Request, _: None = Depends(verify_api_key)):
    """Handle MCP JSON-RPC requests."""
    body = await request.json()
    method = body.get("method")
    params = body.get("params", {})
    msg_id = body.get("id")
    try:
        if method == "tools/list":
            tools = await list_tools()
            result = {
                "tools": [
                    {
                        "name": t.name,
                        "description": t.description,
                        "inputSchema": t.inputSchema,
                    }
                    for t in tools
                ]
            }
        elif method == "tools/call":
            tool_name = params.get("name")
            tool_args = params.get("arguments", {})
            result_content = await dispatch_tool(tool_name, tool_args)
            result = {"content": [c.model_dump() for c in result_content]}
        else:
            return JSONResponse(
                status_code=400,
                content={"jsonrpc": "2.0", "error": {"code": -32601, "message": f"Method not found: {method}"}, "id": msg_id}
            )
        return {"jsonrpc": "2.0", "result": result, "id": msg_id}
    except Exception as e:
        logger.error(f"MCP RPC error: {e}")
        return JSONResponse(
            status_code=500,
            content={"jsonrpc": "2.0", "error": {"code": -32603, "message": str(e)}, "id": msg_id}
        )


mcp_router.add_api_route("/rpc", mcp_rpc, methods=["POST"])

# Register MCP routes
app.include_router(mcp_router, prefix="/mcp", tags=["mcp"])


@app.post("/api/search", dependencies=[Depends(verify_api_key)])
async def api_search(request: SearchRequest):
    """REST API: search."""
    ov_client = await get_openviking_client()
    result = await ov_client.search(
        query=request.query,
        namespace=request.namespace or get_config().memory.default_namespace,
        limit=request.limit or get_config().memory.search_limit,
        uri=request.uri,
    )
    return {"results": result.results, "total": result.total}


@app.post("/api/memory", dependencies=[Depends(verify_api_key)])
async def api_add_memory(request: AddMemoryRequest):
    """REST API: add a memory."""
    ov_client = await get_openviking_client()
    result = await ov_client.add_memory(
        content=request.content,
        namespace=request.namespace or get_config().memory.default_namespace,
        memory_type=request.memory_type,
    )
    return result


@app.post("/api/resource", dependencies=[Depends(verify_api_key)])
async def api_add_resource(request: AddResourceRequest):
    """REST API: add a resource."""
    ov_client = await get_openviking_client()
    result = await ov_client.add_resource(
        uri=request.uri,
        content=request.content,
        resource_type=request.resource_type,
    )
    return result


def create_app(config: Optional[Config] = None) -> FastAPI:
    """Create the FastAPI application."""
    if config:
        set_config(config)
    return app


# Entry point
def main():
    """Main entry point."""
    import argparse
    import uvicorn

    parser = argparse.ArgumentParser(description="Memory Gateway MCP Server")
    parser.add_argument("--config", default="config.yaml", help="Path to the configuration file")
    parser.add_argument("--host", default=None, help="Listen address")
    parser.add_argument("--port", type=int, default=None, help="Listen port")
    args = parser.parse_args()

    # Load configuration
    from .config import load_config
    config = load_config(args.config)
    if args.host:
        config.server.host = args.host
    if args.port:
        config.server.port = args.port
    set_config(config)

    # Start the server
    uvicorn.run(
        app,
        host=config.server.host,
        port=config.server.port,
        log_level=config.logging.level.lower(),
    )


if __name__ == "__main__":
    main()
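A client talks to the simplified `/mcp/rpc` endpoint with plain JSON-RPC 2.0 bodies. A minimal sketch of building such a request (`make_rpc_request` is an illustrative helper, not part of the server):

```python
import json

def make_rpc_request(method: str, params: dict, msg_id: int = 1) -> str:
    """Build a JSON-RPC 2.0 request body for the /mcp/rpc endpoint."""
    return json.dumps({"jsonrpc": "2.0", "method": method, "params": params, "id": msg_id})

# A tools/call request invoking the gateway's "search" tool.
body = make_rpc_request("tools/call", {"name": "search", "arguments": {"query": "MFA fatigue", "limit": 5}})
print(body)
```

POST the resulting string to `http://<gateway-host>:1934/mcp/rpc`, adding an `X-API-Key` header when `server.api_key` is configured.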

memory_gateway/types.py Normal file
@@ -0,0 +1,82 @@
"""Type definitions."""
from typing import Optional, Any

from pydantic import BaseModel, Field


class ServerConfig(BaseModel):
    """Server configuration."""
    host: str = "0.0.0.0"
    port: int = 1934
    api_key: str = ""


class OpenVikingConfig(BaseModel):
    """OpenViking backend configuration."""
    url: str = "http://localhost:1933"
    api_key: str = ""
    timeout: int = 30


class MemoryConfig(BaseModel):
    """Memory configuration."""
    default_namespace: str = "soc"
    search_limit: int = 10


class LoggingConfig(BaseModel):
    """Logging configuration."""
    level: str = "INFO"
    format: str = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


class Config(BaseModel):
    """Complete configuration."""
    server: ServerConfig = Field(default_factory=ServerConfig)
    openviking: OpenVikingConfig = Field(default_factory=OpenVikingConfig)
    memory: MemoryConfig = Field(default_factory=MemoryConfig)
    logging: LoggingConfig = Field(default_factory=LoggingConfig)


class SearchRequest(BaseModel):
    """Search request."""
    query: str
    namespace: Optional[str] = None
    limit: Optional[int] = None
    uri: Optional[str] = None


class AddMemoryRequest(BaseModel):
    """Add-memory request."""
    content: str
    namespace: Optional[str] = None
    memory_type: Optional[str] = "general"


class AddResourceRequest(BaseModel):
    """Add-resource request."""
    uri: str
    content: str
    resource_type: Optional[str] = "text"


class SearchResult(BaseModel):
    """Search result."""
    results: list[dict[str, Any]]
    total: int


class MemoryEntry(BaseModel):
    """Memory entry."""
    id: str
    content: str
    namespace: str
    memory_type: str
    created_at: Optional[str] = None


class ResourceEntry(BaseModel):
    """Resource entry."""
    uri: str
    content: str
    resource_type: str
    created_at: Optional[str] = None

@@ -0,0 +1,101 @@
---
case_id: CASE-2026-1001
scenario: o365_suspicious_login
alert_type: azuread_impossible_travel
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---

# CASE-2026-1001 Impossible travel login followed by MFA prompt fatigue

## Basic Information

- Case ID: CASE-2026-1001
- Title: Impossible travel login followed by MFA prompt fatigue
- Alert type: azuread_impossible_travel
- Source system: SOC Memory POC Mock Dataset
- Time range: TBD
- Analyst / Agent: AI Agent Draft
- Final verdict: true positive
- Severity: high

## Alert Summary

User account showed impossible travel between Shanghai and Amsterdam, followed by repeated MFA prompts and successful sign-in.

## Key Entities

- User: david@corp.example
- Host: WS-DAVID-01
- Mailbox: david@corp.example
- IP: 203.0.113.150, 198.51.100.61
- Domain: none
- File hash: none
- Other IOCs: none

## Key Evidence

- Two successful sign-ins from geographically impossible locations within 15 minutes.
- MFA challenge volume increased abnormally before final success.
- User confirmed they did not initiate overseas login.

## Triage Process Summary

1. Confirmed the alert scenario and core risk: User account showed impossible travel between Shanghai and Amsterdam, followed by repeated MFA prompts and successful sign-in.
2. Extracted and cross-checked key evidence: Two successful sign-ins from geographically impossible locations within 15 minutes.
3. Reviewed the alert pattern and response path against the linked playbooks / KB entries.
4. Reached a verdict from the key evidence and scenario pattern: true positive.

## Basis for Verdict

- Verdict: true positive.
- Primary basis: Two successful sign-ins from geographically impossible locations within 15 minutes.
- Supporting basis: MFA challenge volume increased abnormally before final success.

## Response Recommendations

- Review sign-in sources, MFA events, and any subsequent mailbox rules or OAuth changes.
- If there are signs of account takeover, invalidate sessions and reset credentials immediately.

## Reusable Patterns

- Matched pattern: scenario:o365_suspicious_login, alert_type:azuread_impossible_travel
- False-positive traits: none
- Variants to watch: related tags o365, login, impossible-travel, mfa-fatigue

## Related Knowledge

- Related playbooks: [[PB-O365-LOGIN-001]]
- Related KB entries: [[KB-O365-IMPOSSIBLE-TRAVEL]], [[KB-O365-MFA-FATIGUE]]
- Related historical cases: [[CASE-2026-1005]], [[CASE-2026-1004]]
- Related entities: [[david@corp.example]], [[WS-DAVID-01]]

## Automatic Association Recommendations

### Recommended historical cases

- [[CASE-2026-1005]] (case score=0.687) This directory contains a single case record documenting a false positive alert triggered by Microsoft 365's impossible travel detection sys...
- [[CASE-2026-1004]] (case score=0.636) This directory contains a single incident case file related to a suspicious Microsoft 365 login attempt, identified as CASE-2026-1004. The c...

### Recommended knowledge entries

- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.69) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...
- [[PB-O365-LOGIN-001]] (knowledge score=0.63) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...

## Lessons Learned

- This case can serve as a quick-reference precedent for future alerts of the same type.
- If the same lure, login pattern, or key evidence recurs, recall this case and its linked knowledge first.

## Tags

- #case
- #scenario/o365_suspicious_login
- #alert/azuread_impossible_travel
- #verdict/true-positive
- #o365
- #login
- #impossible-travel
- #mfa-fatigue

@@ -0,0 +1,100 @@
---
case_id: CASE-2026-1002
scenario: o365_suspicious_login
alert_type: azuread_legacy_auth_attempt
severity: medium
verdict: false_positive
source: soc-memory-poc
openviking_enriched: true
---

# CASE-2026-1002 Legacy protocol sign-in from unfamiliar IP blocked by policy

## Basic Information

- Case ID: CASE-2026-1002
- Title: Legacy protocol sign-in from unfamiliar IP blocked by policy
- Alert type: azuread_legacy_auth_attempt
- Source system: SOC Memory POC Mock Dataset
- Time range: TBD
- Analyst / Agent: AI Agent Draft
- Final verdict: false positive
- Severity: medium

## Alert Summary

Legacy authentication attempt from a cloud IP was blocked; investigation tied it to an approved migration tool test.

## Key Entities

- User: svc-migration@corp.example
- Host: none
- Mailbox: svc-migration@corp.example
- IP: 192.0.2.24
- Domain: none
- File hash: none
- Other IOCs: none

## Key Evidence

- The account is a known migration service account.
- Source IP matched approved cloud migration vendor range.
- No successful sign-in occurred due to policy block.

## Triage Process Summary

1. Confirmed the alert scenario and core risk: Legacy authentication attempt from a cloud IP was blocked; investigation tied it to an approved migration tool test.
2. Extracted and cross-checked key evidence: The account is a known migration service account.
3. Reviewed the alert pattern and response path against the linked playbooks / KB entries.
4. Reached a verdict from the key evidence and scenario pattern: false positive.

## Basis for Verdict

- Verdict: false positive.
- Primary basis: The account is a known migration service account.
- Supporting basis: Source IP matched approved cloud migration vendor range.

## Response Recommendations

- Record the cause of the false positive and update detection exceptions or suppression conditions.

## Reusable Patterns

- Matched pattern: scenario:o365_suspicious_login, alert_type:azuread_legacy_auth_attempt
- False-positive traits: confirmed false positive; usable to refine suppression conditions.
- Variants to watch: related tags o365, login, false-positive, legacy-auth

## Related Knowledge

- Related playbooks: [[PB-O365-LOGIN-001]]
- Related KB entries: [[KB-O365-LEGACY-AUTH]], [[KB-O365-IMPOSSIBLE-TRAVEL]]
- Related historical cases: [[CASE-2026-1001]], [[CASE-2026-1004]]
- Related entities: [[svc-migration@corp.example]]

## Automatic Association Recommendations

### Recommended historical cases

- [[CASE-2026-1001]] (case score=0.651) This directory contains a structured security incident case report related to a high-severity event in an Office 365 environment, identified...
- [[CASE-2026-1004]] (case score=0.634) This directory contains a single incident case file related to a suspicious Microsoft 365 login attempt, identified as CASE-2026-1004. The c...

### Recommended knowledge entries

- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.626) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...
- [[PB-O365-LOGIN-001]] (knowledge score=0.61) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...

## Lessons Learned

- This case can serve as a quick-reference precedent for future alerts of the same type.
- If the same lure, login pattern, or key evidence recurs, recall this case and its linked knowledge first.

## Tags

- #case
- #scenario/o365_suspicious_login
- #alert/azuread_legacy_auth_attempt
- #verdict/false-positive
- #o365
- #login
- #false-positive
- #legacy-auth

@@ -0,0 +1,101 @@
---
case_id: CASE-2026-1003
scenario: o365_suspicious_login
alert_type: azuread_suspicious_inbox_rule_after_login
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---

# CASE-2026-1003 Suspicious inbox rule creation after successful foreign login

## Basic Information

- Case ID: CASE-2026-1003
- Title: Suspicious inbox rule creation after successful foreign login
- Alert type: azuread_suspicious_inbox_rule_after_login
- Source system: SOC Memory POC Mock Dataset
- Time range: TBD
- Analyst / Agent: AI Agent Draft
- Final verdict: true positive
- Severity: high

## Alert Summary

An overseas sign-in to Microsoft 365 was followed by inbox rule creation to hide finance-related emails.

## Key Entities

- User: emma@corp.example
- Host: WS-EMMA-07
- Mailbox: emma@corp.example
- IP: 198.51.100.98
- Domain: none
- File hash: none
- Other IOCs: none

## Key Evidence

- Successful sign-in from untrusted ASN.
- Inbox rule moved wire transfer emails to RSS Feeds folder.
- Mailbox audit showed rule creation minutes after login.

## Triage Process Summary

1. Confirmed the alert scenario and core risk: An overseas sign-in to Microsoft 365 was followed by inbox rule creation to hide finance-related emails.
2. Extracted and cross-checked key evidence: Successful sign-in from untrusted ASN.
3. Reviewed the alert pattern and response path against the linked playbooks / KB entries.
4. Reached a verdict from the key evidence and scenario pattern: true positive.

## Basis for Verdict

- Verdict: true positive.
- Primary basis: Successful sign-in from untrusted ASN.
- Supporting basis: Inbox rule moved wire transfer emails to RSS Feeds folder.

## Response Recommendations

- Review sign-in sources, MFA events, and any subsequent mailbox rules or OAuth changes.
- If there are signs of account takeover, invalidate sessions and reset credentials immediately.

## Reusable Patterns

- Matched pattern: scenario:o365_suspicious_login, alert_type:azuread_suspicious_inbox_rule_after_login
- False-positive traits: none
- Variants to watch: related tags o365, login, inbox-rule, account-compromise

## Related Knowledge

- Related playbooks: [[PB-O365-LOGIN-001]]
- Related KB entries: [[KB-O365-INBOX-RULE-ABUSE]], [[KB-O365-IMPOSSIBLE-TRAVEL]]
- Related historical cases: [[CASE-2026-1005]], [[CASE-2026-1001]]
- Related entities: [[emma@corp.example]], [[WS-EMMA-07]]

## Automatic Association Recommendations

### Recommended historical cases

- [[CASE-2026-1005]] (case score=0.667) This directory contains a single case record documenting a false positive alert triggered by Microsoft 365's impossible travel detection sys...
- [[CASE-2026-1001]] (case score=0.666) This document is a structured case report detailing a high-severity security incident involving suspicious login activity in an Office 365 e...

### Recommended knowledge entries

- [[PB-O365-LOGIN-001]] (knowledge score=0.653) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...
- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.645) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...

## Lessons Learned

- This case can serve as a quick-reference precedent for future alerts of the same type.
- If the same lure, login pattern, or key evidence recurs, recall this case and its linked knowledge first.

## Tags

- #case
- #scenario/o365_suspicious_login
- #alert/azuread_suspicious_inbox_rule_after_login
- #verdict/true-positive
- #o365
- #login
- #inbox-rule
- #account-compromise

@@ -0,0 +1,101 @@
---
case_id: CASE-2026-1004
scenario: o365_suspicious_login
alert_type: azuread_password_spray_attempt
severity: medium
verdict: uncertain
source: soc-memory-poc
openviking_enriched: true
---

# CASE-2026-1004 Multiple failed logins from residential proxy but no successful access

## Basic Information

- Case ID: CASE-2026-1004
- Title: Multiple failed logins from residential proxy but no successful access
- Alert type: azuread_password_spray_attempt
- Source system: SOC Memory POC Mock Dataset
- Time range: TBD
- Analyst / Agent: AI Agent Draft
- Final verdict: uncertain
- Severity: medium

## Alert Summary

Repeated failed Microsoft 365 sign-in attempts targeted one user from a residential proxy network, with no successful authentication observed.

## Key Entities

- User: frank@corp.example
- Host: none
- Mailbox: frank@corp.example
- IP: 203.0.113.201
- Domain: none
- File hash: none
- Other IOCs: none

## Key Evidence

- High-volume failed attempts over a short period.
- Source IP attributed to a residential proxy provider.
- No matching successful sign-in or MFA event found.

## Triage Process Summary

1. Confirmed the alert scenario and core risk: Repeated failed Microsoft 365 sign-in attempts targeted one user from a residential proxy network, with no successful authentication observed.
2. Extracted and cross-checked key evidence: High-volume failed attempts over a short period.
3. Reviewed the alert pattern and response path against the linked playbooks / KB entries.
4. Reached a verdict from the key evidence and scenario pattern: uncertain.

## Basis for Verdict

- Verdict: uncertain.
- Primary basis: High-volume failed attempts over a short period.
- Supporting basis: Source IP attributed to a residential proxy provider.

## Response Recommendations

- Review sign-in sources, MFA events, and any subsequent mailbox rules or OAuth changes.
- If there are signs of account takeover, invalidate sessions and reset credentials immediately.

## Reusable Patterns

- Matched pattern: scenario:o365_suspicious_login, alert_type:azuread_password_spray_attempt
- False-positive traits: none
- Variants to watch: related tags o365, login, password-spray, pending

## Related Knowledge

- Related playbooks: [[PB-O365-LOGIN-001]]
- Related KB entries: [[KB-O365-IMPOSSIBLE-TRAVEL]]
- Related historical cases: [[CASE-2026-1001]], [[CASE-2026-1003]]
- Related entities: [[frank@corp.example]]

## Automatic Association Recommendations

### Recommended historical cases

- [[CASE-2026-1001]] (case score=0.665) This directory contains a structured security incident case report related to a high-severity event in an Office 365 environment, identified...
- [[CASE-2026-1003]] (case score=0.627) This directory contains a structured incident case report focused on a confirmed Microsoft 365 account compromise involving suspicious login...

### Recommended knowledge entries

- [[PB-O365-LOGIN-001]] (knowledge score=0.614) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...
- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.609) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...

## Lessons Learned

- This case can serve as a quick-reference precedent for future alerts of the same type.
- If the same lure, login pattern, or key evidence recurs, recall this case and its linked knowledge first.

## Tags

- #case
- #scenario/o365_suspicious_login
- #alert/azuread_password_spray_attempt
- #verdict/uncertain
- #o365
- #login
- #password-spray
- #pending


@ -0,0 +1,100 @@
---
case_id: CASE-2026-1005
scenario: o365_suspicious_login
alert_type: azuread_impossible_travel
severity: medium
verdict: false_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-1005 Traveling executive triggered impossible travel but activity was legitimate
## 基本信息
- Case ID: CASE-2026-1005
- 标题: Traveling executive triggered impossible travel but activity was legitimate
- 告警类型: azuread_impossible_travel
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 误报
- 严重等级: medium
## 告警摘要
Executive account triggered impossible travel due to corporate VPN exit node while the user was on an approved overseas trip.
## 关键实体
- 用户: grace@corp.example
- 主机: VIP-LAPTOP-01
- 邮箱: grace@corp.example
- IP: 192.0.2.90, 203.0.113.77
- 域名: 无
- 文件 Hash: 无
- 其他 IOC: 无
## 关键证据
- Approved travel request existed.
- One login originated from corporate VPN exit node.
- Device and user agent were consistent with known user profile.
## 研判过程摘要
1. 确认告警场景与核心风险Executive account triggered impossible travel due to corporate VPN exit node while the user was on an approved overseas trip.
2. 提取关键证据并交叉验证Approved travel request existed.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:误报。
## 结论依据
- 结论为误报。
- 最关键依据Approved travel request existed.
- 补充依据One login originated from corporate VPN exit node.
## 处置建议
- 记录误报原因,并更新检测例外或抑制条件。
## 可复用模式
- 命中模式: scenario:o365_suspicious_login, alert_type:azuread_impossible_travel
- 误报特征: 本案最终确认为误报,可用于补充抑制条件。
- 需关注的变体: 相关标签o365, login, false-positive, travel
## 关联知识
- 关联 Playbook: [[PB-O365-LOGIN-001]]
- 关联 KB: [[KB-O365-IMPOSSIBLE-TRAVEL]]
- 关联历史 Case: [[CASE-2026-1001]], [[CASE-2026-1004]]
- 关联实体: [[grace@corp.example]], [[VIP-LAPTOP-01]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-1001]] (case score=0.684) This directory contains a structured security incident case report related to a high-severity event in an Office 365 environment, identified...
- [[CASE-2026-1004]] (case score=0.63) This directory contains a single incident case file related to a suspicious Microsoft 365 login attempt, identified as CASE-2026-1004. The c...
### 推荐知识条目
- [[KB-O365-IMPOSSIBLE-TRAVEL]] (knowledge score=0.703) This directory contains a knowledge base artifact focused on analyzing and validating Microsoft 365 impossible travel alerts—security events...
- [[PB-O365-LOGIN-001]] (knowledge score=0.626) This directory contains a security playbook focused on detecting and responding to suspicious Microsoft Entra ID sign-in activities within M...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/o365_suspicious_login
- #alert/azuread_impossible_travel
- #verdict/false-positive
- #o365
- #login
- #false-positive
- #travel


@ -0,0 +1,101 @@
---
case_id: CASE-2026-0001
scenario: phishing
alert_type: mail_suspicious_attachment
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-0001 Finance user received invoice-themed phishing email
## 基本信息
- Case ID: CASE-2026-0001
- 标题: Finance user received invoice-themed phishing email
- 告警类型: mail_suspicious_attachment
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 真报
- 严重等级: high
## 告警摘要
Finance user received an invoice-themed phishing email containing a malicious HTML attachment that redirected to a credential harvesting page.
## 关键实体
- 用户: alice@corp.example
- 主机: FIN-LAPTOP-12
- 邮箱: alice@corp.example
- IP: 198.51.100.20
- 域名: vendor-payments.com, vendor-payments-login.com
- 文件 Hash: sha256:phish0001
- 其他 IOC: https://vendor-payments-login.com/review, billing@vendor-payments.com
## 关键证据
- Sender domain was newly observed and failed DMARC.
- Attachment redirected to a fake Microsoft 365 login page.
- User clicked the link before mail quarantine completed.
## 研判过程摘要
1. 确认告警场景与核心风险Finance user received an invoice-themed phishing email containing a malicious HTML attachment that redirected to a credential harvesting page.
2. 提取关键证据并交叉验证Sender domain was newly observed and failed DMARC.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:真报。
## 结论依据
- 结论为真报。
- 最关键依据Sender domain was newly observed and failed DMARC.
- 补充依据Attachment redirected to a fake Microsoft 365 login page.
## 处置建议
- 隔离相同主题、发件人或 URL 的邮件样本。
- 核查用户是否点击或提交凭据,并按需执行凭据重置。
## 可复用模式
- 命中模式: scenario:phishing, alert_type:mail_suspicious_attachment
- 误报特征: 无
- 需关注的变体: 相关标签phishing, email, credential-harvest, finance
## 关联知识
- 关联 Playbook: [[PB-PHISH-001]]
- 关联 KB: [[KB-PHISH-HEADER-CHECK]], [[KB-CRED-HARVEST-PATTERNS]]
- 关联历史 Case: [[CASE-2026-0004]], [[CASE-2026-0002]]
- 关联实体: [[alice@corp.example]], [[FIN-LAPTOP-12]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-0004]] (case score=0.662) This directory contains a structured incident case report related to a phishing attack targeting a shared mailbox via a spoofed OneDrive not...
- [[CASE-2026-0002]] (case score=0.631) This directory contains a single case record detailing the investigation of a suspicious payroll notification email flagged due to a shorten...
### 推荐知识条目
- [[KB-CRED-HARVEST-PATTERNS]] (knowledge score=0.656) This directory contains a structured knowledge base artifact focused on identifying and investigating credential harvesting campaigns, parti...
- [[PB-PHISH-001]] (knowledge score=0.639) This directory contains a phishing email investigation playbook designed to standardize incident response procedures for suspicious emails, ...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/phishing
- #alert/mail_suspicious_attachment
- #verdict/true-positive
- #phishing
- #email
- #credential-harvest
- #finance


@ -0,0 +1,100 @@
---
case_id: CASE-2026-0002
scenario: phishing
alert_type: mail_suspicious_link
severity: medium
verdict: false_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-0002 Payroll notification email flagged but determined benign
## 基本信息
- Case ID: CASE-2026-0002
- 标题: Payroll notification email flagged but determined benign
- 告警类型: mail_suspicious_link
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 误报
- 严重等级: medium
## 告警摘要
Payroll update email was flagged due to a shortened URL, but the destination was the approved HR vendor portal.
## 关键实体
- 用户: bob@corp.example
- 主机: HR-LAPTOP-03
- 邮箱: bob@corp.example
- IP: 无
- 域名: hr-vendor.example
- 文件 Hash: 无
- 其他 IOC: https://bit.ly/hr-portal-example, notify@hr-vendor.example
## 关键证据
- Sender domain aligned with SPF and DKIM.
- Destination domain matched approved supplier inventory.
- No credential prompt anomaly observed.
## 研判过程摘要
1. 确认告警场景与核心风险Payroll update email was flagged due to a shortened URL, but the destination was the approved HR vendor portal.
2. 提取关键证据并交叉验证Sender domain aligned with SPF and DKIM.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:误报。
## 结论依据
- 结论为误报。
- 最关键依据Sender domain aligned with SPF and DKIM.
- 补充依据Destination domain matched approved supplier inventory.
## 处置建议
- 记录误报原因,并更新检测例外或抑制条件。
## 可复用模式
- 命中模式: scenario:phishing, alert_type:mail_suspicious_link
- 误报特征: 本案最终确认为误报,可用于补充抑制条件。
- 需关注的变体: 相关标签phishing, email, false-positive, vendor
## 关联知识
- 关联 Playbook: [[PB-PHISH-001]]
- 关联 KB: [[KB-PHISH-HEADER-CHECK]], [[KB-CRED-HARVEST-PATTERNS]]
- 关联历史 Case: [[CASE-2026-0004]], [[CASE-2026-0001]]
- 关联实体: [[bob@corp.example]], [[HR-LAPTOP-03]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-0004]] (case score=0.549) This directory contains a structured incident case report related to a phishing attack targeting a shared mailbox via a spoofed OneDrive not...
- [[CASE-2026-0001]] (case score=0.532) This directory contains a structured case report detailing a high-severity phishing incident targeting a finance user via a malicious invoic...
### 推荐知识条目
- [[PB-PHISH-001]] (knowledge score=0.514) This directory contains a phishing email investigation playbook designed to standardize incident response procedures for suspicious emails, ...
- [[KB-CRED-HARVEST-PATTERNS]] (knowledge score=0.494) This directory contains a structured knowledge base artifact focused on identifying and investigating credential harvesting campaigns, parti...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/phishing
- #alert/mail_suspicious_link
- #verdict/false-positive
- #phishing
- #email
- #false-positive
- #vendor


@ -0,0 +1,101 @@
---
case_id: CASE-2026-0003
scenario: phishing
alert_type: mail_bec_impersonation
severity: high
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-0003 Executive impersonation email requested urgent wire transfer
## 基本信息
- Case ID: CASE-2026-0003
- 标题: Executive impersonation email requested urgent wire transfer
- 告警类型: mail_bec_impersonation
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 真报
- 严重等级: high
## 告警摘要
An executive impersonation email targeted finance staff with an urgent wire transfer request from a lookalike domain.
## 关键实体
- 用户: carol@corp.example
- 主机: FIN-LAPTOP-08
- 邮箱: carol@corp.example
- IP: 203.0.113.45
- 域名: c0rp-example.com
- 文件 Hash: 无
- 其他 IOC: ceo@c0rp-example.com
## 关键证据
- Lookalike domain used numeric substitution.
- Language pressure matched prior BEC pattern.
- No historical communication from sender domain.
## 研判过程摘要
1. 确认告警场景与核心风险An executive impersonation email targeted finance staff with an urgent wire transfer request from a lookalike domain.
2. 提取关键证据并交叉验证Lookalike domain used numeric substitution.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:真报。
## 结论依据
- 结论为真报。
- 最关键依据Lookalike domain used numeric substitution.
- 补充依据Language pressure matched prior BEC pattern.
## 处置建议
- 隔离相同主题、发件人或 URL 的邮件样本。
- 核查用户是否点击或提交凭据,并按需执行凭据重置。
## 可复用模式
- 命中模式: scenario:phishing, alert_type:mail_bec_impersonation
- 误报特征: 无
- 需关注的变体: 相关标签phishing, bec, executive-impersonation
## 关联知识
- 关联 Playbook: [[PB-PHISH-001]]
- 关联 KB: [[KB-CRED-HARVEST-PATTERNS]], [[KB-PHISH-HEADER-CHECK]]
- 关联历史 Case: [[CASE-2026-0001]], [[CASE-2026-0004]]
- 关联实体: [[carol@corp.example]], [[FIN-LAPTOP-08]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-0001]] (case score=0.572) This directory contains a structured case report detailing a high-severity phishing incident targeting a finance user via a malicious invoic...
- [[CASE-2026-0004]] (case score=0.566) This directory contains a structured incident case report related to a phishing attack targeting a shared mailbox via a spoofed OneDrive not...
### 推荐知识条目
- [[PB-PHISH-001]] (knowledge score=0.538) This directory contains a phishing email investigation playbook designed to standardize incident response procedures for suspicious emails, ...
- [[KB-CRED-HARVEST-PATTERNS]] (knowledge score=0.522) This directory contains a structured knowledge base artifact focused on identifying and investigating credential harvesting campaigns, parti...
- [[KB-PHISH-HEADER-CHECK]] (knowledge score=0.512) This directory contains a structured knowledge base document focused on validating phishing emails through detailed analysis of email header...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/phishing
- #alert/mail_bec_impersonation
- #verdict/true-positive
- #phishing
- #bec
- #executive-impersonation


@ -0,0 +1,100 @@
---
case_id: CASE-2026-0004
scenario: phishing
alert_type: mail_suspicious_attachment
severity: medium
verdict: true_positive
source: soc-memory-poc
openviking_enriched: true
---
# CASE-2026-0004 Shared mailbox received OneDrive lure with HTML attachment
## 基本信息
- Case ID: CASE-2026-0004
- 标题: Shared mailbox received OneDrive lure with HTML attachment
- 告警类型: mail_suspicious_attachment
- 来源系统: SOC Memory POC Mock Dataset
- 时间范围: 待补充
- 研判人 / Agent: AI Agent Draft
- 最终结论: 真报
- 严重等级: medium
## 告警摘要
Shared finance mailbox received a fake OneDrive notification with an HTML attachment that led to credential collection.
## 关键实体
- 用户: shared-finance@corp.example
- 主机: 无
- 邮箱: shared-finance@corp.example
- IP: 198.51.100.87
- 域名: sharepoint-notify.com
- 文件 Hash: sha256:phish0004
- 其他 IOC: https://onedrive-review-login.example, noreply@sharepoint-notify.com
## 关键证据
- Attachment rendered a fake Microsoft sign-in page.
- Landing page hosted outside Microsoft IP space.
- Mail body reused branding from previous phishing campaign.
## 研判过程摘要
1. 确认告警场景与核心风险Shared finance mailbox received a fake OneDrive notification with an HTML attachment that led to credential collection.
2. 提取关键证据并交叉验证Attachment rendered a fake Microsoft sign-in page.
3. 对照关联 playbook / KB 复核告警模式与处置路径。
4. 基于关键证据与场景模式完成结论判定:真报。
## 结论依据
- 结论为真报。
- 最关键依据Attachment rendered a fake Microsoft sign-in page.
- 补充依据Landing page hosted outside Microsoft IP space.
## 处置建议
- 隔离相同主题、发件人或 URL 的邮件样本。
- 核查用户是否点击或提交凭据,并按需执行凭据重置。
## 可复用模式
- 命中模式: scenario:phishing, alert_type:mail_suspicious_attachment
- 误报特征: 无
- 需关注的变体: 相关标签phishing, email, onedrive-lure
## 关联知识
- 关联 Playbook: [[PB-PHISH-001]]
- 关联 KB: [[KB-CRED-HARVEST-PATTERNS]]
- 关联历史 Case: [[CASE-2026-0001]], [[CASE-2026-0003]]
- 关联实体: [[shared-finance@corp.example]]
## 自动关联推荐
### 推荐历史 Case
- [[CASE-2026-0001]] (case score=0.675) This directory contains a structured case report detailing a high-severity phishing incident targeting a finance user via a malicious invoic...
- [[CASE-2026-0003]] (case score=0.606) This directory contains a structured incident report for a high-severity phishing attack involving executive impersonation, classified under...
### 推荐知识条目
- [[KB-CRED-HARVEST-PATTERNS]] (knowledge score=0.652) This directory contains a structured knowledge base artifact focused on identifying and investigating credential harvesting campaigns, parti...
- [[PB-PHISH-001]] (knowledge score=0.608) This directory contains a phishing email investigation playbook designed to standardize incident response procedures for suspicious emails, ...
## Lessons Learned
- 本案可沉淀为后续同类告警的快速判定参考。
- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。
## 标签
- #case
- #scenario/phishing
- #alert/mail_suspicious_attachment
- #verdict/true-positive
- #phishing
- #email
- #onedrive-lure


@ -0,0 +1,76 @@
# Case Note Template
## 基本信息
- Case ID:
- 标题:
- 告警类型:
- 来源系统:
- 时间范围:
- 研判人 / Agent:
- 最终结论:
- 严重等级:
## 告警摘要
一句话概述这次 case 的核心问题。
## 关键实体
- 用户:
- 主机:
- 邮箱:
- IP:
- 域名:
- 文件 Hash:
- 其他 IOC:
## 关键证据
- 证据 1:
- 证据 2:
- 证据 3:
## 研判过程摘要
只保留对后续复用有价值的关键步骤,不记录所有原始过程。
1.
2.
3.
## 结论依据
- 为什么判定为真报 / 误报 / 可疑待定
- 哪些信号最关键
## 处置建议
-
-
## 可复用模式
- 命中模式:
- 误报特征:
- 需关注的变体:
## 关联知识
- 关联 Playbook:
- 关联 KB:
- 关联历史 Case:
- 关联实体:
## Lessons Learned
- 本案新增了什么可复用经验
- 哪些规则、知识或流程应更新
## 标签
- `#case`
- `#alert/...`
- `#verdict/true-positive`
- `#verdict/false-positive`
- `#ttp/...`


@ -0,0 +1,59 @@
# Playbook Template
## 基本信息
- 名称:
- 适用告警类型:
- 场景:
- 最近更新时间:
- 负责人:
## 场景描述
这个 playbook 解决什么问题,适用于哪些前置条件。
## 输入信号
- 必要信号:
- 可选信号:
- 常见数据源:
## 调查步骤
1.
2.
3.
## 关键判断点
- 什么情况下倾向真报
- 什么情况下倾向误报
- 哪些证据最关键
## 常见误报模式
-
-
## 常见真报模式
-
-
## 升级 / 处置建议
-
-
## 关联内容
- 相关 Case:
- 相关 KB:
- 相关 IOC:
- 相关 TTP:
## 标签
- `#playbook`
- `#alert/...`
- `#ttp/...`


@ -0,0 +1,52 @@
# Report Summary Template
## 基本信息
- 标题:
- 来源:
- 日期:
- 作者 / 团队:
- 类型:
## 核心摘要
用 3 到 5 句话总结对 SOC 研判最有帮助的内容。
## 关键发现
- 发现 1:
- 发现 2:
- 发现 3:
## 关键实体
- 攻击者:
- 工具:
- 域名 / IP:
- Hash:
- 邮件主题 / 发件特征:
## 对 SOC 的实际价值
- 对哪些告警类型有帮助
- 对哪些 playbook 需要更新
- 对哪些规则或研判路径有启发
## 可沉淀记忆
- 哪些内容适合作为 Knowledge Memory
- 哪些内容适合作为 Case Pattern
## 关联内容
- 关联 KB:
- 关联 Playbook:
- 关联 Case:
- 关联 TTP:
## 标签
- `#report`
- `#intel`
- `#ttp/...`
- `#campaign/...`

obsidian-vault/README.md Normal file

@ -0,0 +1,15 @@
# Obsidian Vault
这个目录用于保存 Obsidian Vault 的推荐骨架。
原则:
- 只存高价值、可人工维护的沉淀
- 不存全量原始资料
- 不把 ticket 原文、报告全文直接塞进 Vault
建议优先建设:
- `01_Knowledge/`
- `02_Cases/`
- `05_Templates/`

pipeline/README.md Normal file

@ -0,0 +1,14 @@
# Pipeline
这个目录用于保存知识源接入和数据清洗流程。
建议优先接入:
- 历史 case
- KB / Playbook
后续再逐步扩展:
- ticket system
- intel system
- 月报 / 报告
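无论接入哪类知识源,批处理模式都相同:读取目录下的原始 JSON、逐个标准化、写出标准化 JSON。下面是一个极简的自包含示意`batch_normalize` 与传入的 `normalize` 均为演示用的假设名称,对应仓库中 `ingest_cases` / `ingest_kb` 的固定套路:

```python
import json
from pathlib import Path


def batch_normalize(input_dir: Path, output_dir: Path, normalize) -> list[Path]:
    """读取 input_dir 下所有 *.json,逐个调用 normalize 后写入 output_dir。"""
    output_dir.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for src in sorted(input_dir.rglob("*.json")):
        # normalize 接收原始 dict返回至少含 "id" 字段的标准化 dict
        item = normalize(json.loads(src.read_text(encoding="utf-8")))
        dest = output_dir / f"{item['id']}.json"
        dest.write_text(json.dumps(item, ensure_ascii=False, indent=2), encoding="utf-8")
        written.append(dest)
    return written
```

接入新数据源时,只需替换 `normalize` 的实现,目录遍历与落盘逻辑保持不变。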


@ -0,0 +1,41 @@
"""Batch-ingest mock case files and emit normalized case JSON documents."""
from __future__ import annotations
import json
from dataclasses import asdict
from pathlib import Path
from pipeline.transforms.normalize_case import load_and_normalize_case
def ingest_cases(input_dir: str | Path, output_dir: str | Path) -> list[Path]:
input_dir = Path(input_dir)
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
written: list[Path] = []
for src in sorted(input_dir.rglob("*.json")):
normalized = load_and_normalize_case(src)
dest = output_dir / f"{normalized.id}.json"
with dest.open("w", encoding="utf-8") as f:
json.dump(asdict(normalized), f, ensure_ascii=False, indent=2)
written.append(dest)
return written
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Normalize a directory of mock case JSON files.")
parser.add_argument("--input-dir", default="evaluation/datasets/mock_cases", help="Directory containing raw mock case files")
parser.add_argument("--output-dir", default="evaluation/datasets/normalized_cases", help="Directory to write normalized case files")
args = parser.parse_args()
written = ingest_cases(args.input_dir, args.output_dir)
print(f"normalized_cases={len(written)}")
for path in written:
print(path)
if __name__ == "__main__":
main()


@ -0,0 +1,41 @@
"""Batch-ingest mock KB/playbook files and emit normalized knowledge JSON documents."""
from __future__ import annotations
import json
from dataclasses import asdict
from pathlib import Path
from pipeline.transforms.normalize_kb import load_and_normalize_kb
def ingest_kb(input_dir: str | Path, output_dir: str | Path) -> list[Path]:
input_dir = Path(input_dir)
output_dir = Path(output_dir)
output_dir.mkdir(parents=True, exist_ok=True)
written: list[Path] = []
for src in sorted(input_dir.rglob("*.json")):
normalized = load_and_normalize_kb(src)
dest = output_dir / f"{normalized.id}.json"
with dest.open("w", encoding="utf-8") as f:
json.dump(asdict(normalized), f, ensure_ascii=False, indent=2)
written.append(dest)
return written
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Normalize a directory of mock KB/playbook JSON files.")
parser.add_argument("--input-dir", default="evaluation/datasets/mock_kb", help="Directory containing raw mock KB/playbook files")
parser.add_argument("--output-dir", default="evaluation/datasets/normalized_kb", help="Directory to write normalized KB/playbook files")
args = parser.parse_args()
written = ingest_kb(args.input_dir, args.output_dir)
print(f"normalized_kb={len(written)}")
for path in written:
print(path)
if __name__ == "__main__":
main()


@ -0,0 +1,91 @@
"""Normalize raw mock SOC cases into a retrieval-friendly structure.
This module is intentionally small and deterministic so it can be used with
mock data before real connectors are available.
"""
from __future__ import annotations
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Any
@dataclass
class NormalizedCase:
id: str
memory_type: str
scenario: str
title: str
abstract: str
verdict: str
severity: str
entities: dict[str, list[str]]
observables: dict[str, list[str]]
evidence: list[str]
patterns: list[str]
related_refs: dict[str, list[str]]
source_path: str
tags: list[str]
def _derive_patterns(raw_case: dict[str, Any]) -> list[str]:
"""Derive a small set of reusable patterns from the case payload."""
patterns: list[str] = []
verdict = raw_case.get("conclusion", {}).get("verdict")
if verdict:
patterns.append(f"verdict:{verdict}")
scenario = raw_case.get("scenario")
if scenario:
patterns.append(f"scenario:{scenario}")
alert_type = raw_case.get("alert_type")
if alert_type:
patterns.append(f"alert_type:{alert_type}")
return patterns
def normalize_case(raw_case: dict[str, Any], source_path: str = "") -> NormalizedCase:
"""Convert a raw case document into the internal normalized case model."""
conclusion = raw_case.get("conclusion", {})
return NormalizedCase(
id=raw_case["case_id"],
memory_type="case",
scenario=raw_case["scenario"],
title=raw_case["title"],
abstract=raw_case.get("summary", ""),
verdict=conclusion.get("verdict", raw_case.get("status", "unknown")),
severity=raw_case.get("severity", "unknown"),
entities=raw_case.get("entities", {}),
observables=raw_case.get("observables", {}),
evidence=raw_case.get("evidence", []),
patterns=_derive_patterns(raw_case),
related_refs=raw_case.get("related_refs", {}),
source_path=source_path,
tags=raw_case.get("tags", []),
)
def load_and_normalize_case(path: str | Path) -> NormalizedCase:
path = Path(path)
with path.open("r", encoding="utf-8") as f:
raw_case = json.load(f)
return normalize_case(raw_case, source_path=str(path))
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Normalize a mock SOC case JSON file.")
parser.add_argument("path", help="Path to a raw case JSON file")
args = parser.parse_args()
normalized = load_and_normalize_case(args.path)
print(json.dumps(asdict(normalized), ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()


@ -0,0 +1,63 @@
"""Normalize raw mock KB/playbook documents into a retrieval-friendly structure."""
from __future__ import annotations
import json
from dataclasses import dataclass, asdict
from pathlib import Path
from typing import Any
@dataclass
class NormalizedKnowledge:
id: str
memory_type: str
doc_type: str
scenario: str
title: str
abstract: str
key_points: list[str]
investigation_guidance: list[str]
decision_points: list[str]
related_refs: dict[str, list[str]]
source_path: str
tags: list[str]
def normalize_kb(raw_doc: dict[str, Any], source_path: str = "") -> NormalizedKnowledge:
"""Convert a raw KB or playbook document into the normalized knowledge model."""
return NormalizedKnowledge(
id=raw_doc["doc_id"],
memory_type="knowledge",
doc_type=raw_doc["doc_type"],
scenario=raw_doc["scenario"],
title=raw_doc["title"],
abstract=raw_doc.get("summary", ""),
key_points=raw_doc.get("key_points", []),
investigation_guidance=raw_doc.get("investigation_guidance", []),
decision_points=raw_doc.get("decision_points", []),
related_refs=raw_doc.get("related_refs", {}),
source_path=source_path,
tags=raw_doc.get("tags", []),
)
def load_and_normalize_kb(path: str | Path) -> NormalizedKnowledge:
path = Path(path)
with path.open("r", encoding="utf-8") as f:
raw_doc = json.load(f)
return normalize_kb(raw_doc, source_path=str(path))
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Normalize a mock KB or playbook JSON file.")
parser.add_argument("path", help="Path to a raw KB/playbook JSON file")
args = parser.parse_args()
normalized = load_and_normalize_kb(args.path)
print(json.dumps(asdict(normalized), ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()

pyproject.toml Normal file

@ -0,0 +1,33 @@
[project]
name = "memory-gateway"
version = "0.1.0"
description = "基于 OpenViking 的统一记忆入口 MCP Server"
readme = "README.md"
requires-python = ">=3.10"
dependencies = [
"fastapi>=0.109.0",
"sse-starlette>=2.0.0",
"mcp[cli]>=1.1.0",
"httpx>=0.26.0",
"pydantic>=2.5.0",
"pyyaml>=6.0",
"uvicorn>=0.27.0",
"tenacity>=8.2.0",
]
[project.optional-dependencies]
dev = [
"pytest>=8.0.0",
"pytest-asyncio>=0.23.0",
"ruff>=0.1.0",
]
[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"
[tool.uv]
dev-dependencies = []
[tool.ruff]
target-version = "py310"

skills/README.md Normal file

@ -0,0 +1,17 @@
# Skills
建议优先落地的 skills
- `ingest_skill`
- `extract_memory_skill`
- `classify_memory_skill`
- `retrieve_context_skill`
- `summarize_case_skill`
- `commit_memory_skill`
- `prune_memory_skill`
POC 第一阶段建议先做:
- `retrieve_context_skill`
- `summarize_case_skill`
- `commit_memory_skill`


@ -0,0 +1,36 @@
# commit_memory_skill
这个 skill 负责把标准化后的高价值记忆写回 OpenViking。
## 当前阶段职责
第一阶段优先把标准化后的 `case` 与 `knowledge` 以 resource 形式写入 OpenViking。
原因:
- 结构化数据适合用 URI 明确组织
- 相比通过会话提交 `add_memory`resource 写入更可控
- 便于后续按 namespace 和 URI 组织 case / knowledge / report
## 第一阶段输入
- 标准化后的 case JSON
- 标准化后的 KB / Playbook JSON
## 第一阶段输出
- OpenViking resource 写入结果
- 统一 URI 组织的资源
## 默认 URI 约定
- case: `viking://soc/case/<scenario>/<id>`
- knowledge: `viking://soc/knowledge/<doc_type>/<id>`
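上面的 URI 约定可以用一个极简函数示意(`build_uri` 为演示用的假设函数;实际的 commit job 在此基础上还加了 `resources/soc-memory-poc` 前缀):

```python
def build_uri(item: dict) -> str:
    """按本节约定,由 memory_type + 场景/文档类型 + id 拼出 URI仅为示意。"""
    if item["memory_type"] == "case":
        return f"viking://soc/case/{item.get('scenario', 'general')}/{item['id']}"
    if item["memory_type"] == "knowledge":
        return f"viking://soc/knowledge/{item.get('doc_type', 'general')}/{item['id']}"
    raise ValueError(f"unsupported memory_type: {item['memory_type']}")
```

这样 case / knowledge 在 namespace 内按场景或文档类型分层,后续检索和引用都可以直接按前缀过滤。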
## 后续扩展
后续可以在 resource 写入稳定后,再增加:
- 高价值 summary 写入 `memory`
- EverMemOS 提炼结果回灌
- Obsidian / OpenViking 双写策略


@ -0,0 +1,29 @@
# commit_memory_skill
## 用途
把已经过标准化和筛选的 case / knowledge 内容写入 OpenViking。
## 当前默认策略
第一阶段只做 resource 写入,不强行做复杂 memory 演化。
- `case` -> `viking://soc/case/<scenario>/<id>`
- `knowledge` -> `viking://soc/knowledge/<doc_type>/<id>`
## 输入
- 标准化后的 case / knowledge JSON 文件
- OpenViking 配置URL / API Key
## 输出
- 写入结果
- 目标 URI
- 成功 / 失败状态
## 成功标准
- 可以把本地标准化样本成功写入 OpenViking
- URI 组织符合 namespace 设计
- 后续可以被检索和引用


@ -0,0 +1,89 @@
"""Commit normalized SOC memory items to OpenViking as structured resources."""
from __future__ import annotations
import argparse
import asyncio
import json
from pathlib import Path
from typing import Any
from memory_gateway.openviking_client import OpenVikingClient
def build_resource_uri(item: dict[str, Any]) -> str:
memory_type = item.get("memory_type")
item_id = item["id"]
if memory_type == "case":
scenario = item.get("scenario", "general")
return f"viking://resources/soc-memory-poc/case/{scenario}/{item_id}.json"
if memory_type == "knowledge":
doc_type = item.get("doc_type", "general")
return f"viking://resources/soc-memory-poc/knowledge/{doc_type}/{item_id}.json"
raise ValueError(f"Unsupported memory_type for commit: {memory_type}")
def load_item(path: str | Path) -> dict[str, Any]:
path = Path(path)
with path.open("r", encoding="utf-8") as f:
return json.load(f)
async def commit_file(path: str | Path, client: OpenVikingClient) -> dict[str, Any]:
item = load_item(path)
uri = build_resource_uri(item)
result = await client.add_resource(
uri=uri,
content=json.dumps(item, ensure_ascii=False, indent=2),
resource_type="json",
wait=False,
)
return {
"path": str(path),
"uri": uri,
"result": result,
}
async def commit_directory(directory: str | Path, client: OpenVikingClient, limit: int | None = None) -> list[dict[str, Any]]:
directory = Path(directory)
paths = sorted(directory.rglob("*.json"))
if limit is not None:
paths = paths[:limit]
results: list[dict[str, Any]] = []
for path in paths:
results.append(await commit_file(path, client))
return results
async def main_async(args: argparse.Namespace) -> None:
client = OpenVikingClient()
try:
if args.path:
result = await commit_file(args.path, client)
print(json.dumps(result, ensure_ascii=False, indent=2))
else:
results = await commit_directory(args.directory, client, limit=args.limit)
print(json.dumps(results, ensure_ascii=False, indent=2))
finally:
await client.close()
def main() -> None:
parser = argparse.ArgumentParser(description="Commit normalized SOC items to OpenViking.")
parser.add_argument("--path", help="Single normalized JSON file to commit")
parser.add_argument("--directory", help="Directory of normalized JSON files to commit")
parser.add_argument("--limit", type=int, default=None, help="Optional limit for directory commits")
args = parser.parse_args()
if not args.path and not args.directory:
parser.error("Either --path or --directory is required")
asyncio.run(main_async(args))
if __name__ == "__main__":
main()


@ -0,0 +1,42 @@
# retrieve_context_skill
这个 skill 用于根据当前 case 的关键信号,从 OpenViking 或 mock dataset 中召回最相关的上下文。
## 目标
输入当前 case 的场景、告警类型、IOC、描述输出一组排序后的相关内容
- 相似历史 case
- 相关 KB
- 相关 Playbook
- 关键 decision points
## 第一阶段输入
- `scenario`
- `alert_type`
- `summary`
- `entities`
- `observables`
- `top_k`
## 第一阶段输出
- `matched_cases`
- `matched_knowledge`
- `decision_points`
- `next_actions`
## 第一阶段检索策略
1. 先按 `scenario` 过滤
2. 再按 `alert_type`、IOC、关键词做匹配
3. 再按 evidence / tags 做轻量重排序
4. 输出 top-k
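上述四步策略可以用如下极简打分示意(字段名与权重均为演示假设,并非最终实现):

```python
def rank_cases(query: dict, cases: list[dict], top_k: int = 3) -> list[dict]:
    # 1. 先按 scenario 过滤
    candidates = [c for c in cases if c.get("scenario") == query["scenario"]]

    def score(case: dict) -> int:
        s = 0
        # 2. alert_type 与 IOC 命中加分
        if query.get("alert_type") and case.get("alert_type") == query["alert_type"]:
            s += 20
        s += 8 * len(set(query.get("observables", [])) & set(case.get("observables", [])))
        # 3. 按 tags 做轻量重排序
        s += len(set(query.get("tags", [])) & set(case.get("tags", [])))
        return s

    # 4. 输出 top-k
    return sorted(candidates, key=score, reverse=True)[:top_k]
```

这一结构刻意保持确定性,便于在 mock 数据上人工核对排序结果。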
## 第一阶段不做
- 向量检索
- 图检索
- 个性化排序
- 多源复杂重排


@ -0,0 +1,39 @@
# retrieve_context_skill
## 用途
在 SOC case 研判时,为 agent 检索最相关的历史 case 和知识上下文。
## 输入
- `scenario`: 场景,如 `phishing` 或 `o365_suspicious_login`
- `alert_type`: 告警类型
- `summary`: 当前 case 摘要
- `entities`: 用户、主机、邮箱等
- `observables`: 域名、IP、URL、Hash 等
- `top_k`: 期望返回条数
## 输出
- 相关历史 case 列表
- 相关 KB / Playbook 列表
- 关键 evidence / decision points
- 推荐下一步调查动作
## 默认检索顺序
1. `session/<session_id>`
2. `soc/case`
3. `soc/knowledge`
4. `agent/<agent_id>`
5. `user/<user_id>`
## Mock 阶段工作方式
在没有真实数据和完整 OpenViking 检索链路时,先使用 `evaluation/datasets/mock_cases/` 与 `evaluation/datasets/mock_kb/` 做本地检索验证。
## 成功标准
- 钓鱼 case 能召回钓鱼 playbook 和相似 phishing case
- O365 异常登录 case 能召回登录异常 KB 和相似 case
- 返回结果对人工 reviewer 看起来是“有帮助的上下文”,而不是泛资料堆积


@ -0,0 +1,216 @@
"""Retrieval entrypoint for SOC Memory POC.
Supports two modes:
- local: retrieve from normalized mock datasets
- openviking: retrieve from OpenViking resource namespaces and filter results
"""
from __future__ import annotations
import asyncio
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any
from memory_gateway.openviking_client import OpenVikingClient
CASE_URI_PREFIX = "viking://resources/soc-memory-poc/case"
KNOWLEDGE_URI_PREFIX = "viking://resources/soc-memory-poc/knowledge"
def _load_json_dir(path: str | Path) -> list[dict[str, Any]]:
path = Path(path)
items: list[dict[str, Any]] = []
for file in sorted(path.rglob("*.json")):
with file.open("r", encoding="utf-8") as f:
items.append(json.load(f))
return items
@dataclass
class RetrievalQuery:
scenario: str
alert_type: str = ""
summary: str = ""
entities: dict[str, list[str]] | None = None
observables: dict[str, list[str]] | None = None
top_k: int = 3
def _flatten_values(data: dict[str, list[str]] | None) -> set[str]:
if not data:
return set()
values: set[str] = set()
for items in data.values():
values.update(str(item).lower() for item in items)
return values
def _score_case(query: RetrievalQuery, item: dict[str, Any]) -> int:
score = 0
if item.get("scenario") == query.scenario:
score += 50
for pattern in item.get("patterns", []):
if query.alert_type and pattern == f"alert_type:{query.alert_type}":
score += 20
query_observables = _flatten_values(query.observables)
item_observables = _flatten_values(item.get("observables"))
score += 8 * len(query_observables & item_observables)
summary = query.summary.lower()
haystacks = [item.get("title", "").lower(), item.get("abstract", "").lower()]
for token in [t for t in summary.split() if len(t) > 4]:
if any(token in text for text in haystacks):
score += 2
return score
def _score_knowledge(query: RetrievalQuery, item: dict[str, Any]) -> int:
score = 0
if item.get("scenario") == query.scenario:
score += 40
title = item.get("title", "").lower()
abstract = item.get("abstract", "").lower()
for token in [t for t in query.summary.lower().split() if len(t) > 4]:
if token in title or token in abstract:
score += 2
if query.alert_type and query.alert_type in " ".join(item.get("related_refs", {}).get("cases", [])).lower():
score += 5
return score
def retrieve_context_local(
query: RetrievalQuery,
cases_dir: str | Path = "evaluation/datasets/normalized_cases",
knowledge_dir: str | Path = "evaluation/datasets/normalized_kb",
) -> dict[str, Any]:
cases = _load_json_dir(cases_dir)
knowledge = _load_json_dir(knowledge_dir)
ranked_cases = sorted(
({"score": _score_case(query, item), "item": item} for item in cases),
key=lambda x: x["score"],
reverse=True,
)
ranked_knowledge = sorted(
({"score": _score_knowledge(query, item), "item": item} for item in knowledge),
key=lambda x: x["score"],
reverse=True,
)
matched_cases = [entry for entry in ranked_cases if entry["score"] > 0][: query.top_k]
matched_knowledge = [entry for entry in ranked_knowledge if entry["score"] > 0][: query.top_k]
decision_points: list[str] = []
next_actions: list[str] = []
for entry in matched_knowledge:
item = entry["item"]
decision_points.extend(item.get("decision_points", []))
next_actions.extend(item.get("investigation_guidance", []))
return {
"backend": "local",
"query": asdict(query),
"matched_cases": matched_cases,
"matched_knowledge": matched_knowledge,
"decision_points": decision_points[: query.top_k],
"next_actions": next_actions[: query.top_k],
}
def _canonicalize_resource_uri(uri: str) -> str:
if ".json/" in uri:
return uri.split(".json/", 1)[0] + ".json"
return uri
def _query_text(query: RetrievalQuery) -> str:
parts = [query.scenario, query.alert_type, query.summary]
parts.extend(sorted(_flatten_values(query.observables)))
return " ".join(part for part in parts if part).strip()
def _dedupe_openviking_results(results: list[dict[str, Any]], prefix: str) -> list[dict[str, Any]]:
deduped: dict[str, dict[str, Any]] = {}
for item in results:
uri = item.get("uri") or ""
if not uri.startswith(prefix):
continue
canonical_uri = _canonicalize_resource_uri(uri)
score = item.get("score") or 0
existing = deduped.get(canonical_uri)
payload = {
"uri": canonical_uri,
"abstract": item.get("abstract", ""),
"score": score,
"context_type": item.get("context_type"),
"source_uri": uri,
}
if existing is None or score > existing.get("score", 0):
deduped[canonical_uri] = payload
return sorted(deduped.values(), key=lambda x: x["score"], reverse=True)
async def retrieve_context_openviking(
query: RetrievalQuery,
case_uri: str = CASE_URI_PREFIX,
knowledge_uri: str = KNOWLEDGE_URI_PREFIX,
) -> dict[str, Any]:
client = OpenVikingClient()
try:
query_text = _query_text(query)
case_result = await client.search(query=query_text, uri=case_uri, limit=max(query.top_k * 5, 10))
knowledge_result = await client.search(query=query_text, uri=knowledge_uri, limit=max(query.top_k * 5, 10))
matched_cases = _dedupe_openviking_results(case_result.results, case_uri)[: query.top_k]
matched_knowledge = _dedupe_openviking_results(knowledge_result.results, knowledge_uri)[: query.top_k]
return {
"backend": "openviking",
"query": asdict(query),
"matched_cases": matched_cases,
"matched_knowledge": matched_knowledge,
"decision_points": [],
"next_actions": [],
}
finally:
await client.close()
def main() -> None:
import argparse
parser = argparse.ArgumentParser(description="Retrieve SOC context from local datasets or OpenViking.")
parser.add_argument("--backend", choices=["local", "openviking"], default="local", help="Retrieval backend")
parser.add_argument("--scenario", required=True, help="Scenario, e.g. phishing or o365_suspicious_login")
parser.add_argument("--alert-type", default="", help="Alert type")
parser.add_argument("--summary", default="", help="Short case summary")
parser.add_argument("--top-k", type=int, default=3, help="Number of results to return")
parser.add_argument("--cases-dir", default="evaluation/datasets/normalized_cases", help="Normalized case dataset directory")
parser.add_argument("--knowledge-dir", default="evaluation/datasets/normalized_kb", help="Normalized knowledge dataset directory")
parser.add_argument("--case-uri", default=CASE_URI_PREFIX, help="OpenViking case URI prefix")
parser.add_argument("--knowledge-uri", default=KNOWLEDGE_URI_PREFIX, help="OpenViking knowledge URI prefix")
args = parser.parse_args()
query = RetrievalQuery(
scenario=args.scenario,
alert_type=args.alert_type,
summary=args.summary,
top_k=args.top_k,
)
if args.backend == "openviking":
result = asyncio.run(retrieve_context_openviking(query, case_uri=args.case_uri, knowledge_uri=args.knowledge_uri))
else:
result = retrieve_context_local(query, cases_dir=args.cases_dir, knowledge_dir=args.knowledge_dir)
print(json.dumps(result, ensure_ascii=False, indent=2))
if __name__ == "__main__":
main()
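
The overlap scoring used by `_score_case` can be exercised with a minimal standalone sketch. The weights (50 / 20 / 8) mirror the function above; the query and case data here are invented toy values, not part of the real dataset:

```python
# Toy re-implementation of the scenario/pattern/observable-overlap scoring idea.
def flatten_values(data):
    """Collect every value across all lists in the dict, lower-cased."""
    values = set()
    for items in (data or {}).values():
        values.update(str(item).lower() for item in items)
    return values

def score_case(query, item):
    score = 0
    if item.get("scenario") == query["scenario"]:
        score += 50  # same scenario is the strongest signal
    if f"alert_type:{query['alert_type']}" in item.get("patterns", []):
        score += 20  # exact alert-type pattern match
    overlap = flatten_values(query.get("observables")) & flatten_values(item.get("observables"))
    score += 8 * len(overlap)  # each shared observable adds weight
    return score

query = {
    "scenario": "phishing",
    "alert_type": "suspicious_email",
    "observables": {"domains": ["evil.example"], "ips": ["198.51.100.7"]},
}
case = {
    "scenario": "phishing",
    "patterns": ["alert_type:suspicious_email"],
    "observables": {"domains": ["evil.example"]},
}
print(score_case(query, case))  # 50 + 20 + 8 = 78
```

Because the heuristic is purely additive, a case that matches scenario, alert type, and one observable always outranks one that matches scenario alone.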


@ -0,0 +1,17 @@
# summarize_case_skill
This skill turns a normalized SOC case record into a reusable Obsidian case note.
Current scope:
- input: normalized case JSON from `evaluation/datasets/normalized_cases/`
- output: markdown case note under `obsidian-vault/02_Cases/`
- goal: produce a clean analyst-facing note, not a raw process dump
Typical usage:
```bash
source /home/tom/OpenViking/.venv/bin/activate
PYTHONPATH=/home/tom/soc_memory_poc python /home/tom/soc_memory_poc/skills/summarize_case_skill/generate_case_note.py \
--input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json \
--output-dir /home/tom/soc_memory_poc/obsidian-vault/02_Cases
```


@ -0,0 +1,21 @@
# summarize_case_skill
## Purpose
Summarize one normalized SOC case into a high-quality Obsidian case note that can be reviewed and maintained by analysts.
## Inputs
- A normalized case JSON document
- Optional output directory for Obsidian notes
## Outputs
- One markdown case note per case
- Stable structure aligned with the vault template
## Guardrails
- Do not dump raw logs or full tool traces
- Keep only reusable evidence, conclusions, and response guidance
- Prefer linked references to playbooks, KBs, and related cases
- Preserve case identifiers and observable values exactly
## Current implementation
Use `generate_case_note.py` to render a local markdown note from a normalized case.
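
For reference, a hypothetical normalized case document showing the fields `generate_case_note.py` consumes (field names are taken from the code; all values are invented):

```json
{
  "id": "CASE-2026-9999",
  "title": "Example phishing case",
  "scenario": "phishing",
  "severity": "medium",
  "verdict": "true_positive",
  "abstract": "Short analyst-facing summary of the alert.",
  "patterns": ["alert_type:suspicious_email", "verdict:true_positive"],
  "entities": {"users": ["alice"], "hosts": [], "mailboxes": ["alice@example.com"]},
  "observables": {"ips": [], "domains": ["evil.example"], "hashes": [], "urls": [], "sender_emails": []},
  "evidence": ["Sender domain registered two days before the campaign."],
  "related_refs": {"playbooks": ["PB-PHISHING-001"], "kb": [], "cases": []},
  "tags": ["credential-harvesting"]
}
```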


@ -0,0 +1,346 @@
"""Generate an Obsidian case note from a normalized SOC case JSON file."""
from __future__ import annotations
import argparse
import asyncio
import json
from pathlib import Path
from typing import Any
from skills.retrieve_context_skill.retrieve_context import RetrievalQuery, retrieve_context_openviking
def _load_case(path: str | Path) -> dict[str, Any]:
with Path(path).open("r", encoding="utf-8") as f:
return json.load(f)
def _extract_alert_type(patterns: list[str]) -> str:
for pattern in patterns:
if pattern.startswith("alert_type:"):
return pattern.split(":", 1)[1]
return "unknown"
def _verdict_label(verdict: str) -> str:
mapping = {
"true_positive": "真报",
"false_positive": "误报",
"suspicious": "可疑待定",
}
return mapping.get(verdict, verdict or "未知")
def _join_values(values: list[str]) -> str:
return ", ".join(values) if values else ""
def _bullet_lines(values: list[str], default: str = "- 无") -> str:
if not values:
return default
return "\n".join(f"- {value}" for value in values)
def _wikilinks(values: list[str]) -> str:
if not values:
return ""
return ", ".join(f"[[{value}]]" for value in values)
def _uri_to_id(uri: str) -> str:
name = uri.rstrip("/").rsplit("/", 1)[-1]
if name.endswith(".json"):
name = name[:-5]
return name
def _derive_process_summary(item: dict[str, Any]) -> list[str]:
steps: list[str] = []
if item.get("abstract"):
steps.append(f"确认告警场景与核心风险:{item['abstract']}")
if item.get("evidence"):
steps.append(f"提取关键证据并交叉验证:{item['evidence'][0]}")
related = item.get("related_refs", {})
if related.get("playbooks") or related.get("kb"):
steps.append("对照关联 playbook / KB 复核告警模式与处置路径。")
if item.get("verdict"):
steps.append(f"基于关键证据与场景模式完成结论判定:{_verdict_label(item['verdict'])}")
return steps[:4]
def _derive_disposition(item: dict[str, Any]) -> list[str]:
verdict = item.get("verdict", "")
evidence = item.get("evidence", [])
lines: list[str] = []
if verdict:
lines.append(f"结论为{_verdict_label(verdict)}")
if evidence:
lines.append(f"最关键依据:{evidence[0]}")
if len(evidence) > 1:
lines.append(f"补充依据:{evidence[1]}")
return lines
def _derive_actions(item: dict[str, Any]) -> list[str]:
scenario = item.get("scenario", "")
verdict = item.get("verdict", "")
actions: list[str] = []
if scenario == "phishing":
actions.extend([
"隔离相同主题、发件人或 URL 的邮件样本。",
"核查用户是否点击或提交凭据,并按需执行凭据重置。",
])
elif scenario == "o365_suspicious_login":
actions.extend([
"复核登录来源、MFA 事件和后续邮箱规则或 OAuth 变更。",
"若存在账号接管迹象,立即执行会话失效和凭据重置。",
])
else:
actions.append("结合关联 playbook 执行后续处置。")
if verdict == "false_positive":
actions = ["记录误报原因,并更新检测例外或抑制条件。"]
return actions
def _derive_reusable_patterns(item: dict[str, Any]) -> tuple[list[str], list[str], list[str]]:
patterns = item.get("patterns", [])
tags = item.get("tags", [])
hit_patterns = [pattern for pattern in patterns if not pattern.startswith("verdict:")]
false_positive_traits = []
variants = []
if item.get("verdict") == "false_positive":
false_positive_traits.append("本案最终确认为误报,可用于补充抑制条件。")
if tags:
variants.append("相关标签:" + ", ".join(tags))
    # Return the raw lists; _join_values already renders "无" for empty lists,
    # so padding with [""] would mask that fallback.
    return hit_patterns, false_positive_traits, variants
async def _fetch_openviking_recommendations(item: dict[str, Any], top_k: int = 3) -> dict[str, list[dict[str, Any]]]:
query = RetrievalQuery(
scenario=item.get("scenario", "general"),
alert_type=_extract_alert_type(item.get("patterns", [])),
summary=item.get("abstract", ""),
observables=item.get("observables"),
top_k=top_k + 1,
)
result = await retrieve_context_openviking(query)
case_entries: list[dict[str, Any]] = []
for entry in result.get("matched_cases", []):
candidate_id = _uri_to_id(entry.get("uri", ""))
if candidate_id == item.get("id"):
continue
case_entries.append(
{
"id": candidate_id,
"score": round(float(entry.get("score") or 0), 3),
"abstract": entry.get("abstract", ""),
}
)
if len(case_entries) >= top_k:
break
knowledge_entries: list[dict[str, Any]] = []
for entry in result.get("matched_knowledge", []):
knowledge_entries.append(
{
"id": _uri_to_id(entry.get("uri", "")),
"score": round(float(entry.get("score") or 0), 3),
"abstract": entry.get("abstract", ""),
}
)
if len(knowledge_entries) >= top_k:
break
return {
"cases": case_entries,
"knowledge": knowledge_entries,
}
def _merge_unique(primary: list[str], secondary: list[str]) -> list[str]:
merged: list[str] = []
for value in primary + secondary:
if value and value not in merged:
merged.append(value)
return merged
def _recommendation_lines(entries: list[dict[str, Any]], prefix: str) -> list[str]:
lines: list[str] = []
for entry in entries:
abstract = entry.get("abstract", "")
abstract = abstract[:140] + "..." if len(abstract) > 140 else abstract
lines.append(f"[[{entry['id']}]] ({prefix} score={entry['score']}) {abstract}")
return lines
def render_case_note(item: dict[str, Any], recommendations: dict[str, list[dict[str, Any]]] | None = None) -> str:
case_id = item["id"]
title = item.get("title", case_id)
alert_type = _extract_alert_type(item.get("patterns", []))
severity = item.get("severity", "unknown")
verdict = _verdict_label(item.get("verdict", ""))
entities = item.get("entities", {})
observables = item.get("observables", {})
related = item.get("related_refs", {})
recommendations = recommendations or {"cases": [], "knowledge": []}
recommended_cases = [entry["id"] for entry in recommendations.get("cases", [])]
recommended_knowledge = [entry["id"] for entry in recommendations.get("knowledge", [])]
merged_cases = _merge_unique(related.get("cases", []), recommended_cases)
playbooks = related.get("playbooks", [])
kb_items = related.get("kb", [])
for knowledge_id in recommended_knowledge:
if knowledge_id.startswith("PB-"):
playbooks = _merge_unique(playbooks, [knowledge_id])
else:
kb_items = _merge_unique(kb_items, [knowledge_id])
process_summary = _derive_process_summary(item)
disposition = _derive_disposition(item)
actions = _derive_actions(item)
hit_patterns, false_positive_traits, variants = _derive_reusable_patterns(item)
tags = ["#case", f"#scenario/{item.get('scenario', 'general')}", f"#alert/{alert_type}"]
if item.get("verdict"):
tags.append(f"#verdict/{item['verdict'].replace('_', '-')}")
tags.extend(f"#{tag}" for tag in item.get("tags", []))
recommendation_case_lines = _recommendation_lines(recommendations.get("cases", []), "case")
recommendation_knowledge_lines = _recommendation_lines(recommendations.get("knowledge", []), "knowledge")
lines = [
"---",
f"case_id: {case_id}",
f"scenario: {item.get('scenario', 'general')}",
f"alert_type: {alert_type}",
f"severity: {severity}",
f"verdict: {item.get('verdict', 'unknown')}",
"source: soc-memory-poc",
f"openviking_enriched: {'true' if recommendation_case_lines or recommendation_knowledge_lines else 'false'}",
"---",
"",
f"# {case_id} {title}",
"",
"## 基本信息",
"",
f"- Case ID: {case_id}",
f"- 标题: {title}",
f"- 告警类型: {alert_type}",
        "- 来源系统: SOC Memory POC Mock Dataset",
        "- 时间范围: 待补充",
        "- 研判人 / Agent: AI Agent Draft",
f"- 最终结论: {verdict}",
f"- 严重等级: {severity}",
"",
"## 告警摘要",
"",
item.get("abstract", ""),
"",
"## 关键实体",
"",
f"- 用户: {_join_values(entities.get('users', []))}",
f"- 主机: {_join_values(entities.get('hosts', []))}",
f"- 邮箱: {_join_values(entities.get('mailboxes', []))}",
f"- IP: {_join_values(observables.get('ips', []))}",
f"- 域名: {_join_values(observables.get('domains', []))}",
f"- 文件 Hash: {_join_values(observables.get('hashes', []))}",
f"- 其他 IOC: {_join_values(observables.get('urls', []) + observables.get('sender_emails', []))}",
"",
"## 关键证据",
"",
_bullet_lines(item.get("evidence", [])),
"",
"## 研判过程摘要",
"",
"\n".join(f"{index}. {step}" for index, step in enumerate(process_summary, start=1)),
"",
"## 结论依据",
"",
_bullet_lines(disposition),
"",
"## 处置建议",
"",
_bullet_lines(actions),
"",
"## 可复用模式",
"",
f"- 命中模式: {_join_values(hit_patterns)}",
f"- 误报特征: {_join_values(false_positive_traits)}",
f"- 需关注的变体: {_join_values(variants)}",
"",
"## 关联知识",
"",
f"- 关联 Playbook: {_wikilinks(playbooks)}",
f"- 关联 KB: {_wikilinks(kb_items)}",
f"- 关联历史 Case: {_wikilinks(merged_cases)}",
f"- 关联实体: {_wikilinks(entities.get('users', []) + entities.get('hosts', []))}",
"",
"## 自动关联推荐",
"",
"### 推荐历史 Case",
"",
_bullet_lines(recommendation_case_lines),
"",
"### 推荐知识条目",
"",
_bullet_lines(recommendation_knowledge_lines),
"",
"## Lessons Learned",
"",
"- 本案可沉淀为后续同类告警的快速判定参考。",
"- 若后续出现相同 lure、同类登录模式或相同关键证据应优先联想本案与关联知识。",
"",
"## 标签",
"",
_bullet_lines(tags),
"",
]
return "\n".join(lines)
def build_output_path(item: dict[str, Any], output_dir: str | Path) -> Path:
scenario = item.get("scenario", "general")
case_id = item["id"]
safe_title = item.get("title", case_id).replace("/", "-")
return Path(output_dir) / scenario / f"{case_id} - {safe_title}.md"
async def generate_case_note_async(
input_path: str | Path,
output_dir: str | Path,
enrich_from_openviking: bool = False,
top_k: int = 3,
) -> Path:
item = _load_case(input_path)
recommendations: dict[str, list[dict[str, Any]]] | None = None
if enrich_from_openviking:
recommendations = await _fetch_openviking_recommendations(item, top_k=top_k)
output_path = build_output_path(item, output_dir)
output_path.parent.mkdir(parents=True, exist_ok=True)
output_path.write_text(render_case_note(item, recommendations=recommendations), encoding="utf-8")
return output_path
def main() -> None:
parser = argparse.ArgumentParser(description="Generate an Obsidian case note from a normalized case JSON file.")
parser.add_argument("--input", required=True, help="Normalized case JSON path")
parser.add_argument("--output-dir", default="obsidian-vault/02_Cases", help="Obsidian cases output directory")
parser.add_argument("--enrich-from-openviking", action="store_true", help="Retrieve related cases and knowledge from OpenViking")
parser.add_argument("--top-k", type=int, default=3, help="Number of OpenViking recommendations per type")
args = parser.parse_args()
output_path = asyncio.run(
generate_case_note_async(
args.input,
args.output_dir,
enrich_from_openviking=args.enrich_from_openviking,
top_k=args.top_k,
)
)
print(output_path)
if __name__ == "__main__":
main()
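
The order-preserving merge and wikilink rendering above can be reduced to a small standalone sketch (same logic as `_merge_unique` / `_wikilinks`; the case IDs are invented):

```python
# Order-preserving, de-duplicating merge: primary entries keep priority,
# empty strings and duplicates from the secondary list are dropped.
def merge_unique(primary, secondary):
    merged = []
    for value in primary + secondary:
        if value and value not in merged:
            merged.append(value)
    return merged

# Render Obsidian wikilinks, falling back to "无" for an empty list.
def wikilinks(values):
    return ", ".join(f"[[{value}]]" for value in values) if values else "无"

linked = merge_unique(["CASE-2026-0001"], ["CASE-2026-0002", "CASE-2026-0001", ""])
print(wikilinks(linked))  # [[CASE-2026-0001]], [[CASE-2026-0002]]
```

This is why OpenViking recommendations never displace the curated `related_refs` entries: the curated list is passed as `primary`, so it always renders first.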

170
tests/test_server.py Normal file

@ -0,0 +1,170 @@
import sys
import types
from fastapi.responses import StreamingResponse
from fastapi.testclient import TestClient
def install_test_stubs() -> None:
if "mcp.server" not in sys.modules:
mcp_module = types.ModuleType("mcp")
mcp_server_module = types.ModuleType("mcp.server")
mcp_types_module = types.ModuleType("mcp.types")
class Server:
def __init__(self, name):
self.name = name
def list_tools(self):
def decorator(func):
return func
return decorator
def call_tool(self):
def decorator(func):
return func
return decorator
class Tool:
def __init__(self, name, description, inputSchema):
self.name = name
self.description = description
self.inputSchema = inputSchema
class TextContent:
def __init__(self, type, text):
self.type = type
self.text = text
def model_dump(self):
return {"type": self.type, "text": self.text}
mcp_server_module.Server = Server
mcp_types_module.Tool = Tool
mcp_types_module.TextContent = TextContent
sys.modules["mcp"] = mcp_module
sys.modules["mcp.server"] = mcp_server_module
sys.modules["mcp.types"] = mcp_types_module
if "sse_starlette" not in sys.modules:
sse_module = types.ModuleType("sse_starlette")
class EventSourceResponse(StreamingResponse):
def __init__(self, content, *args, **kwargs):
            super().__init__(content, *args, media_type="text/event-stream", **kwargs)
sse_module.EventSourceResponse = EventSourceResponse
sys.modules["sse_starlette"] = sse_module
install_test_stubs()
from memory_gateway.server import app
from memory_gateway.types import Config, SearchResult, ServerConfig
class FakeOVClient:
async def health_check(self):
return {"status": "ok", "backend": "fake"}
async def search(self, query, namespace=None, limit=None, uri=None):
return SearchResult(
results=[
{
"uri": "viking://soc/test",
"abstract": query,
"score": 1.0,
"context_type": "memory",
}
],
total=1,
)
async def add_memory(self, content, namespace=None, memory_type="general"):
return {
"status": "ok",
"content": content,
"namespace": namespace,
"memory_type": memory_type,
}
async def add_resource(self, uri, content, resource_type="text"):
return {
"status": "ok",
"uri": uri,
"content": content,
"resource_type": resource_type,
}
async def list_memories(self, namespace=None, memory_type=None, limit=None):
return []
async def list_resources(self, namespace=None, limit=None):
return []
async def fake_get_openviking_client():
return FakeOVClient()
def build_headers(api_key: str | None):
return {"x-api-key": api_key} if api_key is not None else {}
def test_health_requires_api_key(monkeypatch):
monkeypatch.setattr(
"memory_gateway.server.get_config",
lambda: Config(server=ServerConfig(api_key="secret")),
)
monkeypatch.setattr(
"memory_gateway.server.get_openviking_client",
fake_get_openviking_client,
)
with TestClient(app) as client:
response = client.get("/health")
assert response.status_code == 401
response = client.get("/health", headers=build_headers("secret"))
assert response.status_code == 200
assert response.json()["openviking"]["status"] == "ok"
def test_mcp_rpc_lists_tools_with_api_key(monkeypatch):
monkeypatch.setattr(
"memory_gateway.server.get_config",
lambda: Config(server=ServerConfig(api_key="secret")),
)
monkeypatch.setattr(
"memory_gateway.server.get_openviking_client",
fake_get_openviking_client,
)
with TestClient(app) as client:
response = client.post(
"/mcp/rpc",
json={"jsonrpc": "2.0", "id": 1, "method": "tools/list", "params": {}},
headers=build_headers("secret"),
)
assert response.status_code == 200
payload = response.json()
assert payload["jsonrpc"] == "2.0"
assert len(payload["result"]["tools"]) == 6
def test_search_passes_through_gateway(monkeypatch):
monkeypatch.setattr(
"memory_gateway.server.get_config",
lambda: Config(server=ServerConfig(api_key="")),
)
monkeypatch.setattr(
"memory_gateway.server.get_openviking_client",
fake_get_openviking_client,
)
with TestClient(app) as client:
response = client.post("/api/search", json={"query": "phishing"})
assert response.status_code == 200
payload = response.json()
assert payload["total"] == 1
assert payload["results"][0]["abstract"] == "phishing"
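
The stub-injection pattern in `install_test_stubs` boils down to one move: register a fake module object in `sys.modules` before the code under test imports it. A minimal sketch (the `heavy_sdk` module and its `answer` attribute are invented for illustration):

```python
import sys
import types

# Build a fake module and register it under the name the code under test imports.
fake = types.ModuleType("heavy_sdk")
fake.answer = lambda: 42  # stand-in for the real SDK API surface
sys.modules.setdefault("heavy_sdk", fake)

# The import now resolves from the module cache, never touching the real package.
import heavy_sdk

print(heavy_sdk.answer())  # 42
```

Using `setdefault` keeps a real installation intact when it exists, which is the same guard the `if "mcp.server" not in sys.modules` check provides above.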

4
uvicorn.yaml Normal file

@ -0,0 +1,4 @@
host: "0.0.0.0"
port: 1934
reload: true
log_level: "info"