Initial SOC memory POC implementation
190
docs/architecture.md
Normal file
@ -0,0 +1,190 @@
# Architecture

## Overall Goal

Build a proof-of-concept (POC) memory system that assists SOC case triage, with the aim of improving AI agent performance at the following stages:

- Alert triage
- Historical case retrieval
- Context completion
- Conclusion generation
- Retention of high-value memories

## Overall Architecture Diagram

```text
        ┌────────────────────────────────────────────┐
        │ Knowledge / data sources                   │
        │ KB / Playbook / Monthly reports / Reports  │
        │ Ticket / Intel / Historical cases          │
        └─────────────────────┬──────────────────────┘
                              │
                              │ ingest / normalize
                              ▼
        ┌────────────────────────────────────────────┐
        │ Pipeline layer                             │
        │ connectors / transforms / jobs             │
        └─────────────────────┬──────────────────────┘
                              │
                              │ extracted inputs
                              ▼
        ┌────────────────────────────────────────────┐
        │ Skills layer                               │
        │ ingest / classify / retrieve               │
        │ summarize / commit / prune                 │
        └──────────┬──────────────────────┬──────────┘
                   │                      │
       query/write │                      │ write notes / long-term
                   ▼                      ▼
         ┌────────────────────┐ ┌────────────────────┐
         │ Memory Gateway     │ │ Obsidian Vault     │
         │ MCP / REST / Auth  │ │ Human-maintained   │
         └─────────┬──────────┘ └────────────────────┘
                   │
                   ▼
         ┌────────────────────┐
         │ OpenViking         │
         │ context / memory   │
         │ resources / skills │
         └─────────┬──────────┘
                   │
         ┌─────────┴──────────┐
         ▼                    ▼
┌──────────────────┐ ┌──────────────────┐
│ Session / Online │ │ EverMemOS        │
│ retrieval        │ │ long-term memory │
└──────────────────┘ └──────────────────┘
         ▲
         │
         ▼
┌────────────────────┐
│ AI Agent / Harness │
│ Nanobot / Hermes   │
│ OpenClaw / others  │
└────────────────────┘
```

## Layer Descriptions

### 1. Knowledge Source Layer

External systems and existing materials:

- KB
- Playbook
- Monthly reports
- Reports
- Ticket system
- Threat intelligence system
- Historical cases

Characteristics:

- Diverse sources
- Inconsistent structure
- Cannot all be used directly as memory

### 2. Pipeline Layer

Responsibilities:

- Data ingestion
- Format normalization
- Metadata extraction
- Noise filtering

Boundaries:

- Does not perform final retrieval
- Does not make the final decision on long-term retention

### 3. Skills Layer

Responsibilities:

- Extract high-value memories
- Classify them as knowledge / case / process / session
- Retrieve relevant context
- Generate case summaries
- Write back to OpenViking / Obsidian / EverMemOS

This is the workflow orchestration layer of the whole system.

### 4. Memory Gateway Layer

Responsibilities:

- Give AI agents a single entry point
- Hide OpenViking details
- Expose MCP / REST interfaces
- Handle authentication and protocol compatibility

### 5. OpenViking Unified Context Layer

Responsibilities:

- Store memory
- Store resources
- Organize skills
- Manage the different types of context by namespace

### 6. Obsidian Layer

Holds human-maintainable knowledge:

- High-quality case notes
- Playbooks
- Monthly report / report summaries
- Descriptions of key entities

### 7. EverMemOS Layer

Handles background consolidation of long-term memory:

- episode -> long-term memory
- Deduplication
- Merging
- Updating
- Decay

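## Multi-Agent Sharing

Agents do not share transient memory with each other directly; they collaborate through the unified context layer:

- Shared, stable knowledge goes to `soc/knowledge`
- Historical cases go to `soc/case`
- The current task goes to `session/<session_id>`
- Agent-private preferences go to `agent/<agent_id>`

This achieves:

- Sharing of common knowledge
- Isolation of the current session
- Reuse of the same system across different agent frameworks
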
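The routing rule above can be sketched as a small lookup. This is an illustration only: the function name `namespace_for` and the `kind` labels are assumptions made for this example, not part of the gateway or of OpenViking.

```python
from typing import Optional

# Hypothetical mapping from memory kind to shared namespace (names from the list above).
SHARED_NAMESPACES = {"knowledge": "soc/knowledge", "case": "soc/case"}


def namespace_for(kind: str, session_id: Optional[str] = None,
                  agent_id: Optional[str] = None) -> str:
    """Return the namespace a memory of the given kind should be written to."""
    if kind in SHARED_NAMESPACES:
        return SHARED_NAMESPACES[kind]
    if kind == "session":
        if not session_id:
            raise ValueError("session memory requires a session_id")
        return f"session/{session_id}"
    if kind == "agent":
        if not agent_id:
            raise ValueError("agent memory requires an agent_id")
        return f"agent/{agent_id}"
    raise ValueError(f"unknown memory kind: {kind}")
```

Keeping the shared namespaces in one table means every agent framework resolves them the same way, while session and agent scopes stay keyed by their own IDs.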
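## Retrieval Quality Control Principles

To keep retrieval quality from degrading because "everything gets stuffed in", the following rules must hold:

- Raw materials do not all go directly into long-term memory
- Keep only high-value summaries, patterns, conclusions, and evidence
- Session / process memory is short-lived by default
- Historical cases and playbooks take priority over general knowledge
- Obsidian holds only human-maintained content, not full original texts
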
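## Phase 1 Default Setup

Recommended combination for phase 1:

- OpenViking: unified context / memory layer
- Memory Gateway: unified access entry point
- Skills: retrieval, summarization, retention
- Obsidian: human-maintainable knowledge retention
- EverMemOS: background long-term memory consolidation

Why this combination:

- Clear module boundaries
- Best suited to the small, fast iterations of a POC
- Easiest to keep system complexity under control
- Easiest to reuse across different agent frameworks
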
138
docs/data-model.md
Normal file
@ -0,0 +1,138 @@
# Data Model

## Goal

This data model targets SOC case triage assistance. It does not aim for full archival; it emphasizes extracting high-value memories.

## Data Layers

### 1. Knowledge Memory

Applicable content:

- KB
- Playbook
- Monthly report summaries
- Report summaries
- PO
- Detection rule descriptions

Characteristics:

- Relatively stable and reusable
- Oriented toward methods, knowledge, and patterns
- Suitable for long-term storage

Suggested fields:

- `id`
- `title`
- `source_type`
- `summary`
- `tags`
- `entities`
- `ttp`
- `confidence`
- `updated_at`

### 2. Case Memory

Applicable content:

- Historical cases
- Final triage conclusions
- Key evidence
- False-positive / true-positive patterns
- Remediation recommendations

Characteristics:

- Oriented toward specific cases
- Suited to retrieving similar cases
- The most important data layer in the POC phase

Suggested fields:

- `case_id`
- `title`
- `alert_type`
- `verdict`
- `summary`
- `key_evidence`
- `entities`
- `detection_logic`
- `lessons_learned`
- `source_links`

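As a rough illustration, the suggested Case Memory fields could be held in a dataclass like the one below. The field names come from the list above; the types, defaults, and the class name `CaseMemory` are assumptions, and the sample values are illustrative only.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class CaseMemory:
    """Case Memory record; field names follow the list above, types are assumed."""
    case_id: str
    title: str
    alert_type: str
    verdict: str
    summary: str
    key_evidence: List[str] = field(default_factory=list)
    entities: Dict[str, List[str]] = field(default_factory=dict)
    detection_logic: str = ""
    lessons_learned: List[str] = field(default_factory=list)
    source_links: List[str] = field(default_factory=list)


record = CaseMemory(
    case_id="CASE-2026-0001",
    title="Potential phishing email targeting finance user",
    alert_type="mail_suspicious_attachment",
    verdict="true_positive",
    summary="Invoice-themed phishing email with a malicious HTML attachment.",
    key_evidence=["The sender domain failed DMARC."],
    entities={"users": ["alice@corp.example"]},
)
payload = asdict(record)  # plain dict, ready to serialize as JSON
```

A dataclass keeps the record explicit while `asdict` yields a plain dictionary for whatever write path is eventually used.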
### 3. Process Memory

Applicable content:

- Agent intermediate steps
- Tool call results
- Reasoning paths
- Interim analysis conclusions

Characteristics:

- Short lifecycle
- Uneven value
- Only the high-value parts should be extracted and converted into long-term memory

Suggested fields:

- `session_id`
- `step_id`
- `tool_name`
- `observation`
- `intermediate_conclusion`
- `value_score`
- `timestamp`

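One way to apply the "extract only the high-value parts" rule is to filter steps on `value_score`. The sketch below assumes scores in the range 0 to 1 and a cut-off of 0.7; neither is specified in this document, and `select_for_promotion` is a hypothetical helper name.

```python
from typing import Dict, List

PROMOTION_THRESHOLD = 0.7  # assumed cut-off; the document does not specify one


def select_for_promotion(steps: List[Dict],
                         threshold: float = PROMOTION_THRESHOLD) -> List[Dict]:
    """Keep only the process steps whose value_score meets the threshold."""
    return [s for s in steps if s.get("value_score", 0.0) >= threshold]


steps = [
    {"session_id": "s1", "step_id": "step-01", "value_score": 0.2},
    {"session_id": "s1", "step_id": "step-02", "value_score": 0.9},
]
promoted = select_for_promotion(steps)  # only step-02 passes the filter
```

Steps without a score are treated as low value, so nothing is promoted by accident.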
### 4. Profile / Preference Memory

Applicable content:

- Analyst preferences
- Default output style
- Commonly used triage paths

Characteristics:

- Small in volume
- Used for personalized assistance

Suggested fields:

- `user_id`
- `preference_type`
- `value`
- `scope`

### 5. Session Memory

Applicable content:

- Context of the current case
- Temporary cache for the current conversation turn and the current task

Characteristics:

- Highly time-sensitive
- Not retained long-term by default

Suggested fields:

- `session_id`
- `task_id`
- `active_entities`
- `active_hypotheses`
- `recent_observations`
- `expires_at`

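The `expires_at` field makes the short retention easy to enforce. A minimal sketch, assuming `expires_at` is stored as an ISO 8601 string with a UTC offset (the storage format is not stated in this document) and using a hypothetical helper `is_expired`:

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def is_expired(session: Dict, now: Optional[datetime] = None) -> bool:
    """Return True once the session's expires_at timestamp has passed."""
    now = now or datetime.now(timezone.utc)
    # Assumes an offset-aware ISO 8601 string such as 2026-04-21T18:00:00+08:00.
    expires_at = datetime.fromisoformat(session["expires_at"])
    return now >= expires_at


session = {
    "session_id": "incident-20260421-001",
    "expires_at": "2026-04-21T18:00:00+08:00",
}
```

A cleanup job could call this on each session record and drop the ones that return `True`.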
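## Design Principles

- Raw materials are not used directly as memory
- Retain only high-value information that helps subsequent triage
- Process Memory is short-lived by default and is promoted to long-term memory only after extraction
- Knowledge and Case are the two layers to build first in the POC phase
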
91
docs/hermes-demo-prompts.md
Normal file
@ -0,0 +1,91 @@
# Hermes Demo Prompts

## Recommended: Raw Email / Freeform Alert

Use this when you want to show that Hermes does not need a rigid input schema. The `soc-memory-poc` skill should route the content through `triage_email.py`, extract useful fields, retrieve memory, search Obsidian, and return the fixed SOC triage sections.

```text
Use the soc-memory-poc skill. Triage this email alert and include Memory Retrieval and Obsidian references.

From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html

User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12.

Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```

Equivalent direct script check:

```bash
python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --text "From: billing@vendor-payments.com
To: alice@corp.example
Subject: Invoice overdue notice
Attachment: invoice_review.html
User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12."
```

## Structured Phishing Alert

Use this when you want maximum repeatability with explicit fields.

```text
Use the soc-memory-poc skill. Treat the following as a structured SOC alert and use the preferred Scheme A path.

Scenario: phishing
Alert type: mail_suspicious_attachment
User: alice@corp.example
Host: FIN-LAPTOP-12
Sender: billing@vendor-payments.com
Subject: Invoice overdue notice
Attachment: invoice_review.html
URL: https://vendor-payments-login.com/review
IP: 198.51.100.20
Known facts:
- DMARC failed
- User may have clicked the link

Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```

## Structured O365 Alert

```text
Use the soc-memory-poc skill. Treat the following as a structured SOC alert and use the preferred Scheme A path.

Scenario: o365_suspicious_login
Alert type: azuread_impossible_travel
User: david@corp.example
Host: WS-DAVID-01
IP: 203.0.113.150
Known facts:
- Impossible travel observed between Shanghai and Amsterdam within 15 minutes
- MFA fatigue occurred before final success
- User denied initiating the overseas login
- Inbox rule creation was observed after login

Return exactly these sections:
研判结果
关键证据
关联 Memory Retrieval
关联 Obsidian 文档
建议动作
```

## Generate Case Note

```text
Use the soc-memory-poc skill. Generate an Obsidian case note for /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0003.json with OpenViking enrichment, then tell me the output path and confirm whether the note was written successfully.
```

120
docs/namespaces.md
Normal file
@ -0,0 +1,120 @@
# OpenViking Namespaces

## Goal

Use OpenViking as the unified context / memory gateway by defining explicit namespaces and a URI layout.

## Recommended Namespaces

### 1. `soc/knowledge`

For stable knowledge:

- KB
- Playbook
- Monthly report summaries
- Report summaries
- PO

Examples:

- `viking://soc/knowledge/kb/phishing-mail-header-analysis`
- `viking://soc/knowledge/playbook/o365-suspicious-login`

### 2. `soc/case`

For historical cases and case conclusions:

- Historical cases
- True-positive / false-positive patterns
- Key evidence

Examples:

- `viking://soc/case/true-positive/case-2026-00128`
- `viking://soc/case/false-positive/case-2026-00072`

### 3. `soc/process`

For process-level memory:

- Agent intermediate analysis
- Tool output summaries
- Reusable intermediate judgment patterns

Example:

- `viking://soc/process/session-abc123/step-04`

### 4. `session/<session_id>`

For the temporary context of the current task.

Examples:

- `viking://session/incident-20260421-001/context`
- `viking://session/incident-20260421-001/tools`

### 5. `agent/<agent_id>`

For agent-level private or semi-private context.

Examples:

- `viking://agent/hermes-soc/default`
- `viking://agent/nanobot-soc/preferences`

### 6. `user/<user_id>`

For small-scale profile information such as analyst preferences and display habits.

Example:

- `viking://user/alice/preferences`

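The `viking://` examples in the namespaces above share one shape: scheme, namespace, then path segments. The helpers below only do string handling to build and split such URIs. They are hypothetical names for illustration and say nothing about OpenViking's actual client API.

```python
from typing import List

SCHEME = "viking://"


def viking_uri(namespace: str, *parts: str) -> str:
    """Join a namespace and path segments into a viking:// URI."""
    segments = [namespace.strip("/")] + [p.strip("/") for p in parts]
    if any(not s for s in segments):
        raise ValueError("empty URI segment")
    return SCHEME + "/".join(segments)


def split_viking_uri(uri: str) -> List[str]:
    """Split a viking:// URI back into its path segments."""
    if not uri.startswith(SCHEME):
        raise ValueError(f"not a viking URI: {uri}")
    return uri[len(SCHEME):].split("/")
```

For example, `viking_uri("soc/knowledge", "kb", "phishing-mail-header-analysis")` reproduces the first KB example above.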
## Resource Organization

### memory

Suitable for:

- High-value summaries
- Case conclusions
- Patterns
- Lessons learned

### resources

Suitable for:

- Links to original attachments
- References to external documents
- Obsidian note paths
- Ticket / report / intel references

### skills

Suitable for:

- Retrieval skill
- Memory extraction skill
- Case retention skill

## Recommended Retrieval Order

When a retrieval is triggered for the current case, recall in the following order:

1. `session/<session_id>`
2. `soc/case`
3. `soc/knowledge`
4. `agent/<agent_id>`
5. `user/<user_id>`

This prioritizes the relevance of the current context and of similar historical cases, and keeps general knowledge from drowning out case signals.

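The recall order above can be sketched as a loop over namespaces in priority order. The `search` callable is a stand-in supplied by the caller; its signature `(namespace, query) -> hits` and the `limit` parameter are assumptions made for this example, not a real OpenViking call.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str, str], List[Dict]]  # (namespace, query) -> hits


def recall_in_order(search: SearchFn, query: str, session_id: str,
                    agent_id: str, user_id: str, limit: int = 10) -> List[Dict]:
    """Query namespaces in priority order and stop once `limit` hits are collected."""
    order = [
        f"session/{session_id}",
        "soc/case",
        "soc/knowledge",
        f"agent/{agent_id}",
        f"user/{user_id}",
    ]
    results: List[Dict] = []
    for namespace in order:
        for hit in search(namespace, query):
            if len(results) >= limit:
                return results
            # Tag each hit with its namespace so callers can see where it came from.
            results.append({**hit, "namespace": namespace})
    return results
```

Because earlier namespaces fill the result budget first, session context and similar cases always outrank general knowledge when the limit is tight.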
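## Recommended Constraints

- Do not write all raw materials directly into `soc/knowledge`
- `soc/process` should have a cleanup policy by default
- Write content to `soc/knowledge` or `soc/case` only once it is stable over the long term
- Obsidian stores only human-maintainable summaries and structured notes; it is not a repository for full original texts
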
130
docs/poc-scope.md
Normal file
@ -0,0 +1,130 @@
# POC Scope

## Goal

The phase 1 POC validates one thing only:

**Whether high-value memory extraction plus recall of similar cases and knowledge can effectively improve the efficiency and quality of SOC case triage.**

## POC Scope

### Case Types in Focus

Pick only one or two typical scenarios:

1. Phishing email / malicious attachment
2. O365 anomalous login / suspected account compromise

Reasons:

- Data is relatively easy to obtain
- Historical cases have high reuse value
- Playbooks / KB are usually fairly complete
- A "similar-case hit rate" is easy to define

## Data Ingested in Phase 1

### Required

- Historical cases
- KB
- Playbook

### Optional

- Monthly report summaries
- Report summaries

### Not Ingested for Now

- Two-way sync with the ticket system
- Automatic full pulls from the threat intelligence system
- Full original report texts
- Large-scale persistence of process traces
- Analyst preference personalization

## Capabilities to Build in Phase 1

### Must Do

- Historical case import
- KB / Playbook import
- High-value information extraction
- Retrieval of relevant context based on the current case
- Case summary retention
- Structured write-back to OpenViking
- Obsidian case note generation

### Deferred to Phase 2

- Automation of EverMemOS long-term consolidation
- More sophisticated deduplication and decay
- Automatic sync across multiple data sources
- Optimization of multi-agent collaboration strategies

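## Out of Scope

To keep the POC deliverable, phase 1 explicitly excludes:

- A generalized enterprise-grade memory platform
- Full ingestion of all raw data
- Rebuilding a full-scale full-text search system
- Coverage of all SOC alert types
- A complex permission system
- A complete online annotation platform
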
## Deliverables

Suggested phase 1 deliverables:

1. A runnable memory gateway
2. A batch of importable historical case and KB / Playbook samples
3. A minimal ingest / retrieve / summarize / commit loop
4. Obsidian templates and sample notes
5. An evaluation comparing the baseline against the POC

## Suggested 2-to-4-Week Implementation Plan

### Week 1

- Freeze the POC scope
- Prepare the sample data
- Finalize the data model and namespace conventions
- Set up the Obsidian templates

### Week 2

- Finish the historical case / KB import scripts
- Finish `retrieve_context_skill`
- Wire up OpenViking's `soc/case` and `soc/knowledge`

### Week 3

- Finish `summarize_case_skill`
- Finish `commit_memory_skill`
- Output standard case notes to Obsidian

### Week 4

- Run the evaluation scripts
- Conduct a manual review
- Converge on the requirements for the next phase

## Evaluation Metrics

Track at least the following metrics:

- Similar-case hit rate
- Relevance of retrieved context
- Average triage time
- Accuracy of final conclusions
- Analyst satisfaction

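The document does not define the similar-case hit rate precisely. One common reading, assumed here, is hit rate at k: the fraction of evaluation queries whose top-k retrieved cases include at least one case labeled as similar. The function name and input shapes below are hypothetical.

```python
from typing import Dict, List, Set


def similar_case_hit_rate(retrieved: Dict[str, List[str]],
                          expected: Dict[str, Set[str]], k: int = 5) -> float:
    """Fraction of queries with at least one expected case in the top-k results."""
    if not expected:
        return 0.0
    hits = sum(
        1 for query_id, relevant in expected.items()
        if relevant & set(retrieved.get(query_id, [])[:k])
    )
    return hits / len(expected)


retrieved = {"q1": ["CASE-A", "CASE-B"], "q2": ["CASE-C"]}
expected = {"q1": {"CASE-B"}, "q2": {"CASE-Z"}}
rate = similar_case_hit_rate(retrieved, expected, k=5)  # q1 hits, q2 misses
```

Running the same function on the baseline and the POC retrieval outputs gives a directly comparable number for the evaluation report.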
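## Acceptance Criteria

Phase 1 of the POC can be considered successful when all of the following hold:

- Relevant historical cases or knowledge are recalled reliably
- It can assist in generating structured case notes
- Human evaluation finds a clear improvement in context quality
- Retrieval has not clearly degraded from "stuffing in too much material"
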
188
docs/sample-data-spec.md
Normal file
@ -0,0 +1,188 @@
# Sample Data Spec

## Goal

This document defines the mock data formats used by the SOC Memory POC while no real data is available. They are used to:

- Validate the ingestion pipeline
- Validate the normalization scripts
- Validate context retrieval
- Validate the case summary and memory commit flow

Only two scenarios are covered for now:

- Phishing email
- O365 anomalous login / suspected account compromise

## Directory Convention

```text
evaluation/datasets/
├── mock_cases/
│   ├── phishing/
│   └── o365_suspicious_login/
└── mock_kb/
    ├── playbooks/
    ├── kb/
    └── reports/
```

## Mock Case Raw Format

Each case is stored in one JSON file. Suggested file name:

```text
<case_id>.json
```

### Field Definitions

| Field | Type | Required | Description |
|---|---|---:|---|
| `case_id` | string | Yes | Unique case ID |
| `title` | string | Yes | Short title |
| `scenario` | string | Yes | `phishing` or `o365_suspicious_login` |
| `alert_type` | string | Yes | Alert type |
| `severity` | string | Yes | `low` / `medium` / `high` / `critical` |
| `status` | string | Yes | `confirmed` / `false_positive` / `pending` |
| `time_window` | object | Yes | Start and end time |
| `summary` | string | Yes | One-sentence summary |
| `alert_source` | string | Yes | Source system of the alert |
| `entities` | object | Yes | Key entities |
| `observables` | object | No | IOCs / observables |
| `evidence` | array | Yes | List of key evidence |
| `investigation_steps` | array | Yes | Key investigation steps |
| `conclusion` | object | Yes | Triage conclusion |
| `related_refs` | object | No | Related KB / playbook / case |
| `lessons_learned` | array | No | Reusable lessons |
| `tags` | array | No | Tags |

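The field table above translates directly into a lightweight validation step. The sketch below checks the required fields and the three enumerated values listed in the table; the function name `validate_mock_case` and the error message format are assumptions for illustration.

```python
from typing import Any, Dict, List

# Fields marked "Yes" in the Required column of the table above.
REQUIRED_CASE_FIELDS = {
    "case_id", "title", "scenario", "alert_type", "severity", "status",
    "time_window", "summary", "alert_source", "entities", "evidence",
    "investigation_steps", "conclusion",
}
# Enumerated values taken from the Description column.
ALLOWED_VALUES = {
    "scenario": {"phishing", "o365_suspicious_login"},
    "severity": {"low", "medium", "high", "critical"},
    "status": {"confirmed", "false_positive", "pending"},
}


def validate_mock_case(case: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the case passes."""
    errors = [f"missing field: {name}"
              for name in sorted(REQUIRED_CASE_FIELDS - set(case))]
    for name, allowed in ALLOWED_VALUES.items():
        if name in case and case[name] not in allowed:
            errors.append(f"invalid {name}: {case[name]!r}")
    return errors
```

Returning a list instead of raising on the first problem lets an ingestion run report every issue in a mock file at once.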
### Example Skeleton

```json
{
  "case_id": "CASE-2026-0001",
  "title": "Potential phishing email targeting finance user",
  "scenario": "phishing",
  "alert_type": "mail_suspicious_attachment",
  "severity": "high",
  "status": "confirmed",
  "time_window": {
    "start": "2026-04-01T09:10:00+08:00",
    "end": "2026-04-01T11:30:00+08:00"
  },
  "summary": "Finance user received an invoice-themed phishing email with a malicious HTML attachment.",
  "alert_source": "Secure Email Gateway",
  "entities": {
    "users": ["alice@corp.example"],
    "hosts": ["FIN-LAPTOP-12"],
    "mailboxes": ["alice@corp.example"]
  },
  "observables": {
    "sender_emails": ["billing@vendor-payments.com"],
    "domains": ["vendor-payments.com"],
    "urls": ["https://vendor-payments-login.com/review"],
    "hashes": ["sha256:..."],
    "ips": ["198.51.100.20"]
  },
  "evidence": [
    "The sender domain was newly observed and failed DMARC.",
    "The attachment redirected the user to a credential harvesting page."
  ],
  "investigation_steps": [
    "Validate sender reputation and authentication results.",
    "Detonate attachment in sandbox.",
    "Check click telemetry and account sign-in logs."
  ],
  "conclusion": {
    "verdict": "true_positive",
    "reason": "Multiple aligned phishing indicators and confirmed click behavior.",
    "recommended_actions": [
      "Reset the impacted account password.",
      "Block the sender domain and landing URL."
    ]
  },
  "related_refs": {
    "playbooks": ["PB-PHISH-001"],
    "kb": ["KB-PHISH-HEADER-CHECK"],
    "cases": []
  },
  "lessons_learned": [
    "Invoice-themed phishing remains effective against finance users."
  ],
  "tags": ["phishing", "email", "credential-harvest"]
}
```

## Mock KB / Playbook Raw Format

Each knowledge entry is stored in one JSON file. Suggested file name:

```text
<doc_id>.json
```

### Field Definitions

| Field | Type | Required | Description |
|---|---|---:|---|
| `doc_id` | string | Yes | Unique document ID |
| `doc_type` | string | Yes | `kb` / `playbook` / `report_summary` |
| `title` | string | Yes | Title |
| `scenario` | string | Yes | Applicable scenario |
| `summary` | string | Yes | Core summary |
| `applicability` | array | No | Applicability conditions |
| `key_points` | array | Yes | Core knowledge points |
| `investigation_guidance` | array | No | Investigation guidance |
| `decision_points` | array | No | Key decision points |
| `related_entities` | object | No | Related entities / TTPs / IOCs |
| `related_refs` | object | No | Related documents |
| `tags` | array | No | Tags |
| `updated_at` | string | No | Last updated time |

## Normalized Output Targets

### Normalized Case Structure

Suggested output fields for the normalization script:

- `id`
- `memory_type` = `case`
- `scenario`
- `title`
- `abstract`
- `verdict`
- `severity`
- `entities`
- `observables`
- `evidence`
- `patterns`
- `related_refs`
- `source_path`
- `tags`

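A normalization step from the raw mock case to the structure above might look like the sketch below. The spec does not state how each output field is derived, so the mapping is an assumption: `abstract` is taken from `summary`, `verdict` from `conclusion.verdict`, and `patterns` is left empty because its derivation is unspecified. `normalize_case` is a hypothetical name.

```python
from typing import Any, Dict


def normalize_case(raw: Dict[str, Any], source_path: str) -> Dict[str, Any]:
    """Map a raw mock case onto the normalized case fields (assumed mapping)."""
    conclusion = raw.get("conclusion", {})
    return {
        "id": raw["case_id"],
        "memory_type": "case",
        "scenario": raw["scenario"],
        "title": raw["title"],
        "abstract": raw["summary"],            # assumed: abstract <- summary
        "verdict": conclusion.get("verdict"),  # assumed: verdict <- conclusion.verdict
        "severity": raw["severity"],
        "entities": raw.get("entities", {}),
        "observables": raw.get("observables", {}),
        "evidence": raw.get("evidence", []),
        "patterns": [],  # derivation is not specified in the spec; left empty here
        "related_refs": raw.get("related_refs", {}),
        "source_path": source_path,
        "tags": raw.get("tags", []),
    }
```

Optional raw fields fall back to empty containers so that downstream code can rely on every normalized key being present.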
### Normalized KB Structure

Suggested output fields for the normalization script:

- `id`
- `memory_type` = `knowledge`
- `doc_type`
- `scenario`
- `title`
- `abstract`
- `key_points`
- `investigation_guidance`
- `decision_points`
- `related_refs`
- `source_path`
- `tags`

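## Retrieval Test Suggestions

During the mock data phase, verify the following first:

- Whether a phishing case recalls the phishing playbook and similar phishing cases
- Whether an O365 anomalous-login case recalls the anomalous-login KB and similar cases
- Whether true-positive and false-positive cases can be told apart, each retaining its own patterns
- Whether recall results include key evidence / decision points
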
68
docs/system-positioning.md
Normal file
@ -0,0 +1,68 @@
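# System Positioning

## Current Project Positioning

`memory_gateway` is not the complete SOC memory system; it is the unified context entry layer within the overall solution.

Its current responsibilities are:

- Provide AI agents with a unified MCP / REST access point
- Forward retrieval and write requests to OpenViking
- Provide basic authentication, protocol compatibility, and gateway capabilities
- Serve as the thinnest access layer of the multi-agent shared memory system

It does not directly handle:

- Bulk import of raw knowledge sources
- Extraction and filtering of high-value memories
- Human knowledge curation in the Obsidian Vault
- Long-term memory consolidation and evolution in EverMemOS
- Management of evaluation datasets and experiment workflows
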
## Position Within the Overall SOC Memory System

```text
SOC data sources
KB / Playbook / Monthly reports / Reports / Ticket / Intel / Historical cases
        |
        v
Skills / Pipeline
ingest / extract / classify / summarize / commit / prune
        |
        v
memory_gateway
Unified entry layer (MCP / REST / Auth / Routing)
        |
        v
OpenViking
Unified context / memory / resource / skill layer
   |                        |
   v                        v
Obsidian Vault           EverMemOS
Human curation layer     Long-term consolidation layer
```

## Suggested Modules for the Next Phase

Split the follow-up POC capabilities into these modules:

- `docs/`
  System design, data model, and namespace conventions
- `poc/skills/`
  Skills for retrieval, extraction, and retention
- `poc/pipeline/`
  Import flows for ticket, intel, and historical case sources
- `poc/obsidian-vault/`
  Human-maintained knowledge and case note templates
- `poc/evermemos/`
  Long-term memory consolidation logic and policies
- `poc/evaluation/`
  Datasets, evaluation scripts, and results

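## Suggested Boundary for This Repository

Keep this repository within the boundary of a "gateway project":

- Keep: service entry points, OpenViking integration, configuration, protocols, tests
- Add: system design documents, POC skeleton directories
- Avoid accumulating: large numbers of business rules, bulk import scripts, the Vault content itself
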