Sample Data Spec
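Goals
This document defines the mock data format that the SOC Memory POC uses while no real data is available. The mock data is used to:
- Validate the ingestion pipeline
- Validate the normalization script
- Validate context retrieval
- Validate the case summary and memory commit flow
Only two scenarios are covered for now:
- Phishing email
- O365 anomalous sign-in / suspected account compromise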
Directory Conventions
evaluation/datasets/
├── mock_cases/
│   ├── phishing/
│   └── o365_suspicious_login/
└── mock_kb/
    ├── playbooks/
    ├── kb/
    └── reports/
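As a rough illustration of how an ingestion step might walk this layout, the sketch below loads every JSON file under a given root. The helper name `load_json_docs` is hypothetical, and attaching `source_path` to each loaded document is an assumption made here so that the value is available to later normalization.

```python
import json
from pathlib import Path


def load_json_docs(root: Path) -> list[dict]:
    """Load every *.json file below root (hypothetical helper)."""
    docs = []
    # sorted() keeps the load order stable across runs and platforms
    for path in sorted(root.rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            doc = json.load(fh)
        # Assumption: remember the origin file for the normalized output
        doc["source_path"] = path.as_posix()
        docs.append(doc)
    return docs
```

A caller could pass `Path("evaluation/datasets/mock_cases")` to load cases and `Path("evaluation/datasets/mock_kb")` to load knowledge documents.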
Mock Case Raw Format
Each case is stored in one JSON file. Suggested file name:
<case_id>.json
Field definitions
| Field | Type | Required | Description |
|---|---|---|---|
| case_id | string | Yes | Unique case ID |
| title | string | Yes | Short title |
| scenario | string | Yes | phishing or o365_suspicious_login |
| alert_type | string | Yes | Alert type |
| severity | string | Yes | low / medium / high / critical |
| status | string | Yes | confirmed / false_positive / pending |
| time_window | object | Yes | Start and end time |
| summary | string | Yes | One-sentence summary |
| alert_source | string | Yes | System that raised the alert |
| entities | object | Yes | Key entities |
| observables | object | No | IOCs / observables |
| evidence | array | Yes | List of key evidence |
| investigation_steps | array | Yes | Key investigation steps |
| conclusion | object | Yes | Triage conclusion |
| related_refs | object | No | Related KB / playbook / case |
| lessons_learned | array | No | Reusable lessons |
| tags | array | No | Tags |
Example skeleton
{
  "case_id": "CASE-2026-0001",
  "title": "Potential phishing email targeting finance user",
  "scenario": "phishing",
  "alert_type": "mail_suspicious_attachment",
  "severity": "high",
  "status": "confirmed",
  "time_window": {
    "start": "2026-04-01T09:10:00+08:00",
    "end": "2026-04-01T11:30:00+08:00"
  },
  "summary": "Finance user received an invoice-themed phishing email with a malicious HTML attachment.",
  "alert_source": "Secure Email Gateway",
  "entities": {
    "users": ["alice@corp.example"],
    "hosts": ["FIN-LAPTOP-12"],
    "mailboxes": ["alice@corp.example"]
  },
  "observables": {
    "sender_emails": ["billing@vendor-payments.com"],
    "domains": ["vendor-payments.com"],
    "urls": ["https://vendor-payments-login.com/review"],
    "hashes": ["sha256:..."],
    "ips": ["198.51.100.20"]
  },
  "evidence": [
    "The sender domain was newly observed and failed DMARC.",
    "The attachment redirected the user to a credential harvesting page."
  ],
  "investigation_steps": [
    "Validate sender reputation and authentication results.",
    "Detonate attachment in sandbox.",
    "Check click telemetry and account sign-in logs."
  ],
  "conclusion": {
    "verdict": "true_positive",
    "reason": "Multiple aligned phishing indicators and confirmed click behavior.",
    "recommended_actions": [
      "Reset the impacted account password.",
      "Block the sender domain and landing URL."
    ]
  },
  "related_refs": {
    "playbooks": ["PB-PHISH-001"],
    "kb": ["KB-PHISH-HEADER-CHECK"],
    "cases": []
  },
  "lessons_learned": [
    "Invoice-themed phishing remains effective against finance users."
  ],
  "tags": ["phishing", "email", "credential-harvest"]
}
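A mock case file can be checked against the field definitions before it enters the pipeline. The sketch below is one possible check, not the project's actual validator: the names `REQUIRED_CASE_FIELDS`, `ALLOWED_VALUES`, and `validate_case` are hypothetical, and the mapping of table types to Python types (string to `str`, object to `dict`, array to `list`) is an assumption.

```python
# Required fields and their expected Python types, taken from the field table
REQUIRED_CASE_FIELDS = {
    "case_id": str, "title": str, "scenario": str, "alert_type": str,
    "severity": str, "status": str, "time_window": dict, "summary": str,
    "alert_source": str, "entities": dict, "evidence": list,
    "investigation_steps": list, "conclusion": dict,
}

# Enumerated values listed in the field table
ALLOWED_VALUES = {
    "scenario": {"phishing", "o365_suspicious_login"},
    "severity": {"low", "medium", "high", "critical"},
    "status": {"confirmed", "false_positive", "pending"},
}


def validate_case(case: dict) -> list[str]:
    """Return a list of problems; an empty list means the case passed."""
    errors = []
    for field, expected in REQUIRED_CASE_FIELDS.items():
        if field not in case:
            errors.append(f"missing required field: {field}")
        elif not isinstance(case[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    for field, allowed in ALLOWED_VALUES.items():
        value = case.get(field)
        if isinstance(value, str) and value not in allowed:
            errors.append(f"{field}: unexpected value {value!r}")
    return errors
```

Returning a list of messages instead of raising on the first problem lets a batch run report every issue in a mock dataset at once.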
Mock KB / Playbook Raw Format
Each knowledge entry is stored in one JSON file. Suggested file name:
<doc_id>.json
Field definitions
| Field | Type | Required | Description |
|---|---|---|---|
| doc_id | string | Yes | Unique document ID |
| doc_type | string | Yes | kb / playbook / report_summary |
| title | string | Yes | Title |
| scenario | string | Yes | Applicable scenario |
| summary | string | Yes | Core summary |
| applicability | array | No | Conditions under which the document applies |
| key_points | array | Yes | Core knowledge points |
| investigation_guidance | array | No | Investigation guidance |
| decision_points | array | No | Key decision points |
| related_entities | object | No | Related entities / TTPs / IOCs |
| related_refs | object | No | Related documents |
| tags | array | No | Tags |
| updated_at | string | No | Last updated time |
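Normalized Output Targets
Normalized Case structure
Suggested output fields for the normalization script:
- id
- memory_type=case
- scenario
- title
- abstract
- verdict
- severity
- entities
- observables
- evidence
- patterns
- related_refs
- source_path
- tags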
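The spec lists the normalized case fields but not how each one is derived from the raw case. The sketch below shows one plausible mapping under stated assumptions: `id` comes from `case_id`, `abstract` from `summary`, `verdict` from `conclusion.verdict`, and `patterns` from `lessons_learned`. These mappings and the function name `normalize_case` are illustrative guesses, not a confirmed design.

```python
def normalize_case(raw: dict, source_path: str) -> dict:
    """Map a raw mock case to the suggested normalized fields (assumed mapping)."""
    conclusion = raw.get("conclusion", {})
    return {
        "id": raw["case_id"],
        "memory_type": "case",
        "scenario": raw["scenario"],
        "title": raw["title"],
        "abstract": raw["summary"],                        # assumption: summary -> abstract
        "verdict": conclusion.get("verdict"),              # assumption: taken from conclusion
        "severity": raw["severity"],
        "entities": raw.get("entities", {}),
        "observables": raw.get("observables", {}),         # optional in the raw format
        "evidence": list(raw.get("evidence", [])),
        "patterns": list(raw.get("lessons_learned", [])),  # assumption: lessons -> patterns
        "related_refs": raw.get("related_refs", {}),
        "source_path": source_path,
        "tags": list(raw.get("tags", [])),
    }
```

Optional raw fields fall back to empty containers so that every normalized record has the same set of keys.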
Normalized KB structure
Suggested output fields for the normalization script:
- id
- memory_type=knowledge
- doc_type
- scenario
- title
- abstract
- key_points
- investigation_guidance
- decision_points
- related_refs
- source_path
- tags
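A knowledge entry could be normalized in a similar way. In this sketch the function name `normalize_kb` is hypothetical, and the choices to map `doc_id` to `id` and `summary` to `abstract` are assumptions; the remaining fields share names with the raw format and are copied through.

```python
def normalize_kb(raw: dict, source_path: str) -> dict:
    """Map a raw KB / playbook entry to the suggested normalized fields."""
    normalized = {
        "id": raw["doc_id"],            # assumption: doc_id -> id
        "memory_type": "knowledge",
        "doc_type": raw["doc_type"],
        "scenario": raw["scenario"],
        "title": raw["title"],
        "abstract": raw["summary"],     # assumption: summary -> abstract
        "key_points": list(raw["key_points"]),
    }
    # Optional array fields default to empty lists
    for field in ("investigation_guidance", "decision_points"):
        normalized[field] = list(raw.get(field, []))
    normalized["related_refs"] = raw.get("related_refs", {})
    normalized["source_path"] = source_path
    normalized["tags"] = list(raw.get("tags", []))
    return normalized
```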
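Retrieval Test Recommendations
During the mock data stage, verify the following first:
- Whether a phishing case recalls the phishing playbook and similar phishing cases
- Whether an O365 anomalous sign-in case recalls the sign-in anomaly KB and similar cases
- Whether true-positive and false-positive cases can be told apart while keeping their distinct patterns
- Whether recalled results include the key evidence / decision points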
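The first two checks above can be expressed as a recall measurement over normalized records. The sketch below uses a deliberately naive stand-in retriever that scores by matching scenario and shared tags; the real POC retriever is not described in this document, so `retrieve`, `recall_at_k`, and the scoring weights are all hypothetical.

```python
def retrieve(query: dict, memory: list[dict], k: int = 3) -> list[dict]:
    """Naive stand-in retriever: rank by scenario match and tag overlap."""
    def score(item: dict) -> int:
        same_scenario = 2 if item["scenario"] == query["scenario"] else 0
        shared_tags = len(set(item.get("tags", [])) & set(query.get("tags", [])))
        return same_scenario + shared_tags

    # Never return the query record itself
    candidates = [m for m in memory if m["id"] != query["id"]]
    ranked = sorted(candidates, key=score, reverse=True)
    # Drop items with no overlap at all
    return [m for m in ranked[:k] if score(m) > 0]


def recall_at_k(query: dict, memory: list[dict],
                expected_ids: set[str], k: int = 3) -> float:
    """Fraction of expected record IDs found in the top-k results."""
    if not expected_ids:
        raise ValueError("expected_ids must not be empty")
    found = {m["id"] for m in retrieve(query, memory, k)}
    return len(found & expected_ids) / len(expected_ids)
```

For a phishing case, `expected_ids` would hold the IDs of the phishing playbook and the similar phishing cases that the test expects to see recalled.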