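# Sample Data Spec

## Goals

This document defines the mock data format that the SOC Memory POC uses while no real data is available. The mock data is used to:

- Validate the ingestion pipeline
- Validate the normalization scripts
- Validate context retrieval
- Validate the case summary and memory commit flow

Only two scenarios are currently covered:

- Phishing email
- O365 anomalous sign-in / suspected account compromise
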
## Directory Conventions

```text
evaluation/datasets/
├── mock_cases/
│   ├── phishing/
│   └── o365_suspicious_login/
└── mock_kb/
    ├── playbooks/
    ├── kb/
    └── reports/
```

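
The layout above can be walked with a few lines of Python. This is a minimal sketch, assuming the dataset root is `evaluation/datasets` relative to the working directory and that every mock file sits exactly one folder below `mock_cases/` or `mock_kb/`. The helper name `list_mock_files` is hypothetical and not part of the POC.

```python
from pathlib import Path

# Assumed location, taken from the directory tree in this spec.
DATASET_ROOT = Path("evaluation/datasets")


def list_mock_files(root: Path = DATASET_ROOT) -> dict[str, list[Path]]:
    """Group every *.json file under mock_cases/ and mock_kb/ by parent folder name."""
    grouped: dict[str, list[Path]] = {}
    for top in ("mock_cases", "mock_kb"):
        # One level of scenario / doc-type folders, then the JSON files themselves.
        for path in sorted((root / top).glob("*/*.json")):
            grouped.setdefault(path.parent.name, []).append(path)
    return grouped
```

Grouping by folder name keeps the scenario (`phishing`, `o365_suspicious_login`) or document kind (`playbooks`, `kb`, `reports`) available to later steps without parsing each file first.
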
## Mock Case Raw Format

Each case is stored as a single JSON file. Suggested file name:

```text
<case_id>.json
```

### Field Definitions

| Field | Type | Required | Description |
|---|---|---:|---|
| `case_id` | string | Yes | Unique case ID |
| `title` | string | Yes | Short title |
| `scenario` | string | Yes | `phishing` or `o365_suspicious_login` |
| `alert_type` | string | Yes | Alert type |
| `severity` | string | Yes | `low` / `medium` / `high` / `critical` |
| `status` | string | Yes | `confirmed` / `false_positive` / `pending` |
| `time_window` | object | Yes | Start and end time |
| `summary` | string | Yes | One-sentence summary |
| `alert_source` | string | Yes | Source system that raised the alert |
| `entities` | object | Yes | Key entities |
| `observables` | object | No | IOCs / observables |
| `evidence` | array | Yes | List of key evidence |
| `investigation_steps` | array | Yes | Key investigation steps |
| `conclusion` | object | Yes | Analyst conclusion |
| `related_refs` | object | No | Related KB entries / playbooks / cases |
| `lessons_learned` | array | No | Reusable lessons |
| `tags` | array | No | Tags |

### Example Skeleton

```json
{
  "case_id": "CASE-2026-0001",
  "title": "Potential phishing email targeting finance user",
  "scenario": "phishing",
  "alert_type": "mail_suspicious_attachment",
  "severity": "high",
  "status": "confirmed",
  "time_window": {
    "start": "2026-04-01T09:10:00+08:00",
    "end": "2026-04-01T11:30:00+08:00"
  },
  "summary": "Finance user received an invoice-themed phishing email with a malicious HTML attachment.",
  "alert_source": "Secure Email Gateway",
  "entities": {
    "users": ["alice@corp.example"],
    "hosts": ["FIN-LAPTOP-12"],
    "mailboxes": ["alice@corp.example"]
  },
  "observables": {
    "sender_emails": ["billing@vendor-payments.com"],
    "domains": ["vendor-payments.com"],
    "urls": ["https://vendor-payments-login.com/review"],
    "hashes": ["sha256:..."],
    "ips": ["198.51.100.20"]
  },
  "evidence": [
    "The sender domain was newly observed and failed DMARC.",
    "The attachment redirected the user to a credential harvesting page."
  ],
  "investigation_steps": [
    "Validate sender reputation and authentication results.",
    "Detonate attachment in sandbox.",
    "Check click telemetry and account sign-in logs."
  ],
  "conclusion": {
    "verdict": "true_positive",
    "reason": "Multiple aligned phishing indicators and confirmed click behavior.",
    "recommended_actions": [
      "Reset the impacted account password.",
      "Block the sender domain and landing URL."
    ]
  },
  "related_refs": {
    "playbooks": ["PB-PHISH-001"],
    "kb": ["KB-PHISH-HEADER-CHECK"],
    "cases": []
  },
  "lessons_learned": [
    "Invoice-themed phishing remains effective against finance users."
  ],
  "tags": ["phishing", "email", "credential-harvest"]
}
```

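
Before a mock case enters the ingestion pipeline, it can help to check it against the field table. The following is a minimal sketch of such a check, not the POC's actual validator. The function name `validate_case` and the message wording are assumptions; the required fields, types, and allowed values are copied from the table in this spec.

```python
# Required fields and their expected JSON types, as listed in the field table.
REQUIRED_CASE_FIELDS = {
    "case_id": str, "title": str, "scenario": str, "alert_type": str,
    "severity": str, "status": str, "time_window": dict, "summary": str,
    "alert_source": str, "entities": dict, "evidence": list,
    "investigation_steps": list, "conclusion": dict,
}

# Enumerated values the spec gives for three of the string fields.
ALLOWED_VALUES = {
    "scenario": {"phishing", "o365_suspicious_login"},
    "severity": {"low", "medium", "high", "critical"},
    "status": {"confirmed", "false_positive", "pending"},
}


def validate_case(case: dict) -> list[str]:
    """Return human-readable problems; an empty list means the case passed."""
    problems = []
    for field, expected in REQUIRED_CASE_FIELDS.items():
        if field not in case:
            problems.append(f"missing required field: {field}")
        elif not isinstance(case[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    for field, allowed in ALLOWED_VALUES.items():
        value = case.get(field)
        if isinstance(value, str) and value not in allowed:
            problems.append(f"{field} has unexpected value: {value!r}")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to fix a hand-written mock file in a single pass.
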
## Mock KB / Playbook Raw Format

Each knowledge entry is stored as a single JSON file. Suggested file name:

```text
<doc_id>.json
```

### Field Definitions

| Field | Type | Required | Description |
|---|---|---:|---|
| `doc_id` | string | Yes | Unique document ID |
| `doc_type` | string | Yes | `kb` / `playbook` / `report_summary` |
| `title` | string | Yes | Title |
| `scenario` | string | Yes | Applicable scenario |
| `summary` | string | Yes | Core summary |
| `applicability` | array | No | Applicability conditions |
| `key_points` | array | Yes | Key knowledge points |
| `investigation_guidance` | array | No | Investigation guidance |
| `decision_points` | array | No | Key decision points |
| `related_entities` | object | No | Related entities / TTPs / IOCs |
| `related_refs` | object | No | Related documents |
| `tags` | array | No | Tags |
| `updated_at` | string | No | Last updated time |

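
This spec gives no example for a knowledge entry, so the sketch below shows what a minimal one might look like, together with a check of the required fields. Only the field names, the `doc_type` values, and the ID `PB-PHISH-001` come from this document. The title, summary, and key point are hypothetical illustration text, and `kb_field_problems` is a hypothetical helper name.

```python
REQUIRED_KB_FIELDS = ("doc_id", "doc_type", "title", "scenario", "summary", "key_points")
ALLOWED_DOC_TYPES = {"kb", "playbook", "report_summary"}

# Hypothetical playbook entry: the values below are made up for illustration.
sample_playbook = {
    "doc_id": "PB-PHISH-001",
    "doc_type": "playbook",
    "title": "Phishing email triage",
    "scenario": "phishing",
    "summary": "Steps for triaging a reported or detected phishing email.",
    "key_points": ["Check SPF/DKIM/DMARC results before trusting the sender."],
    "tags": ["phishing", "email"],
}


def kb_field_problems(doc: dict) -> list[str]:
    """Return missing required field names, plus 'doc_type' if its value is not allowed."""
    problems = [name for name in REQUIRED_KB_FIELDS if name not in doc]
    if "doc_type" in doc and doc["doc_type"] not in ALLOWED_DOC_TYPES:
        problems.append("doc_type")
    return problems
```
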
## Normalized Output Targets

### Normalized Case Structure

Suggested fields for the normalization script output:

- `id`
- `memory_type` = `case`
- `scenario`
- `title`
- `abstract`
- `verdict`
- `severity`
- `entities`
- `observables`
- `evidence`
- `patterns`
- `related_refs`
- `source_path`
- `tags`

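
The spec lists the target fields but not how each is derived from the raw case. The sketch below shows one possible mapping. Treat every source-to-target pairing marked "assumed" as a guess, in particular `patterns`, which is filled from `lessons_learned` here only because no other raw field is an obvious candidate. The function name `normalize_case` is hypothetical.

```python
def normalize_case(raw: dict, source_path: str) -> dict:
    """Map a raw mock case onto the suggested normalized case fields."""
    return {
        "id": raw["case_id"],                              # assumed: id comes from case_id
        "memory_type": "case",                             # fixed value given by the spec
        "scenario": raw["scenario"],
        "title": raw["title"],
        "abstract": raw["summary"],                        # assumed: abstract comes from summary
        "verdict": raw["conclusion"].get("verdict"),       # assumed: taken from conclusion
        "severity": raw["severity"],
        "entities": raw["entities"],
        "observables": raw.get("observables", {}),         # optional in the raw format
        "evidence": raw["evidence"],
        "patterns": raw.get("lessons_learned", []),        # assumed mapping, see lead-in
        "related_refs": raw.get("related_refs", {}),       # optional in the raw format
        "source_path": source_path,
        "tags": raw.get("tags", []),                       # optional in the raw format
    }
```

Optional raw fields fall back to empty containers so that downstream code can iterate over them without checking for `None`.
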
### Normalized KB Structure

Suggested fields for the normalization script output:

- `id`
- `memory_type` = `knowledge`
- `doc_type`
- `scenario`
- `title`
- `abstract`
- `key_points`
- `investigation_guidance`
- `decision_points`
- `related_refs`
- `source_path`
- `tags`

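
A matching sketch for knowledge entries follows. As with cases, the spec does not state the mapping, so `id` from `doc_id` and `abstract` from `summary` are assumptions, and `normalize_kb` is a hypothetical name.

```python
def normalize_kb(raw: dict, source_path: str) -> dict:
    """Map a raw KB / playbook / report entry onto the suggested normalized fields."""
    return {
        "id": raw["doc_id"],                                # assumed: id comes from doc_id
        "memory_type": "knowledge",                         # fixed value given by the spec
        "doc_type": raw["doc_type"],
        "scenario": raw["scenario"],
        "title": raw["title"],
        "abstract": raw["summary"],                         # assumed: abstract comes from summary
        "key_points": raw["key_points"],
        "investigation_guidance": raw.get("investigation_guidance", []),  # optional
        "decision_points": raw.get("decision_points", []),                # optional
        "related_refs": raw.get("related_refs", {}),                      # optional
        "source_path": source_path,
        "tags": raw.get("tags", []),                                      # optional
    }
```
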
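## Retrieval Testing Suggestions

During the mock data phase, prioritize verifying:

- Whether a phishing case recalls the phishing playbook and similar phishing cases
- Whether an O365 anomalous sign-in case recalls the sign-in anomaly KB and similar cases
- Whether true-positive and false-positive cases can be distinguished and retain distinct patterns
- Whether the recalled results include the key evidence / decision points
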
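
The first two checks above can be expressed as a recall measurement over retrieved IDs. This is a minimal sketch that assumes the retrieval layer returns a ranked list of normalized `id` values. The spec does not define the retrieval interface, so the function `recall_at_k` and the cutoff `k` are illustrative choices.

```python
def recall_at_k(retrieved_ids: list[str], expected_ids: set[str], k: int = 5) -> float:
    """Fraction of expected IDs that appear among the first k retrieved IDs."""
    if not expected_ids:
        return 1.0  # nothing was expected, so nothing was missed
    hits = expected_ids & set(retrieved_ids[:k])
    return len(hits) / len(expected_ids)
```

For a phishing query case, `expected_ids` might hold the playbook and KB IDs listed in that case's `related_refs`, for example `{"PB-PHISH-001", "KB-PHISH-HEADER-CHECK"}`.
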