Initial SOC memory POC implementation
skills/README.md (new file, 17 lines)
@@ -0,0 +1,17 @@
# Skills

Skills recommended to implement first:

- `ingest_skill`
- `extract_memory_skill`
- `classify_memory_skill`
- `retrieve_context_skill`
- `summarize_case_skill`
- `commit_memory_skill`
- `prune_memory_skill`

Recommended starting set for POC phase one:

- `retrieve_context_skill`
- `summarize_case_skill`
- `commit_memory_skill`
skills/commit_memory_skill/README.md (new file, 36 lines)
@@ -0,0 +1,36 @@
# commit_memory_skill

This skill writes normalized, high-value memories back to OpenViking.

## Current-phase responsibilities

Phase one focuses on writing normalized `case` and `knowledge` items into OpenViking as resources.

Rationale:

- Structured data is well suited to explicit URI-based organization
- Resource writes are more controllable than submitting `add_memory` through a session
- It makes it easy to organize case / knowledge / report items by namespace and URI later

## Phase-one inputs

- Normalized case JSON
- Normalized KB / Playbook JSON

## Phase-one outputs

- OpenViking resource write results
- Resources organized under a unified URI scheme

## Default URI conventions

- case: `viking://soc/case/<scenario>/<id>`
- knowledge: `viking://soc/knowledge/<doc_type>/<id>`
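As a rough illustration, the convention above can be sketched as a small mapping function. This is a sketch only, not the committed implementation (`commit_to_openviking.py` nests the same shape under `viking://resources/soc-memory-poc/`); the `memory_type`, `scenario`, `doc_type`, and `id` fields follow the normalized JSON schema assumed throughout this POC:

```python
from typing import Any


def build_uri(item: dict[str, Any]) -> str:
    # Map a normalized item onto the phase-one URI convention.
    memory_type = item.get("memory_type")
    if memory_type == "case":
        return f"viking://soc/case/{item.get('scenario', 'general')}/{item['id']}"
    if memory_type == "knowledge":
        return f"viking://soc/knowledge/{item.get('doc_type', 'general')}/{item['id']}"
    raise ValueError(f"unsupported memory_type: {memory_type}")


print(build_uri({"memory_type": "case", "scenario": "phishing", "id": "CASE-2026-0001"}))
# viking://soc/case/phishing/CASE-2026-0001
```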
## Future extensions

Once resource writes are stable, the skill can additionally support:

- Writing high-value summaries into `memory`
- Feeding EverMemOS distillation results back in
- A dual-write strategy across Obsidian / OpenViking
skills/commit_memory_skill/SKILL.md (new file, 29 lines)
@@ -0,0 +1,29 @@
# commit_memory_skill

## Purpose

Write normalized, filtered case / knowledge content into OpenViking.

## Current default strategy

Phase one only performs resource writes; it does not attempt complex memory evolution.

- `case` -> `viking://soc/case/<scenario>/<id>`
- `knowledge` -> `viking://soc/knowledge/<doc_type>/<id>`

## Inputs

- Normalized case / knowledge JSON files
- OpenViking configuration (URL / API key)

## Outputs

- Write results
- Target URIs
- Success / failure status

## Success criteria

- Local normalized samples can be written into OpenViking successfully
- URI organization matches the namespace design
- Written items can later be retrieved and referenced
skills/commit_memory_skill/commit_to_openviking.py (new file, 89 lines)
@@ -0,0 +1,89 @@
"""Commit normalized SOC memory items to OpenViking as structured resources."""
from __future__ import annotations

import argparse
import asyncio
import json
from pathlib import Path
from typing import Any

from memory_gateway.openviking_client import OpenVikingClient


def build_resource_uri(item: dict[str, Any]) -> str:
    memory_type = item.get("memory_type")
    item_id = item["id"]

    if memory_type == "case":
        scenario = item.get("scenario", "general")
        return f"viking://resources/soc-memory-poc/case/{scenario}/{item_id}.json"

    if memory_type == "knowledge":
        doc_type = item.get("doc_type", "general")
        return f"viking://resources/soc-memory-poc/knowledge/{doc_type}/{item_id}.json"

    raise ValueError(f"Unsupported memory_type for commit: {memory_type}")


def load_item(path: str | Path) -> dict[str, Any]:
    path = Path(path)
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


async def commit_file(path: str | Path, client: OpenVikingClient) -> dict[str, Any]:
    item = load_item(path)
    uri = build_resource_uri(item)
    result = await client.add_resource(
        uri=uri,
        content=json.dumps(item, ensure_ascii=False, indent=2),
        resource_type="json",
        wait=False,
    )
    return {
        "path": str(path),
        "uri": uri,
        "result": result,
    }


async def commit_directory(directory: str | Path, client: OpenVikingClient, limit: int | None = None) -> list[dict[str, Any]]:
    directory = Path(directory)
    paths = sorted(directory.rglob("*.json"))
    if limit is not None:
        paths = paths[:limit]

    results: list[dict[str, Any]] = []
    for path in paths:
        results.append(await commit_file(path, client))
    return results


async def main_async(args: argparse.Namespace) -> None:
    client = OpenVikingClient()
    try:
        if args.path:
            result = await commit_file(args.path, client)
            print(json.dumps(result, ensure_ascii=False, indent=2))
        else:
            results = await commit_directory(args.directory, client, limit=args.limit)
            print(json.dumps(results, ensure_ascii=False, indent=2))
    finally:
        await client.close()


def main() -> None:
    parser = argparse.ArgumentParser(description="Commit normalized SOC items to OpenViking.")
    parser.add_argument("--path", help="Single normalized JSON file to commit")
    parser.add_argument("--directory", help="Directory of normalized JSON files to commit")
    parser.add_argument("--limit", type=int, default=None, help="Optional limit for directory commits")
    args = parser.parse_args()

    if not args.path and not args.directory:
        parser.error("Either --path or --directory is required")

    asyncio.run(main_async(args))


if __name__ == "__main__":
    main()
skills/retrieve_context_skill/README.md (new file, 42 lines)
@@ -0,0 +1,42 @@
# retrieve_context_skill

This skill recalls the most relevant context from OpenViking or the mock dataset, based on the key signals of the current case.

## Goal

Given the current case's scenario, alert type, IOCs, and description, return a ranked set of related content:

- Similar historical cases
- Related KB articles
- Related playbooks
- Key decision points

## Phase-one inputs

- `scenario`
- `alert_type`
- `summary`
- `entities`
- `observables`
- `top_k`

## Phase-one outputs

- `matched_cases`
- `matched_knowledge`
- `decision_points`
- `next_actions`

## Phase-one retrieval strategy

1. Filter by `scenario`
2. Match on `alert_type`, IOCs, and keywords
3. Lightly rerank by evidence / tags
4. Return the top-k results
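A minimal sketch of steps 1–4, assuming flattened `observables` and `tags` lists for brevity (the actual `retrieve_context.py` in this commit scores nested observable maps and summary keywords, and uses different weights):

```python
from typing import Any


def rank_cases(query: dict[str, Any], cases: list[dict[str, Any]], top_k: int = 3) -> list[dict[str, Any]]:
    # Step 1: filter by scenario.
    candidates = [c for c in cases if c.get("scenario") == query["scenario"]]

    ranked: list[dict[str, Any]] = []
    for case in candidates:
        score = 0
        # Step 2: match on alert_type and shared IOCs.
        if query.get("alert_type") and query["alert_type"] == case.get("alert_type"):
            score += 20
        score += 8 * len(set(query.get("observables", [])) & set(case.get("observables", [])))
        # Step 3: light rerank on shared tags.
        score += 2 * len(set(query.get("tags", [])) & set(case.get("tags", [])))
        ranked.append({"score": score, "case": case})

    # Step 4: return the top-k by score.
    ranked.sort(key=lambda entry: entry["score"], reverse=True)
    return ranked[:top_k]
```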
## Out of scope for phase one

- Vector retrieval
- Graph retrieval
- Personalized ranking
- Complex multi-source reranking
skills/retrieve_context_skill/SKILL.md (new file, 39 lines)
@@ -0,0 +1,39 @@
# retrieve_context_skill

## Purpose

During SOC case triage, retrieve the most relevant historical cases and knowledge context for the agent.

## Inputs

- `scenario`: scenario, e.g. `phishing`, `o365_suspicious_login`
- `alert_type`: alert type
- `summary`: summary of the current case
- `entities`: users, hosts, mailboxes, etc.
- `observables`: domains, IPs, URLs, hashes, etc.
- `top_k`: desired number of results

## Outputs

- List of related historical cases
- List of related KB articles / playbooks
- Key evidence / decision points
- Recommended next investigation actions

## Default retrieval order

1. `session/<session_id>`
2. `soc/case`
3. `soc/knowledge`
4. `agent/<agent_id>`
5. `user/<user_id>`
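The ordered fallback above can be sketched as a loop that walks namespaces by priority until enough hits are collected. The `search` callable here is a hypothetical stand-in for the OpenViking query API, not its real signature:

```python
from typing import Any, Callable


def search_in_order(
    search: Callable[[str], list[dict[str, Any]]],
    namespaces: list[str],
    top_k: int,
) -> list[dict[str, Any]]:
    # Walk namespaces in priority order, stopping once top_k hits are collected.
    hits: list[dict[str, Any]] = []
    for namespace in namespaces:
        for hit in search(namespace):
            hits.append({"namespace": namespace, **hit})
            if len(hits) >= top_k:
                return hits
    return hits


# Priority order from this SKILL, with example session/agent/user ids.
order = ["session/s-1", "soc/case", "soc/knowledge", "agent/a-1", "user/u-1"]
```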
## How the mock phase works

While real data and the full OpenViking retrieval pipeline are unavailable, use `evaluation/datasets/mock_cases/` and `evaluation/datasets/mock_kb/` for local retrieval validation.

## Success criteria

- A phishing case recalls the phishing playbook and similar phishing cases
- An O365 suspicious-login case recalls the login-anomaly KB and similar cases
- Results look like "helpful context" to a human reviewer, not a generic pile of material
skills/retrieve_context_skill/retrieve_context.py (new file, 216 lines)
@@ -0,0 +1,216 @@
"""Retrieval entrypoint for SOC Memory POC.

Supports two modes:
- local: retrieve from normalized mock datasets
- openviking: retrieve from OpenViking resource namespaces and filter results
"""
from __future__ import annotations

import asyncio
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Any

from memory_gateway.openviking_client import OpenVikingClient

CASE_URI_PREFIX = "viking://resources/soc-memory-poc/case"
KNOWLEDGE_URI_PREFIX = "viking://resources/soc-memory-poc/knowledge"


def _load_json_dir(path: str | Path) -> list[dict[str, Any]]:
    path = Path(path)
    items: list[dict[str, Any]] = []
    for file in sorted(path.rglob("*.json")):
        with file.open("r", encoding="utf-8") as f:
            items.append(json.load(f))
    return items


@dataclass
class RetrievalQuery:
    scenario: str
    alert_type: str = ""
    summary: str = ""
    entities: dict[str, list[str]] | None = None
    observables: dict[str, list[str]] | None = None
    top_k: int = 3


def _flatten_values(data: dict[str, list[str]] | None) -> set[str]:
    if not data:
        return set()
    values: set[str] = set()
    for items in data.values():
        values.update(str(item).lower() for item in items)
    return values


def _score_case(query: RetrievalQuery, item: dict[str, Any]) -> int:
    score = 0
    if item.get("scenario") == query.scenario:
        score += 50

    for pattern in item.get("patterns", []):
        if query.alert_type and pattern == f"alert_type:{query.alert_type}":
            score += 20

    query_observables = _flatten_values(query.observables)
    item_observables = _flatten_values(item.get("observables"))
    score += 8 * len(query_observables & item_observables)

    summary = query.summary.lower()
    haystacks = [item.get("title", "").lower(), item.get("abstract", "").lower()]
    for token in [t for t in summary.split() if len(t) > 4]:
        if any(token in text for text in haystacks):
            score += 2

    return score


def _score_knowledge(query: RetrievalQuery, item: dict[str, Any]) -> int:
    score = 0
    if item.get("scenario") == query.scenario:
        score += 40

    title = item.get("title", "").lower()
    abstract = item.get("abstract", "").lower()
    for token in [t for t in query.summary.lower().split() if len(t) > 4]:
        if token in title or token in abstract:
            score += 2

    if query.alert_type and query.alert_type in " ".join(item.get("related_refs", {}).get("cases", [])).lower():
        score += 5

    return score


def retrieve_context_local(
    query: RetrievalQuery,
    cases_dir: str | Path = "evaluation/datasets/normalized_cases",
    knowledge_dir: str | Path = "evaluation/datasets/normalized_kb",
) -> dict[str, Any]:
    cases = _load_json_dir(cases_dir)
    knowledge = _load_json_dir(knowledge_dir)

    ranked_cases = sorted(
        ({"score": _score_case(query, item), "item": item} for item in cases),
        key=lambda x: x["score"],
        reverse=True,
    )
    ranked_knowledge = sorted(
        ({"score": _score_knowledge(query, item), "item": item} for item in knowledge),
        key=lambda x: x["score"],
        reverse=True,
    )

    matched_cases = [entry for entry in ranked_cases if entry["score"] > 0][: query.top_k]
    matched_knowledge = [entry for entry in ranked_knowledge if entry["score"] > 0][: query.top_k]

    decision_points: list[str] = []
    next_actions: list[str] = []
    for entry in matched_knowledge:
        item = entry["item"]
        decision_points.extend(item.get("decision_points", []))
        next_actions.extend(item.get("investigation_guidance", []))

    return {
        "backend": "local",
        "query": asdict(query),
        "matched_cases": matched_cases,
        "matched_knowledge": matched_knowledge,
        "decision_points": decision_points[: query.top_k],
        "next_actions": next_actions[: query.top_k],
    }


def _canonicalize_resource_uri(uri: str) -> str:
    if ".json/" in uri:
        return uri.split(".json/", 1)[0] + ".json"
    return uri


def _query_text(query: RetrievalQuery) -> str:
    parts = [query.scenario, query.alert_type, query.summary]
    parts.extend(sorted(_flatten_values(query.observables)))
    return " ".join(part for part in parts if part).strip()


def _dedupe_openviking_results(results: list[dict[str, Any]], prefix: str) -> list[dict[str, Any]]:
    deduped: dict[str, dict[str, Any]] = {}
    for item in results:
        uri = item.get("uri") or ""
        if not uri.startswith(prefix):
            continue
        canonical_uri = _canonicalize_resource_uri(uri)
        score = item.get("score") or 0
        existing = deduped.get(canonical_uri)
        payload = {
            "uri": canonical_uri,
            "abstract": item.get("abstract", ""),
            "score": score,
            "context_type": item.get("context_type"),
            "source_uri": uri,
        }
        if existing is None or score > existing.get("score", 0):
            deduped[canonical_uri] = payload
    return sorted(deduped.values(), key=lambda x: x["score"], reverse=True)


async def retrieve_context_openviking(
    query: RetrievalQuery,
    case_uri: str = CASE_URI_PREFIX,
    knowledge_uri: str = KNOWLEDGE_URI_PREFIX,
) -> dict[str, Any]:
    client = OpenVikingClient()
    try:
        query_text = _query_text(query)
        case_result = await client.search(query=query_text, uri=case_uri, limit=max(query.top_k * 5, 10))
        knowledge_result = await client.search(query=query_text, uri=knowledge_uri, limit=max(query.top_k * 5, 10))

        matched_cases = _dedupe_openviking_results(case_result.results, case_uri)[: query.top_k]
        matched_knowledge = _dedupe_openviking_results(knowledge_result.results, knowledge_uri)[: query.top_k]

        return {
            "backend": "openviking",
            "query": asdict(query),
            "matched_cases": matched_cases,
            "matched_knowledge": matched_knowledge,
            "decision_points": [],
            "next_actions": [],
        }
    finally:
        await client.close()


def main() -> None:
    import argparse

    parser = argparse.ArgumentParser(description="Retrieve SOC context from local datasets or OpenViking.")
    parser.add_argument("--backend", choices=["local", "openviking"], default="local", help="Retrieval backend")
    parser.add_argument("--scenario", required=True, help="Scenario, e.g. phishing or o365_suspicious_login")
    parser.add_argument("--alert-type", default="", help="Alert type")
    parser.add_argument("--summary", default="", help="Short case summary")
    parser.add_argument("--top-k", type=int, default=3, help="Number of results to return")
    parser.add_argument("--cases-dir", default="evaluation/datasets/normalized_cases", help="Normalized case dataset directory")
    parser.add_argument("--knowledge-dir", default="evaluation/datasets/normalized_kb", help="Normalized knowledge dataset directory")
    parser.add_argument("--case-uri", default=CASE_URI_PREFIX, help="OpenViking case URI prefix")
    parser.add_argument("--knowledge-uri", default=KNOWLEDGE_URI_PREFIX, help="OpenViking knowledge URI prefix")
    args = parser.parse_args()

    query = RetrievalQuery(
        scenario=args.scenario,
        alert_type=args.alert_type,
        summary=args.summary,
        top_k=args.top_k,
    )

    if args.backend == "openviking":
        result = asyncio.run(retrieve_context_openviking(query, case_uri=args.case_uri, knowledge_uri=args.knowledge_uri))
    else:
        result = retrieve_context_local(query, cases_dir=args.cases_dir, knowledge_dir=args.knowledge_dir)
    print(json.dumps(result, ensure_ascii=False, indent=2))


if __name__ == "__main__":
    main()
skills/summarize_case_skill/README.md (new file, 17 lines)
@@ -0,0 +1,17 @@
# summarize_case_skill

This skill turns a normalized SOC case record into a reusable Obsidian case note.

Current scope:

- input: normalized case JSON from `evaluation/datasets/normalized_cases/`
- output: markdown case note under `obsidian-vault/02_Cases/`
- goal: produce a clean analyst-facing note, not a raw process dump

Typical usage:

```bash
source /home/tom/OpenViking/.venv/bin/activate
PYTHONPATH=/home/tom/soc_memory_poc python /home/tom/soc_memory_poc/skills/summarize_case_skill/generate_case_note.py \
  --input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json \
  --output-dir /home/tom/soc_memory_poc/obsidian-vault/02_Cases
```
skills/summarize_case_skill/SKILL.md (new file, 21 lines)
@@ -0,0 +1,21 @@
# summarize_case_skill

## Purpose

Summarize one normalized SOC case into a high-quality Obsidian case note that can be reviewed and maintained by analysts.

## Inputs

- A normalized case JSON document
- Optional output directory for Obsidian notes

## Outputs

- One markdown case note per case
- Stable structure aligned with the vault template

## Guardrails

- Do not dump raw logs or full tool traces
- Keep only reusable evidence, conclusions, and response guidance
- Prefer linked references to playbooks, KBs, and related cases
- Preserve case identifiers and observable values exactly

## Current implementation

Use `generate_case_note.py` to render a local markdown note from a normalized case.
skills/summarize_case_skill/generate_case_note.py (new file, 346 lines)
@@ -0,0 +1,346 @@
"""Generate an Obsidian case note from a normalized SOC case JSON file."""
from __future__ import annotations

import argparse
import asyncio
import json
from pathlib import Path
from typing import Any

from skills.retrieve_context_skill.retrieve_context import RetrievalQuery, retrieve_context_openviking


def _load_case(path: str | Path) -> dict[str, Any]:
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)


def _extract_alert_type(patterns: list[str]) -> str:
    for pattern in patterns:
        if pattern.startswith("alert_type:"):
            return pattern.split(":", 1)[1]
    return "unknown"


def _verdict_label(verdict: str) -> str:
    mapping = {
        "true_positive": "真报",
        "false_positive": "误报",
        "suspicious": "可疑待定",
    }
    return mapping.get(verdict, verdict or "未知")


def _join_values(values: list[str]) -> str:
    return ", ".join(values) if values else "无"


def _bullet_lines(values: list[str], default: str = "- 无") -> str:
    if not values:
        return default
    return "\n".join(f"- {value}" for value in values)


def _wikilinks(values: list[str]) -> str:
    if not values:
        return "无"
    return ", ".join(f"[[{value}]]" for value in values)


def _uri_to_id(uri: str) -> str:
    name = uri.rstrip("/").rsplit("/", 1)[-1]
    if name.endswith(".json"):
        name = name[:-5]
    return name


def _derive_process_summary(item: dict[str, Any]) -> list[str]:
    steps: list[str] = []
    if item.get("abstract"):
        steps.append(f"确认告警场景与核心风险:{item['abstract']}")
    if item.get("evidence"):
        steps.append(f"提取关键证据并交叉验证:{item['evidence'][0]}")
    related = item.get("related_refs", {})
    if related.get("playbooks") or related.get("kb"):
        steps.append("对照关联 playbook / KB 复核告警模式与处置路径。")
    if item.get("verdict"):
        steps.append(f"基于关键证据与场景模式完成结论判定:{_verdict_label(item['verdict'])}。")
    return steps[:4]


def _derive_disposition(item: dict[str, Any]) -> list[str]:
    verdict = item.get("verdict", "")
    evidence = item.get("evidence", [])
    lines: list[str] = []
    if verdict:
        lines.append(f"结论为{_verdict_label(verdict)}。")
    if evidence:
        lines.append(f"最关键依据:{evidence[0]}")
    if len(evidence) > 1:
        lines.append(f"补充依据:{evidence[1]}")
    return lines


def _derive_actions(item: dict[str, Any]) -> list[str]:
    scenario = item.get("scenario", "")
    verdict = item.get("verdict", "")
    actions: list[str] = []
    if scenario == "phishing":
        actions.extend([
            "隔离相同主题、发件人或 URL 的邮件样本。",
            "核查用户是否点击或提交凭据,并按需执行凭据重置。",
        ])
    elif scenario == "o365_suspicious_login":
        actions.extend([
            "复核登录来源、MFA 事件和后续邮箱规则或 OAuth 变更。",
            "若存在账号接管迹象,立即执行会话失效和凭据重置。",
        ])
    else:
        actions.append("结合关联 playbook 执行后续处置。")
    if verdict == "false_positive":
        actions = ["记录误报原因,并更新检测例外或抑制条件。"]
    return actions


def _derive_reusable_patterns(item: dict[str, Any]) -> tuple[list[str], list[str], list[str]]:
    patterns = item.get("patterns", [])
    tags = item.get("tags", [])
    hit_patterns = [pattern for pattern in patterns if not pattern.startswith("verdict:")]
    false_positive_traits = []
    variants = []
    if item.get("verdict") == "false_positive":
        false_positive_traits.append("本案最终确认为误报,可用于补充抑制条件。")
    if tags:
        variants.append("相关标签:" + ", ".join(tags))
    return hit_patterns or ["无"], false_positive_traits or ["无"], variants or ["无"]


async def _fetch_openviking_recommendations(item: dict[str, Any], top_k: int = 3) -> dict[str, list[dict[str, Any]]]:
    query = RetrievalQuery(
        scenario=item.get("scenario", "general"),
        alert_type=_extract_alert_type(item.get("patterns", [])),
        summary=item.get("abstract", ""),
        observables=item.get("observables"),
        top_k=top_k + 1,
    )
    result = await retrieve_context_openviking(query)

    case_entries: list[dict[str, Any]] = []
    for entry in result.get("matched_cases", []):
        candidate_id = _uri_to_id(entry.get("uri", ""))
        if candidate_id == item.get("id"):
            continue
        case_entries.append(
            {
                "id": candidate_id,
                "score": round(float(entry.get("score") or 0), 3),
                "abstract": entry.get("abstract", ""),
            }
        )
        if len(case_entries) >= top_k:
            break

    knowledge_entries: list[dict[str, Any]] = []
    for entry in result.get("matched_knowledge", []):
        knowledge_entries.append(
            {
                "id": _uri_to_id(entry.get("uri", "")),
                "score": round(float(entry.get("score") or 0), 3),
                "abstract": entry.get("abstract", ""),
            }
        )
        if len(knowledge_entries) >= top_k:
            break

    return {
        "cases": case_entries,
        "knowledge": knowledge_entries,
    }


def _merge_unique(primary: list[str], secondary: list[str]) -> list[str]:
    merged: list[str] = []
    for value in primary + secondary:
        if value and value not in merged:
            merged.append(value)
    return merged


def _recommendation_lines(entries: list[dict[str, Any]], prefix: str) -> list[str]:
    lines: list[str] = []
    for entry in entries:
        abstract = entry.get("abstract", "")
        abstract = abstract[:140] + "..." if len(abstract) > 140 else abstract
        lines.append(f"[[{entry['id']}]] ({prefix} score={entry['score']}) {abstract}")
    return lines


def render_case_note(item: dict[str, Any], recommendations: dict[str, list[dict[str, Any]]] | None = None) -> str:
    case_id = item["id"]
    title = item.get("title", case_id)
    alert_type = _extract_alert_type(item.get("patterns", []))
    severity = item.get("severity", "unknown")
    verdict = _verdict_label(item.get("verdict", ""))
    entities = item.get("entities", {})
    observables = item.get("observables", {})
    related = item.get("related_refs", {})
    recommendations = recommendations or {"cases": [], "knowledge": []}

    recommended_cases = [entry["id"] for entry in recommendations.get("cases", [])]
    recommended_knowledge = [entry["id"] for entry in recommendations.get("knowledge", [])]

    merged_cases = _merge_unique(related.get("cases", []), recommended_cases)
    playbooks = related.get("playbooks", [])
    kb_items = related.get("kb", [])
    for knowledge_id in recommended_knowledge:
        if knowledge_id.startswith("PB-"):
            playbooks = _merge_unique(playbooks, [knowledge_id])
        else:
            kb_items = _merge_unique(kb_items, [knowledge_id])

    process_summary = _derive_process_summary(item)
    disposition = _derive_disposition(item)
    actions = _derive_actions(item)
    hit_patterns, false_positive_traits, variants = _derive_reusable_patterns(item)
    tags = ["#case", f"#scenario/{item.get('scenario', 'general')}", f"#alert/{alert_type}"]
    if item.get("verdict"):
        tags.append(f"#verdict/{item['verdict'].replace('_', '-')}")
    tags.extend(f"#{tag}" for tag in item.get("tags", []))

    recommendation_case_lines = _recommendation_lines(recommendations.get("cases", []), "case")
    recommendation_knowledge_lines = _recommendation_lines(recommendations.get("knowledge", []), "knowledge")

    lines = [
        "---",
        f"case_id: {case_id}",
        f"scenario: {item.get('scenario', 'general')}",
        f"alert_type: {alert_type}",
        f"severity: {severity}",
        f"verdict: {item.get('verdict', 'unknown')}",
        "source: soc-memory-poc",
        f"openviking_enriched: {'true' if recommendation_case_lines or recommendation_knowledge_lines else 'false'}",
        "---",
        "",
        f"# {case_id} {title}",
        "",
        "## 基本信息",
        "",
        f"- Case ID: {case_id}",
        f"- 标题: {title}",
        f"- 告警类型: {alert_type}",
        "- 来源系统: SOC Memory POC Mock Dataset",
        "- 时间范围: 待补充",
        "- 研判人 / Agent: AI Agent Draft",
        f"- 最终结论: {verdict}",
        f"- 严重等级: {severity}",
        "",
        "## 告警摘要",
        "",
        item.get("abstract", "无"),
        "",
        "## 关键实体",
        "",
        f"- 用户: {_join_values(entities.get('users', []))}",
        f"- 主机: {_join_values(entities.get('hosts', []))}",
        f"- 邮箱: {_join_values(entities.get('mailboxes', []))}",
        f"- IP: {_join_values(observables.get('ips', []))}",
        f"- 域名: {_join_values(observables.get('domains', []))}",
        f"- 文件 Hash: {_join_values(observables.get('hashes', []))}",
        f"- 其他 IOC: {_join_values(observables.get('urls', []) + observables.get('sender_emails', []))}",
        "",
        "## 关键证据",
        "",
        _bullet_lines(item.get("evidence", [])),
        "",
        "## 研判过程摘要",
        "",
        "\n".join(f"{index}. {step}" for index, step in enumerate(process_summary, start=1)),
        "",
        "## 结论依据",
        "",
        _bullet_lines(disposition),
        "",
        "## 处置建议",
        "",
        _bullet_lines(actions),
        "",
        "## 可复用模式",
        "",
        f"- 命中模式: {_join_values(hit_patterns)}",
        f"- 误报特征: {_join_values(false_positive_traits)}",
        f"- 需关注的变体: {_join_values(variants)}",
        "",
        "## 关联知识",
        "",
        f"- 关联 Playbook: {_wikilinks(playbooks)}",
        f"- 关联 KB: {_wikilinks(kb_items)}",
        f"- 关联历史 Case: {_wikilinks(merged_cases)}",
        f"- 关联实体: {_wikilinks(entities.get('users', []) + entities.get('hosts', []))}",
        "",
        "## 自动关联推荐",
        "",
        "### 推荐历史 Case",
        "",
        _bullet_lines(recommendation_case_lines),
        "",
        "### 推荐知识条目",
        "",
        _bullet_lines(recommendation_knowledge_lines),
        "",
        "## Lessons Learned",
        "",
        "- 本案可沉淀为后续同类告警的快速判定参考。",
        "- 若后续出现相同 lure、同类登录模式或相同关键证据,应优先联想本案与关联知识。",
        "",
        "## 标签",
        "",
        _bullet_lines(tags),
        "",
    ]
    return "\n".join(lines)


def build_output_path(item: dict[str, Any], output_dir: str | Path) -> Path:
    scenario = item.get("scenario", "general")
    case_id = item["id"]
    safe_title = item.get("title", case_id).replace("/", "-")
    return Path(output_dir) / scenario / f"{case_id} - {safe_title}.md"


async def generate_case_note_async(
    input_path: str | Path,
    output_dir: str | Path,
    enrich_from_openviking: bool = False,
    top_k: int = 3,
) -> Path:
    item = _load_case(input_path)
    recommendations: dict[str, list[dict[str, Any]]] | None = None
    if enrich_from_openviking:
        recommendations = await _fetch_openviking_recommendations(item, top_k=top_k)
    output_path = build_output_path(item, output_dir)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(render_case_note(item, recommendations=recommendations), encoding="utf-8")
    return output_path


def main() -> None:
    parser = argparse.ArgumentParser(description="Generate an Obsidian case note from a normalized case JSON file.")
    parser.add_argument("--input", required=True, help="Normalized case JSON path")
    parser.add_argument("--output-dir", default="obsidian-vault/02_Cases", help="Obsidian cases output directory")
    parser.add_argument("--enrich-from-openviking", action="store_true", help="Retrieve related cases and knowledge from OpenViking")
    parser.add_argument("--top-k", type=int, default=3, help="Number of OpenViking recommendations per type")
    args = parser.parse_args()

    output_path = asyncio.run(
        generate_case_note_async(
            args.input,
            args.output_dir,
            enrich_from_openviking=args.enrich_from_openviking,
            top_k=args.top_k,
        )
    )
    print(output_path)


if __name__ == "__main__":
    main()