Initial SOC memory POC implementation

2026-04-27 17:13:06 +08:00
parent fc68581198
commit e6b1520bce
89 changed files with 7610 additions and 1 deletions
--- a/integrations/hermes/soc-memory-poc/SKILL.md
+++ b/integrations/hermes/soc-memory-poc/SKILL.md
@@ -0,0 +1,314 @@
+---
+name: soc-memory-poc
+description: Load this skill whenever Hermes is handling SOC alert triage, phishing investigation, suspicious O365 login analysis, historical case lookup, Obsidian note lookup, case-note generation, or committing high-value SOC findings into the SOC Memory POC. It provides a strict triage workflow using the SOC Memory Gateway for search/write operations, local Obsidian vault search, and local SOC Memory POC scripts for Obsidian case note generation.
+version: 1.3.0
+metadata:
+  hermes:
+    tags: [soc, memory, openviking, obsidian, incident-response, case-triage, phishing, o365]
+    related_skills: [hermes-agent]
+---
+
+# SOC Memory POC
+
+Use this skill for SOC case workflows only. It is the default procedure for phishing-style alerts, suspicious O365 / Entra ID login cases, historical case comparison, Obsidian knowledge lookup, and case-note generation.
+
+## Mandatory Trigger Rule
+
+Load this skill immediately when the user asks Hermes to do any of the following:
+- investigate or triage a SOC alert
+- find similar phishing or O365 suspicious-login cases
+- retrieve related KB or playbook context before concluding a case
+- check whether Obsidian already has a related case note or knowledge note
+- generate an Obsidian case note from a normalized case
+- commit a normalized case or knowledge artifact into the SOC memory system
+
+If the task is clearly related to SOC triage, do not proceed without this skill.
+
+## What This Skill Connects To
+
+This skill assumes:
+- SOC Memory POC root: `/home/tom/soc_memory_poc`
+- Memory Gateway URL: `http://127.0.0.1:1934`
+- Gateway API key: empty by default unless configured otherwise
+- Obsidian vault root: `/home/tom/soc_memory_poc/obsidian-vault`
+
+Override with environment variables when needed:
+- `SOC_MEMORY_POC_ROOT`
+- `SOC_MEMORY_GATEWAY_URL`
+- `SOC_MEMORY_GATEWAY_API_KEY`
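A helper script might resolve these settings as shown below. This is a minimal sketch that assumes an empty or unset variable falls back to the defaults above. It also assumes the vault always sits at `<root>/obsidian-vault`, because no separate vault variable is documented. `load_config` and `SocMemoryConfig` are hypothetical names, not part of the POC scripts.

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional

# Defaults as listed under "What This Skill Connects To".
DEFAULT_ROOT = "/home/tom/soc_memory_poc"
DEFAULT_GATEWAY_URL = "http://127.0.0.1:1934"


@dataclass(frozen=True)
class SocMemoryConfig:
    poc_root: Path
    gateway_url: str
    gateway_api_key: str  # empty string means no key is configured
    vault_root: Path


def load_config(env: Optional[Mapping[str, str]] = None) -> SocMemoryConfig:
    """Resolve settings from environment variables, falling back to the defaults."""
    env = os.environ if env is None else env
    root = Path(env.get("SOC_MEMORY_POC_ROOT") or DEFAULT_ROOT)
    return SocMemoryConfig(
        poc_root=root,
        gateway_url=(env.get("SOC_MEMORY_GATEWAY_URL") or DEFAULT_GATEWAY_URL).rstrip("/"),
        gateway_api_key=env.get("SOC_MEMORY_GATEWAY_API_KEY", ""),
        # Assumption: the vault follows the POC root; no override variable is documented.
        vault_root=root / "obsidian-vault",
    )
```

Passing an explicit mapping, for example `load_config({})`, makes the function easy to test without touching the real environment.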
+
+Capabilities:
+- search SOC case / knowledge context through the Memory Gateway
+- search existing Obsidian notes by case ID, scenario, keywords, or tags
+- commit normalized case / knowledge JSON through the Memory Gateway
+- generate Obsidian case notes from normalized case JSON
+
+## Triage Workflow
+
+Follow this order unless the user explicitly asks for something narrower.
+
+### Preferred Path For Structured Alerts (Scheme A)
+
+If the user provides a structured alert summary with fields like user, host, sender, subject, attachment, URL, IP, alert type, or known facts, do **not** manually improvise the final answer from memory search results alone.
+
+Use the deterministic triage helper first:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_alert.py \
+  --scenario phishing \
+  --alert-type mail_suspicious_attachment \
+  --user alice@corp.example \
+  --host FIN-LAPTOP-12 \
+  --sender billing@vendor-payments.com \
+  --subject "Invoice overdue notice" \
+  --attachment invoice_review.html \
+  --url https://vendor-payments-login.com/review \
+  --ip 198.51.100.20 \
+  --summary "Invoice-themed phishing email with HTML attachment and credential harvesting link" \
+  --fact "DMARC failed" \
+  --fact "User may have clicked the link"
+```
+
+This script performs:
+- case retrieval from the SOC Memory Gateway
+- knowledge retrieval from the SOC Memory Gateway
+- Obsidian note lookup from the local vault
+- final markdown rendering with all required sections populated
+
+For Scheme A, prefer returning the script output with only light cleanup. Do not drop the `关联 Memory Retrieval` (related memory retrieval) or `关联 Obsidian 文档` (related Obsidian documents) sections.
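When Hermes calls the helper programmatically, building an argv list is safer than assembling a shell string. The sketch below maps a structured alert onto the flags used in the Scheme A example above. The alert field names and the assumption that absent fields can simply be omitted are hypothetical; only the script path and flag names come from this skill.

```python
import subprocess
from typing import Mapping, Sequence

TRIAGE_ALERT = "/home/tom/.hermes/skills/soc-memory-poc/scripts/triage_alert.py"

# Single-value flags from the Scheme A example, keyed by (hypothetical) alert field name.
SINGLE_FLAGS = {
    "scenario": "--scenario",
    "alert_type": "--alert-type",
    "user": "--user",
    "host": "--host",
    "sender": "--sender",
    "subject": "--subject",
    "attachment": "--attachment",
    "url": "--url",
    "ip": "--ip",
    "summary": "--summary",
}


def build_triage_alert_argv(alert: Mapping[str, str], facts: Sequence[str] = ()) -> list:
    """Build an argv list; a list (not a shell string) avoids quoting bugs."""
    argv = ["python", TRIAGE_ALERT]
    for name, flag in SINGLE_FLAGS.items():
        value = alert.get(name)
        if value:  # assumption: missing fields can simply be left out
            argv += [flag, value]
    for fact in facts:  # --fact is repeated once per known fact
        argv += ["--fact", fact]
    return argv


def run_triage_alert(alert: Mapping[str, str], facts: Sequence[str] = ()) -> str:
    """Run the helper and return its markdown; raises CalledProcessError on failure."""
    proc = subprocess.run(
        build_triage_alert_argv(alert, facts), capture_output=True, text=True, check=True
    )
    return proc.stdout
```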
+
+### Preferred Path For Freeform Alerts Or Raw Email Content
+
+If the user does **not** provide neatly separated fields, or pastes raw email content / ticket text / freeform alert text, do not force them into Scheme A manually.
+
+Use the unified triage helper:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --text "From: billing@vendor-payments.com
+To: alice@corp.example
+Subject: Invoice overdue notice
+Attachment: invoice_review.html
+User clicked the link after opening the HTML attachment. DMARC failed. Review at https://vendor-payments-login.com/review from IP 198.51.100.20 on host FIN-LAPTOP-12."
+```
+
+Or point it at a file:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/triage_email.py --file /path/to/raw_email.txt
+```
+
+This helper will:
+- infer the most likely scenario and alert type
+- extract sender, user, subject, attachment, URL, IP, and host when possible
+- carry over important facts like DMARC failure, user click, MFA fatigue, inbox rule, or OAuth consent
+- run the deterministic triage pipeline so the final answer still contains the `关联 Memory Retrieval` (related memory retrieval) and `关联 Obsidian 文档` (related Obsidian documents) sections
+
+For non-structured input, prefer this helper over freehand reasoning.
+
+For all SOC triage inputs, `triage_email.py` is the preferred single entrypoint. It accepts raw text, a file, or optional structured overrides, then calls the deterministic retrieval pipeline.
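The extraction step can be approximated with plain regular expressions, as in the sketch below. This is a hypothetical illustration, not the logic inside `triage_email.py`, whose rules are not documented here. The header patterns assume `From:`/`To:`/`Subject:`/`Attachment:` lines like those in the example above, and the fact keywords are assumptions.

```python
import re

# Hypothetical patterns; triage_email.py's real extraction rules are not documented here.
PATTERNS = {
    "sender": re.compile(r"^From:\s*(\S+@\S+)", re.MULTILINE | re.IGNORECASE),
    "user": re.compile(r"^To:\s*(\S+@\S+)", re.MULTILINE | re.IGNORECASE),
    "subject": re.compile(r"^Subject:\s*(.+)$", re.MULTILINE | re.IGNORECASE),
    "attachment": re.compile(r"^Attachment:\s*(.+)$", re.MULTILINE | re.IGNORECASE),
    "url": re.compile(r"https?://[^\s\"'<>]+"),
    "ip": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "host": re.compile(r"\bhost\s+([A-Za-z0-9][A-Za-z0-9-]*)", re.IGNORECASE),
}

# Fact label -> lowercase phrases that signal it (assumed wording).
FACT_KEYWORDS = {
    "DMARC failed": ("dmarc failed", "dmarc fail"),
    "User clicked the link": ("clicked the link", "user clicked"),
    "MFA fatigue": ("mfa fatigue", "push bombing"),
    "Inbox rule created": ("inbox rule",),
    "OAuth consent granted": ("oauth consent",),
}


def extract_observables(text: str) -> dict:
    """Return the first match per observable plus any recognised facts."""
    found = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            value = match.group(1) if pattern.groups else match.group(0)
            found[name] = value.strip().rstrip(".,;)")  # drop trailing sentence punctuation
    lowered = text.lower()
    found["facts"] = [
        fact for fact, keys in FACT_KEYWORDS.items() if any(k in lowered for k in keys)
    ]
    return found
```

Regexes like these miss obfuscated or defanged indicators, which is one reason to keep the real helper as the single entrypoint.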
+
+### Phase 1: Ground The Case
+
+First identify:
+- scenario: `phishing`, `o365_suspicious_login`, or another SOC scenario
+- likely alert type
+- short case summary in one sentence
+- key observables if available: sender, URL, domain, IP, mailbox, user, hash
+
+Do not start by writing memory. Start by grounding the case.
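Grounding can be captured as a small record before any retrieval runs. In the sketch below, the keyword lists are adapted from the "Good query ingredients" sections of this skill. The scoring rule, the `GroundedCase` type, and the `unknown` fallback are hypothetical.

```python
from dataclasses import dataclass, field

# Adapted from this skill's "Good query ingredients" lists; the scoring is hypothetical.
SCENARIO_KEYWORDS = {
    "phishing": ("invoice", "attachment", "credential harvesting", "m365 login", "sender", "dmarc"),
    "o365_suspicious_login": ("impossible travel", "mfa fatigue", "inbox rule", "foreign login",
                              "oauth consent", "legacy protocol", "legacy auth"),
}


@dataclass
class GroundedCase:
    scenario: str
    alert_type: str
    summary: str  # one sentence
    observables: dict = field(default_factory=dict)  # sender, URL, domain, IP, mailbox, user, hash


def infer_scenario(text: str, default: str = "unknown") -> str:
    """Pick the scenario with the most keyword hits; ties go to the first listed."""
    lowered = text.lower()
    scores = {name: sum(kw in lowered for kw in kws) for name, kws in SCENARIO_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else default
```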
+
+### Phase 2: Retrieve Memory Context Before Judging
+
+Before concluding the case, search both related history and related knowledge.
+
+1. Search similar historical cases.
+2. Search KB / playbook context.
+3. Compare the current case against what comes back.
+
+Run these separately for better precision.
+
+Case search example:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
+  --query "invoice phishing html attachment credential harvesting" \
+  --kind case --limit 5
+```
+
+Knowledge search example:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
+  --query "invoice phishing html attachment credential harvesting" \
+  --kind knowledge --limit 5
+```
+
+O365 example:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py \
+  --query "impossible travel MFA fatigue inbox rule oauth consent" \
+  --kind knowledge --limit 5
+```
+
+Search scopes:
+- `case` -> `viking://resources/soc-memory-poc/case`
+- `knowledge` -> `viking://resources/soc-memory-poc/knowledge`
+- `all` -> `viking://resources/soc-memory-poc`
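The scope table and the "search separately" rule can be encoded together, as in the sketch below. It only builds the two `search_context.py` command lines and validates `--kind` against the scopes listed above. Whether the script resolves scopes exactly this way internally is an assumption, and `scope_for` and `search_commands` are hypothetical names.

```python
SEARCH_CONTEXT = "/home/tom/.hermes/skills/soc-memory-poc/scripts/search_context.py"
BASE_SCOPE = "viking://resources/soc-memory-poc"
SCOPES = {
    "case": f"{BASE_SCOPE}/case",
    "knowledge": f"{BASE_SCOPE}/knowledge",
    "all": BASE_SCOPE,
}


def scope_for(kind: str) -> str:
    """Map a --kind value to its Viking scope URI, rejecting unknown kinds."""
    if kind not in SCOPES:
        raise ValueError(f"unknown kind {kind!r}; expected one of {sorted(SCOPES)}")
    return SCOPES[kind]


def search_commands(query: str, limit: int = 5) -> list:
    """One search_context.py call per kind, so case and knowledge are searched separately."""
    return [
        ["python", SEARCH_CONTEXT, "--query", query, "--kind", kind, "--limit", str(limit)]
        for kind in ("case", "knowledge")
    ]
```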
+
+### Phase 3: Retrieve Obsidian References
+
+After memory retrieval, look for related notes in the Obsidian vault so the final answer can reference existing human-readable documentation.
+
+Example:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/search_obsidian_docs.py \
+  --query "invoice phishing html attachment credential harvesting" \
+  --scenario phishing \
+  --limit 5
+```
+
+Use this to surface:
+- existing case notes
+- related scenario notes
+- notes whose names, tags, or content closely match the current case
+
+When reporting Obsidian references, include at least:
+- note title or file name
+- relative path under `obsidian-vault/`
+- why the note is relevant
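The lookup can be approximated as a plain substring ranking over the vault, which also yields the three fields the report needs. This is a hypothetical stand-in for `search_obsidian_docs.py`, whose ranking is not documented here. It assumes notes are UTF-8 `.md` files and that tags (inline `#tag` or frontmatter) appear in the note body.

```python
from pathlib import Path


def search_vault(vault_root, terms, limit=5):
    """Rank .md notes by how many query terms appear in their relative path or body."""
    root = Path(vault_root)
    wanted = [t.lower() for t in terms if t.strip()]
    scored = []
    for note in root.rglob("*.md"):
        rel = note.relative_to(root).as_posix()
        try:
            body = note.read_text(encoding="utf-8", errors="ignore").lower()
        except OSError:
            continue  # unreadable note: skip it rather than guess its content
        # Tags live in the body (inline or frontmatter), so the body check covers them.
        matched = [t for t in wanted if t in rel.lower() or t in body]
        if matched:
            scored.append((-len(matched), rel, {
                "title": note.stem,
                "path": rel,  # relative to obsidian-vault/, as the report requires
                "why": "matches " + ", ".join(matched),
            }))
    scored.sort(key=lambda item: (item[0], item[1]))  # most matches first, then by path
    return [hit for _, _, hit in scored[:limit]]
```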
+
+### Phase 4: Produce The Triage Output
+
+After retrieval, synthesize a result that includes:
+- likely verdict or current assessment
+- strongest evidence
+- closest matching historical cases
+- most relevant KB / playbook guidance
+- related Obsidian notes
+- recommended next investigation or response actions
+
+Do not just paste raw search output. Summarize why the returned items matter.
+
+## Final Output Template
+
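+Unless the user asks for a different format, use this structure for final SOC triage answers (English glosses of the headings, in order: verdict, key evidence, related memory retrieval, related Obsidian documents, recommended actions):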
+
+### 研判结果
+- one short paragraph with the likely verdict / current assessment
+
+### 关键证据
+- 2 to 5 flat bullets with the strongest evidence
+
+### 关联 Memory Retrieval
+- one flat bullet per retrieved case / knowledge item
+- include: ID + short relevance reason
+- example: `CASE-2026-0001`: same invoice lure + HTML attachment + credential harvesting flow
+
+### 关联 Obsidian 文档
+- one flat bullet per note
+- include: note name + relative path + one-line relevance reason
+- example: `CASE-2026-0001 - Finance user ...md` — `02_Cases/phishing/...` — already documents a near-identical phishing pattern
+
+### 建议动作
+- 2 to 5 flat bullets with next investigation or response steps
+
+If no Obsidian note matches, explicitly say `未找到直接关联的 Obsidian 文档` ("no directly related Obsidian document found").
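The template is regular enough to render mechanically. In the minimal sketch below, retrieval results arrive as plain tuples. `render_triage` and `bullets` are hypothetical helpers, and the `- (none)` placeholder for an empty section is an assumption. Only the headings, the bullet formats, and the Obsidian fallback sentence come from this skill. Evidence and actions are capped at five bullets.

```python
SEP = " \u2014 "  # em dash separator, as in the template's Obsidian example
NO_NOTE = "未找到直接关联的 Obsidian 文档"  # fallback text required by the skill


def bullets(items) -> str:
    lines = [f"- {item}" for item in items]
    return "\n".join(lines) if lines else "- (none)"  # placeholder wording is assumed


def render_triage(verdict, evidence, memory_hits, notes, actions) -> str:
    """memory_hits: (id, reason) pairs; notes: (name, relative_path, reason) triples."""
    if notes:
        note_body = bullets(f"`{name}`{SEP}`{path}`{SEP}{reason}" for name, path, reason in notes)
    else:
        note_body = NO_NOTE
    sections = [
        ("研判结果", verdict),
        ("关键证据", bullets(evidence[:5])),
        ("关联 Memory Retrieval", bullets(f"`{cid}`: {reason}" for cid, reason in memory_hits)),
        ("关联 Obsidian 文档", note_body),
        ("建议动作", bullets(actions[:5])),
    ]
    return "\n\n".join(f"### {title}\n{body}" for title, body in sections)
```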
+
+## Triage Workflow (Continued)
+
+### Phase 5: Generate Case Note When The Case Is Mature Enough
+
+If the task includes documenting the result, or the case already has a normalized JSON artifact, generate an Obsidian case note.
+
+Example:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/generate_case_note.py \
+  --input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json \
+  --enrich-from-openviking \
+  --top-k 3
+```
+
+This writes under `obsidian-vault/02_Cases/<scenario>/`.
+
+Use `--enrich-from-openviking` by default when the gateway is available.
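The example note name in the output template (`CASE-2026-0001 - Finance user ...md`) suggests notes are named `<case id> - <title>.md`. The sketch below computes such a path under that assumption. The JSON field names `case_id`, `scenario`, and `title` are also assumptions, because the normalized schema is not documented here, and `case_note_path` is a hypothetical helper. It can be useful for checking whether a note already exists before regenerating it.

```python
import re
from pathlib import Path

_UNSAFE = re.compile(r'[\\/:*?"<>|]')  # characters that break file names on common filesystems


def case_note_path(vault_root, case: dict) -> Path:
    """Hypothetical layout: <vault>/02_Cases/<scenario>/<case_id> - <title>.md"""
    title = " ".join(_UNSAFE.sub(" ", case.get("title", "")).split())  # also collapses spaces
    stem = f"{case['case_id']} - {title}" if title else case["case_id"]
    return Path(vault_root) / "02_Cases" / case["scenario"] / f"{stem}.md"
```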
+
+### Phase 6: Commit Only High-Value Artifacts
+
+If Hermes has a normalized case or knowledge JSON that is worth preserving, commit it through the Gateway.
+
+Example:
+
+```bash
+python /home/tom/.hermes/skills/soc-memory-poc/scripts/commit_case_memory.py \
+  --input /home/tom/soc_memory_poc/evaluation/datasets/normalized_cases/CASE-2026-0001.json
+```
+
+Only commit normalized, reusable artifacts. Do not commit raw logs, raw tool traces, or ad hoc chat text.
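A cheap pre-commit gate can enforce this rule before `commit_case_memory.py` is called. The sketch below rejects non-JSON inputs such as raw logs, unparsable files, and case objects missing core fields. `REQUIRED_FIELDS` is a hypothetical schema, since this skill does not define the normalized format, and knowledge artifacts would likely need a different field set.

```python
import json
from pathlib import Path

# Hypothetical schema: the normalized case format is not documented in this skill.
REQUIRED_FIELDS = {"case_id", "scenario", "summary"}


def commit_blockers(path) -> list:
    """Return reasons an artifact should NOT be committed (an empty list means OK)."""
    p = Path(path)
    if p.suffix != ".json":
        return [f"{p.name}: not a JSON artifact (raw logs and chat text are not committed)"]
    try:
        data = json.loads(p.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError) as exc:
        return [f"{p.name}: unreadable or invalid JSON ({exc})"]
    if not isinstance(data, dict):
        return [f"{p.name}: top-level JSON must be an object"]
    missing = sorted(REQUIRED_FIELDS - set(data))
    return [f"{p.name}: missing field {name!r}" for name in missing]
```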
+
+## Recommended Defaults By Scenario
+
+### Phishing
+
+Default order:
+1. search `case`
+2. search `knowledge`
+3. search related Obsidian notes
+4. assess sender auth, lure type, landing page, user interaction
+5. generate case note if the case is already structured
+6. commit only if the case artifact is normalized and high value
+
+Good query ingredients:
+- lure theme
+- attachment type
+- credential harvesting
+- fake M365 login
+- sender domain
+- landing URL pattern
+
+### O365 Suspicious Login
+
+Default order:
+1. search `case`
+2. search `knowledge`
+3. search related Obsidian notes
+4. assess impossible travel, MFA fatigue, inbox rule abuse, OAuth consent, legacy auth
+5. generate case note if the case is already structured
+6. commit only if the case artifact is normalized and high value
+
+Good query ingredients:
+- impossible travel
+- MFA fatigue
+- inbox rule
+- foreign login
+- OAuth consent
+- legacy protocol
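Both ingredient lists can drive a narrow query builder that keeps only scenario-relevant signals, in list order, and drops everything else, such as user names. In the sketch below, `build_query` and the signal-dict shape are hypothetical. With the inputs in the usage note, it reproduces the example queries used in Phase 2.

```python
QUERY_INGREDIENTS = {
    "phishing": ["lure theme", "attachment type", "credential harvesting",
                 "fake M365 login", "sender domain", "landing URL pattern"],
    "o365_suspicious_login": ["impossible travel", "MFA fatigue", "inbox rule",
                              "foreign login", "OAuth consent", "legacy protocol"],
}


def build_query(scenario: str, signals: dict, max_terms: int = 6) -> str:
    """Join only signals keyed by this scenario's ingredients, capped at max_terms."""
    allowed = QUERY_INGREDIENTS[scenario]  # KeyError for scenarios without a list
    terms = [signals[name] for name in allowed if signals.get(name)]
    return " ".join(terms[:max_terms])
```

For example, `build_query("phishing", {"lure theme": "invoice phishing", "attachment type": "html attachment", "credential harvesting": "credential harvesting", "user": "alice@corp.example"})` drops the `user` key and yields the Phase 2 phishing query.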
+
+## Failure Handling
+
+If Gateway search fails:
+- say explicitly that the SOC Memory Gateway is unavailable
+- do not pretend retrieval succeeded
+- continue with local reasoning only if the user still wants that
+
+If Obsidian search fails:
+- say explicitly that Obsidian references could not be retrieved
+- do not invent note names or paths
+
+If note generation fails:
+- report the failing path or command
+- do not claim the note was written
+
+If commit fails:
+- report the URI or file that failed
+- do not claim the memory was stored
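These rules share one pattern: never let a failed step look like a success. The sketch below wraps each helper invocation so that a missing binary, a timeout, or a non-zero exit becomes an explicit failure message naming the command. `run_step`, `StepResult`, and the 120-second timeout are hypothetical choices.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class StepResult:
    ok: bool
    message: str
    output: str = ""


def run_step(label: str, argv: list, timeout: int = 120) -> StepResult:
    """Run one helper command and report failure explicitly instead of assuming success."""
    command = " ".join(argv)
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:  # e.g. missing script, hung gateway
        return StepResult(False, f"{label} failed: {command} ({exc})")
    if proc.returncode != 0:
        err = proc.stderr.strip() or f"exit code {proc.returncode}"
        return StepResult(False, f"{label} failed: {command} ({err})")
    return StepResult(True, f"{label} succeeded", proc.stdout)
```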
+
+## Guardrails
+
+- Search `case` and `knowledge` separately before concluding a triage result.
+- Search Obsidian notes after memory retrieval so final output can point to human-readable references.
+- Prefer narrow, scenario-specific queries over vague long prompts.
+- Do not dump raw investigative process into memory.
+- Generate case notes from normalized case JSON, not from freeform chat.
+- Commit only high-value, reusable artifacts.
+- When Gateway results look noisy, explain that retrieval quality may still need SOC-specific reranking.