Add Hermes memory evaluation framework with LoCoMo dataset support

- Implement HermesClient for interacting with the Hermes CLI.
- Create judge module for grading QA outputs from Hermes memory.
- Develop LoCoMo dataset parsing and formatting utilities.
- Introduce run_eval script to facilitate memory evaluation using LoCoMo-style datasets.
2026-05-27 17:06:26 +08:00
parent ba59133d80
commit c173fa45a7
11 changed files with 68338 additions and 0 deletions
--- a/eval/__init__.py
+++ b/eval/__init__.py
@@ -0,0 +1 @@
 """Evaluation utilities."""
--- a/eval/hermes_memory_eval/README.md
+++ b/eval/hermes_memory_eval/README.md
@@ -0,0 +1,139 @@
 # Hermes Memory Evaluation
 This is a small LoCoMo-style memory evaluation runner for Hermes Agent.
 It follows the same shape as `openclaw-eval`: ingest historical conversations, ask QA questions with the same user id, then use an LLM judge to score the answers.
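For orientation, a LoCoMo-style sample stores each session under a `session_N` key with a matching `session_N_date_time` key, next to a `qa` list. The sketch below walks that layout and flattens one session into plain text. The function names and the output format are illustrative assumptions, not the runner's actual formatter:

```python
import re


def iter_sessions(conversation: dict):
    """Yield (number, date_time, turns) for every `session_N` list, in order."""
    numbers = []
    for key in conversation:
        match = re.fullmatch(r"session_(\d+)", key)
        if match:
            numbers.append(int(match.group(1)))
    for number in sorted(numbers):
        date_time = conversation.get(f"session_{number}_date_time", "")
        yield number, date_time, conversation[f"session_{number}"]


def format_session(date_time: str, turns: list) -> str:
    """Flatten one session into text, one `speaker: text` line per turn (assumed format)."""
    lines = [f"Conversation time: {date_time}"]
    lines += [f"{turn['speaker']}: {turn['text']}" for turn in turns]
    return "\n".join(lines)


# A one-turn excerpt in the same shape as the bundled dataset.
conversation = {
    "speaker_a": "Caroline",
    "speaker_b": "Melanie",
    "session_1_date_time": "1:56 pm on 8 May, 2023",
    "session_1": [
        {"speaker": "Caroline", "dia_id": "D1:1",
         "text": "Hey Mel! Good to see you! How have you been?"},
    ],
}

for number, date_time, turns in iter_sessions(conversation):
    print(f"--- session {number} ---")
    print(format_session(date_time, turns))
```

Matching on the full key name keeps `session_1_date_time` from being mistaken for a session list.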
 ## 1. Configure Hermes Memory
 Install or copy the `memory_system` Hermes plugin, then put Memory System settings in `/home/tom/.hermes/memory_system.env`:
 ```dotenv
 MEMORY_SYSTEM_ENDPOINT=http://127.0.0.1:1934
 MEMORY_SYSTEM_USER_ID=default
 MEMORY_SYSTEM_SEARCH_USE_LLM=false
 MEMORY_SYSTEM_COMMIT_EVERY_TURNS=1
 MEMORY_SYSTEM_COMMIT_INTERVAL_SECONDS=0
 ```
 The eval runner overrides `MEMORY_SYSTEM_USER_ID` per LoCoMo sample, so one sample maps to one memory user.
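A minimal sketch of that per-sample mapping, assuming the user id is `user_prefix` from `config.yaml` concatenated with the LoCoMo `sample_id`. The helper name `memory_env_for_sample` is hypothetical, and the real runner may derive the id differently:

```python
import os


def memory_env_for_sample(user_prefix: str, sample_id: str, base_env=None) -> dict:
    """Return a copy of the environment with a per-sample memory user id.

    Assumption: user id = user_prefix + sample_id, for example "locomo-conv-26".
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MEMORY_SYSTEM_USER_ID"] = f"{user_prefix}{sample_id}"
    return env


env = memory_env_for_sample("locomo-", "conv-26", base_env={"PATH": "/usr/bin"})
print(env["MEMORY_SYSTEM_USER_ID"])  # locomo-conv-26
```

Returning a copy leaves the parent process environment untouched, so each sample can be run with its own user id.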
 ## 2. Prepare Config
 Copy the example config, then edit the copy:
 ```bash
 cp eval/hermes_memory_eval/config.example.yaml eval/hermes_memory_eval/config.yaml
 ```
 For a stable eval, keep:
 ```yaml
 memory:
  commit_every_turns: 1
  commit_interval_seconds: 0
 ```
 ## 3. Ingest Conversations
 Before ingest, verify the eval Hermes home can see the plugin:
 ```bash
 HERMES_HOME=/home/tom/memory-gateway/eval/hermes_memory_eval/hermes_home hermes memory status
 ```
 The status must show `memory_system` as installed and active.
 Run a small smoke test first:
 ```bash
 python eval/hermes_memory_eval/run_eval.py ingest /path/to/locomo10_small.json \
  --config eval/hermes_memory_eval/config.yaml \
  --sample 0 \
  --sessions 1-2 \
  --output output/hermes_ingest.jsonl
 ```
 Each selected session is sent to Hermes with a command of this form:
 ```bash
 hermes chat -Q --source memory-eval -q "<formatted session>"
 ```
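As a rough sketch of this step, the snippet below expands a `--sessions` spec such as `1-2` and builds the `hermes chat` argument list shown above. The function names, the accepted spec syntax (comma-separated numbers and inclusive ranges), and the mapping of `quiet: true` to `-Q` are assumptions for illustration, not the runner's actual code:

```python
import shlex


def parse_sessions(spec: str) -> list:
    """Expand "1-2" or "1,3-4" into sorted session numbers (assumed syntax)."""
    selected = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            selected.update(range(int(start), int(end) + 1))
        elif part:
            selected.add(int(part))
    return sorted(selected)


def build_chat_command(message: str, command: str = "hermes",
                       source: str = "memory-eval", quiet: bool = True) -> list:
    """Build the argv for one non-interactive Hermes call."""
    argv = [command, "chat"]
    if quiet:
        argv.append("-Q")
    argv += ["--source", source, "-q", message]
    return argv


print(parse_sessions("1-2"))
print(shlex.join(build_chat_command("<formatted session>")))
```

Passing the argument list directly to `subprocess.run` avoids shell quoting problems when a formatted session contains quotes or newlines.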
 ## 4. Ask QA Questions
 Use the same sample and user mapping:
 ```bash
 python eval/hermes_memory_eval/run_eval.py qa /path/to/locomo10_small.json \
  --config eval/hermes_memory_eval/config.yaml \
  --sample 0 \
  --count 10 \
  --output output/hermes_qa.jsonl
 ```
 Each QA runs in a fresh Hermes CLI call, so the answer should come from persistent memory rather than the prior short-term chat context.
 The default QA prompt explicitly asks Hermes to call `memory_system_search` before answering.
 If the Memory System API does not log `POST /memory-system/search`, inspect the session JSON to confirm whether the model made a tool call.
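A small sketch of how a QA prompt can be rendered from `qa.prompt_template`, assuming the template carries a single `{question}` placeholder as in the bundled configs. The memory-mode wording below is illustrative only, and `render_qa_prompt` is a hypothetical helper name:

```python
# Illustrative wording only; the shipped default lives in config.example.yaml.
MEMORY_TEMPLATE = (
    "Call memory_system_search first, then answer from the retrieved memories. "
    "If memory has no answer, say you do not know.\n\n"
    "Question: {question}"
)
# The no-memory baseline config passes the question through unchanged.
BASELINE_TEMPLATE = "{question}"


def render_qa_prompt(question: str, template: str) -> str:
    """Fill the single {question} placeholder of a QA prompt template."""
    return template.format(question=question)


print(render_qa_prompt("When did Melanie paint a sunrise?", MEMORY_TEMPLATE))
print(render_qa_prompt("When did Melanie paint a sunrise?", BASELINE_TEMPLATE))
```

Keeping the two templates side by side makes it explicit that the baseline differs only in the instruction text, not in the question itself.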
 ## 5. Judge Answers
 Use the `judge` section in `config.yaml`:
 ```yaml
 judge:
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  model: "gpt-4o-mini"
  parallel: 4
  timeout_seconds: 120
 ```
 Then run:
 ```bash
 OPENAI_API_KEY=sk-... python eval/hermes_memory_eval/judge.py output/hermes_qa.jsonl \
  --config eval/hermes_memory_eval/config.yaml \
  --output output/hermes_grades.json
 ```
 For Ark/Doubao-style endpoints:
 ```yaml
 judge:
  base_url: "https://ark.cn-beijing.volces.com/api/v3"
  api_key_env: "ARK_API_KEY"
  model: "doubao-seed-2-0-pro-260215"
 ```
 ```bash
 ARK_API_KEY=... python eval/hermes_memory_eval/judge.py output/hermes_qa.jsonl \
  --config eval/hermes_memory_eval/config.yaml \
  --output output/hermes_grades.json
 ```
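Once grading finishes, overall and per-category totals like the ones reported later in this README can be tallied from the grade records. The sketch below assumes each record carries a `category` and a boolean `correct` field; the actual layout of `hermes_grades.json` may differ, so treat the field names as placeholders:

```python
from collections import Counter


def format_score(correct: int, total: int) -> str:
    """Render a score as "correct/total (percent%)"."""
    pct = 100.0 * correct / total if total else 0.0
    return f"{correct}/{total} ({pct:.2f}%)"


def summarize(grades: list) -> dict:
    """Tally overall and per-category accuracy from graded QA records."""
    totals, hits = Counter(), Counter()
    for grade in grades:
        totals[grade["category"]] += 1
        hits[grade["category"]] += int(bool(grade["correct"]))
    summary = {"overall": format_score(sum(hits.values()), sum(totals.values()))}
    for category in sorted(totals):
        summary[f"category_{category}"] = format_score(hits[category], totals[category])
    return summary


# Hypothetical records, not real eval output.
grades = [
    {"category": 1, "correct": False},
    {"category": 2, "correct": True},
    {"category": 2, "correct": False},
]
print(summarize(grades))
```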
 ## Recommended Comparisons
 Run the same dataset in these modes:
 - no external memory
 - `MEMORY_SYSTEM_SEARCH_USE_LLM=false`
 - `MEMORY_SYSTEM_SEARCH_USE_LLM=true`
 Compare the final QA scores and inspect the failed examples. If search recall is high but QA accuracy is low, Hermes is not using the retrieved memory well. If search recall is low, the problem more likely lies in write/extract/search quality.
 ## Current Small Dataset Result
 On `locomo10_small.json` sample `conv-26`, the current smoke test results are:
 | Mode | Score | Category 1 | Category 2 | Category 3 | Category 4 |
 | --- | ---: | ---: | ---: | ---: | ---: |
 | Memory System enabled | 5/35 (14.29%) | 0/5 (0.00%) | 1/9 (11.11%) | 1/2 (50.00%) | 3/19 (15.79%) |
 | No external memory | 0/35 (0.00%) | 0/5 (0.00%) | 0/9 (0.00%) | 0/2 (0.00%) | 0/19 (0.00%) |
 This means the Memory System path contributes signal over the no-memory baseline (5/35 versus 0/35), but the absolute score is still low. The main follow-up is to inspect the failed QA examples and separate retrieval failure from answer-use failure:
 - If `POST /memory-system/search` does not appear during QA, Hermes did not call the memory tool.
 - If search results do not contain the evidence/gold answer, the write/extract/search path needs improvement.
 - If search results contain the evidence but the answer is wrong, Hermes is not using retrieved memory effectively.
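The three checks above can be sketched as a small triage function. The three boolean inputs are assumed to be determined separately, for example from the Memory System API log and the returned search results, and the bucket names are hypothetical labels:

```python
from collections import Counter


def triage(tool_called: bool, evidence_retrieved: bool, answer_correct: bool) -> str:
    """Map one QA result to a failure bucket, checking the earliest stage first."""
    if answer_correct:
        return "correct"
    if not tool_called:
        return "no_tool_call"          # no POST /memory-system/search was seen
    if not evidence_retrieved:
        return "retrieval_failure"     # write/extract/search path missed the evidence
    return "answer_use_failure"        # evidence was retrieved but not used well


# Hypothetical observations, not real eval output.
observations = [
    (False, False, False),
    (True, False, False),
    (True, True, False),
    (True, True, True),
]
print(Counter(triage(*obs) for obs in observations))
```

Checking the stages in pipeline order means each failed question lands in exactly one bucket, which keeps the counts comparable between runs.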
 For future runs, keep a fresh `user_prefix` per mode so OpenViking/EverOS memory from prior runs does not contaminate results.
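One way to keep prefixes distinct is to derive them from the mode and the run date, in the style of the `locomo-full-nomemory-20260520-` prefix used in `config_no_memory.yaml`. The helper below is a hypothetical sketch of that naming scheme, not part of the runner:

```python
from datetime import date


def make_user_prefix(mode: str, run_date: date, dataset: str = "locomo") -> str:
    """Build a per-mode, per-day user prefix such as "locomo-full-nomemory-20260520-"."""
    return f"{dataset}-{mode}-{run_date:%Y%m%d}-"


print(make_user_prefix("full-nomemory", date(2026, 5, 20)))
```

Two runs of the same mode on the same day would still share a prefix under this scheme, so a run counter or timestamp may be worth appending in that case.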
--- a/eval/hermes_memory_eval/__init__.py
+++ b/eval/hermes_memory_eval/__init__.py
@@ -0,0 +1,2 @@
 """Hermes memory evaluation helpers."""
--- a/eval/hermes_memory_eval/config.example.yaml
+++ b/eval/hermes_memory_eval/config.example.yaml
@@ -0,0 +1,25 @@
 hermes:
  command: "hermes"
  timeout_seconds: 600
  quiet: true
  source: "memory-eval"
  extra_args: []
 memory:
  env_file: "/home/tom/.hermes/memory_system.env"
  endpoint: "http://127.0.0.1:1934"
  api_key: ""
  user_prefix: "locomo-"
  search_use_llm: false
  commit_every_turns: 1
  commit_interval_seconds: 0
 qa:
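  prompt_template: "First use memory_system_search to query long-term memory, then answer the question based on the retrieved memories. If the memories do not contain the answer, say you don't know and do not make anything up.\n\nQuestion: {question}"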
 judge:
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  model: "gpt-4o-mini"
  parallel: 4
  timeout_seconds: 120
--- a/eval/hermes_memory_eval/config_no_memory.yaml
+++ b/eval/hermes_memory_eval/config_no_memory.yaml
@@ -0,0 +1,25 @@
 hermes:
  command: "hermes"
  timeout_seconds: 600
  quiet: true
  source: "memory-eval"
  extra_args: []
 memory:
  env_file: "/home/tom/memory-gateway/eval/hermes_memory_eval/hermes_home/memory_system.env"
  endpoint: "http://127.0.0.1:1934"
  api_key: ""
  user_prefix: "locomo-full-nomemory-20260520-"
  search_use_llm: false
  commit_every_turns: 1
  commit_interval_seconds: 0
 qa:
  prompt_template: "{question}"
 judge:
  base_url: "https://oai.bwgdi.com/v1"
  model: "Qwen3.6-35B"
  api_key: "sk-4BxeAtnQCRv3x1xwRcmTJg"
  parallel: 4
  timeout_seconds: 120
--- a/eval/hermes_memory_eval/datasets/locomo10.json
+++ b/eval/hermes_memory_eval/datasets/locomo10.json
--- a/eval/hermes_memory_eval/datasets/locomo10_small.json
+++ b/eval/hermes_memory_eval/datasets/locomo10_small.json
@@ -0,0 +1,852 @@
 [
  {
    "sample_id": "conv-26",
    "conversation": {
      "speaker_a": "Caroline",
      "speaker_b": "Melanie",
      "session_1_date_time": "1:56 pm on 8 May, 2023",
      "session_1": [
        {
          "speaker": "Caroline",
          "dia_id": "D1:1",
          "text": "Hey Mel! Good to see you! How have you been?"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D1:2",
          "text": "Hey Caroline! Good to see you! I'm swamped with the kids & work. What's up with you? Anything new?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D1:3",
          "text": "I went to a LGBTQ support group yesterday and it was so powerful."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D1:4",
          "text": "Wow, that's cool, Caroline! What happened that was so awesome? Did you hear any inspiring stories?"
        },
        {
          "speaker": "Caroline",
          "img_url": [
            "https://i.redd.it/l7hozpetnhlb1.jpg"
          ],
          "blip_caption": "a photo of a dog walking past a wall with a painting of a woman",
          "query": "transgender pride flag mural",
          "dia_id": "D1:5",
          "text": "The transgender stories were so inspiring! I was so happy and thankful for all the support."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D1:6",
          "text": "Wow, love that painting! So cool you found such a helpful group. What's it done for you?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D1:7",
          "text": "The support group has made me feel accepted and given me courage to embrace myself."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D1:8",
          "text": "That's really cool. You've got guts. What now?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D1:9",
          "text": "Gonna continue my edu and check out career options, which is pretty exciting!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D1:10",
          "text": "Wow, Caroline! What kinda jobs are you thinkin' of? Anything that stands out?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D1:11",
          "text": "I'm keen on counseling or working in mental health - I'd love to support those with similar issues."
        },
        {
          "speaker": "Melanie",
          "img_url": [
            "http://candicealexander.com/cdn/shop/products/IMG_7269_a49d5af8-c76c-4ecd-ae20-48c08cb11dec.jpg"
          ],
          "blip_caption": "a photo of a painting of a sunset over a lake",
          "query": "painting sunrise",
          "dia_id": "D1:12",
          "text": "You'd be a great counselor! Your empathy and understanding will really help the people you work with. By the way, take a look at this."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D1:13",
          "text": "Thanks, Melanie! That's really sweet. Is this your own painting?"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D1:14",
          "text": "Yeah, I painted that lake sunrise last year! It's special to me."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D1:15",
          "text": "Wow, Melanie! The colors really blend nicely. Painting looks like a great outlet for expressing yourself."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D1:16",
          "text": "Thanks, Caroline! Painting's a fun way to express my feelings and get creative. It's a great way to relax after a long day."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D1:17",
          "text": "Totally agree, Mel. Relaxing and expressing ourselves is key. Well, I'm off to go do some research."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D1:18",
          "text": "Yep, Caroline. Taking care of ourselves is vital. I'm off to go swimming with the kids. Talk to you soon!"
        }
      ],
      "session_2_date_time": "1:14 pm on 25 May, 2023",
      "session_2": [
        {
          "speaker": "Melanie",
          "dia_id": "D2:1",
          "text": "Hey Caroline, since we last chatted, I've had a lot of things happening to me. I ran a charity race for mental health last Saturday \u2013 it was really rewarding. Really made me think about taking care of our minds."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D2:2",
          "text": "That charity race sounds great, Mel! Making a difference & raising awareness for mental health is super rewarding - I'm really proud of you for taking part!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D2:3",
          "text": "Thanks, Caroline! The event was really thought-provoking. I'm starting to realize that self-care is really important. It's a journey for me, but when I look after myself, I'm able to better look after my family."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D2:4",
          "text": "I totally agree, Melanie. Taking care of ourselves is so important - even if it's not always easy. Great that you're prioritizing self-care."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D2:5",
          "text": "Yeah, it's tough. So I'm carving out some me-time each day - running, reading, or playing my violin - which refreshes me and helps me stay present for my fam!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D2:6",
          "text": "That's great, Mel! Taking time for yourself is so important. You're doing an awesome job looking after yourself and your family!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D2:7",
          "text": "Thanks, Caroline. It's still a work in progress, but I'm doing my best. My kids are so excited about summer break! We're thinking about going camping next month. Any fun plans for the summer?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D2:8",
          "text": "Researching adoption agencies \u2014 it's been a dream to have a family and give a loving home to kids who need it."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D2:9",
          "text": "Wow, Caroline! That's awesome! Taking in kids in need - you're so kind. Your future family is gonna be so lucky to have you!"
        },
        {
          "speaker": "Caroline",
          "img_url": [
            "https://live.staticflickr.com/3437/3935231341_b2955b00dd_b.jpg"
          ],
          "blip_caption": "a photography of a sign for a new arrival and an information and domestic building",
          "query": "adoption agency brochure",
          "dia_id": "D2:10",
          "re-download": true,
          "text": "Thanks, Mel! My goal is to give kids a loving home. I'm truly grateful for all the support I've got from friends and mentors. Now the hard work starts to turn my dream into a reality. And here's one of the adoption agencies I'm looking into. It's a lot to take in, but I'm feeling hopeful and optimistic."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D2:11",
          "text": "Wow, that agency looks great! What made you pick it?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D2:12",
          "text": "I chose them 'cause they help LGBTQ+ folks with adoption. Their inclusivity and support really spoke to me."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D2:13",
          "text": "That's great, Caroline! Loving the inclusivity and support. Anything you're excited for in the adoption process?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D2:14",
          "text": "I'm thrilled to make a family for kids who need one. It'll be tough as a single parent, but I'm up for the challenge!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D2:15",
          "text": "You're doing something amazing! Creating a family for those kids is so lovely. You'll be an awesome mom! Good luck!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D2:16",
          "text": "Thanks, Melanie! Your kind words really mean a lot. I'll do my best to make sure these kids have a safe and loving home."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D2:17",
          "text": "No doubts, Caroline. You have such a caring heart - they'll get all the love and stability they need! Excited for this new chapter!"
        }
      ],
      "session_3_date_time": "7:55 pm on 9 June, 2023",
      "session_3": [
        {
          "speaker": "Caroline",
          "dia_id": "D3:1",
          "text": "Hey Melanie! How's it going? I wanted to tell you about my school event last week. It was awesome! I talked about my transgender journey and encouraged students to get involved in the LGBTQ community. It was great to see their reactions. It made me reflect on how far I've come since I started transitioning three years ago."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D3:2",
          "text": "Hey Caroline! Great to hear from you. Sounds like your event was amazing! I'm so proud of you for spreading awareness and getting others involved in the LGBTQ community. You've come a long way since your transition - keep on inspiring people with your strength and courage!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:3",
          "text": "Thanks, Mel! Your backing really means a lot. I felt super powerful giving my talk. I shared my own journey, the struggles I had and how much I've developed since coming out. It was wonderful to see how the audience related to what I said and how it inspired them to be better allies. Conversations about gender identity and inclusion are so necessary and I'm thankful for being able to give a voice to the trans community."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D3:4",
          "text": "Wow, Caroline, you're doing an awesome job of inspiring others with your journey. It's great to be part of it and see how you're positively affecting so many. Talking about inclusivity and acceptance is crucial, and you're so brave to speak up for the trans community. Keep up the great work!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:5",
          "text": "Thanks Mel! Your kind words mean a lot. Sharing our experiences isn't always easy, but I feel it's important to help promote understanding and acceptance. I've been blessed with loads of love and support throughout this journey, and I want to pass it on to others. By sharing our stories, we can build a strong, supportive community of hope."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D3:6",
          "text": "Yeah, Caroline! It takes courage to talk about our own stories. But it's in these vulnerable moments that we bond and understand each other. We all have our different paths, but if we share them, we show people that they're not alone. Our stories can be so inspiring and encouraging to others who are facing the same challenges. Thank you for using your voice to create love, acceptance, and hope. You're doing amazing!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:7",
          "text": "Your words mean a lot to me. I'm grateful for the chance to share my story and give others hope. We all have unique paths, and by working together we can build a more inclusive and understanding world. I'm going to keep using my voice to make a change and lift others up. And you're part of that!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D3:8",
          "text": "Thanks, Caroline, for letting me join your journey. I'm so proud to be part of the difference you're making. Let's keep motivating and helping each other out as we journey through life. We can make a real impact together!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:9",
          "text": "Yeah Mel, let's spread love and understanding! Thanks for the support and encouragement. We can tackle life's challenges together! We got this!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D3:10",
          "text": "Yes, Caroline! We can do it. Your courage is inspiring. I want to be couragous for my family- they motivate me and give me love. What motivates you?"
        },
        {
          "speaker": "Caroline",
          "img_url": [
            "https://fox2now.com/wp-content/uploads/sites/14/2023/08/that-tall-family.jpg"
          ],
          "blip_caption": "a photo of a family posing for a picture in a yard",
          "query": "group of friends and family",
          "dia_id": "D3:11",
          "text": "Thanks, Mel! My friends, family and mentors are my rocks \u2013 they motivate me and give me the strength to push on. Here's a pic from when we met up last week!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D3:12",
          "text": "Wow, that photo is great! How long have you had such a great support system?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:13",
          "text": "Yeah, I'm really lucky to have them. They've been there through everything, I've known these friends for 4 years, since I moved from my home country. Their love and help have been so important especially after that tough breakup. I'm super thankful. Who supports you, Mel?"
        },
        {
          "speaker": "Melanie",
          "img_url": [
            "https://mrswebersneighborhood.com/wp-content/uploads/2022/07/Cedar-Falls-Hocking-Hills.jpg"
          ],
          "blip_caption": "a photo of a man and a little girl standing in front of a waterfall",
          "query": "husband kids hiking nature",
          "dia_id": "D3:14",
          "text": "I'm lucky to have my husband and kids; they keep me motivated."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:15",
          "text": "Wow, what an amazing family pic! How long have you been married?"
        },
        {
          "speaker": "Melanie",
          "img_url": [
            "https://i.redd.it/8o28nfllf3eb1.jpg"
          ],
          "blip_caption": "a photo of a bride in a wedding dress holding a bouquet",
          "query": "wedding day",
          "dia_id": "D3:16",
          "text": "5 years already! Time flies- feels like just yesterday I put this dress on! Thanks, Caroline!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:17",
          "text": "Congrats, Melanie! You both looked so great on your wedding day! Wishing you many happy years together!"
        },
        {
          "speaker": "Melanie",
          "img_url": [
            "http://shirleyswardrobe.com/wp-content/uploads/2017/07/LF-Picnic-6.jpg"
          ],
          "blip_caption": "a photo of a man and woman sitting on a blanket eating food",
          "query": "family picnic park laughing",
          "dia_id": "D3:18",
          "text": "Thanks, Caroline! Appreciate your kind words. Looking forward to more happy years. Our family and moments make it all worth it."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:19",
          "text": "Looks like you had a great day! How was it? You all look so happy!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D3:20",
          "text": "It so fun! We played games, ate good food, and just hung out together. Family moments make life awesome."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:21",
          "text": "Sounds great, Mel! Glad you had a great time. Cherish the moments - they're the best!"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D3:22",
          "text": "Absolutely, Caroline! I cherish time with family. It's when I really feel alive and happy."
        },
        {
          "speaker": "Caroline",
          "dia_id": "D3:23",
          "text": "I 100% agree, Mel. Hanging with loved ones is amazing and brings so much happiness. Those moments really make me thankful. Family is everything."
        }
      ],
      "session_4_date_time": "10:37 am on 27 June, 2023",
      "session_4": [
        {
          "speaker": "Caroline",
          "img_url": [
            "https://i.redd.it/67uas3gnmz7b1.jpg"
          ],
          "blip_caption": "a photo of a person holding a necklace with a cross and a heart",
          "query": "pendant transgender symbol",
          "dia_id": "D4:1",
          "text": "Hey Melanie! Long time no talk! A lot's been going on in my life! Take a look at this."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D4:2",
          "text": "Hey, Caroline! Nice to hear from you! Love the necklace, any special meaning to it?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D4:3",
          "text": "Thanks, Melanie! This necklace is super special to me - a gift from my grandma in my home country, Sweden. She gave it to me when I was young, and it stands for love, faith and strength. It's like a reminder of my roots and all the love and support I get from my family."
        },
        {
          "speaker": "Melanie",
          "blip_caption": "a photo of a stack of bowls with different designs on them",
          "dia_id": "D4:4",
          "text": "That's gorgeous, Caroline! It's awesome what items can mean so much to us, right? Got any other objects that you treasure, like that necklace?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D4:5",
          "text": "Yep, Melanie! I've got some other stuff with sentimental value, like my hand-painted bowl. A friend made it for my 18th birthday ten years ago. The pattern and colors are awesome-- it reminds me of art and self-expression."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D4:6",
          "text": "That sounds great, Caroline! It's awesome having stuff around that make us think of good connections and times. Actually, I just took my fam camping in the mountains last week - it was a really nice time together!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D4:7",
          "text": "Sounds great, Mel. Glad you made some new family mems. How was it? Anything fun?"
        },
        {
          "speaker": "Melanie",
          "dia_id": "D4:8",
          "text": "It was an awesome time, Caroline! We explored nature, roasted marshmallows around the campfire and even went on a hike. The view from the top was amazing! The 2 younger kids love nature. It was so special having these moments together as a family - I'll never forget it!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D4:9",
          "text": "That's awesome, Melanie! Family moments like that are so special. Glad y'all had such a great time."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D4:10",
          "text": "Thanks, Caroline! Family time matters to me. What's up with you lately?"
        },
        {
          "speaker": "Caroline",
          "blip_caption": "a photo of a book shelf with many books on it",
          "dia_id": "D4:11",
          "text": "Lately, I've been looking into counseling and mental health as a career. I want to help people who have gone through the same things as me."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D4:12",
          "text": "Sounds great! What kind of counseling and mental health services do you want to persue?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D4:13",
          "text": "I'm still figuring out the details, but I'm thinking of working with trans people, helping them accept themselves and supporting their mental health. Last Friday, I went to an LGBTQ+ counseling workshop and it was really enlightening. They talked about different therapeutic methods and how to best work with trans people. Seeing how passionate these pros were about making a safe space for people like me was amazing."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D4:14",
          "text": "Woah, Caroline, it sounds like you're doing some impressive work. It's inspiring to see your dedication to helping others. What motivated you to pursue counseling?"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D4:15",
          "text": "Thanks, Melanie. It really mattered. My own journey and the support I got made a huge difference. Now I want to help people go through it too. I saw how counseling and support groups improved my life, so I started caring more about mental health and understanding myself. Now I'm passionate about creating a safe, inviting place for people to grow."
        },
        {
          "speaker": "Melanie",
          "dia_id": "D4:16",
          "text": "Wow, Caroline! You've gained so much from your own experience. Your passion and hard work to help others is awesome. Keep it up, you're making a big impact!"
        },
        {
          "speaker": "Caroline",
          "dia_id": "D4:17",
          "text": "Thanks, Melanie! Your kind words mean a lot."
        },
        {
          "speaker": "Melanie",
          "blip_caption": "a photo of a book shelf filled with books in a room",
          "dia_id": "D4:18",
          "text": "Congrats Caroline! Good on you for going after what you really care about."
        }
      ]
    },
    "qa": [
      {
        "question": "What did Caroline realize after her charity race?",
        "evidence": [
          "D2:3"
        ],
        "category": 5,
        "answer": "self-care is important"
      },
      {
        "question": "When did Caroline go to the LGBTQ support group?",
        "answer": "7 May 2023",
        "evidence": [
          "D1:3"
        ],
        "category": 2
      },
      {
        "question": "When did Melanie paint a sunrise?",
        "answer": 2022,
        "evidence": [
          "D1:12"
        ],
        "category": 2
      },
      {
        "question": "What fields would Caroline be likely to pursue in her educaton?",
        "answer": "Psychology, counseling certification",
        "evidence": [
          "D1:9",
          "D1:11"
        ],
        "category": 3
      },
      {
        "question": "What did Caroline research?",
        "answer": "Adoption agencies",
        "evidence": [
          "D2:8"
        ],
        "category": 1
      },
      {
        "question": "What is Caroline's identity?",
        "answer": "Transgender woman",
        "evidence": [
          "D1:5"
        ],
        "category": 1
      },
      {
        "question": "When did Melanie run a charity race?",
        "answer": "The sunday before 25 May 2023",
        "evidence": [
          "D2:1"
        ],
        "category": 2
      },
      {
        "question": "When is Melanie planning on going camping?",
        "answer": "June 2023",
        "evidence": [
          "D2:7"
        ],
        "category": 2
      },
      {
        "question": "What is Caroline's relationship status?",
        "answer": "Single",
        "evidence": [
          "D3:13",
          "D2:14"
        ],
        "category": 1
      },
      {
        "question": "When did Caroline give a speech at a school?",
        "answer": "The week before 9 June 2023",
        "evidence": [
          "D3:1"
        ],
        "category": 2
      },
      {
        "question": "When did Caroline meet up with her friends, family, and mentors?",
        "answer": "The week before 9 June 2023",
        "evidence": [
          "D3:11"
        ],
        "category": 2
      },
      {
        "question": "How long has Caroline had her current group of friends for?",
        "answer": "4 years",
        "evidence": [
          "D3:13"
        ],
        "category": 2
      },
      {
        "question": "Where did Caroline move from 4 years ago?",
        "answer": "Sweden",
        "evidence": [
          "D3:13",
          "D4:3"
        ],
        "category": 1
      },
      {
        "question": "How long ago was Caroline's 18th birthday?",
        "answer": "10 years ago",
        "evidence": [
          "D4:5"
        ],
        "category": 2
      },
      {
        "question": "What career path has Caroline decided to persue?",
        "answer": "counseling or mental health for Transgender people",
        "evidence": [
          "D4:13",
          "D1:11"
        ],
        "category": 1
      },
      {
        "question": "Would Caroline still want to pursue counseling as a career if she hadn't received support growing up?",
        "answer": "Likely no",
        "evidence": [
          "D4:15",
          "D3:5"
        ],
        "category": 3
      },
      {
        "question": "When did Melanie go camping in June?",
        "answer": "The week before 27 June 2023",
        "evidence": [
          "D4:8"
        ],
        "category": 2
      },
      {
        "question": "What did the charity race raise awareness for?",
        "answer": "mental health",
        "evidence": [
          "D2:2"
        ],
        "category": 4
      },
      {
        "question": "What did Melanie realize after the charity race?",
        "answer": "self-care is important",
        "evidence": [
          "D2:3"
        ],
        "category": 4
      },
      {
        "question": "How does Melanie prioritize self-care?",
        "answer": "by carving out some me-time each day for activities like running, reading, or playing the violin",
        "evidence": [
          "D2:5"
        ],
        "category": 4
      },
      {
        "question": "What are Caroline's plans for the summer?",
        "answer": "researching adoption agencies",
        "evidence": [
          "D2:8"
        ],
        "category": 4
      },
      {
        "question": "What type of individuals does the adoption agency Caroline is considering support?",
        "answer": "LGBTQ+ individuals",
        "evidence": [
          "D2:12"
        ],
        "category": 4
      },
      {
        "question": "Why did Caroline choose the adoption agency?",
        "answer": "because of their inclusivity and support for LGBTQ+ individuals",
        "evidence": [
          "D2:12"
        ],
        "category": 4
      },
      {
        "question": "What is Caroline excited about in the adoption process?",
        "answer": "creating a family for kids who need one",
        "evidence": [
          "D2:14"
        ],
        "category": 4
      },
      {
        "question": "What does Melanie think about Caroline's decision to adopt?",
        "answer": "she thinks Caroline is doing something amazing and will be an awesome mom",
        "evidence": [
          "D2:15"
        ],
        "category": 4
      },
      {
        "question": "How long have Mel and her husband been married?",
        "answer": "Mel and her husband have been married for 5 years.",
        "evidence": [
          "D3:16"
        ],
        "category": 4
      },
      {
        "question": "What does Caroline's necklace symbolize?",
        "answer": "love, faith, and strength",
        "evidence": [
          "D4:3"
        ],
        "category": 4
      },
      {
        "question": "What country is Caroline's grandma from?",
        "answer": "Sweden",
        "evidence": [
          "D4:3"
        ],
        "category": 4
      },
      {
        "question": "What was grandma's gift to Caroline?",
        "answer": "necklace",
        "evidence": [
          "D4:3"
        ],
        "category": 4
      },
      {
        "question": "What is Melanie's hand-painted bowl a reminder of?",
        "answer": "art and self-expression",
        "evidence": [
          "D4:5"
        ],
        "category": 4
      },
      {
        "question": "What did Melanie and her family do while camping?",
        "answer": "explored nature, roasted marshmallows, and went on a hike",
        "evidence": [
          "D4:8"
        ],
        "category": 4
      },
      {
        "question": "What kind of counseling and mental health services is Caroline interested in pursuing?",
        "answer": "working with trans people, helping them accept themselves and supporting their mental health",
        "evidence": [
          "D4:13"
        ],
        "category": 4
      },
      {
        "question": "What workshop did Caroline attend recently?",
        "answer": "LGBTQ+ counseling workshop",
        "evidence": [
          "D4:13"
        ],
        "category": 4
      },
      {
        "question": "What was discussed in the LGBTQ+ counseling workshop?",
        "answer": "therapeutic methods and how to best work with trans people",
        "evidence": [
          "D4:13"
        ],
        "category": 4
      },
      {
        "question": "What motivated Caroline to pursue counseling?",
        "answer": "her own journey and the support she received, and how counseling improved her life",
        "evidence": [
          "D4:15"
        ],
        "category": 4
      },
      {
        "question": "What kind of place does Caroline want to create for people?",
        "answer": "a safe and inviting place for people to grow",
        "evidence": [
          "D4:15"
        ],
        "category": 4
      },
      {
        "question": "What are Melanie's plans for the summer with respect to adoption?",
        "evidence": [
          "D2:8"
        ],
        "category": 5,
        "answer": "researching adoption agencies"
      },
      {
        "question": "What type of individuals does the adoption agency Melanie is considering support?",
        "evidence": [
          "D2:12"
        ],
        "category": 5,
        "answer": "LGBTQ+ individuals"
      },
      {
        "question": "Why did Melanie choose the adoption agency?",
        "evidence": [
          "D2:12"
        ],
        "category": 5,
        "answer": "because of their inclusivity and support for LGBTQ+ individuals"
      },
      {
        "question": "What is Melanie excited about in her adoption process?",
        "evidence": [
          "D2:14"
        ],
        "category": 5,
        "answer": "creating a family for kids who need one"
      },
      {
        "question": "What does Melanie's necklace symbolize?",
        "evidence": [
          "D4:3"
        ],
        "category": 5,
        "answer": "love, faith, and strength"
      },
      {
        "question": "What country is Melanie's grandma from?",
        "evidence": [
          "D4:3"
        ],
        "category": 5,
        "answer": "Sweden"
      },
      {
        "question": "What was grandma's gift to Melanie?",
        "evidence": [
          "D4:3"
        ],
        "category": 5,
        "answer": "necklace"
      },
      {
        "question": "What was grandpa's gift to Caroline?",
        "evidence": [
          "D4:3"
        ],
        "category": 5,
        "answer": "necklace"
      },
      {
        "question": "What is Caroline's hand-painted bowl a reminder of?",
        "evidence": [
          "D4:5"
        ],
        "category": 5,
        "answer": "art and self-expression"
      },
      {
        "question": "What did Caroline and her family do while camping?",
        "evidence": [
          "D4:8"
        ],
        "category": 5,
        "answer": "explored nature, roasted marshmallows, and went on a hike"
      },
      {
        "question": "What kind of counseling and mental health services is Melanie interested in pursuing?",
        "evidence": [
          "D4:13"
        ],
        "category": 5,
        "answer": "working with trans people, helping them accept themselves and supporting their mental health"
      },
      {
        "question": "What kind of counseling workshop did Melanie attend recently?",
        "evidence": [
          "D4:13"
        ],
        "category": 5,
        "answer": "LGBTQ+ counseling workshop"
      },
      {
        "question": "What motivated Melanie to pursue counseling?",
        "evidence": [
          "D4:15"
        ],
        "category": 5,
        "answer": "her own journey and the support she received, and how counseling improved her life"
      },
      {
        "question": "What kind of place does Melanie want to create for people?",
        "evidence": [
          "D4:15"
        ],
        "category": 5,
        "answer": "a safe and inviting place for people to grow"
      }
    ]
  }
 ]
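The fixture above mixes QA categories 1 to 5, and the category 5 items reuse category 4 questions with the speaker names swapped. The sketch below shows one way to tally questions per category while skipping category 5 by default, mirroring the filter in `build_qas`. The helper name `count_by_category` and the three-item toy sample are illustrative assumptions and are not part of the commit.

```python
from __future__ import annotations

from collections import Counter
from typing import Any


def count_by_category(sample: dict[str, Any], include_category_5: bool = False) -> dict[str, int]:
    """Count QA items per category, optionally keeping category 5 items."""
    counts: Counter[str] = Counter()
    for qa in sample.get("qa", []):
        # Categories are integers in the JSON; compare them as strings like build_qas does.
        category = str(qa.get("category", ""))
        if category == "5" and not include_category_5:
            continue
        counts[category] += 1
    return dict(counts)


# Toy sample built from three entries of the fixture (hypothetical subset).
toy_sample = {
    "qa": [
        {"question": "What is Caroline's relationship status?", "answer": "Single", "category": 1},
        {"question": "What country is Caroline's grandma from?", "answer": "Sweden", "category": 4},
        {"question": "What country is Melanie's grandma from?", "answer": "Sweden", "category": 5},
    ]
}

default_counts = count_by_category(toy_sample)
all_counts = count_by_category(toy_sample, include_category_5=True)
```

Running the tally on a real sample before a QA pass gives a quick check that `--include-category-5` changes the question count as expected.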
--- a/eval/hermes_memory_eval/hermes_client.py
+++ b/eval/hermes_memory_eval/hermes_client.py
@ -0,0 +1,51 @@
 """Hermes CLI client used by the memory evaluation runner."""
 from __future__ import annotations
 import os
 import subprocess
 from dataclasses import dataclass, field
 from typing import Mapping
 @dataclass(frozen=True)
 class HermesClientConfig:
    command: str = "hermes"
    timeout_seconds: int = 600
    quiet: bool = True
    source: str = "memory-eval"
    extra_args: list[str] = field(default_factory=list)
 class HermesClient:
    def __init__(self, config: HermesClientConfig):
        self._config = config
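    def chat(self, message: str, *, user_id: str, env: Mapping[str, str] | None = None) -> str:
        """Send one message through `hermes chat` and return its stripped stdout.
        Raises RuntimeError on a non-zero exit code; subprocess.TimeoutExpired propagates on timeout."""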
        command = [self._config.command, "chat"]
        if self._config.quiet:
            command.append("-Q")
        if self._config.source:
            command.extend(["--source", self._config.source])
        command.extend(self._config.extra_args)
        command.extend(["-q", message])
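        # Start from the parent environment, pin the memory user for this call,
        # then apply caller overrides last (None values are skipped).
        process_env = os.environ.copy()
        process_env["MEMORY_SYSTEM_USER_ID"] = user_id
        if env:
            process_env.update({key: str(value) for key, value in env.items() if value is not None})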
        result = subprocess.run(
            command,
            capture_output=True,
            check=False,
            env=process_env,
            text=True,
            timeout=self._config.timeout_seconds,
        )
        if result.returncode != 0:
            stderr = result.stderr.strip()
            stdout = result.stdout.strip()
            detail = stderr or stdout or f"exit code {result.returncode}"
            raise RuntimeError(f"Hermes command failed: {detail}")
        return result.stdout.strip()
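The argument and environment assembly in `HermesClient.chat` can be checked without launching Hermes. The sketch below factors that logic into two pure functions. `build_command` and `build_env` are hypothetical helper names, and `--example-flag` is a placeholder rather than a real Hermes option. The flags `-Q`, `--source`, and `-q` are taken from the client code above.

```python
from __future__ import annotations

import os
from typing import Mapping, Sequence


def build_command(
    message: str,
    *,
    command: str = "hermes",
    quiet: bool = True,
    source: str = "memory-eval",
    extra_args: Sequence[str] = (),
) -> list[str]:
    """Assemble argv in the same order as HermesClient.chat."""
    argv = [command, "chat"]
    if quiet:
        argv.append("-Q")
    if source:
        argv.extend(["--source", source])
    argv.extend(extra_args)
    argv.extend(["-q", message])
    return argv


def build_env(
    user_id: str,
    overrides: Mapping[str, object] | None = None,
    base: Mapping[str, str] | None = None,
) -> dict[str, str]:
    """Copy the base environment, set the memory user, then apply overrides."""
    env = dict(os.environ if base is None else base)
    env["MEMORY_SYSTEM_USER_ID"] = user_id
    if overrides:
        # Overrides are applied last, so they win over the values set above.
        env.update({key: str(value) for key, value in overrides.items() if value is not None})
    return env


argv = build_command("hello", extra_args=["--example-flag"])  # placeholder flag
env = build_env("locomo-1", {"MEMORY_SYSTEM_ENDPOINT": "http://127.0.0.1:1934", "UNSET": None}, base={})
```

Because overrides are merged after `MEMORY_SYSTEM_USER_ID` is set, an override mapping that contains that key would replace the per-sample user id. `memory_env` in `run_eval.py` never emits it, so the per-sample mapping holds there.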
--- a/eval/hermes_memory_eval/judge.py
+++ b/eval/hermes_memory_eval/judge.py
@ -0,0 +1,188 @@
 """LLM judge for Hermes memory QA outputs."""
 from __future__ import annotations
 import argparse
 import asyncio
 import json
 import os
 from pathlib import Path
 from typing import Any
 import httpx
 import yaml
 def load_answers(path: str | Path) -> list[dict[str, Any]]:
    input_path = Path(path)
    if input_path.suffix == ".jsonl":
        with input_path.open("r", encoding="utf-8") as file:
            return [json.loads(line) for line in file if line.strip()]
    with input_path.open("r", encoding="utf-8") as file:
        data = json.load(file)
    if isinstance(data, dict):
        return data.get("results", data.get("grades", []))
    if isinstance(data, list):
        return data
    raise ValueError("answers file must be a JSON list, JSONL, or a JSON object with a 'results' or 'grades' key")
 def load_config(path: str | Path | None) -> dict[str, Any]:
    if not path:
        return {}
    config_path = Path(path)
    if not config_path.exists():
        return {}
    with config_path.open("r", encoding="utf-8") as file:
        return yaml.safe_load(file) or {}
 def resolve_judge_config(args: argparse.Namespace) -> dict[str, Any]:
    config = load_config(args.config)
    judge = config.get("judge", {})
    base_url = args.base_url or judge.get("base_url") or os.environ.get("OPENAI_BASE_URL") or "https://api.openai.com/v1"
    model = args.model or judge.get("model") or "gpt-4o-mini"
    api_key_env = args.api_key_env or judge.get("api_key_env") or "OPENAI_API_KEY"
    api_key = args.api_key or judge.get("api_key") or os.environ.get(api_key_env, "")
    parallel = args.parallel if args.parallel is not None else int(judge.get("parallel", 4))
    timeout_seconds = args.timeout_seconds if args.timeout_seconds is not None else int(judge.get("timeout_seconds", 120))
    return {
        "base_url": str(base_url),
        "model": str(model),
        "api_key": str(api_key),
        "api_key_env": str(api_key_env),
        "parallel": int(parallel),
        "timeout_seconds": int(timeout_seconds),
    }
 def judge_prompt(question: str, expected: str, response: str) -> list[dict[str, str]]:
    return [
        {
            "role": "system",
            "content": "You are an expert grader for long-term memory QA. Return JSON only.",
        },
        {
            "role": "user",
            "content": (
                "Decide whether the generated answer matches the gold answer.\n"
                "Be generous: count it correct if it refers to the same fact, topic, person, place, or date.\n"
                "Return exactly JSON: {\"is_correct\":\"CORRECT\" or \"WRONG\", \"reasoning\":\"short reason\"}.\n\n"
                f"Question: {question}\n"
                f"Gold answer: {expected}\n"
                f"Generated answer: {response}"
            ),
        },
    ]
 async def grade_one(
    client: httpx.AsyncClient,
    *,
    base_url: str,
    api_key: str,
    model: str,
    item: dict[str, Any],
 ) -> dict[str, Any]:
    payload = {
        "model": model,
        "temperature": 0,
        "messages": judge_prompt(item["question"], item["expected"], item["response"]),
    }
    response = await client.post(
        f"{base_url.rstrip('/')}/chat/completions",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        json=payload,
    )
    response.raise_for_status()
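    content = response.json()["choices"][0]["message"]["content"]
    # Grade an unparseable or non-object reply as wrong instead of aborting the whole batch.
    try:
        parsed = json.loads(content)
    except (TypeError, json.JSONDecodeError):
        parsed = None
    if not isinstance(parsed, dict):
        parsed = {"is_correct": "WRONG", "reasoning": f"unparseable judge reply: {str(content)[:200]}"}
    label = str(parsed.get("is_correct", parsed.get("label", "WRONG"))).strip().lower()
    return {
        **item,
        "grade": label == "correct",
        "judge_reasoning": parsed.get("reasoning", ""),
    }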
 async def grade_answers(
    answers: list[dict[str, Any]],
    *,
    base_url: str,
    api_key: str,
    model: str,
    timeout_seconds: int = 120,
    parallel: int = 4,
 ) -> list[dict[str, Any]]:
    limits = httpx.Limits(max_connections=max(1, parallel))
    async with httpx.AsyncClient(timeout=timeout_seconds, limits=limits) as client:
        semaphore = asyncio.Semaphore(max(1, parallel))
        async def _grade(item: dict[str, Any]) -> dict[str, Any]:
            async with semaphore:
                return await grade_one(client, base_url=base_url, api_key=api_key, model=model, item=item)
        return await asyncio.gather(*[_grade(item) for item in answers])
 def summarize(grades: list[dict[str, Any]]) -> dict[str, Any]:
    correct = sum(1 for item in grades if item.get("grade"))
    total = len(grades)
    categories: dict[str, dict[str, int]] = {}
    for item in grades:
        category = str(item.get("category", "unknown"))
        categories.setdefault(category, {"correct": 0, "total": 0})
        categories[category]["total"] += 1
        if item.get("grade"):
            categories[category]["correct"] += 1
    return {
        "score": correct / total if total else 0.0,
        "correct": correct,
        "total": total,
        "categories": categories,
    }
 def main() -> None:
    parser = argparse.ArgumentParser(description="Judge Hermes memory QA answers")
    parser.add_argument("input", help="QA JSONL or JSON file")
    parser.add_argument("--config", default="eval/hermes_memory_eval/config.yaml")
    parser.add_argument("--output", default=None)
    parser.add_argument("--base-url", default=None)
    parser.add_argument("--api-key", default=None)
    parser.add_argument("--api-key-env", default=None)
    parser.add_argument("--model", default=None)
    parser.add_argument("--parallel", type=int, default=None)
    parser.add_argument("--timeout-seconds", type=int, default=None)
    args = parser.parse_args()
    judge_config = resolve_judge_config(args)
    if not judge_config["api_key"]:
        raise SystemExit(f"missing --api-key or {judge_config['api_key_env']}")
    answers = load_answers(args.input)
    grades = asyncio.run(
        grade_answers(
            answers,
            base_url=judge_config["base_url"],
            api_key=judge_config["api_key"],
            model=judge_config["model"],
            parallel=judge_config["parallel"],
            timeout_seconds=judge_config["timeout_seconds"],
        )
    )
    summary = summarize(grades)
    print(f"score: {summary['correct']}/{summary['total']} ({summary['score']:.2%})")
    for category, stats in sorted(summary["categories"].items()):
        total = stats["total"]
        score = stats["correct"] / total if total else 0.0
        print(f"category {category}: {stats['correct']}/{total} ({score:.2%})")
    if args.output:
        output = {"summary": summary, "grades": grades}
        Path(args.output).parent.mkdir(parents=True, exist_ok=True)
        with Path(args.output).open("w", encoding="utf-8") as file:
            json.dump(output, file, indent=2, ensure_ascii=False)
 if __name__ == "__main__":
    main()
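Judge models sometimes wrap their JSON verdict in a Markdown fence or add surrounding text, which a bare `json.loads` rejects. The sketch below shows one tolerant way to extract the verdict. `parse_verdict` is a hypothetical helper, not part of the commit, and the brace-matching regex is a simplifying assumption that would not cope with several separate JSON objects in one reply.

```python
from __future__ import annotations

import json
import re


def parse_verdict(content: str) -> tuple[bool, str]:
    """Return (is_correct, reasoning) from a judge reply, tolerating extra text."""
    text = content.strip()
    # Take everything from the first "{" to the last "}" as the candidate object.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match is None:
        return False, f"unparseable judge reply: {text[:80]}"
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return False, f"unparseable judge reply: {text[:80]}"
    # Same label handling as grade_one: only the string "correct" counts as a pass.
    label = str(parsed.get("is_correct", parsed.get("label", "WRONG"))).strip().lower()
    return label == "correct", str(parsed.get("reasoning", ""))


plain = parse_verdict('{"is_correct": "CORRECT", "reasoning": "same date"}')
fenced = parse_verdict('```json\n{"is_correct": "WRONG", "reasoning": "different person"}\n```')
garbage = parse_verdict("I cannot decide.")
```

Treating an unreadable reply as wrong keeps the score conservative. Counting such replies separately would be a reasonable alternative if judge failures are frequent.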
--- a/eval/hermes_memory_eval/locomo.py
+++ b/eval/hermes_memory_eval/locomo.py
@ -0,0 +1,118 @@
 """LoCoMo dataset parsing and formatting for Hermes memory evaluation."""
 from __future__ import annotations
 import json
 from dataclasses import dataclass
 from pathlib import Path
 from typing import Any
 @dataclass(frozen=True)
 class LocomoSession:
    sample_id: str
    session_key: str
    date_time: str
    message: str
 @dataclass(frozen=True)
 class LocomoQA:
    sample_id: str
    question: str
    expected: str
    category: str
    evidence: list[Any]
 def load_samples(path: str | Path, sample_index: int | None = None) -> list[dict[str, Any]]:
    with Path(path).open("r", encoding="utf-8") as file:
        data = json.load(file)
    if not isinstance(data, list):
        raise ValueError("LoCoMo input must be a JSON list")
    if sample_index is None:
        return data
    if sample_index < 0 or sample_index >= len(data):
        raise ValueError(f"sample index {sample_index} out of range for {len(data)} sample(s)")
    return [data[sample_index]]
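 def parse_session_range(value: str | None) -> tuple[int, int] | None:
    """Parse "N" or "START-END" (inclusive session numbers); return None for empty input."""
    if not value:
        return None
    if "-" in value:
        start, end = value.split("-", 1)
        start_number, end_number = int(start), int(end)
        if start_number > end_number:
            raise ValueError(f"invalid session range {value!r}: start is greater than end")
        return start_number, end_number
    number = int(value)
    return number, number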
 def format_message(message: dict[str, Any]) -> str:
    speaker = message.get("speaker", "unknown")
    text = message.get("text", "")
    line = f"{speaker}: {text}"
    image_urls = message.get("img_url", [])
    if isinstance(image_urls, str):
        image_urls = [image_urls]
    caption = message.get("blip_caption", "")
    for url in image_urls:
        suffix = f": {caption}" if caption else ""
        line += f"\n{url}{suffix}"
    if caption and not image_urls:
        line += f"\n({caption})"
    return line
 def build_sessions(
    sample: dict[str, Any],
    session_range: tuple[int, int] | None = None,
    # Default tail (Chinese): "Please remember the conversation history above and reply only with OK."
    tail: str = "请记住以上历史对话，只回复 OK。",
 ) -> list[LocomoSession]:
    conversation = sample["conversation"]
    session_keys = sorted(
        [key for key in conversation if key.startswith("session_") and not key.endswith("_date_time")],
        key=lambda key: int(key.split("_")[1]),
    )
    sessions: list[LocomoSession] = []
    for session_key in session_keys:
        session_number = int(session_key.split("_")[1])
        if session_range:
            start, end = session_range
            if session_number < start or session_number > end:
                continue
        date_time = conversation.get(f"{session_key}_date_time", "")
        parts = [f"[group chat conversation: {date_time}]"]
        parts.extend(format_message(message) for message in conversation[session_key])
        if tail:
            parts.append(tail)
        sessions.append(
            LocomoSession(
                sample_id=str(sample["sample_id"]),
                session_key=session_key,
                date_time=date_time,
                message="\n\n".join(parts),
            )
        )
    return sessions
 def build_qas(sample: dict[str, Any], *, include_category_5: bool = False) -> list[LocomoQA]:
    qas: list[LocomoQA] = []
    for qa in sample.get("qa", []):
        category = str(qa.get("category", ""))
        if category == "5" and not include_category_5:
            continue
        qas.append(
            LocomoQA(
                sample_id=str(sample["sample_id"]),
                question=str(qa["question"]),
                expected=str(qa["answer"]),
                category=category,
                evidence=qa.get("evidence", []),
            )
        )
    return qas
 def sample_user_id(prefix: str, sample: dict[str, Any]) -> str:
    return f"{prefix}{sample['sample_id']}"
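`build_sessions` orders the `session_N` keys numerically and applies an inclusive range filter. The sketch below isolates that selection step on a toy conversation so the ordering is easy to verify. `select_session_keys` is a hypothetical helper name, and the conversation dict is made-up illustration data rather than a LoCoMo sample.

```python
from __future__ import annotations

from typing import Any


def parse_session_range(value: str | None) -> tuple[int, int] | None:
    """Parse "N" or "START-END" into an inclusive (start, end) pair."""
    if not value:
        return None
    if "-" in value:
        start, end = value.split("-", 1)
        return int(start), int(end)
    number = int(value)
    return number, number


def select_session_keys(
    conversation: dict[str, Any],
    session_range: tuple[int, int] | None = None,
) -> list[str]:
    """Return session keys in numeric order, optionally limited to a range."""
    keys = sorted(
        (key for key in conversation if key.startswith("session_") and not key.endswith("_date_time")),
        key=lambda key: int(key.split("_")[1]),
    )
    if session_range is None:
        return keys
    start, end = session_range
    return [key for key in keys if start <= int(key.split("_")[1]) <= end]


# Toy conversation: a plain string sort would place session_10 before session_2.
toy_conversation = {
    "speaker_a": "Caroline",
    "session_10": [],
    "session_2": [],
    "session_1": [],
    "session_1_date_time": "example timestamp",
}

all_keys = select_session_keys(toy_conversation)
subset = select_session_keys(toy_conversation, parse_session_range("2-10"))
```

Sorting on the parsed integer rather than the raw key keeps later sessions such as `session_10` after `session_2`, which matters because the sessions are ingested in this order.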
--- a/eval/hermes_memory_eval/run_eval.py
+++ b/eval/hermes_memory_eval/run_eval.py
@ -0,0 +1,186 @@
 """Run Hermes memory evaluation using LoCoMo-style datasets."""
 from __future__ import annotations
 import argparse
 import json
 import os
 import sys
 from pathlib import Path
 from typing import Any
 import yaml
 if __package__ in {None, ""}:
    sys.path.insert(0, str(Path(__file__).resolve().parents[2]))
 from eval.hermes_memory_eval.hermes_client import HermesClient, HermesClientConfig
 from eval.hermes_memory_eval.locomo import (
    build_qas,
    build_sessions,
    load_samples,
    parse_session_range,
    sample_user_id,
 )
 def load_config(path: str | Path) -> dict[str, Any]:
    with Path(path).open("r", encoding="utf-8") as file:
        return yaml.safe_load(file) or {}
 def memory_env(config: dict[str, Any]) -> dict[str, str]:
    memory = config.get("memory", {})
    env: dict[str, str] = {}
    mappings = {
        "env_file": "MEMORY_SYSTEM_ENV_FILE",
        "endpoint": "MEMORY_SYSTEM_ENDPOINT",
        "api_key": "MEMORY_SYSTEM_API_KEY",
        "search_use_llm": "MEMORY_SYSTEM_SEARCH_USE_LLM",
        "commit_every_turns": "MEMORY_SYSTEM_COMMIT_EVERY_TURNS",
        "commit_interval_seconds": "MEMORY_SYSTEM_COMMIT_INTERVAL_SECONDS",
    }
    for key, env_key in mappings.items():
        value = memory.get(key)
        if value is not None:
            env[env_key] = str(value)
    return env
 def build_client(config: dict[str, Any]) -> HermesClient:
    hermes = config.get("hermes", {})
    return HermesClient(
        HermesClientConfig(
            command=str(hermes.get("command", "hermes")),
            timeout_seconds=int(hermes.get("timeout_seconds", 600)),
            quiet=bool(hermes.get("quiet", True)),
            source=str(hermes.get("source", "memory-eval")),
            extra_args=[str(arg) for arg in hermes.get("extra_args", [])],
        )
    )
 def qa_prompt(config: dict[str, Any], question: str) -> str:
    qa_config = config.get("qa", {})
    template = str(
        qa_config.get(
            "prompt_template",
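            (
                # English translation of the default prompt: "First query long-term memory with
                # memory_system_search, then answer the question from the retrieved memories. If the
                # memories do not contain the answer, say you do not know; do not make anything up.
                # Question: {question}"
                "请先使用 memory_system_search 查询长期记忆，再根据检索到的记忆回答问题。"
                "如果记忆中没有答案，请直接说不知道，不要编造。\n\n问题：{question}"
            ),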
        )
    )
    return template.format(question=question)
 def write_jsonl(path: str | Path, records: list[dict[str, Any]]) -> None:
    output_path = Path(path)
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with output_path.open("w", encoding="utf-8") as file:
        for record in records:
            file.write(json.dumps(record, ensure_ascii=False) + "\n")
 def run_ingest(args: argparse.Namespace) -> None:
    config = load_config(args.config)
    client = build_client(config)
    env = memory_env(config)
    samples = load_samples(args.input, args.sample)
    session_range = parse_session_range(args.sessions)
    user_prefix = str(config.get("memory", {}).get("user_prefix", "locomo-"))
    records: list[dict[str, Any]] = []
    for sample in samples:
        user_id = args.user or sample_user_id(user_prefix, sample)
        sessions = build_sessions(sample, session_range=session_range, tail=args.tail)
        print(f"=== Sample {sample['sample_id']} user={user_id} sessions={len(sessions)} ===", file=sys.stderr)
        for session in sessions:
            try:
                response = client.chat(session.message, user_id=user_id, env=env)
                status = "success"
            except Exception as exc:
                response = str(exc)
                status = "failed"
            print(f"[{session.sample_id}/{session.session_key}] {status}", file=sys.stderr)
            records.append(
                {
                    "mode": "ingest",
                    "status": status,
                    "sample_id": session.sample_id,
                    "session": session.session_key,
                    "date_time": session.date_time,
                    "user_id": user_id,
                    "response": response,
                }
            )
    if args.output:
        write_jsonl(args.output, records)
        print(f"written: {args.output}", file=sys.stderr)
 def run_qa(args: argparse.Namespace) -> None:
    config = load_config(args.config)
    client = build_client(config)
    env = memory_env(config)
    samples = load_samples(args.input, args.sample)
    user_prefix = str(config.get("memory", {}).get("user_prefix", "locomo-"))
    records: list[dict[str, Any]] = []
    for sample in samples:
        user_id = args.user or sample_user_id(user_prefix, sample)
        qas = build_qas(sample, include_category_5=args.include_category_5)
        if args.count is not None:
            qas = qas[: args.count]
        print(f"=== Sample {sample['sample_id']} user={user_id} qa={len(qas)} ===", file=sys.stderr)
        for index, qa in enumerate(qas, start=1):
            try:
                response = client.chat(qa_prompt(config, qa.question), user_id=user_id, env=env)
                status = "success"
            except Exception as exc:
                response = str(exc)
                status = "failed"
            print(f"[{qa.sample_id}] Q{index}/{len(qas)} {status}", file=sys.stderr)
            records.append(
                {
                    "mode": "qa",
                    "status": status,
                    "sample_id": qa.sample_id,
                    "user_id": user_id,
                    "qi": index,
                    "question": qa.question,
                    "expected": qa.expected,
                    "response": response,
                    "category": qa.category,
                    "evidence": qa.evidence,
                }
            )
    if args.output:
        write_jsonl(args.output, records)
        print(f"written: {args.output}", file=sys.stderr)
 def main() -> None:
    parser = argparse.ArgumentParser(description="Evaluate Hermes memory with LoCoMo-style datasets")
    parser.add_argument("mode", choices=["ingest", "qa"])
    parser.add_argument("input", help="Path to LoCoMo JSON dataset")
    parser.add_argument("--config", default="eval/hermes_memory_eval/config.example.yaml")
    parser.add_argument("--output", default=None)
    parser.add_argument("--sample", type=int, default=None)
    parser.add_argument("--user", default=None)
    parser.add_argument("--sessions", default=None, help="Ingest session range, for example 1-4")
    # Default tail (Chinese): "Please remember the conversation history above and reply only with OK."
    parser.add_argument("--tail", default="请记住以上历史对话，只回复 OK。")
    parser.add_argument("--count", type=int, default=None)
    parser.add_argument("--include-category-5", action="store_true")
    args = parser.parse_args()
    if args.mode == "ingest":
        run_ingest(args)
    else:
        run_qa(args)
 if __name__ == "__main__":
    main()
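`run_qa` records a failed Hermes call with `status` set to `failed` and the error text in `response`, so passing the raw JSONL to the judge would grade error messages as answers. The sketch below shows one possible pre-filter. `split_by_status` is a hypothetical helper, and the two records are made-up examples that follow the record shape written by `run_qa`.

```python
from __future__ import annotations

from typing import Any


def split_by_status(records: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Separate records whose status is "success" from all other records."""
    succeeded = [record for record in records if record.get("status") == "success"]
    failed = [record for record in records if record.get("status") != "success"]
    return succeeded, failed


# Made-up records in the shape produced by run_qa.
records = [
    {"mode": "qa", "status": "success", "qi": 1, "question": "Q1", "expected": "A1", "response": "A1"},
    {"mode": "qa", "status": "failed", "qi": 2, "question": "Q2", "expected": "A2",
     "response": "Hermes command failed: exit code 1"},
]

succeeded, failed = split_by_status(records)
```

Whether failed calls should count as wrong answers or be excluded from the denominator is an evaluation-design choice. Keeping the two lists separate leaves both options open.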