{
  "metadata": {
    "id": "ch07",
    "title": "第7章：记忆与上下文管理",
    "volume": "vol2",
    "volume_title": "基础篇",
    "word_count": 1884,
    "difficulty": "beginner",
    "prerequisites": [
      "ch04"
    ],
    "key_concepts": [
      "短期记忆 vs 长期记忆",
      "记忆的层次模型",
      "各层记忆的实现",
      "短期记忆管理",
      "长期记忆管理",
      "上下文窗口管理策略",
      "上下文窗口的挑战",
      "滑动窗口策略",
      "摘要压缩策略",
      "优先级保留策略",
      "混合策略",
      "记忆检索与压缩",
      "Embedding 基础",
      "记忆检索策略",
      "记忆压缩"
    ],
    "learning_objectives": [],
    "estimated_tokens": 1130,
    "source_file": "vol2/ch07_记忆与上下文管理.md"
  },
  "overview": "",
  "sections": [
    {
      "id": "7.1",
      "title": "7.1 短期记忆 vs 长期记忆",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "7.1.1",
          "title": "7.1.1 记忆的层次模型",
          "content": "Agent 的记忆系统借鉴了人类认知科学的模型，分为多个层次："
        },
        {
          "id": "7.1.2",
          "title": "7.1.2 各层记忆的实现",
          "content": ""
        },
        {
          "id": "7.1.3",
          "title": "7.1.3 短期记忆管理",
          "content": "短期记忆存储当前会话的上下文，直接作为 LLM 的输入："
        },
        {
          "id": "7.1.4",
          "title": "7.1.4 长期记忆管理",
          "content": "长期记忆需要持久化存储，并支持语义检索：\n\n\n---"
        }
      ]
    },
    {
      "id": "7.2",
      "title": "7.2 上下文窗口管理策略",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "7.2.1",
          "title": "7.2.1 上下文窗口的挑战",
          "content": "LLM 的上下文窗口是有限的（尽管在不断扩大），但 Agent 在运行中很容易超出限制："
        },
        {
          "id": "7.2.2",
          "title": "7.2.2 滑动窗口策略",
          "content": "最简单的策略——保留最近的 N 条消息："
        },
        {
          "id": "7.2.3",
          "title": "7.2.3 摘要压缩策略",
          "content": "当对话历史太长时，将早期对话压缩为摘要："
        },
        {
          "id": "7.2.4",
          "title": "7.2.4 优先级保留策略",
          "content": "不是所有消息同等重要——系统消息、工具结果摘要、用户的关键指令应该优先保留："
        },
        {
          "id": "7.2.5",
          "title": "7.2.5 混合策略",
          "content": "生产环境推荐使用混合策略：\n\n\n---"
        }
      ]
    },
    {
      "id": "7.3",
      "title": "7.3 记忆检索与压缩",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "7.3.1",
          "title": "7.3.1 Embedding 基础",
          "content": "向量嵌入（Embedding）是语义检索的基础。它将文本转换为高维向量，使语义相似的文本在向量空间中距离更近："
        },
        {
          "id": "7.3.2",
          "title": "7.3.2 记忆检索策略",
          "content": ""
        },
        {
          "id": "7.3.3",
          "title": "7.3.3 记忆压缩",
          "content": "---"
        }
      ]
    },
    {
      "id": "7.4",
      "title": "7.4 向量数据库集成",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "7.4.1",
          "title": "7.4.1 为什么需要向量数据库",
          "content": "当记忆数量达到数千甚至数百万条时，每次查询都遍历所有记忆计算相似度是不可行的。向量数据库通过近似最近邻（ANN）算法实现高效检索。"
        },
        {
          "id": "7.4.2",
          "title": "7.4.2 使用 ChromaDB",
          "content": "ChromaDB 是最易上手的向量数据库之一："
        },
        {
          "id": "7.4.3",
          "title": "7.4.3 完整的记忆系统整合",
          "content": "---"
        }
      ]
    },
    {
      "id": "7.5",
      "title": "7.5 对话历史管理",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "7.5.1",
          "title": "7.5.1 会话管理",
          "content": ""
        },
        {
          "id": "7.5.2",
          "title": "7.5.2 对话历史持久化",
          "content": "---"
        }
      ]
    },
    {
      "id": "7.6",
      "title": "7.6 记忆的遗忘与更新机制",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "7.6.1",
          "title": "7.6.1 为什么要\"遗忘\"",
          "content": "人类大脑的遗忘不是缺陷，而是特性。Agent 同样需要遗忘机制：\n\n- **控制成本**：长期记忆越多，检索越慢、存储越贵\n- **过滤噪音**：并非所有信息都值得长期保存\n- **适应变化**：过时的信息可能产生误导"
        },
        {
          "id": "7.6.2",
          "title": "7.6.2 记忆更新",
          "content": "当新信息与旧记忆冲突时，需要更新而非保留两者："
        },
        {
          "id": "7.6.3",
          "title": "7.6.3 记忆的自动重要性评估",
          "content": "---"
        }
      ]
    },
    {
      "id": "7.7",
      "title": "7.7 常见陷阱与最佳实践",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "7.7.1",
          "title": "7.7.1 常见陷阱",
          "content": "#### 陷阱1：把所有对话历史都发到 LLM\n\n\n#### 陷阱2：摘要丢失关键细节\n\n\n#### 陷阱3：忽略记忆的时效性"
        },
        {
          "id": "7.7.2",
          "title": "7.7.2 最佳实践",
          "content": "---"
        }
      ]
    },
    {
      "id": "7.8",
      "title": "7.8 本章小结",
      "level": 2,
      "content": "本章我们深入探讨了 Agent 记忆系统的设计与实现：\n\n1. **记忆层次**：感知记忆、工作记忆、短期记忆、长期记忆的分层设计\n2. **上下文管理**：滑动窗口、摘要压缩、优先级保留、混合策略\n3. **记忆检索**：语义检索、时效检索、重要性检索、混合检索\n4. **向量数据库**：ChromaDB 集成，实现高效的语义检索\n5. **对话历史管理**：会话管理、持久化存储、搜索\n6. **遗忘与更新**：时间衰减、冲突解决、重要性评估\n\n**核心洞察：** 记忆系统是 Agent 区别于简单 Chatbot 的关键特征。好的记忆系统不是存储越多越好，而是\"存该存的，忘该忘的，在需要时能快速找到对的\"。记忆管理是一门平衡的艺术——在信息完整性和成本效率之间找到最佳平衡点。\n\n---",
      "subsections": []
    },
    {
      "id": "卷二总结",
      "title": "卷二总结",
      "level": 2,
      "content": "恭喜你完成了卷二\"基础篇\"的学习！回顾一下我们走过的路：\n\n| 章节 | 核心收获 |\n|------|---------|\n| **第4章：Agent核心概念** | 理解了 Agent 的架构模型、核心组件、生命周期和评估体系 |\n| **第5章：LLM与Prompt Engineering** | 掌握了与 LLM 高效沟通的技巧——Prompt 设计、CoT 推理、模板管理 |\n| **第6章：工具调用** | 学会了赋予 Agent 行动能力——Function Calling、工具开发、错误处理 |\n| **第7章：记忆与上下文** | 实现了 Agent 的持久化能力——记忆分层、向量检索、上下文管理 |\n\n现在你已经具备了构建一个完整 Agent 系统的所有基础知识。在卷三\"进阶篇\"中，我们将把这些组件整合为更复杂的系统——多 Agent 协作、生产级部署、安全与治理。准备好了吗？\n\n---\n\n> **下一卷**：卷三《进阶篇》—— 多 Agent 协作、生产级部署、安全与治理、可观测性。",
      "subsections": []
    }
  ],
  "code_blocks": [
    {
      "id": "code-1",
      "language": "text",
      "description": "Agent 的记忆系统借鉴了人类认知科学的模型，分为多个层次：",
      "code": "┌─────────────────────────────────────────────────────────┐\n│                     记忆金字塔                            │\n│                                                          │\n│                    ┌──────────┐                          │\n│                    │ 感知记忆  │  ← 当前输入的原始信息       │\n│                    │ (Sensory)│     持续：毫秒级           │\n│                    └────┬─────┘                          │\n│                         ▼                                │\n│                  ┌──────────────┐                         │\n│                  │  工作记忆     │  ← 当前对话上下文         │\n│                  │ (Working)    │     持续：分钟级           │\n│                  │              │     容量：有限            │\n│                  └──────┬──────┘                         │\n│                         ▼                                │\n│              ┌─────────────────────┐                      │\n│              │    短期记忆          │  ← 当前会话历史         │\n│              │ (Short-term)        │     持续：小时级         │\n│              │                     │     容量：中等          │\n│              └──────────┬──────────┘                      │\n│                         ▼                                │\n│              ┌─────────────────────┐                      │\n│              │    长期记忆          │  ← 跨会话持久化知识     │\n│              │ (Long-term)         │     持续：永久          │\n│              │                     │     容量：大            │\n│              └─────────────────────┘                      │\n└─────────────────────────────────────────────────────────┘",
      "section_ref": "7.1.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-2",
      "language": "python",
      "description": "",
      "code": "from dataclasses import dataclass, field\nfrom datetime import datetime\nfrom typing import Any\nfrom enum import Enum\n\nclass MemoryType(Enum):\n    WORKING = \"working\"       # 工作记忆：当前推理上下文\n    SHORT_TERM = \"short\"     # 短期记忆：当前会话\n    LONG_TERM = \"long\"       # 长期记忆：跨会话持久化\n\n@dataclass\nclass MemoryItem:\n    \"\"\"记忆条目\"\"\"\n    content: str                    # 记忆内容\n    memory_type: MemoryType         # 记忆类型\n    timestamp: datetime = field(default_factory=datetime.now)\n    importance: float = 0.5         # 重要性评分 (0-1)\n    metadata: dict = field(default_factory=dict)\n    access_count: int = 0           # 被访问次数\n    last_accessed: datetime = field(default_factory=datetime.now)\n    embedding: list[float] | None = None  # 向量嵌入（用于检索）\n    source: str = \"\"                # 记忆来源\n    expires_at: datetime | None = None    # 过期时间\n    \n    def touch(self):\n        \"\"\"更新访问时间\"\"\"\n        self.access_count += 1\n        self.last_accessed = datetime.now()\n    \n    @property\n    def is_expired(self) -> bool:\n        if self.expires_at is None:\n            return False\n        return datetime.now() > self.expires_at\n    \n    @property\n    def age_hours(self) -> float:\n        return (datetime.now() - self.timestamp).total_seconds() / 3600",
      "section_ref": "7.1.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-3",
      "language": "python",
      "description": "短期记忆存储当前会话的上下文，直接作为 LLM 的输入：",
      "code": "class ShortTermMemory:\n    \"\"\"短期记忆——对话上下文管理\"\"\"\n    \n    def __init__(self, max_messages: int = 50, max_tokens: int = 8000):\n        self.max_messages = max_messages\n        self.max_tokens = max_tokens\n        self.messages: list[dict] = []\n        self._token_counter = lambda text: len(text) // 2  # 简化计数\n    \n    def add_message(self, role: str, content: str):\n        \"\"\"添加消息\"\"\"\n        self.messages.append({\n            \"role\": role,\n            \"content\": content,\n            \"timestamp\": datetime.now().isoformat()\n        })\n        self._trim()\n    \n    def add_system(self, content: str):\n        \"\"\"添加系统消息（始终保留）\"\"\"\n        # 系统消息插入到最前面\n        if self.messages and self.messages[0][\"role\"] == \"system\":\n            self.messages[0][\"content\"] = content\n        else:\n            self.messages.insert(0, {\n                \"role\": \"system\",\n                \"content\": content,\n                \"timestamp\": datetime.now().isoformat()\n            })\n    \n    def get_context(self) -> list[dict]:\n        \"\"\"获取当前上下文\"\"\"\n        return [\n            {\"role\": m[\"role\"], \"content\": m[\"content\"]}\n            for m in self.messages\n        ]\n    \n    def _trim(self):\n        \"\"\"裁剪上下文\"\"\"\n        # 保留系统消息\n        system_msgs = [m for m in self.messages if m[\"role\"] == \"system\"]\n        other_msgs = [m for m in self.messages if m[\"role\"] != \"system\"]\n        \n        # 按消息数限制裁剪\n        if len(other_msgs) > self.max_messages:\n            other_msgs = other_msgs[-self.max_messages:]\n        \n        # 按 Token 数限制裁剪\n        total_tokens = sum(self._token_counter(m[\"content\"]) for m in other_msgs)\n        while total_tokens > self.max_tokens and len(other_msgs) > 2:\n            removed = other_msgs.pop(0)\n            total_tokens -= self._token_counter(removed[\"content\"])\n        \n        self.messages = system_msgs + other_msgs\n    \n    def 
get_summary(self) -> str:\n        \"\"\"获取对话摘要\"\"\"\n        if not self.messages:\n            return \"暂无对话历史\"\n        \n        user_msgs = [m for m in self.messages if m[\"role\"] == \"user\"]\n        assistant_msgs = [m for m in self.messages if m[\"role\"] == \"assistant\"]\n        \n        return f\"\"\"对话统计：\n- 总消息数：{len(self.messages)}\n- 用户消息：{len(user_msgs)}\n- 助手回复：{len(assistant_msgs)}\n- 首条消息：{self.messages[0].get('timestamp', 'N/A')}\n- 最新消息：{self.messages[-1].get('timestamp', 'N/A')}\"\"\"",
      "section_ref": "7.1.3",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-4",
      "language": "python",
      "description": "长期记忆需要持久化存储，并支持语义检索：",
      "code": "class LongTermMemory:\n    \"\"\"长期记忆——跨会话持久化\"\"\"\n    \n    def __init__(self, storage_backend: Any = None):\n        self.memories: list[MemoryItem] = []\n        self.storage = storage_backend  # 可以是文件、数据库等\n        self._embedder = None           # 嵌入模型\n    \n    def store(self, content: str, importance: float = 0.5, **metadata):\n        \"\"\"存储记忆\"\"\"\n        item = MemoryItem(\n            content=content,\n            memory_type=MemoryType.LONG_TERM,\n            importance=importance,\n            metadata=metadata\n        )\n        \n        # 生成嵌入向量\n        if self._embedder:\n            item.embedding = self._embedder.embed(content)\n        \n        self.memories.append(item)\n        \n        # 持久化\n        if self.storage:\n            self.storage.save(item)\n        \n        return item\n    \n    def recall(self, query: str, top_k: int = 5, threshold: float = 0.3) -> list[MemoryItem]:\n        \"\"\"检索相关记忆\"\"\"\n        if not self._embedder:\n            # 无嵌入模型时，使用简单的关键词匹配\n            return self._keyword_search(query, top_k)\n        \n        # 向量检索\n        query_embedding = self._embedder.embed(query)\n        \n        scored = []\n        for item in self.memories:\n            if item.embedding is None:\n                continue\n            \n            similarity = self._cosine_similarity(query_embedding, item.embedding)\n            if similarity >= threshold:\n                scored.append((similarity, item))\n        \n        scored.sort(reverse=True)\n        \n        results = [item for _, item in scored[:top_k]]\n        \n        # 更新访问记录\n        for item in results:\n            item.touch()\n        \n        return results\n    \n    def forget(self, criteria: callable):\n        \"\"\"遗忘——删除满足条件的记忆\"\"\"\n        self.memories = [\n            item for item in self.memories\n            if not criteria(item)\n        ]\n    \n    def consolidate(self):\n        
\"\"\"记忆整合——合并相似记忆，删除冗余\"\"\"\n        if not self._embedder:\n            return\n        \n        # 找到相似度高的记忆对\n        to_remove = set()\n        for i in range(len(self.memories)):\n            for j in range(i + 1, len(self.memories)):\n                if j in to_remove:\n                    continue\n                \n                mi, mj = self.memories[i], self.memories[j]\n                if mi.embedding and mj.embedding:\n                    sim = self._cosine_similarity(mi.embedding, mj.embedding)\n                    \n                    if sim > 0.9:  # 高度相似\n                        # 保留更重要的那个\n                        if mj.importance > mi.importance:\n                            to_remove.add(i)\n                        else:\n                            to_remove.add(j)\n        \n        self.memories = [\n            item for i, item in enumerate(self.memories)\n            if i not in to_remove\n        ]\n    \n    def _keyword_search(self, query: str, top_k: int) -> list[MemoryItem]:\n        \"\"\"关键词搜索（降级方案）\"\"\"\n        query_words = set(query.lower().split())\n        \n        scored = []\n        for item in self.memories:\n            content_words = set(item.content.lower().split())\n            overlap = len(query_words & content_words)\n            if overlap > 0:\n                scored.append((overlap, item))\n        \n        scored.sort(reverse=True)\n        return [item for _, item in scored[:top_k]]\n    \n    @staticmethod\n    def _cosine_similarity(a: list[float], b: list[float]) -> float:\n        \"\"\"计算余弦相似度\"\"\"\n        import math\n        dot = sum(x * y for x, y in zip(a, b))\n        norm_a = math.sqrt(sum(x * x for x in a))\n        norm_b = math.sqrt(sum(x * x for x in b))\n        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0",
      "section_ref": "7.1.4",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-5",
      "language": "text",
      "description": "LLM 的上下文窗口是有限的（尽管在不断扩大），但 Agent 在运行中很容易超出限制：",
      "code": "上下文窗口组成：\n┌─────────────────────────────────────────────┐\n│ System Prompt        (~500-2000 tokens)     │\n├─────────────────────────────────────────────┤\n│ 工具定义              (~1000-5000 tokens)    │  ← 每+1个工具约+200-500 tokens\n├─────────────────────────────────────────────┤\n│ 对话历史              (~动态增长)            │  ← 最大威胁！\n├─────────────────────────────────────────────┤\n│ 工具调用结果          (~动态增长)            │\n├─────────────────────────────────────────────┤\n│ 检索到的记忆          (~1000-3000 tokens)    │\n├─────────────────────────────────────────────┤\n│ 预留给输出的空间      (~2000-4000 tokens)    │\n└─────────────────────────────────────────────┘",
      "section_ref": "7.2.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-6",
      "language": "python",
      "description": "最简单的策略——保留最近的 N 条消息：",
      "code": "class SlidingWindowManager:\n    \"\"\"滑动窗口上下文管理\"\"\"\n    \n    def __init__(\n        self,\n        max_tokens: int,\n        system_prompt_tokens: int,\n        reserve_for_output: int = 4096\n    ):\n        self.max_tokens = max_tokens\n        self.system_prompt_tokens = system_prompt_tokens\n        self.reserve_for_output = reserve_for_output\n        self.available_for_context = (\n            max_tokens - system_prompt_tokens - reserve_for_output\n        )\n    \n    def select_messages(\n        self,\n        messages: list[dict],\n        token_counter: callable\n    ) -> list[dict]:\n        \"\"\"选择要保留的消息\"\"\"\n        # 始终保留系统消息\n        system = [m for m in messages if m[\"role\"] == \"system\"]\n        others = [m for m in messages if m[\"role\"] != \"system\"]\n        \n        # 从最新开始，向前添加，直到 Token 用完\n        selected = []\n        used_tokens = 0\n        \n        for msg in reversed(others):\n            msg_tokens = token_counter(msg[\"content\"])\n            \n            if used_tokens + msg_tokens > self.available_for_context:\n                break\n            \n            selected.insert(0, msg)\n            used_tokens += msg_tokens\n        \n        return system + selected\n    \n    def utilization(self, messages: list[dict], token_counter: callable) -> float:\n        \"\"\"计算上下文利用率\"\"\"\n        total = sum(token_counter(m[\"content\"]) for m in messages)\n        return total / self.max_tokens",
      "section_ref": "7.2.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-7",
      "language": "python",
      "description": "当对话历史太长时，将早期对话压缩为摘要：",
      "code": "class SummaryCompressor:\n    \"\"\"摘要压缩器\"\"\"\n    \n    def __init__(self, llm):\n        self.llm = llm\n    \n    def compress(\n        self,\n        messages: list[dict],\n        max_summary_tokens: int = 500\n    ) -> tuple[list[dict], str]:\n        \"\"\"\n        压缩对话历史\n        返回：(保留的近期消息, 历史摘要)\n        \"\"\"\n        if len(messages) <= 6:\n            return messages, \"\"\n        \n        # 分为早期和近期\n        early = messages[:-4]  # 保留最近4条不压缩\n        recent = messages[-4:]\n        \n        # 生成早期对话摘要\n        conversation_text = \"\\n\".join(\n            f\"{'用户' if m['role'] == 'user' else '助手'}: {m['content']}\"\n            for m in early\n        )\n        \n        summary_prompt = f\"\"\"请将以下对话历史压缩为简洁的摘要。\n保留关键信息：讨论的主题、做出的决定、重要的数据。\n\n对话历史：\n{conversation_text}\n\n请用 {max_summary_tokens} 字以内的中文概括。\"\"\"\n\n        summary = self.llm.chat(\n            messages=[{\"role\": \"user\", \"content\": summary_prompt}],\n            temperature=0.1\n        ).content\n        \n        # 用摘要替代早期消息\n        summary_message = {\n            \"role\": \"system\",\n            \"content\": f\"[之前的对话摘要]\\n{summary}\"\n        }\n        \n        return [summary_message] + recent, summary\n    \n    def incremental_summarize(\n        self,\n        existing_summary: str,\n        new_messages: list[dict]\n    ) -> str:\n        \"\"\"增量更新摘要\"\"\"\n        if not new_messages:\n            return existing_summary\n        \n        new_text = \"\\n\".join(\n            f\"{'用户' if m['role'] == 'user' else '助手'}: {m['content']}\"\n            for m in new_messages\n        )\n        \n        prompt = f\"\"\"现有摘要：\n{existing_summary if existing_summary else \"（无）\"}\n\n新的对话内容：\n{new_text}\n\n请更新摘要，整合新信息，保持简洁（300字以内）。\"\"\"\n        \n        return self.llm.chat(\n            messages=[{\"role\": \"user\", \"content\": prompt}],\n            temperature=0.1\n        ).content",
      "section_ref": "7.2.3",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-8",
      "language": "python",
      "description": "不是所有消息同等重要——系统消息、工具结果摘要、用户的关键指令应该优先保留：",
      "code": "@dataclass\nclass MessagePriority:\n    \"\"\"消息优先级\"\"\"\n    message: dict\n    priority: float = 0.5  # 0-1\n    tokens: int = 0\n\nclass PriorityBasedManager:\n    \"\"\"基于优先级的上下文管理\"\"\"\n    \n    # 优先级规则\n    PRIORITY_RULES = {\n        \"system\": 1.0,           # 系统消息：最高\n        \"tool_result\": 0.8,      # 工具结果：高\n        \"tool_call\": 0.7,        # 工具调用：高\n        \"user\": 0.6,             # 用户消息：中高\n        \"assistant\": 0.5,        # 助手回复：中\n        \"error\": 0.3,            # 错误消息：低\n    }\n    \n    def __init__(self, max_tokens: int):\n        self.max_tokens = max_tokens\n    \n    def classify_message(self, message: dict) -> str:\n        \"\"\"分类消息类型\"\"\"\n        role = message.get(\"role\", \"\")\n        \n        if role == \"system\":\n            return \"system\"\n        elif role == \"tool\":\n            return \"tool_result\"\n        elif message.get(\"tool_calls\"):\n            return \"tool_call\"\n        elif role == \"user\":\n            return \"user\"\n        elif role == \"assistant\":\n            return \"assistant\"\n        else:\n            return \"error\"\n    \n    def select_messages(\n        self,\n        messages: list[dict],\n        token_counter: callable\n    ) -> list[dict]:\n        \"\"\"按优先级选择消息\"\"\"\n        \n        # 分类并评分\n        prioritized = []\n        for msg in messages:\n            msg_type = self.classify_message(msg)\n            priority = self.PRIORITY_RULES.get(msg_type, 0.5)\n            \n            # 时间衰减：越老的消息优先级略降\n            index = messages.index(msg)\n            total = len(messages)\n            time_decay = 0.5 + 0.5 * (index / total)  # 越新衰减越小\n            \n            final_priority = priority * time_decay\n            \n            prioritized.append(MessagePriority(\n                message=msg,\n                priority=final_priority,\n                tokens=token_counter(msg.get(\"content\", \"\"))\n            ))\n        \n        # 
按优先级排序\n        prioritized.sort(key=lambda p: p.priority, reverse=True)\n        \n        # 贪心选择\n        selected = []\n        used_tokens = 0\n        \n        for p in prioritized:\n            if used_tokens + p.tokens <= self.max_tokens:\n                selected.append(p.message)\n                used_tokens += p.tokens\n        \n        # 按原始顺序排列\n        original_order = {id(m): i for i, m in enumerate(messages)}\n        selected.sort(key=lambda m: original_order.get(id(m), 0))\n        \n        return selected",
      "section_ref": "7.2.4",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-9",
      "language": "python",
      "description": "生产环境推荐使用混合策略：",
      "code": "class HybridContextManager:\n    \"\"\"混合上下文管理器\"\"\"\n    \n    def __init__(\n        self,\n        llm,\n        max_tokens: int,\n        system_prompt_tokens: int,\n        reserve_output: int = 4096,\n        keep_recent: int = 4,\n        summary_max_tokens: int = 400\n    ):\n        self.llm = llm\n        self.max_tokens = max_tokens\n        self.available = max_tokens - system_prompt_tokens - reserve_output\n        self.keep_recent = keep_recent\n        self.compressor = SummaryCompressor(llm)\n        self.priority_mgr = PriorityBasedManager(self.available)\n        self.token_counter = lambda text: len(text) // 2\n        self._cached_summary = \"\"\n    \n    def manage(self, messages: list[dict]) -> list[dict]:\n        \"\"\"管理上下文\"\"\"\n        system = [m for m in messages if m[\"role\"] == \"system\"]\n        others = [m for m in messages if m[\"role\"] != \"system\"]\n        \n        total_tokens = sum(\n            self.token_counter(m.get(\"content\", \"\")) for m in others\n        )\n        \n        if total_tokens <= self.available:\n            # 不需要压缩\n            return messages\n        \n        # 策略1：如果超出不多，用优先级策略裁剪\n        if total_tokens <= self.available * 1.5:\n            selected = self.priority_mgr.select_messages(\n                others, self.token_counter\n            )\n            return system + selected\n        \n        # 策略2：超出较多，先摘要再裁剪\n        recent = others[-self.keep_recent:]\n        old = others[:-self.keep_recent]\n        \n        # 增量摘要\n        self._cached_summary = self.compressor.incremental_summarize(\n            self._cached_summary, old\n        )\n        \n        summary_msg = {\n            \"role\": \"system\",\n            \"content\": f\"[对话历史摘要]\\n{self._cached_summary}\"\n        }\n        \n        return system + [summary_msg] + recent",
      "section_ref": "7.2.5",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-10",
      "language": "python",
      "description": "向量嵌入（Embedding）是语义检索的基础。它将文本转换为高维向量，使语义相似的文本在向量空间中距离更近：",
      "code": "class TextEmbedder:\n    \"\"\"文本嵌入生成器\"\"\"\n    \n    def __init__(self, model: str = \"text-embedding-3-small\", api_key: str = \"\"):\n        from openai import OpenAI\n        self.client = OpenAI(api_key=api_key)\n        self.model = model\n        self._cache: dict[str, list[float]] = {}\n    \n    def embed(self, text: str) -> list[float]:\n        \"\"\"生成文本嵌入\"\"\"\n        if text in self._cache:\n            return self._cache[text]\n        \n        response = self.client.embeddings.create(\n            input=text,\n            model=self.model\n        )\n        \n        embedding = response.data[0].embedding\n        self._cache[text] = embedding\n        return embedding\n    \n    def embed_batch(self, texts: list[str]) -> list[list[float]]:\n        \"\"\"批量生成嵌入\"\"\"\n        # 过滤已有缓存的\n        to_embed = [t for t in texts if t not in self._cache]\n        \n        if to_embed:\n            response = self.client.embeddings.create(\n                input=to_embed,\n                model=self.model\n            )\n            \n            for text, data in zip(to_embed, response.data):\n                self._cache[text] = data.embedding\n        \n        return [self._cache[t] for t in texts]",
      "section_ref": "7.3.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-11",
      "language": "python",
      "description": "",
      "code": "class MemoryRetriever:\n    \"\"\"记忆检索器\"\"\"\n    \n    def __init__(self, embedder: TextEmbedder):\n        self.embedder = embedder\n    \n    def retrieve(\n        self,\n        query: str,\n        memories: list[MemoryItem],\n        top_k: int = 5,\n        strategy: str = \"hybrid\"\n    ) -> list[tuple[MemoryItem, float]]:\n        \"\"\"\n        检索相关记忆\n        \n        strategy:\n        - \"semantic\": 纯语义检索\n        - \"recency\": 纯时效检索\n        - \"importance\": 纯重要性检索\n        - \"hybrid\": 混合检索（推荐）\n        \"\"\"\n        if not memories:\n            return []\n        \n        if strategy == \"semantic\":\n            return self._semantic_search(query, memories, top_k)\n        elif strategy == \"recency\":\n            return self._recency_search(memories, top_k)\n        elif strategy == \"importance\":\n            return self._importance_search(memories, top_k)\n        else:\n            return self._hybrid_search(query, memories, top_k)\n    \n    def _semantic_search(\n        self, query: str, memories: list[MemoryItem], top_k: int\n    ) -> list[tuple[MemoryItem, float]]:\n        \"\"\"语义检索\"\"\"\n        query_embedding = self.embedder.embed(query)\n        \n        scored = []\n        for item in memories:\n            if item.embedding is None:\n                item.embedding = self.embedder.embed(item.content)\n            \n            similarity = LongTermMemory._cosine_similarity(\n                query_embedding, item.embedding\n            )\n            scored.append((similarity, item))\n        \n        scored.sort(reverse=True)\n        return [(item, score) for score, item in scored[:top_k]]\n    \n    def _recency_search(\n        self, memories: list[MemoryItem], top_k: int\n    ) -> list[tuple[MemoryItem, float]]:\n        \"\"\"时效检索\"\"\"\n        now = datetime.now()\n        \n        scored = []\n        for item in memories:\n            age_hours = (now - item.timestamp).total_seconds() / 
3600\n            # 时间衰减：1小时内1.0，每24小时衰减0.1\n            recency_score = max(0, 1.0 - age_hours / 240)\n            scored.append((recency_score, item))\n        \n        scored.sort(reverse=True)\n        return [(item, score) for score, item in scored[:top_k]]\n    \n    def _importance_search(\n        self, memories: list[MemoryItem], top_k: int\n    ) -> list[tuple[MemoryItem, float]]:\n        \"\"\"重要性检索\"\"\"\n        scored = [(item.importance, item) for item in memories]\n        scored.sort(reverse=True)\n        return [(item, score) for score, item in scored[:top_k]]\n    \n    def _hybrid_search(\n        self, query: str, memories: list[MemoryItem], top_k: int\n    ) -> list[tuple[MemoryItem, float]]:\n        \"\"\"混合检索——综合语义相关性、时效性和重要性\"\"\"\n        query_embedding = self.embedder.embed(query)\n        now = datetime.now()\n        \n        # 权重配置\n        alpha = 0.6   # 语义相关性\n        beta = 0.2    # 时效性\n        gamma = 0.2   # 重要性\n        \n        scored = []\n        for item in memories:\n            # 语义分\n            if item.embedding is None:\n                item.embedding = self.embedder.embed(item.content)\n            semantic = LongTermMemory._cosine_similarity(\n                query_embedding, item.embedding\n            )\n            \n            # 时效分\n            age_hours = (now - item.timestamp).total_seconds() / 3600\n            recency = max(0, 1.0 - age_hours / 240)\n            \n            # 重要性分\n            importance = item.importance\n            \n            # 加权综合\n            composite = alpha * semantic + beta * recency + gamma * importance\n            scored.append((composite, item))\n        \n        scored.sort(reverse=True)\n        return [(item, score) for score, item in scored[:top_k]]",
      "section_ref": "7.3.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-12",
      "language": "python",
      "description": "",
      "code": "class MemoryCompressor:\n    \"\"\"记忆压缩器\"\"\"\n    \n    def __init__(self, llm):\n        self.llm = llm\n    \n    def compress_memories(\n        self,\n        memories: list[MemoryItem],\n        max_output_tokens: int = 500\n    ) -> MemoryItem:\n        \"\"\"将多条记忆压缩为一条\"\"\"\n        \n        memory_texts = \"\\n\".join(\n            f\"- [{m.timestamp.strftime('%m-%d %H:%M')}] {m.content}\"\n            for m in memories\n        )\n        \n        prompt = f\"\"\"请将以下多条记忆压缩为一条精炼的摘要。\n保留所有关键信息（事实、数据、决策、偏好），去除冗余。\n\n原始记忆：\n{memory_texts}\n\n压缩为一条记忆（{max_output_tokens}字以内）：\"\"\"\n\n        compressed = self.llm.chat(\n            messages=[{\"role\": \"user\", \"content\": prompt}],\n            temperature=0.1\n        ).content\n        \n        # 继承最重要记忆的元数据\n        most_important = max(memories, key=lambda m: m.importance)\n        \n        return MemoryItem(\n            content=compressed,\n            memory_type=MemoryType.LONG_TERM,\n            importance=most_important.importance,\n            metadata={\n                \"compressed_from\": len(memories),\n                \"source_memories\": [m.timestamp.isoformat() for m in memories]\n            }\n        )\n    \n    def extract_key_facts(self, content: str) -> list[str]:\n        \"\"\"从文本中提取关键事实\"\"\"\n        prompt = f\"\"\"从以下文本中提取关键事实，每条事实一行。\n格式：[类别] 事实内容\n类别：事实、偏好、决策、待办\n\n文本：\n{content}\"\"\"\n        \n        response = self.llm.chat(\n            messages=[{\"role\": \"user\", \"content\": prompt}],\n            temperature=0.1\n        ).content\n        \n        return [line.strip() for line in response.strip().split(\"\\n\") if line.strip()]",
      "section_ref": "7.3.3",
      "runnable": true,
      "dependencies": []
    },
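    {
      "id": "code-13",
      "language": "text",
      "description": "Once the store grows from thousands to millions of memories, scanning every memory and computing a similarity score on each query becomes too slow. Vector databases keep retrieval fast by using approximate nearest neighbor (ANN) indexes.",
      "code": "Memory count vs. retrieval latency (illustrative orders of magnitude):\n┌──────────┬──────────────────┬──────────────────┐\n│ Memories │ Brute force      │ Vector database  │\n│          │ (full scan), ms  │ (ANN index), ms  │\n├──────────┼──────────────────┼──────────────────┤\n│ 1,000    │ ~100             │ ~10              │\n│ 10,000   │ ~1,000           │ ~15              │\n│ 100K     │ ~10,000          │ ~20              │\n│ 1M       │ ~100,000         │ ~30              │\n│ 10M      │ impractical      │ ~50              │\n└──────────┴──────────────────┴──────────────────┘",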
      "section_ref": "7.4.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-14",
      "language": "python",
      "description": "ChromaDB is one of the easiest vector databases to get started with:",
      "code": "import chromadb\n\nclass ChromaMemoryStore:\n    \"\"\"Memory store backed by ChromaDB.\"\"\"\n    \n    def __init__(\n        self,\n        collection_name: str = \"agent_memory\",\n        persist_directory: str = \"./chroma_db\"\n    ):\n        # PersistentClient (chromadb >= 0.4) replaces the legacy\n        # Client(Settings(chroma_db_impl=\"duckdb+parquet\", ...)) setup\n        self.client = chromadb.PersistentClient(path=persist_directory)\n        self.collection = self.client.get_or_create_collection(\n            name=collection_name,\n            metadata={\"hnsw:space\": \"cosine\"}\n        )\n    \n    def store(\n        self,\n        content: str,\n        memory_id: str,\n        metadata: dict | None = None,\n        embedding: list[float] | None = None\n    ):\n        \"\"\"Store (or overwrite) a memory.\"\"\"\n        self.collection.upsert(\n            ids=[memory_id],\n            documents=[content],\n            # Pass None instead of [{}]: some versions reject empty metadata\n            metadatas=[metadata] if metadata else None,\n            # With no embedding, the collection's embedding function generates one\n            embeddings=[embedding] if embedding is not None else None\n        )\n    \n    def search(\n        self,\n        query: str,\n        n_results: int = 5,\n        where: dict | None = None,\n        query_embedding: list[float] | None = None\n    ) -> list[dict]:\n        \"\"\"Retrieve memories.\"\"\"\n        # Chroma expects either query_embeddings or query_texts, not both\n        if query_embedding is not None:\n            query_args = {\"query_embeddings\": [query_embedding]}\n        else:\n            query_args = {\"query_texts\": [query]}\n        \n        results = self.collection.query(\n            n_results=n_results,\n            where=where,\n            **query_args\n        )\n        \n        memories = []\n        for i in range(len(results[\"ids\"][0])):\n            memories.append({\n                \"id\": results[\"ids\"][0][i],\n                \"content\": results[\"documents\"][0][i],\n                \"metadata\": results[\"metadatas\"][0][i],\n                \"distance\": results[\"distances\"][0][i] if results[\"distances\"] else None\n            })\n        \n        return memories\n    \n    def delete(self, memory_ids: list[str]):\n        \"\"\"Delete memories by id.\"\"\"\n        self.collection.delete(ids=memory_ids)\n    \n    def count(self) -> int:\n        \"\"\"Total number of stored memories.\"\"\"\n        return self.collection.count()\n    \n    def update_metadata(self, memory_id: str, metadata: dict):\n        \"\"\"Update the metadata of one memory.\"\"\"\n        self.collection.update(\n            ids=[memory_id],\n            metadatas=[metadata]\n        )",
      "section_ref": "7.4.2",
      "runnable": true,
      "dependencies": [
        "chromadb"
      ]
    },
    {
      "id": "code-15",
      "language": "python",
      "description": "",
      "code": "import json\nimport uuid\nfrom datetime import datetime\n\nclass IntegratedMemorySystem:\n    \"\"\"Integrated memory system: short-term plus long-term (vector database).\"\"\"\n    \n    def __init__(\n        self,\n        llm,\n        embedder: TextEmbedder,\n        vector_store: ChromaMemoryStore,\n        short_term_max: int = 50,\n        long_term_top_k: int = 5\n    ):\n        self.llm = llm\n        self.embedder = embedder\n        self.vector_store = vector_store\n        self.short_term = ShortTermMemory(max_messages=short_term_max)\n        self.retriever = MemoryRetriever(embedder)\n        self.long_term_top_k = long_term_top_k\n        self.compressor = MemoryCompressor(llm)\n    \n    def add_conversation(self, role: str, content: str):\n        \"\"\"Add a message to short-term memory.\"\"\"\n        self.short_term.add_message(role, content)\n    \n    def store_to_long_term(\n        self,\n        content: str,\n        importance: float = 0.5,\n        category: str = \"general\",\n        tags: list[str] | None = None\n    ):\n        \"\"\"Store an item in long-term memory.\"\"\"\n        memory_id = str(uuid.uuid4())\n        \n        self.vector_store.store(\n            content=content,\n            memory_id=memory_id,\n            metadata={\n                \"importance\": importance,\n                \"category\": category,\n                # Tags are serialized because metadata values must be scalars\n                \"tags\": json.dumps(tags or []),\n                \"created_at\": datetime.now().isoformat()\n            }\n        )\n    \n    def build_context(self, query: str) -> list[dict]:\n        \"\"\"Build the full context (short-term memory plus retrieved long-term memories).\"\"\"\n        # 1. Short-term memory (copied so the inserts below do not alter the store)\n        context = list(self.short_term.get_context())\n        \n        # 2. Retrieve relevant items from long-term memory\n        long_term_results = self.vector_store.search(\n            query=query,\n            n_results=self.long_term_top_k\n        )\n        \n        if long_term_results:\n            memory_text = \"\\n\".join(\n                f\"- {r['content']}\" for r in long_term_results\n            )\n            \n            memory_context = {\n                \"role\": \"system\",\n                \"content\": f\"[Relevant memories]\\n{memory_text}\"\n            }\n            \n            # Insert right after the system message, if there is one\n            if context and context[0][\"role\"] == \"system\":\n                context.insert(1, memory_context)\n            else:\n                context.insert(0, memory_context)\n        \n        return context\n    \n    def end_session(self):\n        \"\"\"At the end of a session, move important information into long-term memory.\"\"\"\n        # Flatten the conversation, skipping system messages\n        conversation = self.short_term.get_context()\n        conversation_text = \"\\n\".join(\n            f\"{m['role']}: {m['content']}\" for m in conversation\n            if m[\"role\"] != \"system\"\n        )\n        \n        # Too short to contain anything worth keeping\n        if len(conversation_text) < 50:\n            return\n        \n        # Ask the LLM what is worth remembering (only the first 3000 characters are sent)\n        prompt = f\"\"\"This session is about to end. Extract the information worth remembering long term.\nInclude: user preferences, important decisions, key data, to-do items.\nOne item per line, format: [category] content\n\nConversation:\n{conversation_text[:3000]}\"\"\"\n        \n        response = self.llm.chat(\n            messages=[{\"role\": \"user\", \"content\": prompt}],\n            temperature=0.1\n        ).content\n        \n        # Store each extracted line in long-term memory\n        for line in response.strip().split(\"\\n\"):\n            if line.strip():\n                self.store_to_long_term(\n                    content=line.strip(),\n                    importance=0.7,\n                    category=\"session_summary\"\n                )\n        \n        # Clear short-term memory\n        self.short_term.messages = []",
      "section_ref": "7.4.3",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-16",
      "language": "python",
      "description": "",
      "code": "import uuid\nfrom dataclasses import dataclass, field\nfrom datetime import datetime\n\n@dataclass\nclass Session:\n    \"\"\"A conversation session.\"\"\"\n    session_id: str\n    title: str = \"\"\n    created_at: datetime = field(default_factory=datetime.now)\n    updated_at: datetime = field(default_factory=datetime.now)\n    message_count: int = 0\n    metadata: dict = field(default_factory=dict)\n\nclass SessionManager:\n    \"\"\"Session manager.\"\"\"\n    \n    def __init__(self, memory_system: IntegratedMemorySystem):\n        self.memory = memory_system\n        self._sessions: dict[str, Session] = {}\n    \n    def create_session(self, session_id: str | None = None) -> Session:\n        \"\"\"Create a new session.\"\"\"\n        sid = session_id or str(uuid.uuid4())\n        \n        session = Session(session_id=sid)\n        self._sessions[sid] = session\n        \n        # Start the new session with a fresh short-term memory\n        self.memory.short_term = ShortTermMemory()\n        self.memory.short_term.add_system(\n            \"You are a helpful AI Agent.\"\n        )\n        \n        return session\n    \n    def switch_session(self, session_id: str):\n        \"\"\"Switch to an existing session.\"\"\"\n        if session_id not in self._sessions:\n            raise ValueError(f\"Session not found: {session_id}\")\n        \n        # Save the current session first\n        self.end_session()\n        \n        # Switch\n        session = self._sessions[session_id]\n        session.updated_at = datetime.now()\n        # A real implementation would load the session history from persistent storage here\n    \n    def end_session(self):\n        \"\"\"End the current session.\"\"\"\n        self.memory.end_session()\n    \n    def list_sessions(self) -> list[dict]:\n        \"\"\"List all sessions.\"\"\"\n        return [\n            {\n                \"session_id\": s.session_id,\n                \"title\": s.title,\n                \"message_count\": s.message_count,\n                \"created_at\": s.created_at.isoformat(),\n                \"updated_at\": s.updated_at.isoformat()\n            }\n            for s in self._sessions.values()\n        ]\n    \n    def auto_title(self, session_id: str, first_messages: list[str]):\n        \"\"\"Generate a session title automatically.\"\"\"\n        conversation = \"\\n\".join(first_messages[:5])\n        \n        prompt = f\"\"\"Based on the opening of the conversation below, write a concise session title (at most 10 words):\n{conversation}\n\nTitle:\"\"\"\n        \n        # Same call shape as the other llm.chat calls in this chapter\n        title = self.memory.llm.chat(\n            messages=[{\"role\": \"user\", \"content\": prompt}],\n            temperature=0.3\n        ).content.strip()\n        \n        if session_id in self._sessions:\n            self._sessions[session_id].title = title",
      "section_ref": "7.5.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-17",
      "language": "python",
      "description": "",
      "code": "import json\nimport sqlite3\nimport uuid\nfrom datetime import datetime\n\nclass ConversationStore:\n    \"\"\"Persistent storage for conversation history.\"\"\"\n    \n    def __init__(self, db_path: str = \"conversations.db\"):\n        self.db_path = db_path\n        self._init_db()\n    \n    def _init_db(self):\n        \"\"\"Create the tables and index if they do not exist yet.\"\"\"\n        with sqlite3.connect(self.db_path) as conn:\n            conn.execute(\"\"\"\n                CREATE TABLE IF NOT EXISTS conversations (\n                    id TEXT PRIMARY KEY,\n                    session_id TEXT NOT NULL,\n                    role TEXT NOT NULL,\n                    content TEXT NOT NULL,\n                    tokens INTEGER DEFAULT 0,\n                    metadata TEXT DEFAULT '{}',\n                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n                    FOREIGN KEY (session_id) REFERENCES sessions(id)\n                )\n            \"\"\")\n            conn.execute(\"\"\"\n                CREATE TABLE IF NOT EXISTS sessions (\n                    id TEXT PRIMARY KEY,\n                    title TEXT DEFAULT '',\n                    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n                    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n                    metadata TEXT DEFAULT '{}'\n                )\n            \"\"\")\n            # Index to speed up per-session queries\n            conn.execute(\"\"\"\n                CREATE INDEX IF NOT EXISTS idx_conv_session \n                ON conversations(session_id, created_at)\n            \"\"\")\n    \n    def save_message(\n        self,\n        session_id: str,\n        role: str,\n        content: str,\n        tokens: int = 0,\n        metadata: dict | None = None\n    ):\n        \"\"\"Save one message.\"\"\"\n        msg_id = str(uuid.uuid4())\n        now = datetime.now().isoformat()\n        \n        with sqlite3.connect(self.db_path) as conn:\n            # Make sure the session row exists so the UPDATE below has a target\n            conn.execute(\n                \"INSERT OR IGNORE INTO sessions (id) VALUES (?)\",\n                (session_id,)\n            )\n            conn.execute(\n                \"INSERT INTO conversations VALUES (?,?,?,?,?,?,?)\",\n                (\n                    msg_id, session_id, role, content,\n                    tokens, json.dumps(metadata or {}),\n                    now\n                )\n            )\n            # Refresh the session's updated_at\n            conn.execute(\n                \"UPDATE sessions SET updated_at = ? WHERE id = ?\",\n                (now, session_id)\n            )\n    \n    def load_history(\n        self,\n        session_id: str,\n        limit: int = 100,\n        offset: int = 0\n    ) -> list[dict]:\n        \"\"\"Load the conversation history, oldest message first.\"\"\"\n        with sqlite3.connect(self.db_path) as conn:\n            conn.row_factory = sqlite3.Row\n            rows = conn.execute(\n                \"\"\"SELECT role, content, tokens, metadata, created_at\n                   FROM conversations\n                   WHERE session_id = ?\n                   ORDER BY created_at ASC\n                   LIMIT ? OFFSET ?\"\"\",\n                (session_id, limit, offset)\n            ).fetchall()\n            \n            return [\n                {\n                    \"role\": row[\"role\"],\n                    \"content\": row[\"content\"],\n                    \"tokens\": row[\"tokens\"],\n                    \"metadata\": json.loads(row[\"metadata\"]),\n                    \"created_at\": row[\"created_at\"]\n                }\n                for row in rows\n            ]\n    \n    def search_history(\n        self,\n        session_id: str,\n        keyword: str,\n        limit: int = 20\n    ) -> list[dict]:\n        \"\"\"Search past messages by keyword (substring match), newest first.\"\"\"\n        with sqlite3.connect(self.db_path) as conn:\n            conn.row_factory = sqlite3.Row\n            rows = conn.execute(\n                \"\"\"SELECT role, content, created_at\n                   FROM conversations\n                   WHERE session_id = ? AND content LIKE ?\n                   ORDER BY created_at DESC\n                   LIMIT ?\"\"\",\n                (session_id, f\"%{keyword}%\", limit)\n            ).fetchall()\n            \n            return [dict(row) for row in rows]\n    \n    def get_session_stats(self, session_id: str) -> dict:\n        \"\"\"Return summary statistics for a session.\"\"\"\n        with sqlite3.connect(self.db_path) as conn:\n            total = conn.execute(\n                \"SELECT COUNT(*) FROM conversations WHERE session_id = ?\",\n                (session_id,)\n            ).fetchone()[0]\n            \n            total_tokens = conn.execute(\n                \"SELECT COALESCE(SUM(tokens), 0) FROM conversations WHERE session_id = ?\",\n                (session_id,)\n            ).fetchone()[0]\n            \n            first_msg = conn.execute(\n                \"SELECT MIN(created_at) FROM conversations WHERE session_id = ?\",\n                (session_id,)\n            ).fetchone()[0]\n            \n            return {\n                \"total_messages\": total,\n                \"total_tokens\": total_tokens,\n                \"first_message_at\": first_msg,\n                \"avg_tokens_per_message\": total_tokens / total if total > 0 else 0\n            }",
      "section_ref": "7.5.2",
      "runnable": true,
      "dependencies": [
        "sqlite3"
      ]
    },
    {
      "id": "code-18",
      "language": "python",
      "description": "- Adapting to change: outdated information can be misleading",
      "code": "from datetime import datetime\n\nclass MemoryForgetter:\n    \"\"\"Memory forgetting manager.\"\"\"\n    \n    def __init__(\n        self,\n        decay_rate: float = 0.01,  # fraction of importance lost per day\n        min_importance: float = 0.1,  # memories below this value are forgotten\n        max_age_days: int = 90  # memories older than this are always forgotten\n    ):\n        self.decay_rate = decay_rate\n        self.min_importance = min_importance\n        self.max_age_days = max_age_days\n    \n    def apply_decay(\n        self, memories: list[MemoryItem]\n    ) -> list[tuple[MemoryItem, float]]:\n        \"\"\"Pair each memory with its time-decayed importance.\n        \n        The stored importance is left untouched, so calling this\n        repeatedly does not compound the decay.\n        \"\"\"\n        now = datetime.now()\n        \n        decayed = []\n        for item in memories:\n            age_days = (now - item.timestamp).total_seconds() / 86400\n            \n            # Exponential decay\n            decayed_importance = item.importance * (1 - self.decay_rate) ** age_days\n            \n            # Memories that have been accessed decay more slowly\n            if item.access_count > 0:\n                boost = min(0.3, 0.05 * item.access_count)\n                decayed_importance = min(1.0, decayed_importance + boost)\n            \n            decayed.append((item, decayed_importance))\n        \n        return decayed\n    \n    def get_forgettable(self, memories: list[MemoryItem]) -> list[MemoryItem]:\n        \"\"\"Return the memories that should be forgotten.\"\"\"\n        forgettable = []\n        for item, importance in self.apply_decay(memories):\n            # Rule 1: decayed importance is too low\n            if importance < self.min_importance:\n                forgettable.append(item)\n                continue\n            \n            # Rule 2: too old\n            if item.age_hours / 24 > self.max_age_days:\n                forgettable.append(item)\n                continue\n            \n            # Rule 3: expired\n            if item.is_expired:\n                forgettable.append(item)\n        \n        return forgettable\n    \n    def forget(self, memories: list[MemoryItem]) -> list[MemoryItem]:\n        \"\"\"Drop forgettable memories and return the rest.\"\"\"\n        forgettable_ids = {id(m) for m in self.get_forgettable(memories)}\n        remaining = [m for m in memories if id(m) not in forgettable_ids]\n        return remaining",
      "section_ref": "7.6.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-19",
      "language": "python",
      "description": "When new information conflicts with an existing memory, update the memory instead of keeping both:",
      "code": "from datetime import datetime\n\nclass MemoryUpdater:\n    \"\"\"Memory update manager.\"\"\"\n    \n    def __init__(self, embedder: TextEmbedder, llm, threshold: float = 0.85):\n        self.embedder = embedder\n        self.llm = llm\n        self.threshold = threshold  # similarity threshold\n    \n    def check_conflicts(\n        self,\n        new_content: str,\n        existing_memories: list[MemoryItem]\n    ) -> list[tuple[MemoryItem, float]]:\n        \"\"\"Find existing memories similar enough to the new one to be possible conflicts.\"\"\"\n        new_embedding = self.embedder.embed(new_content)\n        \n        conflicts = []\n        for item in existing_memories:\n            if item.embedding is None:\n                item.embedding = self.embedder.embed(item.content)\n            \n            similarity = LongTermMemory._cosine_similarity(\n                new_embedding, item.embedding\n            )\n            \n            if similarity >= self.threshold:\n                conflicts.append((item, similarity))\n        \n        return conflicts\n    \n    def resolve_conflict(\n        self,\n        new_content: str,\n        old_memory: MemoryItem,\n        similarity: float\n    ) -> MemoryItem:\n        \"\"\"Resolve a conflict between new content and an existing memory.\"\"\"\n        \n        # Very high similarity (> 0.95): probably a duplicate or a newer version\n        if similarity > 0.95:\n            if new_content in old_memory.content:\n                return old_memory  # nothing new, keep the existing memory\n            # Let the LLM decide which version is more accurate\n            prompt = f\"\"\"The two pieces of information below are very similar. Decide which one is more accurate or more up to date:\n\nInformation A (time: {old_memory.timestamp}):\n{old_memory.content}\n\nInformation B (latest):\n{new_content}\n\nAnswer with exactly one of:\n1. If B is newer and more accurate, reply \"UPDATE: <reason>\"\n2. If A is still accurate, reply \"KEEP: <reason>\"\n3. If the two complement each other, reply \"MERGE: <merged content>\"\n\"\"\"\n            response = self.llm.chat(\n                messages=[{\"role\": \"user\", \"content\": prompt}],\n                temperature=0.1\n            ).content.strip()\n            \n            if response.startswith(\"UPDATE\"):\n                # Overwrite the old memory with the new information\n                old_memory.content = new_content\n                old_memory.embedding = None  # content changed, re-embed on next use\n                old_memory.timestamp = datetime.now()\n                old_memory.importance = max(old_memory.importance, 0.7)\n            elif response.startswith(\"MERGE\"):\n                # The text after the first colon is the merged content\n                merged_content = response.partition(\":\")[2].strip()\n                if merged_content:\n                    old_memory.content = merged_content\n                    old_memory.embedding = None  # content changed, re-embed on next use\n                    old_memory.timestamp = datetime.now()\n                    old_memory.metadata[\"merged\"] = True\n            # KEEP, or any unrecognized reply: leave the old memory unchanged\n            return old_memory\n        else:\n            # Similar but not the same: attach the new content as related information\n            old_memory.metadata[\"related_new_info\"] = new_content\n            return old_memory",
      "section_ref": "7.6.2",
      "runnable": true,
      "dependencies": []
    },
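    {
      "id": "code-20",
      "language": "python",
      "description": "",
      "code": "import re\n\nclass ImportanceEvaluator:\n    \"\"\"Automatic memory importance evaluator.\"\"\"\n    \n    def __init__(self, llm):\n        self.llm = llm\n    \n    def evaluate(self, content: str, context: str = \"\") -> float:\n        \"\"\"Rate the importance of a memory on a 0-1 scale.\"\"\"\n        context_line = f\"Context: {context}\" if context else \"\"\n        prompt = f\"\"\"Rate the importance of the following information (0-10).\n\nScoring guide:\n- 10: critical facts, major decisions, core user preferences\n- 7-9: useful information, moderately important findings\n- 4-6: general information that may be useful\n- 1-3: transient information that will soon be outdated\n\nInformation: {content}\n{context_line}\n\nReturn only the number (0-10).\"\"\"\n        \n        try:\n            response = self.llm.chat(\n                messages=[{\"role\": \"user\", \"content\": prompt}],\n                temperature=0.1\n            )\n            # Take the first number in the reply, so answers like \"8/10\" still parse\n            match = re.search(r\"\\d+(?:\\.\\d+)?\", response.content)\n            if match is None:\n                return 0.5  # no number in the reply\n            score = float(match.group())\n            return min(1.0, max(0.0, score / 10))\n        except Exception:\n            return 0.5  # fall back to medium importance if the call fails\n    \n    def batch_evaluate(\n        self,\n        items: list[tuple[str, str]]\n    ) -> list[float]:\n        \"\"\"Evaluate a batch of (content, context) pairs.\"\"\"\n        # Simplified implementation: evaluate one at a time\n        return [\n            self.evaluate(content, context)\n            for content, context in items\n        ]",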
      "section_ref": "7.6.3",
      "runnable": true,
      "dependencies": []
    },
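    {
      "id": "code-21",
      "language": "python",
      "description": "",
      "code": "# ❌ Context that grows without bound\ndef chat(messages: list):\n    # messages keeps growing and eventually overflows the context window\n    response = llm.chat(messages=messages)\n    messages.append({\"role\": \"assistant\", \"content\": response.content})\n    return response\n\n# ✅ Keep the context length under control\ndef chat(messages: list, context_manager):\n    # Send a trimmed view to the model; the full history stays in messages\n    managed = context_manager.manage(messages)\n    response = llm.chat(messages=managed)\n    messages.append({\"role\": \"assistant\", \"content\": response.content})\n    return response",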
      "section_ref": "7.7.1",
      "runnable": true,
      "dependencies": []
    },
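    {
      "id": "code-22",
      "language": "python",
      "description": "",
      "code": "# ❌ Over-compression\nprompt = \"Summarize the following conversation in one sentence\"  # too short: specifics get lost\n\n# ✅ Keep the key information\nprompt = \"\"\"Compress the conversation into a summary, making sure to keep:\n1. All specifics such as numbers, dates, and names\n2. Preferences and requirements the user stated explicitly\n3. Any commitments or to-do items\n4. Technical details and code snippets\n\nSummary length: 200-300 words\"\"\"",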
      "section_ref": "7.7.1",
      "runnable": true,
      "dependencies": []
    },
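    {
      "id": "code-23",
      "language": "python",
      "description": "",
      "code": "# ❌ Acting on a stale memory\n# memory.content: \"User prefers Python 2.7\"  (recorded in 2019)\n# Using it as-is to recommend a tech stack gives the wrong answer.\n\n# ✅ Check freshness first\ndef discount_if_stale(memory):\n    if memory.age_hours > 24 * 365:  # older than one year\n        memory.importance *= 0.3  # down-weight it\n        # or flag it as \"needs confirmation\"\n    return memory",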
      "section_ref": "7.7.1",
      "runnable": true,
      "dependencies": []
    },
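    {
      "id": "code-24",
      "language": "python",
      "description": "",
      "code": "MEMORY_BEST_PRACTICES = \"\"\"\n## Memory Management Best Practices\n\n### ✅ Storage\n- [ ] Separate short-term from long-term memory; do not put everything in long-term storage\n- [ ] Record an importance score at write time so items can be filtered later\n- [ ] Keep memories concise: store conclusions, not raw conversation\n- [ ] Attach metadata (category, tags, source) to make retrieval easier\n\n### ✅ Retrieval\n- [ ] Use hybrid retrieval (semantic + recency + importance)\n- [ ] Set a similarity threshold to filter out low-quality results\n- [ ] Limit retrieval to 3-7 results\n- [ ] Format retrieved memories before injecting them into the context\n\n### ✅ Context management\n- [ ] Monitor context token usage\n- [ ] Set a context budget (reserve room for the output)\n- [ ] Keep the system message and the most recent turns first\n- [ ] Summarize instead of simply truncating\n\n### ✅ Maintenance\n- [ ] Run forgetting regularly (clean up low-importance and expired memories)\n- [ ] Detect and resolve memory conflicts\n- [ ] Consolidate memories (merge similar ones)\n- [ ] Monitor the memory system's storage size and retrieval latency\n\"\"\"",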
      "section_ref": "7.7.2",
      "runnable": true,
      "dependencies": []
    }
  ],
  "tables": [
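    {
      "headers": [
        "Chapter",
        "Key takeaways"
      ],
      "data": [
        [
          "**Chapter 4: Core Agent Concepts**",
          "Understood the agent's architecture model, core components, lifecycle, and evaluation framework"
        ],
        [
          "**Chapter 5: LLMs and Prompt Engineering**",
          "Learned techniques for communicating effectively with LLMs: prompt design, CoT reasoning, template management"
        ],
        [
          "**Chapter 6: Tool Calling**",
          "Learned how to give an agent the ability to act: Function Calling, tool development, error handling"
        ],
        [
          "**Chapter 7: Memory and Context**",
          "Gave the agent persistence: layered memory, vector retrieval, context management"
        ]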
      ]
    }
  ],
  "key_takeaways": [],
  "common_pitfalls": [],
  "related_chapters": [
    "ch04",
    "ch12",
    "ch26"
  ]
}