{
  "metadata": {
    "id": "ch09",
    "title": "Chapter 9: Agent Reasoning and Planning",
    "volume": "vol3",
    "volume_title": "Advanced",
    "word_count": 2672,
    "difficulty": "intermediate",
    "prerequisites": [
      "ch04",
      "ch05"
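    ],
    "key_concepts": [
      "Reasoning: The \"Thinking\" Core of an Agent",
      "A Layered Model of Reasoning",
      "ReAct: Interleaving Reasoning and Acting",
      "The ReAct Core Loop",
      "A Complete ReAct Implementation",
      "Limitations of ReAct",
      "Tree-of-Thought: Tree-Structured Reasoning",
      "From Linear to Tree-Structured",
      "A ToT Implementation Framework",
      "Comparing Search Strategies",
      "Graph-of-Thought: Graph-Structured Reasoning",
      "From Trees to Graphs",
      "A GoT Implementation Framework",
      "Task Decomposition and Subgoal Planning",
      "Task Decomposer"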
    ],
    "learning_objectives": [],
    "estimated_tokens": 1603,
    "source_file": "vol3/ch09_Agent推理与规划.md"
  },
  "overview": "",
  "sections": [
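    {
      "id": "9.1",
      "title": "9.1 Reasoning: The \"Thinking\" Core of an Agent",
      "level": 2,
      "content": "Reasoning is the key capability that distinguishes an agent from a traditional program. A traditional program executes predefined logic, whereas an agent reasons dynamically over its context and chooses the best available course of action.",
      "subsections": [
        {
          "id": "9.1.1",
          "title": "9.1.1 A Layered Model of Reasoning",
          "content": "The layer of reasoning an agent reaches determines the limits of what it can do. An agent with only Layer 1 ability can do no more than simple pattern matching, while an agent with Layer 4 ability can examine its own reasoning process and find and correct its own logical errors. In practice, production-grade agents typically need Layer 2-3 reasoning."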
        }
      ]
    },
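    {
      "id": "9.2",
      "title": "9.2 ReAct: Interleaving Reasoning and Acting",
      "level": 2,
      "content": "ReAct (Reasoning and Acting), published by Yao et al. at ICLR 2023 (preprint 2022), is the foundational paradigm for agent reasoning. Its core idea is to have the model alternate between reasoning (Thought) and acting (Action), with the result of each action serving as input to the next reasoning step.",
      "subsections": [
        {
          "id": "9.2.1",
          "title": "9.2.1 The ReAct Core Loop",
          "content": "Compared with plain Chain-of-Thought (CoT), ReAct's key improvement is **interaction with external tools**. CoT can reason only from the model's internal knowledge, whereas ReAct can actively gather information while reasoning: searching the web, querying a database, or executing code."
        },
        {
          "id": "9.2.2",
          "title": "9.2.2 A Complete ReAct Implementation",
          "content": ""
        },
        {
          "id": "9.2.3",
          "title": "9.2.3 Limitations of ReAct",
          "content": "ReAct is powerful, but it has several inherent limitations:\n\n1. **Linear dependency**: the reasoning chain is linear, so an error at one step affects every later step\n2. **No backtracking**: after reaching a dead end, the agent can only start over\n3. **Single perspective**: only one reasoning path is explored at a time\n\nThese limitations motivated more advanced reasoning strategies: Tree-of-Thought and Graph-of-Thought."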
        }
      ]
    },
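    {
      "id": "9.3",
      "title": "9.3 Tree-of-Thought: Tree-Structured Reasoning",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "9.3.1",
          "title": "9.3.1 From Linear to Tree-Structured",
          "content": "Tree-of-Thought (ToT), proposed by Yao et al. in 2023, extends reasoning from a single chain to a tree, letting the model explore several reasoning paths and select the best one through evaluation.\n\nThe key innovation of ToT is **branching exploration plus evaluation-based pruning**. Each reasoning step can generate several candidate thoughts; after evaluation, only the most promising branches are expanded further."
        },
        {
          "id": "9.3.2",
          "title": "9.3.2 A ToT Implementation Framework",
          "content": ""
        },
        {
          "id": "9.3.3",
          "title": "9.3.3 Comparing Search Strategies",
          "content": "| Strategy | Description | Typical use cases | Strengths | Weaknesses |\n|------|------|---------|------|------|\n| **BFS** | Expands level by level, keeping the top_k nodes per level | Comparing plans, idea generation | Broad coverage at each level | High cost |\n| **DFS** | Explores a single path depth-first | Mathematical reasoning, logical proofs | Low resource use | May miss good branches |\n| **Beam Search** | Search with a fixed beam width | Translation, summarization | Balances efficiency and quality | Beam width is a hyperparameter that must be tuned |\n| **MCTS** | Monte Carlo tree search | Games, complex decisions | Estimates improve as more simulations run | Complex to implement |"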
        }
      ]
    },
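    {
      "id": "9.4",
      "title": "9.4 Graph-of-Thought: Graph-Structured Reasoning",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "9.4.1",
          "title": "9.4.1 From Trees to Graphs",
          "content": "Graph-of-Thought (GoT), proposed by Besta et al. in 2023, extends the reasoning space from a tree to a directed graph. Compared with ToT, GoT adds one key operation: **aggregation (merge)**, which combines thoughts from different branches into a more complete conclusion.\n\nGoT supports four graph operations:\n\n| Operation | Description |\n|------|------|\n| **Branch** | Generate new branches from an existing node |\n| **Merge** | Combine the thoughts of several nodes |\n| **Refine** | Improve the thought in an existing node |\n| **Loop** | Return to an earlier node and reason again |"
        },
        {
          "id": "9.4.2",
          "title": "9.4.2 A GoT Implementation Framework",
          "content": ""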
        }
      ]
    },
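    {
      "id": "9.5",
      "title": "9.5 Task Decomposition and Subgoal Planning",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "9.5.1",
          "title": "9.5.1 Task Decomposer",
          "content": "Complex tasks need to be broken down into manageable subtasks. Key principles:\n\n- **Atomicity**: each subtask should be executable on its own\n- **Explicit dependencies**: state the dependencies between subtasks explicitly\n- **Parallelizability**: subtasks that do not depend on each other should be marked as parallelizable"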
        }
      ]
    },
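    {
      "id": "9.6",
      "title": "9.6 Self-Reflection and Iterative Optimization",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "9.6.1",
          "title": "9.6.1 The Reflexion Pattern",
          "content": "Reflexion (Shinn et al., 2023) has the agent generate a written \"reflection\" after a failure and supplies it as extra context for later attempts. This mimics the human ability to learn from mistakes."
        },
        {
          "id": "9.6.2",
          "title": "9.6.2 Iterative Optimizer",
          "content": ""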
        }
      ]
    },
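    {
      "id": "9.7",
      "title": "9.7 Strategies for Handling Planning Failures",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "9.7.1",
          "title": "9.7.1 Classifying Failure Modes",
          "content": "| Failure mode | Description | Handling strategy |\n|---------|------|---------|\n| **Tool failure** | A tool call returns an error | Retry → degrade → alternative tool |\n| **Reasoning deadlock** | Reasoning loops without making progress | Backtrack to the previous branch point |\n| **Insufficient information** | Key information is missing | Search proactively or ask the user |\n| **Timeout** | Reasoning takes too long | Return the best known answer |\n| **Conflicting goals** | Subgoals conflict with each other | Replan |"
        },
        {
          "id": "9.7.2",
          "title": "9.7.2 Resilient Planner",
          "content": ""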
        }
      ]
    },
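    {
      "id": "9.8",
      "title": "9.8 Dynamic Planning and Online Learning",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "9.8.1",
          "title": "9.8.1 Adaptive Selection of Reasoning Strategies",
          "content": "Different types of tasks suit different reasoning strategies. A mature agent should be able to identify the task type automatically and select the most suitable strategy:"
        },
        {
          "id": "9.8.2",
          "title": "9.8.2 Learning from Execution",
          "content": ""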
        }
      ]
    },
    {
      "id": "9.9",
      "title": "9.9 Best Practices and Common Pitfalls",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "9.9.1",
          "title": "9.9.1 Decision Tree for Choosing a Reasoning Strategy",
          "content": ""
        },
        {
          "id": "9.9.2",
          "title": "9.9.2 Common Pitfalls",
          "content": ""
        },
        {
          "id": "9.9.3",
          "title": "9.9.3 Production Reasoning Checklist",
          "content": ""
        }
      ]
    },
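    {
      "id": "9.10",
      "title": "9.10 Summary",
      "level": 2,
      "content": "This chapter examined the core techniques of agent reasoning and planning:\n\n- **ReAct** is the foundational reasoning paradigm; it interleaves thinking with acting so the agent can gather information dynamically\n- **Tree-of-Thought** extends linear reasoning into tree-structured exploration, supporting comparison across paths and pruning\n- **Graph-of-Thought** adds merge and refine operations, allowing thoughts to be combined more flexibly\n- **Task decomposition** is a prerequisite for handling complex problems; atomic subtasks plus explicit dependencies are the key\n- **Self-reflection** lets the agent learn from failure; the Reflexion pattern can improve performance on tasks that allow repeated attempts\n- **Resilient planning** ensures the system degrades gracefully when a plan fails\n- **Adaptive reasoning** selects a strategy dynamically from task characteristics, balancing quality against cost\n\n**Key insight**: more complex reasoning is not automatically better; it should match the complexity of the problem. Plain CoT is sufficient for most scenarios; ToT/GoT suit complex decisions that genuinely require exploration; and ReAct is the first choice when external information is needed.\n\n**Next chapter:** Chapter 10 builds a complete agent evaluation framework, from metric definitions to benchmarks and from automated to human evaluation, to help you measure and improve agent performance systematically.\n\n---\n\n*Chapter 9 · Agent Reasoning and Planning* | *Agent Programming: From Principles to Production-Grade Practice · Volume 3 · Advanced*",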
      "subsections": []
    }
  ],
  "code_blocks": [
    {
      "id": "code-1",
      "language": "text",
      "description": "The four layers of reasoning, from pattern matching (Layer 1) up to meta-reasoning (Layer 4).",
      "code": "┌────────────────────────────────────────────────────┐\n│  Layer 4: Meta-Reasoning                           │\n│  \"Is my reasoning process correct?\"                │\n├────────────────────────────────────────────────────┤\n│  Layer 3: Planning                                 │\n│  \"In what order should I act to reach the goal?\"   │\n├────────────────────────────────────────────────────┤\n│  Layer 2: Causal Reasoning                         │\n│  \"If I do X, will it lead to Y?\"                   │\n├────────────────────────────────────────────────────┤\n│  Layer 1: Pattern Matching                         │\n│  \"This problem is similar to ones I have seen...\"  │\n└────────────────────────────────────────────────────┘",
      "section_ref": "9.1.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-2",
      "language": "text",
      "description": "The ReAct loop: Question → Thought → Action → Observation → Thought → Answer.",
      "code": "┌──────────────────────────────────────────────────┐\n│                    ReAct Loop                    │\n│                                                  │\n│   ┌─────────────┐                                │\n│   │  Question   │                                │\n│   └──────┬──────┘                                │\n│          ↓                                       │\n│   ┌─────────────┐                                │\n│   │   Thought   │ ← \"I need to...\"               │\n│   └──────┬──────┘                                │\n│          ↓                                       │\n│   ┌─────────────┐                                │\n│   │   Action    │ ← tool call / search / compute │\n│   └──────┬──────┘                                │\n│          ↓                                       │\n│   ┌─────────────┐                                │\n│   │ Observation │ ← tool result                  │\n│   └──────┬──────┘                                │\n│          ↓                                       │\n│   ┌─────────────┐                                │\n│   │   Thought   │ ← \"Based on the result...\"     │\n│   └──────┬──────┘                                │\n│          ↓                                       │\n│   ┌─────────────┐                                │\n│   │   Answer    │ ← final answer                 │\n│   └─────────────┘                                │\n└──────────────────────────────────────────────────┘",
      "section_ref": "9.2.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-3",
      "language": "python",
      "description": "A ReAct agent with a tool registry, a Thought/Action/Observation loop, and response parsing. The LLM client is assumed to expose an async chat(messages) method that returns a string.",
      "code": "import json\nimport re\nfrom dataclasses import dataclass\nfrom enum import Enum\nfrom typing import Callable\n\n\nclass StepType(Enum):\n    THOUGHT = \"thought\"\n    ACTION = \"action\"\n    OBSERVATION = \"observation\"\n    ANSWER = \"answer\"\n\n\n@dataclass\nclass ReActStep:\n    \"\"\"A single recorded ReAct step.\"\"\"\n    step_type: StepType\n    content: str\n    tool_name: str | None = None\n    tool_input: dict | None = None\n    observation: str | None = None\n\n\nclass ToolRegistry:\n    \"\"\"Registry of the tools the agent may call.\"\"\"\n\n    def __init__(self):\n        self._tools: dict[str, dict] = {}\n\n    def register(self, name: str, description: str,\n                 function: Callable, parameters: dict | None = None):\n        self._tools[name] = {\n            \"name\": name,\n            \"description\": description,\n            \"function\": function,\n            \"parameters\": parameters or {},\n        }\n\n    async def execute(self, name: str, **kwargs) -> str:\n        tool = self._tools.get(name)\n        if not tool:\n            return f\"Error: tool '{name}' does not exist\"\n        try:\n            result = await tool[\"function\"](**kwargs)\n            return str(result)\n        except Exception as e:\n            # Report the error as text so the model can react to it.\n            return f\"Tool execution error: {e}\"\n\n    def get_tools_description(self) -> str:\n        lines = []\n        for name, tool in self._tools.items():\n            lines.append(f\"- {name}: {tool['description']}\")\n        return \"\\n\".join(lines)\n\n\nclass ReActAgent:\n    \"\"\"A ReAct agent.\"\"\"\n\n    SYSTEM_PROMPT = \"\"\"You are a helpful AI assistant. Answer questions using this format:\n\nThought: analyze the current situation and decide the next step\nAction: tool_name[arguments]\nObservation: (provided by the system)\n... (Thought/Action/Observation may repeat)\nFinal Answer: the final answer to the question\n\nAvailable tools:\n{tools}\n\nRules:\n1. Use only one tool at a time\n2. Each Thought must state why the action was chosen\n3. If a tool fails, think of an alternative\n4. Once you know the answer, output \"Final Answer: ...\"\n\"\"\"\n\n    def __init__(self, llm_client, tools: ToolRegistry,\n                 max_iterations: int = 10):\n        self.llm = llm_client\n        self.tools = tools\n        self.max_iterations = max_iterations\n        self.history: list[ReActStep] = []\n\n    async def run(self, question: str) -> str:\n        system = self.SYSTEM_PROMPT.format(\n            tools=self.tools.get_tools_description()\n        )\n        messages = [\n            {\"role\": \"system\", \"content\": system},\n            {\"role\": \"user\", \"content\": question},\n        ]\n\n        for _ in range(self.max_iterations):\n            response = await self.llm.chat(messages)\n            # Keep the model's full turn (thoughts included) in the transcript.\n            messages.append({\"role\": \"assistant\", \"content\": response})\n            acted = False\n\n            for step in self._parse_response(response):\n                self.history.append(step)\n\n                if step.step_type == StepType.ANSWER:\n                    return step.content\n\n                if step.step_type == StepType.ACTION:\n                    obs = await self.tools.execute(\n                        step.tool_name, **(step.tool_input or {})\n                    )\n                    self.history.append(ReActStep(\n                        step_type=StepType.OBSERVATION,\n                        content=obs,\n                    ))\n                    messages.append({\n                        \"role\": \"user\",\n                        \"content\": f\"Observation: {obs}\"\n                    })\n                    acted = True\n                    # One tool per turn: ignore anything the model wrote\n                    # after the action, since it had not seen the result.\n                    break\n\n            if not acted:\n                # Neither an action nor an answer: ask the model to continue.\n                messages.append({\n                    \"role\": \"user\",\n                    \"content\": \"Continue, following the required format.\"\n                })\n\n        return \"Sorry, no answer was found within the maximum number of iterations.\"\n\n    def _parse_response(self, response: str) -> list[ReActStep]:\n        steps = []\n        for line in response.strip().split(\"\\n\"):\n            line = line.strip()\n            if not line:\n                continue\n            if line.startswith(\"Thought:\"):\n                steps.append(ReActStep(\n                    step_type=StepType.THOUGHT,\n                    content=line.removeprefix(\"Thought:\").strip()))\n            elif line.startswith(\"Action:\"):\n                text = line.removeprefix(\"Action:\").strip()\n                name, inp = self._parse_action(text)\n                steps.append(ReActStep(\n                    step_type=StepType.ACTION,\n                    content=text,\n                    tool_name=name, tool_input=inp))\n            elif line.startswith(\"Final Answer:\"):\n                steps.append(ReActStep(\n                    step_type=StepType.ANSWER,\n                    content=line.removeprefix(\"Final Answer:\").strip()))\n        return steps\n\n    def _parse_action(self, text: str) -> tuple[str, dict]:\n        match = re.match(r'(\\w+)\\[(.+)\\]', text)\n        if not match:\n            return text, {}\n        name, raw = match.group(1), match.group(2)\n        try:\n            inp = json.loads(raw)\n        except json.JSONDecodeError:\n            inp = None\n        # Anything that is not a JSON object is passed as a plain query string.\n        if not isinstance(inp, dict):\n            inp = {\"query\": raw}\n        return name, inp\n\n\n# ---- Usage example ----\nasync def demo_react():\n    tools = ToolRegistry()\n\n    async def calculator(query: str) -> str:\n        # Demo only: eval() can run arbitrary code, so never pass it\n        # untrusted input in production.\n        try:\n            return str(eval(query))\n        except Exception as e:\n            return f\"Calculation error: {e}\"\n\n    async def search(query: str) -> str:\n        return f\"Results for '{query}': Python 3.12 was released in October 2023.\"\n\n    tools.register(\"calculator\", \"Evaluate a math expression\", calculator)\n    tools.register(\"search\", \"Search for information\", search)\n\n    # agent = ReActAgent(llm_client, tools)\n    # result = await agent.run(\"How many days ago was Python 3.12 released?\")",
      "section_ref": "9.2.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-4",
      "language": "text",
      "description": "Linear ReAct reasoning compared with the branching paths of Tree-of-Thought.",
      "code": "ReAct (linear):           Tree-of-Thought (tree):\n\nQ → T1 → A1 → Answer      Q ─┬─ T1a ─ T2a ─ T3a → Answer ✓\n                             ├─ T1b ─ T2b → (abandoned)\n                             └─ T1c ─ T2c ─ T3c → Answer ✓",
      "section_ref": "9.3.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-5",
      "language": "python",
      "description": "A Tree-of-Thought engine that expands thoughts level by level, scores them, and keeps the top_k per level. LLM calls are stubbed.",
      "code": "from dataclasses import dataclass, field\nfrom typing import Any\n\n\n@dataclass\nclass ThoughtNode:\n    \"\"\"A node in the thought tree.\"\"\"\n    id: str\n    content: str\n    parent_id: str | None = None\n    children_ids: list[str] = field(default_factory=list)\n    score: float = 0.0\n    depth: int = 0\n    is_terminal: bool = False\n\n\nclass TreeOfThought:\n    \"\"\"Tree-structured reasoning engine.\"\"\"\n\n    def __init__(self, llm_client: Any,\n                 num_branches: int = 3,\n                 max_depth: int = 5,\n                 top_k: int = 2):\n        self.llm = llm_client\n        self.num_branches = num_branches\n        self.max_depth = max_depth\n        self.top_k = top_k\n        self.nodes: dict[str, ThoughtNode] = {}\n        self._counter = 0\n\n    def _gen_id(self) -> str:\n        self._counter += 1\n        return f\"n_{self._counter}\"\n\n    async def solve(self, problem: str) -> str:\n        \"\"\"Solve a problem with tree-structured reasoning.\"\"\"\n\n        # 1. Generate the initial thought branches\n        thoughts = await self._generate_thoughts(problem)\n        for t in thoughts:\n            nid = self._gen_id()\n            self.nodes[nid] = ThoughtNode(id=nid, content=t, depth=0)\n\n        frontier = list(self.nodes.keys())\n\n        # 2. Expand level by level (BFS)\n        for depth in range(1, self.max_depth + 1):\n            next_frontier = []\n\n            for nid in frontier:\n                node = self.nodes[nid]\n                if node.is_terminal:\n                    continue\n\n                # Branch: generate child thoughts\n                children = await self._generate_thoughts(\n                    f\"Problem: {problem}\\nSo far: {node.content}\"\n                )\n                for c in children:\n                    cid = self._gen_id()\n                    child = ThoughtNode(\n                        id=cid, content=c,\n                        parent_id=nid, depth=depth,\n                    )\n                    self.nodes[cid] = child\n                    node.children_ids.append(cid)\n                    next_frontier.append(cid)\n\n            # Stop early once there is nothing left to expand\n            if not next_frontier:\n                break\n\n            # 3. Evaluate and prune\n            for cid in next_frontier:\n                node = self.nodes[cid]\n                node.score = await self._evaluate(node.content, problem)\n                node.is_terminal = self._check_terminal(node.content)\n\n            # Keep the top_k nodes of this level\n            scored = sorted(\n                next_frontier,\n                key=lambda x: self.nodes[x].score,\n                reverse=True\n            )\n            frontier = scored[:self.top_k]\n\n        # 4. Return the best path\n        return self._best_path_summary()\n\n    async def _generate_thoughts(self, context: str) -> list[str]:\n        \"\"\"Generate several candidate thoughts (stubbed).\"\"\"\n        prompt = f\"\"\"Based on the context below, generate {self.num_branches} different reasoning steps.\nSeparate them with newlines:\n{context}\"\"\"\n        # response = await self.llm.chat(prompt)\n        # return [l.strip() for l in response.split(\"\\n\") if l.strip()]\n        return [f\"Thought step {i+1}\" for i in range(self.num_branches)]\n\n    async def _evaluate(self, thought: str, problem: str) -> float:\n        \"\"\"Score the quality of a thought from 0 to 10 (stubbed).\"\"\"\n        prompt = f\"\"\"Rate the quality of the following reasoning step (0-10):\nProblem: {problem}\nStep: {thought}\nOutput a single number only.\"\"\"\n        # return float(await self.llm.chat(prompt))\n        return 7.0\n\n    def _check_terminal(self, thought: str) -> bool:\n        # Simple keyword heuristic for spotting a concluding thought\n        markers = [\"the answer is\", \"therefore\", \"final\", \"conclusion\"]\n        text = thought.lower()\n        return any(m in text for m in markers)\n\n    def _best_path_summary(self) -> str:\n        terminals = [n for n in self.nodes.values() if n.is_terminal]\n        if not terminals:\n            terminals = [max(self.nodes.values(), key=lambda n: n.score)]\n        best = max(terminals, key=lambda n: n.score)\n        # Walk back from the best node to the root\n        path = []\n        cur = best\n        while cur:\n            path.append(cur.content)\n            cur = self.nodes.get(cur.parent_id)\n        path.reverse()\n        return \"\\n→ \".join(path)",
      "section_ref": "9.3.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-6",
      "language": "text",
      "description": "A tree of thoughts compared with a graph of thoughts, where branches can merge.",
      "code": "Tree-of-Thought:          Graph-of-Thought:\n\n    A                          A ──┐\n    ├─ B                       │   │\n    │  └─ C                B       D\n    └─ D                       │  ╲ │ ╱\n       └─ E                C ── E ── F (merge)",
      "section_ref": "9.4.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-7",
      "language": "python",
      "description": "A Graph-of-Thought engine with branch, merge, and refine operations. LLM calls are stubbed and operations are chosen at random for illustration.",
      "code": "import random\nfrom collections import defaultdict\nfrom typing import Any\n\n\nclass GraphOfThought:\n    \"\"\"Graph-structured reasoning engine.\"\"\"\n\n    def __init__(self, llm_client: Any):\n        self.llm = llm_client\n        self.nodes: dict[str, dict] = {}\n        self.edges: dict[str, list[str]] = defaultdict(list)\n        # Edge types, stored separately (for visualization/debugging)\n        self.edge_types: dict[tuple[str, str], str] = {}\n        self._counter = 0\n\n    def _gen_id(self) -> str:\n        self._counter += 1\n        return f\"g_{self._counter}\"\n\n    def add_node(self, content: str, score: float = 0.0) -> str:\n        nid = self._gen_id()\n        self.nodes[nid] = {\"content\": content, \"score\": score}\n        return nid\n\n    def add_edge(self, from_id: str, to_id: str, etype: str):\n        self.edges[from_id].append(to_id)\n        self.edge_types[(from_id, to_id)] = etype\n\n    async def branch(self, parent_id: str, num: int = 2) -> list[str]:\n        \"\"\"Branch new thoughts from a parent node.\"\"\"\n        parent = self.nodes[parent_id]\n        prompt = f\"\"\"Based on: {parent['content']}\nGenerate {num} different follow-up reasoning steps.\"\"\"\n        # response = await self.llm.chat(prompt)\n        child_ids = []\n        for i in range(num):\n            cid = self.add_node(f\"Branch {i+1}\")\n            self.add_edge(parent_id, cid, \"branch\")\n            child_ids.append(cid)\n        return child_ids\n\n    async def merge(self, node_ids: list[str]) -> str:\n        \"\"\"Merge the thoughts of several nodes.\"\"\"\n        contents = \"\\n\".join(\n            f\"- {self.nodes[nid]['content']}\" for nid in node_ids\n        )\n        prompt = f\"\"\"Merge the following thoughts into one combined conclusion:\n{contents}\"\"\"\n        # response = await self.llm.chat(prompt)\n        merged_id = self.add_node(\"Merged result\")\n        for nid in node_ids:\n            self.add_edge(nid, merged_id, \"merge\")\n        return merged_id\n\n    async def refine(self, node_id: str) -> str:\n        \"\"\"Refine and improve a node.\"\"\"\n        node = self.nodes[node_id]\n        prompt = f\"\"\"Improve the following reasoning (keep the core idea):\n{node['content']}\"\"\"\n        # response = await self.llm.chat(prompt)\n        refined_id = self.add_node(\"Refined result\")\n        self.add_edge(node_id, refined_id, \"refine\")\n        return refined_id\n\n    async def solve(self, problem: str, budget: int = 15) -> str:\n        \"\"\"Solve a problem with graph operations.\"\"\"\n        # Initial nodes\n        init_ids = [self.add_node(f\"Initial thought {i}\") for i in range(3)]\n\n        ops = 0\n        frontier = init_ids[:]\n\n        while ops < budget:\n            # Random choice stands in for a real controller; branch is weighted double\n            op = random.choice([\"branch\", \"branch\", \"merge\", \"refine\"])\n\n            match op:\n                case \"branch\":\n                    parent = random.choice(frontier)\n                    new_ids = await self.branch(parent)\n                    frontier.extend(new_ids)\n\n                case \"merge\":\n                    if len(frontier) >= 2:\n                        pair = random.sample(frontier, 2)\n                        mid = await self.merge(pair)\n                        frontier.append(mid)\n\n                case \"refine\":\n                    if frontier:\n                        target = random.choice(frontier)\n                        rid = await self.refine(target)\n                        frontier.append(rid)\n\n            ops += 1\n\n        # Return the highest-scoring node. The stub nodes all score 0.0,\n        # so this returns the first node until real scoring is added.\n        best = max(self.nodes.values(), key=lambda n: n.get(\"score\", 0))\n        return best[\"content\"]",
      "section_ref": "9.4.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-8",
      "language": "python",
      "description": "A task decomposer and a planner that groups subtasks into layers that can run in parallel.",
      "code": "import json\nfrom dataclasses import dataclass, field\nfrom typing import Any\n\n\n@dataclass\nclass SubTask:\n    id: str\n    description: str\n    dependencies: list[str] = field(default_factory=list)\n    estimated_effort: float = 1.0\n    status: str = \"pending\"\n    result: Any = None\n\n\nclass TaskDecomposer:\n    \"\"\"LLM-driven task decomposer.\"\"\"\n\n    PROMPT = \"\"\"Break the following task down into subtasks.\nRules:\n1. Each subtask is atomic (cannot be split further)\n2. State the dependencies\n3. Estimate complexity (1-5)\n4. At most 8 subtasks\n\nTask: {task}\n\nOutput JSON:\n{{\"subtasks\": [{{\"id\":\"s1\",\"description\":\"...\",\"dependencies\":[],\"complexity\":1}}]}}\"\"\"\n\n    async def decompose(self, task: str, llm_client=None) -> list[SubTask]:\n        # response = await llm_client.chat(self.PROMPT.format(task=task))\n        # parsed = json.loads(response)\n        # return [SubTask(id=s[\"id\"], description=s[\"description\"],\n        #                 dependencies=s[\"dependencies\"],\n        #                 estimated_effort=s[\"complexity\"])\n        #         for s in parsed[\"subtasks\"]]\n\n        # Simulated result\n        return [\n            SubTask(id=\"s1\", description=\"Analyze the requirements document\"),\n            SubTask(id=\"s2\", description=\"Design the technical approach\",\n                    dependencies=[\"s1\"]),\n            SubTask(id=\"s3\", description=\"Implement core functionality\",\n                    dependencies=[\"s2\"]),\n            SubTask(id=\"s4\", description=\"Write tests\",\n                    dependencies=[\"s3\"]),\n            SubTask(id=\"s5\", description=\"Security review\",\n                    dependencies=[\"s3\"]),\n            SubTask(id=\"s6\", description=\"Deploy to production\",\n                    dependencies=[\"s4\", \"s5\"]),\n        ]\n\n\nclass TaskPlanner:\n    \"\"\"Task planner: builds a layered execution plan.\"\"\"\n\n    def plan(self, subtasks: list[SubTask]) -> list[list[str]]:\n        \"\"\"Build a layered plan; tasks within a layer can run in parallel.\"\"\"\n        task_map = {st.id: st for st in subtasks}\n        layers = []\n        completed = set()\n        remaining = set(st.id for st in subtasks)\n\n        while remaining:\n            # A task is ready once all of its dependencies are completed\n            ready = [\n                tid for tid in remaining\n                if all(dep in completed\n                       for dep in task_map[tid].dependencies)\n            ]\n            if not ready:\n                raise ValueError(\"Circular or unresolved dependency detected\")\n            layers.append(sorted(ready))\n            completed.update(ready)\n            remaining -= set(ready)\n\n        return layers\n\n\n# Demo\ndef demo_planning():\n    planner = TaskPlanner()\n\n    subtasks = [\n        SubTask(id=\"s1\", description=\"Analyze requirements\"),\n        SubTask(id=\"s2\", description=\"Design\", dependencies=[\"s1\"]),\n        SubTask(id=\"s3\", description=\"Implement\", dependencies=[\"s2\"]),\n        SubTask(id=\"s4\", description=\"Test\", dependencies=[\"s3\"]),\n        SubTask(id=\"s5\", description=\"Review\", dependencies=[\"s3\"]),\n    ]\n\n    plan = planner.plan(subtasks)\n    for i, layer in enumerate(plan):\n        label = \" (parallel)\" if len(layer) > 1 else \"\"\n        print(f\"Layer {i+1}{label}: {', '.join(layer)}\")\n\n    # Output:\n    # Layer 1: s1\n    # Layer 2: s2\n    # Layer 3: s3\n    # Layer 4 (parallel): s4, s5",
      "section_ref": "9.5.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-9",
      "language": "python",
      "description": "A Reflexion-style agent that records a lesson after each failed attempt and feeds recent lessons into the next attempt.",
      "code": "from dataclasses import dataclass\n\n\n@dataclass\nclass ReflectionEntry:\n    attempt: int\n    action: str\n    result: str\n    success: bool\n    reflection: str\n    lesson: str\n\n\nclass ReflexionAgent:\n    \"\"\"An agent that reflects on its failures.\"\"\"\n\n    MAX_REFLECTIONS = 3\n\n    def __init__(self, llm_client, max_attempts: int = 5):\n        self.llm = llm_client\n        self.max_attempts = max_attempts\n        self.reflections: list[ReflectionEntry] = []\n\n    async def run(self, task: str, evaluator) -> dict:\n        for attempt in range(1, self.max_attempts + 1):\n            context = self._build_context(task, attempt)\n            action = await self._get_action(context)\n            result = await self._execute(action)\n            evaluation = await evaluator(result)\n\n            if evaluation[\"success\"]:\n                return {\"success\": True, \"result\": result,\n                        \"attempts\": attempt}\n\n            # Reflect on why the attempt failed\n            reflection = await self._reflect(action, result)\n            self.reflections.append(ReflectionEntry(\n                attempt=attempt, action=action, result=result,\n                success=False,\n                reflection=reflection[\"text\"],\n                lesson=reflection[\"lesson\"],\n            ))\n\n        return {\"success\": False, \"reflections\": self.reflections}\n\n    def _build_context(self, task: str, attempt: int) -> str:\n        parts = [f\"Task: {task}\", f\"Current: attempt {attempt}\"]\n        if self.reflections:\n            parts.append(\"Lessons from earlier attempts:\")\n            # Only the most recent lessons are included to bound the context\n            for r in self.reflections[-self.MAX_REFLECTIONS:]:\n                parts.append(f\"  - {r.lesson}\")\n        return \"\\n\".join(parts)\n\n    async def _get_action(self, context: str) -> str:\n        \"\"\"Placeholder: ask the model for the next action.\"\"\"\n        # return await self.llm.chat(context)\n        return \"placeholder action\"\n\n    async def _execute(self, action: str) -> str:\n        \"\"\"Placeholder: run the action and return its result.\"\"\"\n        return f\"result of: {action}\"\n\n    async def _reflect(self, action: str, result: str) -> dict:\n        \"\"\"Generate a reflection (stubbed).\"\"\"\n        prompt = f\"\"\"The last action failed.\nAction: {action}\nResult: {result}\nReflect: 1) Why did it fail? 2) What should be done next time?\nOutput JSON: {{\"text\":\"...\",\"lesson\":\"...\"}}\"\"\"\n        # response = await self.llm.chat(prompt)\n        return {\"text\": \"Analyzing...\", \"lesson\": \"Next time, watch out for...\"}",
      "section_ref": "9.6.1",
      "runnable": true,
      "dependencies": []
    },
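    {
      "id": "code-10",
      "language": "python",
      "description": "",
      "code": "class IterativeOptimizer:\n    \"\"\"Improves a solution over several iterations.\"\"\"\n\n    async def optimize(self, initial: str, task: str,\n                       evaluator, max_iter: int = 5,\n                       threshold: float = 0.1) -> dict:\n        current = initial\n        current_score = await evaluator(current)\n        history = [{\"iter\": 0, \"score\": current_score}]\n\n        for i in range(1, max_iter + 1):\n            analysis = await self._analyze(current, task, current_score)\n            improved = await self._improve(current, analysis, task)\n            new_score = await evaluator(improved)\n            delta = new_score - current_score\n            history.append({\"iter\": i, \"score\": new_score,\n                            \"delta\": delta})\n\n            if delta > threshold:\n                current = improved\n                current_score = new_score\n            else:\n                break  # Improvement too small: stop and keep the current solution\n\n        return {\n            \"solution\": current,\n            \"score\": current_score,\n            \"iterations\": len(history) - 1,\n            \"history\": history,\n        }\n\n    async def _analyze(self, solution: str, task: str, score: float) -> str:\n        \"\"\"Placeholder: identify weaknesses (a real version would call an LLM).\"\"\"\n        return f\"Current score {score}: look for weaknesses\"\n\n    async def _improve(self, solution: str, analysis: str, task: str) -> str:\n        \"\"\"Placeholder: produce a revised solution from the analysis.\"\"\"\n        return solution",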
      "section_ref": "9.6.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-11",
      "language": "python",
      "description": "A resilient planner that maps each failure type to a recovery handler.",
      "code": "from enum import Enum\nfrom typing import Callable\n\n\nclass FailureType(Enum):\n    TOOL_ERROR = \"tool_error\"\n    DEADLOCK = \"deadlock\"\n    INFO_GAP = \"info_gap\"\n    TIMEOUT = \"timeout\"\n    CONFLICT = \"conflict\"\n\n\nclass ResilientPlanner:\n    \"\"\"Resilient planner: recovers from several kinds of failure.\"\"\"\n\n    def __init__(self, llm_client):\n        self.llm = llm_client\n        self.failure_log: list[dict] = []\n        self.handlers: dict[FailureType, Callable] = {\n            FailureType.TOOL_ERROR: self._tool_fallback,\n            FailureType.DEADLOCK: self._backtrack,\n            FailureType.INFO_GAP: self._ask_user,\n            FailureType.TIMEOUT: self._return_best,\n            FailureType.CONFLICT: self._replan,\n        }\n\n    async def plan_with_recovery(self, task: str,\n                                  max_retries: int = 3) -> dict:\n        for attempt in range(max_retries + 1):\n            try:\n                plan = await self._create_plan(task)\n                result = await self._execute_plan(plan)\n\n                if result[\"success\"]:\n                    return result\n\n                failure = self._classify_failure(result)\n                self.failure_log.append(\n                    {\"attempt\": attempt, \"failure\": failure.value})\n                handler = self.handlers.get(failure)\n                if handler:\n                    recovery = await handler(task, result)\n                    if recovery.get(\"recovered\"):\n                        return recovery[\"result\"]\n\n            except Exception as e:\n                self.failure_log.append(\n                    {\"attempt\": attempt, \"error\": str(e)})\n\n        return {\"success\": False,\n                \"message\": \"Maximum number of recovery attempts exceeded\"}\n\n    # ---- Placeholder hooks (assumptions): replace with real logic ----\n    async def _create_plan(self, task: str) -> dict:\n        return {\"task\": task, \"steps\": []}\n\n    async def _create_simple_plan(self, task: str) -> dict:\n        return {\"task\": task, \"steps\": [], \"simplified\": True}\n\n    async def _execute_plan(self, plan: dict) -> dict:\n        return {\"success\": True, \"plan\": plan}\n\n    def _classify_failure(self, result: dict) -> FailureType:\n        # Assumes the executor reports a \"failure_type\" that matches\n        # a FailureType value; defaults to a tool error.\n        return FailureType(result.get(\"failure_type\", \"tool_error\"))\n\n    async def _tool_fallback(self, task, result):\n        \"\"\"Tool failure → try an alternative approach.\"\"\"\n        # Degrade to a simplified plan\n        simplified = await self._create_simple_plan(task)\n        return {\"recovered\": True,\n                \"result\": await self._execute_plan(simplified)}\n\n    async def _backtrack(self, task, result):\n        \"\"\"Deadlock → backtrack.\"\"\"\n        return {\"recovered\": False, \"action\": \"backtrack\"}\n\n    async def _ask_user(self, task, result):\n        \"\"\"Insufficient information → ask the user.\"\"\"\n        return {\"recovered\": False,\n                \"action\": \"need_user_input\",\n                \"questions\": [\"Please provide the following information...\"]}\n\n    async def _return_best(self, task, result):\n        \"\"\"Timeout → return the best result so far.\"\"\"\n        return {\"recovered\": True,\n                \"result\": {\"answer\": result.get(\"best_known\", \"\"),\n                           \"note\": \"Timed out; returning the best known answer\"}}\n\n    async def _replan(self, task, result):\n        \"\"\"Conflicting goals → replan.\"\"\"\n        return {\"recovered\": False, \"action\": \"replan\"}",
      "section_ref": "9.7.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-12",
      "language": "python",
      "description": "Different kinds of tasks suit different reasoning strategies. A mature Agent should be able to identify the task type and choose a suitable strategy automatically:",
      "code": "class AdaptiveReasoner:\n    \"\"\"Selects a reasoning strategy based on task characteristics.\n\n    _reason is expected to be supplied by the concrete implementation.\n    \"\"\"\n\n    PROFILES = {\n        \"mathematical\": {\n            \"strategy\": \"cot\",      # Chain-of-Thought\n            \"fallback\": \"tot\",\n            \"temperature\": 0.0,\n            \"max_steps\": 5,\n        },\n        \"creative\": {\n            \"strategy\": \"tot\",\n            \"fallback\": \"got\",\n            \"temperature\": 0.7,\n            \"max_steps\": 8,\n        },\n        \"analytical\": {\n            \"strategy\": \"react\",\n            \"fallback\": \"cot\",\n            \"temperature\": 0.1,\n            \"max_steps\": 10,\n        },\n        \"factual\": {\n            \"strategy\": \"react\",\n            \"fallback\": \"react\",\n            \"temperature\": 0.0,\n            \"max_steps\": 6,\n        },\n    }\n    DEFAULT_TYPE = \"analytical\"\n\n    def __init__(self, llm_client=None):\n        self.llm = llm_client\n\n    async def classify_and_reason(self, task: str) -> dict:\n        # 1. Classify; an unrecognized label falls back to the default type\n        task_type = await self._classify(task)\n        if task_type not in self.PROFILES:\n            task_type = self.DEFAULT_TYPE\n        profile = self.PROFILES[task_type]\n\n        # 2. Reason with the primary strategy\n        result = await self._reason(task, profile[\"strategy\"], profile)\n\n        # 3. If the primary strategy fails, try the fallback\n        if not result[\"success\"]:\n            result = await self._reason(\n                task, profile[\"fallback\"], profile)\n\n        return {\"task_type\": task_type, **result}\n\n    async def _classify(self, task: str) -> str:\n        prompt = f\"\"\"Classify the task as one of: mathematical, creative, analytical, factual\nTask: {task}\nOutput only the category name.\"\"\"\n        # With a real client:\n        # return (await self.llm.chat(prompt)).strip().lower()\n        return \"analytical\"  # placeholder so the sketch runs without an LLM",
      "section_ref": "9.8.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-13",
      "language": "python",
      "description": "",
      "code": "from collections import defaultdict\n\n\nclass OnlineLearner:\n    \"\"\"Online learner for an Agent\"\"\"\n\n    def __init__(self):\n        self.history: list[dict] = []\n        self.strategy_perf: dict[str, list[float]] = defaultdict(list)\n\n    async def learn(self, task: str, strategy: str, outcome: dict):\n        \"\"\"Record the experience from one execution\"\"\"\n        score = outcome.get(\"score\", 0)\n        self.history.append({\n            \"task\": task,\n            \"strategy\": strategy,\n            # Treat a missing flag as a failure instead of raising KeyError\n            \"success\": outcome.get(\"success\", False),\n            \"score\": score,\n        })\n        self.strategy_perf[strategy].append(score)\n\n    def best_strategy(self) -> str:\n        \"\"\"Pick the strategy with the highest average score so far\"\"\"\n        if not self.strategy_perf:\n            return \"react\"  # default before any data has been recorded\n        return max(\n            self.strategy_perf,\n            key=lambda s: sum(self.strategy_perf[s]) /\n                          len(self.strategy_perf[s])\n        )\n\n    def performance_summary(self) -> dict:\n        summary = {}\n        for s, scores in self.strategy_perf.items():\n            if scores:\n                summary[s] = {\n                    \"avg\": sum(scores) / len(scores),\n                    \"runs\": len(scores),\n                    \"best\": max(scores),\n                }\n        return summary",
      "section_ref": "9.8.2",
      "runnable": true,
      "dependencies": []
    },
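    {
      "id": "code-14",
      "language": "text",
      "description": "",
      "code": "What does the task need?\n├── Mathematical reasoning / logic chains\n│   └── CoT + self-verification\n├── Creative exploration / comparing options\n│   └── ToT (BFS)\n├── Complex decisions / multi-factor trade-offs\n│   └── GoT\n├── External information\n│   └── ReAct\n└── Simple Q&A\n    └── Direct generation (no special strategy needed)",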
      "section_ref": "9.9.1",
      "runnable": false,
      "dependencies": []
    },
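    {
      "id": "code-15",
      "language": "python",
      "description": "",
      "code": "# Illustrative fragments, not a standalone script: complexity, simple_generate,\n# tot, ReasoningBudget, max_attempts, context and the self.* methods are\n# assumed to be defined by the surrounding Agent code.\n\n# ❌ Pitfall 1: over-reasoning\n# Using a heavyweight strategy on a simple problem wastes cost and time\nif complexity(task) <= 2:\n    result = await simple_generate(task)\nelse:\n    result = await tot.solve(task)  # use ToT only when it is needed\n\n# ❌ Pitfall 2: ignoring reasoning cost\n# ToT calls the LLM at every step: expanding one node per level,\n# 5 levels × 3 branches = 15 generation calls, before any evaluation calls\nbudget = ReasoningBudget(max_calls=20, max_tokens=10000)\n\n# ❌ Pitfall 3: blindly trusting reasoning results\n# LLM reasoning can contain logical errors\nasync def verified_reasoning(self, task):\n    chain = await self.reason(task)\n    for i, step in enumerate(chain.critical_steps):\n        if not await self.verify(step):\n            # Write the corrected step back into the chain\n            chain.critical_steps[i] = await self.rereason(step)\n    return chain\n\n# ❌ Pitfall 4: reflection loops\n# Reflecting endlessly without ever acting\nMAX_REFLECTIONS = 3\nfor attempt in range(max_attempts):\n    action = await self.plan(task)\n    result = await self.execute(action)\n    if self.evaluate(result):\n        return result\n    if attempt < MAX_REFLECTIONS:\n        context += f\"\\nReflection: {await self.reflect(result)}\"\n    # Past the limit: keep acting without further reflection",
      "section_ref": "9.9.2",
      "runnable": false,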
      "dependencies": []
    },
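    {
      "id": "code-16",
      "language": "text",
      "description": "",
      "code": "Reasoning system launch checklist:\n├── [ ] Task classifier validated (accuracy > 90%)\n├── [ ] Every reasoning strategy has timeout and budget limits\n├── [ ] Failure fallback strategies are defined\n├── [ ] Reflection count is capped (prevents loops)\n├── [ ] Reasoning chains are traceable (complete logs)\n├── [ ] Critical steps have a verification mechanism\n├── [ ] Token consumption has budget alerts\n├── [ ] Performance benchmarks have passed\n└── [ ] Cost/quality trade-offs have been evaluated",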
      "section_ref": "9.9.3",
      "runnable": false,
      "dependencies": []
    }
  ],
  "tables": [
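    {
      "headers": [
        "Strategy",
        "Description",
        "Use cases",
        "Strengths",
        "Weaknesses"
      ],
      "data": [
        [
          "**BFS**",
          "Expands level by level, keeping the top_k nodes at each level",
          "Comparing options, creative generation",
          "Broad coverage; less likely to miss the best path",
          "High cost"
        ],
        [
          "**DFS**",
          "Explores a single path depth-first",
          "Mathematical reasoning, logical proofs",
          "Low resource use",
          "May miss better branches"
        ],
        [
          "**Beam Search**",
          "Beam search with a fixed width",
          "Translation, summarization",
          "Balances efficiency and quality",
          "Beam width is a hyperparameter to tune"
        ],
        [
          "**MCTS**",
          "Monte Carlo tree search",
          "Games, complex decisions",
          "Estimates improve as more simulations run",
          "Complex to implement"
        ]
      ]
    },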
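    {
      "headers": [
        "Operation",
        "Description"
      ],
      "data": [
        [
          "**Branch**",
          "Generate new branches from an existing node"
        ],
        [
          "**Merge**",
          "Combine the thoughts of multiple nodes"
        ],
        [
          "**Refine**",
          "Improve the thought at an existing node"
        ],
        [
          "**Loop**",
          "Return to an earlier node and reason again"
        ]
      ]
    },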
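    {
      "headers": [
        "Failure mode",
        "Description",
        "Handling strategy"
      ],
      "data": [
        [
          "**Tool failure**",
          "A tool call returns an error",
          "Retry → degrade → alternative tool"
        ],
        [
          "**Reasoning deadlock**",
          "Circular reasoning that makes no progress",
          "Backtrack to the previous branch point"
        ],
        [
          "**Insufficient information**",
          "Key information is missing",
          "Search proactively or ask the user"
        ],
        [
          "**Timeout**",
          "Reasoning takes too long",
          "Return the best known answer"
        ],
        [
          "**Goal conflict**",
          "Subgoals conflict with each other",
          "Replan"
        ]
      ]
    }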
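  ],
  "key_takeaways": [
    "ReAct is the foundational reasoning paradigm: it interleaves thinking with acting so the Agent can gather information dynamically",
    "Tree-of-Thought extends linear reasoning into tree-shaped exploration, supporting multi-path comparison and pruning",
    "Graph-of-Thought adds merge and refine operations, allowing thoughts to be combined more flexibly",
    "Task decomposition is a prerequisite for handling complex problems; atomic subtasks plus explicit dependencies are the key",
    "Self-reflection lets the Agent learn from failures; the Reflexion pattern can improve performance on tasks that allow repeated attempts",
    "Resilient planning ensures the system degrades gracefully when a plan fails",
    "Adaptive reasoning selects a strategy dynamically from task characteristics, balancing quality against cost"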
  ],
  "common_pitfalls": [],
  "related_chapters": [
    "ch04",
    "ch05",
    "ch16",
    "ch17",
    "ch19",
    "ch21"
  ]
}