{
  "metadata": {
    "id": "ch15",
    "title": "第15章：Agent的可观测性",
    "volume": "vol4",
    "volume_title": "高级篇",
    "word_count": 1854,
    "difficulty": "intermediate",
    "prerequisites": [
      "ch04"
    ],
    "key_concepts": [
      "可观测性的三支柱",
      "日志、指标、追踪",
      "Agent可观测性框架",
      "Agent系统可观测性挑战",
      "非确定性",
      "LLM黑盒问题",
      "结构化日志",
      "Agent专用日志格式",
      "LLM I/O日志",
      "分布式追踪",
      "OpenTelemetry集成",
      "Token使用追踪",
      "指标与监控",
      "Agent核心指标",
      "Prometheus集成"
    ],
    "learning_objectives": [],
    "estimated_tokens": 1112,
    "source_file": "vol4/ch15_Agent的可观测性.md"
  },
  "overview": "传统软件系统的可观测性依赖于确定性的行为——相同输入产生相同输出，日志可以精确还原执行过程。但 Agent 系统天生是非确定性的：LLM 可能对同一输入生成不同的推理路径，工具调用顺序可能变化，甚至在\"思考\"过程中产生幻觉。这种非确定性使得 Agent 系统的可观测性变得尤为重要且具有挑战性。本章将系统地讲解如何为 Agent 系统构建完整的可观测性体系，涵盖结构化日志、分布式追踪、指标监控、决策可视化和评估反馈。",
  "sections": [
    {
      "id": "15.1",
      "title": "15.1 可观测性的三支柱",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "15.1.1",
          "title": "15.1.1 日志、指标、追踪",
          "content": "可观测性（Observability）的三个核心支柱在 Agent 系统中各有特殊含义：\n\n\n| 支柱 | 传统系统 | Agent系统 | 特殊挑战 |\n|------|---------|----------|---------|\n| **日志** | 请求/响应记录 | LLM I/O、推理链、工具调用 | 数据量大，需结构化 |\n| **指标** | QPS、延迟、错误率 | Token消耗、推理步骤数、工具命中率 | 成本追踪是新维度 |\n| **追踪** | 服务间调用链 | 多步推理链、并行工具调用 | Span嵌套深，持续时间长 |"
        },
        {
          "id": "15.1.2",
          "title": "15.1.2 Agent可观测性框架",
          "content": ""
        }
      ]
    },
    {
      "id": "15.2",
      "title": "15.2 Agent系统可观测性挑战",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "15.2.1",
          "title": "15.2.1 非确定性",
          "content": ""
        },
        {
          "id": "15.2.2",
          "title": "15.2.2 LLM黑盒问题",
          "content": "LLM 是一个巨大的神经网络，我们无法直接观察其内部推理过程。但可以通过以下手段间接\"照亮\"黑盒："
        }
      ]
    },
    {
      "id": "15.3",
      "title": "15.3 结构化日志",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "15.3.1",
          "title": "15.3.1 Agent专用日志格式",
          "content": ""
        },
        {
          "id": "15.3.2",
          "title": "15.3.2 LLM I/O日志",
          "content": ""
        }
      ]
    },
    {
      "id": "15.4",
      "title": "15.4 分布式追踪",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "15.4.1",
          "title": "15.4.1 OpenTelemetry集成",
          "content": ""
        },
        {
          "id": "15.4.2",
          "title": "15.4.2 Token使用追踪",
          "content": ""
        }
      ]
    },
    {
      "id": "15.5",
      "title": "15.5 指标与监控",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "15.5.1",
          "title": "15.5.1 Agent核心指标",
          "content": ""
        },
        {
          "id": "15.5.2",
          "title": "15.5.2 Prometheus集成",
          "content": ""
        }
      ]
    },
    {
      "id": "15.6",
      "title": "15.6 Agent决策可视化",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "15.6.1",
          "title": "15.6.1 思维链可视化",
          "content": ""
        },
        {
          "id": "15.6.2",
          "title": "15.6.2 工具调用图",
          "content": ""
        }
      ]
    },
    {
      "id": "15.7",
      "title": "15.7 评估与反馈循环",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "15.7.1",
          "title": "15.7.1 人类反馈（HITL）",
          "content": ""
        },
        {
          "id": "15.7.2",
          "title": "15.7.2 自动评估管道",
          "content": ""
        }
      ]
    },
    {
      "id": "15.8",
      "title": "15.8 可观测性平台搭建",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "15.8.1",
          "title": "15.8.1 Grafana Dashboard配置",
          "content": ""
        },
        {
          "id": "15.8.2",
          "title": "15.8.2 告警规则",
          "content": ""
        }
      ]
    },
    {
      "id": "最佳实践",
      "title": "最佳实践",
      "level": 2,
      "content": "1. **从第一天就接入追踪**：不要等到出问题才加可观测性，新项目一开始就集成 OpenTelemetry\n2. **结构化日志优先**：JSON格式的结构化日志是后续分析和告警的基础\n3. **成本实时追踪**：LLM调用成本是Agent系统最大的运营风险，必须实时监控\n4. **保存完整执行记录**：每次Agent执行的完整追踪记录（包括LLM I/O）是调试和优化的宝贵数据\n5. **建立评估基线**：定义核心场景的评估指标和基线分数，持续追踪改进",
      "subsections": []
    },
    {
      "id": "常见陷阱",
      "title": "常见陷阱",
      "level": 2,
      "content": "1. **日志过于冗长**：记录完整的LLM输入输出会导致日志量爆炸。使用脱敏和截断\n2. **只看平均不看分布**：延迟分布比平均值更有意义。关注 P95/P99 而非平均值\n3. **忽略成本维度**：很多团队只监控性能，忽略LLM成本，导致月末账单惊人\n4. **追踪信息不完整**：缺少上下文传播（trace_id跨服务传递），导致无法还原完整链路\n5. **告警疲劳**：设置过多低质量告警，导致真正重要的问题被忽略",
      "subsections": []
    },
    {
      "id": "小结",
      "title": "小结",
      "level": 2,
      "content": "可观测性是 Agent 系统从实验室走向生产的关键基础设施。通过结构化日志、分布式追踪、指标监控、决策可视化和评估反馈五大体系，开发者可以对 Agent 的行为进行全面洞察。Agent 系统的非确定性决定了可观测性不是可选的，而是必须的——没有可观测性，就无法可靠地运营 Agent 系统。",
      "subsections": []
    },
    {
      "id": "延伸阅读",
      "title": "延伸阅读",
      "level": 2,
      "content": "1. **OpenTelemetry文档**: https://opentelemetry.io/docs/\n2. **Prometheus文档**: https://prometheus.io/docs/\n3. **Grafana文档**: https://grafana.com/docs/\n4. **LangSmith**: https://smith.langchain.com/ — LangChain的可观测平台\n5. **Weights & Biases Weave**: https://wandb.ai/weave — LLM应用追踪\n6. **论文**: \"Evaluation of Large Language Model Agents\" — Agent评估方法论",
      "subsections": []
    }
  ],
  "code_blocks": [
    {
      "id": "code-1",
      "language": "mermaid",
      "description": "可观测性（Observability）的三个核心支柱在 Agent 系统中各有特殊含义：",
      "code": "graph TB\n    subgraph \"可观测性三支柱\"\n        A[日志 Logs] --> A1[\"Agent对话日志\"]\n        A --> A2[\"LLM输入/输出\"]\n        A --> A3[\"工具调用记录\"]\n        \n        B[指标 Metrics] --> B1[\"延迟（E2E/LLM/工具）\"]\n        B --> B2[\"Token消耗与成本\"]\n        B --> B3[\"成功率与失败率\"]\n        B --> B4[\"用户满意度\"]\n        \n        C[追踪 Traces] --> C1[\"请求完整链路\"]\n        C --> C2[\"推理步骤Span\"]\n        C --> C3[\"工具调用Span\"]\n        C --> C4[\"上下文传播\"]\n    end",
      "section_ref": "15.1.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-2",
      "language": "python",
      "description": "| 追踪 | 服务间调用链 | 多步推理链、并行工具调用 | Span嵌套深，持续时间长 |",
      "code": "from dataclasses import dataclass, field\nfrom datetime import datetime\nfrom typing import Any\nfrom enum import Enum\nimport uuid\n\nclass SpanKind(Enum):\n    LLM_CALL = \"llm_call\"\n    TOOL_CALL = \"tool_call\"\n    REASONING = \"reasoning\"\n    RETRIEVAL = \"retrieval\"\n    USER_INPUT = \"user_input\"\n    AGENT_OUTPUT = \"agent_output\"\n\n@dataclass\nclass Span:\n    \"\"\"追踪单元\"\"\"\n    trace_id: str\n    span_id: str = field(default_factory=lambda: str(uuid.uuid4())[:8])\n    parent_id: str | None = None\n    kind: SpanKind = SpanKind.REASONING\n    name: str = \"\"\n    start_time: datetime = field(default_factory=datetime.now)\n    end_time: datetime | None = None\n    attributes: dict[str, Any] = field(default_factory=dict)\n    events: list[dict] = field(default_factory=list)\n    status: str = \"ok\"  # ok, error\n    \n    @property\n    def duration_ms(self) -> float:\n        if self.end_time:\n            return (self.end_time - self.start_time).total_seconds() * 1000\n        return 0\n\nclass Tracer:\n    \"\"\"简易追踪器\"\"\"\n    \n    def __init__(self):\n        self._traces: dict[str, list[Span]] = {}\n    \n    def start_trace(self) -> str:\n        trace_id = str(uuid.uuid4())[:16]\n        self._traces[trace_id] = []\n        return trace_id\n    \n    def start_span(self, trace_id: str, kind: SpanKind,\n                   name: str, parent_id: str | None = None) -> Span:\n        span = Span(\n            trace_id=trace_id, kind=kind, name=name,\n            parent_id=parent_id\n        )\n        self._traces[trace_id].append(span)\n        return span\n    \n    def end_span(self, span: Span, status: str = \"ok\"):\n        span.end_time = datetime.now()\n        span.status = status\n    \n    def get_trace(self, trace_id: str) -> list[Span]:\n        return self._traces.get(trace_id, [])",
      "section_ref": "15.1.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-3",
      "language": "python",
      "description": "",
      "code": "class NonDeterminismTracker:\n    \"\"\"非确定性追踪器\"\"\"\n    \n    def __init__(self):\n        self._replay_store: dict[str, dict] = {}\n    \n    async def record_execution(self, trace_id: str, \n                               input_data: dict,\n                               execution_log: list[dict],\n                               output: Any):\n        \"\"\"记录完整执行过程，支持回放\"\"\"\n        record = {\n            \"trace_id\": trace_id,\n            \"input\": input_data,\n            \"execution_log\": execution_log,\n            \"output\": output,\n            \"timestamp\": datetime.now().isoformat()\n        }\n        self._replay_store[trace_id] = record\n    \n    async def replay(self, trace_id: str, \n                     agent: Any) -> dict:\n        \"\"\"回放执行过程（使用缓存结果）\"\"\"\n        record = self._replay_store.get(trace_id)\n        if not record:\n            raise ValueError(f\"未找到执行记录: {trace_id}\")\n        \n        # 模拟执行，但不实际调用LLM\n        mock_results = {}\n        for step in record[\"execution_log\"]:\n            if step[\"type\"] == \"llm_call\":\n                mock_results[step[\"span_id\"]] = step[\"output\"]\n        \n        return {\n            \"original_output\": record[\"output\"],\n            \"mock_available\": len(mock_results),\n            \"steps\": len(record[\"execution_log\"])\n        }",
      "section_ref": "15.2.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-4",
      "language": "python",
      "description": "LLM 是一个巨大的神经网络，我们无法直接观察其内部推理过程。但可以通过以下手段间接\"照亮\"黑盒：",
      "code": "class LLMCallInspector:\n    \"\"\"LLM调用检查器\"\"\"\n    \n    def __init__(self, tracer: Tracer):\n        self.tracer = tracer\n    \n    async def inspected_call(self, trace_id: str,\n                             llm: Any, messages: list[dict],\n                             **kwargs) -> dict:\n        \"\"\"带完整检查的LLM调用\"\"\"\n        # 记录输入\n        span = self.tracer.start_span(\n            trace_id, SpanKind.LLM_CALL, \"llm_inference\"\n        )\n        span.attributes.update({\n            \"model\": kwargs.get(\"model\", \"unknown\"),\n            \"input_messages\": len(messages),\n            \"input_tokens_est\": sum(\n                len(m.get(\"content\", \"\").split()) \n                for m in messages\n            ) * 1.3,  # 粗略估算\n            \"temperature\": kwargs.get(\"temperature\", 0.0),\n        })\n        \n        try:\n            # 执行调用\n            response = await llm.chat(messages, **kwargs)\n            \n            # 记录输出\n            span.attributes.update({\n                \"output_tokens\": response.usage.completion_tokens\n                    if hasattr(response, \"usage\") else 0,\n                \"total_tokens\": response.usage.total_tokens\n                    if hasattr(response, \"usage\") else 0,\n                \"finish_reason\": response.choices[0].finish_reason\n                    if hasattr(response, \"choices\") else \"unknown\",\n                \"tool_calls\": len(response.choices[0].message.tool_calls)\n                    if (hasattr(response, \"choices\") and \n                        hasattr(response.choices[0].message, \"tool_calls\"))\n                    else 0,\n            })\n            \n            self.tracer.end_span(span)\n            return response\n        \n        except Exception as e:\n            span.events.append({\n                \"name\": \"error\",\n                \"timestamp\": datetime.now().isoformat(),\n                \"attributes\": {\"error\": str(e)}\n            })\n            self.tracer.end_span(span, status=\"error\")\n            raise",
      "section_ref": "15.2.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-5",
      "language": "python",
      "description": "",
      "code": "import json\nimport logging\nfrom typing import Any\n\nclass AgentJSONFormatter(logging.Formatter):\n    \"\"\"Agent专用的JSON日志格式化器\"\"\"\n    \n    def format(self, record: logging.LogRecord) -> str:\n        log_entry = {\n            \"timestamp\": datetime.now().isoformat(),\n            \"level\": record.levelname,\n            \"logger\": record.name,\n            \"message\": record.getMessage(),\n        }\n        \n        # 添加Agent特有字段\n        if hasattr(record, \"trace_id\"):\n            log_entry[\"trace_id\"] = record.trace_id\n        if hasattr(record, \"span_id\"):\n            log_entry[\"span_id\"] = record.span_id\n        if hasattr(record, \"agent_step\"):\n            log_entry[\"agent_step\"] = record.agent_step\n        if hasattr(record, \"token_usage\"):\n            log_entry[\"token_usage\"] = record.token_usage\n        if hasattr(record, \"cost_usd\"):\n            log_entry[\"cost_usd\"] = record.cost_usd\n        \n        # 额外字段\n        if hasattr(record, \"extra_fields\"):\n            log_entry.update(record.extra_fields)\n        \n        return json.dumps(log_entry, ensure_ascii=False)\n\nclass AgentLogger:\n    \"\"\"Agent日志记录器\"\"\"\n    \n    def __init__(self, agent_name: str):\n        self.logger = logging.getLogger(f\"agent.{agent_name}\")\n        self.logger.setLevel(logging.DEBUG)\n        \n        handler = logging.StreamHandler()\n        handler.setFormatter(AgentJSONFormatter())\n        self.logger.addHandler(handler)\n        \n        self._trace_id: str | None = None\n        self._step_counter = 0\n    \n    def set_trace(self, trace_id: str):\n        self._trace_id = trace_id\n        self._step_counter = 0\n    \n    def log_llm_call(self, messages: list[dict], \n                     response: dict, duration_ms: float,\n                     model: str = \"unknown\"):\n        self._step_counter += 1\n        self.logger.info(\n            f\"LLM调用: model={model}, \"\n            f\"input_msgs={len(messages)}, \"\n            f\"duration={duration_ms:.0f}ms\",\n            extra={\n                \"trace_id\": self._trace_id,\n                \"agent_step\": self._step_counter,\n                \"extra_fields\": {\n                    \"type\": \"llm_call\",\n                    \"model\": model,\n                    \"input_messages\": len(messages),\n                    \"duration_ms\": duration_ms,\n                    \"output_tokens\": response.get(\"usage\", {}).get(\"completion_tokens\", 0),\n                    \"total_tokens\": response.get(\"usage\", {}).get(\"total_tokens\", 0),\n                }\n            }\n        )\n    \n    def log_tool_call(self, tool_name: str, args: dict,\n                      result: Any, duration_ms: float,\n                      success: bool = True):\n        self._step_counter += 1\n        self.logger.info(\n            f\"工具调用: {tool_name}, \"\n            f\"duration={duration_ms:.0f}ms, \"\n            f\"success={success}\",\n            extra={\n                \"trace_id\": self._trace_id,\n                \"agent_step\": self._step_counter,\n                \"extra_fields\": {\n                    \"type\": \"tool_call\",\n                    \"tool_name\": tool_name,\n                    \"args_keys\": list(args.keys()),\n                    \"duration_ms\": duration_ms,\n                    \"success\": success,\n                    \"result_size\": len(str(result)) if result else 0,\n                }\n            }\n        )\n    \n    def log_decision(self, reasoning: str, action: str):\n        self._step_counter += 1\n        self.logger.info(\n            f\"决策: {action}\",\n            extra={\n                \"trace_id\": self._trace_id,\n                \"agent_step\": self._step_counter,\n                \"extra_fields\": {\n                    \"type\": \"decision\",\n                    \"reasoning\": reasoning[:500],\n                    \"action\": action,\n                }\n            }\n        )",
      "section_ref": "15.3.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-6",
      "language": "python",
      "description": "",
      "code": "class LLMIORecorder:\n    \"\"\"LLM输入输出记录器\"\"\"\n    \n    def __init__(self, storage_backend):\n        self.storage = storage_backend\n    \n    async def record(self, trace_id: str, step: int,\n                     input_messages: list[dict],\n                     output: dict, metadata: dict = None):\n        \"\"\"完整记录LLM交互\"\"\"\n        record = {\n            \"trace_id\": trace_id,\n            \"step\": step,\n            \"timestamp\": datetime.now().isoformat(),\n            \"input\": self._sanitize_messages(input_messages),\n            \"output\": {\n                \"content\": output.get(\"content\", \"\"),\n                \"tool_calls\": output.get(\"tool_calls\", []),\n                \"finish_reason\": output.get(\"finish_reason\"),\n            },\n            \"metadata\": metadata or {},\n        }\n        await self.storage.save(record)\n    \n    def _sanitize_messages(self, messages: list[dict]) -> list[dict]:\n        \"\"\"脱敏处理\"\"\"\n        sanitized = []\n        for msg in messages:\n            clean_msg = {\"role\": msg[\"role\"]}\n            content = msg.get(\"content\", \"\")\n            # 截断过长的内容\n            if len(content) > 10000:\n                clean_msg[\"content\"] = content[:10000] + \"...[TRUNCATED]\"\n            else:\n                clean_msg[\"content\"] = content\n            sanitized.append(clean_msg)\n        return sanitized",
      "section_ref": "15.3.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-7",
      "language": "python",
      "description": "",
      "code": "from opentelemetry import trace\nfrom opentelemetry.sdk.trace import TracerProvider\nfrom opentelemetry.sdk.trace.export import BatchSpanProcessor\nfrom opentelemetry.exporter.otlp.proto.grpc.trace_exporter import (\n    OTLPSpanExporter\n)\nfrom opentelemetry.trace.propagation.tracecontext import (\n    TraceContextTextMapPropagator\n)\n\nclass AgentTracer:\n    \"\"\"基于OpenTelemetry的Agent追踪\"\"\"\n    \n    def __init__(self, service_name: str = \"agent-service\",\n                 otlp_endpoint: str = \"localhost:4317\"):\n        # 初始化TracerProvider\n        provider = TracerProvider()\n        exporter = OTLPSpanExporter(endpoint=otlp_endpoint)\n        provider.add_span_processor(BatchSpanProcessor(exporter))\n        trace.set_tracer_provider(provider)\n        \n        self.tracer = trace.get_tracer(service_name)\n        self.propagator = TraceContextTextMapPropagator()\n    \n    def trace_agent_execution(self, agent_func):\n        \"\"\"装饰器：追踪Agent执行\"\"\"\n        async def wrapper(user_input: str, **kwargs) -> str:\n            with self.tracer.start_as_current_span(\n                \"agent.execution\"\n            ) as span:\n                span.set_attribute(\"agent.input\", user_input[:500])\n                span.set_attribute(\"agent.type\", kwargs.get(\"agent_type\", \"default\"))\n                \n                try:\n                    result = await agent_func(user_input, **kwargs)\n                    span.set_attribute(\"agent.output_length\", len(result))\n                    span.set_attribute(\"agent.status\", \"success\")\n                    return result\n                except Exception as e:\n                    span.set_attribute(\"agent.status\", \"error\")\n                    span.set_attribute(\"agent.error\", str(e))\n                    span.record_exception(e)\n                    raise\n        \n        return wrapper\n    \n    def trace_llm_call(self):\n        \"\"\"上下文管理器：追踪LLM调用\"\"\"\n        return self.tracer.start_as_current_span(\n            \"agent.llm_call\",\n            attributes={\"component\": \"llm\"}\n        )\n    \n    def trace_tool_call(self, tool_name: str):\n        \"\"\"上下文管理器：追踪工具调用\"\"\"\n        return self.tracer.start_as_current_span(\n            f\"agent.tool_call.{tool_name}\",\n            attributes={\n                \"component\": \"tool\",\n                \"tool.name\": tool_name,\n            }\n        )",
      "section_ref": "15.4.1",
      "runnable": true,
      "dependencies": [
        "opentelemetry"
      ]
    },
    {
      "id": "code-8",
      "language": "python",
      "description": "",
      "code": "class TokenTracker:\n    \"\"\"Token使用追踪器\"\"\"\n    \n    def __init__(self):\n        self._usage: dict[str, dict] = {}  # trace_id -> usage\n    \n    def record(self, trace_id: str, model: str,\n               prompt_tokens: int, completion_tokens: int):\n        if trace_id not in self._usage:\n            self._usage[trace_id] = {\n                \"models\": {},\n                \"total_prompt_tokens\": 0,\n                \"total_completion_tokens\": 0,\n                \"total_tokens\": 0,\n                \"estimated_cost_usd\": 0.0,\n            }\n        \n        usage = self._usage[trace_id]\n        usage[\"total_prompt_tokens\"] += prompt_tokens\n        usage[\"total_completion_tokens\"] += completion_tokens\n        usage[\"total_tokens\"] += prompt_tokens + completion_tokens\n        \n        # 估算成本\n        cost = self._estimate_cost(model, prompt_tokens, completion_tokens)\n        usage[\"estimated_cost_usd\"] += cost\n        \n        if model not in usage[\"models\"]:\n            usage[\"models\"][model] = {\"calls\": 0, \"tokens\": 0, \"cost\": 0.0}\n        usage[\"models\"][model][\"calls\"] += 1\n        usage[\"models\"][model][\"tokens\"] += prompt_tokens + completion_tokens\n        usage[\"models\"][model][\"cost\"] += cost\n    \n    def _estimate_cost(self, model: str, prompt: int, \n                       completion: int) -> float:\n        \"\"\"根据模型估算成本（美元）\"\"\"\n        pricing = {\n            \"gpt-4o\": (2.5/1M, 10/1M),\n            \"gpt-4o-mini\": (0.15/1M, 0.6/1M),\n            \"gpt-4-turbo\": (10/1M, 30/1M),\n            \"claude-3-5-sonnet\": (3/1M, 15/1M),\n            \"claude-3-haiku\": (0.25/1M, 1.25/1M),\n        }\n        if model in pricing:\n            p, c = pricing[model]\n            return prompt * p + completion * c\n        return prompt * 0.003/1M + completion * 0.015/1M\n    \n    def get_usage(self, trace_id: str) -> dict:\n        return self._usage.get(trace_id, {})",
      "section_ref": "15.4.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-9",
      "language": "python",
      "description": "",
      "code": "from dataclasses import dataclass, field\nfrom collections import deque\nimport time\n\n@dataclass\nclass AgentMetrics:\n    \"\"\"Agent核心指标收集器\"\"\"\n    \n    # 延迟指标\n    e2e_latencies: deque = field(\n        default_factory=lambda: deque(maxlen=1000)\n    )\n    llm_latencies: deque = field(\n        default_factory=lambda: deque(maxlen=1000)\n    )\n    tool_latencies: deque = field(\n        default_factory=lambda: deque(maxlen=1000)\n    )\n    \n    # 成功率指标\n    total_requests: int = 0\n    successful_requests: int = 0\n    failed_requests: int = 0\n    \n    # Token与成本\n    total_tokens: int = 0\n    total_cost_usd: float = 0.0\n    \n    # 推理步骤\n    step_counts: deque = field(\n        default_factory=lambda: deque(maxlen=1000)\n    )\n    \n    def record_e2e_latency(self, latency_ms: float):\n        self.e2e_latencies.append(latency_ms)\n    \n    def record_llm_latency(self, latency_ms: float):\n        self.llm_latencies.append(latency_ms)\n    \n    def record_tool_latency(self, latency_ms: float):\n        self.tool_latencies.append(latency_ms)\n    \n    def record_request(self, success: bool):\n        self.total_requests += 1\n        if success:\n            self.successful_requests += 1\n        else:\n            self.failed_requests += 1\n    \n    def record_tokens(self, tokens: int, cost: float):\n        self.total_tokens += tokens\n        self.total_cost_usd += cost\n    \n    def record_steps(self, count: int):\n        self.step_counts.append(count)\n    \n    def get_summary(self) -> dict:\n        \"\"\"获取指标摘要\"\"\"\n        def avg(deque_val):\n            return sum(deque_val) / len(deque_val) if deque_val else 0\n        \n        def p95(deque_val):\n            if not deque_val:\n                return 0\n            sorted_vals = sorted(deque_val)\n            idx = int(len(sorted_vals) * 0.95)\n            return sorted_vals[min(idx, len(sorted_vals)-1)]\n        \n        return {\n            \"requests\": {\n                \"total\": self.total_requests,\n                \"success_rate\": (\n                    self.successful_requests / self.total_requests * 100\n                ) if self.total_requests > 0 else 0,\n            },\n            \"latency_ms\": {\n                \"e2e_avg\": avg(self.e2e_latencies),\n                \"e2e_p95\": p95(self.e2e_latencies),\n                \"llm_avg\": avg(self.llm_latencies),\n                \"tool_avg\": avg(self.tool_latencies),\n            },\n            \"tokens_and_cost\": {\n                \"total_tokens\": self.total_tokens,\n                \"total_cost_usd\": round(self.total_cost_usd, 4),\n                \"avg_tokens_per_request\": (\n                    self.total_tokens / self.total_requests\n                ) if self.total_requests > 0 else 0,\n            },\n            \"agent_efficiency\": {\n                \"avg_steps\": avg(self.step_counts),\n                \"max_steps\": max(self.step_counts) if self.step_counts else 0,\n            }\n        }",
      "section_ref": "15.5.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-10",
      "language": "python",
      "description": "",
      "code": "from prometheus_client import Counter, Histogram, Gauge, start_http_server\n\nclass AgentPrometheusMetrics:\n    \"\"\"Agent Prometheus指标\"\"\"\n    \n    def __init__(self, port: int = 8000):\n        start_http_server(port)\n        \n        self.request_total = Counter(\n            \"agent_requests_total\",\n            \"Total agent requests\",\n            [\"agent_type\", \"status\"]\n        )\n        self.e2e_latency = Histogram(\n            \"agent_e2e_latency_seconds\",\n            \"End-to-end latency\",\n            [\"agent_type\"],\n            buckets=[0.5, 1, 2, 5, 10, 30, 60]\n        )\n        self.llm_latency = Histogram(\n            \"agent_llm_latency_seconds\",\n            \"LLM call latency\",\n            [\"model\"]\n        )\n        self.tool_latency = Histogram(\n            \"agent_tool_latency_seconds\",\n            \"Tool call latency\",\n            [\"tool_name\"]\n        )\n        self.token_usage = Counter(\n            \"agent_token_usage_total\",\n            \"Token usage\",\n            [\"model\", \"type\"]  # type: prompt/completion\n        )\n        self.cost_usd = Counter(\n            \"agent_cost_usd_total\",\n            \"Estimated cost in USD\",\n            [\"model\"]\n        )\n        self.active_requests = Gauge(\n            \"agent_active_requests\",\n            \"Currently active requests\"\n        )\n        self.step_count = Histogram(\n            \"agent_step_count\",\n            \"Number of reasoning steps per request\",\n            buckets=[1, 2, 3, 5, 10, 15, 20]\n        )",
      "section_ref": "15.5.2",
      "runnable": true,
      "dependencies": [
        "prometheus_client"
      ]
    },
    {
      "id": "code-11",
      "language": "python",
      "description": "",
      "code": "class ChainOfThoughtVisualizer:\n    \"\"\"思维链可视化\"\"\"\n    \n    def __init__(self, tracer: Tracer):\n        self.tracer = tracer\n    \n    def visualize_trace(self, trace_id: str) -> str:\n        \"\"\"将追踪数据可视化为Mermaid流程图\"\"\"\n        spans = self.tracer.get_trace(trace_id)\n        if not spans:\n            return \"无追踪数据\"\n        \n        lines = [\"graph TD\"]\n        node_map = {}\n        \n        for i, span in enumerate(spans):\n            node_id = f\"step{i}\"\n            label = span.name\n            \n            if span.kind == SpanKind.LLM_CALL:\n                label = f\"🧠 LLM: {span.attributes.get('model', '')}\"\n            elif span.kind == SpanKind.TOOL_CALL:\n                label = f\"🔧 工具: {span.name}\"\n            elif span.kind == SpanKind.REASONING:\n                label = f\"💭 推理: {span.name[:30]}\"\n            \n            # 添加耗时\n            if span.duration_ms:\n                label += f\" ({span.duration_ms:.0f}ms)\"\n            \n            # 状态标记\n            if span.status == \"error\":\n                label += \" ❌\"\n            \n            lines.append(f'    {node_id}[\"{label}\"]')\n            \n            # 连接父子节点\n            if span.parent_id and span.parent_id in node_map:\n                lines.append(f'    {node_map[span.parent_id]} --> {node_id}')\n            elif i > 0:\n                lines.append(f'    step{i-1} --> {node_id}')\n            \n            node_map[span.span_id] = node_id\n        \n        return \"\\n\".join(lines)\n    \n    def to_text_timeline(self, trace_id: str) -> str:\n        \"\"\"文本时间线\"\"\"\n        spans = self.tracer.get_trace(trace_id)\n        lines = [f\"📋 追踪 ID: {trace_id}\", \"=\" * 60]\n        \n        for i, span in enumerate(spans):\n            icon = {\n                SpanKind.LLM_CALL: \"🧠\",\n                SpanKind.TOOL_CALL: \"🔧\",\n                SpanKind.REASONING: \"💭\",\n            }.get(span.kind, \"📌\")\n            \n            status_icon = \"✅\" if span.status == \"ok\" else \"❌\"\n            lines.append(\n                f\"  [{i+1}] {icon} {span.name} \"\n                f\"{span.duration_ms:.0f}ms {status_icon}\"\n            )\n            if span.events:\n                for event in span.events:\n                    lines.append(f\"       └─ {event.get('name', '')}\")\n        \n        return \"\\n\".join(lines)",
      "section_ref": "15.6.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-12",
      "language": "python",
      "description": "",
      "code": "class ToolCallGraphBuilder:\n    \"\"\"工具调用依赖图\"\"\"\n    \n    def build(self, trace_id: str, \n              tracer: Tracer) -> dict:\n        spans = tracer.get_trace(trace_id)\n        \n        nodes = []\n        edges = []\n        tool_spans = [s for s in spans if s.kind == SpanKind.TOOL_CALL]\n        \n        for span in tool_spans:\n            nodes.append({\n                \"id\": span.span_id,\n                \"name\": span.name,\n                \"duration_ms\": span.duration_ms,\n                \"status\": span.status,\n                \"args\": span.attributes.get(\"args\", {}),\n            })\n            \n            if span.parent_id:\n                edges.append({\n                    \"from\": span.parent_id,\n                    \"to\": span.span_id,\n                })\n        \n        return {\"nodes\": nodes, \"edges\": edges}",
      "section_ref": "15.6.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-13",
      "language": "python",
      "description": "",
      "code": "class HumanFeedbackCollector:\n    \"\"\"人类反馈收集器\"\"\"\n    \n    def __init__(self, storage):\n        self.storage = storage\n    \n    async def collect_feedback(self, trace_id: str,\n                               rating: int,  # 1-5\n                               comment: str = \"\",\n                               correction: str = \"\"):\n        \"\"\"收集用户反馈\"\"\"\n        feedback = {\n            \"trace_id\": trace_id,\n            \"rating\": rating,\n            \"comment\": comment,\n            \"correction\": correction,\n            \"timestamp\": datetime.now().isoformat()\n        }\n        await self.storage.save(feedback)\n    \n    async def get_quality_metrics(self, \n                                   period_days: int = 7) -> dict:\n        \"\"\"计算质量指标\"\"\"\n        feedbacks = await self.storage.query(\n            period=period_days\n        )\n        \n        if not feedbacks:\n            return {}\n        \n        ratings = [f[\"rating\"] for f in feedbacks]\n        return {\n            \"avg_rating\": sum(ratings) / len(ratings),\n            \"satisfaction_rate\": (\n                sum(1 for r in ratings if r >= 4) / len(ratings) * 100\n            ),\n            \"total_feedbacks\": len(feedbacks),\n            \"with_corrections\": sum(\n                1 for f in feedbacks if f.get(\"correction\")\n            ),\n        }",
      "section_ref": "15.7.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-14",
      "language": "python",
      "description": "",
      "code": "class AgentEvaluationPipeline:\n    \"\"\"Agent自动评估管道\"\"\"\n    \n    def __init__(self, llm, evaluator_llm=None):\n        self.llm = llm\n        self.evaluator = evaluator_llm or llm\n    \n    async def evaluate(self, test_cases: list[dict]) -> dict:\n        \"\"\"批量评估Agent\"\"\"\n        results = []\n        \n        for case in test_cases:\n            # 执行Agent\n            agent_output = await self._run_agent(case)\n            \n            # 评估质量\n            score = await self._evaluate_quality(\n                case[\"expected\"], agent_output, case[\"criteria\"]\n            )\n            \n            results.append({\n                \"case_id\": case[\"id\"],\n                \"input\": case[\"input\"],\n                \"expected\": case[\"expected\"],\n                \"actual\": agent_output,\n                \"score\": score,\n            })\n        \n        return self._aggregate_results(results)\n    \n    async def _evaluate_quality(self, expected: str, actual: str,\n                                criteria: list[str]) -> float:\n        \"\"\"使用LLM评估输出质量\"\"\"\n        prompt = f\"\"\"\n        请评估以下Agent输出的质量：\n        \n        预期输出: {expected}\n        实际输出: {actual}\n        评估标准: {', '.join(criteria)}\n        \n        请给出0-1之间的分数，其中1表示完全符合预期。\n        只返回数字。\n        \"\"\"\n        response = await self.evaluator.generate(prompt)\n        try:\n            return float(response.strip())\n        except ValueError:\n            return 0.0\n    \n    def _aggregate_results(self, results: list[dict]) -> dict:\n        scores = [r[\"score\"] for r in results]\n        return {\n            \"total_cases\": len(results),\n            \"avg_score\": sum(scores) / len(scores),\n            \"pass_rate\": sum(1 for s in scores if s >= 0.8) / len(scores),\n            \"fail_cases\": [r for r in results if r[\"score\"] < 0.6],\n        }",
      "section_ref": "15.7.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-15",
      "language": "python",
      "description": "",
      "code": "# dashboard_config.py\nDASHBOARD_CONFIG = {\n    \"dashboard\": {\n        \"title\": \"Agent 可观测性\",\n        \"panels\": [\n            {\n                \"title\": \"请求量与成功率\",\n                \"type\": \"timeseries\",\n                \"targets\": [\n                    {\n                        \"expr\": \"rate(agent_requests_total[5m])\",\n                        \"legendFormat\": \"{{agent_type}} - {{status}}\"\n                    }\n                ]\n            },\n            {\n                \"title\": \"端到端延迟 P95\",\n                \"type\": \"gauge\",\n                \"targets\": [\n                    {\n                        \"expr\": \"histogram_quantile(0.95, rate(agent_e2e_latency_seconds_bucket[5m]))\"\n                    }\n                ]\n            },\n            {\n                \"title\": \"Token消耗趋势\",\n                \"type\": \"timeseries\",\n                \"targets\": [\n                    {\n                        \"expr\": \"rate(agent_token_usage_total[5m])\",\n                        \"legendFormat\": \"{{model}} - {{type}}\"\n                    }\n                ]\n            },\n            {\n                \"title\": \"预估成本\",\n                \"type\": \"stat\",\n                \"targets\": [\n                    {\n                        \"expr\": \"sum(increase(agent_cost_usd_total[24h]))\"\n                    }\n                ]\n            },\n            {\n                \"title\": \"推理步骤分布\",\n                \"type\": \"histogram\",\n                \"targets\": [\n                    {\n                        \"expr\": \"agent_step_count_bucket\"\n                    }\n                ]\n            },\n        ]\n    }\n}",
      "section_ref": "15.8.1",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-16",
      "language": "python",
      "description": "",
      "code": "ALERT_RULES = {\n    \"high_latency\": {\n        \"expr\": \"histogram_quantile(0.95, rate(agent_e2e_latency_seconds_bucket[5m])) > 30\",\n        \"for\": \"5m\",\n        \"labels\": {\"severity\": \"warning\"},\n        \"annotations\": {\n            \"summary\": \"Agent端到端延迟过高\",\n            \"description\": \"P95延迟超过30秒\"\n        }\n    },\n    \"high_error_rate\": {\n        \"expr\": 'rate(agent_requests_total{status=\"error\"}[5m]) / rate(agent_requests_total[5m]) > 0.1',\n        \"for\": \"2m\",\n        \"labels\": {\"severity\": \"critical\"},\n        \"annotations\": {\n            \"summary\": \"Agent错误率过高\",\n            \"description\": \"错误率超过10%\"\n        }\n    },\n    \"cost_spike\": {\n        \"expr\": \"increase(agent_cost_usd_total[1h]) > 10\",\n        \"for\": \"1m\",\n        \"labels\": {\"severity\": \"warning\"},\n        \"annotations\": {\n            \"summary\": \"Agent成本异常\",\n            \"description\": \"1小时内成本超过$10\"\n        }\n    },\n    \"token_budget_exceeded\": {\n        \"expr\": \"sum(increase(agent_token_usage_total[24h])) > 10000000\",\n        \"for\": \"5m\",\n        \"labels\": {\"severity\": \"critical\"},\n        \"annotations\": {\n            \"summary\": \"Token消耗超预算\",\n            \"description\": \"24小时Token消耗超过1000万\"\n        }\n    }\n}",
      "section_ref": "15.8.2",
      "runnable": true,
      "dependencies": []
    }
  ],
  "tables": [
    {
      "headers": [
        "支柱",
        "传统系统",
        "Agent系统",
        "特殊挑战"
      ],
      "data": [
        [
          "**日志**",
          "请求/响应记录",
          "LLM I/O、推理链、工具调用",
          "数据量大，需结构化"
        ],
        [
          "**指标**",
          "QPS、延迟、错误率",
          "Token消耗、推理步骤数、工具命中率",
          "成本追踪是新维度"
        ],
        [
          "**追踪**",
          "服务间调用链",
          "多步推理链、并行工具调用",
          "Span嵌套深，持续时间长"
        ]
      ]
    }
  ],
  "key_takeaways": [],
  "common_pitfalls": [],
  "related_chapters": [
    "ch04",
    "ch25",
    "ch38"
  ]
}