{
  "metadata": {
    "id": "ch36",
    "title": "第36章 生产环境架构设计",
    "volume": "vol10",
    "volume_title": "生产级Agent平台",
    "word_count": 3648,
    "difficulty": "advanced",
    "prerequisites": [
      "ch35"
    ],
    "key_concepts": [
      "概述：从原型到生产的跨越",
      "全局架构总览",
      "微服务架构下的 Agent 服务",
      "服务拆分原则",
      "服务间通信",
      "服务治理",
      "API Gateway 设计",
      "Gateway 的核心职责",
      "Gateway 架构选型",
      "限流策略",
      "消息队列集成",
      "消息队列在 Agent 平台中的作用",
      "消息队列选型",
      "事件消息格式设计",
      "消息消费者幂等性"
    ],
    "learning_objectives": [],
    "estimated_tokens": 2189,
    "source_file": "vol10/ch36_生产环境架构设计.md"
  },
  "overview": "",
  "sections": [
    {
      "id": "36.1",
      "title": "36.1 概述：从原型到生产的跨越",
      "level": 2,
      "content": "将一个 Agent 系统从实验室推向生产环境，绝不仅仅是\"多加几台服务器\"这么简单。生产环境意味着你面对的是一个真实、复杂、不可预测的世界：用户流量忽高忽低、上游 LLM 服务偶尔不稳定、数据需要持久化和加密、不同租户之间的数据必须隔离、每次发布都可能引入新的风险。\n\n本章将从架构设计的角度，系统地讨论如何构建一个生产级的 Agent 平台。我们的讨论将围绕以下几个核心问题展开：\n\n1. **服务应该如何拆分？** —— 微服务 vs. 单体 vs. 模块化单体\n2. **请求如何路由？** —— API Gateway 的设计哲学\n3. **组件之间如何通信？** —— 同步调用 vs. 异步消息\n4. **数据如何存储？** —— 多种数据引擎的协作\n5. **多租户如何隔离？** —— 数据隔离与资源隔离的策略\n\n在深入具体设计之前，我们先看一个典型的生产级 Agent 平台的整体架构图。",
      "subsections": [
        {
          "id": "36.1.1",
          "title": "36.1.1 全局架构总览",
          "content": "这个架构图展示了一个完整的生产级 Agent 平台的各个层次。接下来，我们将逐层深入分析。"
        }
      ]
    },
    {
      "id": "36.2",
      "title": "36.2 微服务架构下的 Agent 服务",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "36.2.1",
          "title": "36.2.1 服务拆分原则",
          "content": "Agent 系统的服务拆分需要遵循领域驱动设计（DDD）的原则，同时考虑实际运维的复杂度。过早的微服务化会导致运维成本急剧上升，而过晚的拆分则可能让系统陷入\"大泥球\"。\n\n**推荐策略：先模块化单体，再按需拆分**\n\n\n**服务拆分的具体建议：**\n\n| 服务 | 职责 | 拆分时机 | 数据库策略 |\n|------|------|----------|-----------|\n| Chat Service | 处理用户对话请求 | 启动时即独立 | 独立 Schema |\n| Agent Orchestrator | Agent 编排和任务调度 | QPS > 500 | 独立 Schema |\n| RAG Service | 检索增强生成 | 知识库功能上线时 | 共享向量库 |\n| Tool Execution | 工具调用执行 | 工具数量 > 20 | 独立 Schema |\n| Session Manager | 会话状态管理 | 需要跨服务共享状态 | Redis + PG |\n| Skill Registry | 技能注册与管理 | 技能系统上线时 | 独立 Schema |\n| Knowledge Base | 知识库管理 | RAG 上线时 | 独立 Schema |\n| Model Router | LLM 模型路由 | 多模型支持时 | 配置存储 |"
        },
        {
          "id": "36.2.2",
          "title": "36.2.2 服务间通信",
          "content": "在微服务架构中，服务间通信是最关键的设计决策之一。\n\n**同步通信 vs. 异步通信的选择矩阵：**\n\n\n**具体实践：**\n\n1. **同步调用（HTTP/gRPC）**：适用于需要即时响应的场景\n   - 用户发送消息 → Chat Service → Agent Service → 返回响应\n   - 推荐使用 gRPC 进行内部服务间调用，性能优于 HTTP/JSON\n\n2. **异步消息（MQ）**：适用于不需要即时响应的场景\n   - Agent 执行结果通知\n   - 知识库文档索引构建\n   - 使用统计和分析\n   - 审计日志收集\n\n**gRPC 服务定义示例：**"
        },
        {
          "id": "36.2.3",
          "title": "36.2.3 服务治理",
          "content": "**服务注册与发现配置示例（Consul）：**\n\n\n**Service Mesh 选型建议：**\n\n| 方案 | 适用场景 | 优势 | 劣势 |\n|------|----------|------|------|\n| Istio | 大规模集群（>100 节点） | 功能全面 | 运维复杂 |\n| Linkerd | 中小规模集群 | 轻量、易用 | 功能较少 |\n| 不使用 Service Mesh | 团队 < 10 人 | 简单直接 | 缺少流量治理 |\n\n**建议**：对于 Agent 平台初期的团队规模，推荐先不引入 Service Mesh，通过 API Gateway + 服务注册中心实现基本的流量管理。"
        }
      ]
    },
    {
      "id": "36.3",
      "title": "36.3 API Gateway 设计",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "36.3.1",
          "title": "36.3.1 Gateway 的核心职责",
          "content": "API Gateway 是整个 Agent 平台的统一入口，承担着流量管理、认证授权、限流熔断等关键职责。"
        },
        {
          "id": "36.3.2",
          "title": "36.3.2 Gateway 架构选型",
          "content": "**主流 API Gateway 对比：**\n\n| 方案 | 语言 | 性能 | 可扩展性 | 运维复杂度 |\n|------|------|------|----------|-----------|\n| Kong + Lua | Lua | 高 | 高（插件丰富） | 中 |\n| APISIX | Lua | 极高 | 高（原生 Nginx） | 中 |\n| Tyk | Go | 高 | 中 | 低 |\n| 自研 Gateway | Go/Rust | 最高 | 完全可控 | 高 |\n\n**推荐方案**：APISIX 或自研轻量 Gateway。对于 Agent 平台的特殊需求（如 SSE 流式响应、Token 级别的限流），自研 Gateway 可能是更好的选择。\n\n**自研 Gateway 核心代码示例（Go）：**"
        },
        {
          "id": "36.3.3",
          "title": "36.3.3 限流策略",
          "content": "Agent 平台的限流需要考虑多个维度：\n\n\n**Token 级别限流实现（Python + Redis）：**"
        }
      ]
    },
    {
      "id": "36.4",
      "title": "36.4 消息队列集成",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "36.4.1",
          "title": "36.4.1 消息队列在 Agent 平台中的作用",
          "content": "Agent 系统的许多操作具有天然异步性：文档索引、任务执行结果通知、使用统计分析、审计日志收集等。消息队列将这些操作解耦，提升系统的可靠性和响应速度。"
        },
        {
          "id": "36.4.2",
          "title": "36.4.2 消息队列选型",
          "content": "| 方案 | 吞吐量 | 延迟 | 持久化 | 适用场景 |\n|------|--------|------|--------|----------|\n| Kafka | 极高（百万/秒） | 中 | 强 | 审计日志、使用统计 |\n| RabbitMQ | 高（万/秒） | 低 | 可选 | 任务通知、事件驱动 |\n| Redis Streams | 高 | 极低 | 可选 | 轻量级实时通知 |\n| NATS | 极高 | 极低 | JetStream | 高性能内部通信 |\n\n**推荐组合**：Kafka 用于高吞吐量的数据流（审计、统计），RabbitMQ 用于任务分发和事件通知。"
        },
        {
          "id": "36.4.3",
          "title": "36.4.3 事件消息格式设计",
          "content": "统一的事件消息格式是系统可靠性的基础："
        },
        {
          "id": "36.4.4",
          "title": "36.4.4 消息消费者幂等性",
          "content": "生产环境中，消息可能被重复投递。消费者必须实现幂等性："
        }
      ]
    },
    {
      "id": "36.5",
      "title": "36.5 数据层设计",
      "level": 2,
      "content": "Agent 平台的数据层需要支持多种数据模型：结构化数据（用户、配置）、半结构化数据（会话历史）、向量数据（Embedding）、缓存数据（热数据）以及大对象（文档、模型文件）。",
      "subsections": [
        {
          "id": "36.5.1",
          "title": "36.5.1 多引擎数据架构",
          "content": ""
        },
        {
          "id": "36.5.2",
          "title": "36.5.2 数据库设计",
          "content": "**核心表结构（PostgreSQL）：**"
        },
        {
          "id": "36.5.3",
          "title": "36.5.3 缓存策略",
          "content": ""
        },
        {
          "id": "36.5.4",
          "title": "36.5.4 向量数据库设计",
          "content": ""
        }
      ]
    },
    {
      "id": "36.6",
      "title": "36.6 多租户架构",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "36.6.1",
          "title": "36.6.1 租户隔离策略",
          "content": "多租户是 Agent 平台的必经之路。不同的租户（企业客户）需要数据隔离、配置隔离和资源隔离。\n\n\n**推荐方案：共享数据库 + Schema 隔离 + 资源配额**"
        },
        {
          "id": "36.6.2",
          "title": "36.6.2 租户管理实现",
          "content": ""
        },
        {
          "id": "36.6.3",
          "title": "36.6.3 资源配额执行",
          "content": ""
        }
      ]
    },
    {
      "id": "36.7",
      "title": "36.7 架构演进路线",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "36.7.1",
          "title": "36.7.1 演进阶段",
          "content": "**各阶段关键决策点：**\n\n| 阶段 | QPS | 关键变化 | 基础设施 |\n|------|-----|----------|----------|\n| 阶段一 | 0-1K | 单体部署，共享数据库 | 单台服务器 + RDS |\n| 阶段二 | 1K-10K | 核心服务拆分，Redis 缓存 | K8s 集群（3-5 节点） |\n| 阶段三 | 10K-100K | 完全微服务，消息队列 | K8s 集群（10+ 节点）+ 专线 |\n| 阶段四 | 100K+ | 多区域部署，全球路由 | 多云 + CDN + Anycast |"
        },
        {
          "id": "36.7.2",
          "title": "36.7.2 架构决策记录（ADR）模板",
          "content": "每次重大架构决策都应该留下记录："
        },
        {
          "id": "36.7.3",
          "title": "36.7.3 技术债务管理",
          "content": "生产环境中不可避免会产生技术债务。关键是要有意识地管理它："
        }
      ]
    },
    {
      "id": "36.8",
      "title": "36.8 部署架构示例",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "36.8.1",
          "title": "36.8.1 Kubernetes 部署配置",
          "content": ""
        },
        {
          "id": "36.8.2",
          "title": "36.8.2 基础设施即代码（Terraform）",
          "content": ""
        }
      ]
    },
    {
      "id": "36.9",
      "title": "36.9 本章小结",
      "level": 2,
      "content": "本章从全局视角介绍了生产级 Agent 平台的架构设计，涵盖了以下核心内容：\n\n1. **微服务架构**：采用\"先模块化单体，再按需拆分\"的渐进式策略\n2. **API Gateway**：作为统一入口处理认证、限流、路由等横切关注点\n3. **消息队列**：通过异步解耦提升系统的可靠性和响应速度\n4. **数据层设计**：多引擎协作（PostgreSQL + Redis + 向量库 + 对象存储）\n5. **多租户架构**：通过 Schema 隔离和资源配额实现安全的租户隔离\n6. **架构演进**：从单体到微服务再到多区域部署的清晰路线图\n\n架构设计没有银弹。关键是要根据团队规模、业务阶段和技术能力做出合理的选择，并持续演进。下一章我们将深入讨论可扩展性与高可用的具体实现策略。",
      "subsections": []
    }
  ],
  "code_blocks": [
    {
      "id": "code-1",
      "language": "mermaid",
      "description": "在深入具体设计之前，我们先看一个典型的生产级 Agent 平台的整体架构图。",
      "code": "graph TB\n    subgraph Clients[\"客户端层\"]\n        Web[Web 应用]\n        Mobile[移动端]\n        API_Consumer[第三方 API]\n        CLI[CLI 工具]\n    end\n\n    subgraph Edge[\"边缘层\"]\n        CDN[CDN]\n        WAF[WAF 防火墙]\n        LB[负载均衡器]\n    end\n\n    subgraph Gateway[\"API 网关层\"]\n        AG[API Gateway]\n        Auth[认证服务]\n        RateLimit[限流服务]\n        Route[路由服务]\n    end\n\n    subgraph Core[\"核心服务层\"]\n        ChatSvc[Chat 服务]\n        AgentSvc[Agent 编排服务]\n        ToolSvc[Tool 执行服务]\n        RAGSvc[RAG 检索服务]\n        SessionSvc[Session 管理服务]\n        SkillSvc[Skill 管理服务]\n        KBService[知识库服务]\n    end\n\n    subgraph AI[\"AI 推理层\"]\n        Router[模型路由]\n        Cache[推理缓存]\n        LLM_A[LLM Provider A]\n        LLM_B[LLM Provider B]\n        LLM_C[LLM Provider C]\n        EmbeddingSvc[Embedding 服务]\n    end\n\n    subgraph Data[\"数据层\"]\n        PG[(PostgreSQL)]\n        Redis[(Redis Cluster)]\n        VectorDB[(向量数据库)]\n        ObjectStore[(对象存储)]\n        MQ[消息队列]\n    end\n\n    subgraph Infra[\"基础设施层\"]\n        K8s[Kubernetes]\n        Monitor[监控系统]\n        LogCenter[日志中心]\n    end\n\n    Clients --> Edge\n    Edge --> Gateway\n    Gateway --> Core\n    Core --> AI\n    Core --> Data\n    Core --> MQ\n    AI --> LLM_A & LLM_B & LLM_C\n    AI --> EmbeddingSvc\n    Infra -.-> Core\n    Infra -.-> Gateway\n    Infra -.-> Data",
      "section_ref": "36.1.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-2",
      "language": "text",
      "description": "推荐策略：先模块化单体，再按需拆分",
      "code": "阶段一：模块化单体（0-1K QPS）\n┌─────────────────────────────────────┐\n│           Agent Platform             │\n│  ┌─────────┬──────────┬───────────┐ │\n│  │  Chat   │  Agent   │  Session  │ │\n│  │ Module  │  Module  │  Module   │ │\n│  ├─────────┼──────────┼───────────┤ │\n│  │   RAG   │  Skill   │  Tool     │ │\n│  │ Module  │  Module  │  Module   │ │\n│  └─────────┴──────────┴───────────┘ │\n└─────────────────────────────────────┘\n\n阶段二：核心服务拆分（1K-10K QPS）\n┌──────────┐  ┌──────────┐  ┌──────────┐\n│Chat Svc  │  │Agent Svc │  │RAG Svc   │\n└────┬─────┘  └────┬─────┘  └────┬─────┘\n     │              │              │\n┌────┴──────────────┴──────────────┴────┐\n│         共享模块（Session/Skill/Tool）  │\n└───────────────────────────────────────┘\n\n阶段三：完全微服务化（10K+ QPS）\n每个核心能力独立部署，通过消息队列和 API 通信",
      "section_ref": "36.2.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-3",
      "language": "mermaid",
      "description": "同步通信 vs. 异步通信的选择矩阵：",
      "code": "graph LR\n    A[请求类型] --> B{是否需要即时响应？}\n    B -->|是| C{是否可以容忍失败？}\n    B -->|否| D[异步消息]\n    C -->|否| E[同步 HTTP/gRPC]\n    C -->|是| F{调用链深度 > 3？}\n    F -->|是| D\n    F -->|否| E",
      "section_ref": "36.2.2",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-4",
      "language": "protobuf",
      "description": "gRPC 服务定义示例：",
      "code": "// agent_service.proto\nsyntax = \"proto3\";\npackage agent.v1;\n\nservice AgentService {\n  // 执行 Agent 任务\n  rpc ExecuteTask(ExecuteTaskRequest) returns (stream TaskResponse);\n  \n  // 获取任务状态\n  rpc GetTaskStatus(TaskStatusRequest) returns (TaskStatus);\n  \n  // 取消任务\n  rpc CancelTask(CancelTaskRequest) returns (CancelResponse);\n}\n\nmessage ExecuteTaskRequest {\n  string session_id = 1;\n  string user_message = 2;\n  string agent_type = 3;  // \"chat\", \"rag\", \"tool_call\"\n  map<string, string> context = 4;\n  int32 timeout_seconds = 5;\n}\n\nmessage TaskResponse {\n  string task_id = 1;\n  oneof payload {\n    ChunkResponse chunk = 2;\n    ToolCallResponse tool_call = 3;\n    FinalResponse final = 4;\n    ErrorResponse error = 5;\n  }\n}\n\nmessage FinalResponse {\n  string content = 1;\n  int32 total_tokens = 2;\n  float latency_ms = 3;\n  repeated ToolCallResult tool_results = 4;\n}",
      "section_ref": "36.2.2",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-5",
      "language": "yaml",
      "description": "服务注册与发现配置示例（Consul）：",
      "code": "# consul-agent.hcl\ndatacenter = \"agent-platform\"\ndata_dir = \"/opt/consul/data\"\nserver = true\nbootstrap_expect = 3\n\nservices {\n  id = \"agent-service-1\"\n  name = \"agent-service\"\n  tags = [\"v2.1.0\", \"production\"]\n  port = 50051\n  \n  check {\n    id = \"agent-health\"\n    name = \"Agent Service Health\"\n    http = \"http://localhost:8080/health\"\n    interval = \"10s\"\n    timeout = \"3s\"\n  }\n}",
      "section_ref": "36.2.3",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-6",
      "language": "mermaid",
      "description": "API Gateway 是整个 Agent 平台的统一入口，承担着流量管理、认证授权、限流熔断等关键职责。",
      "code": "graph LR\n    Request[客户端请求] --> GW[API Gateway]\n    GW --> Auth{认证检查}\n    Auth -->|通过| RL{限流检查}\n    Auth -->|失败| Reject401[401 Unauthorized]\n    RL -->|通过| Route{路由匹配}\n    RL -->|超限| Reject429[429 Too Many Requests]\n    Route --> SVC[后端服务]\n    \n    subgraph Gateway 内部\n        Auth\n        RL\n        Route\n    end",
      "section_ref": "36.3.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-7",
      "language": "go",
      "description": "自研 Gateway 核心代码示例（Go）：",
      "code": "// gateway/main.go\npackage main\n\nimport (\n    \"context\"\n    \"io\"\n    \"net/http\"\n    \"strings\"\n    \"sync\"\n    \"time\"\n    \"github.com/redis/go-redis/v9\"\n)\n\ntype AgentGateway struct {\n    router          *http.ServeMux\n    authService     AuthService\n    rateLimiter     *TokenRateLimiter\n    circuitBreakers map[string]*CircuitBreaker\n    config          *GatewayConfig\n}\n\ntype GatewayConfig struct {\n    ListenAddr      string        `yaml:\"listen_addr\"`\n    AuthServiceAddr string        `yaml:\"auth_service_addr\"`\n    Routes          []RouteConfig `yaml:\"routes\"`\n    RateLimits      RateLimitConfig `yaml:\"rate_limits\"`\n}\n\ntype RouteConfig struct {\n    Path        string `yaml:\"path\"`\n    Upstream    string `yaml:\"upstream\"`\n    StripPrefix bool   `yaml:\"strip_prefix\"`\n    Timeout     int    `yaml:\"timeout_ms\"`\n    MaxRetries  int    `yaml:\"max_retries\"`\n}\n\nfunc (gw *AgentGateway) handleChat(w http.ResponseWriter, r *http.Request) {\n    start := time.Now()\n    \n    // 1. 认证\n    userID, _, err := gw.authService.Authenticate(r)\n    if err != nil {\n        http.Error(w, \"Unauthorized\", http.StatusUnauthorized)\n        return\n    }\n    \n    // 2. Token 级别限流\n    if !gw.rateLimiter.Allow(userID, \"chat\") {\n        w.Header().Set(\"Retry-After\", \"60\")\n        http.Error(w, \"Rate limit exceeded\", http.StatusTooManyRequests)\n        return\n    }\n    \n    // 3. 熔断检查\n    cb := gw.circuitBreakers[\"agent-service\"]\n    if !cb.Allow() {\n        http.Error(w, \"Service unavailable\", http.StatusServiceUnavailable)\n        return\n    }\n    \n    // 4. 
转发请求（支持 SSE 流式）\n    gw.proxyChatRequest(w, r, userID)\n    \n    _ = time.Since(start).Seconds() // 记录延迟指标\n}\n\n// proxyChatRequest 代理聊天请求，支持 SSE 流式传输\nfunc (gw *AgentGateway) proxyChatRequest(w http.ResponseWriter, r *http.Request, userID string) {\n    upstream := gw.config.Routes[0].Upstream\n    \n    ctx, cancel := context.WithTimeout(r.Context(), 5*time.Minute)\n    defer cancel()\n    \n    proxyReq, _ := http.NewRequestWithContext(ctx, r.Method, upstream+r.URL.Path, r.Body)\n    proxyReq.Header = r.Header.Clone()\n    proxyReq.Header.Set(\"X-User-ID\", userID)\n    \n    resp, err := http.DefaultClient.Do(proxyReq)\n    if err != nil {\n        http.Error(w, \"Upstream error\", http.StatusBadGateway)\n        return\n    }\n    defer resp.Body.Close()\n    \n    contentType := resp.Header.Get(\"Content-Type\")\n    if strings.Contains(contentType, \"text/event-stream\") {\n        w.Header().Set(\"Content-Type\", \"text/event-stream\")\n        w.Header().Set(\"Cache-Control\", \"no-cache\")\n        w.Header().Set(\"Connection\", \"keep-alive\")\n        \n        flusher, _ := w.(http.Flusher)\n        buf := make([]byte, 4096)\n        for {\n            n, err := resp.Body.Read(buf)\n            if n > 0 {\n                w.Write(buf[:n])\n                flusher.Flush()\n            }\n            if err == io.EOF {\n                break\n            }\n            if err != nil {\n                break\n            }\n        }\n    } else {\n        w.Header().Set(\"Content-Type\", contentType)\n        io.Copy(w, resp.Body)\n    }\n}",
      "section_ref": "36.3.2",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-8",
      "language": "yaml",
      "description": "Agent 平台的限流需要考虑多个维度：",
      "code": "# rate_limit_config.yaml\nrate_limits:\n  global:\n    requests_per_second: 10000\n    burst: 15000\n  \n  user:\n    free_tier:\n      requests_per_minute: 20\n      tokens_per_day: 100000\n      max_concurrent: 2\n    pro_tier:\n      requests_per_minute: 100\n      tokens_per_day: 2000000\n      max_concurrent: 10\n    enterprise_tier:\n      requests_per_minute: 500\n      tokens_per_day: -1  # 无限制\n      max_concurrent: 50\n  \n  api:\n    /api/v1/chat:\n      requests_per_minute: 60\n      token_cost_multiplier: 1.0\n    /api/v1/agent/execute:\n      requests_per_minute: 30\n      token_cost_multiplier: 1.5\n    /api/v1/rag/search:\n      requests_per_minute: 120\n      token_cost_multiplier: 0.3",
      "section_ref": "36.3.3",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-9",
      "language": "python",
      "description": "Token 级别限流实现（Python + Redis）：",
      "code": "# rate_limiter.py\nimport time\nimport redis\n\nclass TokenAwareRateLimiter:\n    \"\"\"基于 Token 消耗的限流器\"\"\"\n    \n    def __init__(self, redis_client: redis.Redis):\n        self.redis = redis_client\n        self._lua_script = \"\"\"\n        local key = KEYS[1]\n        local limit = tonumber(ARGV[1])\n        local window = tonumber(ARGV[2])\n        local cost = tonumber(ARGV[3])\n        local now = tonumber(ARGV[4])\n        \n        local window_start = now - window\n        redis.call('ZREMRANGEBYSCORE', key, 0, window_start)\n        local current = tonumber(redis.call('ZCARD', key))\n        \n        if current + cost <= limit then\n            redis.call('ZADD', key, now, now .. ':' .. cost)\n            redis.call('EXPIRE', key, window)\n            return {1, limit - current - cost, now + window}\n        else\n            local oldest = tonumber(\n                redis.call('ZRANGE', key, 0, 0, 'WITHSCORES')[2])\n            local retry_after = oldest + window - now\n            return {0, 0, now + window, retry_after}\n        end\n        \"\"\"\n    \n    def check(self, user_id: str, token_cost: int = 1,\n              limit_type: str = \"minute\") -> dict:\n        \"\"\"检查请求是否被允许\"\"\"\n        configs = {\n            \"minute\": (f\"rl:{user_id}:minute\", 60),\n            \"day\":    (f\"rl:{user_id}:day\", 86400),\n        }\n        key, window = configs.get(limit_type, configs[\"minute\"])\n        \n        # 根据用户等级获取不同限额\n        limit = self._get_user_limit(user_id, limit_type)\n        \n        result = self.redis.eval(\n            self._lua_script, 1, key, limit, window,\n            token_cost, time.time()\n        )\n        return {\n            \"allowed\": bool(result[0]),\n            \"remaining\": int(result[1]),\n            \"reset_at\": float(result[2]),\n            \"retry_after\": float(result[3]) if len(result) > 3 else None\n        }\n    \n    def _get_user_limit(self, user_id: str, limit_type: 
str) -> int:\n        \"\"\"获取用户限额（从配置或数据库中读取）\"\"\"\n        tier = self._get_user_tier(user_id)\n        limits = {\n            \"free\":       {\"minute\": 20, \"day\": 100000},\n            \"pro\":        {\"minute\": 100, \"day\": 2000000},\n            \"enterprise\": {\"minute\": 500, \"day\": 999999999},\n        }\n        return limits.get(tier, limits[\"free\"]).get(limit_type, 60)",
      "section_ref": "36.3.3",
      "runnable": true,
      "dependencies": [
        "redis"
      ]
    },
    {
      "id": "code-10",
      "language": "mermaid",
      "description": "Agent 系统的许多操作具有天然异步性：文档索引、任务执行结果通知、使用统计分析、审计日志收集等。消息队列将这些操作解耦，提升系统的可靠性和响应速度。",
      "code": "graph TB\n    subgraph Producers[\"生产者\"]\n        ChatSvc[Chat Service]\n        AgentSvc[Agent Service]\n        RAGSvc[RAG Service]\n    end\n    \n    subgraph MQ[\"消息队列\"]\n        T1[topic: task.result]\n        T2[topic: document.index]\n        T3[topic: usage.stats]\n        T4[topic: audit.log]\n        T5[topic: notification]\n    end\n    \n    subgraph Consumers[\"消费者\"]\n        NotifSvc[通知服务]\n        Analytics[分析服务]\n        IndexSvc[索引服务]\n        AuditSvc[审计服务]\n    end\n    \n    ChatSvc --> T1 & T3 & T5\n    AgentSvc --> T1 & T3 & T4\n    RAGSvc --> T2 & T3\n    \n    T1 --> NotifSvc & Analytics\n    T2 --> IndexSvc\n    T3 --> Analytics\n    T4 --> AuditSvc\n    T5 --> NotifSvc",
      "section_ref": "36.4.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-11",
      "language": "json",
      "description": "统一的事件消息格式是系统可靠性的基础：",
      "code": "{\n  \"event_id\": \"evt_8f7d2a1b3c4d\",\n  \"event_type\": \"agent.task.completed\",\n  \"event_version\": \"1.0\",\n  \"timestamp\": \"2026-04-01T00:00:00.000Z\",\n  \"source\": \"agent-service\",\n  \"correlation_id\": \"corr_5e6f7a8b9c0d\",\n  \"data\": {\n    \"task_id\": \"task_abc123\",\n    \"session_id\": \"sess_def456\",\n    \"user_id\": \"user_ghi789\",\n    \"agent_type\": \"rag_agent\",\n    \"status\": \"completed\",\n    \"result\": {\n      \"response\": \"根据检索结果...\",\n      \"sources\": [\"doc_1\", \"doc_3\"],\n      \"tokens_used\": 1523,\n      \"latency_ms\": 2340\n    }\n  },\n  \"metadata\": {\n    \"schema_version\": \"1.0\",\n    \"partition_key\": \"user_ghi789\",\n    \"retry_count\": 0\n  }\n}",
      "section_ref": "36.4.3",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-12",
      "language": "python",
      "description": "生产环境中，消息可能被重复投递。消费者必须实现幂等性：",
      "code": "import hashlib\nimport json\nimport redis\nfrom datetime import datetime\nfrom typing import Any, Callable\n\nclass IdempotentConsumer:\n    \"\"\"幂等消息消费者\"\"\"\n    \n    def __init__(self, redis_client: redis.Redis, handler: Callable):\n        self.redis = redis_client\n        self.handler = handler\n    \n    def consume(self, message: dict) -> dict:\n        event_id = message[\"event_id\"]\n        msg_hash = self._compute_hash(message)\n        \n        # 检查是否已处理\n        if self._is_processed(event_id, msg_hash):\n            return {\"status\": \"duplicate\", \"event_id\": event_id}\n        \n        # 获取处理锁（防止并发重复处理）\n        lock_key = f\"lock:msg:{event_id}\"\n        if not self.redis.set(lock_key, \"1\", nx=True, ex=300):\n            return {\"status\": \"locked\", \"event_id\": event_id}\n        \n        try:\n            # Double check\n            if self._is_processed(event_id, msg_hash):\n                return {\"status\": \"duplicate\", \"event_id\": event_id}\n            \n            result = self.handler(message)\n            self._mark_processed(event_id, msg_hash, result)\n            return {\"status\": \"processed\", \"result\": result}\n        except Exception as e:\n            return {\"status\": \"error\", \"error\": str(e)}\n        finally:\n            self.redis.delete(lock_key)\n    \n    def _compute_hash(self, message: dict) -> str:\n        canonical = json.dumps(message[\"data\"], sort_keys=True)\n        return hashlib.sha256(canonical.encode()).hexdigest()[:16]\n    \n    def _is_processed(self, event_id: str, msg_hash: str) -> bool:\n        key = f\"processed:msg:{event_id}\"\n        stored = self.redis.hget(key, \"hash\")\n        return stored == msg_hash.encode() if stored else False\n    \n    def _mark_processed(self, event_id, msg_hash, result):\n        key = f\"processed:msg:{event_id}\"\n        self.redis.hset(key, mapping={\n            \"hash\": msg_hash,\n            \"processed_at\": 
datetime.utcnow().isoformat(),\n            \"result_summary\": str(result)[:200]\n        })\n        self.redis.expire(key, 86400 * 7)",
      "section_ref": "36.4.4",
      "runnable": true,
      "dependencies": [
        "redis"
      ]
    },
    {
      "id": "code-13",
      "language": "mermaid",
      "description": "Agent 平台的数据层需要支持多种数据模型：结构化数据（用户、配置）、半结构化数据（会话历史）、向量数据（Embedding）、缓存数据（热数据）以及大对象（文档、模型文件）。",
      "code": "graph TB\n    subgraph Services[\"服务层\"]\n        S1[Chat Service]\n        S2[Agent Service]\n        S3[RAG Service]\n    end\n    \n    subgraph DataLayer[\"数据层\"]\n        subgraph Cache[\"缓存层\"]\n            R1[(Redis Cluster<br/>会话/限流)]\n            R2[(本地缓存<br/>热点配置)]\n        end\n        subgraph Relational[\"关系型\"]\n            PG[(PostgreSQL<br/>用户/配置/审计)]\n            PGR[(只读副本<br/>报表查询)]\n        end\n        subgraph Vector[\"向量存储\"]\n            VDB[(Milvus/Qdrant<br/>Embedding)]\n        end\n        subgraph Document[\"文档存储\"]\n            ES[(Elasticsearch<br/>全文检索)]\n            S3Store[(MinIO/S3<br/>原始文档)]\n        end\n    end\n    \n    S1 --> R1 & PG\n    S2 --> R1 & PG & VDB\n    S3 --> VDB & ES & S3Store\n    PG <--> PGR",
      "section_ref": "36.5.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-14",
      "language": "sql",
      "description": "核心表结构（PostgreSQL）：",
      "code": "-- 用户表\nCREATE TABLE users (\n    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),\n    email VARCHAR(255) UNIQUE NOT NULL,\n    display_name VARCHAR(100),\n    tier VARCHAR(20) DEFAULT 'free'\n        CHECK (tier IN ('free', 'pro', 'enterprise')),\n    created_at TIMESTAMPTZ DEFAULT NOW(),\n    updated_at TIMESTAMPTZ DEFAULT NOW(),\n    last_login_at TIMESTAMPTZ,\n    is_active BOOLEAN DEFAULT true\n);\n\n-- 租户表（多租户支持）\nCREATE TABLE tenants (\n    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),\n    name VARCHAR(100) NOT NULL,\n    slug VARCHAR(50) UNIQUE NOT NULL,\n    plan VARCHAR(20) DEFAULT 'starter',\n    config JSONB DEFAULT '{}',\n    rate_limit_config JSONB DEFAULT '{}',\n    created_at TIMESTAMPTZ DEFAULT NOW()\n);\n\n-- 会话表\nCREATE TABLE sessions (\n    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),\n    user_id UUID NOT NULL REFERENCES users(id),\n    tenant_id UUID REFERENCES tenants(id),\n    title VARCHAR(500),\n    agent_type VARCHAR(50) DEFAULT 'chat',\n    metadata JSONB DEFAULT '{}',\n    created_at TIMESTAMPTZ DEFAULT NOW(),\n    updated_at TIMESTAMPTZ DEFAULT NOW(),\n    is_archived BOOLEAN DEFAULT false\n);\n\n-- 消息表（按时间范围分区）\nCREATE TABLE messages (\n    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),\n    session_id UUID NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,\n    role VARCHAR(20) NOT NULL\n        CHECK (role IN ('user', 'assistant', 'system', 'tool')),\n    content TEXT NOT NULL,\n    token_count INTEGER DEFAULT 0,\n    model VARCHAR(50),\n    metadata JSONB DEFAULT '{}',\n    created_at TIMESTAMPTZ DEFAULT NOW(),\n    is_starred BOOLEAN DEFAULT false\n) PARTITION BY RANGE (created_at);\n\n-- 按月分区\nCREATE TABLE messages_2026_04 PARTITION OF messages\n    FOR VALUES FROM ('2026-04-01') TO ('2026-05-01');\nCREATE TABLE messages_2026_05 PARTITION OF messages\n    FOR VALUES FROM ('2026-05-01') TO ('2026-06-01');\n\n-- 关键索引\nCREATE INDEX idx_messages_session\n    ON messages (session_id, created_at DESC);\nCREATE INDEX 
idx_sessions_user\n    ON sessions (user_id, updated_at DESC);\n\n-- Token 使用统计表（分区）\nCREATE TABLE token_usage (\n    id BIGSERIAL PRIMARY KEY,\n    user_id UUID NOT NULL REFERENCES users(id),\n    tenant_id UUID REFERENCES tenants(id),\n    session_id UUID REFERENCES sessions(id),\n    model VARCHAR(50) NOT NULL,\n    prompt_tokens INTEGER NOT NULL,\n    completion_tokens INTEGER NOT NULL,\n    total_tokens INTEGER NOT NULL,\n    cost_usd DECIMAL(10, 6),\n    created_at TIMESTAMPTZ DEFAULT NOW()\n) PARTITION BY RANGE (created_at);\n\nCREATE INDEX idx_token_usage_user_date\n    ON token_usage (user_id, created_at);\nCREATE INDEX idx_token_usage_tenant_date\n    ON token_usage (tenant_id, created_at);",
      "section_ref": "36.5.2",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-15",
      "language": "python",
      "description": "",
      "code": "# cache_manager.py\nimport json\nimport hashlib\nimport redis\nfrom dataclasses import dataclass\nfrom functools import wraps\nfrom typing import Optional\n\n@dataclass\nclass CacheConfig:\n    session_ttl: int = 3600          # 会话缓存 1小时\n    context_ttl: int = 1800          # 上下文缓存 30分钟\n    embedding_ttl: int = 86400 * 7   # Embedding 缓存 7天\n    config_ttl: int = 300            # 配置缓存 5分钟\n    tool_result_ttl: int = 600       # 工具结果缓存 10分钟\n\nclass AgentCacheManager:\n    \"\"\"Agent 平台统一缓存管理\"\"\"\n    \n    def __init__(self, redis_client: redis.Redis, config: CacheConfig):\n        self.redis = redis_client\n        self.config = config\n    \n    def get_session_context(self, session_id: str) -> Optional[dict]:\n        \"\"\"获取会话上下文\"\"\"\n        key = f\"session:ctx:{session_id}\"\n        data = self.redis.get(key)\n        return json.loads(data) if data else None\n    \n    def set_session_context(self, session_id: str, context: dict):\n        \"\"\"设置会话上下文\"\"\"\n        key = f\"session:ctx:{session_id}\"\n        self.redis.setex(\n            key, self.config.session_ttl, json.dumps(context)\n        )\n    \n    def get_embedding_cache(self, text: str) -> Optional[list]:\n        \"\"\"获取 Embedding 缓存\"\"\"\n        h = hashlib.md5(text.encode()).hexdigest()[:16]\n        data = self.redis.get(f\"embedding:{h}\")\n        return json.loads(data) if data else None\n    \n    def set_embedding_cache(self, text: str, embedding: list):\n        \"\"\"设置 Embedding 缓存\"\"\"\n        h = hashlib.md5(text.encode()).hexdigest()[:16]\n        self.redis.setex(\n            f\"embedding:{h}\",\n            self.config.embedding_ttl,\n            json.dumps(embedding)\n        )\n    \n    def cached(self, key_prefix: str, ttl: Optional[int] = None):\n        \"\"\"缓存装饰器\"\"\"\n        def decorator(func):\n            @wraps(func)\n            def wrapper(*args, **kwargs):\n                arg_str = json.dumps(args, default=str)\n              
  kwarg_str = json.dumps(kwargs, sort_keys=True, default=str)\n                raw = f\"{key_prefix}:{func.__name__}:{arg_str}:{kwarg_str}\"\n                cache_key = f\"cache:{hashlib.md5(raw.encode()).hexdigest()[:24]}\"\n                \n                cached = self.redis.get(cache_key)\n                if cached:\n                    return json.loads(cached)\n                \n                result = func(*args, **kwargs)\n                if result is not None:\n                    self.redis.setex(\n                        cache_key, ttl or self.config.config_ttl,\n                        json.dumps(result)\n                    )\n                return result\n            return wrapper\n        return decorator",
      "section_ref": "36.5.3",
      "runnable": true,
      "dependencies": [
        "redis"
      ]
    },
    {
      "id": "code-16",
      "language": "python",
      "description": "",
      "code": "# vector_store.py\nfrom typing import List, Optional\n\n@dataclass\nclass VectorSearchResult:\n    id: str\n    score: float\n    metadata: dict\n    content: str\n\nclass VectorStoreManager:\n    \"\"\"向量数据库管理（基于 Milvus）\"\"\"\n    \n    def create_collection(self, name: str, dim: int = 1536):\n        \"\"\"创建向量集合\"\"\"\n        from pymilvus import (\n            CollectionSchema, FieldSchema, DataType, Collection\n        )\n        fields = [\n            FieldSchema(name=\"id\", dtype=DataType.VARCHAR,\n                        is_primary=True, max_length=64),\n            FieldSchema(name=\"embedding\", dtype=DataType.FLOAT_VECTOR, dim=dim),\n            FieldSchema(name=\"content\", dtype=DataType.VARCHAR, max_length=65535),\n            FieldSchema(name=\"metadata\", dtype=DataType.JSON),\n            FieldSchema(name=\"tenant_id\", dtype=DataType.VARCHAR, max_length=64),\n        ]\n        schema = CollectionSchema(fields=fields, description=name)\n        collection = Collection(name=name, schema=schema)\n        \n        # IVF_FLAT 索引适合中等规模\n        collection.create_index(\n            field_name=\"embedding\",\n            index_params={\n                \"index_type\": \"IVF_FLAT\",\n                \"metric_type\": \"COSINE\",\n                \"params\": {\"nlist\": 1024}\n            }\n        )\n        return collection\n    \n    def hybrid_search(self, query_embedding: List[float],\n                      tenant_id: str, top_k: int = 10,\n                      vector_weight: float = 0.7,\n                      keyword: Optional[str] = None\n                      ) -> List[VectorSearchResult]:\n        \"\"\"混合检索：向量相似度 + 关键词匹配（RRF 融合）\"\"\"\n        vector_results = self._vector_search(\n            query_embedding, tenant_id, top_k * 2\n        )\n        keyword_results = (\n            self._keyword_search(keyword, tenant_id, top_k * 2)\n            if keyword else []\n        )\n        return self._reciprocal_rank_fusion(\n 
           vector_results, keyword_results, top_k, vector_weight\n        )\n    \n    def _reciprocal_rank_fusion(self, vector_results, keyword_results,\n                                 top_k, vector_weight):\n        \"\"\"RRF 结果融合\"\"\"\n        scores = {}\n        k = 60\n        for rank, r in enumerate(vector_results):\n            scores.setdefault(r.id, 0)\n            scores[r.id] += vector_weight / (k + rank + 1)\n        for rank, r in enumerate(keyword_results):\n            scores.setdefault(r.id, 0)\n            scores[r.id] += (1 - vector_weight) / (k + rank + 1)\n        \n        all_results = {r.id: r for r in vector_results + keyword_results}\n        sorted_ids = sorted(scores, key=scores.get, reverse=True)[:top_k]\n        return [all_results[i] for i in sorted_ids if i in all_results]",
      "section_ref": "36.5.4",
      "runnable": true,
      "dependencies": []
    },
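    {
      "id": "code-17",
      "language": "text",
      "description": "Multi-tenancy is unavoidable for an Agent platform. Different tenants (enterprise customers) need data isolation, configuration isolation, and resource isolation.",
      "code": "Isolation spectrum:\nLeast isolation  <------------------------------------------------->  Most isolation\n\nShared database    Shared schema          Separate schema    Separate instance\nShared tables      Row-level isolation    Namespace          Physical isolation\nLowest cost      <------------------------------------------------->  Highest cost\nSimplest ops     <------------------------------------------------->  Most complex ops",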
      "section_ref": "36.6.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-18",
      "language": "python",
      "description": "Recommended approach: shared database + schema isolation + resource quotas",
      "code": "# tenant_manager.py\nfrom enum import Enum\n\nclass TenantPlan(Enum):\n    STARTER = \"starter\"\n    PROFESSIONAL = \"professional\"\n    ENTERPRISE = \"enterprise\"\n\nPLAN_LIMITS = {\n    TenantPlan.STARTER: {\n        \"max_sessions\": 100,\n        \"max_tokens_per_day\": 500_000,\n        \"max_documents\": 1_000,\n        \"max_concurrent\": 3,\n        \"allowed_models\": [\"gpt-4o-mini\", \"claude-3-haiku\"],\n    },\n    TenantPlan.PROFESSIONAL: {\n        \"max_sessions\": 10_000,\n        \"max_tokens_per_day\": 10_000_000,\n        \"max_documents\": 100_000,\n        \"max_concurrent\": 20,\n        \"allowed_models\": [\"gpt-4o\", \"claude-3-sonnet\", \"claude-3-haiku\"],\n    },\n    TenantPlan.ENTERPRISE: {\n        \"max_sessions\": -1,       # unlimited\n        \"max_tokens_per_day\": -1,\n        \"max_documents\": -1,\n        \"max_concurrent\": 100,\n        \"allowed_models\": [\"*\"],   # all models\n    },\n}\n\nclass TenantManager:\n    \"\"\"Multi-tenant manager.\"\"\"\n\n    def check_quota(self, tenant_id: str, resource: str) -> bool:\n        \"\"\"Return True while the tenant is under its numeric quota for `resource`.\"\"\"\n        plan = self._get_tenant_plan(tenant_id)\n        limits = PLAN_LIMITS[plan]\n        limit = limits[resource]\n        if limit == -1:\n            return True  # unlimited\n        usage = self._get_current_usage(tenant_id, resource)\n        return usage < limit\n\n    def enforce_tenant_isolation(self, query, tenant_id: str):\n        \"\"\"Force a tenant_id filter onto the query.\"\"\"\n        # Every query that touches tenant data must filter on tenant_id.\n        # Fail closed: a query we cannot filter must never run unfiltered.\n        # Assumes a query builder whose where() accepts keyword filters.\n        if not hasattr(query, 'where'):\n            raise TypeError('query has no where(); refusing to run it without a tenant filter')\n        return query.where(tenant_id=tenant_id)\n\n    def _get_tenant_plan(self, tenant_id: str) -> TenantPlan:\n        \"\"\"Placeholder: look up the tenant's plan in your tenant store.\"\"\"\n        raise NotImplementedError\n\n    def _get_current_usage(self, tenant_id: str, resource: str) -> int:\n        \"\"\"Placeholder: read current usage for `resource` from your metering store.\"\"\"\n        raise NotImplementedError",
      "section_ref": "36.6.2",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-19",
      "language": "yaml",
      "description": "",
      "code": "# tenant_resource_quota.yaml\ntenant_quotas:\n  - tenant_id: \"tenant_acme\"\n    plan: \"professional\"\n    resources:\n      sessions:\n        limit: 10000\n        current: 4523\n        alert_threshold: 0.8  # alert at 80% of the limit\n      tokens_daily:\n        limit: 10000000\n        current: 6500000\n        alert_threshold: 0.9\n      concurrent_requests:\n        limit: 20\n        current: 8\n      knowledge_base:\n        documents_limit: 100000\n        storage_limit_gb: 50\n        current_documents: 23456\n        current_storage_gb: 12.3",
      "section_ref": "36.6.3",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-20",
      "language": "mermaid",
      "description": "",
      "code": "graph LR\n    A[Stage 1<br/>Monolith<br/>0-1K QPS] --> B[Stage 2<br/>Modular split<br/>1K-10K QPS]\n    B --> C[Stage 3<br/>Microservices<br/>10K-100K QPS]\n    C --> D[Stage 4<br/>Multi-region deployment<br/>100K+ QPS]\n\n    style A fill:#e1f5fe\n    style B fill:#fff3e0\n    style C fill:#fce4ec\n    style D fill:#e8f5e9",
      "section_ref": "36.7.1",
      "runnable": false,
      "dependencies": []
    },
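    {
      "id": "code-21",
      "language": "markdown",
      "description": "Every major architecture decision should leave a record:",
      "code": "# ADR-001: Use gRPC as the internal service-to-service protocol\n\n## Status\nAccepted\n\n## Context\nService-to-service communication in the Agent platform needs a high-performance, strongly typed protocol.\n\n## Decision\nUse gRPC + Protocol Buffers for internal service-to-service communication.\n\n## Rationale\n1. Performance: estimated 5-10x faster than HTTP/JSON thanks to Protobuf serialization (workload dependent; benchmark before relying on this figure)\n2. Streaming: native Server Streaming support (maps well onto SSE responses)\n3. Strong typing: Protobuf provides compile-time type checking\n4. Code generation: client code is generated automatically for multiple languages\n\n## Consequences\n- Positive: better performance, type safety, developer productivity\n- Negative: harder to debug than REST; browsers cannot call gRPC directly\n- Mitigation: keep HTTP/REST for the external API\n\n## Decided by\n@architecture-team 2026-03-15",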
      "section_ref": "36.7.2",
      "runnable": false,
      "dependencies": []
    },
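    {
      "id": "code-22",
      "language": "markdown",
      "description": "Technical debt is unavoidable in production. The key is to manage it deliberately:",
      "code": "# Technical Debt Board\n\n## High priority\n- [ ] Synchronous call chain in Agent Service is too long (max depth 7) → introduce asynchronous orchestration\n- [ ] Messages table has no partitioning strategy → partition by month\n\n## Medium priority\n- [ ] Cache invalidation strategies are inconsistent → unify under CacheManager\n- [ ] Some services lack health check endpoints → adopt a common health check framework\n\n## Low priority\n- [ ] API versioning is not standardized → define a versioning policy\n- [ ] Log formats are not fully consistent → adopt a structured logging standard",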
      "section_ref": "36.7.3",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-23",
      "language": "yaml",
      "description": "",
      "code": "# k8s/agent-platform-deployment.yaml\napiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: agent-service\n  namespace: agent-platform\n  labels:\n    app: agent-service\n    version: v2.1.0\nspec:\n  replicas: 3\n  strategy:\n    type: RollingUpdate\n    rollingUpdate:\n      maxSurge: 1\n      maxUnavailable: 0  # zero downtime: no pod is removed before its replacement is ready\n  selector:\n    matchLabels:\n      app: agent-service\n  template:\n    metadata:\n      labels:\n        app: agent-service\n        version: v2.1.0\n      annotations:\n        prometheus.io/scrape: \"true\"\n        prometheus.io/port: \"9090\"\n    spec:\n      containers:\n      - name: agent-service\n        image: registry.example.com/agent-service:v2.1.0\n        ports:\n        - containerPort: 8080\n          name: http\n        - containerPort: 9090\n          name: metrics\n        resources:\n          requests:\n            cpu: \"500m\"\n            memory: \"512Mi\"\n          limits:\n            cpu: \"2000m\"\n            memory: \"2Gi\"\n        env:\n        - name: DATABASE_URL\n          valueFrom:\n            secretKeyRef:\n              name: agent-secrets\n              key: database-url\n        - name: REDIS_URL\n          valueFrom:\n            secretKeyRef:\n              name: agent-secrets\n              key: redis-url\n        - name: JWT_SECRET\n          valueFrom:\n            secretKeyRef:\n              name: agent-secrets\n              key: jwt-secret\n        livenessProbe:\n          httpGet:\n            path: /health/live\n            port: 8080\n          initialDelaySeconds: 30\n          periodSeconds: 10\n        readinessProbe:\n          httpGet:\n            path: /health/ready\n            port: 8080\n          initialDelaySeconds: 5\n          periodSeconds: 5\n        volumeMounts:\n        - name: config\n          mountPath: /app/config\n      volumes:\n      - name: config\n        configMap:\n          name: agent-config\n---\napiVersion: v1\nkind: Service\nmetadata:\n  name: agent-service\n  namespace: agent-platform\nspec:\n  selector:\n    app: agent-service\n  ports:\n  - port: 80\n    targetPort: 8080\n  type: ClusterIP",
      "section_ref": "36.8.1",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-24",
      "language": "hcl",
      "description": "",
      "code": "# terraform/main.tf\nterraform {\n  required_version = \">= 1.5\"\n  required_providers {\n    # Required by the aws_db_instance resource below (version constraint is an assumption)\n    aws = {\n      source  = \"hashicorp/aws\"\n      version = \"~> 5.0\"\n    }\n    kubernetes = {\n      source  = \"hashicorp/kubernetes\"\n      version = \"~> 2.23\"\n    }\n    rediscloud = {\n      source  = \"RedisLabs/rediscloud\"\n      version = \"~> 0.13\"\n    }\n  }\n}\n\n# Redis cluster (schematic: verify block and attribute names\n# against the rediscloud provider version you actually use)\nresource \"rediscloud_subscription\" \"agent_platform\" {\n  name            = \"agent-platform-cache\"\n  payment_method  = \"credit-card\"\n  memory_storage  = \"ram\"\n  redis_version   = \"7.2\"\n\n  plan {\n    memory_limit_in_gb = 10\n    quantity           = 3  # 3-node cluster\n    throughput_measurement {\n      by = \"operations-per-second\"\n      value = 50000\n    }\n  }\n}\n\n# PostgreSQL database (the subnet group and security group are defined elsewhere)\nresource \"aws_db_instance\" \"agent_platform\" {\n  identifier     = \"agent-platform-db\"\n  engine         = \"postgres\"\n  engine_version = \"16.1\"\n  instance_class = \"db.r6g.xlarge\"\n\n  allocated_storage     = 500\n  max_allocated_storage = 1000\n  storage_type          = \"gp3\"\n\n  multi_az               = true\n  db_subnet_group_name   = aws_db_subnet_group.agent.name\n  vpc_security_group_ids = [aws_security_group.db.id]\n\n  backup_retention_period = 30\n  backup_window          = \"03:00-04:00\"\n  maintenance_window     = \"Mon:04:00-Mon:05:00\"\n\n  skip_final_snapshot = false\n  final_snapshot_identifier = \"agent-platform-final\"\n\n  tags = {\n    Project = \"agent-platform\"\n    Env     = \"production\"\n  }\n}",
      "section_ref": "36.8.2",
      "runnable": false,
      "dependencies": []
    }
  ],
  "tables": [
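    {
      "headers": [
        "Service",
        "Responsibility",
        "When to split",
        "Database strategy"
      ],
      "data": [
        [
          "Chat Service",
          "Handles user conversation requests",
          "Independent from launch",
          "Dedicated schema"
        ],
        [
          "Agent Orchestrator",
          "Agent orchestration and task scheduling",
          "QPS > 500",
          "Dedicated schema"
        ],
        [
          "RAG Service",
          "Retrieval-augmented generation",
          "When the knowledge base feature launches",
          "Shared vector store"
        ],
        [
          "Tool Execution",
          "Executes tool calls",
          "More than 20 tools",
          "Dedicated schema"
        ],
        [
          "Session Manager",
          "Session state management",
          "When state must be shared across services",
          "Redis + PG"
        ],
        [
          "Skill Registry",
          "Skill registration and management",
          "When the skill system launches",
          "Dedicated schema"
        ],
        [
          "Knowledge Base",
          "Knowledge base management",
          "When RAG launches",
          "Dedicated schema"
        ],
        [
          "Model Router",
          "LLM model routing",
          "When multiple models are supported",
          "Configuration store"
        ]
      ]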
    },
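    {
      "headers": [
        "Option",
        "Best suited for",
        "Strengths",
        "Weaknesses"
      ],
      "data": [
        [
          "Istio",
          "Large clusters (>100 nodes)",
          "Full-featured",
          "Complex to operate"
        ],
        [
          "Linkerd",
          "Small to mid-size clusters",
          "Lightweight, easy to use",
          "Fewer features"
        ],
        [
          "No Service Mesh",
          "Teams of < 10 people",
          "Simple and direct",
          "No traffic management"
        ]
      ]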
    },
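    {
      "headers": [
        "Option",
        "Language",
        "Performance",
        "Extensibility",
        "Operational complexity"
      ],
      "data": [
        [
          "Kong + Lua",
          "Lua",
          "High",
          "High (rich plugin ecosystem)",
          "Medium"
        ],
        [
          "APISIX",
          "Lua",
          "Very high",
          "High (built on Nginx)",
          "Medium"
        ],
        [
          "Tyk",
          "Go",
          "High",
          "Medium",
          "Low"
        ],
        [
          "Custom-built Gateway",
          "Go/Rust",
          "Highest",
          "Fully under your control",
          "High"
        ]
      ]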
    },
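    {
      "headers": [
        "Option",
        "Throughput",
        "Latency",
        "Persistence",
        "Typical use"
      ],
      "data": [
        [
          "Kafka",
          "Very high (millions of messages/s)",
          "Medium",
          "Strong",
          "Audit logs, usage statistics"
        ],
        [
          "RabbitMQ",
          "High (tens of thousands of messages/s)",
          "Low",
          "Optional",
          "Task notifications, event-driven workflows"
        ],
        [
          "Redis Streams",
          "High",
          "Very low",
          "Optional",
          "Lightweight real-time notifications"
        ],
        [
          "NATS",
          "Very high",
          "Very low",
          "Via JetStream",
          "High-performance internal communication"
        ]
      ]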
    },
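    {
      "headers": [
        "Stage",
        "QPS",
        "Key changes",
        "Infrastructure"
      ],
      "data": [
        [
          "Stage 1",
          "0-1K",
          "Monolithic deployment, shared database",
          "Single server + RDS"
        ],
        [
          "Stage 2",
          "1K-10K",
          "Core services split out, Redis cache",
          "K8s cluster (3-5 nodes)"
        ],
        [
          "Stage 3",
          "10K-100K",
          "Full microservices, message queue",
          "K8s cluster (10+ nodes) + dedicated network link"
        ],
        [
          "Stage 4",
          "100K+",
          "Multi-region deployment, global routing",
          "Multi-cloud + CDN + Anycast"
        ]
      ]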
    }
  ],
  "key_takeaways": [],
  "common_pitfalls": [],
  "related_chapters": [
    "ch35",
    "ch37",
    "ch38",
    "ch39",
    "ch40",
    "ch42",
    "ch44"
  ]
}