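{
  "metadata": {
    "id": "appendixB",
    "title": "Appendix B: Common Tools and API Quick Reference",
    "volume": "vol6",
    "volume_title": "Appendix",
    "word_count": 4633,
    "difficulty": "beginner",
    "prerequisites": [],
    "key_concepts": [
      "B.1 Large Language Model APIs",
      "B.1.1 OpenAI API",
      "B.1.2 Anthropic API (Claude)",
      "B.1.3 Local Model Inference",
      "B.2 Vector Databases",
      "B.2.1 Selection Guide",
      "B.2.2 Chroma: Top Pick for Embedded Use",
      "B.2.3 Pinecone: Top Pick for Managed Service",
      "B.2.4 Milvus: Top Pick for Very Large Scale",
      "B.2.5 Qdrant: Top Pick for High-Performance Filtering",
      "B.3 Embedding Models",
      "B.3.1 Embedding Model Comparison",
      "B.3.2 OpenAI Embeddings: Quick Start",
      "B.4 Search Engine APIs",
      "B.4.1 Tavily: AI-Optimized Search API"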
    ],
    "learning_objectives": [],
    "estimated_tokens": 2780,
    "source_file": "vol6/appendixB_常用工具与API速查.md"
  },
  "overview": "",
  "sections": [
    {
      "id": "B.1 大语言模型API",
      "title": "B.1 Large Language Model APIs",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "B.1.1 OpenAI API",
          "title": "B.1.1 OpenAI API",
          "content": "**Basics**\n\n| Item | Description |\n|------|------|\n| Base URL | `https://api.openai.com/v1` |\n| Authentication | Bearer token (`Authorization: Bearer sk-...`) |\n| SDK | `pip install openai` |\n| Supported models | GPT-4o, GPT-4o-mini, o1, o1-mini, o3-mini, DALL-E 3, Whisper, TTS |\n\n**Chat Completion: minimal example**\n\n\n**Streaming output**\n\n\n**Function calling (Tool Use)**\n\n\n**Key parameters**\n\n| Parameter | Type | Description | Recommended value |\n|------|------|------|--------|\n| `model` | string | Model name | `gpt-4o` (general purpose), `gpt-4o-mini` (low cost) |\n| `temperature` | float | Randomness, 0 to 2 | 0 (factual), 0.7 (creative), 1.0 (diverse) |\n| `max_tokens` | integer | Maximum output tokens | Depends on the task, typically 500-4000 |\n| `top_p` | float | Nucleus sampling | 1.0 (no restriction), 0.9 (common) |\n| `frequency_penalty` | float | Frequency penalty, -2 to 2 | 0 (default) |\n| `presence_penalty` | float | Presence penalty, -2 to 2 | 0 (default) |\n| `stop` | list/string | Stop sequences | `[\"\\n\"]`, `[\"<|im_end|>\"]` |\n| `response_format` | object | Output format | `{\"type\": \"json_object\"}` |\n| `seed` | integer | Seed for deterministic sampling | Fixed value for reproducibility |\n\n**Token pricing reference (2025 Q4)**\n\n| Model | Input price | Output price | Context window | Best for |\n|------|---------|---------|-----------|---------|\n| GPT-4o | $2.50/1M | $10.00/1M | 128K | General-purpose tasks |\n| GPT-4o-mini | $0.15/1M | $0.60/1M | 128K | High-volume, low-complexity work |\n| o1 | $15.00/1M | $60.00/1M | 200K | Complex reasoning |\n| o3-mini | $1.10/1M | $4.40/1M | 200K | Cost-effective reasoning |\n| GPT-4.1 | $2.00/1M | $8.00/1M | 1M | Long-document processing |\n\n⚠️ **Common pitfalls**\n\n1. **Token counting**: one Chinese character is roughly 1.5-2 tokens, so account for this when estimating\n2. **Rate limits**: free and low-tier API keys have strict RPM/TPM limits\n3. **Timeouts**: set a reasonable timeout for long generations\n4. **JSON mode**: when using `response_format={\"type\": \"json_object\"}`, the prompt must explicitly mention JSON\n\n---"
        },
        {
          "id": "B.1.2 Anthropic API ",
          "title": "B.1.2 Anthropic API (Claude)",
          "content": "**Basics**\n\n| Item | Description |\n|------|------|\n| Base URL | `https://api.anthropic.com/v1` |\n| Authentication | `x-api-key` header + `anthropic-version` header |\n| SDK | `pip install anthropic` |\n| Supported models | Claude Opus 4, Claude Sonnet 4, Claude Haiku 3.5 |\n| API version | `2023-06-01` |\n\n**Messages API: minimal example**\n\n\n**Claude-specific features**\n\n\n**Tool use**\n\n\n**Token pricing reference**\n\n| Model | Input price | Output price | Context window | Best for |\n|------|---------|---------|-----------|---------|\n| Claude Opus 4 | $15.00/1M | $75.00/1M | 200K | Highest capability |\n| Claude Sonnet 4 | $3.00/1M | $15.00/1M | 200K | Best price-performance |\n| Claude Haiku 3.5 | $0.80/1M | $4.00/1M | 200K | High speed, low latency |\n\n⚠️ **Common pitfalls**\n\n1. **Header differences**: Anthropic uses `x-api-key` rather than `Authorization: Bearer`\n2. **Position of system**: the system prompt is a top-level parameter, not an entry in the messages array\n3. **Content format**: an assistant message may contain both tool_use and text blocks, so iterate over the content array\n4. **Prompt caching**: `cache_control: {\"type\": \"ephemeral\"}` can reduce the cost of repeated prompts\n\n---"
        },
        {
          "id": "B.1.3 本地模型推理",
          "title": "B.1.3 Local Model Inference",
          "content": "**Ollama: the simplest local inference option**\n\n\n\n**vLLM: high-performance inference server**\n\n\n\n**Recommended local models**\n\n| Model | Parameters | VRAM required | Capability | Recommended use |\n|------|-------|---------|---------|---------|\n| Qwen2.5-7B-Instruct | 7B | 8GB | Moderate | General chat |\n| Qwen2.5-14B-Instruct | 14B | 16GB | Good | Complex tasks |\n| Llama-3.1-8B | 8B | 8GB | Moderate | English workloads |\n| DeepSeek-Coder-V2-Lite | 16B | 20GB | Strong at code | Coding assistance |\n| Mistral-Nemo-12B | 12B | 12GB | Good | Multilingual |\n\n---"
        }
      ]
    },
    {
      "id": "B.2 向量数据库",
      "title": "B.2 Vector Databases",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "B.2.1 选择指南",
          "title": "B.2.1 Selection Guide",
          "content": "| Database | Type | License | Max scale (vectors) | Latency | Best for |\n|--------|------|--------|---------|------|---------|\n| **Pinecone** | Managed | Commercial | 1B+ | < 50ms | Quick start, no operations overhead |\n| **Weaviate** | Self-hosted/managed | BSD-3 | 1B+ | < 100ms | Full-featured, hybrid search |\n| **Milvus** | Self-hosted | Apache 2.0 | 10B+ | < 100ms | Very large scale |\n| **Chroma** | Embedded | Apache 2.0 | 1M+ | < 10ms | Local development/prototyping |\n| **Qdrant** | Self-hosted/managed | Apache 2.0 | 1B+ | < 50ms | High-performance filtering |"
        },
        {
          "id": "B.2.2 Chroma — 嵌入式首选",
          "title": "B.2.2 Chroma: Top Pick for Embedded Use",
          "content": ""
        },
        {
          "id": "B.2.3 Pinecone — 托管服",
          "title": "B.2.3 Pinecone: Top Pick for Managed Service",
          "content": "**Pinecone Serverless pricing**\n\n| Item | Price |\n|------|------|\n| Index storage | $0.14/GB/month |\n| Reads | $2.00 per million queries |\n| Writes | $1.00 per million updates |\n| Namespaces | First 100 free |"
        },
        {
          "id": "B.2.4 Milvus — 超大规模首",
          "title": "B.2.4 Milvus: Top Pick for Very Large Scale",
          "content": "**Milvus deployment sizing**\n\n| Scale | CPU | Memory | Storage | Recommended setup |\n|------|-----|------|------|---------|\n| Development | 4 cores | 8GB | 50GB SSD | Single-node Docker |\n| Production (small) | 8 cores | 32GB | 200GB SSD | 3-node cluster |\n| Production (large) | 16 cores | 64GB | 1TB SSD | 5+ node cluster |\n| Very large scale | 32+ cores | 128GB+ | 10TB+ | K8s cluster |"
        },
        {
          "id": "B.2.5 Qdrant — 高性能过滤",
          "title": "B.2.5 Qdrant: Top Pick for High-Performance Filtering",
          "content": "---"
        }
      ]
    },
    {
      "id": "B.3 嵌入模型",
      "title": "B.3 Embedding Models",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "B.3.1 嵌入模型对比",
          "title": "B.3.1 Embedding Model Comparison",
          "content": "| Model | Dimensions | Max input | Speed | Quality | Price | Recommended use |\n|------|------|---------|------|------|------|---------|\n| `text-embedding-3-small` | 1536 | 8191 tokens | Fast | Good | $0.02/1M tokens | General purpose |\n| `text-embedding-3-large` | 3072 | 8191 tokens | Medium | Excellent | $0.13/1M tokens | High quality |\n| `text-embedding-ada-002` | 1536 | 8191 tokens | Fast | Fair | $0.10/1M tokens | Legacy compatibility |\n| `bge-large-zh-v1.5` | 1024 | 512 tokens | Fast | Excellent (Chinese) | Free (local) | Chinese text |\n| `bge-m3` | 1024 | 8192 tokens | Medium | Excellent (multilingual) | Free (local) | Multilingual |\n| `m3e-base` | 768 | 512 tokens | Fast | Good (Chinese) | Free (local) | Lightweight Chinese |\n| `nomic-embed-text` | 768 | 8192 tokens | Fast | Good | Free (local) | General local use |\n| `cohere-embed-v3` | 1024 | 512 tokens | Fast | Excellent | $0.10/1M tokens | Multilingual |"
        },
        {
          "id": "B.3.2 OpenAI嵌入 — 快速使",
          "title": "B.3.2 OpenAI Embeddings: Quick Start",
          "content": ""
        },
        {
          "id": "B.3.3 本地嵌入 — 使用sente",
          "title": "B.3.3 Local Embeddings: Using sentence-transformers",
          "content": "💡 **Reducing embedding dimensions**\n\n\n---"
        }
      ]
    },
    {
      "id": "B.4 搜索引擎API",
      "title": "B.4 Search Engine APIs",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "B.4.1 Tavily — AI优化的",
          "title": "B.4.1 Tavily: AI-Optimized Search API",
          "content": "**Tavily pricing**\n\n| Plan | Monthly price | Searches | Highlights |\n|------|--------|---------|------|\n| Free | $0 | 1,000 | Basic search |\n| Starter | $40 | 1,000 | Deep search |\n| Pro | $100 | 5,000 | Priority API support |\n| Enterprise | Custom | Custom | SLA guarantee |"
        },
        {
          "id": "B.4.2 SerpAPI — Goog",
          "title": "B.4.2 SerpAPI: Google Search Results",
          "content": ""
        },
        {
          "id": "B.4.3 Brave Search A",
          "title": "B.4.3 Brave Search API",
          "content": "**Search API comparison**\n\n| API | Monthly price | Search quality | Chinese support | Limits / free quota |\n|-----|--------|---------|---------|---------|\n| Tavily | $0-100+ | AI-optimized | ✅ | Per plan |\n| SerpAPI | $50-500+ | Native Google results | ✅ | 100/month free |\n| Brave Search | $0-25+ | Good | ⚠️ | 2,000/month free |\n| Bing Web Search | $0+ | Good | ✅ | 1,000/month free |\n\n---"
        }
      ]
    },
    {
      "id": "B.5 文件处理工具",
      "title": "B.5 File Processing Tools",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "B.5.1 PDF处理",
          "title": "B.5.1 PDF Processing",
          "content": "**pypdf (successor to PyPDF2): basic PDF operations**\n\n\n**Unstructured: intelligent document parsing**\n\n\n\n**pdfplumber: table extraction**"
        },
        {
          "id": "B.5.2 文档转换",
          "title": "B.5.2 Document Conversion",
          "content": "**python-docx: Word document handling**\n\n\n**pandas: data file handling**"
        },
        {
          "id": "B.5.3 多媒体处理",
          "title": "B.5.3 Multimedia Processing",
          "content": "**FFmpeg: audio and video processing**\n\n\n**Whisper: speech to text**\n\n\n**Pillow: image processing**\n\n\n---"
        }
      ]
    },
    {
      "id": "B.6 实用工具库",
      "title": "B.6 Utility Libraries",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "B.6.1 HTTP客户端",
          "title": "B.6.1 HTTP Clients",
          "content": ""
        },
        {
          "id": "B.6.2 JSON处理",
          "title": "B.6.2 JSON Handling",
          "content": ""
        },
        {
          "id": "B.6.3 环境变量管理",
          "title": "B.6.3 Environment Variable Management",
          "content": ""
        },
        {
          "id": "B.6.4 日志记录",
          "title": "B.6.4 Logging",
          "content": "---"
        }
      ]
    },
    {
      "id": "B.7 成本估算工具",
      "title": "B.7 Cost Estimation Tools",
      "level": 2,
      "content": "",
      "subsections": [
        {
          "id": "B.7.1 Token计数",
          "title": "B.7.1 Token Counting",
          "content": ""
        },
        {
          "id": "B.7.2 成本估算公式",
          "title": "B.7.2 Cost Estimation Formulas",
          "content": "**Example**:\n\nAssume GPT-4o with 1,000 requests per day, averaging 1,000 input tokens and 500 output tokens per request:"
        },
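        {
          "id": "B.7.3 成本优化速查",
          "title": "B.7.3 Cost Optimization Quick Reference",
          "content": "| Strategy | Savings | Implementation difficulty |\n|------|---------|---------|\n| Use GPT-4o-mini instead of GPT-4o | 60-80% (up to about 94% per replaced request at the B.1.1 list prices) | ⭐ |\n| Prompt caching (Anthropic) | 90% (on cache hits) | ⭐⭐ |\n| Context compression | 30-50% | ⭐⭐ |\n| Semantic caching | 50-80% (on cache hits) | ⭐⭐⭐ |\n| Model routing (simple → complex) | 40-60% | ⭐⭐ |\n| Local models (for some requests) | 80-100% (for the locally served share) | ⭐⭐⭐⭐ |\n\n---\n\n*End of Appendix B*"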
        }
      ]
    }
  ],
  "code_blocks": [
    {
      "id": "code-1",
      "language": "python",
      "description": "Chat Completion: minimal example",
      "code": "from openai import OpenAI\n\nclient = OpenAI(api_key=\"sk-...\")\n\nresponse = client.chat.completions.create(\n    model=\"gpt-4o\",\n    messages=[\n        {\"role\": \"system\", \"content\": \"You are a helpful assistant.\"},\n        {\"role\": \"user\", \"content\": \"What is RAG?\"}\n    ],\n    temperature=0.7,\n    max_tokens=1000,\n)\n\nprint(response.choices[0].message.content)",
      "section_ref": "B.1.1 OpenAI API",
      "runnable": true,
      "dependencies": [
        "openai"
      ]
    },
    {
      "id": "code-2",
      "language": "python",
      "description": "Streaming output",
      "code": "# Reuses the `client` created in the minimal example\nstream = client.chat.completions.create(\n    model=\"gpt-4o\",\n    messages=[{\"role\": \"user\", \"content\": \"Tell me a story\"}],\n    stream=True,\n)\n\nfor chunk in stream:\n    # Some chunks carry no choices or no text delta, so guard both\n    if chunk.choices and chunk.choices[0].delta.content is not None:\n        print(chunk.choices[0].delta.content, end=\"\", flush=True)",
      "section_ref": "B.1.1 OpenAI API",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-3",
      "language": "python",
      "description": "Function calling (Tool Use)",
      "code": "import json\n\n# Reuses the `client` created in the minimal example\ntools = [\n    {\n        \"type\": \"function\",\n        \"function\": {\n            \"name\": \"get_weather\",\n            \"description\": \"Get weather information for a given city\",\n            \"parameters\": {\n                \"type\": \"object\",\n                \"properties\": {\n                    \"city\": {\"type\": \"string\", \"description\": \"City name\"},\n                    \"unit\": {\"type\": \"string\", \"enum\": [\"celsius\", \"fahrenheit\"]}\n                },\n                \"required\": [\"city\"]\n            }\n        }\n    }\n]\n\nresponse = client.chat.completions.create(\n    model=\"gpt-4o\",\n    messages=[{\"role\": \"user\", \"content\": \"What is the weather like in Beijing?\"}],\n    tools=tools,\n    tool_choice=\"auto\",\n)\n\n# Check whether the model requested a tool call\nif response.choices[0].message.tool_calls:\n    tool_call = response.choices[0].message.tool_calls[0]\n    function_name = tool_call.function.name\n    function_args = json.loads(tool_call.function.arguments)\n    # Run the matching local function here...",
      "section_ref": "B.1.1 OpenAI API",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-4",
      "language": "python",
      "description": "Messages API: minimal example",
      "code": "import anthropic\n\nclient = anthropic.Anthropic(api_key=\"sk-ant-...\")\n\nmessage = client.messages.create(\n    model=\"claude-sonnet-4-20250514\",\n    max_tokens=1024,\n    system=\"You are a helpful assistant.\",\n    messages=[\n        {\"role\": \"user\", \"content\": \"What is RAG?\"}\n    ],\n)\n\nprint(message.content[0].text)",
      "section_ref": "B.1.2 Anthropic API ",
      "runnable": true,
      "dependencies": [
        "anthropic"
      ]
    },
    {
      "id": "code-5",
      "language": "python",
      "description": "Claude-specific features",
      "code": "# Extended Thinking, for complex reasoning tasks\nmessage = client.messages.create(\n    model=\"claude-sonnet-4-20250514\",\n    max_tokens=16000,\n    thinking={\n        \"type\": \"enabled\",\n        \"budget_tokens\": 10000  # thinking token budget (must be below max_tokens)\n    },\n    messages=[\n        {\"role\": \"user\", \"content\": \"Analyze the impact of quantum computing on cryptography\"}\n    ]\n)\n\n# Read the thinking blocks (useful for debugging)\nfor block in message.content:\n    if block.type == \"thinking\":\n        print(f\"[thinking] {block.thinking}\")\n    elif block.type == \"text\":\n        print(f\"[answer] {block.text}\")",
      "section_ref": "B.1.2 Anthropic API ",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-6",
      "language": "python",
      "description": "Tool use",
      "code": "tools = [\n    {\n        \"name\": \"get_weather\",\n        \"description\": \"Get the weather for a city\",\n        \"input_schema\": {\n            \"type\": \"object\",\n            \"properties\": {\n                \"city\": {\"type\": \"string\", \"description\": \"City name\"}\n            },\n            \"required\": [\"city\"]\n        }\n    }\n]\n\n# First call: get the tool request\nresponse = client.messages.create(\n    model=\"claude-sonnet-4-20250514\",\n    max_tokens=4096,\n    tools=tools,\n    messages=[{\"role\": \"user\", \"content\": \"What is the weather in Beijing?\"}]\n)\n\n# Extract the tool call\ntool_use = next(b for b in response.content if b.type == \"tool_use\")\nprint(f\"Call: {tool_use.name}({tool_use.input})\")\n\n# Second call: return the tool result\nresponse = client.messages.create(\n    model=\"claude-sonnet-4-20250514\",\n    max_tokens=4096,\n    tools=tools,\n    messages=[\n        {\"role\": \"user\", \"content\": \"What is the weather in Beijing?\"},\n        {\"role\": \"assistant\", \"content\": response.content},\n        {\"role\": \"user\", \"content\": [\n            {\"type\": \"tool_result\", \"tool_use_id\": tool_use.id, \"content\": \"Sunny, 25°C\"}\n        ]}\n    ]\n)",
      "section_ref": "B.1.2 Anthropic API ",
      "runnable": true,
      "dependencies": []
    },
    {
      "id": "code-7",
      "language": "bash",
      "description": "Ollama: the simplest local inference option",
      "code": "# Install and run\ncurl -fsSL https://ollama.com/install.sh | sh\nollama run llama3.1:8b\nollama run qwen2.5:14b\n\n# OpenAI-compatible API\n# Base URL: http://localhost:11434/v1",
      "section_ref": "B.1.3 本地模型推理",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-8",
      "language": "python",
      "description": "Connecting to Ollama with the OpenAI SDK",
      "code": "# Connect to Ollama using the OpenAI SDK\nfrom openai import OpenAI\n\nclient = OpenAI(\n    base_url=\"http://localhost:11434/v1\",\n    api_key=\"ollama\"  # any value works\n)\n\nresponse = client.chat.completions.create(\n    model=\"qwen2.5:14b\",\n    messages=[{\"role\": \"user\", \"content\": \"Hello\"}]\n)",
      "section_ref": "B.1.3 本地模型推理",
      "runnable": true,
      "dependencies": [
        "openai"
      ]
    },
    {
      "id": "code-9",
      "language": "bash",
      "description": "vLLM: high-performance inference server",
      "code": "# Start the inference server\npython -m vllm.entrypoints.openai.api_server \\\n    --model Qwen/Qwen2.5-14B-Instruct \\\n    --tensor-parallel-size 2 \\\n    --max-model-len 8192 \\\n    --port 8000",
      "section_ref": "B.1.3 本地模型推理",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-10",
      "language": "python",
      "description": "Connecting to vLLM with the OpenAI SDK",
      "code": "# vLLM exposes an OpenAI-compatible API, so the OpenAI SDK works as a client\nfrom openai import OpenAI\n\nclient = OpenAI(base_url=\"http://localhost:8000/v1\", api_key=\"not-needed\")",
      "section_ref": "B.1.3 本地模型推理",
      "runnable": true,
      "dependencies": [
        "openai"
      ]
    },
    {
      "id": "code-11",
      "language": "python",
      "description": "Chroma: basic usage",
      "code": "import chromadb\nfrom chromadb.utils import embedding_functions\n\n# Initialize (embedded, no server required)\nclient = chromadb.PersistentClient(path=\"./chroma_db\")\n\n# OpenAI embedding function\nopenai_ef = embedding_functions.OpenAIEmbeddingFunction(\n    api_key=\"sk-...\",\n    model_name=\"text-embedding-3-small\"\n)\n\n# Create the collection\ncollection = client.get_or_create_collection(\n    name=\"documents\",\n    embedding_function=openai_ef,\n    metadata={\"hnsw:space\": \"cosine\"}\n)\n\n# Add documents\ncollection.add(\n    documents=[\"RAG stands for retrieval-augmented generation\", \"An agent is an autonomous AI system\"],\n    metadatas=[{\"source\": \"wiki\"}, {\"source\": \"wiki\"}],\n    ids=[\"doc1\", \"doc2\"]\n)\n\n# Query\nresults = collection.query(\n    query_texts=[\"What is RAG?\"],\n    n_results=3,\n    where={\"source\": \"wiki\"}  # metadata filter\n)",
      "section_ref": "B.2.2 Chroma — 嵌入式首选",
      "runnable": true,
      "dependencies": [
        "chromadb"
      ]
    },
    {
      "id": "code-12",
      "language": "python",
      "description": "Pinecone: basic usage",
      "code": "from pinecone import Pinecone\n\npc = Pinecone(api_key=\"your-key\")\n\n# Create the index\npc.create_index(\n    name=\"agent-docs\",\n    dimension=1536,  # dimension of text-embedding-3-small\n    metric=\"cosine\",\n    spec={\"serverless\": {\"cloud\": \"aws\", \"region\": \"us-east-1\"}}\n)\n\n# Connect\nindex = pc.Index(\"agent-docs\")\n\n# Upsert vectors (the `...` stands for the remaining components of a 1536-dim vector)\nindex.upsert(\n    vectors=[\n        {\"id\": \"v1\", \"values\": [0.1, 0.2, ...], \"metadata\": {\"source\": \"doc1\"}},\n        {\"id\": \"v2\", \"values\": [0.3, 0.4, ...], \"metadata\": {\"source\": \"doc2\"}},\n    ]\n)\n\n# Query\nresults = index.query(\n    vector=[0.1, 0.2, ...],\n    top_k=5,\n    include_metadata=True,\n    filter={\"source\": {\"$eq\": \"doc1\"}}\n)",
      "section_ref": "B.2.3 Pinecone — 托管服",
      "runnable": true,
      "dependencies": [
        "pinecone"
      ]
    },
    {
      "id": "code-13",
      "language": "python",
      "description": "Milvus: basic usage",
      "code": "from pymilvus import MilvusClient\n\n# Connect\nclient = MilvusClient(uri=\"http://localhost:19530\")\n\n# Create the collection\nclient.create_collection(\n    collection_name=\"documents\",\n    dimension=1536,\n    metric_type=\"COSINE\",\n    id_type=\"string\",  # allow custom string IDs\n    max_length=64,     # string primary keys need a maximum length\n)\n\n# Insert data (the `...` stands for the remaining vector components)\nclient.insert(\n    collection_name=\"documents\",\n    data=[\n        {\"id\": \"doc1\", \"vector\": [0.1, ...], \"text\": \"Introduction to RAG\", \"source\": \"wiki\"},\n        {\"id\": \"doc2\", \"vector\": [0.3, ...], \"text\": \"Agent architecture design\", \"source\": \"wiki\"},\n    ]\n)\n\n# Search\nresults = client.search(\n    collection_name=\"documents\",\n    data=[[0.1, ...]],  # query vector\n    limit=5,\n    output_fields=[\"text\", \"source\"],\n    filter='source == \"wiki\"'\n)",
      "section_ref": "B.2.4 Milvus — 超大规模首",
      "runnable": true,
      "dependencies": [
        "pymilvus"
      ]
    },
    {
      "id": "code-14",
      "language": "python",
      "description": "Qdrant: basic usage",
      "code": "from qdrant_client import QdrantClient\nfrom qdrant_client.models import Distance, VectorParams, PointStruct\n\nclient = QdrantClient(host=\"localhost\", port=6333)\n\n# Create the collection\nclient.create_collection(\n    collection_name=\"docs\",\n    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),\n)\n\n# Insert\nclient.upsert(\n    collection_name=\"docs\",\n    points=[\n        PointStruct(id=1, vector=[0.1, ...], payload={\"text\": \"Introduction to RAG\"}),\n    ]\n)\n\n# Search with a compound filter\nresults = client.search(\n    collection_name=\"docs\",\n    query_vector=[0.1, ...],\n    query_filter={\n        \"must\": [\n            {\"key\": \"category\", \"match\": {\"value\": \"tech\"}},\n            {\"key\": \"date\", \"range\": {\"gte\": \"2025-01-01\"}}\n        ]\n    },\n    limit=10\n)",
      "section_ref": "B.2.5 Qdrant — 高性能过滤",
      "runnable": true,
      "dependencies": [
        "qdrant_client"
      ]
    },
    {
      "id": "code-15",
      "language": "python",
      "description": "OpenAI embeddings: quick start",
      "code": "from openai import OpenAI\n\nclient = OpenAI(api_key=\"sk-...\")\n\n# Basic embedding\nresponse = client.embeddings.create(\n    model=\"text-embedding-3-small\",\n    input=\"This is a piece of text to embed\",\n    dimensions=1536  # can be lowered to save storage\n)\n\nembedding = response.data[0].embedding\nprint(f\"Dimensions: {len(embedding)}\")\n\n# Batch embedding (fewer API calls)\ntexts = [\"text 1\", \"text 2\", \"text 3\"]  # up to 2048 inputs per request\nresponse = client.embeddings.create(\n    model=\"text-embedding-3-small\",\n    input=texts\n)",
      "section_ref": "B.3.2 OpenAI嵌入 — 快速使",
      "runnable": true,
      "dependencies": [
        "openai"
      ]
    },
    {
      "id": "code-16",
      "language": "python",
      "description": "Local embeddings with sentence-transformers",
      "code": "from sentence_transformers import SentenceTransformer\n\n# Load the model (downloaded automatically on first use)\nmodel = SentenceTransformer('BAAI/bge-large-zh-v1.5')\n\n# Single embedding (Chinese sample input, since this is a Chinese model)\nembedding = model.encode(\"RAG是检索增强生成\")\n\n# Batch embedding\ntexts = [\"文本1\", \"文本2\", \"文本3\"]\nembeddings = model.encode(texts, batch_size=32, show_progress_bar=True)\n\n# Check the dimension\nprint(f\"Dimensions: {embeddings.shape[1]}\")  # 1024",
      "section_ref": "B.3.3 本地嵌入 — 使用sente",
      "runnable": true,
      "dependencies": [
        "sentence_transformers"
      ]
    },
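    {
      "id": "code-17",
      "language": "python",
      "description": "💡 Reducing embedding dimensions",
      "code": "# Assumes `client` (OpenAI) and `model`/`texts` (sentence-transformers) from the earlier snippets\n# OpenAI supports reducing dimensions directly\nresponse = client.embeddings.create(\n    model=\"text-embedding-3-small\",\n    input=\"text\",\n    dimensions=512  # from 1536 down to 512, saving 2/3 of the storage\n)\n\n# For local models, reduce dimensions with PCA\nfrom sklearn.decomposition import PCA\nimport numpy as np\n\nembeddings = model.encode(texts)\n# n_components cannot exceed min(n_samples, n_features), so fit on at least 256 texts\npca = PCA(n_components=256)\nreduced = pca.fit_transform(embeddings)",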
      "section_ref": "B.3.3 本地嵌入 — 使用sente",
      "runnable": true,
      "dependencies": [
        "sklearn",
        "numpy"
      ]
    },
    {
      "id": "code-18",
      "language": "python",
      "description": "Tavily: search usage",
      "code": "import tavily\n\n# Install: pip install tavily-python\n\nclient = tavily.TavilyClient(api_key=\"tvly-...\")\n\n# Basic search\nresult = client.search(\"latest advances in RAG\", max_results=5)\n\n# Search tuned for AI agents\nresult = client.search(\n    query=\"latest advances in RAG\",\n    search_depth=\"advanced\",  # basic | advanced\n    include_answer=True,       # AI-generated summary\n    include_raw_content=True,  # raw page content\n    max_results=5\n)\n\n# Print the results\nfor r in result[\"results\"]:\n    print(f\"Title: {r['title']}\")\n    print(f\"URL: {r['url']}\")\n    print(f\"Content: {r['content'][:200]}...\")",
      "section_ref": "B.4.1 Tavily — AI优化的",
      "runnable": true,
      "dependencies": [
        "tavily"
      ]
    },
    {
      "id": "code-19",
      "language": "python",
      "description": "SerpAPI: Google search results",
      "code": "# Install: pip install google-search-results\nfrom serpapi import GoogleSearch\n\nparams = {\n    \"api_key\": \"your-key\",\n    \"engine\": \"google\",\n    \"q\": \"latest advances in RAG\",\n    \"num\": 10,\n    \"gl\": \"cn\",   # country\n    \"hl\": \"zh-cn\"  # interface language\n}\n\nsearch = GoogleSearch(params)\nresults = search.get_dict()\n\nfor organic in results.get(\"organic_results\", []):\n    print(f\"Title: {organic['title']}\")\n    print(f\"Link: {organic['link']}\")\n    print(f\"Snippet: {organic.get('snippet', 'N/A')}\")",
      "section_ref": "B.4.2 SerpAPI — Goog",
      "runnable": true,
      "dependencies": [
        "serpapi"
      ]
    },
    {
      "id": "code-20",
      "language": "python",
      "description": "Brave Search API: web search",
      "code": "import requests\n\nheaders = {\"X-Subscription-Token\": \"your-key\"}\nparams = {\"q\": \"RAG technology\", \"count\": 5}\n\nresponse = requests.get(\n    \"https://api.search.brave.com/res/v1/web/search\",\n    headers=headers,\n    params=params\n)\nresponse.raise_for_status()  # fail early on auth or quota errors\n\ndata = response.json()\nfor web in data.get(\"web\", {}).get(\"results\", []):\n    print(f\"Title: {web['title']}\")\n    print(f\"URL: {web['url']}\")",
      "section_ref": "B.4.3 Brave Search A",
      "runnable": true,
      "dependencies": [
        "requests"
      ]
    },
    {
      "id": "code-21",
      "language": "python",
      "description": "pypdf (successor to PyPDF2): basic PDF operations",
      "code": "from pypdf import PdfReader, PdfWriter\n\n# Read the PDF\nreader = PdfReader(\"document.pdf\")\nprint(f\"Pages: {len(reader.pages)}\")\n\n# Extract text\nfull_text = \"\"\nfor page in reader.pages:\n    full_text += page.extract_text() + \"\\n\"\n\n# Read metadata\nmeta = reader.metadata\nprint(f\"Title: {meta.title}\")\nprint(f\"Author: {meta.author}\")",
      "section_ref": "B.5.1 PDF处理",
      "runnable": true,
      "dependencies": [
        "pypdf"
      ]
    },
    {
      "id": "code-22",
      "language": "bash",
      "description": "Unstructured: intelligent document parsing",
      "code": "pip install \"unstructured[all-docs]\"",
      "section_ref": "B.5.1 PDF处理",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-23",
      "language": "python",
      "description": "Unstructured: automatic partitioning",
      "code": "from unstructured.partition.auto import partition\n\n# Detect the file type automatically and parse it\nelements = partition(filename=\"document.pdf\")\n\nfor element in elements:\n    print(f\"Type: {type(element).__name__}\")\n    print(f\"Content: {str(element)[:200]}\")",
      "section_ref": "B.5.1 PDF处理",
      "runnable": true,
      "dependencies": [
        "unstructured"
      ]
    },
    {
      "id": "code-24",
      "language": "python",
      "description": "pdfplumber: table extraction",
      "code": "import pdfplumber\n\nwith pdfplumber.open(\"report.pdf\") as pdf:\n    for page in pdf.pages:\n        # Extract tables\n        tables = page.extract_tables()\n        for table in tables:\n            for row in table:\n                print(row)",
      "section_ref": "B.5.1 PDF处理",
      "runnable": true,
      "dependencies": [
        "pdfplumber"
      ]
    },
    {
      "id": "code-25",
      "language": "python",
      "description": "python-docx: Word document handling",
      "code": "from docx import Document\n\n# Read the Word document\ndoc = Document(\"report.docx\")\n\n# Print every paragraph with its style\nfor para in doc.paragraphs:\n    print(f\"[{para.style.name}] {para.text}\")\n\n# Extract tables\nfor table in doc.tables:\n    for row in table.rows:\n        cells = [cell.text for cell in row.cells]\n        print(\" | \".join(cells))",
      "section_ref": "B.5.2 文档转换",
      "runnable": true,
      "dependencies": [
        "docx"
      ]
    },
    {
      "id": "code-26",
      "language": "python",
      "description": "pandas: data file handling",
      "code": "import pandas as pd\n\n# Read Excel\ndf = pd.read_excel(\"data.xlsx\", sheet_name=\"Sheet1\")\n\n# Read CSV\ndf = pd.read_csv(\"data.csv\", encoding=\"utf-8\")\n\n# Read JSON\ndf = pd.read_json(\"data.json\")\n\n# Convert to text (suitable for embedding)\ntext = df.to_string(index=False)",
      "section_ref": "B.5.2 文档转换",
      "runnable": true,
      "dependencies": [
        "pandas"
      ]
    },
    {
      "id": "code-27",
      "language": "bash",
      "description": "FFmpeg: audio and video processing",
      "code": "# Resample to 16 kHz mono WAV (input for Whisper transcription)\nffmpeg -i audio.mp3 -ar 16000 -ac 1 audio_16k.wav\n\n# Extract the audio track\nffmpeg -i video.mp4 -vn -acodec copy audio.aac\n\n# Grab a single frame as a screenshot\nffmpeg -i video.mp4 -ss 00:01:30 -frames:v 1 screenshot.jpg",
      "section_ref": "B.5.3 多媒体处理",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-28",
      "language": "python",
      "description": "Whisper — 语音转文字",
      "code": "import whisper\n\nmodel = whisper.load_model(\"base\")  # tiny, base, small, medium, large\nresult = model.transcribe(\"audio.mp3\", language=\"zh\")\n\nprint(result[\"text\"])",
      "section_ref": "B.5.3 多媒体处理",
      "runnable": true,
      "dependencies": [
        "whisper"
      ]
    },
    {
      "id": "code-29",
      "language": "python",
      "description": "Pillow — 图像处理",
      "code": "from PIL import Image\n\n# 读取图像\nimg = Image.open(\"photo.jpg\")\nprint(f\"尺寸: {img.size}\")\n\n# 转换格式\nimg.save(\"photo.png\")\n\n# 压缩\nimg.save(\"photo_compressed.jpg\", quality=85, optimize=True)\n\n# 获取Base64（用于多模态API）\nimport base64\nimport io\n\nbuffer = io.BytesIO()\nimg.save(buffer, format=\"JPEG\")\nb64 = base64.b64encode(buffer.getvalue()).decode()",
      "section_ref": "B.5.3 多媒体处理",
      "runnable": true,
      "dependencies": [
        "PIL",
        "base64"
      ]
    },
    {
      "id": "code-30",
      "language": "python",
      "description": "httpx: async requests with retry",
      "code": "import asyncio\n\nimport httpx\n\n# Async HTTP client (recommended for agent development)\nasync def fetch(url: str) -> str:\n    async with httpx.AsyncClient(timeout=30.0) as client:\n        response = await client.get(url)\n        response.raise_for_status()\n        return response.text\n\n# With retries\nasync def fetch_with_retry(url: str, max_retries: int = 3):\n    async with httpx.AsyncClient() as client:\n        for attempt in range(max_retries):\n            try:\n                response = await client.get(url)\n                response.raise_for_status()\n                return response.json()\n            except httpx.HTTPError as e:\n                if attempt == max_retries - 1:\n                    raise\n                await asyncio.sleep(2 ** attempt)  # exponential backoff",
      "section_ref": "B.6.1 HTTP客户端",
      "runnable": true,
      "dependencies": [
        "httpx"
      ]
    },
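    {
      "id": "code-31",
      "language": "python",
      "description": "JSON handling with json and orjson",
      "code": "import json\nimport orjson  # high-performance JSON library\n\njson_string = '{\"name\": \"RAG\"}'  # sample input\n\n# Standard json\ndata = json.loads(json_string)\njson_string = json.dumps(data, ensure_ascii=False, indent=2)\n\n# orjson (about 2-3x faster)\ndata = orjson.loads(json_string)  # accepts str or bytes\njson_bytes = orjson.dumps(data, option=orjson.OPT_INDENT_2)  # returns bytes",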
      "section_ref": "B.6.2 JSON处理",
      "runnable": true,
      "dependencies": [
        "orjson"
      ]
    },
    {
      "id": "code-32",
      "language": "python",
      "description": "python-dotenv: loading environment variables",
      "code": "# python-dotenv\nfrom dotenv import load_dotenv\nimport os\n\nload_dotenv()  # load the .env file\n\napi_key = os.getenv(\"OPENAI_API_KEY\")\nassert api_key, \"The OPENAI_API_KEY environment variable is not set\"",
      "section_ref": "B.6.3 环境变量管理",
      "runnable": true,
      "dependencies": [
        "dotenv"
      ]
    },
    {
      "id": "code-33",
      "language": "bash",
      "description": ".env file example",
      "code": "# .env file\nOPENAI_API_KEY=sk-...\nANTHROPIC_API_KEY=sk-ant-...\nTAVILY_API_KEY=tvly-...\nDATABASE_URL=postgresql://user:pass@localhost:5432/mydb",
      "section_ref": "B.6.3 环境变量管理",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-34",
      "language": "python",
      "description": "Logging with logging and structlog",
      "code": "import logging\nimport structlog\n\n# Standard logging\nlogging.basicConfig(\n    level=logging.INFO,\n    format=\"%(asctime)s [%(levelname)s] %(name)s: %(message)s\"\n)\nlogger = logging.getLogger(\"my_agent\")\n\n# structlog (recommended for agents: structured logs)\nstructlog.configure(processors=[\n    structlog.processors.add_log_level,\n    structlog.processors.JSONRenderer()\n])\nlogger = structlog.get_logger()\n\nlogger.info(\"agent_invocation\", model=\"gpt-4o\", latency_ms=1200, tokens=500)",
      "section_ref": "B.6.4 日志记录",
      "runnable": true,
      "dependencies": [
        "structlog"
      ]
    },
    {
      "id": "code-35",
      "language": "python",
      "description": "Counting tokens with tiktoken",
      "code": "import tiktoken\n\n# OpenAI models\nenc = tiktoken.encoding_for_model(\"gpt-4o\")\ntoken_count = len(enc.encode(\"这是一段文本\"))\nprint(f\"Token count: {token_count}\")\n\n# Generic fallback encoding; only an approximation for non-OpenAI models\nenc = tiktoken.get_encoding(\"cl100k_base\")",
      "section_ref": "B.7.1 Token计数",
      "runnable": true,
      "dependencies": [
        "tiktoken"
      ]
    },
    {
      "id": "code-36",
      "language": "text",
      "description": "",
      "code": "Cost per request = (input_tokens × input_price + output_tokens × output_price) / 1,000,000\n(prices in USD per 1M tokens)\n\nMonthly cost estimate = average daily requests × 30 × cost per request × 1.2 (20% headroom)",
      "section_ref": "B.7.2 成本估算公式",
      "runnable": false,
      "dependencies": []
    },
    {
      "id": "code-37",
      "language": "text",
      "description": "Assume GPT-4o at 1,000 requests per day, averaging 1,000 input tokens and 500 output tokens per request:",
      "code": "Cost per request = (1000 × $2.50 + 500 × $10.00) / 1,000,000 = $0.0075\nMonthly cost = 1000 × 30 × $0.0075 × 1.2 = $270",
      "section_ref": "B.7.2 成本估算公式",
      "runnable": false,
      "dependencies": []
    }
  ],
  "tables": [
    {
      "headers": [
        "Item",
        "Description"
      ],
      "data": [
        [
          "Base URL",
          "`https://api.openai.com/v1`"
        ],
        [
          "Authentication",
          "Bearer Token (`Authorization: Bearer sk-...`)"
        ],
        [
          "SDK",
          "`pip install openai`"
        ],
        [
          "Supported models",
          "GPT-4o, GPT-4o-mini, o1, o1-mini, o3-mini, DALL-E 3, Whisper, TTS"
        ]
      ]
    },
    {
      "headers": [
        "Parameter",
        "Type",
        "Description",
        "Recommended value"
      ],
      "data": [
        [
          "`model`",
          "string",
          "Model name",
          "`gpt-4o` (general purpose), `gpt-4o-mini` (low cost)"
        ],
        [
          "`temperature`",
          "float",
          "Randomness, 0 to 2",
          "0 (factual), 0.7 (creative), 1.0 (diverse)"
        ],
        [
          "`max_tokens`",
          "integer",
          "Maximum output tokens",
          "Depends on the task, typically 500-4000"
        ],
        [
          "`top_p`",
          "float",
          "Nucleus sampling",
          "1.0 (no restriction), 0.9 (common)"
        ],
        [
          "`frequency_penalty`",
          "float",
          "Frequency penalty, -2 to 2",
          "0 (default)"
        ],
        [
          "`presence_penalty`",
          "float",
          "Presence penalty, -2 to 2",
          "0 (default)"
        ],
        [
          "`stop`",
          "list/string",
          "Stop sequences",
          "`[\"\\n\"]`, `[\"<|im_end|>\"]`"
        ],
        [
          "`response_format`",
          "object",
          "Output format",
          "`{\"type\": \"json_object\"}`"
        ],
        [
          "`seed`",
          "integer",
          "Seed for deterministic sampling",
          "Fixed value for reproducibility"
        ]
      ]
    },
    {
      "headers": [
        "Model",
        "Input price",
        "Output price",
        "Context window",
        "Best for"
      ],
      "data": [
        [
          "GPT-4o",
          "$2.50/1M",
          "$10.00/1M",
          "128K",
          "General-purpose tasks"
        ],
        [
          "GPT-4o-mini",
          "$0.15/1M",
          "$0.60/1M",
          "128K",
          "High-volume, low-complexity tasks"
        ],
        [
          "o1",
          "$15.00/1M",
          "$60.00/1M",
          "200K",
          "Complex reasoning"
        ],
        [
          "o3-mini",
          "$1.10/1M",
          "$4.40/1M",
          "200K",
          "Cost-effective reasoning"
        ],
        [
          "GPT-4.1",
          "$2.00/1M",
          "$8.00/1M",
          "1M",
          "Long-document processing"
        ]
      ]
    },
    {
      "headers": [
        "Item",
        "Description"
      ],
      "data": [
        [
          "Base URL",
          "`https://api.anthropic.com/v1`"
        ],
        [
          "Authentication",
          "`x-api-key` header + `anthropic-version` header"
        ],
        [
          "SDK",
          "`pip install anthropic`"
        ],
        [
          "Supported models",
          "Claude Opus 4, Claude Sonnet 4, Claude Haiku 3.5"
        ],
        [
          "API version",
          "`2023-06-01`"
        ]
      ]
    },
    {
      "headers": [
        "Model",
        "Input price",
        "Output price",
        "Context window",
        "Best for"
      ],
      "data": [
        [
          "Claude Opus 4",
          "$15.00/1M",
          "$75.00/1M",
          "200K",
          "Highest capability"
        ],
        [
          "Claude Sonnet 4",
          "$3.00/1M",
          "$15.00/1M",
          "200K",
          "Best price-performance"
        ],
        [
          "Claude Haiku 3.5",
          "$0.80/1M",
          "$4.00/1M",
          "200K",
          "Fast, low latency"
        ]
      ]
    },
    {
      "headers": [
        "Model",
        "Parameters",
        "VRAM required",
        "Capability",
        "Recommended use"
      ],
      "data": [
        [
          "Qwen2.5-7B-Instruct",
          "7B",
          "8GB",
          "Moderate",
          "General chat"
        ],
        [
          "Qwen2.5-14B-Instruct",
          "14B",
          "16GB",
          "Good",
          "Complex tasks"
        ],
        [
          "Llama-3.1-8B",
          "8B",
          "8GB",
          "Moderate",
          "English-language use"
        ],
        [
          "DeepSeek-Coder-V2-Lite",
          "16B",
          "20GB",
          "Strong at code",
          "Coding assistance"
        ],
        [
          "Mistral-Nemo-12B",
          "12B",
          "12GB",
          "Good",
          "Multilingual"
        ]
      ]
    },
    {
      "headers": [
        "Database",
        "Type",
        "License",
        "Max scale",
        "Latency",
        "Best for"
      ],
      "data": [
        [
          "**Pinecone**",
          "Managed",
          "Commercial",
          "1B+",
          "< 50ms",
          "Quick start, no operations overhead"
        ],
        [
          "**Weaviate**",
          "Self-hosted/managed",
          "BSD-3",
          "1B+",
          "< 100ms",
          "Full-featured, hybrid search"
        ],
        [
          "**Milvus**",
          "Self-hosted",
          "Apache 2.0",
          "10B+",
          "< 100ms",
          "Very large scale"
        ],
        [
          "**Chroma**",
          "Embedded",
          "Apache 2.0",
          "1M+",
          "< 10ms",
          "Local development/prototyping"
        ],
        [
          "**Qdrant**",
          "Self-hosted/managed",
          "Apache 2.0",
          "1B+",
          "< 50ms",
          "High-performance filtering"
        ]
      ]
    },
    {
      "headers": [
        "Item",
        "Price"
      ],
      "data": [
        [
          "Index storage",
          "$0.14/GB/month"
        ],
        [
          "Reads",
          "$2.00 per million queries"
        ],
        [
          "Writes",
          "$1.00 per million updates"
        ],
        [
          "Namespaces",
          "First 100 free"
        ]
      ]
    },
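    {
      "headers": [
        "Scale",
        "CPU",
        "Memory",
        "Storage",
        "Recommended setup"
      ],
      "data": [
        [
          "Development",
          "4 cores",
          "8GB",
          "50GB SSD",
          "Single-node Docker"
        ],
        [
          "Production (small)",
          "8 cores",
          "32GB",
          "200GB SSD",
          "3-node cluster"
        ],
        [
          "Production (large)",
          "16 cores",
          "64GB",
          "1TB SSD",
          "5+ node cluster"
        ],
        [
          "Very large scale",
          "32+ cores",
          "128GB+",
          "10TB+",
          "Kubernetes cluster"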
        ]
      ]
    },
    {
      "headers": [
        "Model",
        "Dimensions",
        "Max input",
        "Speed",
        "Quality",
        "Price",
        "Recommended use"
      ],
      "data": [
        [
          "`text-embedding-3-small`",
          "1536",
          "8191 tokens",
          "Fast",
          "Good",
          "$0.02/1M tokens",
          "General purpose"
        ],
        [
          "`text-embedding-3-large`",
          "3072",
          "8191 tokens",
          "Medium",
          "Excellent",
          "$0.13/1M tokens",
          "High quality"
        ],
        [
          "`text-embedding-ada-002`",
          "1536",
          "8191 tokens",
          "Fast",
          "Fair",
          "$0.10/1M tokens",
          "Legacy compatibility"
        ],
        [
          "`bge-large-zh-v1.5`",
          "1024",
          "512 tokens",
          "Fast",
          "Excellent (Chinese)",
          "Free (local)",
          "Chinese-language use"
        ],
        [
          "`bge-m3`",
          "1024",
          "8192 tokens",
          "Medium",
          "Excellent (multilingual)",
          "Free (local)",
          "Multilingual"
        ],
        [
          "`m3e-base`",
          "768",
          "512 tokens",
          "Fast",
          "Good (Chinese)",
          "Free (local)",
          "Lightweight Chinese"
        ],
        [
          "`nomic-embed-text`",
          "768",
          "8192 tokens",
          "Fast",
          "Good",
          "Free (local)",
          "General purpose, local"
        ],
        [
          "`cohere-embed-v3`",
          "1024",
          "512 tokens",
          "Fast",
          "Excellent",
          "$0.10/1M tokens",
          "Multilingual"
        ]
      ]
    },
    {
      "headers": [
        "Plan",
        "Monthly price",
        "Searches",
        "Features"
      ],
      "data": [
        [
          "Free",
          "$0",
          "1,000",
          "Basic search"
        ],
        [
          "Starter",
          "$40",
          "1,000",
          "Deep search"
        ],
        [
          "Pro",
          "$100",
          "5,000",
          "Priority API support"
        ],
        [
          "Enterprise",
          "Custom",
          "Custom",
          "SLA guarantee"
        ]
      ]
    },
    {
      "headers": [
        "API",
        "Monthly price",
        "Search quality",
        "Chinese support",
        "Rate limit"
      ],
      "data": [
        [
          "Tavily",
          "$0-100+",
          "AI-optimized",
          "✅",
          "Depends on plan"
        ],
        [
          "SerpAPI",
          "$50-500+",
          "Native Google results",
          "✅",
          "100/month free"
        ],
        [
          "Brave Search",
          "$0-25+",
          "Good",
          "⚠️",
          "2000/month free"
        ],
        [
          "Bing Web Search",
          "$0+",
          "Good",
          "✅",
          "1000/month free"
        ]
      ]
    },
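    {
      "headers": [
        "Strategy",
        "Savings",
        "Implementation difficulty"
      ],
      "data": [
        [
          "Use GPT-4o-mini instead of GPT-4o",
          "60-80%",
          "⭐"
        ],
        [
          "Prompt caching (Anthropic)",
          "90% (on cache hits)",
          "⭐⭐"
        ],
        [
          "Context compression",
          "30-50%",
          "⭐⭐"
        ],
        [
          "Semantic caching",
          "50-80% (on cache hits)",
          "⭐⭐⭐"
        ],
        [
          "Model routing (simple → complex)",
          "40-60%",
          "⭐⭐"
        ],
        [
          "Local models (for some requests)",
          "80-100% (for the locally served portion)",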
          "⭐⭐⭐⭐"
        ]
      ]
    }
  ],
  "key_takeaways": [],
  "common_pitfalls": [],
  "related_chapters": []
}