Chapter 29: Content Generation Pipeline

From "one person writing for a week" to "AI helping you produce ten articles a day": building an end-to-end intelligent content generation system


29.1 Requirements Analysis and Feature Planning

29.1.1 Business Background

Content marketing is a core growth engine of the digital era, but content production faces three major bottlenecks:

  1. Throughput: a strong content creator produces 3-5 high-quality articles per week, far short of what multi-channel coverage demands
  2. Inconsistent quality: creators vary widely in skill, review standards are not unified, and published quality is uneven
  3. Channel adaptation: the same topic must be tailored into different formats and styles for WeChat Official Accounts, Zhihu, Xiaohongshu, Douyin, and other platforms

We want to build an AI-driven content generation pipeline covering the full workflow from topic planning to multi-channel distribution:

  • Intelligent topic planning: recommend high-value topics based on trending signals and audience profiles
  • AI-assisted writing: generate a high-quality first draft from an outline and reference materials
  • Automated review and optimization: grammar checks, style consistency, SEO optimization, sensitive-word filtering
  • Multi-channel adaptation: convert one piece of content into each platform's format and style with one click

29.1.2 Feature List

┌──────────────────────────────────────────────────────────┐
│           Content Pipeline Feature Architecture          │
├──────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌──────────────┐  ┌───────────────┐   │
│  │ Topic        │  │ Writing      │  │ Review        │   │
│  │ • Trends     │  │ • Outline    │  │ • Grammar     │   │
│  │ • Competitors│  │ • Expansion  │  │ • SEO         │   │
│  │ • Keywords   │  │ • References │  │ • Sensitive   │   │
│  │ • Audience   │  │ • Style      │  │ • Readability │   │
│  └──────────────┘  └──────────────┘  └───────────────┘   │
│  ┌──────────────────────────────────────────────────┐    │
│  │           Multi-Channel Distribution Layer       │    │
│  │  • WeChat  • Zhihu  • Xiaohongshu  • Toutiao     │    │
│  │  • Douyin scripts  • SEO articles  • Email       │    │
│  └──────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────┘

29.1.3 Non-Functional Requirements

  • Draft generation time: < 30 s per article (first draft)
  • Originality: > 85% (similarity check)
  • Sensitive-word detection accuracy: > 95%
  • Concurrency: 50 QPS
  • Supported channels: WeChat Official Accounts, Zhihu, Xiaohongshu, Toutiao, Douyin

29.2 Architecture Design

29.2.1 Project Structure

content-pipeline/
├── app/
│   ├── main.py                     # FastAPI entry point
│   ├── config.py                   # Configuration
│   ├── models/                     # Data models
│   │   ├── topic.py                # Topic model
│   │   ├── article.py              # Article model
│   │   └── channel.py              # Channel configuration
│   ├── agents/                     # Core agents
│   │   ├── topic_agent.py          # Topic planning agent
│   │   ├── writing_agent.py        # Writing agent
│   │   ├── review_agent.py         # Review agent
│   │   └── distribute_agent.py     # Distribution agent
│   ├── services/
│   │   ├── plagiarism_checker.py   # Originality check
│   │   ├── sensitive_filter.py     # Sensitive-word filter
│   │   └── seo_optimizer.py        # SEO optimization
│   └── utils/
│       └── llm_client.py           # LLM client
├── tests/
└── requirements.txt

29.2.2 Core Class Design

The system consists of four agents that form a complete content production pipeline:

  • TopicAgent: analyzes trends and audience needs, recommends high-value topics, and generates content outlines
  • WritingAgent: writes a first draft from the outline and reference materials, supporting multiple writing styles
  • ReviewAgent: reviews the draft along several dimensions: grammar, SEO, sensitive words, readability
  • DistributeAgent: adapts the reviewed content to each platform's format and style

Design decision: a pipeline with an iterative feedback loop. When ReviewAgent finds blocking issues, it feeds them back to WritingAgent for revision, for at most 3 iterations. DistributeAgent adapts all target channels in parallel.
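The design decision above can be sketched as a small orchestrator. Here `write`, `review`, and `adapt` are stand-in stubs (hypothetical names, not the agent classes from this chapter), but the control flow is the same: draft, review, revise up to the iteration cap, then fan out per channel in parallel.

```python
import asyncio

MAX_REVIEW_ITERATIONS = 3

async def run_pipeline(topic, write, review, adapt, channels):
    """write → review → (revise)* → parallel per-channel adaptation."""
    draft = await write(topic)
    for _ in range(MAX_REVIEW_ITERATIONS):
        verdict = review(draft)
        if not verdict["needs_rewrite"]:
            break  # review passed, stop iterating
        draft = await write(topic, feedback=verdict["issues"])
    adapted = await asyncio.gather(*(adapt(draft, ch) for ch in channels))
    return dict(zip(channels, adapted))

# Stub agents: the first draft fails review once, the revision passes
async def write(topic, feedback=None):
    return "draft-v2" if feedback else "draft-v1"

def review(draft):
    return {"needs_rewrite": draft == "draft-v1", "issues": ["too short"]}

async def adapt(draft, ch):
    return f"{ch}:{draft}"

result = asyncio.run(run_pipeline("AI tools", write, review, adapt,
                                  ["wechat", "zhihu"]))
```

With these stubs the loop runs exactly one revision, then both channel adaptations run concurrently.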


29.3 Core Code Implementation

29.3.1 Configuration and LLM Client

python
# app/config.py
"""Content pipeline configuration"""

from pydantic_settings import BaseSettings
from enum import Enum


class WritingStyle(str, Enum):
    PROFESSIONAL = "professional"      # rigorous, expert tone
    CASUAL = "casual"                  # light and playful
    STORYTELLING = "storytelling"      # narrative-driven
    TUTORIAL = "tutorial"              # step-by-step guide
    OPINION = "opinion"                # opinion / commentary


class Channel(str, Enum):
    WECHAT = "wechat"                  # WeChat Official Account
    ZHIHU = "zhihu"                    # Zhihu
    XIAOHONGSHU = "xiaohongshu"        # Xiaohongshu (RED)
    TOUTIAO = "toutiao"                # Toutiao
    DOUYIN = "douyin"                  # Douyin video script


class Settings(BaseSettings):
    APP_NAME: str = "Content Pipeline"
    APP_VERSION: str = "1.0.0"
    DEBUG: bool = False

    LLM_API_KEY: str = ""
    LLM_BASE_URL: str = "https://api.openai.com/v1"
    LLM_MODEL: str = "gpt-4o"
    LLM_MAX_TOKENS: int = 4096

    # Content parameters
    MAX_REVIEW_ITERATIONS: int = 3
    MIN_ORIGINALITY_SCORE: float = 0.85
    SENSITIVE_WORD_STRICT: bool = True

    # Per-channel length limits (characters)
    CHANNEL_LIMITS: dict = {
        "wechat": {"min": 1500, "max": 3000},
        "zhihu": {"min": 1000, "max": 5000},
        "xiaohongshu": {"min": 300, "max": 1000},
        "toutiao": {"min": 800, "max": 2500},
        "douyin": {"min": 100, "max": 500},
    }

    class Config:
        env_file = ".env"
        env_prefix = "CP_"


settings = Settings()
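The CHANNEL_LIMITS table drives the length check that ReviewAgent performs later. A minimal sketch of that lookup-and-classify logic, with the limits dict copied from the config above and the same fallback bounds the review code uses:

```python
CHANNEL_LIMITS = {
    "wechat": {"min": 1500, "max": 3000},
    "zhihu": {"min": 1000, "max": 5000},
    "xiaohongshu": {"min": 300, "max": 1000},
    "toutiao": {"min": 800, "max": 2500},
    "douyin": {"min": 100, "max": 500},
}

def length_verdict(channel: str, word_count: int) -> str:
    """Classify a draft's length against its channel's limits."""
    limits = CHANNEL_LIMITS.get(channel, {"min": 500, "max": 5000})
    if word_count < limits["min"]:
        return "too_short"
    if word_count > limits["max"]:
        return "too_long"
    return "ok"

verdict = length_verdict("xiaohongshu", 200)
```

Unknown channels fall back to a generous 500-5000 range rather than raising, matching the defensive style used throughout the chapter.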
python
# app/utils/llm_client.py
"""LLM client wrapper (same as previous chapters, slightly simplified)"""

import json
from typing import Optional, List, Dict
from openai import OpenAI
from app.config import settings


class LLMClient:
    _instance: Optional['LLMClient'] = None

    def __new__(cls):
        # Singleton: one shared OpenAI client per process
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._client = OpenAI(
                api_key=settings.LLM_API_KEY,
                base_url=settings.LLM_BASE_URL)
        return cls._instance

    async def chat(self, messages, system_prompt=None,
                   temperature=0.7, max_tokens=None,
                   response_format=None) -> str:
        full = []
        if system_prompt:
            full.append({"role": "system", "content": system_prompt})
        full.extend(messages)
        kwargs = {}
        if response_format is not None:
            kwargs["response_format"] = response_format
        resp = self._client.chat.completions.create(
            model=settings.LLM_MODEL, messages=full,
            temperature=temperature,
            max_tokens=max_tokens or settings.LLM_MAX_TOKENS,
            **kwargs)
        return resp.choices[0].message.content

    async def chat_json(self, messages, system_prompt=None,
                        temperature=0.1, max_tokens=None) -> dict:
        content = await self.chat(
            messages, system_prompt,
            temperature=temperature, max_tokens=max_tokens,
            response_format={"type": "json_object"})
        return json.loads(content)


llm_client = LLMClient()
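The `__new__`-based singleton above ensures every module that imports `llm_client` shares one underlying connection. The pattern in isolation, with a plain counter standing in for the expensive OpenAI client:

```python
class Singleton:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.connections = 0  # expensive resource, created once
        return cls._instance

a = Singleton()
b = Singleton()
a.connections += 1  # visible through every reference
```

Both names point at the same object, so the initialization in `__new__` runs exactly once per process.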

29.3.2 Topic Planning Agent

python
# app/agents/topic_agent.py
"""Topic planning agent"""

import json
from dataclasses import dataclass, field
from typing import List, Dict, Optional
from datetime import datetime
from app.utils.llm_client import llm_client


@dataclass
class TopicSuggestion:
    title: str
    description: str
    target_audience: str
    keywords: List[str]
    estimated_read_time: int = 5       # minutes
    trending_score: float = 50.0       # trending score, 0-100
    competition_level: str = "medium"  # low/medium/high
    suggested_channels: List[str] = field(default_factory=list)
    outline: List[str] = field(default_factory=list)  # suggested outline


@dataclass
class TopicAnalysis:
    suggestions: List[TopicSuggestion]
    trending_keywords: List[str]
    content_gaps: List[str]
    summary: str


class TopicAgent:
    SYSTEM_PROMPT = """You are a senior content strategist skilled in topic analysis and content planning.

Your tasks:
1. Analyze the given domain and keywords and recommend 5 high-value topics
2. Generate a detailed content outline for each topic
3. Assess each topic's trending score, competition level, and target audience
4. Recommend suitable distribution channels

Return JSON:
{
  "suggestions": [
    {
      "title": "article title",
      "description": "one-sentence description",
      "target_audience": "target audience",
      "keywords": ["keyword1", "keyword2"],
      "estimated_read_time": 8,
      "trending_score": 85,
      "competition_level": "medium",
      "suggested_channels": ["wechat", "zhihu"],
      "outline": ["1. Introduction: ...", "2. Key points: ...", ...]
    }
  ],
  "trending_keywords": ["trending keywords"],
  "content_gaps": ["directions competitors have not covered"],
  "summary": "topic analysis summary"
}"""

    async def analyze(
        self,
        domain: str,
        keywords: Optional[List[str]] = None,
        target_channels: Optional[List[str]] = None,
        count: int = 5,
    ) -> TopicAnalysis:
        """Analyze a domain and recommend topics"""
        kw_text = f", keywords: {', '.join(keywords)}" if keywords else ""
        ch_text = (f", target channels: {', '.join(target_channels)}"
                   if target_channels else "")

        messages = [{
            "role": "user",
            "content": (f"Please recommend {count} topics for:\n"
                        f"Domain: {domain}{kw_text}{ch_text}\n"
                        f"Current date: {datetime.now().strftime('%Y-%m-%d')}")
        }]

        try:
            result = await llm_client.chat_json(
                messages=messages, system_prompt=self.SYSTEM_PROMPT)
            suggestions = []
            for s in result.get("suggestions", []):
                suggestions.append(TopicSuggestion(
                    title=s.get("title", ""),
                    description=s.get("description", ""),
                    target_audience=s.get("target_audience", ""),
                    keywords=s.get("keywords", []),
                    estimated_read_time=s.get("estimated_read_time", 5),
                    trending_score=float(s.get("trending_score", 50)),
                    competition_level=s.get("competition_level", "medium"),
                    suggested_channels=s.get("suggested_channels", []),
                    outline=s.get("outline", []),
                ))
            return TopicAnalysis(
                suggestions=suggestions,
                trending_keywords=result.get("trending_keywords", []),
                content_gaps=result.get("content_gaps", []),
                summary=result.get("summary", ""),
            )
        except Exception as e:
            return TopicAnalysis(
                suggestions=[], trending_keywords=[],
                content_gaps=[], summary=f"Analysis failed: {str(e)}")

    async def refine_outline(
        self, topic: TopicSuggestion, feedback: str,
    ) -> TopicSuggestion:
        """Refine the outline based on feedback"""
        messages = [
            {"role": "user", "content": f"Topic: {topic.title}\n"
                                        f"Current outline: {topic.outline}"},
            {"role": "user", "content": f"Feedback: {feedback}\n"
                                        "Please improve the outline structure."},
        ]
        try:
            result = await llm_client.chat_json(
                messages=messages, system_prompt=self.SYSTEM_PROMPT)
            if result.get("suggestions"):
                s = result["suggestions"][0]
                topic.outline = s.get("outline", topic.outline)
                topic.description = s.get("description", topic.description)
        except Exception:
            pass
        return topic
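The `.get(..., default)` chain in `analyze` is the load-bearing part: model output is untrusted, so every field falls back to a safe default. The same defensive mapping in isolation, with a trimmed dataclass and a made-up payload (field names mirror TopicSuggestion):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Suggestion:
    title: str = ""
    trending_score: float = 50.0
    keywords: List[str] = field(default_factory=list)

def parse_suggestions(result: dict) -> List[Suggestion]:
    """Map untrusted model JSON onto a dataclass, defaulting missing fields."""
    return [
        Suggestion(
            title=s.get("title", ""),
            trending_score=float(s.get("trending_score", 50)),
            keywords=s.get("keywords", []),
        )
        for s in result.get("suggestions", [])
    ]

out = parse_suggestions(
    {"suggestions": [{"title": "A", "trending_score": "85"}, {}]})
```

Note the `float(...)` coercion: models sometimes return numbers as strings, and coercing at the boundary keeps the rest of the pipeline typed.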

29.3.3 Writing Agent

python
# app/agents/writing_agent.py
"""Writing agent"""

from dataclasses import dataclass, field
from typing import List, Dict, Optional
from app.utils.llm_client import llm_client
from app.agents.topic_agent import TopicSuggestion
from app.config import WritingStyle


@dataclass
class ArticleDraft:
    title: str
    subtitle: str = ""
    content: str = ""
    sections: List[Dict[str, str]] = field(default_factory=list)
    # sections: [{"heading": "...", "body": "..."}]
    word_count: int = 0
    style: str = "professional"
    target_channel: str = "wechat"
    references: List[str] = field(default_factory=list)


STYLE_GUIDES = {
    WritingStyle.PROFESSIONAL: {
        "tone": "professional, rigorous, in-depth",
        "structure": "pose the problem → analyze causes → propose solutions → summarize and look ahead",
        "avoid": ["colloquialisms", "excessive exclamation marks", "internet slang"],
    },
    WritingStyle.CASUAL: {
        "tone": "light, fun, down-to-earth",
        "structure": "attention-grabbing opening → relaxed development → interactive closing",
        "avoid": ["overly academic jargon", "long-winded passages", "dry data dumps"],
    },
    WritingStyle.STORYTELLING: {
        "tone": "story-driven, scene-based, vivid",
        "structure": "scene setup → characters/events → conflict → resolution → takeaway",
        "avoid": ["empty preaching", "dry theory", "missing scenes"],
    },
    WritingStyle.TUTORIAL: {
        "tone": "clear, step-by-step, practical",
        "structure": "use case → environment setup → detailed steps → FAQ",
        "avoid": ["vague descriptions", "skipped key steps", "missing code examples"],
    },
    WritingStyle.OPINION: {
        "tone": "sharp, opinionated, thought-provoking",
        "structure": "hot-topic hook → core thesis → argument → counterarguments → conclusion",
        "avoid": ["fence-sitting", "parroting others", "claims without data"],
    },
}


class WritingAgent:
    async def write(
        self,
        topic: TopicSuggestion,
        style: WritingStyle = WritingStyle.PROFESSIONAL,
        target_channel: str = "wechat",
        reference_materials: Optional[List[str]] = None,
        max_words: int = 2500,
    ) -> ArticleDraft:
        """Write an article from a topic and its outline"""
        style_guide = STYLE_GUIDES[style]

        system_prompt = f"""You are a professional content writer.

Style requirements:
- Tone: {style_guide['tone']}
- Structure: {style_guide['structure']}
- Avoid: {', '.join(style_guide['avoid'])}
- Target platform: {target_channel}

Writing rules:
1. Keep each paragraph to 100-200 characters
2. Use subheadings to separate content modules
3. Use data, cases, and quotations to strengthen arguments
4. End with a call to action or an interaction prompt
5. Target length: about {max_words} characters

Return JSON:
{{
  "title": "final title",
  "subtitle": "subtitle",
  "sections": [
    {{"heading": "subheading", "body": "body text"}},
    ...
  ],
  "references": ["cited sources"]
}}"""

        # Assemble reference materials
        ref_text = ""
        if reference_materials:
            ref_text = "\n\nReference materials:\n" + "\n---\n".join(reference_materials)

        messages = [{
            "role": "user",
            "content": (f"Please write the following article:\n\n"
                        f"Topic: {topic.title}\n"
                        f"Description: {topic.description}\n"
                        f"Target audience: {topic.target_audience}\n"
                        f"Keywords: {', '.join(topic.keywords)}\n"
                        f"Outline:\n" +
                        "\n".join(f"  {i+1}. {item}"
                                 for i, item in enumerate(topic.outline)) +
                        ref_text)
        }]

        try:
            result = await llm_client.chat_json(
                messages=messages, system_prompt=system_prompt,
                temperature=0.7, max_tokens=4096)

            sections = []
            for s in result.get("sections", []):
                sections.append({
                    "heading": s.get("heading", ""),
                    "body": s.get("body", ""),
                })

            full_content = "\n\n".join(
                f"## {s['heading']}\n\n{s['body']}" if s['heading']
                else s['body']
                for s in sections
            )
            word_count = len(full_content)

            return ArticleDraft(
                title=result.get("title", topic.title),
                subtitle=result.get("subtitle", ""),
                content=full_content,
                sections=sections,
                word_count=word_count,
                style=style.value,
                target_channel=target_channel,
                references=result.get("references", []),
            )
        except Exception as e:
            return ArticleDraft(
                title=topic.title, content=f"Writing failed: {str(e)}")

    async def rewrite_section(
        self, section: Dict[str, str], feedback: str,
        style: WritingStyle = WritingStyle.PROFESSIONAL,
    ) -> Dict[str, str]:
        """Rewrite one section based on review feedback"""
        style_guide = STYLE_GUIDES[style]
        system_prompt = f"""You are a content editor who revises a section according to feedback.
Tone requirement: {style_guide['tone']}

Return JSON: {{"heading": "...", "body": "..."}}"""

        messages = [
            {"role": "user", "content": f"Original heading: {section['heading']}\n"
                                        f"Original body: {section['body']}"},
            {"role": "user", "content": f"Feedback: {feedback}"},
        ]

        try:
            result = await llm_client.chat_json(
                messages=messages, system_prompt=system_prompt,
                temperature=0.7)
            return {
                "heading": result.get("heading", section["heading"]),
                "body": result.get("body", section["body"]),
            }
        except Exception:
            return section
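The draft's final `content` field is assembled from the sections exactly as in `write` above: headed sections become `## heading` blocks, unheaded ones stay as plain paragraphs, and `word_count` is simply the `len()` of the result (character count, the usual length measure for Chinese text). The assembly step standalone:

```python
from typing import Dict, List

def assemble(sections: List[Dict[str, str]]) -> str:
    """Join sections into markdown; headed sections get a '## ' prefix."""
    return "\n\n".join(
        f"## {s['heading']}\n\n{s['body']}" if s["heading"] else s["body"]
        for s in sections
    )

content = assemble([
    {"heading": "Intro", "body": "Why it matters."},
    {"heading": "", "body": "A transition paragraph."},
])
```

Keeping `sections` structured alongside the flattened `content` lets `rewrite_section` replace one block and reassemble without re-parsing markdown.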

29.3.4 Review Agent

python
# app/agents/review_agent.py
"""Review and optimization agent"""

import re
from dataclasses import dataclass, field
from typing import List, Dict, Optional, Tuple
from app.config import settings
from app.utils.llm_client import llm_client
from app.agents.writing_agent import ArticleDraft


@dataclass
class ReviewIssue:
    section_index: int
    issue_type: str  # grammar/style/seo/sensitive/readability
    severity: str    # info/warning/error
    description: str
    suggestion: str
    auto_fixable: bool = False


@dataclass
class ReviewResult:
    overall_score: float  # 0-100
    issues: List[ReviewIssue]
    readability_score: float
    seo_score: float
    sensitivity_check: Dict[str, List[str]]  # {"pass": [], "fail": []}
    word_count: int
    keyword_density: Dict[str, float]
    summary: str
    needs_rewrite: bool = False


class ReviewAgent:
    """Multi-dimensional content review"""

    # Basic sensitive-word list (use a professional lexicon in production)
    SENSITIVE_WORDS = [
        "最", "第一", "首个", "独家", "国家级", "世界级",
        "100%", "绝对", "保证", "包治", "根治",
    ]

    def review(
        self,
        draft: ArticleDraft,
        keywords: Optional[List[str]] = None,
    ) -> ReviewResult:
        """Run the multi-dimensional review"""
        issues = []

        # 1. Sensitive-word check
        sensitivity = self._check_sensitive(draft.content)
        for word in sensitivity["fail"]:
            issues.append(ReviewIssue(
                section_index=-1, issue_type="sensitive",
                severity="error",
                description=f"Sensitive word found: {word}",
                suggestion=f"Replace or remove '{word}'",
                auto_fixable=False,
            ))

        # 2. Readability score
        readability = self._score_readability(draft.content)

        # 3. SEO score
        seo_score, keyword_density = self._score_seo(
            draft.content, draft.title, keywords or [])

        # 4. LLM deep review (grammar, style, logic)
        llm_issues = self._llm_review(draft)
        issues.extend(llm_issues)

        # 5. Length check
        channel_limits = settings.CHANNEL_LIMITS.get(
            draft.target_channel, {"min": 500, "max": 5000})
        if draft.word_count < channel_limits["min"]:
            issues.append(ReviewIssue(
                section_index=-1, issue_type="readability",
                severity="warning",
                description=f"Too short: {draft.word_count} characters, "
                            f"at least {channel_limits['min']} recommended",
                suggestion="Add more arguments and case studies",
            ))
        if draft.word_count > channel_limits["max"]:
            issues.append(ReviewIssue(
                section_index=-1, issue_type="readability",
                severity="warning",
                description=f"Too long: {draft.word_count} characters, "
                            f"at most {channel_limits['max']} recommended",
                suggestion="Trim the content and cut redundant paragraphs",
            ))

        # Compute the overall score
        overall = self._calculate_overall(
            readability, seo_score,
            len(sensitivity["fail"]) == 0,
            len([i for i in issues if i.severity == "error"]) == 0)

        needs_rewrite = any(i.severity == "error" for i in issues)

        return ReviewResult(
            overall_score=overall,
            issues=issues,
            readability_score=readability,
            seo_score=seo_score,
            sensitivity_check=sensitivity,
            word_count=draft.word_count,
            keyword_density=keyword_density,
            summary=self._generate_summary(issues, overall),
            needs_rewrite=needs_rewrite,
        )

    def _check_sensitive(self, text: str) -> Dict[str, List[str]]:
        """Sensitive-word check"""
        found_pass = []
        found_fail = []
        for word in self.SENSITIVE_WORDS:
            if word in text:
                # Superlatives restricted by the advertising law are hard failures
                if word in ["最", "第一", "首个", "100%", "绝对", "国家级"]:
                    found_fail.append(word)
                else:
                    found_pass.append(word)
        return {"pass": found_pass, "fail": found_fail}

    def _score_readability(self, text: str) -> float:
        """Readability score (0-100)"""
        score = 70.0  # base score

        # Paragraph length (100-300 characters is ideal)
        paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
        for p in paragraphs:
            length = len(p)
            if 100 <= length <= 300:
                score += 2
            elif length > 500:
                score -= 5

        # Sentence length (10-30 characters is ideal)
        sentences = re.split(r'[。!?;]', text)
        avg_len = sum(len(s) for s in sentences) / max(len(sentences), 1)
        if 10 <= avg_len <= 30:
            score += 10
        elif avg_len > 50:
            score -= 10

        # Subheading distribution
        headings = re.findall(r'^##?\s+.+$', text, re.MULTILINE)
        if len(headings) >= 3:
            score += 5

        # Interactive elements (questions, exclamations)
        if re.search(r'[??!]', text):
            score += 3

        return min(max(score, 0), 100)

    def _score_seo(
        self, text: str, title: str, keywords: List[str],
    ) -> Tuple[float, Dict[str, float]]:
        """SEO score"""
        score = 60.0
        density = {}

        if not keywords:
            return score, density

        total_words = len(text)
        for kw in keywords:
            count = text.count(kw)
            d = count / max(total_words, 1) * 100
            density[kw] = round(d, 2)
            # Ideal keyword density is 1%-3%
            if 1.0 <= d <= 3.0:
                score += 8
            elif d > 5.0:
                score -= 5  # penalize keyword stuffing
            # Bonus when the title contains the keyword
            if kw in title:
                score += 5

        return min(max(score, 0), 100), density

    def _llm_review(self, draft: ArticleDraft) -> List[ReviewIssue]:
        """LLM deep review"""
        # Review section by section (capped to keep token usage bounded)
        issues = []
        for i, section in enumerate(draft.sections[:5]):
            try:
                result = self._review_section(section)
                if result.get("type") == "pass":
                    continue  # no issue found in this section
                issues.append(ReviewIssue(
                    section_index=i,
                    issue_type=result.get("type", "style"),
                    severity=result.get("severity", "info"),
                    description=result.get("description", ""),
                    suggestion=result.get("suggestion", ""),
                    auto_fixable=result.get("auto_fixable", False),
                ))
            except Exception:
                pass
        return issues

    def _review_section(self, section: Dict) -> Dict:
        """Review a single section"""
        import json
        system_prompt = """Review the following section and check for:
1. Grammar errors
2. Broken logic
3. Unclear wording
4. Risk of factual errors

Return JSON:
{"type": "grammar|style|logic", "severity": "info|warning|error",
 "description": "issue description", "suggestion": "suggested fix",
 "auto_fixable": false}
If there is no issue, return:
{"type": "pass", "severity": "info",
 "description": "no issue", "suggestion": "", "auto_fixable": false}"""

        # Simplified synchronous call; an async variant would also work
        result = llm_client._client.chat.completions.create(
            model=settings.LLM_MODEL,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user",
                 "content": f"Heading: {section['heading']}\n"
                            f"Body: {section['body'][:500]}"},
            ],
            temperature=0.1,
            max_tokens=500,
        )
        return json.loads(result.choices[0].message.content)

    def _calculate_overall(
        self, readability, seo, sensitive_pass, no_errors,
    ) -> float:
        """Compute the overall score"""
        score = readability * 0.3 + seo * 0.3 + 40
        if not sensitive_pass:
            score -= 20
        if not no_errors:
            score -= 10
        return min(max(round(score, 1), 0), 100)

    def _generate_summary(
        self, issues: List[ReviewIssue], score: float,
    ) -> str:
        errors = [i for i in issues if i.severity == "error"]
        warnings = [i for i in issues if i.severity == "warning"]
        parts = [f"Overall score: {score}/100"]
        if errors:
            parts.append(f"❌ {len(errors)} blocking issue(s) to fix")
        if warnings:
            parts.append(f"⚠️ {len(warnings)} suggested improvement(s)")
        if not errors and not warnings:
            parts.append("✅ Review passed, good quality")
        return "; ".join(parts)
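The heuristic scores are deliberately simple and cheap to run before any LLM call. A trimmed, self-contained version of the readability heuristic, keeping the same thresholds but scoring only paragraph and sentence length, with ASCII sentence delimiters added alongside the Chinese ones for the example:

```python
import re

def readability(text: str) -> float:
    """Heuristic readability score in [0, 100]: base 70, adjusted by lengths."""
    score = 70.0
    for p in [p for p in text.split("\n\n") if p.strip()]:
        if 100 <= len(p) <= 300:
            score += 2      # well-sized paragraph
        elif len(p) > 500:
            score -= 5      # wall of text
    sentences = re.split(r"[。!?;.!?;]", text)
    avg = sum(len(s) for s in sentences) / max(len(sentences), 1)
    if 10 <= avg <= 30:
        score += 10         # comfortable sentence length
    elif avg > 50:
        score -= 10         # run-on sentences
    return min(max(score, 0.0), 100.0)

score = readability(("x" * 150) + "\n\n" + ("y" * 600))
```

Here the first paragraph earns +2, the 600-character one loses 5, and the absence of any sentence delimiter makes the average "sentence" length exceed 50, costing another 10.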

29.3.5 Distribution Agent

python
# app/agents/distribute_agent.py
"""Multi-channel distribution agent"""

from dataclasses import dataclass, field
from typing import List, Dict, Optional
from app.utils.llm_client import llm_client
from app.agents.writing_agent import ArticleDraft
from app.config import Channel


@dataclass
class ChannelContent:
    channel: str
    title: str
    content: str
    tags: List[str]
    word_count: int
    adaptation_notes: str


CHANNEL_STYLES = {
    Channel.WECHAT: {
        "name": "WeChat Official Account",
        "format": "Markdown → rich layout",
        "title_style": "eye-catching without clickbait, within 20 characters",
        "content_style": "clear paragraphs, text with images, quotes in blockquotes",
        "tone": "professional but warm",
        "max_length": 3000,
        "features": ["follow prompt at the end", "read-more link", "share prompt"],
    },
    Channel.ZHIHU: {
        "name": "Zhihu",
        "format": "Markdown",
        "title_style": "question-style or opinion-style, signaling expertise",
        "content_style": "tight logic, data-backed, cited sources",
        "tone": "rational, deep, substantive",
        "max_length": 5000,
        "features": ["open with the pain point", "structured argument", "comment-section engagement"],
    },
    Channel.XIAOHONGSHU: {
        "name": "Xiaohongshu",
        "format": "short post + emoji",
        "title_style": "within 20 characters, with keywords and emoji",
        "content_style": "bullet points, plenty of emoji, hashtags",
        "tone": "recommendation vibe, friend-to-friend sharing",
        "max_length": 1000,
        "features": ["eye-catching cover", "hashtags #", "interaction prompt"],
    },
    Channel.TOUTIAO: {
        "name": "Toutiao",
        "format": "rich text",
        "title_style": "information-dense, with numbers and keywords",
        "content_style": "news-style opening, structured body",
        "tone": "objective, information-dense",
        "max_length": 2500,
        "features": ["three-part headline", "concise abstract", "category tags"],
    },
    Channel.DOUYIN: {
        "name": "Douyin script",
        "format": "video script",
        "title_style": "conversational, with a hook",
        "content_style": "spoken language, scene descriptions, timestamps",
        "tone": "conversational, fast-paced, visual",
        "max_length": 500,
        "features": ["hook in the first 3 seconds", "pacing notes", "BGM suggestions"],
    },
}


class DistributeAgent:
    SYSTEM_PROMPT = """You are a content distribution expert who adapts one article to different platform styles.

Rules:
1. Preserve the core ideas and information
2. Adjust the title, structure, and tone to the target platform
3. Add platform-specific elements (hashtags, engagement prompts, etc.)
4. Keep the length within the target range

Return JSON:
{
  "title": "adapted title",
  "content": "adapted body",
  "tags": ["tag1", "tag2"],
  "adaptation_notes": "notes on the adaptation"
}"""

    async def adapt(
        self, draft: ArticleDraft, channels: List[str],
    ) -> List[ChannelContent]:
        """Adapt content for multiple channels in parallel"""
        import asyncio
        tasks = [
            self._adapt_single(draft, ch) for ch in channels
        ]
        results = await asyncio.gather(*tasks, return_exceptions=True)
        return [r for r in results if isinstance(r, ChannelContent)]

    async def _adapt_single(
        self, draft: ArticleDraft, channel: str,
    ) -> ChannelContent:
        """Adapt content for one channel"""
        try:
            style = CHANNEL_STYLES[Channel(channel)]
        except ValueError:  # unknown channel name falls back to WeChat
            style = CHANNEL_STYLES[Channel.WECHAT]

        messages = [{
            "role": "user",
            "content": (f"Please adapt the following article for [{style['name']}]:\n\n"
                        f"Original title: {draft.title}\n"
                        f"Original content: {draft.content[:2000]}\n\n"
                        f"Requirements:\n"
                        f"- Title style: {style['title_style']}\n"
                        f"- Content style: {style['content_style']}\n"
                        f"- Tone: {style['tone']}\n"
                        f"- Max length: {style['max_length']}\n"
                        f"- Platform features: {', '.join(style['features'])}")
        }]

        try:
            result = await llm_client.chat_json(
                messages=messages, system_prompt=self.SYSTEM_PROMPT,
                temperature=0.6)
            content = result.get("content", "")
            return ChannelContent(
                channel=channel,
                title=result.get("title", draft.title),
                content=content,
                tags=result.get("tags", []),
                word_count=len(content),
                adaptation_notes=result.get("adaptation_notes", ""),
            )
        except Exception as e:
            return ChannelContent(
                channel=channel, title=draft.title,
                content=draft.content, tags=[],
                word_count=draft.word_count,
                adaptation_notes=f"Adaptation failed: {str(e)}")
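`asyncio.gather(..., return_exceptions=True)` is what keeps one failed channel from sinking the rest: failures come back as exception objects inside the result list, in order, and the `isinstance` filter drops them. The pattern in isolation, with a stub `adapt` coroutine:

```python
import asyncio

async def adapt(channel: str) -> str:
    if channel == "bad":
        raise ValueError("adaptation failed")
    return f"{channel}: ok"

async def adapt_all(channels):
    """Run all adaptations concurrently; drop the ones that raised."""
    results = await asyncio.gather(
        *(adapt(ch) for ch in channels), return_exceptions=True)
    return [r for r in results if isinstance(r, str)]

ok = asyncio.run(adapt_all(["wechat", "bad", "zhihu"]))
```

In the chapter's code the filter checks `isinstance(r, ChannelContent)`; the `except` branch inside `_adapt_single` additionally converts most failures into a fallback ChannelContent, so the gather-level filter only catches errors raised outside that try block.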

29.3.6 Sensitive-Word Filter Service

python
# app/services/sensitive_filter.py
"""Sensitive-word filter service"""

from dataclasses import dataclass
from typing import List, Dict, Tuple


# Superlatives restricted by the advertising law (basic list)
AD_LIMIT_WORDS = [
    "最", "最佳", "最好", "最优", "最强", "最先进",
    "第一", "首个", "首款", "唯一", "独一无二",
    "国家级", "世界级", "顶级", "极致", "绝无仅有",
    "100%", "绝对", "保证治愈", "包治百病", "根治",
    "永久", "万能", "特效", "高效", "超效",
]


# Politically sensitive words (basic; use a full lexicon in production)
POLITICAL_WORDS = [
    # Use a professional sensitive-word lexicon in production
]


@dataclass
class FilterResult:
    is_clean: bool
    found_words: List[Tuple[str, int]]  # (word, position)
    suggestions: List[str]
    cleaned_text: str = ""


class SensitiveFilter:
    def __init__(self, extra_words: List[str] = None):
        self._words = set(AD_LIMIT_WORDS + POLITICAL_WORDS)
        if extra_words:
            self._words.update(extra_words)
        # Sort by length descending so longer words match first
        self._sorted_words = sorted(self._words, key=len, reverse=True)

    def check(self, text: str) -> FilterResult:
        """Scan the text for sensitive words"""
        found = []
        for word in self._sorted_words:
            idx = 0
            while True:
                pos = text.find(word, idx)
                if pos == -1:
                    break
                found.append((word, pos))
                idx = pos + len(word)

        # Generate replacement suggestions
        suggestions = []
        replacements = {
            "最": "非常", "最佳": "优秀", "第一": "领先",
            "100%": "极高", "绝对": "非常",
        }
        for word, pos in found:
            # Default mask keeps the first character and stars out the rest
            replacement = replacements.get(
                word, word[0] + "*" * (len(word) - 1))
            suggestions.append(
                f"Position {pos}: '{word}' → suggested replacement '{replacement}'")

        # Auto-replace where a safe substitute is known
        cleaned = text
        for word, _ in found:
            replacement = replacements.get(word, "")
            if replacement:
                cleaned = cleaned.replace(word, replacement)

        return FilterResult(
            is_clean=len(found) == 0,
            found_words=found,
            suggestions=suggestions,
            cleaned_text=cleaned if found else text,
        )

    def check_title(self, title: str) -> FilterResult:
        """Title check (stricter rules in a full implementation)"""
        return self.check(title)
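The scan itself is a plain substring search run per word, longest words first. A self-contained version with English stand-in words shows the behavior, including a quirk worth knowing: sorting only orders the report, it does not deduplicate, so a short word contained in a longer hit is still reported at the same position.

```python
from typing import List, Tuple

def find_sensitive(text: str, words: List[str]) -> List[Tuple[str, int]]:
    """Report every (word, position) hit, longest words first."""
    found = []
    for word in sorted(set(words), key=len, reverse=True):
        idx = 0
        while (pos := text.find(word, idx)) != -1:
            found.append((word, pos))
            idx = pos + len(word)  # resume after this occurrence
    return found

hits = find_sensitive("the best ever, simply the best", ["best", "best ever"])
```

A production filter would typically use an Aho-Corasick automaton instead of one `find` loop per word, and suppress sub-matches of longer hits.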

29.3.7 FastAPI Entry Point

python
# app/main.py
"""Content pipeline - FastAPI entry point"""

from contextlib import asynccontextmanager
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel
from typing import Optional, List

from app.config import settings, WritingStyle
from app.agents.topic_agent import TopicAgent
from app.agents.writing_agent import WritingAgent
from app.agents.review_agent import ReviewAgent
from app.agents.distribute_agent import DistributeAgent
from app.services.sensitive_filter import SensitiveFilter

topic_agent = TopicAgent()
writing_agent = WritingAgent()
review_agent = ReviewAgent()
distribute_agent = DistributeAgent()
sensitive_filter = SensitiveFilter()


@asynccontextmanager
async def lifespan(app: FastAPI):
    print(f"🚀 {settings.APP_NAME} v{settings.APP_VERSION} started")
    yield


app = FastAPI(
    title=settings.APP_NAME, version=settings.APP_VERSION,
    lifespan=lifespan,
    description="AI-driven content generation pipeline")
app.add_middleware(CORSMiddleware, allow_origins=["*"],
                   allow_credentials=True,
                   allow_methods=["*"], allow_headers=["*"])


class TopicRequest(BaseModel):
    domain: str
    keywords: Optional[List[str]] = None
    channels: Optional[List[str]] = None
    count: int = 5


class WriteRequest(BaseModel):
    topic_title: str
    topic_description: str
    topic_keywords: List[str]
    outline: List[str]
    target_audience: str = "general"
    style: str = "professional"
    channel: str = "wechat"
    max_words: int = 2500


class PipelineRequest(BaseModel):
    domain: str
    keywords: Optional[List[str]] = None
    style: str = "professional"
    channels: List[str] = ["wechat", "zhihu"]
    max_words: int = 2500


@app.get("/health")
async def health():
    return {"status": "ok", "version": settings.APP_VERSION}


@app.post("/api/v1/topics")
async def analyze_topics(req: TopicRequest):
    """Analyze a domain and recommend topics"""
    result = await topic_agent.analyze(
        domain=req.domain,
        keywords=req.keywords,
        target_channels=req.channels,
        count=req.count,
    )
    return {
        "suggestions": [
            {"title": s.title, "description": s.description,
             "trending_score": s.trending_score,
             "competition_level": s.competition_level,
             "keywords": s.keywords,
             "suggested_channels": s.suggested_channels,
             "outline": s.outline}
            for s in result.suggestions
        ],
        "trending_keywords": result.trending_keywords,
        "content_gaps": result.content_gaps,
        "summary": result.summary,
    }


@app.post("/api/v1/write")
async def write_article(req: WriteRequest):
    """Write an article from a topic"""
    from app.agents.topic_agent import TopicSuggestion
    topic = TopicSuggestion(
        title=req.topic_title,
        description=req.topic_description,
        target_audience=req.target_audience,
        keywords=req.topic_keywords,
        outline=req.outline,
    )
    style = WritingStyle(req.style)
    draft = await writing_agent.write(
        topic=topic, style=style,
        target_channel=req.channel,
        max_words=req.max_words,
    )
    return {
        "title": draft.title,
        "subtitle": draft.subtitle,
        "content": draft.content,
        "sections": draft.sections,
        "word_count": draft.word_count,
        "style": draft.style,
    }


@app.post("/api/v1/review")
async def review_article(
    content: str,
    title: str = "",
    keywords: Optional[List[str]] = None,
):
    """审校文章"""
    draft = ArticleDraft(title=title, content=content,
                         word_count=len(content))
    result = review_agent.review(draft, keywords)
    return {
        "overall_score": result.overall_score,
        "readability_score": result.readability_score,
        "seo_score": result.seo_score,
        "needs_rewrite": result.needs_rewrite,
        "issues": [
            {"section": i.issue_type,
             "severity": i.severity,
             "description": i.description,
             "suggestion": i.suggestion}
            for i in result.issues
        ],
        "sensitive_check": result.sensitivity_check,
        "summary": result.summary,
    }


@app.post("/api/v1/pipeline")
async def run_pipeline(req: PipelineRequest):
    """完整流水线:选题 → 撰写 → 审校 → 分发"""
    # 1. 选题分析
    topic_result = await topic_agent.analyze(
        domain=req.domain,
        keywords=req.keywords,
        target_channels=req.channels,
        count=1,
    )
    if not topic_result.suggestions:
        raise HTTPException(400, detail="Failed to generate a suitable topic")

    topic = topic_result.suggestions[0]

    # 2. Write the first draft
    style = WritingStyle(req.style)
    draft = await writing_agent.write(
        topic=topic, style=style,
        target_channel=req.channels[0],
        max_words=req.max_words,
    )

    # 3. Review
    review = review_agent.review(draft, topic.keywords)

    # 4. Iterative fixes (if there are severe issues)
    if review.needs_rewrite and settings.MAX_REVIEW_ITERATIONS > 0:
        errors = [i for i in review.issues if i.severity == "error"]
        for issue in errors[:2]:  # fix at most 2 issues
            if 0 <= issue.section_index < len(draft.sections):
                draft.sections[issue.section_index] = \
                    await writing_agent.rewrite_section(
                        draft.sections[issue.section_index],
                        issue.suggestion, style)

    # 5. Multi-channel adaptation
    adapted = await distribute_agent.adapt(draft, req.channels)

    return {
        "topic": {
            "title": topic.title,
            "description": topic.description,
            "trending_score": topic.trending_score,
        },
        "draft": {
            "title": draft.title,
            "content": draft.content[:500] + "...",
            "word_count": draft.word_count,
        },
        "review": {
            "overall_score": review.overall_score,
            "readability": review.readability_score,
            "seo": review.seo_score,
            "summary": review.summary,
        },
        "distributed": [
            {"channel": a.channel, "title": a.title,
             "word_count": a.word_count, "tags": a.tags}
            for a in adapted
        ],
    }


if __name__ == "__main__":
    import uvicorn
    uvicorn.run("app.main:app", host="0.0.0.0", port=8000,
                reload=settings.DEBUG)

29.4 Testing

29.4.1 Topic Analysis Tests

```python
# tests/test_topic.py
"""Topic analysis tests."""

import os

import pytest

from app.agents.topic_agent import TopicAgent


@pytest.mark.asyncio
@pytest.mark.skipif(not os.getenv("CP_LLM_API_KEY"),
                    reason="requires a configured LLM API key")
async def test_topic_analysis():
    agent = TopicAgent()
    result = await agent.analyze("AI技术", ["大模型", "Agent"])
    assert len(result.suggestions) > 0
    assert result.suggestions[0].title
```

29.4.2 Sensitive-Word Filter Tests

```python
# tests/test_sensitive.py
"""Sensitive-word filter tests.

The sample strings stay in Chinese because the filter targets
Chinese advertising-law wording.
"""

from app.services.sensitive_filter import SensitiveFilter


def test_ad_limit_words():
    sf = SensitiveFilter()
    text = "这是全国最好用的产品,100%有效,绝对保证"
    result = sf.check(text)
    assert not result.is_clean
    assert len(result.found_words) >= 3


def test_clean_text():
    sf = SensitiveFilter()
    text = "这是一篇关于技术分享的文章"
    result = sf.check(text)
    assert result.is_clean
    assert len(result.found_words) == 0


def test_title_check():
    sf = SensitiveFilter()
    title = "全球第一款AI写作工具"
    result = sf.check_title(title)
    assert not result.is_clean
```

29.4.3 Readability Scoring Tests

```python
# tests/test_readability.py
"""Readability scoring tests."""

from app.agents.review_agent import ReviewAgent


def test_good_readability():
    agent = ReviewAgent()
    text = ("## 引言\n\nAI技术的发展日新月异。\n\n"
            "你知道Agent是什么吗?它可能改变我们的工作方式。\n\n"
            "## 核心观点\n\n首先,Agent具备自主决策能力。\n"
            "其次,Agent可以调用外部工具。\n"
            "最后,Agent能够从反馈中学习。")
    score = agent._score_readability(text)
    assert score > 60


def test_long_paragraph_penalty():
    agent = ReviewAgent()
    text = "这是一段非常长的文字" * 100
    score = agent._score_readability(text)
    assert score < 70
```

29.5 Deployment

```dockerfile
# Dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app/ ./app/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

```yaml
# docker-compose.yml
version: '3.8'
services:
  content-pipeline:
    build: .
    ports: ["8000:8000"]
    environment:
      - CP_LLM_API_KEY=${LLM_API_KEY}
```

29.6 Lessons Learned

29.6.1 Pitfalls

Pitfall 1: The "AI flavor" of generated content

Early drafts carried obvious machine-written tells: chains of "firstly... secondly... lastly...", overly tidy parallel structures, and no personal point of view. The fix was to inject a style persona: define an explicit writing personality in the prompt (tone, preferred wording, expressions to avoid) and provide same-style sample articles as few-shot examples. Brand content additionally needs a brand-tone vocabulary injected.
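In code, persona injection can be sketched roughly as follows. This is a minimal illustration: `StylePersona`, its fields, and `build_system_prompt` are hypothetical names, not the actual `writing_agent` API.

```python
# Hypothetical sketch of style-persona injection into a system prompt.
# The StylePersona fields are illustrative, not the book's real schema.
from dataclasses import dataclass, field
from typing import List


@dataclass
class StylePersona:
    name: str                                   # writing personality
    tone: str                                   # e.g. "conversational but precise"
    favored_words: List[str] = field(default_factory=list)
    banned_phrases: List[str] = field(default_factory=list)


def build_system_prompt(persona: StylePersona) -> str:
    """Render the persona into a system-prompt block."""
    return "\n".join([
        f"You are '{persona.name}', writing in a {persona.tone} tone.",
        "Prefer these expressions: " + ", ".join(persona.favored_words),
        "Never use these stock transitions: " + ", ".join(persona.banned_phrases),
        "State at least one personal opinion per section.",
    ])


persona = StylePersona(
    name="Senior Tech Columnist",
    tone="conversational but precise",
    favored_words=["in practice", "the catch is"],
    banned_phrases=["firstly", "secondly", "lastly"],
)
prompt = build_system_prompt(persona)
```

In a real agent this prompt would be combined with few-shot sample articles in the same style, as described above.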

Pitfall 2: Inconsistent quality across channels

Xiaohongshu wants heavy emoji and casual phrasing, Zhihu wants academic rigor, and Douyin wants a visual script; adapting every platform with the same prompt performed poorly. We wrote a dedicated adaptation prompt template for each channel, encoding that platform's format rules and style guide.
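Template isolation can be sketched as a per-channel lookup. The `CHANNEL_TEMPLATES` dict and its wording are hypothetical stand-ins for the real `distribute_agent` templates:

```python
# Hypothetical per-channel prompt templates; wording is illustrative.
CHANNEL_TEMPLATES = {
    "xiaohongshu": (
        "Rewrite for Xiaohongshu: short lines, heavy emoji, casual "
        "first-person voice, end with 3-5 hashtags.\n\n{content}"
    ),
    "zhihu": (
        "Rewrite for Zhihu: rigorous, well-sourced, numbered arguments, "
        "neutral academic tone.\n\n{content}"
    ),
    "douyin": (
        "Rewrite as a Douyin video script: hook in the first 3 seconds, "
        "scene-by-scene shot notes, spoken-language captions.\n\n{content}"
    ),
}


def adapt_prompt(channel: str, content: str) -> str:
    """Pick the channel's dedicated template; fail loudly on unknown channels."""
    if channel not in CHANNEL_TEMPLATES:
        raise ValueError(f"no template for channel: {channel}")
    return CHANNEL_TEMPLATES[channel].format(content=content)


p = adapt_prompt("zhihu", "AI Agent 入门")
```

Keeping templates isolated per channel means a style tweak for one platform cannot silently degrade another.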

Pitfall 3: Token cost of the review loop

The review → feedback → rewrite → re-review loop can consume a lot of tokens. We tiered the review: grammar and formatting issues are fixed directly by a rule engine (no LLM calls), and only logic and style issues go to the LLM, cutting cost by 60%.
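The tiering amounts to a simple dispatch step before any LLM call. The issue-type labels below are illustrative, not the actual `review_agent` taxonomy:

```python
# Hypothetical tiered dispatch: cheap rule fixes first, LLM only for the rest.
RULE_FIXABLE = {"grammar", "format", "punctuation"}  # illustrative labels


def split_issues(issues):
    """Partition issues into rule-engine fixes and LLM rewrites."""
    rule_side, llm_side = [], []
    for issue in issues:
        (rule_side if issue["type"] in RULE_FIXABLE else llm_side).append(issue)
    return rule_side, llm_side


issues = [
    {"type": "grammar", "text": "doubled  space"},
    {"type": "logic", "text": "conclusion contradicts section 2"},
    {"type": "format", "text": "missing heading level"},
]
rules, llm = split_issues(issues)  # only `llm` triggers model calls
```

Only the `llm` partition re-enters the feedback loop, which is where the 60% cost saving comes from.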

Pitfall 4: Sensitive-word false positives

"最好" ("best") is fine in a phrase like "best practice" but violates ad regulations in marketing copy. The fix was context-aware filtering: decide whether a hit is a real violation from the copy type (ad / tutorial / opinion) and the surrounding context, rather than plain string matching.
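A toy sketch of the idea, with invented word lists and content-type labels (the real `SensitiveFilter` also weighs surrounding context, not just copy type):

```python
# Hypothetical context-aware check: a superlative like "最好" is flagged
# only when the declared content type is ad copy.
AD_ONLY_WORDS = {"最好", "第一", "100%"}   # violations only in ad copy
ALWAYS_BANNED = {"绝对保证"}               # violations in any content type


def check_word(word: str, content_type: str) -> bool:
    """Return True if the word is a violation for this content type."""
    if word in ALWAYS_BANNED:
        return True
    return word in AD_ONLY_WORDS and content_type == "ad"


ok_in_tutorial = check_word("最好", "tutorial")  # "best practice" context
bad_in_ad = check_word("最好", "ad")             # ad-law violation
```

The same word thus produces different verdicts depending on the declared copy type, which is exactly what plain string matching cannot do.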

29.6.2 Performance Optimization

  1. Parallel review: run grammar checking, SEO analysis, and sensitive-word filtering concurrently
  2. Segmented review: review paragraph by paragraph instead of the whole article, stopping early once 3 issues are found
  3. Streaming output: stream article generation so users see progress in real time
  4. Caching: reuse the outline for an identical topic instead of regenerating it
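Item 1 maps directly onto `asyncio.gather`. The three checker coroutines below are stand-ins for the real services, returning canned results:

```python
# Minimal sketch of three-way parallel review with asyncio.gather.
# The checker bodies are placeholders for the real service calls.
import asyncio


async def check_grammar(text: str) -> dict:
    return {"check": "grammar", "issues": 0}


async def check_seo(text: str) -> dict:
    return {"check": "seo", "score": 82}


async def check_sensitive(text: str) -> dict:
    return {"check": "sensitive", "clean": True}


async def review_parallel(text: str) -> list:
    """Run the three independent checks concurrently."""
    return list(await asyncio.gather(
        check_grammar(text), check_seo(text), check_sensitive(text)))


results = asyncio.run(review_parallel("示例文章正文"))
```

Since the three checks share no state, total latency drops to that of the slowest check instead of the sum of all three.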

29.6.3 Key Design Patterns

| Pattern | Use Case | Result |
|---------|----------|--------|
| Style Persona | content style control | "AI flavor" reduced by 80% |
| Rule + LLM hybrid review | content quality control | cost reduced by 60% |
| Pipeline + feedback loop | iterative quality improvement | first-pass rate 70% → 90% |
| Channel template isolation | multi-channel adaptation | 95% per-platform quality pass rate |

29.6.4 Future Directions

  1. Brand knowledge base: build a dedicated style and terminology library for each brand
  2. A/B testing integration: auto-generate multiple title variants and pick the winner via A/B tests
  3. Content calendar: mine historical data for the best publishing times and cadence
  4. Multimodal content: expand from articles to short-video scripts, podcast outlines, and slide decks

Chapter summary: the content generation pipeline demonstrates what Agents can do for content production. The core ideas are style-controllable generation (Style Persona) and a quality loop (review → feedback → rewrite). By chaining topic selection, writing, review, and distribution with the Pipeline pattern, and injecting both domain rules and LLM capability at every stage, content production throughput can improve more than tenfold while keeping quality consistent.
