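Appendix A · Things to Remember Cheat Sheet

Print this page and tape it next to your monitor. The parenthetical after each entry points to the corresponding Item.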

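Part I · Understanding What an LLM Is

LLM output is a single sample from a probability distribution, not a fact lookup. (Item 1)
The result of a single call is not evidence; only n ≥ 30 counts. (Item 2)
Control sampling with temperature/top_p/seed, not with the prompt. (Item 3)
Tag each failure case A/B/C first, then handle them in A → B → C order; using prompts to fix tier A is wasted effort, and using them to fix tier C is self-deception. (Item 4)
Prompts cannot fix training-side problems. (Item 5)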
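
To make the n ≥ 30 rule (Item 2) concrete, here is a minimal sketch that turns a pass count into a confidence interval. The choice of the Wilson score interval, the function name `wilson_interval`, and the example counts are illustrative assumptions, not something the Items prescribe.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a pass rate (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Hypothetical counts: one passing call versus 24 passes out of 30 calls.
print("n=1 :", wilson_interval(1, 1))
print("n=30:", wilson_interval(24, 30))
```

The interval from a single call is far wider than the one from 30 calls, which is the practical reason a lone success or failure should not drive a decision.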

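Part II · Engineering Cures (Tier A)

Arithmetic on numbers of 4 or more digits must go through a tool. (Item 6)
Hand dates, units, and statistics to deterministic programs. (Item 7)
Run code generation as a "generate → execute → feed back" closed loop. (Item 8)
Private or time-sensitive information goes through RAG, unconditionally. (Item 9)
Retrieval quality takes priority over generation quality. (Item 10)
Factual statements must carry a citation; otherwise treat them as untrustworthy. (Item 11)
Use a reranker rather than a bigger embedding. (Item 12)
Use the native response_format rather than asking for JSON in the prompt. (Item 13)
Use Constrained Decoding to lock down fields. (Item 14)
Make every enumerable field an enum. (Item 15)
Factual tasks: temperature=0; generative tasks: top_p=0.9. (Item 16)
max_tokens plus Schema maxLength form a double gate against runaway length. (Item 17)
repetition_penalty plus no_repeat_ngram guard against degeneration. (Item 18)
For tasks of ≥3 steps, plan first, then solve. (Item 19)
Orchestrate with LangGraph or a state machine; do not give the LLM full control of planning. (Item 20)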
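
As one illustration of "arithmetic goes through a tool" (Item 6), the sketch below routes any expression containing an operand of 4 or more digits to an exact integer evaluator. The names `needs_tool` and `exact_eval`, and the restriction to `+`, `-`, `*`, `//`, are assumptions made for this example; a real deployment would expose something like this as a function-calling tool.

```python
import ast
import operator
import re

# Only these integer operators are allowed; everything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}


def needs_tool(expression: str, min_digits: int = 4) -> bool:
    """True if any operand has at least `min_digits` consecutive digits."""
    return re.search(r"\d{%d,}" % min_digits, expression) is not None


def exact_eval(expression: str) -> int:
    """Evaluate an integer expression by walking its AST (no eval())."""

    def walk(node: ast.AST) -> int:
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


question = "12345 * 6789"
if needs_tool(question):
    print(exact_eval(question))  # computed by Python integers, not by the model
```

Walking the AST instead of calling `eval` keeps the tool deterministic and refuses anything that is not plain integer arithmetic, such as function calls or attribute access.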

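Part III · Statistical Mitigation (Tier B)

Put key information at the start or the end, not in the middle. (Item 21)
For long documents, use Map-Reduce rather than an extra-long context window. (Item 22)
Retrieve top-100 → rerank to top 3-5 → generate. (Item 23)
Do not let a model judge its own output. (Item 24)
Cross-judge with ≥3 different base models, plus position swapping. (Item 25)
Use a length-controlled metric. (Item 26)
Use Self-RAG so the model decides for itself whether to retrieve. (Item 27)
Follow long factual answers with a CoVe self-check. (Item 28)
Self-Consistency with n=5-10 is the sweet spot. (Item 29)
"Think again" must come with new evidence. (Item 30)
Keep the user's stance hidden in the prompt. (Item 31)
Rewrite numeric/algorithmic CoT as executable code. (Item 32)
Use Context-Aware Decoding to strengthen reliance on the context. (Item 33)
A RAG system prompt must include "trust context". (Item 34)
Add a used_chunks field to the output Schema. (Item 35)
Use Spotlighting to physically isolate user input. (Item 36)
Apply two layers of safety filtering: input and output. (Item 37)
Once an untrusted source enters the context, revoke permissions for tools with side effects. (Item 38)
In long conversations, re-inject key constraints every N turns. (Item 39)
When the context exceeds a threshold, use summary-then-continue. (Item 40)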
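
The Self-Consistency rule (Item 29) comes down to sampling several answers and keeping the most common one. The sketch below shows only the voting step; the function name `majority_vote`, the normalization by `strip().lower()`, and the sample answers are assumptions for illustration, and the sampling calls themselves are left out.

```python
from collections import Counter


def majority_vote(answers: list[str]) -> tuple[str, float]:
    """Return the most frequent normalized answer and its share of the votes."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    normalized = [a.strip().lower() for a in answers]
    # On a tie, Counter.most_common keeps the answer that appeared first.
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)


# Hypothetical final answers extracted from n=5 sampled completions.
samples = ["42", " 42", "41", "42 ", "40"]
answer, agreement = majority_vote(samples)
print(answer, agreement)
```

The agreement ratio is worth logging alongside the answer: a low share signals that the vote is close and the question may need a tool or new evidence rather than more samples.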

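Part IV · Recognizing the Unsolvable (Tier C)

Store bidirectional facts explicitly in both directions (the reversal curse). (Item 41)
Route precise reasoning of ≥4 steps through tools or decomposition. (Item 42)
Keep business rules from depending on negation semantics. (Item 43)
Replace classic benchmarks with fresh or private holdout sets. (Item 44)
Treat model version + date as a configuration item. (Item 45)
Preference claims must be backed by paired experiments and large samples. (Item 46)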
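
For the paired-experiment rule (Item 46), one simple way to check a "model X is preferred over model Y" claim is an exact sign test over paired comparisons on the same prompts. This is a sketch under assumptions: the sign test is one possible choice of test, and the function name `sign_test_p_value` and the win/loss counts are hypothetical.

```python
from math import comb


def sign_test_p_value(wins: int, losses: int) -> float:
    """Two-sided exact sign test; tied pairs are dropped before calling."""
    n = wins + losses
    if n == 0:
        raise ValueError("no non-tied pairs to test")
    k = max(wins, losses)
    # Probability of a split at least this lopsided if preference were a coin flip.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Same 80% win rate, very different strength of evidence.
print(sign_test_p_value(8, 2))
print(sign_test_p_value(80, 20))
```

With 10 pairs, an 8-to-2 split is still compatible with chance at the usual 0.05 threshold, while the same ratio over 100 pairs is not, which is why the rule asks for large samples as well as pairing.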

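Part V · Launch and Monitoring

Run a 30-minute health-check suite before launch. (Item 47)
Use a radar chart instead of a single benchmark score. (Item 48)
Sample production traces for hallucination detection. (Item 49)
Monitor drift monthly with LiveBench. (Item 50)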
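
Sampling production traces (Item 49) works best when the decision is reproducible, so the same trace is always in or out of the audited set. The sketch below uses a hash of the trace id for that; the function name `should_sample`, the 5% rate, and the `trace-N` id format are assumptions, and the hallucination detector itself is out of scope here.

```python
import hashlib


def should_sample(trace_id: str, rate: float) -> bool:
    """Deterministically select roughly `rate` of all trace ids."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    digest = hashlib.sha256(trace_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big")  # uniform in [0, 2**64)
    return bucket < int(rate * 2 ** 64)


# Hypothetical trace ids; about 5% should be routed to the detector.
trace_ids = [f"trace-{i}" for i in range(10_000)]
selected = [t for t in trace_ids if should_sample(t, 0.05)]
print(len(selected))
```

Hash-based selection needs no shared state between workers and lets a flagged trace be re-examined later, because re-running the check on the same id gives the same answer.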