Architecture Feature Comparison of Mainstream LLMs with Exact Support in Vitriol 3D Viewer

| Model | Architecture type | GQA | SWA | MoE | Multimodal | Special features | 3D Viewer |
|---|---|---|---|---|---|---|---|
| LLaMA 3.1 (Meta) | LLaMA GQA | ✓ 32Q/8KV | — | — | — | SwiGLU, RoPE | View |
| Mistral 7B (Mistral AI) | GQA + SWA | ✓ 32Q/8KV | ✓ 4096 tokens | — | — | Sliding Window | View |
| Mixtral 8x7B (Mistral AI) | SWA + MoE | ✓ 32Q/8KV | ✓ | ✓ 8 experts / top-2 | — | Sparse MoE | View |
| Qwen 2.5 7B (Alibaba) | LLaMA GQA | ✓ 28Q/4KV | — | — | — | RoPE extended | View |
| Qwen 3.6-27B (Alibaba) | Hybrid Attn | ✓ 24Q/4KV | ~ Linear Attn | — | ✓ Vision 27L | Linear+Full, M-RoPE | View |
| Gemma-2 9B | Gemma | ✓ 16Q/8KV | ✓ alternating SWA/Global | — | — | GeGLU, Pre+Post Norm, Softcap | View |
| Phi-3 Mini (Microsoft) | Phi-3 | ✓ MQA-like | ~ 2K/4K | — | — | Fused QKV | View |
| DeepSeek-V4-Pro (DeepSeek) | MLA + MoE | ~ MLA variant | — | ✓ 256 experts | — | MLA, Q-LoRA, Hash Attn | View |
| MiMo V2.5-Pro (Xiaomi) | GQA + MoE | ✓ | — | ✓ 384 / top-8 | — | Fine-grained MoE | View |
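
To make the GQA, SWA, and MoE columns concrete, here is a minimal pure-Python sketch of what the three notations mean: `32Q/8KV` (query heads sharing KV heads), a sliding-window size, and `N experts / top-k` routing. All function names are hypothetical and chosen for illustration. None of this is taken from Vitriol 3D Viewer or from any model's reference code, and the window convention used (a token attends to itself plus the previous `window - 1` positions) is an assumption, since implementations differ by one at the boundary.

```python
import math


def kv_head_for_query(q_head: int, n_q: int, n_kv: int) -> int:
    """GQA: return the KV head shared by a given query head.

    With 32Q/8KV, each group of 4 consecutive query heads shares one KV head.
    """
    if n_q % n_kv != 0:
        raise ValueError("n_q must be a multiple of n_kv")
    group_size = n_q // n_kv
    return q_head // group_size


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """SWA: causal mask where position i sees position j iff 0 <= i - j < window.

    Assumed convention: the window includes the current token.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


def top_k_experts(router_logits: list[float], k: int) -> tuple[list[int], list[float]]:
    """Sparse MoE: pick the k highest-scoring experts and softmax over only those.

    Ties are broken in favour of the lower expert index (stable sort).
    """
    order = sorted(range(len(router_logits)),
                   key=lambda e: router_logits[e], reverse=True)
    chosen = order[:k]
    peak = max(router_logits[e] for e in chosen)
    exps = [math.exp(router_logits[e] - peak) for e in chosen]
    total = sum(exps)
    return chosen, [x / total for x in exps]


if __name__ == "__main__":
    # 32Q/8KV: query heads 0..3 share KV head 0, heads 28..31 share KV head 7.
    print([kv_head_for_query(h, 32, 8) for h in (0, 3, 4, 31)])
    # A tiny window of 2 over 4 positions, printed row by row.
    for row in sliding_window_mask(4, 2):
        print(["x" if seen else "." for seen in row])
    # 8 experts / top-2 with made-up router logits.
    print(top_k_experts([0.1, 2.0, -1.0, 2.0, 0.5, 0.0, 0.0, 0.0], 2))
```

Read against the table, `kv_head_for_query(h, 32, 8)` corresponds to the `32Q/8KV` entries, `sliding_window_mask(seq_len, 4096)` to the `4096 tokens` entry, and `top_k_experts(logits, 2)` over 8 logits to `8 experts / top-2`.
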

| Model | MMLU | GSM8K | HumanEval | MATH | HellaSwag | Relative overall strength |
|---|---|---|---|---|---|---|