RoPE | LLaMA-2 | Rotary Position Embedding | LLaMA | medium
SwiGLU | LLaMA-2 | SwiGLU Activation | LLaMA | medium
RoPE | Qwen-7B | Rotary Position Embedding | Qwen | medium
Sliding Window | Mistral-7B | Sliding Window Attention | LLaMA | medium
RoPE | Mistral-7B | Rotary Position Embedding | LLaMA | medium
RoPE Scaling | Mistral-Instruct | Extended Context via RoPE Scaling | LLaMA | medium
GLM Embedding | ChatGLM | General Language Model Pretraining | GLM | medium
Multi-Query Attention | ChatGLM2 | Multi-Query Attention | GLM | medium
Long Context | ChatGLM2 | 32K Context | GLM | high
GQA | ChatGLM3 | Grouped Query Attention | GLM | high
Self-Extension | ChatGLM3 | Extended Context 128K | GLM | low
Long Context | Yi | 200K Context Window | 01-ai | high
RoPE | Yi | Rotary Position Embedding | 01-ai | medium
Long Context | Kimi | 128K Context Window | Kimi | high
Textbooks | Phi-1 | High-Quality Textbook Data | microsoft | medium
Code Data | Phi-1 | Synthetic Code Generation | microsoft | low
Small Scale | Phi-2 | 2.7B Parameter Efficiency | microsoft | low
FIM | Starcoder | Fill-in-the-Middle | bigcode | medium
Long Context | Starcoder | 8K Context | bigcode | high
Billion-scale | Falcon | Web Data Filtering | tiiuae | low
LLM | Falcon | FlashAttention | tiiuae | low
GQA | Falcon-40B | Grouped Query Attention (40B) | tiiuae | high
WKV | RWKV | Weighted Key-Value | RWKV | high
RNN-Transformer | RWKV | RNN-Transformer Hybrid | RWKV | high
Linear Complexity | RWKV | O(n) for Long Context | RWKV | high
Long Context | InternLM | 8K-32K Context | internlm | high
Open Weights | InternLM | Fully Open Source | internlm | medium
BaiChuan | Baichuan | Bilingual (ZH/EN) | baichuan-inc | low
Dynamic NTK | Baichuan | Dynamic NTK Scaling | baichuan-inc | medium
2.0 | Baichuan2 | Improved Training Data | baichuan-inc | low
GQA | Baichuan2 | Grouped Query Attention | baichuan-inc | high
Open Source | Skywork | Fully Open Weights | Skywork | low
Long Context | Skywork | 4K-16K Context | Skywork | high