Transformer
| Num | Title | Field | Desc | Author | Time | read |
|---|---|---|---|---|---|---|
| | Attention Is All You Need | Transformer | | | | |
| | An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | ViT | | | | |
| | Swin Transformer: Hierarchical Vision Transformer using Shifted Windows | Swin-Transformer | | | | |
| | Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions | Pyramid Vision Transformer | | | | |
| | Transformer in Transformer | | | | | |
| | Conformer: Convolution-augmented Transformer for Speech Recognition | Conformer | | | | |
| | CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification | CrossViT | | | | |
| | Reformer: The Efficient Transformer | Reformer | | | | |
| | Pre-Trained Image Processing Transformer | | | | | |
| | Synthesizer: Rethinking Self-Attention in Transformer Models | Synthesizer | | | | |
| | LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference | LeViT | | | | |
| | EfficientFormer: Vision Transformers at MobileNet Speed | EfficientFormer | | | | |
| | X-Transformer: A Green Self-attention Based Machine Translation Model | X-Transformer | | | | |
| | Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization | Mix-ViT | | | | |
| | METER: a mobile vision transformer architecture for monocular depth estimation | METER | | | | |