Transformer


Attention Is All You Need (Transformer)

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (Swin-Transformer)

Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions (Pyramid Vision Transformer)

Transformer in Transformer

Conformer: Convolution-augmented Transformer for Speech Recognition (Conformer)

CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification (CrossViT)

Reformer: The Efficient Transformer (Reformer)

Pre-Trained Image Processing Transformer

Synthesizer: Rethinking Self-Attention in Transformer Models (Synthesizer)

LeViT: a Vision Transformer in ConvNet’s Clothing for Faster Inference (LeViT)

EfficientFormer: Vision Transformers at MobileNet Speed (EfficientFormer)

X-Transformer: A Green Self-attention Based Machine Translation Model (X-Transformer)

Mix-ViT: Mixing attentive vision transformer for ultra-fine-grained visual categorization (Mix-ViT)

METER: a mobile vision transformer architecture for monocular depth estimation (METER)
