# Model Overview
This page gives an overview of the Transformer models currently supported by adapter-transformers. The table below shows which adaptation methods and adapter-transformers features each of these model architectures supports.
> **Note:** Each supported model architecture X typically provides a class `XAdapterModel` for usage with `AutoAdapterModel`. Additionally, it is possible to use adapters with the model classes already shipped with HuggingFace Transformers. E.g., for BERT, this means adapter-transformers provides a `BertAdapterModel` class, but you can also use `BertModel`, `BertForSequenceClassification` etc. together with adapters.
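As a minimal sketch of both routes, assuming adapter-transformers is installed (it is distributed as a drop-in fork of HuggingFace Transformers, so imports still use the `transformers` package; the adapter name `my_task` and the `bert-base-uncased` checkpoint are placeholders):

```python
from transformers import AutoAdapterModel, BertModel

# Route 1: the dedicated adapter model class, resolved via AutoAdapterModel
# (for a BERT checkpoint this returns a BertAdapterModel instance).
model = AutoAdapterModel.from_pretrained("bert-base-uncased")
model.add_adapter("my_task")             # add a new (bottleneck) adapter
model.set_active_adapters("my_task")     # activate it for the forward pass

# Route 2: a model class shipped with Transformers, used together with adapters.
bert = BertModel.from_pretrained("bert-base-uncased")
bert.add_adapter("my_task")
bert.set_active_adapters("my_task")
```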
| Model | (Bottleneck) Adapters | Prefix Tuning | LoRA | Compacter | Adapter Fusion | Invertible Adapters | Parallel block |
|---|---|---|---|---|---|---|---|
| ALBERT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BART | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BEiT | ✅ | ✅ | ✅ | ✅ | ✅ | | |
| BERT-Generation | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| BERT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| CLIP | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| DeBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DeBERTa-v2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| DistilBERT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Encoder Decoder | (*) | (*) | (*) | (*) | (*) | (*) | |
| GPT-2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| GPT-J | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| MBart | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| T5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| ViT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| XLM-RoBERTa | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
(*) If the encoder and decoder model classes used are supported.
Missing a model architecture you’d like to use? adapter-transformers can be easily extended to new model architectures as described in Adding Adapters to a Model. Feel free to open an issue requesting support for a new architecture. We very much welcome pull requests adding new model implementations!