Introduction to Adapters
Adapters have been introduced as a parameter-efficient alternative to fine-tuning language models on a downstream task. Instead of fine-tuning the full model, only a small set of newly introduced task-specific parameters is updated during training; the rest of the model is kept fixed. Adapters provide advantages in terms of size, modularity, and composability while often achieving results on par with full fine-tuning. We will not go into detail about the theoretical background of adapters here but instead refer to some useful literature:
Parameter-Efficient Transfer Learning for NLP (Houlsby et al., 2019)
Simple, Scalable Adaptation for Neural Machine Translation (Bapna and Firat, 2019)
MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer (Pfeiffer et al., 2020)
AdapterHub: A Framework for Adapting Transformers (Pfeiffer, Rückle, Poth et al., 2020)
AdapterHub aims to support a variety of adapter setups. To understand our implementation of adapters, there is some important terminology to grasp first: the difference between adapter types and adapter architectures.
The adapter type describes the purpose for which an adapter is used, whereas the adapter architecture (or adapter configuration) describes the components from which the adapter modules in the language model are constructed.
The adapter type defines the purpose of an adapter. Currently, all adapters are categorized as one of the following types:
Task adapter: Task adapters are fine-tuned to learn representations for a specific downstream task such as sentiment analysis or question answering. Task adapters for NLP were first introduced by Houlsby et al., 2019.
Language adapter: Language adapters are used to learn language-specific transformations. After being trained on a language modeling task, a language adapter can be stacked before a task adapter for training on a downstream task. To perform zero-shot cross-lingual transfer, one language adapter can simply be replaced by another. In terms of architecture, language adapters are largely similar to task adapters, except for an additional invertible adapter layer after the embedding layer. This setup was introduced and is further explained by Pfeiffer et al., 2020.
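To make the bottleneck structure underlying both adapter types concrete, here is a minimal NumPy sketch of a single adapter block in the style of Houlsby et al., 2019: a down-projection, a non-linearity, an up-projection, and a residual connection. The dimensions and initialization shown are illustrative assumptions, not code from the library.

```python
import numpy as np

hidden_size = 768       # Transformer hidden dimension (illustrative)
reduction_factor = 16   # bottleneck size = hidden_size // reduction_factor
bottleneck = hidden_size // reduction_factor

rng = np.random.default_rng(0)
W_down = rng.normal(0.0, 0.02, (hidden_size, bottleneck))
# Zero-initializing the up-projection makes the adapter start as the identity.
W_up = np.zeros((bottleneck, hidden_size))

def adapter(h):
    """Apply the adapter to hidden states h of shape (seq_len, hidden_size)."""
    z = np.maximum(h @ W_down, 0.0)  # down-project, then ReLU
    return h + z @ W_up              # up-project, then residual connection

h = rng.normal(size=(10, hidden_size))
out = adapter(h)
print(out.shape)  # (10, 768); with the zero-initialized W_up, out equals h
```

Because only `W_down` and `W_up` (plus, in practice, biases and layer norms) are trained, the number of updated parameters stays small compared to the frozen Transformer.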
In adapter-transformers, the distinction between task and language adapters is made with the help of the AdapterType enumeration.
The concrete structure of adapter modules and their location in the layers of a Transformer model is specified by a configuration dictionary, referred to as the adapter architecture. The currently available configuration options are visualized in the figure above. When adding a new adapter using the add_adapter() method, the configuration can be set via the config argument. The passed value can be either a plain Python dict containing all keys pictured or a subclass of AdapterConfig.
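As a sketch of what such a plain configuration dict might look like, the following defines an architecture in the spirit of the "pfeiffer" setup loaded in the example below. The key names and values shown here are illustrative assumptions about the options described above, not an exhaustive or authoritative list.

```python
# Illustrative adapter configuration dict (keys are assumptions for this sketch):
config = {
    "mh_adapter": False,      # no adapter module after the multi-head attention block
    "output_adapter": True,   # adapter module after the feed-forward output block
    "reduction_factor": 16,   # bottleneck size = hidden_size / reduction_factor
    "non_linearity": "relu",  # activation used inside the bottleneck
}
```

Such a dict could then be passed directly via the config argument of add_adapter().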
For convenience, adapter-transformers has some common architectures built-in, such as the pfeiffer and houlsby configurations.
Furthermore, pre-defined architectures can be loaded from the Hub:
```python
# load "pfeiffer" config from Hub, but replace the reduction factor
config = AdapterConfig.load("pfeiffer", reduction_factor=12)
# add a new adapter with the loaded config
model.add_adapter("dummy", AdapterType.text_task, config=config)
```
You can also add your own architecture to the Hub.