# Overview and Configuration

Large pre-trained Transformer-based language models (LMs) have become the foundation of NLP in recent years. While the most prevalent method of using these LMs for transfer learning involves costly full fine-tuning of all model parameters, a series of efficient and lightweight alternatives have been established in recent time. Instead of updating all parameters of the pre-trained LM towards a downstream target task, these methods commonly introduce a small amount of new parameters and only update these while keeping the pre-trained model weights fixed.

Why use Efficient Fine-Tuning?

Efficient fine-tuning methods offer multiple benefits over full fine-tuning of LMs:

- They are parameter-efficient, i.e. they only update a very small subset (often under 1%) of a model's parameters.

- They are often modular, i.e. the updated parameters can be extracted and shared independently of the base model parameters.

- They are easy to share and easy to deploy due to their small file sizes, e.g. only ~3MB per task instead of ~440MB for a full model.

- They speed up training, i.e. efficient fine-tuning often requires less training time than full fine-tuning of LMs.

- They are composable, e.g. multiple adapters trained on different tasks can be stacked, fused or mixed to leverage their combined knowledge.

- They often provide performance on par with full fine-tuning.

More specifically, let the parameters of an LM be composed of a set of pre-trained parameters $\Theta$ (frozen) and a set of (newly introduced) parameters $\Phi$. Efficient fine-tuning methods then optimize only $\Phi$ according to a loss function $L$ on a dataset $D$:

$$\Phi^* \leftarrow \arg \min_{\Phi} L(D; \{\Theta, \Phi\})$$
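As a toy illustration of this objective (plain Python, no relation to the library's internals), we can freeze a "pre-trained" weight theta and run gradient descent on a single new parameter phi:

```python
# Toy sketch of the efficient fine-tuning objective: the pre-trained
# parameter theta stays frozen, only the new parameter phi is updated.
# All names here are illustrative, not part of adapter-transformers.

def finetune_phi(theta, data, lr=0.1, steps=200):
    """Optimize only phi; theta is treated as a frozen constant."""
    phi = 0.0
    for _ in range(steps):
        # Analytic gradient of the mean squared error w.r.t. phi only,
        # for the toy model y_hat = theta * x + phi.
        grad = sum(2 * (theta * x + phi - y) for x, y in data) / len(data)
        phi -= lr * grad
    return phi

# Frozen "pre-trained" weight and a small task dataset shifted by +1.
theta = 2.0
data = [(x, 2.0 * x + 1.0) for x in range(5)]
phi_star = finetune_phi(theta, data)
print(round(phi_star, 3))  # → 1.0
```

Only `phi` is ever updated, mirroring how adapter methods keep $\Theta$ fixed and train the small set $\Phi$.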

Efficient fine-tuning might insert parameters $\Phi$ at different locations of a Transformer-based LM. One early and successful method, (bottleneck) adapters, introduces bottleneck feed-forward layers in each layer of a Transformer model. While these adapters laid the foundation of the adapter-transformers library, multiple alternative methods have been introduced and integrated since.

Important

In the literature, different terms are used to refer to efficient fine-tuning methods. The term "adapter" is usually only applied to bottleneck adapter modules. However, most efficient fine-tuning methods follow the same general idea of inserting a small set of new parameters, thereby "adapting" the pre-trained LM to a new task. In adapter-transformers, the term "adapter" may thus refer to any efficient fine-tuning method if not specified otherwise.

In the remaining sections, we will present how adapter methods can be configured in adapter-transformers. The next two pages will then present the methodological details of all currently supported adapter methods.

The following table gives an overview of all adapter methods supported by adapter-transformers. Identifiers and configuration classes are explained in more detail in the next section.

| Identifier | Configuration class | Method |
| --- | --- | --- |
| `pfeiffer` | `PfeifferConfig()` | Bottleneck Adapters |
| `houlsby` | `HoulsbyConfig()` | Bottleneck Adapters |
| `parallel` | `ParallelConfig()` | Bottleneck Adapters |
| `scaled_parallel` | `ParallelConfig(scaling="learned")` | Bottleneck Adapters |
| `pfeiffer+inv` | `PfeifferInvConfig()` | Invertible Adapters |
| `houlsby+inv` | `HoulsbyInvConfig()` | Invertible Adapters |
| `compacter` | `CompacterConfig()` | Compacter |
| `compacter++` | `CompacterPlusPlusConfig()` | Compacter |
| `prefix_tuning` | `PrefixTuningConfig()` | Prefix Tuning |
| `prefix_tuning_flat` | `PrefixTuningConfig(flat=True)` | Prefix Tuning |
| `lora` | `LoRAConfig()` | LoRA |
| `ia3` | `IA3Config()` | IA³ |
| `mam` | `MAMConfig()` | Mix-and-Match Adapters |
| `unipelt` | `UniPELTConfig()` | UniPELT |

## Configuration

All supported adapter methods can be added, trained, saved and shared using the same set of model class functions (see class documentation). Each method is specified and configured using a specific configuration class, all of which derive from the common AdapterConfigBase class. E.g., adding one of the supported adapter methods to an existing model instance follows this scheme:

```python
model.add_adapter("name", config=<ADAPTER_CONFIG>)
```


Here, `<ADAPTER_CONFIG>` can either be:

- a configuration string, as described below

- an instance of a configuration class, as listed in the table above

- a path to a JSON file containing a configuration dictionary
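How these three forms could be normalized into a common representation can be sketched in plain Python (hypothetical helper; the actual resolution happens inside the library):

```python
import json
import os

def resolve_adapter_config(config):
    """Illustrative sketch of normalizing the three accepted config forms
    into a plain dictionary. Hypothetical helper, not library code."""
    if isinstance(config, dict):
        # Already a configuration mapping (stand-in for a config class instance).
        return config
    if isinstance(config, str) and os.path.isfile(config):
        # A path to a JSON file containing a configuration dictionary.
        with open(config) as f:
            return json.load(f)
    if isinstance(config, str):
        # A configuration string, e.g. "pfeiffer".
        return {"identifier": config}
    raise TypeError(f"unsupported config type: {type(config).__name__}")

print(resolve_adapter_config("pfeiffer"))  # → {'identifier': 'pfeiffer'}
```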

### Configuration strings¶

Configuration strings are a concise way of defining a specific adapter method configuration. They are especially useful when adapter configurations are passed from external sources such as the command line, where using configuration classes is not an option.

In general, a configuration string for a single method takes the form `<identifier>[<key>=<value>, ...]`. Here, `<identifier>` refers to one of the identifiers listed in the table above, e.g. `parallel`. In square brackets after the identifier, you can set specific configuration attributes from the respective configuration class, e.g. `parallel[reduction_factor=2]`. If all attributes remain at their default values, the brackets can be omitted.
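The structure of a single-method configuration string can be illustrated with a small parser (a simplified sketch with hypothetical names, not the library's own implementation):

```python
import re

def parse_config_string(config_string):
    """Illustrative parser for <identifier>[<key>=<value>, ...] strings.
    Simplified sketch only -- not adapter-transformers' actual parser."""
    # Identifiers may contain '+' (e.g. "pfeiffer+inv", "compacter++").
    match = re.fullmatch(r"([\w+]+)(?:\[(.*)\])?", config_string.strip())
    if match is None:
        raise ValueError(f"invalid configuration string: {config_string!r}")
    identifier, attrs = match.group(1), {}
    if match.group(2):
        # Split the bracketed part into key=value attribute pairs.
        for pair in match.group(2).split(","):
            key, _, value = pair.partition("=")
            attrs[key.strip()] = value.strip()
    return identifier, attrs

print(parse_config_string("parallel[reduction_factor=2]"))
# → ('parallel', {'reduction_factor': '2'})
```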

Finally, it is also possible to specify a method combination as a configuration string by joining multiple configuration strings with `|`. E.g., `prefix_tuning[bottleneck_size=800]|parallel` is identical to the following configuration class instance:

```python
ConfigUnion(
    PrefixTuningConfig(bottleneck_size=800),
    ParallelConfig(),
)
```
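The splitting of a combined configuration string on `|` can be sketched as follows (an illustrative simplification with hypothetical names, not the library's parser):

```python
def split_union(config_string):
    """Split a combined configuration string into its member strings.
    Illustrative sketch: a '|' inside the [...] attribute brackets is
    not treated as a separator."""
    parts, depth, current = [], 0, []
    for ch in config_string:
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "|" and depth == 0:
            # Top-level separator: close the current member string.
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts

print(split_union("prefix_tuning[bottleneck_size=800]|parallel"))
# → ['prefix_tuning[bottleneck_size=800]', 'parallel']
```

Each member string would then be resolved on its own, mirroring how the `ConfigUnion` above holds one configuration instance per method.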