Overview and Configuration
Large pre-trained Transformer-based language models (LMs) have become the foundation of NLP in recent years. While the most prevalent method of using these LMs for transfer learning involves costly full fine-tuning of all model parameters, a series of efficient and lightweight alternatives have recently been established. Instead of updating all parameters of the pre-trained LM towards a downstream target task, these methods commonly introduce a small number of new parameters and only update these while keeping the pre-trained model weights fixed.
Why use Efficient Fine-Tuning?
Efficient fine-tuning methods offer multiple benefits over the full fine-tuning of LMs:
- They are parameter-efficient, i.e., they only update a tiny subset (often under 1%) of a model’s parameters.
- They are often modular, i.e., the updated parameters can be extracted and shared independently of the base model parameters.
- They are easy to share and deploy due to their small file sizes, e.g., ~3MB per task instead of ~440MB for a full model.
- They speed up training, i.e., efficient fine-tuning often requires less training time than full fine-tuning of LMs.
- They are composable, e.g., multiple adapters trained on different tasks can be stacked, fused, or mixed to leverage their combined knowledge.
- They often provide on-par performance with full fine-tuning.
More specifically, let the parameters of an LM be composed of a set of pre-trained parameters \(\Theta\) (frozen) and a set of (newly introduced) parameters \(\Phi\). Then, efficient fine-tuning methods optimize only \(\Phi\) according to a loss function \(L\) on a dataset \(D\):

\[
\Phi \leftarrow \mathop{\arg\min}_{\Phi} L(D; \{\Theta, \Phi\})
\]
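To make this split concrete, the following minimal PyTorch sketch (purely illustrative, not actual library code) freezes a stand-in for the pre-trained parameters \(\Theta\) and hands only the newly introduced parameters \(\Phi\) to the optimizer:

```python
import torch
import torch.nn as nn

# Stand-ins for illustration: `pretrained` plays the role of the frozen LM
# parameters (Theta), `adapter` that of the newly introduced parameters (Phi).
pretrained = nn.Linear(768, 768)
adapter = nn.Sequential(nn.Linear(768, 48), nn.ReLU(), nn.Linear(48, 768))

# Theta stays fixed ...
for p in pretrained.parameters():
    p.requires_grad = False

# ... and only Phi is updated during training.
optimizer = torch.optim.AdamW(adapter.parameters(), lr=1e-4)
```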
Efficient fine-tuning methods can insert the parameters \(\Phi\) at different locations of a Transformer-based LM.
One early and successful method, (bottleneck) adapters, introduces bottleneck feed-forward layers in each layer of a Transformer model.
While these adapters have laid the foundation of the adapters library, multiple alternative methods have been introduced and integrated since.
Important
In the literature, different terms are used to refer to efficient fine-tuning methods.
The term “adapter” is usually only applied to bottleneck adapter modules.
However, most efficient fine-tuning methods follow the same general idea of inserting a small set of new parameters and, by this, “adapting” the pre-trained LM to a new task.
In adapters, the term “adapter” thus may refer to any efficient fine-tuning method if not specified otherwise.
In the remaining sections, we will present how adapter methods can be configured in the adapters library.
The next two pages will then present the methodological details of all currently supported adapter methods.
Table of Adapter Methods
The following table gives an overview of all adapter methods supported by the adapters library.
Identifiers and configuration classes are explained in more detail in the next section.
Identifier | Configuration class | More information |
---|---|---|
seq_bn | SeqBnConfig() | Bottleneck Adapters |
double_seq_bn | DoubleSeqBnConfig() | Bottleneck Adapters |
par_bn | ParBnConfig() | Bottleneck Adapters |
scaled_par_bn | ParBnConfig(scaling="learned") | Bottleneck Adapters |
seq_bn_inv | SeqBnInvConfig() | Invertible Adapters |
double_seq_bn_inv | DoubleSeqBnInvConfig() | Invertible Adapters |
compacter | CompacterConfig() | Compacter |
compacter++ | CompacterPlusPlusConfig() | Compacter |
prefix_tuning | PrefixTuningConfig() | Prefix Tuning |
prefix_tuning_flat | PrefixTuningConfig(flat=True) | Prefix Tuning |
lora | LoRAConfig() | LoRA |
ia3 | IA3Config() | IA³ |
mam | MAMConfig() | Mix-and-Match Adapters |
unipelt | UniPELTConfig() | UniPELT |
prompt_tuning | PromptTuningConfig() | Prompt Tuning |
loreft | LoReftConfig() | ReFT |
noreft | NoReftConfig() | ReFT |
direft | DiReftConfig() | ReFT |
Configuration
All supported adapter methods can be added, trained, saved and shared using the same set of model class functions (see class documentation).
Each method is specified and configured using a specific configuration class, all of which derive from the common AdapterConfig class.
E.g., adding one of the supported adapter methods to an existing model instance follows this scheme:
```python
model.add_adapter("name", config=<ADAPTER_CONFIG>)
```
Here, <ADAPTER_CONFIG> can either be:
- a configuration string, as described below
- an instance of a configuration class, as listed in the table above
- a path to a JSON file containing a configuration dictionary
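For illustration, the following sketch adds a bottleneck adapter to a plain Transformers model using a configuration class (the checkpoint and adapter name are chosen arbitrarily here):

```python
from transformers import AutoModel
from adapters import SeqBnConfig
import adapters

model = AutoModel.from_pretrained("roberta-base")
adapters.init(model)  # enable adapter support on the vanilla Transformers model

# Add a bottleneck adapter via its configuration class ...
model.add_adapter("my_adapter", config=SeqBnConfig(reduction_factor=16))
# ... and freeze the pre-trained weights so that only the adapter is trained.
model.train_adapter("my_adapter")
```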
Configuration strings
Configuration strings are a concise way of defining a specific adapter method configuration. They are especially useful when adapter configurations are passed from external sources such as the command line, where using configuration classes is not an option.
In general, a configuration string for a single method takes the form <identifier>[<key>=<value>, ...]. Here, <identifier> refers to one of the identifiers listed in the table above, e.g. par_bn.
In square brackets after the identifier, you can set specific configuration attributes from the respective configuration class, e.g. par_bn[reduction_factor=2]. If all attributes remain at their default values, the square brackets can be omitted.
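The resulting strings can be passed anywhere a configuration is accepted, which is convenient when they originate from, e.g., a command-line argument (reusing the model instance from the sketch above; the adapter name is illustrative):

```python
# Equivalent to passing config=ParBnConfig(reduction_factor=2)
model.add_adapter("cli_adapter", config="par_bn[reduction_factor=2]")
```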
Finally, it is also possible to specify a method combination as a configuration string by joining multiple configuration strings with |, e.g.:
config = "prefix_tuning[bottleneck_size=800]|parallel"
is identical to the following ConfigUnion:
```python
config = ConfigUnion(
    PrefixTuningConfig(bottleneck_size=800),
    ParBnConfig(),
)
```
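Either form can then be passed on to add_adapter like any single-method configuration, e.g. (continuing the snippet above with an illustrative adapter name):

```python
model.add_adapter("mix_and_match", config=config)
```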