Adapter Configuration
Classes representing the architectures of adapter modules and fusion layers.
Single (bottleneck) adapters
- class adapters.AdapterConfig
Base class for all adaptation methods. This class does not define specific configuration keys, but only provides some common helper methods.
- Parameters
architecture (str, optional) – The type of adaptation method defined by the configuration.
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)
Loads a given adapter configuration specifier into a full AdapterConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
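The resolution order listed above can be pictured with the following sketch. It is not the library's implementation: `TOY_CONFIG_MAP` and `resolve_config` are hypothetical stand-ins, the config file is assumed to be JSON, and the final Adapter-Hub lookup is only indicated by an error.

```python
import json
import os
import tempfile

# Hypothetical stand-in for ADAPTER_CONFIG_MAP; the real map lives in the library.
TOY_CONFIG_MAP = {"toy_bn": {"architecture": "bottleneck", "reduction_factor": 16}}


def resolve_config(config):
    """Resolve a config specifier in the order the documentation lists (sketch only)."""
    if isinstance(config, dict):
        return dict(config)  # 1. already a full config dictionary
    if config in TOY_CONFIG_MAP:
        return dict(TOY_CONFIG_MAP[config])  # 2. known identifier string
    if os.path.isfile(config):
        with open(config, "r", encoding="utf-8") as f:
            return json.load(f)  # 3. path to a config file (assumed to be JSON)
    # 4. The library would try Adapter-Hub here; this sketch does no network access.
    raise ValueError(f"Could not resolve config specifier: {config!r}")


# Usage: write a config file to a temporary directory and resolve it by path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "adapter_config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"architecture": "lora", "r": 8}, f)
    from_file = resolve_config(path)

from_map = resolve_config("toy_bn")
from_dict = resolve_config({"architecture": "prefix_tuning"})
```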
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
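As a rough illustration of how from_dict, replace, and to_dict relate to each other, the sketch below uses a standard-library dataclass. `ToyConfig` is a hypothetical stand-in and not a class of the adapters library; only the method names are taken from the documentation above.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for a config class; not part of the adapters library."""

    architecture: str = "bottleneck"
    reduction_factor: float = 16
    non_linearity: str = "relu"

    def to_dict(self):
        # Convert the config into a plain Python dict.
        return asdict(self)

    @classmethod
    def from_dict(cls, config):
        # Build a config instance from a plain Python dict.
        return cls(**config)

    def replace(self, **changes):
        # Return a new instance with the changes applied; the original is unchanged.
        return replace(self, **changes)


base = ToyConfig()
smaller = base.replace(reduction_factor=32)
round_trip = ToyConfig.from_dict(smaller.to_dict())
```

Because replace returns a new object, `base` keeps its original reduction factor, which makes it safe to derive several variants from one shared default.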
- class adapters.BnConfig(mh_adapter: bool, output_adapter: bool, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping], non_linearity: str, original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)
Base class that models the architecture of a bottleneck adapter.
- Parameters
mh_adapter (bool) – If True, add adapter modules after the multi-head attention block of each layer.
output_adapter (bool) – If True, add adapter modules after the output FFN of each layer.
reduction_factor (float or Mapping) – Either a scalar float (> 0) specifying the reduction factor for all layers or a mapping from layer ID (starting at 0) to values specifying the reduction_factor for individual layers. If not all layers are represented in the mapping, a default value should be given, e.g. {'1': 8, '6': 32, 'default': 16}. Specifying a reduction factor < 1 will result in an up-projection layer.
non_linearity (str) – The activation function to use in the adapter bottleneck.
original_ln_before (bool, optional) – If True, apply pre-trained layer normalization and residual connection before the adapter modules. Defaults to False. Only applicable if is_parallel is False.
original_ln_after (bool, optional) – If True, apply pre-trained layer normalization and residual connection after the adapter modules. Defaults to True.
ln_before (bool, optional) – If True, add a new layer normalization before the adapter bottleneck. Defaults to False.
ln_after (bool, optional) – If True, add a new layer normalization after the adapter bottleneck. Defaults to False.
init_weights (str, optional) – Initialization method for the weights of the adapter modules. Currently, this can be “bert” (default), “mam_adapter”, or “houlsby”.
is_parallel (bool, optional) – If True, apply adapter transformations in parallel. By default (False), sequential application is used.
scaling (float or str, optional) – Scaling factor to use for scaled addition of adapter outputs as done by He et al. (2021). Can be either a constant factor (float), or the string “learned”, in which case the scaling factor is learned, or the string “channel”, in which case we initialize a scaling vector of the channel shape that is then learned. Defaults to 1.0.
use_gating (bool, optional) – Place a trainable gating module besides the added parameter module to control module activation. This is e.g. used for UniPELT. Defaults to False.
residual_before_ln (bool or str, optional) – If True, take the residual connection around the adapter bottleneck before the layer normalization. If set to “post_add”, take the residual connection around the adapter bottleneck after the previous residual connection. Only applicable if original_ln_before is True.
adapter_residual_before_ln (bool, optional) – If True, apply the residual connection around the adapter modules before the new layer normalization within the adapter. Only applicable if ln_after is True and is_parallel is False.
inv_adapter (str, optional) – If not None (default), add invertible adapter modules after the model embedding layer. Currently, this can be either “nice” or “glow”.
inv_adapter_reduction_factor (float, optional) – The reduction to use within the invertible adapter modules. Only applicable if inv_adapter is not None.
cross_adapter (bool, optional) – If True, add adapter modules after the cross attention block of each decoder layer in an encoder-decoder model. Defaults to False.
leave_out (List[int], optional) – The IDs of the layers (starting at 0) where NO adapter modules should be added.
dropout (float, optional) – The dropout rate used in the adapter layer. Defaults to 0.0.
phm_layer (bool, optional) – If True, the down and up projection layers are a PHMLayer. Defaults to False.
phm_dim (int, optional) – The dimension of the phm matrix. Only applicable if phm_layer is set to True. Defaults to 4.
shared_phm_rule (bool, optional) – Whether the phm matrix is shared across all layers. Defaults to True.
factorized_phm_rule (bool, optional) – Whether the phm matrix is factorized into a left and right matrix. Defaults to False.
learn_phm (bool, optional) – Whether the phm matrix should be learned during training. Defaults to True.
factorized_phm_W (bool, optional) – Whether the weights matrix is factorized into a left and right matrix. Defaults to True.
shared_W_phm (bool, optional) – Whether the weights matrix is shared across all layers. Defaults to False.
phm_c_init (str, optional) – The initialization function for the weights of the phm matrix. The possible values are [“normal”, “uniform”]. Defaults to “normal”.
phm_init_range (float, optional) – Standard deviation for initializing phm weights if phm_c_init=“normal”. Defaults to 0.0001.
hypercomplex_nonlinearity (str, optional) – This specifies the distribution to draw the weights in the phm layer from. Defaults to “glorot-uniform”.
phm_rank (int, optional) – If the weight matrix is factorized, this specifies the rank of the matrix. E.g. the left matrix of the down projection has the shape (phm_dim, _in_feats_per_axis, phm_rank) and the right matrix (phm_dim, phm_rank, _out_feats_per_axis). Defaults to 1.
phm_bias (bool, optional) – If True, the down and up projection PHMLayer has a bias term. If phm_layer is False, this is ignored. Defaults to True.
stochastic_depth (float, optional) – This value specifies the probability of the model dropping entire layers during training. This parameter should only be used for vision-based tasks involving residual networks.
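The per-layer reduction_factor mapping described above can be sketched as follows. Both helpers are hypothetical and only mirror the documented behavior; in particular, computing the bottleneck width as the hidden size divided by the reduction factor is an assumption of this sketch, as is the hidden size of 768.

```python
from collections.abc import Mapping


def resolve_reduction_factor(reduction_factor, layer_id):
    """Return the reduction factor for one layer (hypothetical helper)."""
    if isinstance(reduction_factor, Mapping):
        key = str(layer_id)  # the documented example uses string keys
        if key in reduction_factor:
            return reduction_factor[key]
        if "default" in reduction_factor:
            return reduction_factor["default"]
        raise KeyError(f"No reduction factor for layer {layer_id} and no 'default' entry.")
    return reduction_factor  # a scalar applies to all layers


def bottleneck_size(hidden_size, reduction_factor):
    # Assumption: bottleneck width = hidden size divided by the reduction factor.
    return int(hidden_size // reduction_factor)


factors = {"1": 8, "6": 32, "default": 16}
sizes = {
    layer: bottleneck_size(768, resolve_reduction_factor(factors, layer))
    for layer in (0, 1, 6)
}

# A factor below 1 widens the projection instead of narrowing it.
widened = bottleneck_size(768, 0.5)
```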
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)
Loads a given adapter configuration specifier into a full AdapterConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
- class adapters.SeqBnConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)
The adapter architecture proposed by Pfeiffer et al. (2020). See https://arxiv.org/pdf/2005.00247.pdf.
- class adapters.SeqBnInvConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = 'nice', inv_adapter_reduction_factor: ~typing.Optional[float] = 2, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)
The adapter architecture proposed by Pfeiffer et al. (2020), extended with invertible adapters (inv_adapter defaults to 'nice'). See https://arxiv.org/pdf/2005.00247.pdf.
- class adapters.DoubleSeqBnConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)
The adapter architecture proposed by Houlsby et al. (2019). See https://arxiv.org/pdf/1902.00751.pdf.
- class adapters.DoubleSeqBnInvConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = 'nice', inv_adapter_reduction_factor: ~typing.Optional[float] = 2, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)
The adapter architecture proposed by Houlsby et al. (2019), extended with invertible adapters (inv_adapter defaults to 'nice'). See https://arxiv.org/pdf/1902.00751.pdf.
- class adapters.ParBnConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 2, non_linearity: str = 'relu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'mam_adapter', is_parallel: bool = True, scaling: ~typing.Union[float, str] = 4.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)
The parallel adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.
- class adapters.CompacterConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)
The Compacter architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.
- class adapters.CompacterPlusPlusConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)
The Compacter++ architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.
- class adapters.AdapterPlusConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 96, non_linearity: str = 'gelu', original_ln_before: bool = True, original_ln_after: bool = False, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'houlsby', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 'channel', use_gating: bool = False, residual_before_ln: bool = False, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: float = 0.1)
The AdapterPlus architecture proposed by Jan-Martin O. Steitz and Stefan Roth. See https://arxiv.org/pdf/2406.06820.
Please note that some combinations of the adapter parameters original_ln_after, original_ln_before, and residual_before_ln may result in performance issues when training.
- In the general case:
At least one of original_ln_before or original_ln_after should be set to True in order to ensure that the original residual connection from pre-training is preserved.
If original_ln_after is set to False, residual_before_ln must also be set to False to ensure convergence during training.
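The two rules of thumb above can be written down as a small check. `check_ln_settings` is a hypothetical helper for illustration and not part of the library; it only encodes the two statements from the note.

```python
def check_ln_settings(original_ln_before, original_ln_after, residual_before_ln):
    """Return a list of warnings for the two rules stated above (hypothetical helper)."""
    problems = []
    # Rule 1: keep the pre-trained residual connection on at least one side.
    if not (original_ln_before or original_ln_after):
        problems.append(
            "At least one of original_ln_before or original_ln_after should be True."
        )
    # Rule 2: without original_ln_after, residual_before_ln has to be disabled too.
    if not original_ln_after and residual_before_ln:
        problems.append(
            "residual_before_ln must be False when original_ln_after is False."
        )
    return problems


# Values taken from the AdapterPlusConfig signature above.
adapter_plus = check_ln_settings(
    original_ln_before=True, original_ln_after=False, residual_before_ln=False
)

# A combination that violates both rules.
risky = check_ln_settings(
    original_ln_before=False, original_ln_after=False, residual_before_ln=True
)
```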
Prefix Tuning
- class adapters.PrefixTuningConfig(architecture: ~typing.Optional[str] = 'prefix_tuning', encoder_prefix: bool = True, cross_prefix: bool = True, leave_out: ~typing.List[int] = <factory>, flat: bool = False, prefix_length: int = 30, bottleneck_size: int = 512, non_linearity: str = 'tanh', dropout: float = 0.0, use_gating: bool = False, shared_gating: bool = True)
The Prefix Tuning architecture proposed by Li & Liang (2021). See https://arxiv.org/pdf/2101.00190.pdf.
- Parameters
encoder_prefix (bool) – If True, add prefixes to the encoder of an encoder-decoder model.
cross_prefix (bool) – If True, add prefixes to the cross attention of an encoder-decoder model.
flat (bool) – If True, train the prefix parameters directly. Otherwise, reparametrize using a bottleneck MLP.
prefix_length (int) – The length of the prefix.
bottleneck_size (int) – If flat=False, the size of the bottleneck MLP.
non_linearity (str) – If flat=False, the non-linearity used in the bottleneck MLP.
dropout (float) – The dropout rate used in the prefix tuning layer.
leave_out (List[int]) – The IDs of the layers (starting at 0) where NO prefix should be added.
use_gating (bool, optional) – Place a trainable gating module besides the added parameter module to control module activation. This is e.g. used for UniPELT. Defaults to False.
shared_gating (bool, optional) – Whether to use a shared gate for the prefixes of all attention matrices. Only applicable if use_gating=True. Defaults to True.
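To give a feel for what flat changes, the sketch below counts trainable parameters for a single prefix under an assumed layout: with flat=True the key and value prefixes of every layer are trained directly, and with flat=False a prefix embedding is passed through a two-layer bottleneck MLP. The layout, the helper name, and the model dimensions (12 layers, hidden size 768) are assumptions of this sketch and may differ from the library's implementation.

```python
def prefix_parameter_counts(n_layers, hidden_size, prefix_length, bottleneck_size):
    """Count trainable parameters for flat vs. reparametrized prefixes (assumed layout)."""
    # Keys and values for every layer: 2 * hidden_size numbers per prefix position.
    out_features = n_layers * 2 * hidden_size

    # flat=True: the prefix values themselves are the parameters.
    flat = prefix_length * out_features

    # flat=False: embedding table -> linear -> non-linearity -> linear.
    reparametrized = (
        prefix_length * hidden_size                          # prefix embedding table
        + hidden_size * bottleneck_size + bottleneck_size    # first linear layer (with bias)
        + bottleneck_size * out_features + out_features      # second linear layer (with bias)
    )
    return flat, reparametrized


# prefix_length and bottleneck_size use the defaults from the signature above.
flat_count, mlp_count = prefix_parameter_counts(
    n_layers=12, hidden_size=768, prefix_length=30, bottleneck_size=512
)
```

Under these assumptions the reparametrized variant trains considerably more parameters than the flat one, since the second linear layer has to produce the prefixes for all layers at once.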
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)
Loads a given adapter configuration specifier into a full AdapterConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
LoRAConfig
- class adapters.LoRAConfig(architecture: ~typing.Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = False, output_lora: bool = False, leave_out: ~typing.List[int] = <factory>, r: int = 8, alpha: int = 8, dropout: float = 0.0, attn_matrices: ~typing.List[str] = <factory>, composition_mode: str = 'add', init_weights: str = 'lora', use_gating: bool = False, dtype: ~typing.Optional[str] = None)
The Low-Rank Adaptation (LoRA) architecture proposed by Hu et al. (2021). See https://arxiv.org/pdf/2106.09685.pdf. LoRA adapts a model by reparametrizing the weights of a layer matrix. You can merge the additional weights with the original layer weights using model.merge_adapter("lora_name").
- Parameters
selfattn_lora (bool, optional) – If True, add LoRA to the self-attention weights of a model. Defaults to True.
intermediate_lora (bool, optional) – If True, add LoRA to the intermediate MLP weights of a model. Defaults to False.
output_lora (bool, optional) – If True, add LoRA to the output MLP weights of a model. Defaults to False.
leave_out (List[int], optional) – The IDs of the layers (starting at 0) where NO adapter modules should be added.
r (int, optional) – The rank of the LoRA layer. Defaults to 8.
alpha (int, optional) – The hyperparameter used for scaling the LoRA reparametrization. Defaults to 8.
dropout (float, optional) – The dropout rate used in the LoRA layer. Defaults to 0.0.
attn_matrices (List[str], optional) – Determines which matrices of the self-attention module to adapt. A list that may contain the strings “q” (query), “k” (key), “v” (value). Defaults to [“q”, “v”].
composition_mode (str, optional) – Defines how the injected weights are composed with the original model weights. Can be either “add” (addition of decomposed matrix, as in LoRA) or “scale” (element-wise multiplication of vector, as in (IA)^3). “scale” can only be used together with r=1. Defaults to “add”.
init_weights (str, optional) – Initialization method for the weights of the LoRA modules. Currently, this can be either “lora” (default) or “bert”.
use_gating (bool, optional) – Place a trainable gating module besides the added parameter module to control module activation. This is e.g. used for UniPELT. Defaults to False. Note that modules with use_gating=True cannot be merged using merge_adapter().
dtype (str, optional) – torch dtype for reparametrization tensors. Defaults to None.
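The “add” composition and the idea behind merging can be illustrated with plain Python lists. This is a minimal sketch assuming the usual LoRA formulation W' = W + (alpha / r) * B A from Hu et al. (2021); `matmul` and `merge_lora` are hypothetical helpers and not the library's merge_adapter() implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, r, alpha):
    """Fold the low-rank update into the weight: W' = W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Toy 2x2 weight with a rank-1 update (r=1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (r, in_features)
B = [[0.5], [0.0]]  # shape (out_features, r)
merged = merge_lora(W, A, B, r=1, alpha=2)

# Keeping the update separate gives the same output as the merged weight.
x = [[1.0], [1.0]]  # column vector input
separate = [
    [wx[0] + 2.0 * bax[0]]
    for wx, bax in zip(matmul(W, x), matmul(B, matmul(A, x)))
]
```

After merging, the adapted layer is a single matrix again, so inference needs no extra matrix products.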
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)
Loads a given adapter configuration specifier into a full AdapterConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
IA3Config
- class adapters.IA3Config(architecture: ~typing.Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = True, output_lora: bool = False, leave_out: ~typing.List[int] = <factory>, r: int = 1, alpha: int = 1, dropout: float = 0.0, attn_matrices: ~typing.List[str] = <factory>, composition_mode: str = 'scale', init_weights: str = 'ia3', use_gating: bool = False, dtype: ~typing.Optional[str] = None)
The ‘Infused Adapter by Inhibiting and Amplifying Inner Activations’ ((IA)^3) architecture proposed by Liu et al. (2022). See https://arxiv.org/pdf/2205.05638.pdf. (IA)^3 builds on top of LoRA. However, unlike the additive composition of LoRA, it scales the weights of a layer using an injected vector.
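The “scale” composition can be sketched with plain lists: multiplying each output row of a weight matrix by one entry of a learned vector gives the same result as rescaling the layer's output activations. The helper names and the toy values are assumptions of this sketch, not library code.

```python
def matvec(weight, x):
    """Multiply a matrix (nested lists) with a vector (list)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def scale_rows(weight, scale_vec):
    """Scale every output row of the weight by the matching vector entry."""
    return [[s * w for w in row] for row, s in zip(weight, scale_vec)]


W = [[1.0, 2.0], [3.0, 4.0]]
scale_vec = [2.0, 0.5]  # the injected vector, one entry per output feature
x = [1.0, 1.0]

# Scaling the weights first ...
scaled_weight_out = matvec(scale_rows(W, scale_vec), x)
# ... matches scaling the activations afterwards.
scaled_activation_out = [s * h for s, h in zip(scale_vec, matvec(W, x))]
```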
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)
Loads a given adapter configuration specifier into a full AdapterConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
PromptTuningConfig
- class adapters.PromptTuningConfig(architecture: str = 'prompt_tuning', prompt_length: int = 10, prompt_init: str = 'random_uniform', prompt_init_text: Optional[str] = None, combine: str = 'prefix')
The Prompt Tuning architecture proposed by Lester et al. (2021). See https://arxiv.org/pdf/2104.08691.pdf.
- Parameters
prompt_length (int) – The number of tokens in the prompt. Defaults to 10.
prompt_init (str) – The initialization method for the prompt. Can be either “random_uniform” or “from_string”. Defaults to “random_uniform”.
prompt_init_text (str) – The text to use for prompt initialization if prompt_init=”from_string”.
random_uniform_scale (float) – The scale of the random uniform initialization if prompt_init=”random_uniform”. Defaults to 0.5 as in the paper.
combine (str) – The method used to combine the prompt with the input. Can be either “prefix” or “prefix_after_bos”. Defaults to “prefix”.
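The two combine options can be pictured with token lists in place of embeddings. This sketch assumes that “prefix” prepends the prompt to the whole input and that “prefix_after_bos” keeps the first input token (taken to be the BOS token) in front of the prompt; `combine_prompt` is a hypothetical helper.

```python
def combine_prompt(prompt, inputs, combine="prefix"):
    """Combine prompt tokens with input tokens (sketch on lists instead of tensors)."""
    if combine == "prefix":
        # Prompt goes in front of the entire input.
        return prompt + inputs
    if combine == "prefix_after_bos":
        # Assumption: the first input token is BOS and stays in first position.
        return inputs[:1] + prompt + inputs[1:]
    raise ValueError(f"Unknown combine method: {combine!r}")


prompt = ["<p0>", "<p1>"]
inputs = ["<bos>", "hello", "world"]

as_prefix = combine_prompt(prompt, inputs, "prefix")
after_bos = combine_prompt(prompt, inputs, "prefix_after_bos")
```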
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)
Loads a given adapter configuration specifier into a full AdapterConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
ReFT
- class adapters.ReftConfig(layers: Union[Literal['all'], List[int]], prefix_positions: int, suffix_positions: int, r: int, orthogonality: bool, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True)
Base class for Representation Fine-Tuning (ReFT) methods proposed in Wu et al. (2024). See https://arxiv.org/pdf/2404.03592. ReFT methods have in common that they add “interventions” after selected model layers and at selected sequence positions to adapt the representations produced by module outputs.
- Parameters
layers (Union[Literal["all"], List[int]]) – The IDs of the layers where interventions should be added. If “all”, interventions are added after all layers (default).
prefix_positions (int) – The number of prefix positions to add interventions to.
suffix_positions (int) – The number of suffix positions to add interventions to.
r (int) – The rank of the intervention layer.
orthogonality (bool) – If True, enforce an orthogonality constraint for the projection matrix.
tied_weights (bool) – If True, share intervention parameters between prefix and suffix positions in each layer.
subtract_projection (bool) – If True, subtract the projection of the input.
dropout (float) – The dropout rate used in the intervention layer.
non_linearity (str) – The activation function used in the intervention layer.
dtype (str, optional) – torch dtype for intervention tensors. Defaults to None.
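How prefix_positions and suffix_positions select sequence positions can be sketched as below. `intervention_positions` is a hypothetical helper; it assumes that prefix positions count from the start of the sequence and suffix positions from its end, and it does not handle overlap for very short sequences.

```python
def intervention_positions(seq_len, prefix_positions, suffix_positions):
    """Return the token indices that receive prefix and suffix interventions."""
    # First `prefix_positions` tokens of the sequence.
    prefix = list(range(min(prefix_positions, seq_len)))
    # Last `suffix_positions` tokens of the sequence.
    if suffix_positions > 0:
        suffix = list(range(max(seq_len - suffix_positions, 0), seq_len))
    else:
        suffix = []
    return prefix, suffix


both = intervention_positions(seq_len=8, prefix_positions=3, suffix_positions=2)
prefix_only = intervention_positions(seq_len=8, prefix_positions=3, suffix_positions=0)
```

With tied_weights=True, the same intervention parameters would be applied at both index lists of a layer; otherwise each list gets its own parameters.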
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)
Loads a given adapter configuration specifier into a full AdapterConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
- class adapters.LoReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = True, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True)
Low-Rank Linear Subspace ReFT method proposed in Wu et al. (2024). See https://arxiv.org/pdf/2404.03592.
- class adapters.NoReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = False, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True)
Variation of LoReft without orthogonality constraint.
- class adapters.DiReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = False, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True)
Variation of LoReft without orthogonality constraint and projection subtraction as proposed in Wu et al. (2024). See https://arxiv.org/pdf/2404.03592.
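For orientation, the interventions behind these three configurations can be summarized as follows. The notation comes from the cited paper rather than from this reference, so read it under that assumption: h is a hidden state of dimension d, the projections R, W, W_1 and W_2 have shape r × d (r corresponds to the r parameter), and b is a bias vector of dimension r.

```latex
\begin{aligned}
\text{LoReFT:}\quad \Phi(h) &= h + R^{\top}\,(W h + b - R h), \qquad R R^{\top} = I_r \\
\text{NoReFT:}\quad \Phi(h) &= h + R^{\top}\,(W h + b - R h), \qquad R \text{ unconstrained} \\
\text{DiReFT:}\quad \Phi(h) &= h + W_2^{\top}\,(W_1 h + b)
\end{aligned}
```

Under this reading, the orthogonality flag decides whether R is kept orthonormal, and DiReFT additionally drops the subtracted projection term R h.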
Combined configurations
- class adapters.ConfigUnion(*configs: List[AdapterConfig])
Composes multiple adaptation method configurations into one. This class can be used to define complex adaptation method setups.
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)
Loads a given adapter configuration specifier into a full AdapterConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTER_CONFIG_MAP
the path to a file containing a full adapter configuration
an identifier string available in Adapter-Hub
- Returns
The resolved adapter configuration dictionary.
- Return type
dict
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
- static validate(configs)
Performs simple validations of a list of configurations to check whether they can be combined to a common setup.
- Parameters
configs (List[AdapterConfig]) – list of configs to check.
- Raises
TypeError – One of the configurations has a wrong type.
ValueError – At least two given configurations conflict.
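The two failure modes can be illustrated with a stand-in validator. Everything here is hypothetical: SketchConfig, validate_sketch and check are names invented for this sketch, and the conflict rule in particular (two configs sharing the same architecture string) is an assumption for illustration that does not reproduce the library's actual checks.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List


@dataclass
class SketchConfig:
    """Hypothetical stand-in for an adaptation method config."""

    architecture: str


def validate_sketch(configs: List[SketchConfig]) -> None:
    # Type check first: every entry must be a config object.
    for config in configs:
        if not isinstance(config, SketchConfig):
            raise TypeError(f"{config!r} is not a SketchConfig")
    # ASSUMED conflict rule, for illustration only: two configs with the
    # same architecture cannot be combined into one setup.
    for first, second in combinations(configs, 2):
        if first.architecture == second.architecture:
            raise ValueError(f"conflicting configs: {first} and {second}")


def check(configs) -> str:
    """Return 'ok' or the name of the exception validate_sketch raised."""
    try:
        validate_sketch(configs)
    except (TypeError, ValueError) as err:
        return type(err).__name__
    return "ok"


compatible = check([SketchConfig("prefix_tuning"), SketchConfig("bottleneck")])
wrong_type = check([SketchConfig("lora"), {"architecture": "lora"}])
conflicting = check([SketchConfig("lora"), SketchConfig("lora")])
```

Running the type check before the pairwise comparison means a malformed entry is reported as a TypeError even when it would also conflict with another config.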
- class adapters.MAMConfig(prefix_tuning: Optional[PrefixTuningConfig] = None, adapter: Optional[BnConfig] = None)
The Mix-And-Match adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.
- class adapters.UniPELTConfig(prefix_tuning: Optional[PrefixTuningConfig] = None, adapter: Optional[BnConfig] = None, lora: Optional[LoRAConfig] = None)
The UniPELT adapter architecture proposed by Mao et al. (2022). See https://arxiv.org/pdf/2110.07577.pdf.
Adapter Fusion
- class adapters.AdapterFusionConfig(key: bool, query: bool, value: bool, query_before_ln: bool, regularization: bool, residual_before: bool, temperature: bool, value_before_softmax: bool, value_initialized: str, dropout_prob: float)
Base class that models the architecture of an adapter fusion layer.
- classmethod from_dict(config)
Creates a config class from a Python dict.
- classmethod load(config: Union[dict, str], **kwargs)
Loads a given adapter fusion configuration specifier into a full AdapterFusionConfig instance.
- Parameters
config (Union[dict, str]) –
The configuration to load. Can be either:
a dictionary representing the full config
an identifier string available in ADAPTERFUSION_CONFIG_MAP
the path to a file containing a full adapter fusion configuration
- Returns
The resolved adapter fusion configuration dictionary.
- Return type
dict
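The three accepted specifier forms suggest a simple resolution cascade, sketched below. This is not the library's code: resolve_config_spec and SKETCH_CONFIG_MAP are hypothetical names, the registry contents are made up, and both the checking order (dict, then identifier, then file path) and the JSON file format are assumptions.

```python
import json
import os
import tempfile
from collections.abc import Mapping
from typing import Union

# Hypothetical registry standing in for ADAPTERFUSION_CONFIG_MAP.
SKETCH_CONFIG_MAP = {
    "static": {"key": True, "query": True, "value": False},
    "dynamic": {"key": True, "query": True, "value": True},
}


def resolve_config_spec(
    config: Union[dict, str], config_map: Mapping = SKETCH_CONFIG_MAP
) -> dict:
    # 1. A mapping is already a full configuration.
    if isinstance(config, Mapping):
        return dict(config)
    # 2. A known identifier is looked up in the registry.
    if config in config_map:
        return dict(config_map[config])
    # 3. Otherwise, try to read the string as a path to a JSON file (assumed format).
    if os.path.isfile(config):
        with open(config, "r", encoding="utf-8") as handle:
            return json.load(handle)
    raise ValueError(f"Could not resolve config specifier: {config!r}")


from_mapping = resolve_config_spec({"key": False})
from_name = resolve_config_spec("dynamic")
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fusion.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump({"key": True, "query": True, "value": False}, handle)
    from_file = resolve_config_spec(path)
```

In every case the caller gets back a plain dict, which matches the documented return type of load().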
- replace(**changes)
Returns a new instance of the config class with the specified changes applied.
- to_dict()
Converts the config class to a Python dict.
- class adapters.StaticAdapterFusionConfig(key: bool = True, query: bool = True, value: bool = False, query_before_ln: bool = False, regularization: bool = False, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = False, dropout_prob: Optional[float] = None)
Static version of adapter fusion without a value matrix. See https://arxiv.org/pdf/2005.00247.pdf.
- class adapters.DynamicAdapterFusionConfig(key: bool = True, query: bool = True, value: bool = True, query_before_ln: bool = False, regularization: bool = True, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = True, dropout_prob: Optional[float] = None)
Dynamic version of adapter fusion with a value matrix and regularization. See https://arxiv.org/pdf/2005.00247.pdf.
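The difference between the two variants can be summarized in equations. The notation follows the cited AdapterFusion paper with layer and token indices dropped, and the static form is an assumption about what omitting the value matrix means: h is the layer input used as the query, z_n is the output of the n-th adapter, and Q, K and V are the learned query, key and value matrices.

```latex
\begin{aligned}
s_n &= \operatorname{softmax}_n\!\big((Q^{\top} h)^{\top} (K^{\top} z_n)\big), \qquad n = 1, \dots, N \\
o_{\text{dynamic}} &= \sum_{n=1}^{N} s_n \, V^{\top} z_n \\
o_{\text{static}} &= \sum_{n=1}^{N} s_n \, z_n
\end{aligned}
```

Read this way, value=True adds the learned transform V to each adapter output before the weighted sum, while value=False mixes the adapter outputs directly.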
Adapter Setup
- class adapters.AdapterSetup(adapter_setup, head_setup=None, ignore_empty: bool = False)
Represents an adapter setup of a model including active adapters and active heads. This class is intended to be used as a context manager using the with statement. The setup defined by the AdapterSetup context will override static adapter setups defined in a model (i.e. setups specified via active_adapters).
Example:
    with AdapterSetup(Stack("a", "b")):
        # will use the adapter stack "a" and "b"
        outputs = model(**inputs)
Note that the context manager is thread-local, i.e. it can be used with different setups in a multi-threaded environment.
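What thread-local means here can be shown with a small stand-in. SketchSetup is a hypothetical class written for this sketch, not the library's AdapterSetup, and keeping a stack of setups in threading.local is an assumption about one way to get this behavior.

```python
import threading
from typing import Optional


class SketchSetup:
    """Hypothetical stand-in for a thread-local setup context."""

    # One shared object, but its attributes are separate for every thread.
    _storage = threading.local()

    def __init__(self, setup: str):
        self.setup = setup

    def __enter__(self) -> "SketchSetup":
        stack = getattr(self._storage, "stack", None)
        if stack is None:
            stack = self._storage.stack = []
        stack.append(self.setup)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self._storage.stack.pop()

    @classmethod
    def current(cls) -> Optional[str]:
        # The innermost active setup of the calling thread, if any.
        stack = getattr(cls._storage, "stack", [])
        return stack[-1] if stack else None


def worker(name: str, seen: dict, barrier: threading.Barrier) -> None:
    with SketchSetup(name):
        barrier.wait(timeout=5)  # both threads are inside their contexts here
        seen[name] = SketchSetup.current()


seen: dict = {}
barrier = threading.Barrier(2)
threads = [
    threading.Thread(target=worker, args=(name, seen, barrier))
    for name in ("a", "b")
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
outside = SketchSetup.current()
```

Each thread reads back only the setup it entered, even though both contexts are active at the same moment, and no setup is visible outside the with blocks.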