Adapter Configuration

Classes representing the architectures of adapter modules and fusion layers.

Single (bottleneck) adapters

class adapters.AdapterConfig

Base class for all adaptation methods. This class does not define specific configuration keys, but only provides some common helper methods.

Parameters

architecture (str, optional) – The type of adaptation method defined by the configuration.

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration, as an instance of the matching AdapterConfig subclass.

Return type

AdapterConfig
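The four specifier forms accepted by load() can be illustrated with a small resolver. This is a hypothetical sketch, not the library's implementation, and it rests on several assumptions:

- TOY_CONFIG_MAP stands in for ADAPTER_CONFIG_MAP and has toy contents.
- The forms are tried in the order listed above.
- Configuration files contain JSON.
- The Adapter-Hub lookup is replaced by an error, because the sketch does not download anything.

```python
import json
import os
from typing import Mapping, Optional, Union

# Hypothetical stand-in for ADAPTER_CONFIG_MAP (toy contents, not the real presets).
TOY_CONFIG_MAP = {
    "seq_bn": {"architecture": "bottleneck", "reduction_factor": 16, "non_linearity": "relu"},
    "lora": {"architecture": "lora", "r": 8, "alpha": 8},
}


def resolve_config_specifier(
    config: Union[dict, str], config_map: Optional[Mapping[str, dict]] = None
) -> dict:
    """Resolve a config specifier to a plain dict (sketch, not the library's code)."""
    config_map = TOY_CONFIG_MAP if config_map is None else config_map
    # 1. A dictionary is taken as the full config (copied so the caller's dict is untouched).
    if isinstance(config, dict):
        return dict(config)
    # 2. A known identifier is looked up in the local map.
    if config in config_map:
        return dict(config_map[config])
    # 3. A path to an existing file is read as JSON (assumed file format).
    if os.path.isfile(config):
        with open(config, encoding="utf-8") as f:
            return json.load(f)
    # 4. The real method would now try Adapter-Hub; this sketch does not download anything.
    raise ValueError(f"Could not resolve adapter config specifier: {config!r}")


print(resolve_config_specifier("lora"))  # {'architecture': 'lora', 'r': 8, 'alpha': 8}
```

With the library installed, you would pass any of these forms to AdapterConfig.load() directly instead.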

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.
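You can picture from_dict(), to_dict() and replace() as operations on an immutable dataclass. The sketch below mimics them on a hypothetical ToyBnConfig built with Python's dataclasses module. It assumes only that the real config classes behave similarly and does not reproduce their code.

```python
import dataclasses
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class ToyBnConfig:
    """Hypothetical stand-in that mirrors a few BnConfig fields."""

    mh_adapter: bool = False
    output_adapter: bool = True
    reduction_factor: float = 16
    non_linearity: str = "relu"
    leave_out: List[int] = field(default_factory=list)

    @classmethod
    def from_dict(cls, config: dict) -> "ToyBnConfig":
        # Build an instance from a plain dict of field values.
        return cls(**config)

    def to_dict(self) -> dict:
        # Copy the field values into a plain dict.
        return dataclasses.asdict(self)

    def replace(self, **changes) -> "ToyBnConfig":
        # Return a new instance with the changes applied; self is left untouched.
        return dataclasses.replace(self, **changes)


base = ToyBnConfig()
wider = base.replace(reduction_factor=8, leave_out=[0, 1])
print(base.reduction_factor, wider.reduction_factor)  # 16 8
```

Because the stand-in is frozen, replace() is the only way to derive a variant. A config shared by several adapters therefore cannot be changed in place by accident.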

class adapters.BnConfig(mh_adapter: bool, output_adapter: bool, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping], non_linearity: str, original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

Base class that models the architecture of a bottleneck adapter.

Parameters
  • mh_adapter (bool) – If True, add adapter modules after the multi-head attention block of each layer.

  • output_adapter (bool) – If True, add adapter modules after the output FFN of each layer.

  • reduction_factor (float or Mapping) – Either a scalar float (> 0) that sets the reduction factor for all layers, or a mapping from layer ID (starting at 0) to the reduction factor of individual layers. If the mapping does not cover all layers, give a default value, e.g. {'1': 8, '6': 32, 'default': 16}. A reduction factor < 1 results in an up-projection layer.

  • non_linearity (str) – The activation function to use in the adapter bottleneck.

  • original_ln_before (bool, optional) – If True, apply pre-trained layer normalization and residual connection before the adapter modules. Defaults to False. Only applicable if is_parallel is False.

  • original_ln_after (bool, optional) – If True, apply pre-trained layer normalization and residual connection after the adapter modules. Defaults to True.

  • ln_before (bool, optional) – If True, add a new layer normalization before the adapter bottleneck. Defaults to False.

  • ln_after (bool, optional) – If True, add a new layer normalization after the adapter bottleneck. Defaults to False.

  • init_weights (str, optional) – Initialization method for the weights of the adapter modules. Currently, this can be one of “bert” (default), “mam_adapter”, or “houlsby”.

  • is_parallel (bool, optional) – If True, apply adapter transformations in parallel. By default (False), sequential application is used.

  • scaling (float or str, optional) – Scaling factor for the scaled addition of adapter outputs, as done by He et al. (2021). Can be one of three values:

    • a constant factor (float);

    • the string “learned”, in which case the scaling factor is learned;

    • the string “channel”, in which case a scaling vector with the shape of the channel dimension is initialized and then learned.

    Defaults to 1.0.

  • use_gating (bool, optional) – Place a trainable gating module alongside the added parameter module to control module activation. This is used, e.g., for UniPELT. Defaults to False.

  • residual_before_ln (bool or str, optional) – If True, take the residual connection around the adapter bottleneck before the layer normalization. If set to “post_add”, take the residual connection around the adapter bottleneck after the previous residual connection. Only applicable if original_ln_before is True. Defaults to True.

  • adapter_residual_before_ln (bool, optional) – If True, apply the residual connection around the adapter modules before the new layer normalization within the adapter. Only applicable if ln_after is True and is_parallel is False. Defaults to False.

  • inv_adapter (str, optional) – If not None (default), add invertible adapter modules after the model embedding layer. Currently, this can be either “nice” or “glow”.

  • inv_adapter_reduction_factor (float, optional) – The reduction factor to use within the invertible adapter modules. Only applicable if inv_adapter is not None.

  • cross_adapter (bool, optional) – If True, add adapter modules after the cross attention block of each decoder layer in an encoder-decoder model. Defaults to False.

  • leave_out (List[int], optional) – The IDs of the layers (starting at 0) where NO adapter modules should be added.

  • dropout (float, optional) – The dropout rate used in the adapter layer. Defaults to 0.0.

  • phm_layer (bool, optional) – If True, the down- and up-projection layers are PHMLayers. Defaults to False.

  • phm_dim (int, optional) – The dimension of the phm matrix. Only applicable if phm_layer is set to True. Defaults to 4.

  • shared_phm_rule (bool, optional) – Whether the phm matrix is shared across all layers. Defaults to True.

  • factorized_phm_rule (bool, optional) – Whether the phm matrix is factorized into a left and right matrix. Defaults to False.

  • learn_phm (bool, optional) – Whether the phm matrix should be learned during training. Defaults to True.

  • factorized_phm_W (bool, optional) – Whether the weight matrix is factorized into a left and right matrix. Defaults to True.

  • shared_W_phm (bool, optional) – Whether the weight matrix is shared across all layers. Defaults to False.

  • phm_c_init (str, optional) – The initialization function for the weights of the phm matrix. The possible values are [“normal”, “uniform”]. Defaults to “normal”.

  • phm_init_range (float, optional) – Standard deviation used to initialize the phm weights if phm_c_init=“normal”. Defaults to 0.0001.

  • hypercomplex_nonlinearity (str, optional) – The distribution from which the weights of the phm layer are drawn. Defaults to “glorot-uniform”.

  • phm_rank (int, optional) – If the weight matrix is factorized, this specifies the rank of the matrix. E.g., the left matrix of the down projection has the shape (phm_dim, _in_feats_per_axis, phm_rank) and the right matrix has the shape (phm_dim, phm_rank, _out_feats_per_axis). Defaults to 1.

  • phm_bias (bool, optional) – If True, the down- and up-projection PHMLayers have a bias term. Ignored if phm_layer is False. Defaults to True.

  • stochastic_depth (float, optional) – The probability of dropping entire layers during training. Use this parameter only for vision-based tasks involving residual networks. Defaults to 0.0.
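To see what reduction_factor means in practice, the helper below computes the bottleneck width of each layer. It handles both the scalar and the mapping form, including the 'default' fallback from the reduction_factor example above. It is a hypothetical helper that assumes the width is hidden_size // reduction_factor. hidden_size=768 is only an illustrative value.

```python
import collections.abc
from typing import Mapping, Union


def bottleneck_size(
    hidden_size: int, reduction_factor: Union[float, Mapping[str, float]], layer_id: int
) -> int:
    """Hypothetical helper: width of the adapter bottleneck in layer `layer_id`."""
    if isinstance(reduction_factor, collections.abc.Mapping):
        # Mapping keys are layer IDs as strings, with an optional "default" entry.
        key = str(layer_id)
        if key in reduction_factor:
            factor = reduction_factor[key]
        elif "default" in reduction_factor:
            factor = reduction_factor["default"]
        else:
            raise KeyError(f"No reduction factor for layer {layer_id} and no 'default' entry")
    else:
        factor = reduction_factor
    if factor <= 0:
        raise ValueError("reduction_factor must be > 0")
    # Assumed rule: bottleneck width = hidden size divided by the reduction factor.
    return int(hidden_size // factor)


factors = {"1": 8, "6": 32, "default": 16}
print([bottleneck_size(768, factors, layer) for layer in (0, 1, 6)])  # [48, 96, 24]
print(bottleneck_size(768, 0.5, layer_id=0))  # 1536, i.e. an up-projection
```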

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration, as an instance of the matching AdapterConfig subclass.

Return type

AdapterConfig

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

class adapters.SeqBnConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

The adapter architecture proposed by Pfeiffer et al. (2020). See https://arxiv.org/pdf/2005.00247.pdf.

class adapters.SeqBnInvConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'relu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = 'nice', inv_adapter_reduction_factor: ~typing.Optional[float] = 2, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

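The adapter architecture proposed by Pfeiffer et al. (2020), with invertible adapters added after the embedding layer (inv_adapter=“nice”, inv_adapter_reduction_factor=2). See https://arxiv.org/pdf/2005.00247.pdf.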

class adapters.DoubleSeqBnConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

The adapter architecture proposed by Houlsby et al. (2019). See https://arxiv.org/pdf/1902.00751.pdf.

class adapters.DoubleSeqBnInvConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 16, non_linearity: str = 'swish', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = 'nice', inv_adapter_reduction_factor: ~typing.Optional[float] = 2, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

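The adapter architecture proposed by Houlsby et al. (2019), with invertible adapters added after the embedding layer (inv_adapter=“nice”, inv_adapter_reduction_factor=2). See https://arxiv.org/pdf/1902.00751.pdf.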

class adapters.ParBnConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 2, non_linearity: str = 'relu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'mam_adapter', is_parallel: bool = True, scaling: ~typing.Union[float, str] = 4.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

The parallel adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.
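The sketch below contrasts the sequential configs above (is_parallel=False) with ParBnConfig (is_parallel=True, scaling=4.0), using plain Python lists. It is a simplified illustration, not the library's forward pass. It leaves out layer normalization and the sublayer's own residual connection, and toy_sublayer and toy_adapter are hypothetical stand-ins.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def add(a: Vector, b: Vector, scale: float = 1.0) -> Vector:
    """Element-wise a + scale * b."""
    return [x + scale * y for x, y in zip(a, b)]


def sequential_block(x: Vector, sublayer: Layer, adapter: Layer) -> Vector:
    # Sequential (is_parallel=False): the adapter transforms the sublayer output,
    # and its result is added back through the adapter's residual connection.
    h = sublayer(x)
    return add(h, adapter(h))


def parallel_block(x: Vector, sublayer: Layer, adapter: Layer, scaling: float = 4.0) -> Vector:
    # Parallel (is_parallel=True): the adapter reads the sublayer input,
    # and its output is scaled and added to the sublayer output.
    return add(sublayer(x), adapter(x), scale=scaling)


def toy_sublayer(v: Vector) -> Vector:  # hypothetical stand-in for attention/FFN
    return [2.0 * t for t in v]


def toy_adapter(v: Vector) -> Vector:  # hypothetical stand-in for the bottleneck
    return [0.5 * t for t in v]


x = [1.0, 2.0]
print(sequential_block(x, toy_sublayer, toy_adapter))  # [3.0, 6.0]
print(parallel_block(x, toy_sublayer, toy_adapter))    # [4.0, 8.0]
```

In the sequential variant the adapter sees the sublayer output. In the parallel variant it reads the same input as the sublayer, and He et al. (2021) combine that setting with a scaling factor.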

class adapters.CompacterConfig(mh_adapter: bool = True, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = False, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

The Compacter architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.

class adapters.CompacterPlusPlusConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 32, non_linearity: str = 'gelu', original_ln_before: bool = True, original_ln_after: bool = True, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'bert', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 1.0, use_gating: bool = False, residual_before_ln: ~typing.Union[bool, str] = True, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = True, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: ~typing.Optional[float] = 0.0)

The Compacter++ architecture proposed by Mahabadi et al. (2021). See https://arxiv.org/pdf/2106.04647.pdf.
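CompacterConfig and CompacterPlusPlusConfig set phm_layer=True. The down and up projections therefore become PHM layers, whose weight is a sum of Kronecker products:

W = sum_k A_k ⊗ B_k

Here the n = phm_dim rule matrices A_k have shape n × n, and the blocks B_k have shape (in/n) × (out/n).

The sketch below builds such a weight from nested lists. It also counts projection parameters using the factor shapes given for phm_rank. The helper names are hypothetical, and the count leaves out the phm rule (shared across layers by default) and the bias.

```python
from typing import List

Matrix = List[List[float]]


def kron(a: Matrix, b: Matrix) -> Matrix:
    """Kronecker product of two matrices given as nested lists."""
    p, q = len(b), len(b[0])
    return [
        [a[i // p][j // q] * b[i % p][j % q] for j in range(len(a[0]) * q)]
        for i in range(len(a) * p)
    ]


def phm_weight(rules: List[Matrix], blocks: List[Matrix]) -> Matrix:
    """W = sum_k rules[k] kron blocks[k]; rules are n x n, blocks are (in/n) x (out/n)."""
    terms = [kron(a, b) for a, b in zip(rules, blocks)]
    rows, cols = len(terms[0]), len(terms[0][0])
    return [[sum(t[i][j] for t in terms) for j in range(cols)] for i in range(rows)]


def phm_param_count(in_feats: int, out_feats: int, phm_dim: int,
                    phm_rank: int = 1, factorized: bool = True) -> int:
    """Weight parameters of one PHM projection, excluding the phm rule and the bias."""
    in_axis, out_axis = in_feats // phm_dim, out_feats // phm_dim
    if factorized:
        # Left factor (phm_dim, in_axis, phm_rank) plus right factor (phm_dim, phm_rank, out_axis).
        return phm_dim * (in_axis * phm_rank + phm_rank * out_axis)
    return phm_dim * in_axis * out_axis


# n = 2: two 2 x 2 rules and two 2 x 3 blocks give a 4 x 6 weight.
rules = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]]]
blocks = [[[1.0] * 3 for _ in range(2)], [[0.0] * 3 for _ in range(2)]]
W = phm_weight(rules, blocks)
print(len(W), len(W[0]))  # 4 6

# Hidden size 768 with the default reduction_factor=32 gives a 24-wide bottleneck.
print(phm_param_count(768, 24, phm_dim=4))                    # 792
print(phm_param_count(768, 24, phm_dim=4, factorized=False))  # 4608
print(768 * 24)                                               # 18432 for a dense projection
```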

class adapters.AdapterPlusConfig(mh_adapter: bool = False, output_adapter: bool = True, reduction_factor: ~typing.Union[float, ~collections.abc.Mapping] = 96, non_linearity: str = 'gelu', original_ln_before: bool = True, original_ln_after: bool = False, ln_before: bool = False, ln_after: bool = False, init_weights: str = 'houlsby', is_parallel: bool = False, scaling: ~typing.Union[float, str] = 'channel', use_gating: bool = False, residual_before_ln: bool = False, adapter_residual_before_ln: bool = False, inv_adapter: ~typing.Optional[str] = None, inv_adapter_reduction_factor: ~typing.Optional[float] = None, cross_adapter: bool = False, leave_out: ~typing.List[int] = <factory>, dropout: float = 0.0, phm_layer: bool = False, phm_dim: int = 4, factorized_phm_W: ~typing.Optional[bool] = True, shared_W_phm: ~typing.Optional[bool] = False, shared_phm_rule: ~typing.Optional[bool] = True, factorized_phm_rule: ~typing.Optional[bool] = False, phm_c_init: ~typing.Optional[str] = 'normal', phm_init_range: ~typing.Optional[float] = 0.0001, learn_phm: ~typing.Optional[bool] = True, hypercomplex_nonlinearity: ~typing.Optional[str] = 'glorot-uniform', phm_rank: ~typing.Optional[int] = 1, phm_bias: ~typing.Optional[bool] = True, stochastic_depth: float = 0.1)

The AdapterPlus architecture proposed by Jan-Martin O. Steitz and Stefan Roth (2024). See https://arxiv.org/pdf/2406.06820.

Please note that some combinations of the adapter parameters original_ln_after, original_ln_before, and residual_before_ln may cause performance issues during training.

In the general case:
  1. At least one of original_ln_before or original_ln_after should be set to True in order to ensure that the original residual connection from pre-training is preserved.

  2. If original_ln_after is set to False, residual_before_ln must also be set to False to ensure convergence during training.
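You can check the two rules above before training. The helper below is hypothetical. It reads only the three attribute names used in the rules, so it accepts any object that has them; here a types.SimpleNamespace stands in for a config.

```python
from types import SimpleNamespace
from typing import List


def check_ln_settings(config) -> List[str]:
    """Hypothetical helper: list violations of the two layer-norm rules above."""
    problems = []
    # Rule 1: keep at least one of the pre-trained layer norms (and its residual).
    if not (config.original_ln_before or config.original_ln_after):
        problems.append("set original_ln_before or original_ln_after to True")
    # Rule 2: without original_ln_after, residual_before_ln must be False.
    # Note that the string "post_add" is truthy and therefore also flagged.
    if not config.original_ln_after and config.residual_before_ln:
        problems.append("set residual_before_ln to False when original_ln_after is False")
    return problems


# AdapterPlusConfig defaults for the three parameters satisfy both rules.
plus_like = SimpleNamespace(original_ln_before=True, original_ln_after=False, residual_before_ln=False)
print(check_ln_settings(plus_like))  # []

risky = SimpleNamespace(original_ln_before=True, original_ln_after=False, residual_before_ln=True)
print(len(check_ln_settings(risky)))  # 1
```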

Prefix Tuning

class adapters.PrefixTuningConfig(architecture: ~typing.Optional[str] = 'prefix_tuning', encoder_prefix: bool = True, cross_prefix: bool = True, leave_out: ~typing.List[int] = <factory>, flat: bool = False, prefix_length: int = 30, bottleneck_size: int = 512, non_linearity: str = 'tanh', dropout: float = 0.0, use_gating: bool = False, shared_gating: bool = True)

The Prefix Tuning architecture proposed by Li & Liang (2021). See https://arxiv.org/pdf/2101.00190.pdf.

Parameters
  • encoder_prefix (bool) – If True, add prefixes to the encoder of an encoder-decoder model. Defaults to True.

  • cross_prefix (bool) – If True, add prefixes to the cross attention of an encoder-decoder model. Defaults to True.

  • flat (bool) – If True, train the prefix parameters directly. Otherwise, reparametrize them using a bottleneck MLP. Defaults to False.

  • prefix_length (int) – The length of the prefix. Defaults to 30.

  • bottleneck_size (int) – If flat=False, the size of the bottleneck MLP. Defaults to 512.

  • non_linearity (str) – If flat=False, the non-linearity used in the bottleneck MLP. Defaults to “tanh”.

  • dropout (float) – The dropout rate used in the prefix tuning layer. Defaults to 0.0.

  • leave_out (List[int]) – The IDs of the layers (starting at 0) where NO prefix should be added.

  • use_gating (bool, optional) – Place a trainable gating module alongside the added parameter module to control module activation. This is used, e.g., for UniPELT. Defaults to False.

  • shared_gating (bool, optional) – Whether to use a shared gate for the prefixes of all attention matrices. Only applicable if use_gating=True. Defaults to True.
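To get a feel for the flat option, the sketch below counts the trainable prefix parameters of one attention stack. It is a hypothetical helper that assumes the reparametrization of Li & Liang (2021):

- a prefix embedding of size prefix_length × hidden_size,
- fed through a two-layer MLP with biases (hidden_size → bottleneck_size → one key and one value vector per layer).

The layer count and hidden size are illustrative values, not taken from this page.

```python
def prefix_param_count(prefix_length: int, n_layers: int, hidden_size: int,
                       flat: bool, bottleneck_size: int = 512) -> int:
    """Hypothetical helper: trainable prefix parameters for one attention stack."""
    # One key vector and one value vector per layer and prefix position.
    kv_size = n_layers * 2 * hidden_size
    if flat:
        # flat=True: the prefix keys/values are trained directly.
        return prefix_length * kv_size
    # flat=False: prefix embedding + Linear(hidden, bottleneck) + Linear(bottleneck, kv_size).
    embedding = prefix_length * hidden_size
    mlp = (hidden_size * bottleneck_size + bottleneck_size) + (bottleneck_size * kv_size + kv_size)
    return embedding + mlp


print(prefix_param_count(30, n_layers=12, hidden_size=768, flat=True))   # 552960
print(prefix_param_count(30, n_layers=12, hidden_size=768, flat=False))  # 9872384
```

The reparametrized variant trains many more parameters. Li & Liang (2021) use it because optimizing the prefix directly was less stable, and after training only the resulting prefix needs to be kept.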

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration, as an instance of the matching AdapterConfig subclass.

Return type

AdapterConfig

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

LoRAConfig

class adapters.LoRAConfig(architecture: ~typing.Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = False, output_lora: bool = False, leave_out: ~typing.List[int] = <factory>, r: int = 8, alpha: int = 8, dropout: float = 0.0, attn_matrices: ~typing.List[str] = <factory>, composition_mode: str = 'add', init_weights: str = 'lora', use_gating: bool = False, dtype: ~typing.Optional[str] = None)

The Low-Rank Adaptation (LoRA) architecture proposed by Hu et al. (2021). See https://arxiv.org/pdf/2106.09685.pdf. LoRA adapts a model by reparametrizing the weight matrices of selected layers. You can merge the additional weights into the original layer weights using model.merge_adapter("lora_name").

Parameters
  • selfattn_lora (bool, optional) – If True, add LoRA to the self-attention weights of a model. Defaults to True.

  • intermediate_lora (bool, optional) – If True, add LoRA to the intermediate MLP weights of a model. Defaults to False.

  • output_lora (bool, optional) – If True, add LoRA to the output MLP weights of a model. Defaults to False.

  • leave_out (List[int], optional) – The IDs of the layers (starting at 0) where NO adapter modules should be added.

  • r (int, optional) – The rank of the LoRA layer. Defaults to 8.

  • alpha (int, optional) – The hyperparameter used for scaling the LoRA reparametrization. Defaults to 8.

  • dropout (float, optional) – The dropout rate used in the LoRA layer. Defaults to 0.0.

  • attn_matrices (List[str], optional) – Determines which matrices of the self-attention module to adapt. A list that may contain the strings “q” (query), “k” (key), “v” (value). Defaults to [“q”, “v”].

  • composition_mode (str, optional) – Defines how the injected weights are composed with the original model weights. Can be either “add” (addition of decomposed matrix, as in LoRA) or “scale” (element-wise multiplication of vector, as in (IA)^3). “scale” can only be used together with r=1. Defaults to “add”.

  • init_weights (str, optional) – Initialization method for the weights of the LoRA modules. Currently, this can be either “lora” (default) or “bert”.

  • use_gating (bool, optional) – Place a trainable gating module alongside the added parameter module to control module activation. This is used, e.g., for UniPELT. Defaults to False. Note that modules with use_gating=True cannot be merged using merge_adapter().

  • dtype (str, optional) – torch dtype for reparametrization tensors. Defaults to None.
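In “add” mode, LoRA learns two low-rank factors, A (r × d_in) and B (d_out × r), and the layer computes W x + (alpha / r) B A x. The scaling alpha / r follows Hu et al. (2021) and is an assumption about this implementation. The pure-Python sketch below uses hypothetical helper names. It shows why merging does not change outputs: folding the scaled update into W gives the same result as running the two paths separately.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Apply a d_out x d_in weight to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def lora_delta(lora_a: Matrix, lora_b: Matrix, alpha: float, r: int) -> Matrix:
    """Scaled low-rank update (alpha / r) * B @ A, with A: r x d_in and B: d_out x r."""
    scaling = alpha / r
    return [[scaling * v for v in row] for row in matmul(lora_b, lora_a)]


def merge(weight: Matrix, delta: Matrix) -> Matrix:
    """Fold the update into the base weight, as merging does conceptually."""
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (d_out = d_in = 2)
A = [[1.0, 2.0]]              # r = 1
B = [[1.0], [0.0]]
delta = lora_delta(A, B, alpha=2, r=1)
x = [1.0, 1.0]

unmerged = [a + b for a, b in zip(linear(W, x), linear(delta, x))]
print(unmerged, linear(merge(W, delta), x))  # [7.0, 1.0] [7.0, 1.0]
```

This sketch assumes the “lora” initialization follows Hu et al. (2021). In that case B starts at zero, so the update is initially zero and training begins from the unmodified model.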

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration, as an instance of the matching AdapterConfig subclass.

Return type

AdapterConfig

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

IA3Config

class adapters.IA3Config(architecture: ~typing.Optional[str] = 'lora', selfattn_lora: bool = True, intermediate_lora: bool = True, output_lora: bool = False, leave_out: ~typing.List[int] = <factory>, r: int = 1, alpha: int = 1, dropout: float = 0.0, attn_matrices: ~typing.List[str] = <factory>, composition_mode: str = 'scale', init_weights: str = 'ia3', use_gating: bool = False, dtype: ~typing.Optional[str] = None)

The ‘Infused Adapter by Inhibiting and Amplifying Inner Activations’ ((IA)^3) architecture proposed by Liu et al. (2022). See https://arxiv.org/pdf/2205.05638.pdf. (IA)^3 builds on top of LoRA. However, unlike LoRA's additive composition, it scales the weights of a layer using an injected vector.
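IA3Config reuses the LoRA machinery with composition_mode="scale" and r=1, so each adapted layer holds a single vector instead of two low-rank factors. The sketch below uses hypothetical helper names and assumes the vector rescales each output feature of a layer, as in Liu et al. (2022). With the vector initialized to ones, as in the paper, the layer starts out unchanged. The scaling can also be folded into the weight matrix row by row.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def linear(weight: Matrix, x: Vector) -> Vector:
    """Apply a d_out x d_in weight to a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def ia3_forward(weight: Matrix, x: Vector, scale: Vector) -> Vector:
    """scale * (W x), element-wise: the injected vector rescales each output feature."""
    return [s * y for s, y in zip(scale, linear(weight, x))]


def ia3_merge(weight: Matrix, scale: Vector) -> Matrix:
    """Equivalent merged weight diag(scale) @ W: each output row is rescaled."""
    return [[s * w for w in row] for s, row in zip(scale, weight)]


W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]
print(ia3_forward(W, x, [1.0, 1.0]))        # [3.0, 7.0], same as W x
print(ia3_forward(W, x, [2.0, 0.5]))        # [6.0, 3.5]
print(linear(ia3_merge(W, [2.0, 0.5]), x))  # [6.0, 3.5]
```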

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration, as an instance of the matching AdapterConfig subclass.

Return type

AdapterConfig

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

PromptTuningConfig

class adapters.PromptTuningConfig(architecture: str = 'prompt_tuning', prompt_length: int = 10, prompt_init: str = 'random_uniform', prompt_init_text: Optional[str] = None, combine: str = 'prefix')

The Prompt Tuning architecture proposed by Lester et al. (2021). See https://arxiv.org/pdf/2104.08691.pdf.

Parameters
  • prompt_length (int) – The number of tokens in the prompt. Defaults to 10.

  • prompt_init (str) – The initialization method for the prompt. Can be either “random_uniform” or “from_string”. Defaults to “random_uniform”.

  • prompt_init_text (str) – The text to use for prompt initialization if prompt_init=“from_string”.

  • random_uniform_scale (float) – The scale of the random uniform initialization if prompt_init=“random_uniform”. Defaults to 0.5, as in the paper.

  • combine (str) – The method used to combine the prompt with the input. Can be either “prefix” or “prefix_after_bos”. Defaults to “prefix”.
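The combine option only decides where the prompt_length soft-prompt positions sit relative to the input sequence. The sketch below shows the two documented modes on token strings. It is hypothetical, since the real method prepends learned embeddings rather than tokens. For “prefix_after_bos”, it assumes that the first input token is the BOS token.

```python
from typing import List


def combine_prompt(prompt: List[str], tokens: List[str], combine: str = "prefix") -> List[str]:
    """Hypothetical sketch: place soft-prompt positions relative to the input."""
    if combine == "prefix":
        # Prompt positions come before everything, including BOS.
        return prompt + tokens
    if combine == "prefix_after_bos":
        # Assumes tokens[0] is the BOS token; the prompt goes right after it.
        return tokens[:1] + prompt + tokens[1:]
    raise ValueError(f"Unknown combine mode: {combine!r}")


prompt = ["P0", "P1", "P2"]
tokens = ["<s>", "Hello", "world"]
print(combine_prompt(prompt, tokens, "prefix"))            # ['P0', 'P1', 'P2', '<s>', 'Hello', 'world']
print(combine_prompt(prompt, tokens, "prefix_after_bos"))  # ['<s>', 'P0', 'P1', 'P2', 'Hello', 'world']
```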

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration, as an instance of the matching AdapterConfig subclass.

Return type

AdapterConfig

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

ReFT

class adapters.ReftConfig(layers: Union[Literal['all'], List[int]], prefix_positions: int, suffix_positions: int, r: int, orthogonality: bool, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True)

Base class for Representation Fine-Tuning (ReFT) methods proposed in Wu et al. (2024). See https://arxiv.org/pdf/2404.03592. ReFT methods have in common that they add “interventions” after selected model layers and at selected sequence positions to adapt the representations produced by module outputs.

Parameters
  • layers (Union[Literal["all"], List[int]]) – The IDs of the layers where interventions should be added. If “all”, interventions are added after all layers (the default in LoReftConfig, NoReftConfig, and DiReftConfig).

  • prefix_positions (int) – The number of prefix positions to add interventions to.

  • suffix_positions (int) – The number of suffix positions to add interventions to.

  • r (int) – The rank of the intervention layer.

  • orthogonality (bool) – If True, enforce an orthogonality constraint for the projection matrix.

  • tied_weights (bool) – If True, share intervention parameters between prefix and suffix positions in each layer.

  • subtract_projection (bool) – If True, subtract the projection of the input.

  • dropout (float) – The dropout rate used in the intervention layer.

  • non_linearity (str) – The activation function used in the intervention layer.

  • dtype (str, optional) – torch dtype for intervention tensors. Defaults to None.

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration, as an instance of the matching AdapterConfig subclass.

Return type

AdapterConfig

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

class adapters.LoReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = True, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True)

Low-Rank Linear Subspace ReFT method proposed in Wu et al. (2024). See https://arxiv.org/pdf/2404.03592.

class adapters.NoReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = False, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True)

Variation of LoReft without orthogonality constraint.

class adapters.DiReftConfig(layers: Union[Literal['all'], List[int]] = 'all', prefix_positions: int = 3, suffix_positions: int = 0, r: int = 1, orthogonality: bool = False, tied_weights: bool = False, dropout: float = 0.05, non_linearity: Optional[str] = None, dtype: Optional[str] = None, architecture: str = 'reft', output_reft: bool = True)

Variation of LoReft without the orthogonality constraint and without projection subtraction, as proposed in Wu et al. (2024). See https://arxiv.org/pdf/2404.03592.
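For reference, the three variants can be summarised as interventions on a hidden state h ∈ ℝ^d at an intervened position. R ∈ ℝ^{r×d} is the low-rank projection, W ∈ ℝ^{r×d} and b ∈ ℝ^r form the learned source, and r is the `r` parameter. The LoReft and DiReft forms follow Wu et al. (2024). The mapping of `orthogonality` to the constraint on R, of `subtract_projection` to the −Rh term, and of `non_linearity` to φ (identity when unset) is a reading of the parameter descriptions, not a statement about the library's internals. Dropout is omitted.

```latex
\begin{aligned}
\text{LoReft:} \quad & \Phi(h) = h + R^{\top}\bigl(\varphi(Wh + b) - Rh\bigr), && RR^{\top} = I_r \\
\text{NoReft:} \quad & \Phi(h) = h + R^{\top}\bigl(\varphi(Wh + b) - Rh\bigr), && R \text{ unconstrained} \\
\text{DiReft:} \quad & \Phi(h) = h + R^{\top}\varphi(Wh + b), && R \text{ unconstrained}
\end{aligned}
```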

Combined configurations

class adapters.ConfigUnion(*configs: List[AdapterConfig])

Composes multiple adaptation method configurations into one. This class can be used to define complex adaptation method setups.

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], download_kwargs=None, **kwargs)

Loads a given adapter configuration specifier into a full AdapterConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTER_CONFIG_MAP

  • the path to a file containing a full adapter configuration

  • an identifier string available in Adapter-Hub

Returns

The resolved adapter configuration dictionary.

Return type

dict

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

static validate(configs)

Performs simple validations of a list of configurations to check whether they can be combined to a common setup.

Parameters

configs (List[AdapterConfig]) – list of configs to check.

Raises

  • TypeError – One of the configurations has a wrong type.

  • ValueError – At least two given configurations conflict.
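The validation described above can be pictured with the following simplified sketch. `ToyConfig` and `validate` are hypothetical stand-ins, not the adapters API. The conflict rule used here, under which two configs with the same `architecture` clash, is an assumption. The library's real check may apply additional, more permissive rules for some architectures.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Optional


@dataclass(frozen=True)
class ToyConfig:
    """Hypothetical stand-in for an adaptation method config."""

    architecture: Optional[str]


def validate(configs: List[object]) -> None:
    """Simplified union check: types first, then pairwise conflicts."""
    for config in configs:
        if not isinstance(config, ToyConfig):
            raise TypeError(f"{config!r} is not a supported config type")
    for a, b in combinations(configs, 2):
        # Assumption: two methods with the same architecture would compete for
        # the same insertion points, so they are treated as a conflict.
        if a.architecture == b.architecture:
            raise ValueError(f"{a!r} and {b!r} cannot be combined")


validate([ToyConfig("prefix_tuning"), ToyConfig("bottleneck")])  # passes
try:
    validate([ToyConfig("lora"), ToyConfig("lora")])
except ValueError as err:
    print(err)
```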

class adapters.MAMConfig(prefix_tuning: Optional[PrefixTuningConfig] = None, adapter: Optional[BnConfig] = None)

The Mix-And-Match adapter architecture proposed by He et al. (2021). See https://arxiv.org/pdf/2110.04366.pdf.

class adapters.UniPELTConfig(prefix_tuning: Optional[PrefixTuningConfig] = None, adapter: Optional[BnConfig] = None, lora: Optional[LoRAConfig] = None)

The UniPELT adapter architecture proposed by Mao et al. (2022). See https://arxiv.org/pdf/2110.07577.pdf.

Adapter Fusion

class adapters.AdapterFusionConfig(key: bool, query: bool, value: bool, query_before_ln: bool, regularization: bool, residual_before: bool, temperature: bool, value_before_softmax: bool, value_initialized: str, dropout_prob: float)

Base class that models the architecture of an adapter fusion layer.

classmethod from_dict(config)

Creates a config class from a Python dict.

classmethod load(config: Union[dict, str], **kwargs)

Loads a given adapter fusion configuration specifier into a full AdapterFusionConfig instance.

Parameters

config (Union[dict, str]) –

The configuration to load. Can be either:

  • a dictionary representing the full config

  • an identifier string available in ADAPTERFUSION_CONFIG_MAP

  • the path to a file containing a full adapter fusion configuration

Returns

The resolved adapter fusion configuration dictionary.

Return type

dict
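The resolution steps listed above can be sketched as follows. `resolve_fusion_config` and `TOY_FUSION_PRESETS` are hypothetical stand-ins for `load` and `ADAPTERFUSION_CONFIG_MAP`. The preset names, the order of the checks, and the treatment of `**kwargs` as per-key overrides are all assumptions. Only a subset of flags is copied from the StaticAdapterFusionConfig and DynamicAdapterFusionConfig signatures documented in this reference.

```python
import json
import os
from typing import Any, Dict, Union

# Hypothetical presets; flag values copied from the Static/Dynamic signatures.
TOY_FUSION_PRESETS: Dict[str, Dict[str, Any]] = {
    "static": {"key": True, "query": True, "value": False, "regularization": False},
    "dynamic": {"key": True, "query": True, "value": True, "regularization": True},
}


def resolve_fusion_config(config: Union[dict, str], **overrides: Any) -> Dict[str, Any]:
    """Resolve a dict, preset identifier, or JSON file path into a config dict."""
    if isinstance(config, dict):
        resolved = dict(config)
    elif config in TOY_FUSION_PRESETS:
        resolved = dict(TOY_FUSION_PRESETS[config])  # copy so presets stay untouched
    elif os.path.isfile(config):
        with open(config, encoding="utf-8") as f:
            resolved = json.load(f)
    else:
        raise ValueError(f"Could not resolve fusion config {config!r}")
    # Assumption: extra keyword arguments override individual keys.
    resolved.update(overrides)
    return resolved


print(resolve_fusion_config("static", value=True))
```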

replace(**changes)

Returns a new instance of the config class with the specified changes applied.

to_dict()

Converts the config class to a Python dict.

class adapters.StaticAdapterFusionConfig(key: bool = True, query: bool = True, value: bool = False, query_before_ln: bool = False, regularization: bool = False, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = False, dropout_prob: Optional[float] = None)

Static version of adapter fusion without a value matrix. See https://arxiv.org/pdf/2005.00247.pdf.

class adapters.DynamicAdapterFusionConfig(key: bool = True, query: bool = True, value: bool = True, query_before_ln: bool = False, regularization: bool = True, residual_before: bool = False, temperature: bool = False, value_before_softmax: bool = True, value_initialized: str = True, dropout_prob: Optional[float] = None)

Dynamic version of adapter fusion with a value matrix and regularization. See https://arxiv.org/pdf/2005.00247.pdf.
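To illustrate the difference between the two variants, the sketch below computes an attention-weighted mix of adapter outputs for a single token in plain Python. It is a simplification. The query and key projections are taken as the identity, `value_fn=None` stands in for the static variant without a value matrix, and any callable stands in for a learned value projection. The config's `temperature` flag is a boolean toggle, and modelling it here as a fixed float divisor is an assumption. All names are hypothetical.

```python
import math
from typing import Callable, List, Optional

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Vector, temperature: float = 1.0) -> Vector:
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(query: Vector, adapter_outputs: List[Vector],
         value_fn: Optional[Callable[[Vector], Vector]] = None,
         temperature: float = 1.0) -> Vector:
    """Attention over adapters for one token; keys are the adapter outputs themselves."""
    weights = softmax([dot(query, out) for out in adapter_outputs], temperature)
    values = adapter_outputs if value_fn is None else [value_fn(o) for o in adapter_outputs]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# The query aligns with the first adapter, so its output dominates the mix.
print(fuse([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]]))
```

A larger temperature flattens the attention weights toward a plain average of the adapter outputs.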

Adapter Setup

class adapters.AdapterSetup(adapter_setup, head_setup=None, ignore_empty: bool = False)

Represents an adapter setup of a model including active adapters and active heads. This class is intended to be used as a context manager using the with statement. The setup defined by the AdapterSetup context will override static adapter setups defined in a model (i.e. setups specified via active_adapters).

Example:

with AdapterSetup(Stack("a", "b")):
    # will use the adapter stack "a" and "b"
    outputs = model(**inputs)

Note that the context manager is thread-local, i.e. it can be used with different setups in a multi-threaded environment.
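The thread-local behaviour and the override of static setups can be modelled with `threading.local` and a per-thread stack. This is a minimal sketch. `toy_adapter_setup` and `current_setup` are hypothetical stand-ins, not the adapters implementation, and `static_default` plays the role of a setup configured via `active_adapters`.

```python
import threading
from contextlib import contextmanager
from typing import Iterator, List, Optional

_storage = threading.local()


def _stack() -> List[object]:
    # Each thread lazily gets its own list of active setups.
    if not hasattr(_storage, "setups"):
        _storage.setups = []
    return _storage.setups


@contextmanager
def toy_adapter_setup(setup: object) -> Iterator[None]:
    """Hypothetical stand-in: activate a setup for the current thread only."""
    _stack().append(setup)
    try:
        yield
    finally:
        _stack().pop()


def current_setup(static_default: Optional[object] = None) -> Optional[object]:
    # The innermost context wins; otherwise fall back to the static setup.
    stack = _stack()
    return stack[-1] if stack else static_default


seen = {}


def worker(name: str) -> None:
    with toy_adapter_setup(name):
        seen[name] = current_setup()


threads = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(seen, current_setup("static"))  # {'a': 'a', 'b': 'b'} static (key order may vary)
```

Because each thread pushes onto its own stack, a setup activated in one thread never leaks into another. Outside any context, the static default applies.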