Adapter Implementation

The following classes define the common interfaces for all adapter methods. They also hold the logic shared by all adapter implementations. All newly added adapter methods should inherit from one of these classes.

class adapters.AdapterLayerBase

Base class for all adaptation methods that require per-layer modules.

Make sure the ‘adapter_modules_name’ attribute is overridden in derived classes.

abstract add_adapter(adapter_name: str, layer_idx: int) bool

Adds a new adapter module to the layer.

  • adapter_name (str) – The name of the new adapter to add.

  • layer_idx (int) – The index of the adapter's layer (this should be set once by the first added adapter and then kept fixed).


True if the adapter was added, False otherwise.

Return type

bool
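As a rough illustration of the contract described above, the sketch below registers adapter modules by name and fixes `layer_idx` on the first call. `ToyAdapterLayer` and its dict-based storage are hypothetical stand-ins: the class does not subclass the real `AdapterLayerBase`, and returning False for a duplicate name is an assumption made for this example.

```python
from typing import Dict, List, Optional


class ToyAdapterLayer:
    """Hypothetical stand-in for a class derived from AdapterLayerBase."""

    def __init__(self) -> None:
        # Adapter name -> toy "module" (a list of floats instead of an nn.Module).
        self.modules: Dict[str, List[float]] = {}
        self.layer_idx: Optional[int] = None

    def add_adapter(self, adapter_name: str, layer_idx: int) -> bool:
        # The layer index is set once by the first added adapter, then kept fixed.
        if self.layer_idx is None:
            self.layer_idx = layer_idx
        # Assumption: an adapter whose name is already registered is not added again.
        if adapter_name in self.modules:
            return False
        self.modules[adapter_name] = [0.0, 0.0]
        return True


layer = ToyAdapterLayer()
added_first = layer.add_adapter("sentiment", layer_idx=3)
added_again = layer.add_adapter("sentiment", layer_idx=7)
```

A real derived class would create the actual adapter module here and store it under the attribute named by ‘adapter_modules_name’.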


average_adapter(adapter_name: str, input_adapters: Dict[str, float], combine_strategy, **kwargs) bool

Averages a set of adapter modules into a new adapter module.

  • adapter_name (str) – The name of the new (averaged) adapter module to add.

  • input_adapters (Dict[str, float]) – Dictionary of adapter names and their corresponding weights.

  • combine_strategy (str) – The strategy to combine the adapters. Available strategies depend on the used adapter method, see:

  • **kwargs – Additional arguments that are specific to the combine_strategy. E.g. svd_rank for LoRA.


True if the adapter was added, False otherwise.

Return type

bool
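To make the role of `input_adapters` concrete, the following sketch combines toy parameter vectors by a plain weighted sum. This is only one possible strategy, assumed here for illustration; the helper `average_weights` and the list-based parameters are hypothetical and not part of the library.

```python
from typing import Dict, List


def average_weights(
    adapters: Dict[str, List[float]],
    input_adapters: Dict[str, float],
) -> List[float]:
    """Weighted sum of toy parameter vectors (a plain linear combination)."""
    size = len(next(iter(adapters.values())))
    result = [0.0] * size
    for name, weight in input_adapters.items():
        for i, value in enumerate(adapters[name]):
            result[i] += weight * value
    return result


# Toy parameters; real adapter modules hold tensors, not lists.
adapters = {"a": [1.0, 2.0], "b": [3.0, 6.0]}
averaged = average_weights(adapters, {"a": 0.5, "b": 0.5})
```

With equal weights of 0.5, each entry of `averaged` is the arithmetic mean of the corresponding entries of the two inputs.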


delete_adapter(adapter_name: str)

Deletes an adapter module from the layer.


adapter_name (str) – The name of the adapter to delete.

enable_adapters(adapter_setup: AdapterCompositionBlock, unfreeze_adapters: bool, unfreeze_fusion: bool)

Enables or disables a set of adapter modules within the layer.

  • adapter_setup (AdapterCompositionBlock) – The adapter setup to enable or disable.

  • unfreeze_adapters (bool) – Whether to unfreeze the adapters.

  • unfreeze_fusion (bool) – Whether to unfreeze the fusion layers.

freeze_adapter(adapter_name: str, freeze: bool = True)

Freezes or unfreezes an adapter module.

  • adapter_name (str) – The name of the adapter to freeze or unfreeze.

  • freeze (bool, optional) – Whether to freeze the adapter. Defaults to True.

get_adapter(adapter_name: str) Module

Returns the adapter module with the given name.


adapter_name (str) – The name of the adapter module.


Called before saving the adapters to disk.

class adapters.ComposableAdapterLayerBase(*args, **kwargs)

Base class for all adapter methods that support composition.

Make sure the ‘adapter_modules_name’ and ‘supported_compositions’ attributes as well as all abstract methods are overridden in derived classes. ‘allow_multi_parallelize’ can be set to True to allow inputs to be parallelized independently multiple times. This is useful when there are multiple parallel input flows through an adapter layer (e.g. in LoRA).

check_composition_valid(parent: AdapterCompositionBlock, child: AdapterCompositionBlock, lvl: int)

Checks whether the given composition is valid.

  • parent (AdapterCompositionBlock) – The parent composition block.

  • child (AdapterCompositionBlock) – The child composition block.

  • lvl (int) – The composition depth.


ValueError – If the composition is invalid.
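The kind of check described above could look roughly like the sketch below. The block classes, the `ALLOWED_CHILDREN` table and the `MAX_DEPTH` limit are all hypothetical and chosen only for this example; the actual nesting rules of the library are not stated in this section.

```python
class ToyBlock:
    """Hypothetical stand-in for AdapterCompositionBlock."""

    def __init__(self, *children):
        self.children = children


class ToyStack(ToyBlock):
    pass


class ToyParallel(ToyBlock):
    pass


# Hypothetical nesting rules: which child types each parent type accepts.
ALLOWED_CHILDREN = {
    ToyStack: (str, ToyParallel),
    ToyParallel: (str, ToyStack),
}
MAX_DEPTH = 2  # assumed limit, chosen only for this example


def check_composition_valid(parent, child, lvl):
    """Raise ValueError if `child` may not be nested in `parent` at depth `lvl`."""
    if lvl > MAX_DEPTH:
        raise ValueError(f"Composition is too deep at level {lvl}.")
    if not isinstance(child, ALLOWED_CHILDREN.get(type(parent), ())):
        raise ValueError(
            f"{type(child).__name__} cannot be nested in {type(parent).__name__}."
        )


# A valid nesting passes silently and returns None.
result = check_composition_valid(ToyStack(), ToyParallel(), lvl=1)
```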

compose(adapter_setup: Union[AdapterCompositionBlock, str], state: NamedTuple) NamedTuple

The main composition forward method, which recursively calls the composition blocks' forward methods. This method should be called by the forward method of the derived class.

  • adapter_setup (Union[AdapterCompositionBlock, str]) – The adapter setup to be used.

  • state (NamedTuple) – The current state.


The state after forwarding through the adapter setup.

Return type

NamedTuple
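The recursive dispatch described above can be sketched with a toy layer that supports only single adapters and stacks. `ToyState`, `ToyStack` and `ToyLayer` are hypothetical, and each "adapter" here simply scales the hidden values; this is an illustration of the recursion pattern under those assumptions, not the library's implementation.

```python
from typing import Dict, List, NamedTuple, Union


class ToyState(NamedTuple):
    hidden: List[float]


class ToyStack:
    """Hypothetical stand-in for the Stack composition block."""

    def __init__(self, *children):
        self.children = children


class ToyLayer:
    def __init__(self, scales: Dict[str, float]) -> None:
        self.scales = scales  # adapter name -> toy scaling factor

    def compose_single(self, adapter_setup: str, state: ToyState, lvl: int = 0) -> ToyState:
        scale = self.scales[adapter_setup]
        return ToyState([scale * h for h in state.hidden])

    def compose_stack(self, adapter_setup: ToyStack, state: ToyState, lvl: int = 0) -> ToyState:
        # Feed the state through each child in order, recursing into nested blocks.
        for child in adapter_setup.children:
            if isinstance(child, str):
                state = self.compose_single(child, state, lvl=lvl + 1)
            else:
                state = self.compose_stack(child, state, lvl=lvl + 1)
        return state

    def compose(self, adapter_setup: Union[ToyStack, str], state: ToyState) -> ToyState:
        # Entry point: dispatch on the type of the adapter setup.
        if isinstance(adapter_setup, str):
            return self.compose_single(adapter_setup, state)
        return self.compose_stack(adapter_setup, state)


layer = ToyLayer({"a": 2.0, "b": 3.0})
out = layer.compose(ToyStack("a", ToyStack("b")), ToyState([1.0, 2.0]))
```

Here the input is first scaled by adapter "a" and then by the nested adapter "b", so each value ends up multiplied by 6.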


compose_average(adapter_setup: Average, state: NamedTuple, lvl: int = 0)

For averaging the output representations of multiple adapters.

compose_batch_split(adapter_setup: BatchSplit, state: NamedTuple, lvl: int = 0)

For splitting to multiple adapters along the batch size dimension.

compose_fuse(adapter_setup: Fuse, state: NamedTuple, lvl: int = 0)

For fusing multiple adapters using adapter fusion. NOTE: This method has no default implementation.

compose_parallel(adapter_setup: Parallel, state: NamedTuple, lvl: int = 0)

For parallel execution of the adapters on the same input. This means that the input is repeated N times before feeding it to the adapters (where N is the number of adapters).
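A minimal sketch of this repeat-then-split pattern, using plain nested lists in place of tensors. The function `toy_parallel` and the callables standing in for adapters are hypothetical and only illustrate the data flow.

```python
from typing import Callable, List

Row = List[float]
Batch = List[Row]  # outer list = batch dimension


def toy_parallel(batch: Batch, adapters: List[Callable[[Row], Row]]) -> Batch:
    n = len(adapters)
    size = len(batch)
    # Repeat the input N times along the batch dimension.
    repeated = [list(row) for _ in range(n) for row in batch]
    outputs: Batch = []
    for i, adapter in enumerate(adapters):
        # Each adapter receives its own copy of the original batch.
        chunk = repeated[i * size:(i + 1) * size]
        outputs.extend(adapter(row) for row in chunk)
    return outputs


def double(row: Row) -> Row:
    return [2 * x for x in row]


def negate(row: Row) -> Row:
    return [-x for x in row]


result = toy_parallel([[1.0, 2.0]], [double, negate])
```

The output batch is N times larger than the input batch: the first block of rows comes from the first adapter, the second block from the second adapter, and so on.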

abstract compose_single(adapter_setup: str, state: NamedTuple, lvl: int = 0) NamedTuple

Forwards the given state through the given single adapter.

  • adapter_setup (str) – The name of the adapter.

  • state (NamedTuple) – The state to be forwarded.

  • lvl (int, optional) – The composition depth. Defaults to 0.


The state after forwarding through the adapter.

Return type

NamedTuple


compose_split(adapter_setup: Split, state: NamedTuple, lvl: int = 0)

For splitting to multiple adapters along the sequence length dimension. NOTE: This method has no default implementation.

compose_stack(adapter_setup: Stack, state: NamedTuple, lvl: int = 0) NamedTuple

For sequentially stacking multiple adapters.

abstract mean(states: List[NamedTuple], weights: Tensor) NamedTuple

Averages the given states along the batch size dimension by the given weights. This is e.g. used by the Average composition block. IMPORTANT: Has to be implemented by all derived classes.

  • states (List[NamedTuple]) – The states to be averaged.

  • weights (torch.Tensor) – The averaging weights.


The averaged state.

Return type

NamedTuple
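One plausible reading of this method is an element-wise weighted combination of the states, sketched below. `ToyState` and its single `hidden` field are hypothetical, nested lists stand in for `torch.Tensor`, and the weights are assumed to already sum to 1.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    hidden: List[List[float]]  # [batch][features]


def mean(states: List[ToyState], weights: List[float]) -> ToyState:
    """Element-wise weighted combination (weights assumed to sum to 1)."""
    out = [[0.0] * len(row) for row in states[0].hidden]
    for state, weight in zip(states, weights):
        for b, row in enumerate(state.hidden):
            for f, value in enumerate(row):
                out[b][f] += weight * value
    return ToyState(out)


states = [ToyState([[1.0, 3.0]]), ToyState([[3.0, 5.0]])]
averaged = mean(states, [0.25, 0.75])
```

A real implementation would apply the same combination to every tensor field of the state that carries per-adapter output.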


abstract pad_and_concat(states: List[NamedTuple]) NamedTuple

Concatenates the given states along the batch size dimension. Pads the states before concatenation if necessary. This is e.g. used by the BatchSplit and Parallel composition blocks. IMPORTANT: Has to be implemented by all derived classes.


states (List[NamedTuple]) – The states to be concatenated.


The concatenated state.

Return type

NamedTuple
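The padding and concatenation step might be sketched as follows. The state layout, the right-side padding and the pad value of 0.0 are assumptions made for this example; `ToyState` is hypothetical and nested lists stand in for tensors.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    hidden: List[List[float]]  # [batch][sequence]


def pad_and_concat(states: List[ToyState], pad_value: float = 0.0) -> ToyState:
    # Find the longest sequence over all states.
    max_len = max(len(row) for state in states for row in state.hidden)
    rows: List[List[float]] = []
    for state in states:
        for row in state.hidden:
            # Right-pad shorter rows, then append along the batch dimension.
            rows.append(row + [pad_value] * (max_len - len(row)))
    return ToyState(rows)


merged = pad_and_concat([ToyState([[1.0, 2.0, 3.0]]), ToyState([[4.0]])])
```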


pre_block(adapter_setup: Union[AdapterCompositionBlock, str], state: NamedTuple) NamedTuple

Optional state pre-processing method which is invoked before passing the state to the first child block of a composition. By default, this method does not contain any logic. It is used, for example, by bottleneck adapters to implement residual connections and layer norms.

  • adapter_setup (Union[AdapterCompositionBlock, str]) – The current composition or single adapter.

  • state (NamedTuple) – The current state.


The pre-processed state.

Return type

NamedTuple


abstract repeat(state: NamedTuple, channels: int) NamedTuple

Repeats the given state along the batch size dimension for the given number of times. This is e.g. used by the Parallel composition block. IMPORTANT: Has to be implemented by all derived classes.

  • state (NamedTuple) – The state to be repeated.

  • channels (int) – The number of times the state should be repeated.


The repeated state.

Return type

NamedTuple


abstract vslice(state: NamedTuple, slice_obj: slice) NamedTuple

Slices the given state along the batch size (vertical) dimension. This is e.g. used by the BatchSplit and Parallel composition blocks. IMPORTANT: Has to be implemented by all derived classes.

  • state (NamedTuple) – The state to be sliced.

  • slice_obj (slice) – The slice object.


The sliced state.

Return type

NamedTuple
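The sketch below shows how repeat and vslice relate to each other: slicing out one block of a repeated state recovers the original state. `ToyState` is hypothetical, and nested lists stand in for tensors, with the outer list acting as the batch dimension.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    hidden: List[List[float]]  # [batch][features]


def repeat(state: ToyState, channels: int) -> ToyState:
    # Copy the whole batch `channels` times along the batch dimension.
    return ToyState([list(row) for _ in range(channels) for row in state.hidden])


def vslice(state: ToyState, slice_obj: slice) -> ToyState:
    # Select a contiguous range of batch entries.
    return ToyState(state.hidden[slice_obj])


state = ToyState([[1.0], [2.0]])
repeated = repeat(state, 3)
second_copy = vslice(repeated, slice(2, 4))
```

Because the original batch has two entries, the slice `slice(2, 4)` selects exactly the second of the three copies.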
