Adapter Implementation

The following classes define the common interfaces for all adapter methods. They also hold the logic shared by all adapter implementations. Every newly added adapter method should inherit from one of these classes.

class adapters.AdapterLayerBase

Base class for all adaptation methods that require per-layer modules.

Make sure the ‘adapter_modules_name’ attribute is overridden in derived classes.

abstract add_adapter(adapter_name: str, layer_idx: int) → bool

Adds a new adapter module to the layer.

Parameters

adapter_name (str) – The name of the new adapter to add.
layer_idx (int) – The index of the adapter layer (this should be set once by the first added adapter and then kept fixed).

Returns

True if the adapter was added, False otherwise.

Return type

bool
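The contract described above (a boolean result and a layer index that is set once) can be illustrated with a minimal sketch. Everything here is hypothetical: `ToyAdapterLayer`, its `configured_adapters` dictionary, and the choice to raise on a conflicting index are assumptions made for illustration and are not taken from the library.

```python
from typing import Dict, Optional


class ToyAdapterLayer:
    """Hypothetical stand-in for a per-layer adapter container (not the library class)."""

    def __init__(self, configured_adapters: Dict[str, dict]):
        # Assumption: adapter configs are looked up by name for this layer.
        self.configured_adapters = configured_adapters
        self.layer_idx: Optional[int] = None
        self.modules: Dict[str, dict] = {}

    def add_adapter(self, adapter_name: str, layer_idx: int) -> bool:
        # The first added adapter sets the index; later calls must agree with it.
        if self.layer_idx is None:
            self.layer_idx = layer_idx
        elif self.layer_idx != layer_idx:
            raise ValueError("layer_idx must stay fixed after the first adapter")
        config = self.configured_adapters.get(adapter_name)
        if config is None:
            return False  # no config for this name, so no module is added
        self.modules[adapter_name] = dict(config)  # placeholder for a real module
        return True


layer = ToyAdapterLayer({"sentiment": {"reduction_factor": 16}})
added = layer.add_adapter("sentiment", layer_idx=3)
skipped = layer.add_adapter("unknown", layer_idx=3)
```

In this sketch `added` is True and `skipped` is False, mirroring the documented return value.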

average_adapter(adapter_name: str, input_adapters: Dict[str, float], combine_strategy, **kwargs) → bool

Averages a set of adapter modules into a new adapter module.

Parameters

adapter_name (str) – The name of the new (averaged) adapter module to add.
input_adapters (Dict[str, float]) – Dictionary of adapter names and their corresponding weights.
combine_strategy (str) – The strategy used to combine the adapters. The available strategies depend on the adapter method; see https://docs.adapterhub.ml/adapter_composition.html#merging-adapters
**kwargs – Additional arguments that are specific to the combine_strategy, e.g. svd_rank for LoRA.

Returns

True if the adapter was added, False otherwise.

Return type

bool
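As a rough illustration of what combining adapters by weight can mean, the sketch below builds a new parameter set as a plain weighted sum of existing ones. This is only one possible strategy and not necessarily what any concrete adapter method does. The `ParamStore` layout and the `average_parameters` helper are hypothetical, parameters are modelled as flat lists of floats rather than tensors, and the weights are assumed to be normalized already.

```python
from typing import Dict, List

# Hypothetical parameter store: adapter name -> parameter name -> flat list of values.
ParamStore = Dict[str, Dict[str, List[float]]]


def average_parameters(
    store: ParamStore, adapter_name: str, input_adapters: Dict[str, float]
) -> bool:
    """Add `adapter_name` to `store` as a weighted sum of existing adapters."""
    if not input_adapters or any(name not in store for name in input_adapters):
        return False  # nothing to combine, or an input adapter is missing
    averaged: Dict[str, List[float]] = {}
    for name, weight in input_adapters.items():
        for key, values in store[name].items():
            acc = averaged.setdefault(key, [0.0] * len(values))
            for i, value in enumerate(values):
                acc[i] += weight * value
    store[adapter_name] = averaged
    return True


store: ParamStore = {"a": {"w": [1.0, 2.0]}, "b": {"w": [3.0, 6.0]}}
ok = average_parameters(store, "avg", {"a": 0.25, "b": 0.75})
missing = average_parameters(store, "bad", {"a": 0.5, "c": 0.5})
```

Here the call for "avg" succeeds, while the call for "bad" returns False because the input adapter "c" does not exist.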

delete_adapter(adapter_name: str)

Deletes an adapter module from the layer.

Parameters: adapter_name (str) – The name of the adapter to delete.

enable_adapters(adapter_setup: AdapterCompositionBlock, unfreeze_adapters: bool, unfreeze_fusion: bool)

Enables or disables a set of adapter modules within the layer.

Parameters

adapter_setup (AdapterCompositionBlock) – The adapter setup to enable or disable.
unfreeze_adapters (bool) – Whether to unfreeze the adapters.
unfreeze_fusion (bool) – Whether to unfreeze the adapter fusion layers.

freeze_adapter(adapter_name: str, freeze: bool = True)

Freezes or unfreezes an adapter module.

Parameters

adapter_name (str) – The name of the adapter to freeze or unfreeze.
freeze (bool, optional) – Whether to freeze the adapter. Defaults to True.

get_adapter(adapter_name: str) → Module

Returns the adapter module with the given name.

Parameters: adapter_name (str) – The name of the adapter module.

pre_save_adapters(): Called before saving the adapters to disk.

class adapters.ComposableAdapterLayerBase(*args, **kwargs)

Base class for all adapter methods that support composition.

Make sure the ‘adapter_modules_name’ and ‘supported_compositions’ attributes as well as all abstract methods are overridden in derived classes. ‘allow_multi_parallelize’ can be set to True to allow inputs to be parallelized independently multiple times. This is useful when there are multiple parallel input flows through an adapter layer (e.g. in LoRA).

check_composition_valid(parent: AdapterCompositionBlock, child: AdapterCompositionBlock, lvl: int)

Checks whether the given composition is valid.

Parameters

parent (AdapterCompositionBlock) – The parent composition block.
child (AdapterCompositionBlock) – The child composition block.
lvl (int) – The composition depth.

Raises

ValueError – If the composition is invalid.

compose(adapter_setup: Union[AdapterCompositionBlock, str], state: NamedTuple) → NamedTuple

The main composition forward method, which recursively calls the forward methods of the composition blocks. This method should be called by the forward method of the derived class.

Parameters

adapter_setup (Union[AdapterCompositionBlock, str]) – The adapter setup to be used.
state (NamedTuple) – The current state.

Returns

The state after forwarding through the adapter setup.

Return type

NamedTuple
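The recursive dispatch described above can be sketched with toy stand-ins. `ToyState`, `ToyStack`, `ToyComposableLayer`, and the private `_dispatch` helper are all hypothetical and do not come from the library. Adapters are modelled as plain functions on floats, and only single adapters and stacks are handled.

```python
from typing import Callable, Dict, List, NamedTuple, Union


class ToyState(NamedTuple):
    # Hypothetical state: a flat list of hidden values.
    hidden: List[float]


class ToyStack:
    """Hypothetical stand-in for a Stack composition block."""

    def __init__(self, *children: Union["ToyStack", str]):
        self.children = list(children)


class ToyComposableLayer:
    def __init__(self, adapters: Dict[str, Callable[[float], float]]):
        self.adapters = adapters

    def compose(self, adapter_setup: Union[ToyStack, str], state: ToyState) -> ToyState:
        # Entry point: start the recursion at composition depth 0.
        return self._dispatch(adapter_setup, state, lvl=0)

    def _dispatch(self, adapter_setup, state: ToyState, lvl: int) -> ToyState:
        # Pick the handler that matches the type of the current block.
        if isinstance(adapter_setup, str):
            return self.compose_single(adapter_setup, state, lvl)
        if isinstance(adapter_setup, ToyStack):
            return self.compose_stack(adapter_setup, state, lvl)
        raise ValueError(f"Unsupported composition: {type(adapter_setup).__name__}")

    def compose_single(self, adapter_setup: str, state: ToyState, lvl: int = 0) -> ToyState:
        fn = self.adapters[adapter_setup]
        return ToyState([fn(x) for x in state.hidden])

    def compose_stack(self, adapter_setup: ToyStack, state: ToyState, lvl: int = 0) -> ToyState:
        # Feed the output of each child into the next, one level deeper.
        for child in adapter_setup.children:
            state = self._dispatch(child, state, lvl + 1)
        return state


layer = ToyComposableLayer({"double": lambda x: 2 * x, "inc": lambda x: x + 1})
out = layer.compose(ToyStack("double", ToyStack("inc", "inc")), ToyState([1.0, 2.0]))
```

The nested stack is resolved depth-first: each value is doubled once and then incremented twice.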

compose_average(adapter_setup: Average, state: NamedTuple, lvl: int = 0): For averaging the output representations of multiple adapters.

compose_batch_split(adapter_setup: BatchSplit, state: NamedTuple, lvl: int = 0): For splitting to multiple adapters along the batch size dimension.

compose_fuse(adapter_setup: Fuse, state: NamedTuple, lvl: int = 0): For fusing multiple adapters using adapter fusion. NOTE: This method has no default implementation.

compose_multi_task(adapter_setup: MultiTask, state: NamedTuple, lvl: int = 0): For splitting to multiple adapters along the task_ids.

compose_parallel(adapter_setup: Parallel, state: NamedTuple, lvl: int = 0): For parallel execution of the adapters on the same input. This means that the input is repeated N times before feeding it to the adapters (where N is the number of adapters).

abstract compose_single(adapter_setup: str, state: NamedTuple, lvl: int = 0) → NamedTuple

Forwards the given state through the given single adapter.

Parameters

adapter_setup (str) – The name of the adapter.
state (NamedTuple) – The state to be forwarded.
lvl (int, optional) – The composition depth. Defaults to 0.

Returns

The state after forwarding through the adapter.

Return type

NamedTuple

compose_split(adapter_setup: Split, state: NamedTuple, lvl: int = 0): For splitting to multiple adapters along the sequence length dimension. NOTE: This method has no default implementation.

compose_stack(adapter_setup: Stack, state: NamedTuple, lvl: int = 0) → NamedTuple: For sequentially stacking multiple adapters.

abstract mean(states: List[NamedTuple], weights: Tensor) → NamedTuple

Averages the given states along the batch size dimension by the given weights. This is used, for example, by the Average composition block. IMPORTANT: Has to be implemented by all derived classes.

Parameters

states (List[NamedTuple]) – The states to be averaged.
weights (torch.Tensor) – The averaging weights.

Returns

The averaged state.

Return type

NamedTuple
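A derived class might implement the weighted combination roughly as follows. This is a sketch under stated assumptions: `ToyState` is a hypothetical state type holding nested lists instead of tensors, the weights are a plain list rather than a torch.Tensor, all states are assumed to share the same shape, and the weights are assumed to sum to 1.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    # Hypothetical state: hidden[b][f] is feature f of batch item b.
    hidden: List[List[float]]


def mean(states: List[ToyState], weights: List[float]) -> ToyState:
    """Combine states element-wise, scaling state i by weights[i]."""
    if len(states) != len(weights):
        raise ValueError("need exactly one weight per state")
    out = [[0.0] * len(row) for row in states[0].hidden]
    for state, weight in zip(states, weights):
        for b, row in enumerate(state.hidden):
            for f, value in enumerate(row):
                out[b][f] += weight * value
    return ToyState(out)


averaged = mean([ToyState([[1.0, 2.0]]), ToyState([[3.0, 4.0]])], [0.5, 0.5])
```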

abstract pad_and_concat(states: List[NamedTuple]) → NamedTuple

Concatenates the given states along the batch size dimension. Pads the states before concatenation if necessary. This is e.g. used by the BatchSplit and Parallel composition blocks. IMPORTANT: Has to be implemented by all derived classes.

Parameters: states (List[NamedTuple]) – The states to be concatenated.
Returns: The concatenated state.
Return type: NamedTuple
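The padding and concatenation step can be sketched with plain lists. The `ToyState` type, the `pad_value` parameter, and the decision to pad on the right are assumptions made for this illustration; a real implementation would operate on tensors and may pad differently, or not at all if the shapes already match.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    # Hypothetical state: hidden[b] is the token sequence of batch item b.
    hidden: List[List[float]]


def pad_and_concat(states: List[ToyState], pad_value: float = 0.0) -> ToyState:
    """Right-pad every sequence to the longest length, then join the batches."""
    max_len = max(len(seq) for state in states for seq in state.hidden)
    rows: List[List[float]] = []
    for state in states:
        for seq in state.hidden:
            rows.append(seq + [pad_value] * (max_len - len(seq)))
    return ToyState(rows)


joined = pad_and_concat([ToyState([[1.0, 2.0, 3.0]]), ToyState([[4.0], [5.0, 6.0]])])
```

The result has three batch items, each padded to the longest sequence length of three.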

pre_block(adapter_setup: Union[AdapterCompositionBlock, str], state: NamedTuple) → NamedTuple

Optional state pre-processing method which is invoked before passing the state to the first child block of a composition. By default, this method does not contain any logic. It is used, for example, by bottleneck adapters to implement residual connections and layer normalization (LN).

Parameters

adapter_setup (Union[AdapterCompositionBlock, str]) – The current composition or single adapter.
state (NamedTuple) – The current state.

Returns

The pre-processed state.

Return type

NamedTuple

abstract repeat(state: NamedTuple, channels: int) → NamedTuple

Repeats the given state along the batch size dimension for the given number of times. This is e.g. used by the Parallel composition block. IMPORTANT: Has to be implemented by all derived classes.

Parameters

state (NamedTuple) – The state to be repeated.
channels (int) – The number of times the state should be repeated.

Returns

The repeated state.

Return type

NamedTuple
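Repeating along the batch dimension can be pictured as tiling the whole batch. The sketch below assumes a hypothetical `ToyState` that stores the batch as a list of rows; with tensors, a derived class would use the corresponding tensor operation instead.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    # Hypothetical state: hidden[b] holds the features of batch item b.
    hidden: List[List[float]]


def repeat(state: ToyState, channels: int) -> ToyState:
    """Tile the whole batch `channels` times along the batch dimension."""
    # Rows are copied so that the repeated state does not alias the input.
    return ToyState([list(row) for _ in range(channels) for row in state.hidden])


state = ToyState([[1.0, 2.0], [3.0, 4.0]])
repeated = repeat(state, channels=3)
```

A batch of two items repeated three times yields six items, ordered as three consecutive copies of the original batch.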

abstract vslice(state: NamedTuple, slice_obj: slice) → NamedTuple

Slices the given state along the batch size (vertical) dimension. This is e.g. used by the BatchSplit and Parallel composition blocks. IMPORTANT: Has to be implemented by all derived classes.

Parameters

state (NamedTuple) – The state to be sliced.
slice_obj (slice) – The slice object.

Returns

The sliced state.

Return type

NamedTuple
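Slicing along the batch dimension is the counterpart of concatenation. The sketch below shows how a batch could be cut into consecutive chunks, in the spirit of the BatchSplit block. `ToyState` and the `split_batch` helper are hypothetical and exist only for this illustration.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    # Hypothetical state: hidden[b] holds the features of batch item b.
    hidden: List[List[float]]


def vslice(state: ToyState, slice_obj: slice) -> ToyState:
    """Keep only the batch items selected by `slice_obj`."""
    return ToyState(state.hidden[slice_obj])


def split_batch(state: ToyState, batch_sizes: List[int]) -> List[ToyState]:
    """Cut the batch into consecutive chunks of the given sizes."""
    chunks: List[ToyState] = []
    start = 0
    for size in batch_sizes:
        chunks.append(vslice(state, slice(start, start + size)))
        start += size
    return chunks


state = ToyState([[1.0], [2.0], [3.0]])
first, second = split_batch(state, [2, 1])
```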