Adapter Implementation

The following classes define the common interfaces for all adapter methods and hold the logic shared by all adapter implementations. Every newly added adapter method should inherit from one of these classes.

class adapters.AdapterLayerBase

Base class for all adaptation methods that require per-layer modules.

Make sure the ‘adapter_modules_name’ attribute is overridden in derived classes.

abstract add_adapter(adapter_name: str, layer_idx: int) → bool

Adds a new adapter module to the layer.

Parameters
  • adapter_name (str) – The name of the new adapter to add.

  • layer_idx (int) – The index of the adapter layer (this should be set once by the first added adapter and then kept fixed).

Returns

True if the adapter was added, False otherwise.

Return type

bool

abstract average_adapter(adapter_name: str, input_adapters: Dict[str, float]) → bool

Averages a set of adapter modules into a new adapter module.

Parameters
  • adapter_name (str) – The name of the new (averaged) adapter module to add.

  • input_adapters (Dict[str, float]) – Either a list of adapter names (with equal weighting) or a dictionary of adapter names and their corresponding weights.

Returns

True if the adapter was added, False otherwise.

Return type

bool
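The two accepted input forms can be illustrated with a minimal, library-independent sketch. The helpers `normalize_weights` and `average_params` are hypothetical names, and representing each adapter's parameters as a plain dictionary of floats is an assumption made only for this example. The sketch uses dictionary weights as given, without rescaling them.

```python
from typing import Dict, List, Union


def normalize_weights(
    input_adapters: Union[List[str], Dict[str, float]]
) -> Dict[str, float]:
    """Map a list (equal weighting) or a dict of weights to name -> weight."""
    if isinstance(input_adapters, dict):
        return dict(input_adapters)
    return {name: 1.0 / len(input_adapters) for name in input_adapters}


def average_params(
    params: Dict[str, Dict[str, float]],
    input_adapters: Union[List[str], Dict[str, float]],
) -> Dict[str, float]:
    """Weighted sum of the parameters of the named adapters (toy version)."""
    weights = normalize_weights(input_adapters)
    averaged: Dict[str, float] = {}
    for name, weight in weights.items():
        for key, value in params[name].items():
            averaged[key] = averaged.get(key, 0.0) + weight * value
    return averaged


# Hypothetical parameters of two existing adapters.
params = {"a": {"w": 1.0}, "b": {"w": 3.0}}
equal = average_params(params, ["a", "b"])
weighted = average_params(params, {"a": 0.25, "b": 0.75})
```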

abstract delete_adapter(adapter_name: str)

Deletes an adapter module from the layer.

Parameters

adapter_name (str) – The name of the adapter to delete.

abstract enable_adapters(adapter_setup: AdapterCompositionBlock, unfreeze_adapters: bool, unfreeze_fusion: bool)

Enables or disables a set of adapter modules within the layer.

Parameters
  • adapter_setup (AdapterCompositionBlock) – The adapter setup to enable or disable.

  • unfreeze_adapters (bool) – Whether to unfreeze the adapters.

  • unfreeze_fusion (bool) – Whether to unfreeze the fusion layers.

abstract get_adapter(adapter_name: str) → Module

Returns the adapter module with the given name.

Parameters

adapter_name (str) – The name of the adapter module.
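To show how the add, delete, and get methods fit together, here is a rough sketch of a class with the same method names. `ToyAdapterLayer` is a hypothetical stand-in that does not inherit from the real base class. Storing modules in a plain dictionary, returning False for a name that already exists, and returning None for an unknown name are all assumptions of this example.

```python
from typing import Dict, Optional


class ToyAdapterLayer:
    """Hypothetical stand-in that mirrors the method names described above."""

    def __init__(self) -> None:
        self.layer_idx: Optional[int] = None
        self.modules: Dict[str, dict] = {}

    def add_adapter(self, adapter_name: str, layer_idx: int) -> bool:
        # The layer index is set once by the first added adapter, then kept.
        if self.layer_idx is None:
            self.layer_idx = layer_idx
        if adapter_name in self.modules:
            return False  # assumption: an existing name is not overwritten
        self.modules[adapter_name] = {"layer_idx": self.layer_idx}
        return True

    def delete_adapter(self, adapter_name: str) -> None:
        self.modules.pop(adapter_name, None)

    def get_adapter(self, adapter_name: str) -> Optional[dict]:
        return self.modules.get(adapter_name)


layer = ToyAdapterLayer()
first = layer.add_adapter("task_a", layer_idx=3)
duplicate = layer.add_adapter("task_a", layer_idx=3)
```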

class adapters.ComposableAdapterLayerBase(*args, **kwargs)

Base class for all adapter methods that support composition.

Make sure the ‘adapter_modules_name’ and ‘supported_compositions’ attributes, as well as all abstract methods, are overridden in derived classes. ‘allow_multi_parallelize’ can be set to True to allow inputs to be parallelized independently multiple times. This is useful when there are multiple parallel input flows through an adapter layer (e.g. in LoRA).

check_composition_valid(parent: AdapterCompositionBlock, child: AdapterCompositionBlock, lvl: int)

Checks whether the given composition is valid.

Parameters
  • parent (AdapterCompositionBlock) – The parent composition block.

  • child (AdapterCompositionBlock) – The child composition block.

  • lvl (int) – The composition depth.

Raises

ValueError – If the composition is invalid.

compose(adapter_setup: Union[AdapterCompositionBlock, str], state: NamedTuple) → NamedTuple

The main composition forward method, which recursively calls the composition blocks' forward methods. This method should be called by the forward method of the derived class.

Parameters
  • adapter_setup (Union[AdapterCompositionBlock, str]) – The adapter setup to be used.

  • state (NamedTuple) – The current state.

Returns

The state after forwarding through the adapter setup.

Return type

NamedTuple
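The recursive dispatch can be pictured with a small toy model. Everything in it is hypothetical: `ToyState`, `ToyStack`, and the `SCALES` table stand in for the real state tuple, composition block, and adapter modules, and the free functions only borrow the method names from this page. It is a sketch of the control flow, not the library's implementation.

```python
from typing import List, NamedTuple, Union


class ToyState(NamedTuple):
    hidden: List[float]


class ToyStack:
    """Hypothetical composition block holding child blocks or adapter names."""

    def __init__(self, *children: Union["ToyStack", str]) -> None:
        self.children = children


# Hypothetical adapters: each one simply scales the hidden values.
SCALES = {"a": 2.0, "b": 10.0}


def compose_single(name: str, state: ToyState, lvl: int = 0) -> ToyState:
    return ToyState([SCALES[name] * h for h in state.hidden])


def compose_stack(block: ToyStack, state: ToyState, lvl: int = 0) -> ToyState:
    # Children run one after another; each may itself be a nested block.
    for child in block.children:
        state = compose(child, state, lvl + 1)
    return state


def compose(setup: Union[ToyStack, str], state: ToyState, lvl: int = 0) -> ToyState:
    if isinstance(setup, str):
        return compose_single(setup, state, lvl)
    if isinstance(setup, ToyStack):
        return compose_stack(setup, state, lvl)
    raise ValueError(f"Unsupported composition: {type(setup).__name__}")


result = compose(ToyStack("a", ToyStack("b")), ToyState([1.0, 2.0]))
```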

compose_average(adapter_setup: Average, state: NamedTuple, lvl: int = 0)

For averaging the output representations of multiple adapters.

compose_batch_split(adapter_setup: BatchSplit, state: NamedTuple, lvl: int = 0)

For splitting to multiple adapters along the batch size dimension.
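As a rough illustration of splitting along the batch dimension, the toy function below hands consecutive chunks of the batch to different adapters. The names `ToyState` and `batch_split`, the explicit `batch_sizes` argument, and the use of plain callables as adapters are assumptions of this sketch.

```python
from typing import Callable, List, NamedTuple

Row = List[float]


class ToyState(NamedTuple):
    hidden: List[Row]  # one row per batch item


def batch_split(
    adapters: List[Callable[[Row], Row]],
    batch_sizes: List[int],
    state: ToyState,
) -> ToyState:
    """Send consecutive chunks of the batch through different adapters."""
    if sum(batch_sizes) != len(state.hidden):
        raise ValueError("batch_sizes must sum to the batch size")
    outputs: List[Row] = []
    start = 0
    for adapter, size in zip(adapters, batch_sizes):
        for row in state.hidden[start:start + size]:
            outputs.append(adapter(row))
        start += size
    return ToyState(outputs)


def double(row: Row) -> Row:
    return [2 * x for x in row]


def negate(row: Row) -> Row:
    return [-x for x in row]


split_out = batch_split([double, negate], [1, 2], ToyState([[1.0], [2.0], [3.0]]))
```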

compose_fuse(adapter_setup: Fuse, state: NamedTuple, lvl: int = 0)

For fusing multiple adapters using adapter fusion. NOTE: This method has no default implementation.

compose_parallel(adapter_setup: Parallel, state: NamedTuple, lvl: int = 0)

For parallel execution of the adapters on the same input. This means that the input is repeated N times before feeding it to the adapters (where N is the number of adapters).
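The repeat-then-process pattern described here can be sketched without any tensor library. `ToyState` and `parallel` are hypothetical names, and adapters are modelled as plain callables on a single row. The output batch is N times the input batch, with one block of rows per adapter.

```python
from typing import Callable, Dict, List, NamedTuple

Row = List[float]


class ToyState(NamedTuple):
    hidden: List[Row]  # one row per batch item


def parallel(adapters: Dict[str, Callable[[Row], Row]], state: ToyState) -> ToyState:
    """Repeat the input N times, then give each adapter its own copy."""
    n = len(adapters)
    batch = len(state.hidden)
    # Repeat the whole batch N times along the batch dimension.
    repeated = [list(row) for _ in range(n) for row in state.hidden]
    outputs: List[Row] = []
    for i, adapter in enumerate(adapters.values()):
        chunk = repeated[i * batch:(i + 1) * batch]
        outputs.extend(adapter(row) for row in chunk)
    return ToyState(outputs)


toy_adapters = {
    "double": lambda row: [2 * x for x in row],
    "negate": lambda row: [-x for x in row],
}
parallel_out = parallel(toy_adapters, ToyState([[1.0, 2.0]]))
```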

abstract compose_single(adapter_setup: str, state: NamedTuple, lvl: int = 0) → NamedTuple

Forwards the given state through the given single adapter.

Parameters
  • adapter_setup (str) – The name of the adapter.

  • state (NamedTuple) – The state to be forwarded.

  • lvl (int, optional) – The composition depth. Defaults to 0.

Returns

The state after forwarding through the adapter.

Return type

NamedTuple

compose_split(adapter_setup: Split, state: NamedTuple, lvl: int = 0)

For splitting to multiple adapters along the sequence length dimension. NOTE: This method has no default implementation.

compose_stack(adapter_setup: Stack, state: NamedTuple, lvl: int = 0) → NamedTuple

For sequentially stacking multiple adapters.

abstract mean(states: List[NamedTuple], weights: Tensor) → NamedTuple

Averages the given states along the batch size dimension by the given weights. This is used, for example, by the Average composition block. IMPORTANT: Has to be implemented by all derived classes.

Parameters
  • states (List[NamedTuple]) – The states to be averaged.

  • weights (torch.Tensor) – The averaging weights.

Returns

The averaged state.

Return type

NamedTuple
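A derived class might implement the weighted average roughly as follows. This is a simplified sketch: `ToyState` is a hypothetical single-field state, the weights are a plain list of floats rather than a tensor, and the states are combined element by element.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    hidden: List[float]


def mean(states: List[ToyState], weights: List[float]) -> ToyState:
    """Weighted element-wise combination of equally shaped toy states."""
    length = len(states[0].hidden)
    combined = [
        sum(w * s.hidden[i] for s, w in zip(states, weights))
        for i in range(length)
    ]
    return ToyState(combined)


averaged = mean([ToyState([1.0, 2.0]), ToyState([3.0, 6.0])], [0.5, 0.5])
```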

abstract pad_and_concat(states: List[NamedTuple]) → NamedTuple

Concatenates the given states along the batch size dimension, padding them beforehand if necessary. This is used, for example, by the BatchSplit and Parallel composition blocks. IMPORTANT: Has to be implemented by all derived classes.

Parameters

states (List[NamedTuple]) – The states to be concatenated.

Returns

The concatenated state.

Return type

NamedTuple
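The padding step matters when the states to be joined have different sequence lengths. The sketch below pads every row to the longest length and then concatenates along the batch dimension. `ToyState`, the `pad_value` argument, and the choice of right-padding with zeros are assumptions of this example; a real implementation may pad differently.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    hidden: List[List[float]]  # indexed as [batch][sequence position]


def pad_and_concat(states: List[ToyState], pad_value: float = 0.0) -> ToyState:
    """Pad all rows to the longest sequence, then join along the batch."""
    max_len = max(len(row) for state in states for row in state.hidden)
    rows: List[List[float]] = []
    for state in states:
        for row in state.hidden:
            rows.append(row + [pad_value] * (max_len - len(row)))
    return ToyState(rows)


joined = pad_and_concat([ToyState([[1.0, 2.0, 3.0]]), ToyState([[4.0], [5.0, 6.0]])])
```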

pre_block(adapter_setup: Union[AdapterCompositionBlock, str], state: NamedTuple) → NamedTuple

Optional state pre-processing method that is invoked before the state is passed to the first child block of a composition. By default, this method does not contain any logic. It is used, for example, by bottleneck adapters to implement residual connections and layer normalization.

Parameters
  • adapter_setup (Union[AdapterCompositionBlock, str]) – The current composition or single adapter.

  • state (NamedTuple) – The current state.

Returns

The pre-processed state.

Return type

NamedTuple

abstract repeat(state: NamedTuple, channels: int) → NamedTuple

Repeats the given state along the batch size dimension the given number of times. This is used, for example, by the Parallel composition block. IMPORTANT: Has to be implemented by all derived classes.

Parameters
  • state (NamedTuple) – The state to be repeated.

  • channels (int) – The number of times the state should be repeated.

Returns

The repeated state.

Return type

NamedTuple
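For a state whose fields all carry the batch in their first dimension, repetition amounts to tiling each field. The following sketch assumes a hypothetical two-field `ToyState` built from nested lists. It tiles the whole batch `channels` times, so the copies of one item are not adjacent.

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    hidden: List[List[float]]
    mask: List[List[int]]


def repeat(state: ToyState, channels: int) -> ToyState:
    """Tile every field of the state `channels` times along the batch."""
    return ToyState(
        hidden=[list(row) for _ in range(channels) for row in state.hidden],
        mask=[list(row) for _ in range(channels) for row in state.mask],
    )


original = ToyState(hidden=[[1.0], [2.0]], mask=[[1], [0]])
tiled = repeat(original, 3)
```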

abstract vslice(state: NamedTuple, slice_obj: slice) → NamedTuple

Slices the given state along the batch size (vertical) dimension. This is used, for example, by the BatchSplit and Parallel composition blocks. IMPORTANT: Has to be implemented by all derived classes.

Parameters
  • state (NamedTuple) – The state to be sliced.

  • slice_obj (slice) – The slice object.

Returns

The sliced state.

Return type

NamedTuple
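Slicing has to be applied consistently to every field of the state so that the fields stay aligned. A minimal sketch, assuming a hypothetical `ToyState` with two batch-first fields stored as nested lists:

```python
from typing import List, NamedTuple


class ToyState(NamedTuple):
    hidden: List[List[float]]
    mask: List[List[int]]


def vslice(state: ToyState, slice_obj: slice) -> ToyState:
    """Apply the same batch slice to every field of the state."""
    return ToyState(hidden=state.hidden[slice_obj], mask=state.mask[slice_obj])


full = ToyState(
    hidden=[[1.0], [2.0], [3.0], [4.0]],
    mask=[[1], [1], [0], [0]],
)
# Take the second half of the batch, e.g. the part meant for a second adapter.
second_half = vslice(full, slice(2, 4))
```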