BERT Mixins

These mixin classes, added to the BERT module classes, provide adapter support for all BERT-based transformer models.

class transformers.adapter_bert.BertEncoderAdaptersMixin

Adds adapters to the BertEncoder module.

class transformers.adapter_bert.BertLayerAdaptersMixin

Adds adapters to the BertLayer module.

class transformers.adapter_bert.BertModelAdaptersMixin(*args, **kwargs)

Adds adapters to the BertModel module.

add_adapter(adapter_name: str, adapter_type: transformers.adapter_utils.AdapterType, config=None)

Adds a new adapter module of the specified type to the model.

Parameters
  • adapter_name (str) – The name of the adapter module to be added.

  • adapter_type (AdapterType) – The adapter type.

  • config (str or dict or AdapterConfig, optional) – The adapter configuration, which can be either:

      - the string identifier of a pre-defined configuration dictionary, or

      - a configuration dictionary specifying the full config.

    If not given, the default configuration for this adapter type is used.
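
A minimal usage sketch, assuming the BertModelWithHeads class and the pre-defined "pfeiffer" configuration that shipped with early adapter-transformers releases:

    from transformers import AdapterType, BertModelWithHeads

    model = BertModelWithHeads.from_pretrained("bert-base-uncased")

    # Default configuration for the adapter type:
    model.add_adapter("sst-2", AdapterType.text_task)

    # Pre-defined configuration selected by its string identifier:
    model.add_adapter("mnli", AdapterType.text_task, config="pfeiffer")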

train_adapter(adapter_names: list)

Sets the model into a mode for training the given adapters, i.e. all model weights except those of the named adapters are frozen.
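
For example, to freeze the transformer weights and train only the "sst-2" adapter from the sketch above:

    model.train_adapter(["sst-2"])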

train_fusion(adapter_names: list)

Sets the model into a mode for training an adapter fusion layer over the adapters given by the list of adapter names.
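
A sketch, assuming the adapters "mnli", "qqp", and "sst-2" have been added beforehand:

    # Freeze everything except the fusion layer over the listed adapters:
    model.train_fusion(["mnli", "qqp", "sst-2"])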

class transformers.adapter_bert.BertModelHeadsMixin(*args, **kwargs)

Adds heads to a Bert-based module.

add_classification_head(head_name, num_labels=2, layers=2, activation_function='tanh', overwrite_ok=False, multilabel=False, id2label=None)

Adds a sequence classification head on top of the model.

Parameters
  • head_name (str) – The name of the head.

  • num_labels (int, optional) – Number of classification labels. Defaults to 2.

  • layers (int, optional) – Number of layers. Defaults to 2.

  • activation_function (str, optional) – Activation function. Defaults to ‘tanh’.

  • overwrite_ok (bool, optional) – Force overwrite if a head with the same name exists. Defaults to False.

  • multilabel (bool, optional) – Enable multilabel classification setup. Defaults to False.
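
For example, a binary sentiment head with an explicit label mapping (head and label names are illustrative):

    model.add_classification_head(
        "sst-2",
        num_labels=2,
        id2label={0: "negative", 1: "positive"},
    )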

add_multiple_choice_head(head_name, num_choices=2, layers=2, activation_function='tanh', overwrite_ok=False, id2label=None)

Adds a multiple choice head on top of the model.

Parameters
  • head_name (str) – The name of the head.

  • num_choices (int, optional) – Number of choices. Defaults to 2.

  • layers (int, optional) – Number of layers. Defaults to 2.

  • activation_function (str, optional) – Activation function. Defaults to ‘tanh’.

  • overwrite_ok (bool, optional) – Force overwrite if a head with the same name exists. Defaults to False.
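
For example, a four-way choice head for a SWAG-style task (the head name is illustrative):

    model.add_multiple_choice_head("swag", num_choices=4)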

add_tagging_head(head_name, num_labels=2, layers=1, activation_function='tanh', overwrite_ok=False, id2label=None)

Adds a token classification head on top of the model.

Parameters
  • head_name (str) – The name of the head.

  • num_labels (int, optional) – Number of classification labels. Defaults to 2.

  • layers (int, optional) – Number of layers. Defaults to 1.

  • activation_function (str, optional) – Activation function. Defaults to ‘tanh’.

  • overwrite_ok (bool, optional) – Force overwrite if a head with the same name exists. Defaults to False.
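
For example, a token classification head for a minimal NER tag set (head and label names are illustrative):

    model.add_tagging_head(
        "ner",
        num_labels=3,
        id2label={0: "O", 1: "B-ENT", 2: "I-ENT"},
    )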

get_labels(head_name=None)

Returns the labels that the given head is assigning/predicting.

Parameters
  • head_name (str, optional) – The name of the head whose labels should be returned. Defaults to None. If the name is None, the labels of the active head are returned.

Returns: labels

get_labels_dict(head_name=None)

Returns the id2label dict for the given head.

Parameters
  • head_name (str, optional) – The name of the head whose id2label mapping should be returned. Defaults to None. If the name is None, the mapping of the active head is returned.

Returns: id2label
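
Continuing the sketch above, the labels of a named head can be queried like this:

    labels = model.get_labels("sst-2")         # e.g. ["negative", "positive"]
    id2label = model.get_labels_dict("sst-2")  # e.g. {0: "negative", 1: "positive"}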

set_active_adapters(adapter_names: list)

Sets the adapter modules to be used by default in every forward pass. This setting can be overridden by passing the adapter_names parameter to the forward() call. If no adapter with the given name is found, no module of the respective type will be activated. If the calling model class supports named prediction heads, this method will attempt to activate a prediction head with the name of the last adapter in the list of passed adapter names.

Parameters

adapter_names (list) – The list of adapters to be activated by default. Can be a fusion or stacking configuration.
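
For example, to activate the "sst-2" adapter (and, if present, the prediction head of the same name) by default:

    model.set_active_adapters(["sst-2"])
    # Every subsequent forward pass now runs through the "sst-2" adapter,
    # unless adapter_names is passed to forward() explicitly.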

class transformers.adapter_bert.BertOutputAdaptersMixin

Adds adapters to the BertOutput module.

adapter_fusion(hidden_states, adapter_stack, residual, query)

If more than one adapter name is set for a stack layer, the adapters are fused: every adapter is passed through, and an attention-like weighting of each adapter is learned, so that the information stored in the individual adapters is fused with respect to the current example.

Parameters
  • hidden_states – output of the previous transformer layer or adapter

  • adapter_stack – names of the adapters in the current stack; if len(adapter_stack) == 1, a single adapter is passed through, and if len(adapter_stack) > 1, the adapters are fused

  • residual – residual of the previous layer

  • query – query by which we attend over the adapters

Returns: hidden_states
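
The attention-like weighting can be pictured with the following self-contained PyTorch sketch; it illustrates the fusion idea only and omits the learned projections used by the library's fusion module:

    import torch

    def fuse_adapter_outputs(query, adapter_outputs):
        # query:           (batch, seq_len, hidden) tensor to attend with
        # adapter_outputs: list of (batch, seq_len, hidden) tensors, one per adapter
        stacked = torch.stack(adapter_outputs, dim=2)  # (batch, seq, n_adapters, hidden)
        # Score every adapter output against the query and normalize per token:
        scores = torch.einsum("bsh,bsnh->bsn", query, stacked)
        weights = torch.softmax(scores, dim=-1)
        # Example-specific weighted sum over the adapter outputs:
        return torch.einsum("bsn,bsnh->bsh", weights, stacked)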

adapter_stack_layer(hidden_states, input_tensor, adapter_stack)

One layer of stacked adapters. This either passes through a single adapter and prepares the data to be passed into a subsequent adapter or the next transformer layer, or, if more than one adapter name is set for the stack layer, assumes that fusion is activated and fuses the adapters.

Parameters
  • hidden_states – output of the previous transformer layer or adapter

  • input_tensor – residual connection of the transformer

  • adapter_stack – names of the adapters in the current stack; if len(adapter_stack) == 1, a single adapter is passed through, and if len(adapter_stack) > 1, the adapters are fused

Returns: hidden_states

add_fusion_layer(adapter_names)

See BertModel.add_fusion_layer

get_adapter_layer(adapter_name)

Retrieves the correct layer depending on the adapter type. If no adapter with the given name was set at this layer, None is returned.

Parameters
  • adapter_name – string name of the adapter

Returns: layer | None

get_adapter_preparams(adapter_config, hidden_states, input_tensor)

Retrieves the hidden_states, query (for fusion), and residual connection according to the set configuration.

Parameters
  • adapter_config – the adapter configuration according to which the values are retrieved

  • hidden_states – output of the previous layer

  • input_tensor – residual connection before the FFN

Returns: hidden_states, query, residual

class transformers.adapter_bert.BertSelfOutputAdaptersMixin

Adds adapters to the BertSelfOutput module.

adapter_fusion(hidden_states, adapter_stack, residual, query)

If more than one adapter name is set for a stack layer, the adapters are fused: every adapter is passed through, and an attention-like weighting of each adapter is learned, so that the information stored in the individual adapters is fused with respect to the current example.

Parameters
  • hidden_states – output of the previous transformer layer or adapter

  • adapter_stack – names of the adapters in the current stack; if len(adapter_stack) == 1, a single adapter is passed through, and if len(adapter_stack) > 1, the adapters are fused

  • residual – residual of the previous layer

  • query – query by which we attend over the adapters

Returns: hidden_states

adapter_stack_layer(hidden_states, input_tensor, adapter_stack)

One layer of stacked adapters. This either passes through a single adapter and prepares the data to be passed into a subsequent adapter or the next transformer layer, or, if more than one adapter name is set for the stack layer, assumes that fusion is activated and fuses the adapters.

Parameters
  • hidden_states – output of the previous transformer layer or adapter

  • input_tensor – residual connection of the transformer

  • adapter_stack – names of the adapters in the current stack; if len(adapter_stack) == 1, a single adapter is passed through, and if len(adapter_stack) > 1, the adapters are fused

Returns: hidden_states

add_fusion_layer(adapter_names)

See BertModel.add_fusion_layer

enable_adapters(adapter_names: list, unfreeze_adapters: bool, unfreeze_fusion: bool)

Unfreezes a given list of adapters, the adapter fusion layer, or both.

Parameters
  • adapter_names – names of the adapters to unfreeze (or names of the adapters that are part of the fusion layer to unfreeze)

  • unfreeze_adapters – whether the adapters themselves should be unfrozen

  • unfreeze_fusion – whether the adapter attention (fusion) layer for the given adapters should be unfrozen

get_adapter_layer(adapter_name)

Retrieves the correct layer depending on the adapter type. If no adapter with the given name was set at this layer, None is returned.

Parameters
  • adapter_name – string name of the adapter

Returns: layer | None

get_adapter_preparams(adapter_config, hidden_states, input_tensor)

Retrieves the hidden_states, query (for fusion), and residual connection according to the set configuration.

Parameters
  • adapter_config – the adapter configuration according to which the values are retrieved

  • hidden_states – output of the previous layer

  • input_tensor – residual connection before the FFN

Returns: hidden_states, query, residual