Embeddings¶
With adapters, we support dynamically adding, loading, and deleting embeddings. This section gives you an overview of these features. A toy example is illustrated in this notebook.
Adding and Deleting Embeddings¶
The methods for handling embeddings are similar to the ones for handling adapters. To add new embeddings, we call add_embeddings. This adds new embeddings for the vocabulary of the tokenizer.
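For example, assuming tokenizer is the tokenizer whose vocabulary the new embedding should cover, a basic call might look like this ('name' is a placeholder for an embedding name of your choice):
model.add_embeddings('name', tokenizer)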
In some cases, it might be useful to initialize the embeddings of tokens to the ones of another embedding module. If a reference_embedding and a reference_tokenizer are provided, all embeddings for tokens that are present in both tokenizers are initialized to the embedding provided by the reference_embedding. The new embedding is created and set as the active embedding. If you are unsure which embedding is currently active, the active_embeddings property contains the currently active embedding.
model.add_embeddings('name', tokenizer, reference_embedding='default', reference_tokenizer=reference_tokenizer)
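After adding an embedding, you can, for example, verify that it is now the active one:
print(model.active_embeddings)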
The original embedding of the transformers model is always available under the name "default". To set it as the active embedding, simply call the set_active_embeddings('name') method.
model.set_active_embeddings('name')
Similarly, all other embeddings can be set as active by passing their name to the set_active_embeddings method.
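For instance, to switch back to the original embedding of the transformers model:
model.set_active_embeddings('default')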
To delete an embedding that is no longer needed, we can call the delete_embeddings method with the name of the embedding we want to delete. However, you cannot delete the default embedding.
model.delete_embeddings('name')
Please note that if the active embedding is deleted, the default embedding is set as the active embedding.
Training Embeddings¶
Embeddings can only be trained with an adapter. To freeze all weights except for the embedding and the adapter:
model.train_adapter('adapter_name', train_embeddings=True)
Apart from the train_embeddings flag, the training is the same as for training only an adapter (see Adapter Training).
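As a minimal sketch, a full training setup might look like the following. This assumes a tokenized train_dataset and the hypothetical names 'adapter_name' and 'embedding_name' for an adapter and an embedding that have already been added, and uses the AdapterTrainer setup described in Adapter Training:
from adapters import AdapterTrainer
from transformers import TrainingArguments

# Activate the embedding to be trained (hypothetical name).
model.set_active_embeddings('embedding_name')
# Freeze all weights except for the adapter and the embedding.
model.train_adapter('adapter_name', train_embeddings=True)

training_args = TrainingArguments(output_dir='./output', num_train_epochs=3)
trainer = AdapterTrainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()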
Saving and Loading Embeddings¶
You can save the embeddings by calling save_embeddings('path/to/dir', 'name') and load them with load_embeddings('path/to/dir', 'name').
model.save_embeddings(path, 'name')
model.load_embeddings(path, 'reloaded_name')
The path needs to point to a directory in which the weights of the embedding will be saved.
You can also save and load the tokenizer with the embedding by passing the tokenizer to save_embeddings.
model.save_embeddings(path, 'name', tokenizer)
loaded_tokenizer = model.load_embeddings(path, 'name')
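As the last line suggests, when a tokenizer was stored together with the embedding, load_embeddings returns it so that it can be reused alongside the reloaded embedding.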