Embeddings

With adapters, we support dynamically adding, loading, and deleting embeddings. This section gives an overview of these features. A toy example is illustrated in this notebook.

Adding and Deleting Embeddings

The methods for handling embeddings are similar to the ones for handling adapters. To add new embeddings, we call add_embeddings. This adds new embeddings for the vocabulary of the given tokenizer. In some cases, it might be useful to initialize the embeddings of tokens to those of another embedding module: if a reference_embedding and a reference_tokenizer are provided, all embeddings of tokens present in both tokenizers are initialized to the corresponding embeddings provided by the reference_embedding. The new embedding is created and set as the active embedding. If you are unsure which embedding is currently active, the active_embeddings property contains the name of the currently active embedding.

model.add_embeddings('name', tokenizer, reference_embedding='default', reference_tokenizer=reference_tokenizer)
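To check which embedding is currently active, for instance right after adding one, you can read the active_embeddings property mentioned above (a minimal sketch reusing the model from the previous call):

print(model.active_embeddings)  # e.g. 'name'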

The original embedding of the transformers model is always available under the name "default". To set it as the active embedding, simply call the set_active_embeddings method with that name.

model.set_active_embeddings('default')

Similarly, any other embedding can be set as active by passing its name to the set_active_embeddings method.
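For example, to activate the embedding added above (a minimal sketch, assuming it was added under the name 'name' as in the earlier call):

model.set_active_embeddings('name')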

To delete an embedding that is no longer needed, we can call the delete_embeddings method with the name of the embedding we want to delete. However, you cannot delete the default embedding.

model.delete_embeddings('name')

Please note that if the active embedding is deleted, the default embedding is set as the active embedding.
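You can observe this fallback by checking the active_embeddings property after the deletion (a minimal sketch, assuming 'name' was the active embedding):

model.delete_embeddings('name')
print(model.active_embeddings)  # falls back to 'default'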

Training Embeddings

Embeddings can only be trained together with an adapter. To freeze all weights except for the embedding and the adapter, call:

model.train_adapter('adapter_name', train_embeddings=True)

Apart from the train_embeddings flag, training works the same as when training only an adapter (see Adapter Training).
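As a rough sketch of such a training run, assuming the AdapterTrainer class from the adapters library together with a preprocessed train_dataset and hypothetical training arguments (exact import paths depend on the installed library version; see Adapter Training for the full workflow):

from transformers import TrainingArguments
from adapters import AdapterTrainer

# add an adapter and a new embedding, then unfreeze both for training
model.add_adapter('adapter_name')
model.add_embeddings('name', tokenizer)
model.train_adapter('adapter_name', train_embeddings=True)

# hypothetical training setup; train_dataset is assumed to be prepared beforehand
training_args = TrainingArguments(output_dir='./embedding_training', learning_rate=1e-4, num_train_epochs=3)
trainer = AdapterTrainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()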

Saving and Loading Embeddings

You can save the embeddings by calling save_embeddings('path/to/dir', 'name') and load them with load_embeddings('path/to/dir', 'name').

model.save_embeddings(path, 'name')
model.load_embeddings(path, 'reloaded_name')

The path must point to a directory in which the weights of the embedding will be saved.

You can also save and load the tokenizer together with the embedding by passing the tokenizer to save_embeddings. In this case, load_embeddings additionally returns the saved tokenizer.

model.save_embeddings(path, 'name', tokenizer)
loaded_tokenizer = model.load_embeddings(path, 'name')
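Once reloaded, the embedding is available under the name passed to load_embeddings and can be selected like any other embedding, for example the one loaded earlier as 'reloaded_name':

model.set_active_embeddings('reloaded_name')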