BERT
The BERT model was proposed in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee and Kristina Toutanova. It is a bidirectional transformer pre-trained using a combination of masked language modeling and next sentence prediction objectives.
Note
This class is nearly identical to the PyTorch implementation of BERT in Hugging Face Transformers. For more information, see the corresponding section in their documentation.
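The masked language modeling objective mentioned above can be exercised directly with a pretrained checkpoint. The sketch below is illustrative only and assumes the Hugging Face transformers library is installed and the bert-base-uncased checkpoint is available; it uses the Transformers API rather than this class.

import torch
from transformers import BertTokenizer, BertForMaskedLM

# Example checkpoint; any pretrained BERT checkpoint would work here.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_index = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))  # e.g. "paris"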