BERT

The BERT model was proposed in BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding by Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. It is a bidirectional transformer pretrained using a combination of the masked language modeling (MLM) and next sentence prediction (NSP) objectives.
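
To get a quick feel for the masked language modeling objective, the sketch below runs a pretrained BERT checkpoint through the fill-mask pipeline. The bert-base-uncased checkpoint and the example sentence are illustrative choices, not something prescribed by this page.

```python
from transformers import pipeline

# Masked language modeling: the model predicts the token hidden behind [MASK].
unmasker = pipeline("fill-mask", model="bert-base-uncased")
print(unmasker("Paris is the [MASK] of France."))
```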

Note

The classes on this page are nearly identical to the PyTorch implementation of BERT in Hugging Face Transformers. For more information, visit the corresponding section in their documentation.

BertConfig
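
A minimal sketch of building a configuration and instantiating a randomly initialized model from it; the parameter values shown here are simply the bert-base defaults, used for illustration.

```python
from transformers import BertConfig, BertModel

# Configuration holding the model hyperparameters (these values match bert-base).
config = BertConfig(
    hidden_size=768,
    num_hidden_layers=12,
    num_attention_heads=12,
    intermediate_size=3072,
)

# Initializing a model from a config gives random (untrained) weights.
model = BertModel(config)
print(model.config.hidden_size)  # 768
```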

BertTokenizer
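
A short usage sketch, assuming the bert-base-uncased checkpoint as an example; the tokenizer applies WordPiece tokenization and adds the special tokens BERT expects.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# WordPiece tokenization plus the [CLS]/[SEP] special tokens.
encoding = tokenizer("Hello, how are you?")
print(encoding["input_ids"])
print(tokenizer.convert_ids_to_tokens(encoding["input_ids"]))
```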

BertModel
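
A minimal forward pass through the bare encoder, assuming a recent Transformers version where model outputs expose named attributes such as last_hidden_state; the checkpoint and input sentence are examples.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One contextual hidden state per input token.
print(outputs.last_hidden_state.shape)  # (batch_size, sequence_length, hidden_size)
```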

BertModelWithHeads
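
A sketch assuming the adapter-transformers flexible-heads API, where prediction heads are attached to the base model by name rather than being fixed by the class; the head name "sentiment" and num_labels value are arbitrary examples.

```python
from transformers import BertModelWithHeads

# Assumption: BertModelWithHeads allows heads to be added and swapped dynamically.
model = BertModelWithHeads.from_pretrained("bert-base-uncased")
model.add_classification_head("sentiment", num_labels=2)  # hypothetical head name
```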

BertForPreTraining
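
A sketch of the model with both pretraining heads, assuming output attributes named prediction_logits (MLM) and seq_relationship_logits (NSP) as in recent Transformers versions; checkpoint and input are examples.

```python
import torch
from transformers import BertForPreTraining, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One set of logits per pretraining objective.
print(outputs.prediction_logits.shape)        # MLM: (batch, seq_len, vocab_size)
print(outputs.seq_relationship_logits.shape)  # NSP: (batch, 2)
```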

BertForMaskedLM
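
A minimal fill-in-the-blank sketch: the highest-scoring token at the [MASK] position is decoded. The checkpoint and sentence are illustrative.

```python
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the most likely token there.
mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.decode(predicted_id))
```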

BertForNextSentencePrediction
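
A sketch of scoring whether one sentence plausibly follows another; the two example sentences are made up, and index 0 of the logits corresponds to "sentence B follows sentence A".

```python
import torch
from transformers import BertForNextSentencePrediction, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

prompt = "The weather was terrible today."
next_sentence = "It rained for hours without stopping."
inputs = tokenizer(prompt, next_sentence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Index 0: "is the next sentence"; index 1: "is a random sentence".
print(logits.softmax(dim=-1))
```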

BertForSequenceClassification
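
A sketch of sequence classification with a freshly initialized head; num_labels, the example sentence, and the label value are hypothetical and would come from your task.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels sets the size of the (randomly initialized) classification head.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("This movie was great!", return_tensors="pt")
labels = torch.tensor([1])  # hypothetical label: 1 = positive

# Passing labels also returns the classification loss alongside the logits.
outputs = model(**inputs, labels=labels)
print(outputs.loss, outputs.logits)
```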

BertForMultipleChoice
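
A sketch of the multiple-choice head, where each candidate is paired with the prompt and inputs are reshaped to (batch, num_choices, seq_len); the prompt and choices are made-up examples.

```python
import torch
from transformers import BertForMultipleChoice, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMultipleChoice.from_pretrained("bert-base-uncased")

prompt = "France is famous for its"
choices = ["cheese and wine.", "deserts and camels."]

# Encode the prompt once per choice, then add a batch dimension.
encoding = tokenizer([prompt] * len(choices), choices, return_tensors="pt", padding=True)
inputs = {k: v.unsqueeze(0) for k, v in encoding.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # one score per choice
print(logits.argmax(dim=-1))
```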

BertForTokenClassification
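
A sketch of per-token classification (e.g. NER); the head is randomly initialized here, and num_labels and the input sentence are illustrative.

```python
import torch
from transformers import BertForTokenClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# Hypothetical label count; a real value depends on the tagging scheme.
model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=9)

inputs = tokenizer("Alice lives in Paris.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # (batch, seq_len, num_labels)

print(logits.argmax(dim=-1))  # one predicted label id per token
```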

BertForQuestionAnswering
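
A sketch of extractive question answering: the model predicts start and end positions of the answer span in the context. A BERT checkpoint already fine-tuned on SQuAD is used here purely as an example.

```python
import torch
from transformers import BertForQuestionAnswering, BertTokenizer

checkpoint = "bert-large-uncased-whole-word-masking-finetuned-squad"
tokenizer = BertTokenizer.from_pretrained(checkpoint)
model = BertForQuestionAnswering.from_pretrained(checkpoint)

question = "Who wrote the BERT paper?"
context = "The BERT paper was written by Jacob Devlin and colleagues at Google."
inputs = tokenizer(question, context, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# The answer span runs from the most likely start token to the most likely end token.
start = int(outputs.start_logits.argmax())
end = int(outputs.end_logits.argmax())
print(tokenizer.decode(inputs["input_ids"][0][start : end + 1]))
```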