The original DistilBERT model. Its checkpoints were obtained by teacher-student learning (knowledge distillation) from the original BERT checkpoints.
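
To make "teacher-student learning" concrete, here is a minimal, generic sketch of a soft-target distillation loss in plain Python. It is an illustration only, not the actual DistilBERT training code: the function names `log_softmax` and `distillation_loss`, the default temperature of 2.0, and the example logits are all assumptions chosen for this sketch. The idea is that the student is trained to match the teacher's temperature-softened output distribution.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over temperature-softened distributions.

    The T^2 factor is a common convention that keeps gradient magnitudes
    comparable across temperatures; it is an assumption of this sketch.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


# Hypothetical logits for a 3-class output (illustrative values only).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(student, teacher)
```

The loss is zero when the student reproduces the teacher's logits exactly and grows as the two distributions diverge. In practice a term like this is usually combined with other training objectives.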