Sunday, 23 November 2025

Mastering BERT and DistilBERT

This is BERT.  

Introduced in 2018, the name stands for "Bidirectional Encoder Representations from Transformers".

The "bidirectional" component implies it use context to the left and right of critical words.

Its score on the GLUE benchmark is roughly 80%.

This is DistilBERT, a smaller, faster version of BERT introduced in 2019 with edge computing in mind: the paper reports it is about 40% smaller and 60% faster while retaining roughly 97% of BERT's language-understanding performance.

In the paper, the authors also note: "We have made the trained weights available along with the training code in the Transformers library from HuggingFace".
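Loading those published weights takes only a few lines. A minimal sketch (again assuming transformers and torch are installed; distilbert-base-uncased is the standard public checkpoint):

from transformers import DistilBertModel, DistilBertTokenizer

tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
model = DistilBertModel.from_pretrained("distilbert-base-uncased")

# Encode a sentence and run it through the model.
inputs = tokenizer("DistilBERT runs well on edge devices.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, 768)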
 
It is worthwhile to study BERT first, since DistilBERT shares the same general architecture; per the paper, the token-type embeddings and the pooler are removed and the number of layers is halved, as the comparison below shows.
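One way to see the family resemblance is to compare the two model configurations side by side. A short sketch (the attribute names follow the Transformers config classes: BertConfig uses num_hidden_layers and hidden_size, while DistilBertConfig uses n_layers and dim):

from transformers import AutoConfig

bert_config = AutoConfig.from_pretrained("bert-base-uncased")
distil_config = AutoConfig.from_pretrained("distilbert-base-uncased")

# Same encoder design and hidden size; DistilBERT halves the layer count.
print("BERT layers:           ", bert_config.num_hidden_layers)  # 12
print("DistilBERT layers:     ", distil_config.n_layers)         # 6
print("BERT hidden size:      ", bert_config.hidden_size)        # 768
print("DistilBERT hidden size:", distil_config.dim)              # 768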
