A nice, fairly compact, somewhat technical overview of how to use BERT for unsupervised problems, along with some unexpected uses. Every practitioner should know how to use this method. Click through for the full details.
For unsupervised task solving ...
BERT is a prize addition to the practitioner’s toolbox
By Ajit Rajasekharan in TowardsDataScience
Figure 1. A few reasons why BERT is a valuable addition to a practitioner’s toolbox, apart from its well-known use of fine-tuning for downstream tasks. (1) BERT’s learned vocabulary of vectors (in, say, 768-dimensional space) serves as the set of targets that the masked output vectors predict, and learn from prediction errors against, during training. After training, these moving targets settle into landmarks that can be clustered and annotated (a one-time step) and then used for classifying model output vectors in a variety of tasks, such as NER and relation extraction. (2) A model pre-trained enough to achieve a low next-sentence-prediction loss (in addition to the masked-word-prediction loss) yields quality CLS vectors representing any input term, phrase, or sentence.
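To make the caption concrete, here is a minimal sketch (not the author's code) of both uses, assuming the Hugging Face transformers library and the bert-base-cased checkpoint; the one-time cluster-and-annotate step from the article is stood in for by a simple nearest-neighbour lookup against the vocabulary vectors.

```python
import torch
from transformers import BertTokenizerFast, BertModel, BertForMaskedLM

tokenizer = BertTokenizerFast.from_pretrained("bert-base-cased")
mlm_model = BertForMaskedLM.from_pretrained("bert-base-cased").eval()

# (1) The learned vocabulary vectors (roughly 30k x 768) are the "landmarks"
# the caption refers to; in the article they are clustered and annotated once,
# then used to label output vectors at masked positions (e.g. for NER).
vocab_vectors = mlm_model.get_input_embeddings().weight  # [vocab_size, 768]

# Hypothetical example sentence with a masked entity position.
text = "He was treated for [MASK] at the clinic."
inputs = tokenizer(text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    # Output vector at the masked position from the pre-trained encoder.
    hidden = mlm_model.bert(**inputs).last_hidden_state[0, mask_pos]  # [768]
    # Nearest vocabulary vectors to that output vector; the annotated cluster
    # these neighbours belong to would supply the label in the article's setup.
    sims = torch.nn.functional.cosine_similarity(hidden.unsqueeze(0), vocab_vectors)
    neighbours = tokenizer.convert_ids_to_tokens(sims.topk(10).indices.tolist())
print(neighbours)

# (2) CLS vector of an arbitrary term/phrase/sentence, usable as a
# representation once the model also has a low next-sentence-prediction loss.
base_model = BertModel.from_pretrained("bert-base-cased").eval()
with torch.no_grad():
    out = base_model(**tokenizer("acute myeloid leukemia", return_tensors="pt"))
cls_vector = out.last_hidden_state[0, 0]  # [768]
print(cls_vector.shape)
```

For a masked slot like the one above, the nearest vocabulary vectors (or, in the article's setup, the annotated cluster they fall into) suggest the entity type without any labelled data.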