
BERT Rediscovers the Classical NLP Pipeline

Nikhil Verma
3 min read · Jan 8, 2022

Interpretability of complex machine learning models helps build human trust in their outcomes, instead of treating these models as black boxes. In contrast, simple machine learning models trained on task-motivated features are easy to interpret. Syntax trees, semantic labels, and word classes are some of the linguistically motivated features that were used earlier to understand aspects of natural language.

Pre-trained models such as BERT have become very popular for language modelling in recent times, but they have also started a discussion about what these transduction architectures have learned and what remains to be learned. To understand what the contextual embeddings produced by pre-trained models carry, the authors trained classifiers, known as linguistic probes, to predict various linguistic properties of the input from the model's internal representations. The suite of edge probing tasks covered a broad range of syntactic, semantic, local, and long-range phenomena. This probing asks what information is encoded at each position, and how well the model encodes structural information about a word's role in the sentence [1].
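As a rough sketch of the edge-probing idea, a probe does not see the whole sentence directly; it sees pooled representations of labelled spans. The snippet below is illustrative only (random vectors stand in for real BERT outputs, and the function name is hypothetical), assuming simple mean pooling over a token span:

```python
import numpy as np

def span_representation(token_embeddings, start, end):
    """Mean-pool the contextual vectors for tokens in [start, end)."""
    return token_embeddings[start:end].mean(axis=0)

# Simulated contextual embeddings for a 6-token sentence (hidden size 8);
# in the real setup these would come from a frozen pre-trained encoder.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 8))

# Representation of the span covering tokens 2..3, which an edge-probing
# classifier would then map to a label (e.g. an entity type or relation).
span_vec = span_representation(embeddings, 2, 4)
print(span_vec.shape)  # (8,)
```

The pooled vector has the encoder's hidden size regardless of span length, so one probe can handle spans of any width.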

The probing model, trained on a dataset for classifying instances, has weights that were initialised randomly and then trained using gradient descent. During this training the BERT weights were kept constant so…
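A minimal sketch of this frozen-encoder setup, with random vectors standing in for the fixed BERT embeddings and a logistic-regression probe trained by gradient descent (all data and dimensions here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulated "frozen" embeddings: 200 examples of hidden size 16, with labels
# that are linearly recoverable. In the actual setup these vectors would be
# produced by BERT, whose weights receive no gradient updates.
n, d = 200, 16
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (X @ w_true > 0).astype(float)

# The probe's weights start random (here: zero) and are the ONLY parameters
# updated by gradient descent; the encoder producing X stays constant.
w = np.zeros(d)
b = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid predictions
    grad_w = X.T @ (p - y) / n              # gradient of logistic loss
    grad_b = (p - y).mean()
    w -= lr * grad_w
    b -= lr * grad_b

acc = (((X @ w + b) > 0).astype(float) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

If the probe reaches high accuracy, the property must already be linearly decodable from the frozen representations; the probe's capacity is deliberately kept small so it cannot learn the task on its own.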

Written by Nikhil Verma

Knowledge shared is knowledge squared | My Portfolio https://lihkinverma.github.io/portfolio/ | My blogs are living documents, updated as I receive comments
