Components of Transformer Architecture

Nikhil Verma
2 min read · Nov 27, 2021

Sequence modelling is popularly done with Recurrent Neural Networks (RNNs) or their refinements, such as gated RNNs and Long Short-Term Memory (LSTM) networks. Processing tokens one step at a time hinders parallelisation, and when sequences grow long, these models can forget long-range dependencies in the input or confuse positional information.
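To make the sequential bottleneck concrete, here is a minimal, purely illustrative sketch (a toy cell, not a real learned RNN): each hidden state depends on the previous one, so the time steps cannot run in parallel.

```python
def rnn_step(h_prev, x_t):
    # Toy "cell": in a real RNN this would be a learned transformation
    # (e.g. a matrix multiply followed by a nonlinearity).
    return 0.5 * h_prev + x_t

def run_rnn(inputs, h0=0.0):
    h = h0
    states = []
    for x_t in inputs:  # strictly sequential: step t must wait for step t-1
        h = rnn_step(h, x_t)
        states.append(h)
    return states

# Each output carries (decaying) influence from all earlier inputs,
# which is why very old inputs can effectively be "forgotten".
print(run_rnn([1.0, 2.0, 3.0]))
```

The Transformer removes this chain of dependencies by replacing recurrence with attention, letting all positions be processed in parallel.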
