Long-Range Transformer with Unlimited Length Input

Nikhil Verma
3 min read · Aug 25, 2023
ChatGPT, unable to generate a 4,000-word essay due to its context window limitation

Pretrained transformers generally have a context window of 512 tokens (e.g. BERT, T5) or 1024 tokens (e.g. BART), which is sufficient for many current conditional generation datasets. But vanilla transformers cannot simply scale up, because the naïve self-attention operation has quadratic complexity in the input length. So tasks that involve long narratives, such as book summarization, which may contain inputs exceeding 500K…
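To make the quadratic cost concrete, here is a minimal sketch (plain PyTorch, single head, identity projections for brevity, all names hypothetical) showing that naïve self-attention materializes one score per pair of tokens, so memory and compute grow with the square of the sequence length:

```python
import torch

def naive_self_attention(x):
    # x: (n, d) — n tokens, each a d-dimensional embedding
    n, d = x.shape
    q, k, v = x, x, x                    # identity projections to keep the sketch short
    scores = q @ k.T / d ** 0.5          # (n, n) score matrix -> O(n^2) memory and time
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                   # (n, d) contextualized representations

x = torch.randn(512, 64)                 # 512 tokens is fine; 500K tokens would need
out = naive_self_attention(x)            # ~2.5e11 score entries for this one matrix
print(out.shape)                         # torch.Size([512, 64])
```

At 512 tokens the score matrix holds about 260K entries; at 500K tokens it would hold roughly 250 billion, which is why long-range transformer variants replace or sparsify this step.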

