Long-Range Transformer with Unlimited Length Input
Aug 25, 2023
Pretrained transformers generally have a context window of 512 tokens (e.g. BERT, T5) or 1024 tokens (e.g. BART), which is sufficient for many current conditional generation datasets. But vanilla transformers cannot simply scale up, because the naïve self-attention operation has quadratic time and memory complexity in the input length. So tasks that involve long narratives, such as book summarization, which may contain inputs exceeding 500K…
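To see where the quadratic cost comes from, here is a minimal single-head self-attention sketch in PyTorch; the learned query/key/value projections are omitted for brevity (so Q = K = V = x), and the function name is just for illustration:

```python
import torch

def naive_self_attention(x: torch.Tensor) -> torch.Tensor:
    """Single-head self-attention with Q = K = V = x (projections omitted)."""
    d = x.size(-1)
    # This (seq_len x seq_len) score matrix is the quadratic bottleneck:
    # doubling the input length quadruples its size.
    scores = (x @ x.transpose(-2, -1)) / d ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ x

x = torch.randn(512, 64)        # 512 tokens: a 512 x 512 score matrix, no problem
out = naive_self_attention(x)
# At 500K tokens the score matrix alone would hold 2.5e11 entries,
# roughly 1 TB in fp32, far beyond any single accelerator's memory.
```

That score matrix is exactly the cost that long-range attention variants (sparse, local, or chunked attention) are designed to avoid.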