Large Language Models are just another hype cycle

Nikhil Verma
5 min read · Jul 15, 2023


Since the latter part of last year, amid a flurry of remarkable technological advancements, ChatGPT has risen as a transformative force, captivating the global community with its intellectual prowess. Igniting a wave of progress and innovation, this prodigious AI has sparked the birth of numerous ventures and is reshaping industries across the globe.

Not only did it spur interest in knowledge distillation techniques, it also set off a smaller-but-similar race that gave us LLaMA, Alpaca, Vicuna and many others. However, as the dust settles (or keeps storming the landscape, for people like me trying to catch the low-hanging fruit), there is a discernible shift in the air and a growing acceptance of some facts, which I am here to share and discuss with you. If you disagree or strongly oppose any of these thoughts, please mention them in the comments with your viewpoint.

1. Capabilities of LLMs are increasing not because of technical breakthroughs, but largely due to huge investments: When examining the history of models like GPT, it becomes apparent that their progress and enhanced performance have been propelled by significant financial resources and dedicated research effort. These investments have allowed for the collection of vast amounts of high-quality data, the scaling of computational infrastructure, and the training of larger and more sophisticated models. Consequently, the remarkable advancements observed in LLMs are a testament to the combined power of investment, research, and infrastructure rather than isolated technical improvements alone. Let's compare the architectures of the GPT-1, 2, and 3 models themselves. Don't they look the same from a bird's-eye view?
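
For a concrete sense of how similar these models are architecturally, here is a rough comparison of the published configurations (approximate figures from the GPT-1, GPT-2 and GPT-3 papers; the snippet is my own summary, so treat the numbers as ballpark values):

```python
# Rough comparison of GPT-family configurations (approximate figures from the
# GPT-1, GPT-2 and GPT-3 papers). The building block -- a decoder-only
# Transformer -- stays the same; only the scale changes.
gpt_configs = {
    "GPT-1": {"layers": 12, "d_model": 768,   "heads": 12, "params": "117M"},
    "GPT-2": {"layers": 48, "d_model": 1600,  "heads": 25, "params": "1.5B"},
    "GPT-3": {"layers": 96, "d_model": 12288, "heads": 96, "params": "175B"},
}

for name, cfg in gpt_configs.items():
    print(f"{name}: {cfg['layers']} layers, d_model={cfg['d_model']}, "
          f"{cfg['heads']} heads, ~{cfg['params']} parameters")
```

Same recipe, vastly more data and compute each time.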

2. Behavioural improvements (understanding, generation, and performance) across various domains are a result of scaling LLMs: The task-level performance of Large Language Models (LLMs) is inherently hard to predict during training. Scaling the model alone does not guarantee improved performance across all tasks. Training focuses on reducing pre-training test loss, and it is impossible to determine beforehand which tasks the model will excel at. Interestingly, a model may show little gain on certain tasks even when doubled in size, yet improve remarkably with further scaling. Investing in LLMs therefore entails a certain level of uncertainty, akin to gambling on how the "mystery box" will perform on a specific task. The few-shot learning capabilities of the GPT-family models were discovered only after training, and further abilities such as chain-of-thought reasoning were discovered months later. Empathy, arithmetic reasoning, and success in exams have all improved as LLMs are scaled up.

[Figure: Effect of scaling up models on different tasks]
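
For intuition on why the pre-training loss is the predictable part while task behaviour is not, here is a tiny sketch of the parameter-count scaling law reported by Kaplan et al. (2020); the constants are the published fits and should be read as illustrative approximations, not exact values:

```python
# Approximate parameter-count scaling law from Kaplan et al. (2020):
# L(N) ≈ (N_c / N) ** alpha_N. The constants below are the published fits
# and are only illustrative.
ALPHA_N = 0.076   # fitted exponent
N_C = 8.8e13      # fitted constant (non-embedding parameters)

def predicted_loss(n_params: float) -> float:
    """Predicted pre-training test loss for a model with n_params parameters."""
    return (N_C / n_params) ** ALPHA_N

for n in (1.2e8, 1.5e9, 1.75e11):   # roughly GPT-1, GPT-2, GPT-3 scale
    print(f"{n:.1e} params -> predicted loss ≈ {predicted_loss(n):.2f}")
```

The loss curve is smooth, yet it says almost nothing about which downstream abilities will emerge at which scale, which is why few-shot learning and chain-of-thought felt like surprises.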

3. LLMs seem to build a "mental model" of the world and use it for reasoning: Beyond the raw text these models are fed, LLMs appear to learn a latent representation of the world, using which they can reason at an abstract level with surprising precision.

This becomes more and more evident as these models are scaled up rapidly. For example, a private version of GPT-4, trained without any access to visual information, produced quite impressive drawings when asked to incrementally write instructions in a graphics programming language to draw a unicorn.

4. Guiding an LLM towards a particular task is a tricky deed: There are no fully reliable techniques for steering the behaviour of LLMs. Techniques like prompting, supervised fine-tuning, and RLHF provide guidance to models that were pre-trained for autocompletion, but there is no guarantee that they will always produce the expected outputs for the task they are tuned for (a minimal prompting sketch follows the list below). Models can misinterpret prompts that seem unambiguous to human users, making the LLM appear to behave unexpectedly.

Also, such fine-tuning can surface other problems, like:

  • Sycophancy: where a model answers subjective questions in a way that flatters its user's stated beliefs
  • Sandbagging: where models are more likely to endorse common misconceptions when their user appears to be less educated
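
To make the steering point concrete, here is a minimal few-shot prompting sketch using the Hugging Face transformers pipeline. This is my own illustrative snippet, and gpt2 is only a small stand-in model so the example runs anywhere:

```python
# Minimal few-shot prompting sketch. A few in-context examples nudge the
# model toward the behaviour we want, but nothing guarantees the
# continuation will actually follow the pattern.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # small stand-in model

prompt = (
    "Translate English to French.\n"
    "sea otter -> loutre de mer\n"
    "cheese -> fromage\n"
    "peppermint ->"
)

output = generator(prompt, max_new_tokens=10, do_sample=False)
print(output[0]["generated_text"])
```

With a model this small, the continuation often ignores the pattern entirely, which is exactly the point: prompting nudges the model, it does not guarantee the outcome.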

5. Experts are not yet able to interpret the inner workings of LLMs: A neuroscientist cannot tell for certain whether you will crave vanilla ice cream after seeing its advertisement on a billboard, nor can a mother predict whether her child will ask for food at exactly 7 PM in the evening. Likewise, research scientists in the NLP space cannot predict what this generative technology will produce in its next go, because the interplay of numbers these models compute internally (across millions or billions of parameters) is not something a human can follow in their head.

6. Human performance on a task isn't an upper bound on LLM performance: As a human, you may excel at the specific tasks you were trained for, but Large Language Models (LLMs) are artificial creations trained to perform many tasks simultaneously. They are exposed to far more data than any individual could consume, while you have been putting off that next book lying on your shelf for a while now, haven't you 😉? By further training LLMs with reinforcement techniques, they can be pushed to an even higher level of proficiency, akin to creating a superhuman capable of extraordinary feats.

7. Short interactions with Large Language Models (LLMs) can be deceptive: Their outputs are stochastic and may differ with each interaction, leading to a wide range of responses. Drawing conclusions about the behaviour of an LLM from a single text exchange can be misleading and result in overgeneralization.
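
A quick way to convince yourself of this is to sample the same prompt several times. Again, a rough sketch of mine with a small stand-in model:

```python
# Sampling the same prompt repeatedly shows how misleading a single
# interaction can be: each run may wander off in a different direction.
from transformers import pipeline, set_seed

generator = pipeline("text-generation", model="gpt2")  # small stand-in model
prompt = "Large Language Models are"

for seed in range(3):
    set_seed(seed)  # a different seed gives a different random sample
    out = generator(prompt, max_new_tokens=20, do_sample=True, temperature=0.9)
    print(f"run {seed}: {out[0]['generated_text']!r}")
```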

Harsh but TRUE: LLM research is a powerful tool for driving investment, since it lets R&D teams propose model-training projects costing many millions of dollars with reasonable confidence that those projects will succeed. But LLMs are not the end of the world, and the research ground will be shaken by something else in a year, or maybe in months.

References:

Most of the points are taken from the attached report, with an amalgamation of my thoughts on top of them, since I more or less agree with the author and wish to spread his words 😀



Nikhil Verma

Knowledge shared is knowledge squared | My Portfolio https://lihkinverma.github.io/portfolio/ | My blogs are living documents, updated as I receive comments