❓ Bigger is better for LLMs, right?
❌ Not so. Recent AI research shows smaller models outperforming larger counterparts in many cases.
🌟 Meta recently released LIMA (Less Is More for Alignment), a 65B-parameter model (parameters are the model’s updateable weights) whose responses were judged equivalent to or better than GPT-4’s in 43% of cases, Bard’s in 58% of cases, and DaVinci003’s in 65% of cases.
🌟 DeepMind’s Chinchilla 70B outperformed Gopher 280B, GPT-3 175B, Jurassic-1 178B, and Megatron-Turing NLG 530B on the MMLU benchmark. The paper by Hoffmann et al. showed that these larger models were significantly undertrained: for a fixed compute budget, model size and training data should be scaled in roughly equal proportion, so compute spent on extra parameters is often better spent on more training tokens.
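To put “undertrained” in rough numbers, here is a minimal sketch of the commonly cited Chinchilla rule of thumb (about 20 training tokens per parameter, and roughly 6 × parameters × tokens of training FLOPs). The ratio is a simplification of the paper’s fitted scaling laws, and the figures it prints are illustrative, not taken from the paper’s tables:

```python
# Rough Chinchilla-style rule of thumb: compute-optimal training uses
# ~20 tokens per parameter, and training compute is roughly 6 * N * D FLOPs.
# Illustrative approximation only; the paper fits scaling laws, not a fixed ratio.

def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal number of training tokens for a model of n_params weights."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard 6 * N * D estimate of training FLOPs."""
    return 6 * n_params * n_tokens

for n in (70e9, 280e9):  # Chinchilla-sized vs Gopher-sized
    d = chinchilla_optimal_tokens(n)
    print(f"{n/1e9:.0f}B params -> ~{d/1e12:.1f}T tokens, ~{training_flops(n, d):.2e} FLOPs")
```

Under this rule of thumb a 70B model “wants” roughly 1.4T training tokens, while a 280B model would need several times more data and compute to be trained to the same standard.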
💡 Next time you’re designing a solution powered by LLMs, think carefully about the goal. Do you need the model to have knowledge of philosophy and nuclear physics, and be able to carry out many tasks? Or does the model only need to handle something specific like information retrieval or translation?
💡 Many organisations start with a do-it-all, off-the-shelf model like GPT-4. With some good prompt engineering, inference parameter tuning, and few-shot inference to give the model in-context examples, you may achieve comparable performance with a smaller model that you have more control over and insight into, for less money.
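As an example of what that can look like in practice, here is a minimal sketch of few-shot prompting a smaller open model with the Hugging Face transformers library. The model name, example reviews, and sampling parameters are placeholders chosen for illustration, not a recommendation:

```python
# Minimal sketch: few-shot sentiment classification with a smaller open model.
# Model choice, examples, and generation parameters are illustrative only.
from transformers import pipeline

generator = pipeline("text-generation", model="tiiuae/falcon-7b-instruct")

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "Stopped working after a week and support never replied."
Sentiment: Negative

Review: "Setup was painless and it just works."
Sentiment:"""

output = generator(
    few_shot_prompt,
    max_new_tokens=5,       # inference parameter tuning: keep the answer short
    temperature=0.1,        # low temperature for a near-deterministic classification
    do_sample=True,
    return_full_text=False, # return only the model's completion
)
print(output[0]["generated_text"].strip())
```

The in-context examples do the work that a much larger general-purpose model would otherwise do from scratch, and the generation parameters constrain the output to the narrow task at hand.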