❓ Working in a domain with specialized vocabulary, semantics, and technical terms? Is that one-size-fits-all LLM working well for you?

🌟 Researchers at Bloomberg trained a model from scratch to excel in the finance domain. Consider sentiment analysis: a headline such as “Company to cut 10,000 jobs” reads as negative in the general sense, but can be positive in financial sentiment, since cost-cutting often lifts the stock price and investor confidence.
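
To make the contrast concrete, here is a minimal sketch that scores the same headline with a general-purpose sentiment model and a finance-tuned one via the Hugging Face `pipeline` API. The checkpoint names are illustrative public models, not the Bloomberg model discussed here, and the exact outputs will depend on the headline and checkpoint.

```python
# Minimal sketch: the same headline scored by a general-purpose sentiment model
# and a finance-domain one. Model names are illustrative public checkpoints,
# not BloombergGPT.
from transformers import pipeline

headline = "Company to cut 10,000 jobs"

general = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",  # general-purpose
)
finance = pipeline(
    "sentiment-analysis",
    model="ProsusAI/finbert",  # finance-tuned sentiment model
)

print("general:", general(headline))   # typically NEGATIVE
print("finance:", finance(headline))   # finance-domain polarity can differ
```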

🌟 BloombergGPT followed the Chinchilla scaling laws, which show that for a given compute budget there is an optimal balance between model size and training data, and that simply adding parameters yields diminishing returns. They trained a 50B-parameter model (OpenAI’s GPT-3 is 175B) on a dataset of roughly 51% financial data and 49% general-purpose public data. This mixed training approach produced a model that substantially outperformed existing models on in-domain financial tasks while staying on par with, or better than, them on general NLP benchmarks. Here’s their paper.
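
As a rough point of reference, the Chinchilla paper’s approximate compute-optimal relations (these constants come from Hoffmann et al.’s work, not from the BloombergGPT paper) tie training compute C, parameters N, and training tokens D together:

```latex
C \approx 6\,N D, \qquad
N_{\mathrm{opt}} \propto C^{1/2}, \qquad
D_{\mathrm{opt}} \propto C^{1/2}, \qquad
D_{\mathrm{opt}} \approx 20\, N_{\mathrm{opt}}
```

In other words, a compute-optimal model grows its training tokens roughly in step with its parameter count rather than spending the whole budget on size.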

💡 For specific tasks, a smaller, sub-billion-parameter open-source model pre-trained on in-domain data that captures your jargon and language nuances could be a powerful and affordable alternative to closed-source general-purpose models.
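
If you want to try this, a common starting point is domain-adaptive (continued) pretraining of a small open-source checkpoint on your own corpus. The sketch below uses Hugging Face `transformers` and `datasets`; the model name and corpus file are illustrative assumptions, not a prescription.

```python
# Minimal sketch: continued (domain-adaptive) pretraining of a small open-source
# model on in-domain text. The checkpoint and the corpus path are placeholders.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "distilbert-base-uncased"  # small (~66M-parameter) general model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# "finance_corpus.txt" stands in for your own in-domain text corpus.
dataset = load_dataset("text", data_files={"train": "finance_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=256)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Masked-language-modeling objective for continued pretraining.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="domain-adapted-model",
    per_device_train_batch_size=16,
    num_train_epochs=1,
    learning_rate=5e-5,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized, data_collator=collator)
trainer.train()
trainer.save_model("domain-adapted-model")
```

The adapted checkpoint can then be fine-tuned for a downstream task such as the sentiment classification shown earlier.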