💥 Bam! You’d think the A.I. powering ChatGPT and other LLMs now advising us on travel and nutrition, acting as our personal assistant, generating poetry, and writing code emerged from thin air! Not so.
📜 Kicking off #100DaysOfLargeLanguageModels, I’m parking my coding itch and taking a moment to appreciate the rich history of LLMs. Here it is in brief.
🔍 It all started with rule-based systems like ELIZA in the 1960s - simple pattern matching and substitution, generating responses based on predefined templates.
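To make "pattern matching and substitution" concrete, here is a minimal sketch in that spirit. The rules, templates, and the `respond` name are made up for illustration; they are not taken from ELIZA's actual script.

```python
import re

# Hypothetical rules: (pattern, response template). Illustrative only,
# not ELIZA's real script.
RULES = [
    (re.compile(r"\bi am (.+)", re.IGNORECASE), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.IGNORECASE), "How long have you felt {0}?"),
]
FALLBACK = "Please tell me more."


def respond(text):
    """Return a templated reply by matching the input against fixed rules."""
    cleaned = text.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(cleaned)
        if match:
            # Substitute the captured fragment into the predefined template.
            return template.format(match.group(1))
    return FALLBACK


print(respond("I am tired of Mondays."))
print(respond("Hello"))
```

No learning happens here: every reply comes from a hand-written template, which is exactly the limitation the later statistical and neural approaches moved past.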
🧠 In 2003, Bengio et al. introduced the concept of a neural network-based language model. It learned from many examples to predict which words are most probable given the words that came before.
👑 Mikolov et al.’s Word2Vec (2013) represented each word as a vector (a list of numbers). By analyzing a lot of text, it became possible to see which words commonly hang out close together in a multi-dimensional vector space, like dog and pet, or king and queen.
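A rough sketch of what "close together" means in practice: cosine similarity between word vectors. The three-dimensional vectors below are invented for illustration; real Word2Vec embeddings are learned from text and typically have hundreds of dimensions.

```python
import math

# Made-up 3-dimensional vectors, for illustration only.
# Real embeddings are learned from large text corpora.
VECTORS = {
    "dog": [0.9, 0.8, 0.1],
    "pet": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


dog_pet = cosine_similarity(VECTORS["dog"], VECTORS["pet"])
dog_car = cosine_similarity(VECTORS["dog"], VECTORS["car"])
print(f"dog vs pet: {dog_pet:.2f}")
print(f"dog vs car: {dog_car:.2f}")
```

With these toy numbers, "dog" scores much closer to "pet" than to "car", which is the kind of neighbourhood structure learned embeddings tend to show.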
🤖 The 2017 introduction of the Transformer model marked a significant turning point. Transformers use attention mechanisms to capture long-range dependencies in text. In other words, they give greater weight to the words in a sequence that matter most for establishing context and relevance. Take “I went to the park. It was sunny and full of people.” The attention mechanism can connect the word “park” in the first sentence with the word “full” in the second sentence. It recognizes that these two words are related, despite sitting in different sentences. The paper “Attention Is All You Need” by Vaswani et al. (June 2017) is a must-read!
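For the curious, the weighting idea can be sketched as scaled dot-product attention for a single query, the core operation described in the Vaswani et al. paper. Everything numeric here is an assumption for illustration: the 2-dimensional query, key, and value vectors are hand-picked, whereas a real Transformer learns them and runs many attention heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Scaled dot-product attention for one query vector."""
    d_k = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
        for key in keys
    ]
    weights = softmax(scores)
    # Output is the weighted average of the value vectors.
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return weights, output


# Hand-picked toy vectors (not learned) for three words in the example.
tokens = ["park", "sunny", "full"]
keys = [[2.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
query_for_full = [1.0, 0.0]

weights, output = attention(query_for_full, keys, values)
for token, weight in zip(tokens, weights):
    print(f"{token}: {weight:.2f}")
```

With these toy numbers, the query for “full” puts its largest weight on “park”, mirroring the connection described above.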
💬 Today’s conversational models like OpenAI’s ChatGPT and Google Bard are trained on vast amounts of text to capture the intricacies of human language, including humour (ask ChatGPT to rant as a fed-up, demeaned human-serving A.I. sometime; it’s hilarious and amazing). Work continues on performance, bias reduction, ethical use, fact-checking, and integrating an up-to-date view of the world.