🔥 Ignited my brain cells with 4 papers read today as my LLM journey continues.

1️⃣ “Attention Is All You Need” by Vaswani et al., Google, 2017
🦸‍♂️ Comes with a superpower called attention: a mechanism that works out which words in a sequence matter and how they relate to each other. Revisited this seminal paper, which introduced the transformer architecture, the basis of current state-of-the-art (SOTA) models like ChatGPT.
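For anyone curious what that superpower looks like in code, here's a minimal NumPy sketch of the paper's scaled dot-product attention (my own toy example; the variable names and shapes are illustrative, not from the paper):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # how strongly each query matches each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # blend the values by those weights

# Toy run: 3 tokens with 4-dimensional embeddings (random data, purely illustrative)
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))
print(scaled_dot_product_attention(x, x, x).shape)  # (3, 4)
```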

2️⃣ “GPT-3: Its Nature, Scope, Limits, and Consequences” by Floridi & Chiriatti, 2020
⚠️ Highlights the limitations of GPT-3. Includes a thought-provoking and insightful “Consequences” section exploring the impact on creative human writing, verifying the sources of text and the challenge of fake news, taking responsibility for generated output, and how we as humans need to sharpen our critical thinking.

3️⃣ “Training language models to follow instructions with human feedback” by Ouyang et al., OpenAI, 2022
💪 Human feedback for the win! Insight into the training of the models behind ChatGPT, with humans in the loop ranking prompt outputs to train reward models. With this RLHF (Reinforcement Learning from Human Feedback) approach, the InstructGPT models were more truthful, less toxic, and better at following user intent than much larger GPT-3 models, despite having far fewer parameters.
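To make the reward-model step concrete, here's a simplified sketch of the pairwise ranking loss used when humans rank one output over another (my own PyTorch toy, not OpenAI's code; the reward scores are made up):

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(reward_chosen, reward_rejected):
    """Pairwise loss for a reward model: push the human-preferred output's
    reward above the rejected output's reward, via -log(sigmoid(r_c - r_r))."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy comparison pairs: rewards assigned to the preferred vs. rejected outputs
chosen = torch.tensor([1.2, 0.3])
rejected = torch.tensor([0.4, 0.9])
print(reward_ranking_loss(chosen, rejected))  # smaller loss = reward model agrees with the human rankings
```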

4️⃣ “LaMDA: Language Models for Dialog Applications” by Thoppilan et al., Google, 2022
🔍 Fundamental to this family of transformer-based dialog models is their use of external knowledge sources to improve factual grounding, with dedicated metrics for measuring both the safety and the groundedness of the dialog output. The researchers found that “augmenting model outputs with the ability to use external tools, such as an information retrieval system, is a promising approach to achieve” factual grounding.
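Here's a tiny sketch of that tool-use idea (my own placeholder functions and a fake knowledge base, nothing like LaMDA's actual toolset): the reply gets grounded by first asking an external retrieval step for evidence.

```python
def retrieve(query: str) -> str:
    """Placeholder for an external information retrieval system."""
    knowledge_base = {
        "eiffel tower height": "The Eiffel Tower is about 330 metres tall.",
    }
    return knowledge_base.get(query.lower(), "No result found.")

def grounded_reply(user_question: str) -> str:
    """Sketch of factual grounding: fetch external evidence, then condition the reply on it.
    In LaMDA the model itself decides when to call a tool and how to fold the result
    back into the dialog; here that loop is collapsed into a single step."""
    evidence = retrieve(user_question)
    return f"According to an external source: {evidence}"

print(grounded_reply("Eiffel Tower height"))
```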