🛠️ “Building LLM applications for production” by Chip Huyen, author of “Designing Machine Learning Systems”
📖 Well that was a worthy lunchtime read! Some key points I took from Chip’s blog on the challenges & solutions of productionizing LLM apps:
💡 Apply as much engineering rigor as possible to mitigate the ambiguity of natural language & the non-deterministic nature of LLM output.
💡 Evaluate prompts with few-shot examples. Verify the model understands the examples.
💡 Unit-test all prompts using output evaluation to catch model degradation (test sketch below).
💡 Version every prompt change! Even the smallest changes can significantly impact results (versioning sketch below).
💡 I knew about classical ML model ensembles with majority voting but had never considered it for LLM output. Chip suggests generating many outputs for the same input & picking the final one via a consistency check. An awesome idea I can adopt today when using an LLM to build knowledgebase master data (voting sketch below)!
💡 As costs & latency change, regularly rerun a feasibility estimate comparing buy (paid APIs, e.g. Azure OpenAI) vs. build (open-source models). Cost sketch below.
💡 3 main factors when considering prompting vs. finetuning: data availability, performance, & cost. Prompting gets you started fast; finetuning generally gives better performance (the % improvement varies).
💡 Try distillation, whereby a smaller, cheaper, faster model (e.g., LLaMA 7B) is trained on examples output by a larger model (e.g., GPT-3.5-turbo). Data-generation sketch below.
💡 Can any of your LLM output be considered static text? If so, embed it in a vector store for a one-time cost; after that you only pay for retrieval (retrieval sketch below).
💡 If using tasks/agents as part of a chain flow, test them both individually & together (chain-test sketch below).
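
Below are a few minimal sketches of how some of these points could look in code. Everything imported from my_llm_client or my_pipeline is a hypothetical stand-in for your own client or chain, not code from the blog. First, the prompt unit tests: a pytest-style sketch that asserts on properties of the output rather than exact wording.

```python
from my_llm_client import call_llm  # hypothetical wrapper around your LLM client

SENTIMENT_PROMPT = "Classify the sentiment of this review as positive or negative: {review}"

def test_sentiment_prompt_positive():
    output = call_llm(SENTIMENT_PROMPT.format(review="Absolutely loved it!"))
    # Assert on a property of the output, not exact wording: LLM output is non-deterministic.
    assert "positive" in output.lower()

def test_sentiment_prompt_negative():
    output = call_llm(SENTIMENT_PROMPT.format(review="Total waste of money."))
    assert "negative" in output.lower()
```

Run these on a schedule, not only on prompt changes, so silent model updates that degrade output get caught too.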
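
For prompt versioning, a toy registry (names & versions are made up); in practice prompts can simply live in git, the point is that every call is traceable to an exact prompt version.

```python
# Illustrative prompt registry keyed by (name, version).
PROMPTS = {
    ("summarize", "v1"): "Summarize the following text:\n{text}",
    ("summarize", "v2"): "Summarize the following text in 3 bullet points:\n{text}",
}

def get_prompt(name: str, version: str) -> str:
    return PROMPTS[(name, version)]

# Log the (name, version) pair alongside every LLM call, so a regression
# can be traced back to the exact prompt change that caused it.
```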
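
For the self-consistency idea, this sketch samples several outputs at a non-zero temperature and majority-votes. Exact-match voting only makes sense for short, canonical answers (labels, IDs, master-data keys); freer text would need a semantic consistency check instead.

```python
from collections import Counter

from my_llm_client import call_llm  # hypothetical wrapper around your LLM client

def self_consistent_answer(prompt: str, n: int = 5) -> str:
    # Sample n outputs; temperature > 0 so the samples can differ.
    outputs = [call_llm(prompt, temperature=0.7).strip().lower() for _ in range(n)]
    # Majority vote: pick the most common normalized output.
    answer, _votes = Counter(outputs).most_common(1)[0]
    return answer
```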
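
For the buy-vs-build estimate, back-of-the-envelope arithmetic; every number below is a placeholder to be replaced with your current prices & traffic.

```python
requests_per_day = 100_000
tokens_per_request = 1_000                 # prompt + completion combined

api_price_per_1k_tokens = 0.002            # placeholder API price (USD)
buy_per_day = requests_per_day * tokens_per_request / 1_000 * api_price_per_1k_tokens

gpu_hours_per_day = 4 * 24                 # e.g., 4 self-hosted GPUs, always on
gpu_price_per_hour = 1.50                  # placeholder GPU rental price (USD)
build_per_day = gpu_hours_per_day * gpu_price_per_hour  # excludes engineering time!

print(f"buy: ${buy_per_day:,.0f}/day vs. build: ${build_per_day:,.0f}/day")
```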
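
For distillation, the first step is just data collection: a sketch of building a (prompt, completion) dataset from the larger teacher model, with the actual finetuning of the student left out.

```python
import json

from my_llm_client import call_teacher_model  # hypothetical wrapper for the larger model

prompts = ["Explain embeddings in one sentence.", "What is a vector store?"]

# Collect teacher outputs as finetuning examples for the smaller student model.
with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        pair = {"prompt": prompt, "completion": call_teacher_model(prompt)}
        f.write(json.dumps(pair) + "\n")
```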
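
For caching static output, a toy in-memory vector store using cosine similarity; embed() stands in for whatever embedding API you use.

```python
import numpy as np

from my_llm_client import embed  # hypothetical: text -> np.ndarray

store: list[tuple[np.ndarray, str]] = []

def add_static_answer(question: str, answer: str) -> None:
    store.append((embed(question), answer))  # one-time embedding cost

def retrieve(question: str) -> str:
    q = embed(question)                      # per-query cost: one embedding + a scan
    # Cosine similarity against every stored vector; return the closest answer.
    sims = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v, _ in store]
    return store[int(np.argmax(sims))][1]
```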
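
And for chains, test each step alone plus the whole flow end-to-end; extract_entities & draft_summary are hypothetical chain steps.

```python
from my_pipeline import extract_entities, draft_summary  # hypothetical chain steps

def test_extract_entities_alone():
    assert "Chip Huyen" in extract_entities("Chip Huyen wrote the blog.")

def test_draft_summary_alone():
    assert draft_summary(["Chip Huyen"], "Chip Huyen wrote the blog.")

def test_chain_end_to_end():
    # A failure here, but not in the step tests, points at the glue between steps.
    text = "Chip Huyen wrote the blog."
    assert "Chip Huyen" in draft_summary(extract_entities(text), text)
```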
❓ Any points resonate with you?
✨ There’s a ton more in this excellent blog, so go check the link. Thanks for sharing, Chip Huyen!