🛠️ “Building LLM applications for production” by Chip Huyen, author of “Designing Machine Learning Systems”
📖 Well that was a worthy lunchtime read! Some key points I took from Chip’s blog on the challenges & solutions of productionizing LLM apps:
💡 Apply as much engineering rigor as possible to mitigate the ambiguity of natural language & the non-deterministic nature of LLM output.
💡 Evaluate prompts with few-shot examples. Verify the model understands the examples.
💡 Unit-test all prompts using output evaluation to catch model degradation (test sketch below).
💡 Version every prompt change! Even the smallest changes can significantly impact results (versioning sketch below).
💡 I knew about classical ML model ensembles with majority voting but had never considered it for LLM output. Chip suggests generating many outputs for the same input & picking the final one via a consistency check. An awesome idea I can adopt today when using an LLM to build knowledgebase master data (voting sketch below)!
💡 As costs & latency change, regularly rerun a feasibility estimate comparing buy (paid APIs, e.g. Azure OpenAI) vs. build (open-source models). Cost sketch below.
💡 3 main factors when considering prompting vs. finetuning: data availability, performance, & cost. Prompting gets you started fast; finetuning generally gives better performance (the % improvement varies).
💡 Try distillation, whereby a smaller, cheaper, faster model (e.g., LLaMA 7B) is trained on examples output by a larger model (e.g., GPT-3.5-turbo). Data-generation sketch below.
💡 Can any of your LLM output be considered static text? If so, embed it in a vector store for a one-time cost; after that you only pay for retrieval (retrieval sketch below).
💡 If using tasks/agents as part of a chain flow, test them both individually & together (chain-test sketch below).
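
Below are a few minimal sketches of how some of these points could look in code. Everything imported from my_llm_client or my_pipeline is a hypothetical stand-in for your own client or chain, not code from the blog. First, the prompt unit tests: a pytest-style sketch that asserts on properties of the output rather than exact wording.

```python
from my_llm_client import call_llm  # hypothetical wrapper around your LLM client

SENTIMENT_PROMPT = "Classify the sentiment of this review as positive or negative: {review}"

def test_sentiment_prompt_positive():
    output = call_llm(SENTIMENT_PROMPT.format(review="Absolutely loved it!"))
    # Assert on a property of the output, not exact wording: LLM output is non-deterministic.
    assert "positive" in output.lower()

def test_sentiment_prompt_negative():
    output = call_llm(SENTIMENT_PROMPT.format(review="Total waste of money."))
    assert "negative" in output.lower()
```

Run these on a schedule, not only on prompt changes, so silent model updates that degrade output get caught too.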
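
For prompt versioning, a toy registry (names & versions are made up); in practice prompts can simply live in git, the point is that every call is traceable to an exact prompt version.

```python
# Illustrative prompt registry keyed by (name, version).
PROMPTS = {
    ("summarize", "v1"): "Summarize the following text:\n{text}",
    ("summarize", "v2"): "Summarize the following text in 3 bullet points:\n{text}",
}

def get_prompt(name: str, version: str) -> str:
    return PROMPTS[(name, version)]

# Log the (name, version) pair alongside every LLM call, so a regression
# can be traced back to the exact prompt change that caused it.
```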
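
For the self-consistency idea, this sketch samples several outputs at a non-zero temperature and majority-votes. Exact-match voting only makes sense for short, canonical answers (labels, IDs, master-data keys); freer text would need a semantic consistency check instead.

```python
from collections import Counter

from my_llm_client import call_llm  # hypothetical wrapper around your LLM client

def self_consistent_answer(prompt: str, n: int = 5) -> str:
    # Sample n outputs; temperature > 0 so the samples can differ.
    outputs = [call_llm(prompt, temperature=0.7).strip().lower() for _ in range(n)]
    # Majority vote: pick the most common normalized output.
    answer, _votes = Counter(outputs).most_common(1)[0]
    return answer
```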
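
For the buy-vs-build estimate, back-of-the-envelope arithmetic; every number below is a placeholder to be replaced with your current prices & traffic.

```python
requests_per_day = 100_000
tokens_per_request = 1_000                 # prompt + completion combined

api_price_per_1k_tokens = 0.002            # placeholder API price (USD)
buy_per_day = requests_per_day * tokens_per_request / 1_000 * api_price_per_1k_tokens

gpu_hours_per_day = 4 * 24                 # e.g., 4 self-hosted GPUs, always on
gpu_price_per_hour = 1.50                  # placeholder GPU rental price (USD)
build_per_day = gpu_hours_per_day * gpu_price_per_hour  # excludes engineering time!

print(f"buy: ${buy_per_day:,.0f}/day vs. build: ${build_per_day:,.0f}/day")
```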
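
For distillation, the first step is just data collection: a sketch of building a (prompt, completion) dataset from the larger teacher model, with the actual finetuning of the student left out.

```python
import json

from my_llm_client import call_teacher_model  # hypothetical wrapper for the larger model

prompts = ["Explain embeddings in one sentence.", "What is a vector store?"]

# Collect teacher outputs as finetuning examples for the smaller student model.
with open("distillation_data.jsonl", "w") as f:
    for prompt in prompts:
        pair = {"prompt": prompt, "completion": call_teacher_model(prompt)}
        f.write(json.dumps(pair) + "\n")
```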
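
For caching static output, a toy in-memory vector store using cosine similarity; embed() stands in for whatever embedding API you use.

```python
import numpy as np

from my_llm_client import embed  # hypothetical: text -> np.ndarray

store: list[tuple[np.ndarray, str]] = []

def add_static_answer(question: str, answer: str) -> None:
    store.append((embed(question), answer))  # one-time embedding cost

def retrieve(question: str) -> str:
    q = embed(question)                      # per-query cost: one embedding + a scan
    # Cosine similarity against every stored vector; return the closest answer.
    sims = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v, _ in store]
    return store[int(np.argmax(sims))][1]
```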
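
And for chains, test each step alone plus the whole flow end-to-end; extract_entities & draft_summary are hypothetical chain steps.

```python
from my_pipeline import extract_entities, draft_summary  # hypothetical chain steps

def test_extract_entities_alone():
    assert "Chip Huyen" in extract_entities("Chip Huyen wrote the blog.")

def test_draft_summary_alone():
    assert draft_summary(["Chip Huyen"], "Chip Huyen wrote the blog.")

def test_chain_end_to_end():
    # A failure here, but not in the step tests, points at the glue between steps.
    text = "Chip Huyen wrote the blog."
    assert "Chip Huyen" in draft_summary(extract_entities(text), text)
```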
❓ Any points resonate with you?
✨ There’s a ton more in this excellent blog, so go check the link. Thanks for sharing, Chip Huyen!