๐Ÿข LLMs can be slow. Working on a recommender solution, beautiful output but oh my, the latency of API batch!

🚀 Enter API response streaming to reduce perceived latency: the user sees the LLM's progress in near real time, just like the #chatgpt you know.

๐Ÿ› ๏ธ Knocked this streaming demo app up. A basic Streamlit front-end, with #FastAPI (backend), LangChain (LLM orchestration), and #Azure OpenAI (LLM model) all configured with callback handlers and streaming enabled to send partial message deltas back to the client via WebSocket.

🥇 Looking forward to baking this into my recommender for a better customer experience!

๐Ÿ“ Demo backend is a boiled-down version of main py in Langchainโ€™s chat-langchain repo.

📦 Demo app code here.