DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving
Published in Proceedings of the 41st International Conference on Machine Learning (ICML 2024), 2024
Recommended citation: Foteini Strati, Sara McAllister, Amar Phanishayee, Jakub Tarnawski, Ana Klimovic. "DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving." In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), 2024. /files/2024-dejavu.pdf