DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving

Published in Proceedings of the 41st International Conference on Machine Learning (ICML 2024), 2024

Recommended citation: Foteini Strati, Sara McAllister, Amar Phanishayee, Jakub Tarnawski, Ana Klimovic. "DéjàVu: KV-cache Streaming for Fast, Fault-tolerant Generative LLM Serving." In Proceedings of the 41st International Conference on Machine Learning (ICML 2024), 2024. /files/2024-dejavu.pdf