MiniCache: KV cache compression in depth dimension for large language models

Publication
Advances in Neural Information Processing Systems