Towards Economical Inference: Enabling DeepSeek’s Multi-Head Latent Attention in Any Transformer-based LLMs

Tao Ji | Bin Guo | Yuanbin Wu | Qipeng Guo | Shenlixing Shenlixing | Chenzhan Chenzhan | Xipeng Qiu | Qi Zhang | Tao Gui

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL