Squeezed Attention: Accelerating Long Context Length LLM Inference

Coleman Richard Charles Hooper | Sehoon Kim | Hiva Mohammadzadeh | Monishwaran Maheswaran | Sebastian Zhao | June Paik | Michael W. Mahoney | Kurt Keutzer | Amir Gholami

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL