APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs

Yuxiang Huang | Mingye Li | Xu Han | Chaojun Xiao | Weilin Zhao | Sun Ao | Hao Zhou | Jie Zhou | Zhiyuan Liu | Maosong Sun

Paper Details:

Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL