APB: Accelerating Distributed Long-Context Inference by Passing Compressed Context Blocks across GPUs
Yuxiang Huang | Mingye Li | Xu Han | Chaojun Xiao | Weilin Zhao | Sun Ao | Hao Zhou | Jie Zhou | Zhiyuan Liu | Maosong Sun
Paper Details:
Month: July
Year: 2025
Location: Vienna, Austria
Venue: ACL
Citations: No Citations Yet
URLs:
https://github.com/thunlp/APB
https://huggingface.co/gradientai/Llama-3-8B-Instruct-
https://github.com/huggingface/transformers
https://github.com/mobiusml/hqq
https://huggingface.co/datasets/wenbopan/anti-haystack