Scaling Down, Serving Fast: Compressing and Deploying Efficient LLMs for Recommendation Systems
Kayhan Behdin | Ata Fatahibaarzi | Qingquan Song | Yun Dai | Aman Gupta | Zhipeng Wang | Hejian Sang | Shao Tang | Gregory Dexter | Sirou Zhu | Siyu Zhu | Tejas Dharamsi | Vignesh Kothapalli | Zhoutong Fu | Yihan Cao | Pin-Lun Hsu | Fedor Borisyuk | Natesh S. Pillai | Luke Simon | Rahul Mazumder
Paper Details:
Month: November
Year: 2025
Location: Suzhou (China)
Venue: EMNLP
Citations: No Citations Yet
URL:
https://github.com/NVIDIA/TensorRT-LLM
https://github.com/linkedin/FMCHISEL