ScaleLLM: A Resource-Frugal LLM Serving Framework by Optimizing End-to-End Efficiency

Yuhang Yao | Han Jin | Alay Dilipbhai Shah | Shanshan Han | Zijian Hu | Dimitris Stripelis | Yide Ran | Zhaozhuo Xu | Salman Avestimehr | Chaoyang He

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, US
Venue: EMNLP