QUIK: Towards End-to-end 4-Bit Inference on Generative Large Language Models

Saleh Ashkboos | Ilia Markov | Elias Frantar | Tingxuan Zhong | Xincheng Wang | Jie Ren | Torsten Hoefler | Dan Alistarh

Paper Details:

Month: November
Year: 2024
Location: Miami, Florida, USA
Venue: EMNLP
