QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models

Publication
International Conference on Learning Representations (ICLR)