(Selected Publications. * equal contribution)

Preprint

  • FlashQLA: CP- and Bwd-Friendly Fused Linear Attention Kernels for GDN
    Chengruidong Zhang, Xi Lin, Huiqiang Jiang, Zekun Wang, Xiao Li, Yizhong Cao, Bohan Zhuang, Rui Men, Jianwei Zhang, Bo Zheng, Junyang Lin, Dayiheng Liu, Jingren Zhou
    [Blog][Code][Zhihu]

  • Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation
    Inferix Team: Tianyu Feng, Yizeng Han, Jiahao He, Yuanyu He, Xi Lin, Teng Liu, Hanfeng Lu, Jiasheng Tang, Wei Wang, Zhiyuan Wang, Jichao Wu, Mingyang Yang, Yinghao Yu, Zeyu Zhang, Bohan Zhuang
    [Paper][Code][HuggingFace]

  • Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
    Weijie Wang*, Qihang Cao*, Sensen Gao*, Donny Y. Chen, Haofei Xu, Wenjing Bian, Songyou Peng, Tat-Jen Cham, Chuanxia Zheng, Andreas Geiger, Jianfei Cai, Jia-Wang Bian, Bohan Zhuang
    [Paper][Project Page][GitHub]

  • FlashAR: Efficient Post-Training Acceleration for Autoregressive Image Generation
    Junkang Zhou*, Yefei He*, Feng Chen*, Weijie Wang, Bohan Zhuang
    [Paper][Project Page][GitHub]

  • TriSplat: Simulation-Ready Feed-Forward 3D Scene Reconstruction
    Weijie Wang, Zimu Li, Jinchuan Shi, Zeyu Zhang, Botao Ye, Marc Pollefeys, Donny Y. Chen, Bohan Zhuang
    [Paper][Project Page][GitHub]

  • ReCA: Multi-Shot Long Video Extrapolation via Recursive Context Allocation
    Akide Liu, Jinbo Xing, Chaojie Mao, Ye Li, Zeyu Zhang, Yefei He, Weijie Wang, Zihan Wang, Yu Liu, Gholamreza Haffari, Bohan Zhuang
    [Paper][Project Page]

  • BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation
    Zeyu Zhang, Shuning Chang, Yuanyu He, Yizeng Han, Jiasheng Tang, Fan Wang, Bohan Zhuang
    [Paper][Code][Project Page]

  • PSA: Pyramid Sparse Attention for Efficient Video Understanding and Generation
    Xiaolong Li*, Youping Gu*, Xi Lin*, Weijie Wang, Bohan Zhuang
    [Paper][Code][Project Page]

  • R-Stitch: Dynamic Trajectory Stitching for Efficient Reasoning
    Zhuokun Chen, Zeren Chen, Jiahao He, Mingkui Tan, Jianfei Cai, Bohan Zhuang
    [Paper][Project Page]

  • Few-Step Distillation for Text-to-Image Generation: A Practical Guide
    Yifan Pu*, Yizeng Han*, Zhiwei Tang*, Jiasheng Tang, Fan Wang, Bohan Zhuang, Gao Huang
    [Paper][Project Page]

  • VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
    Weijie Wang*, Yeqing Chen*, Zeyu Zhang, Hengyu Liu, Haoxiao Wang, Zhiyuan Feng, Wenkang Qin, Zheng Zhu, Donny Y. Chen, Bohan Zhuang
    [Paper][Project Page][HuggingFace]

2026

  • World-R1: Reinforcing 3D Constraints for Text-to-Video Generation
    Weijie Wang*, Xiaoxuan He*, Youping Gu*, Yifan Yang, Zeyu Zhang, Yefei He, Yanbo Ding, Xirui Hu, Donny Y. Chen, Zhiyuan He, Yuqing Yang, Bohan Zhuang
    [Paper][Code][Project Page][HuggingFace] ICML 2026

  • TriAttention: Efficient Long Reasoning with Trigonometric KV Compression
    Weian Mao*, Xi Lin*, Wei Huang*, Yuxin Xie, Tianfu Fu, Bohan Zhuang, Song Han, Yukang Chen
    [Paper][Code][Project Page][HuggingFace] ICML 2026

  • Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization
    Xiaoxuan He, Siming Fu, Zeyue Xue, Weijie Wang, Ruizhe He, Yuming Li, Dacheng Yin, Shuai Dong, Haoyang Huang, Hongfa Wang, Nan Duan, Bohan Zhuang
    [Paper][Code][Project Page][HuggingFace] ICML 2026

  • FlashBlock: Attention Caching for Efficient Long-Context Block Diffusion
    Zhuokun Chen, Jianfei Cai, Bohan Zhuang
    [Paper][Project Page] ICML 2026

  • BLADE: Block-Sparse Attention Meets Step Distillation for Efficient Video Generation
    Youping Gu*, Xiaolong Li*, Yuhao Hu, Minqi Chen, Bohan Zhuang
    [Paper][Project Page] ICLR 2026

  • Sparsity Forcing: Reinforcing Token Sparsity of MLLMs
    Feng Chen, Yefei He, Lequan Lin, Chenhui Gou, Jing Liu, Bohan Zhuang, Qi Wu
    [Paper] ICLR 2026

  • RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
    Wangbo Zhao, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Pengfei Zhou, Kai Wang, Bohan Zhuang, Zhangyang Wang, Fan Wang, Yang You
    [Paper] ICLR 2026

  • COV: Chain-of-View Prompting for Spatial Reasoning
    Haoyu Zhao, Akide Liu, Zeyu Zhang, Weijie Wang, Feng Chen, Ruihan Zhu, Gholamreza Haffari, Bohan Zhuang
    [Paper][Code][Project Page][HuggingFace] ACL 2026 (Findings)

  • Geometrically-Constrained Agent for Spatial Reasoning
    Zeren Chen, Xiaoya Lu, Zhijie Zheng, Pengrui Li, Lehan He, Yijin Zhou, Jing Shao, Bohan Zhuang, Lu Sheng
    [Paper][Project Page] CVPR 2026

  • An Empirical Study on How Video-LLMs Answer Video Questions
    Chenhui Gou, Ziyu Ma, Zicheng Duan, Haoyu He, Feng Chen, Akide Liu, Bohan Zhuang, Jianfei Cai, Hamid Rezatofighi
    [Paper] CVPR 2026

2025

  • FPSAttention: Training-Aware FP8 and Sparsity Co-Design for Fast Video Diffusion
    Akide Liu*, Zeyu Zhang*, Zhexin Li, Xuehai Bai, Yuanjie Xing, Yizeng Han, Jiasheng Tang, Jichao Wu, Mingyang Yang, Weihua Chen, Jiahao He, Yuanyu He, Fan Wang, Gholamreza Haffari, Bohan Zhuang
    [Paper][Project Page] NeurIPS 2025 (Spotlight)

  • ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS
    Weijie Wang, Donny Y. Chen, Zeyu Zhang, Duochao Shi, Akide Liu, Bohan Zhuang
    [Paper][Project Page] NeurIPS 2025

  • ZipAR: Parallel Autoregressive Image Generation through Spatial Locality
    Yefei He, Feng Chen, Yuanyu He, Shaoxuan He, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
    [Paper][Code] ICML 2025

  • T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with Trajectory Stitching
    Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei Xiao, Jianfei Cai, Anima Anandkumar
    [Paper][Code][Project Page] ICLR 2025

  • Are Large Vision Language Models Good Game Players?
    Xinyu Wang, Bohan Zhuang, Qi Wu
    [Paper][Code] ICLR 2025

  • Neighboring Autoregressive Modeling for Efficient Visual Generation
    Yefei He*, Yuanyu He*, Shaoxuan He*, Feng Chen*, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
    [Paper][Code][Project Page] ICCV 2025

  • ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification
    Yefei He, Feng Chen, Jing Liu, Wenqi Shao, Hong Zhou, Kaipeng Zhang, Bohan Zhuang
    [Paper] ICCV 2025

  • Frequency-Aware Autoregressive Modeling for Efficient High-Resolution Image Synthesis
    Zhuokun Chen, Jugang Fan, Zhuowei Yu, Bohan Zhuang, Mingkui Tan
    [Paper][Code] ICCV 2025

  • Channel Merging: Preserving Specialization for Merged Experts
    Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang
    [Paper] AAAI 2025 (Oral)

  • Motion Anything: Any to Motion Generation
    Zeyu Zhang, Yiran Wang, Wei Mao, Danning Li, Rui Zhao, Biao Wu, Zirui Song, Bohan Zhuang, Ian Reid, Richard Hartley
    [Paper][Project Page]

2024

  • MiniCache: KV Cache Compression in Depth Dimension for Large Language Models
    Akide Liu, Jing Liu, Zizheng Pan, Yefei He, Gholamreza Haffari, Bohan Zhuang
    [Paper][Code] NeurIPS 2024

  • ZipCache: Accurate and Efficient KV Cache Quantization with Salient Token Identification
    Yefei He, Luoming Zhang, Weijia Wu, Jing Liu, Hong Zhou, Bohan Zhuang
    [Paper][Code] NeurIPS 2024

  • MVSplat360: Feed-Forward 360° Scene Synthesis from Sparse Views
    Yuedong Chen, Chuanxia Zheng, Haofei Xu, Bohan Zhuang, Andrea Vedaldi, Tat-Jen Cham, Jianfei Cai
    [Paper][Project Page] NeurIPS 2024

  • QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Models
    Jing Liu, Ruihao Gong, Xiuying Wei, Zhiwei Dong, Jianfei Cai, Bohan Zhuang
    [Paper][Code] ICLR 2024

  • Object-Aware Inversion and Reassembly for Image Editing
    Zhen Yang, Ganggui Ding, Wen Wang, Hao Chen, Bohan Zhuang, Chunhua Shen
    [Paper][Project Page] ICLR 2024

  • EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models
    Yefei He, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
    [Paper][Code] ICLR 2024 (Spotlight)

  • GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI
    Pengcheng Chen*, Jin Ye*, Guoan Wang*, Yanjun Li, Zhongying Deng, Wei Li, Tianbin Li, Haodong Duan, Ziyan Huang, Yanzhou Su, Benyou Wang, Shaoting Zhang, Bin Fu, Jianfei Cai, Bohan Zhuang, Eric J Seibel, Junjun He, Yu Qiao
    [Paper][Project Page] NeurIPS 2024 Datasets and Benchmarks Track

  • LongVLM: Efficient Long Video Understanding via Large Language Models
    Yuetian Weng, Mingfei Han, Haoyu He, Xiaojun Chang, Bohan Zhuang
    [Paper][Code] ECCV 2024 (Oral)

  • MVSplat: Efficient 3D Gaussian Splatting from Sparse Multi-View Images
    Yuedong Chen, Haofei Xu, Chuanxia Zheng, Bohan Zhuang, Marc Pollefeys, Andreas Geiger, Tat-Jen Cham, Jianfei Cai
    [Paper][Project Page] ECCV 2024 (Oral)

  • Stitched ViTs are Flexible Vision Backbones
    Zizheng Pan, Jing Liu, Haoyu He, Jianfei Cai, Bohan Zhuang
    [Paper][Code] ECCV 2024

  • Motion Mamba: Efficient and Long Sequence Motion Generation
    Zeyu Zhang, Akide Liu, Ian Reid, Richard Hartley, Bohan Zhuang, Hao Tang
    [Paper][Project Page] ECCV 2024

  • Efficient Stitchable Task Adaptation
    Haoyu He, Zizheng Pan, Jing Liu, Jianfei Cai, Bohan Zhuang
    [Paper][Code] CVPR 2024

  • ModaVerse: Efficiently Transforming Modalities with LLMs
    Xinyu Wang, Bohan Zhuang, Qi Wu
    [Paper][Code] CVPR 2024

  • LoRAPrune: Structured Pruning Meets Low-Rank Parameter-Efficient Fine-Tuning
    Mingyang Zhang, Hao Chen, Chunhua Shen, Zhen Yang, Linlin Ou, Xinyi Yu, Bohan Zhuang
    [Paper][Code] ACL 2024 (Findings)

  • SAM-Med3D-MoE: Towards a Non-Forgetting Segment Anything Model via Mixture of Experts for 3D Medical Image Segmentation
    Guoan Wang*, Jin Ye*, Junlong Cheng, Tianbin Li, Zhaolin Chen, Jianfei Cai, Junjun He, Bohan Zhuang
    [Paper] MICCAI 2024

  • ME-Switch: A Memory-Efficient Expert Switching Framework for Large Language Models
    Jing Liu, Ruihao Gong, Mingyang Zhang, Yefei He, Jianfei Cai, Bohan Zhuang
    [Paper]

  • Enhancing Perception Capabilities of Multimodal LLMs with Training-Free Fusion
    Zhuokun Chen, Jinwu Hu, Zeshuai Deng, Yufeng Wang, Bohan Zhuang, Mingkui Tan
    [Paper]

  • Evaluating and Advancing Multimodal Large Language Models in Ability Lens
    Feng Chen, Chenhui Gou, Jing Liu, Yang Yang, Zhaoyang Li, Jiyuan Zhang, Zhenbang Sun, Bohan Zhuang, Qi Wu
    [Paper]

2023

  • Stitchable Neural Networks
    Zizheng Pan, Jianfei Cai, Bohan Zhuang
    [Paper][Project Page] CVPR 2023 (Highlight)

  • PTQD: Accurate Post-Training Quantization for Diffusion Models
    Yefei He, Luping Liu, Jing Liu, Weijia Wu, Hong Zhou, Bohan Zhuang
    [Paper][Code] NeurIPS 2023

  • Mask Propagation for Efficient Video Semantic Segmentation
    Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Xiaojun Chang, Bohan Zhuang
    [Paper][Code] NeurIPS 2023

  • Second-Order Degradation and Reconstruction for Test-Time Image Super-Resolution
    Zeshuai Deng, Zhuokun Chen, Shuaicheng Niu, Thomas H. Li, Bohan Zhuang, Mingkui Tan
    [Paper][Code] NeurIPS 2023

  • Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning
    Haoyu He, Jianfei Cai, Jing Zhang, Dacheng Tao, Bohan Zhuang
    [Paper][Code] ICCV 2023 (Oral)

  • BiViT: Extremely Compressed Binary Vision Transformer
    Yefei He, Zhenyu Lou, Luoming Zhang, Hong Zhou, Bohan Zhuang
    [Paper] ICCV 2023

  • Dynamic Focus-aware Positional Queries for Semantic Segmentation
    Haoyu He, Jianfei Cai, Zizheng Pan, Jing Liu, Jing Zhang, Dacheng Tao, Bohan Zhuang
    [Paper][Code] CVPR 2023

  • End-to-end One-shot Human Parsing
    Haoyu He, Jing Zhang, Bohan Zhuang, Jianfei Cai, Dacheng Tao
    [Paper][Code] TPAMI 2023

  • Single-path Bit Sharing for Automatic Loss-aware Model Compression
    Jing Liu, Bohan Zhuang, Peng Chen, Yong Guo, Chunhua Shen, Jianfei Cai, Mingkui Tan
    [Paper] TPAMI 2023

  • Pruning Self-attentions into Convolutional Layers in Single Path
    Haoyu He, Jing Liu, Zizheng Pan, Jianfei Cai, Jing Zhang, Dacheng Tao, Bohan Zhuang
    [Paper][Code] TPAMI 2023

  • A Survey on Efficient Training of Transformers
    Bohan Zhuang, Jing Liu, Zizheng Pan, Haoyu He, Yuetian Weng, Chunhua Shen
    [Paper] IJCAI 2023

2022

  • EcoFormer: Energy-Saving Attention with Linear Complexity
    Jing Liu, Zizheng Pan, Haoyu He, Jianfei Cai, Bohan Zhuang
    [Paper][Code] NeurIPS 2022 (Spotlight)

  • Fast Vision Transformers with HiLo Attention
    Zizheng Pan, Jianfei Cai, Bohan Zhuang
    [Paper][Code] NeurIPS 2022 (Spotlight)

  • Automated Progressive Learning for Efficient Training of Vision Transformers
    Changlin Li, Bohan Zhuang, Guangrun Wang, Xiaodan Liang, Xiaojun Chang, Yi Yang
    [Paper] CVPR 2022

  • Less is More: Pay Less Attention in Vision Transformers
    Zizheng Pan, Bohan Zhuang, Haoyu He, Jing Liu, Jianfei Cai
    [Paper][Code] AAAI 2022

  • An Efficient Spatio-Temporal Pyramid Transformer for Action Detection
    Yuetian Weng, Zizheng Pan, Mingfei Han, Xiaojun Chang, Bohan Zhuang
    [Paper] ECCV 2022

  • Structured Binary Neural Networks for Image Recognition
    Bohan Zhuang, Chunhua Shen, Mingkui Tan, Peng Chen, Lingqiao Liu, Ian Reid
    [Paper] IJCV 2022

2021

  • Mesa: A Memory-saving Training Framework for Transformers
    Zizheng Pan, Peng Chen, Haoyu He, Jing Liu, Jianfei Cai, Bohan Zhuang
    [Paper][Code]

  • Sharpness-aware Quantization for Deep Neural Networks
    Jing Liu, Jianfei Cai, Bohan Zhuang
    [Paper][Code]

  • Scalable Visual Transformers with Hierarchical Pooling
    Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai
    [Paper][Code] ICCV 2021

  • FATNN: Fast and Accurate Ternary Neural Networks
    Peng Chen, Bohan Zhuang*, Chunhua Shen
    [Paper][Code] ICCV 2021

  • Effective Training of Convolutional Neural Networks with Low-bitwidth Weights and Activations
    Bohan Zhuang, Mingkui Tan, Jing Liu, Lingqiao Liu, Ian Reid, Chunhua Shen
    [Paper][Code] TPAMI 2021

  • Discrimination-aware Network Pruning for Deep Model Compression
    Jing Liu*, Bohan Zhuang*, Zhuangwei Zhuang*, Yong Guo, Junzhou Huang, Jinhui Zhu, Mingkui Tan
    [Paper][Code] TPAMI 2021

  • AQD: Towards Accurate Quantized Object Detection
    Peng Chen*, Jing Liu*, Bohan Zhuang, Mingkui Tan, Chunhua Shen
    [Paper][Code] CVPR 2021 (Oral)