Qizhen Weng 翁祈桢

AI Infra Team Lead. Research Scientist. Ph.D. in CSE from HKUST.


My research interests encompass AI Infrastructure, Machine Learning Systems, and Cloud Computing, with a particular emphasis on enhancing GPU cluster efficiency and optimizing training performance for large-scale generative models, such as large language models (LLMs), multimodal LLMs (MLLMs), and diffusion transformers (DiTs).

Since 2024, I have been leading the AI Infra Team at the Institute of Artificial Intelligence, China Telecom (TeleAI), where I oversee initiatives to advance AI system capabilities. Prior to this, I joined the Shanghai AI Laboratory in 2022 as a Systems Researcher, contributing to systems for large language model training and inference. Earlier, starting in 2020, I spent over two years as a Research Intern at Alibaba Cloud & Alibaba Group, focusing on GPU cluster management and AI job scheduling.

I received my Ph.D. in Computer Science and Engineering from The Hong Kong University of Science and Technology in 2022, under the guidance of Prof. Wei Wang. I also hold a B.Eng. degree from Shanghai Jiao Tong University in 2017 and enriched my academic journey with a study period at UC Berkeley in 2015.

News & Highlights

Feb 22, 2025 💡Openings: I’m currently recruiting highly motivated students who can intern in Shanghai for 3+ months. If you’re excited about advancing AI through LLM/MLLM/DiT, please drop me an email with your CV. Experience with deep learning frameworks, distributed systems, or CUDA programming is a plus but not required.

Selected Publications (Full List)

  1. GPU-Disaggregated Serving for Deep Learning Recommendation Models at Scale
    Lingyun Yang, Yongchen Wang, Yinghao Yu, Qizhen Weng, Jianbo Dong, Kan Liu, Chi Zhang, Yanyi Zi, Hao Li, Zechao Zhang, and 12 more authors
    In 22nd USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2025
  2. Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
    Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, and 6 more authors
    arXiv preprint arXiv:2407.20018, 2024
  3. InternLM2 Technical Report
    Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, and 90 more authors
    arXiv preprint arXiv:2403.17297, 2024
  4. CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
    Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, and Wei Wang
    arXiv preprint arXiv:2401.11240, 2024
  5. Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent
    Qizhen Weng, Lingyun Yang, Yinghao Yu, Wei Wang, Xiaochuan Tang, Guodong Yang, and Liping Zhang
    In 2023 USENIX Annual Technical Conference (ATC), 2023
  6. MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters
    Qizhen Weng, Wencong Xiao, Yinghao Yu, Wei Wang, Cheng Wang, Jian He, Yong Li, Liping Zhang, Wei Lin, and Yu Ding
    In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2022
  7. Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale
    Luping Wang, Qizhen Weng, Wei Wang, Chen Chen, and Bo Li
    In International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020
  8. Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments
    Chen Chen, Qizhen Weng, Wei Wang, Baochun Li, and Bo Li
    In 11th ACM Symposium on Cloud Computing (SoCC), 2020