Qizhen Weng 翁祈桢

Systems Researcher. Ph.D. from HKUST.


My research interests broadly span AI Infrastructure, Machine Learning Systems, and Cloud Computing, with a special focus on GPU cluster management and large-scale model training, inference, and fine-tuning.

I joined Shanghai AI Laboratory after receiving my Ph.D. (2022) in Computer Science and Engineering from The Hong Kong University of Science and Technology, supervised by Prof. Wei Wang. During my Ph.D., I spent over two years as a Research Intern (from 2020) working on cluster management at Alibaba. Before that, I obtained my B.Eng. (2017) from Shanghai Jiao Tong University and studied (2015) at UC Berkeley.


News & Highlights

Mar 21, 2024 💡Openings: I am currently recruiting highly motivated students who can intern in Shanghai for 3+ months. If you are excited about advancing AI through large language models (e.g., InternLM [en/zh]), please fill in this form. Experience with deep learning frameworks, distributed systems, or reinforcement learning is a plus, but not required.

Selected Publications (Full)

  1. Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
    Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, Qinghao Hu, Guoteng Wang, Qizhen Weng, Hang Yan, Xingcheng Zhang, and 6 more authors
    arXiv preprint arXiv:2407.20018, 2024
  2. InternLM2 Technical Report
    Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, and 90 more authors
    arXiv preprint arXiv:2403.17297, 2024
  3. CaraServe: CPU-Assisted and Rank-Aware LoRA Serving for Generative LLM Inference
    Suyi Li, Hanfeng Lu, Tianyuan Wu, Minchen Yu, Qizhen Weng, Xusheng Chen, Yizhou Shan, Binhang Yuan, and Wei Wang
    arXiv preprint arXiv:2401.11240, 2024
  4. Beware of Fragmentation: Scheduling GPU-Sharing Workloads with Fragmentation Gradient Descent
    Qizhen Weng, Lingyun Yang, Yinghao Yu, Wei Wang, Xiaochuan Tang, Guodong Yang, and Liping Zhang
    In 2023 USENIX Annual Technical Conference (ATC), 2023
  5. MLaaS in the Wild: Workload Analysis and Scheduling in Large-Scale Heterogeneous GPU Clusters
    Qizhen Weng, Wencong Xiao, Yinghao Yu, Wei Wang, Cheng Wang, Jian He, Yong Li, Liping Zhang, Wei Lin, and Yu Ding
    In 19th USENIX Symposium on Networked Systems Design and Implementation (NSDI), 2022
  6. Metis: Learning to Schedule Long-Running Applications in Shared Container Clusters at Scale
    Luping Wang, Qizhen Weng, Wei Wang, Chen Chen, and Bo Li
    In the International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020
  7. Semi-Dynamic Load Balancing: Efficient Distributed Learning in Non-Dedicated Environments
    Chen Chen, Qizhen Weng, Wei Wang, Baochun Li, and Bo Li
    In 11th ACM Symposium on Cloud Computing (SoCC), 2020