Home

I'm a Principal Researcher at Microsoft. My research interests lie in the broad areas of programmability, performance, and scalability in parallel runtimes and machine learning systems, with a special emphasis on the intersection of novel machine learning algorithms and AI systems. I joined Microsoft Research at Redmond as a Senior Research Software Development Engineer in 2016. Before joining Microsoft, I received my Ph.D. from the Computer Science and Engineering Department at The Ohio State University. While at OSU, I was a member of the Programming Languages and Software Systems (PLaSS) group, working with Prof. Michael D. Bond on building efficient and scalable runtime systems with strong semantics for parallel programs. More information about my experience can be found on LinkedIn.

I have moved to a new homepage: https://minjiazhang.github.io/.

This website is no longer actively updated.

News:

    • 12-9-2023: Our paper on enabling efficient DNN training via data-efficient optimizations has been accepted at AAAI 2024!
    • 12-7-2023: Our paper on enabling efficient DNN training on preemptible instances has been accepted at NSDI 2024!
    • 8-15-2023: Our paper on cost-effective on-device continual learning has been accepted at MobiCom 2023!
    • 7-15-2023: Our paper on adversarial fine-tuning efficiency optimizations has been accepted at ECAI 2023!
    • 1-21-2023: Our paper on compressed communication for large-scale training, 0/1 Adam, has been accepted at ICLR 2023!
    • 11-7-2022: Our paper on fast and accurate vector search via intra-query parallelism has been accepted at PPoPP 2023!
    • 09-20-2022: Our paper on large-scale GNN training on a single-node machine has been accepted at ASPLOS 2023!
    • 09-14-2022: Three papers have been accepted at NeurIPS 2022! 2665 out of 10411 submissions were accepted.
    • 07-08-2022: Our paper on large-scale DNN training on spot instances has been accepted at NSDI 2023! 50 out of 272 submissions were accepted.
    • 06-13-2022: Our paper on large-scale inference for Transformer models has been accepted at SC 2022! 81 out of 320 submissions were accepted.
    • 05-15-2022: Our paper on advancing the next generation of AI via Mixture-of-Experts has been accepted at ICML 2022! 1117 out of 5630 submissions were accepted.
    • 2-24-2022: Our paper on continual learning has been accepted at DAC 2022!
    • 12-1-2021: Our paper on adversarial data augmentation for knowledge distillation has been accepted at AAAI 2022! 1349 out of 9251 submissions were accepted.
    • 10-11-2021: Our paper on graph sampling and pruning for nearest neighbor search has been accepted at WSDM 2022! 159 out of 786 submissions were accepted.
    • 9-28-2021: Our paper on semi-structured sparsity for compressing Transformer networks has been accepted at NeurIPS 2021.
    • ...

Research Interests: 

Efficient Large-Scale DNN Training

Ultra-fast Inference

Model Compression

DL Compilers

Data Efficiency

Large Language Models and Their Applications

  • NeurIPS AI4Science Workshop: "DeepSpeed4Science Initiative: Enabling Large-Scale Scientific Discovery through Sophisticated AI System Technologies", Shuaiwen Leon Song, Bonnie Kruft, Minjia Zhang, Conglong Li, Shiyang Chen, Chengming Zhang, et al.
  • Preprint: "DeepSpeed-Chat: Easy, Fast and Affordable RLHF Training of ChatGPT-like Models at All Scales", Zhewei Yao, Reza Yazdani Aminabadi, Olatunji Ruwase, Samyam Rajbhandari, Xiaoxia Wu, Ammar Ahmad Awan, Jeff Rasley, Minjia Zhang, Conglong Li, Connor Holmes, Zhongzhu Zhou, Michael Wyatt, Molly Smith, Lev Kurilenko, Heyang Qin, Masahiro Tanaka, Shuai Che, Shuaiwen Leon Song, Yuxiong He
  • Preprint: "DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention", Zhewei Yao, Xiaoxia Wu, Conglong Li, Minjia Zhang, Heyang Qin, Olatunji Ruwase, Ammar Ahmad Awan, Samyam Rajbhandari, Yuxiong He

Efficient Nearest Neighbor Methods and Their Applications in Large-scale Vector Search

Parallel Computing and Runtime

Distributed Systems

Technical Reports:

Patents:

Professional Service:

Awards:

Talks:

  • Presented work on "XTC: Extreme model compression made simple and efficient" at NeurIPS 2022
  • Invited talk on "Extreme Compression for Pre-trained Transformers Made Simple and Efficient" at Intel AI Group, July 28th 2022
  • Invited talk (hosted by Zhihao Jia) on "DeepSpeed: The library to accelerate training and inference of DNN at scale" at CMU, April 18th, 2022
  • Invited talk on "DeepSpeed: The library to accelerate training and inference of DNN at scale" at the Efficient Large-Scale AI Workshop as a part of MSR Project Green
  • Invited talk (hosted by Myeongjae Jeon) on "DeepSpeed: The library to accelerate training and inference of DNN at scale" at UNIST, April 13th, 2022
  • Invited lecture on "New algorithms for Approximate Nearest Neighbor Search Systems at Scale" at Kent State University, October 20, 2022
  • Presented work on graph sampling and pruning for nearest neighbor search at WSDM 2022
  • Invited talk on "DL Inference and Training Optimization Towards Speed and Scale" at Tsinghua AIR 2021
  • Invited keynote speech on "DL Inference and Training Optimization Towards Speed and Scale" at EMDC 2021
  • Presented work on DL inference through heterogeneous devices at IPDPS 2021
  • Presented work on "DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation" at ICLR 2021
  • Presented work on "Accelerating Training of Transformer-Based Language Models with Progressive Layer Dropping" at NeurIPS 2020
  • Presented work on "AdaTune: Adaptive Tensor Program Compilation Made Efficient" at NeurIPS 2020
  • Invited talk on "TVM@Microsoft" at the TVM and Deep Learning Compilation Conference 2019, Seattle, Washington, US [video][slides]
  • Presented work on "GRIP: Multi-Store Capacity-Optimized High-Performance Nearest Neighbor Search for Vector Search Engine" at CIKM 2019, Beijing, China
  • Presented work on "Accelerating Large Scale Deep Learning Inference through DeepCPU at Microsoft" at 2019 USENIX OpML, May 2019, Santa Clara, CA, USA
  • Presented work on "DeepCPU: Serving RNN-based Deep Learning Models 10x Faster" at 2018 USENIX Annual Technical Conference, July 2018, Boston, MA, USA
  • Invited talk on "DeepCPU: Deep Learning Serving Optimizations on CPUs" at the Deep Learning workshop at Microsoft TechFest 2018, March 2018, Redmond, WA, USA
  • Invited talk on "DeepCPU: Deep Learning Serving Optimizations on CPUs" at Microsoft Research Talk Series, February 2018, Redmond, WA, USA
  • Presented work on "DeepCPU: Deep Learning Serving Optimizations on CPUs" at Machine Learning, AI & Data Science Conference (MLADS) December 2017, Redmond, WA, USA
  • Presented work on detecting and tolerating region conflicts to support region snapshot isolation at the ACM Student Research Competition, OOPSLA 2015, Pittsburgh, PA, USA
  • Presented work on low-overhead and scalable software transactional memory with strong progress guarantees at the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA