Tuo Zhao - Alchimia vos liberabit!

Recent News

Feb. 2026: I released pyhuge, the Python version of our R package huge for high-dimensional Gaussian copula graphical model estimation. The original package has been downloaded nearly 800,000 times on CRAN and is widely used in biomedical data analysis. The new implementation was developed via vibe coding with OpenAI Codex 5.3. See the PyPI release here and GitHub repository here.
Oct. 2025: Our NorMuon optimizer reached the #1 spot on the NanoGPT speedrun leaderboard, setting a new world record (merged as “New WR: Implement NorMuon on latest version” in Keller Jordan’s modded-nanogpt repo), and was highlighted publicly by Larry Dial. Congratulations to Zichong Li, Liming Liu, and our collaboration team at Microsoft!
Jun. 2025: Zichong Li, in collaboration with Microsoft, has led the release of the newest additions to the Phi model family: the SlimMoE Framework and the Open-Source Models Phi-mini-MoE-instruct and Phi-Tiny-MoE-instruct. Collaborators include alumna Chen Liang and fellow group members Zixuan Zhang and Ilgee Hong.
Apr. 2025: Alexander Bukharin has successfully defended his Ph.D. Dissertation: Robust and Flexible Reward Modeling for LLM Alignment. He will join NVIDIA as a research scientist.
Apr. 2025: Qingru Zhang has successfully defended his Ph.D. Dissertation: On the Efficiency and Steerability of Self-Attention Mechanism of LLMs. He will join Microsoft as a research scientist.
Jun. 2024: Yan Li has successfully defended his Ph.D. Dissertation: Theories and Algorithms for Efficient and Robust Sequential Decision Making. He will join the Department of Industrial and Systems Engineering at Texas A&M University as a tenure-track assistant professor in Fall 2024.
Feb. 2024: Minshuo Chen has accepted a tenure-track assistant professor position in the Department of Industrial Engineering and Management Sciences at Northwestern University. He will start in Fall 2024.
Nov. 2023: Chen Liang has successfully defended her Ph.D. Dissertation: On Parameter Efficiency of Neural Language Models. She will join Microsoft as a senior research scientist.
Oct. 2023: Prof. Shihao Yang and I co-organized Georgia Statistics Day 2023.
Apr. 2023: Simiao Zuo has successfully defended his Ph.D. Dissertation: On Training, Inference and Sample Efficiency of Language Models. He will join Microsoft as a research scientist.
Mar. 2023: Qingru Zhang's recent collaborative work with Microsoft Azure AI on parameter-efficient fine-tuning is now available on Hugging Face. See more information here.

Oct. 2022: One Ph.D. position is available in my group. Prof. Yongsheng Chen in the School of Civil and Environmental Engineering and I are recruiting a Ph.D. student to work at the interface of computational chemistry and machine learning. See more information here. Please contact us if you are interested and have a background in molecular dynamics simulation.

Sep. 2022: One Ph.D. position is available in my group. Prof. Hua Wang at ETH Zurich and I are recruiting a Ph.D. student to work at the interface of modern circuit design and machine learning. See more information here. Please contact me if you are interested and have a background in electromagnetics, especially EM simulation.

Sep. 2022: Two Ph.D. positions are available in my group. See more information here. Please contact me if you are interested in deep learning theory or natural language processing.

Jul. 2022: Minshuo Chen has successfully defended his Ph.D. Dissertation: Representation and Statistical Properties of Deep Neural Networks for Structured Data. He will join Princeton University as a postdoctoral fellow.

Jul. 2022: Siawpeng Er has successfully defended his Ph.D. Dissertation: Deep Learning in Biomedical Informatics and Modern Circuit Design. He will join Home Depot as a Data Scientist.

Nov. 2021: Jiachen Yang has successfully defended his Ph.D. Dissertation: Cooperation in Multi-Agent Reinforcement Learning. He will join Lawrence Livermore National Laboratory as a Staff Research Scientist.

Jul. 2021: I am co-organizing The First Workshop on Evaluations and Assessments of Neural Conversation Systems (EANCS) (co-located with EMNLP 2021) with a group of researchers from Google, Amazon, Microsoft, Facebook, Georgia Tech, Virginia Tech and National Taiwan University.

Jul. 2021: Yujia Xie has successfully defended her Ph.D. Dissertation: On Computation and Applications of Optimal Transport. She will join Microsoft as a research scientist.

Apr. 2021: Zhehui Chen has successfully defended his Ph.D. Dissertation: Modern Statistical Methods for Optimization and Change-Point Detection. He will join Didi as a research scientist.

Apr. 2021: Haoming Jiang has successfully defended his Ph.D. Dissertation: Reducing Human Labor Cost in Deep Learning for Natural Language Processing. He will join Amazon as an applied research scientist.

Apr. 2021: Tianyi Liu has successfully defended his Ph.D. Dissertation: Theoretical Analysis of Stochastic Gradient Descent in Nonconvex Optimization. He will join Bytedance as a research scientist.

Mar. 2021: Minshuo Chen wrote a blog post for our recent results in Towards Understanding Hierarchical Learning: Benefits of Neural Representations.

Apr. 2020: Ethan Fang, Niao He, Junwei Lu, Zhaoran Wang, Zhuoran Yang and I are co-organizing an Online Seminar Series on Mathematical Foundation of Data Sciences. See more information here!

Dec. 2019: Haoming Jiang's recent collaborative work with Microsoft Dynamics 365 AI and Microsoft Research AI (paper, code) achieves state-of-the-art results on 5 of 9 GLUE benchmark tasks and an overall GLUE score of 89.9, which outperforms all existing models.

Nov. 2019: Minshuo Chen wrote a blog post for our recent results in Nonparametric Regression on Low Dimensional Manifolds using Deep Neural Networks.

Oct. 2019: Prof. Wenjing Liao and I are co-organizing a mini-symposium on Machine Learning on Data with Low-dimensional Structures at the SIAM Conference on Mathematics of Data Sciences 2020.

Oct. 2019: Prof. Yajun Mei, Prof. Yao Xie and I co-organized Georgia Statistics Day 2019.


Preprints and Working Papers (^* indicates equal contributions, and ^‡ indicates advisees)

Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
Zhenghao Xu^‡, Qin Lu, Changlong Yu, Tuo Zhao
Preprint available on arXiv [Link]
Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training
Ran Xu, Tianci Liu, Zihan Dong, Tong You, Ilgee Hong^‡, Carl Yang, Linjun Zhang, Tuo Zhao, Haoyu Wang
Preprint available on arXiv [Link]
Teach Diffusion Language Models to Learn from Their Own Mistakes
Liming Liu, Binxuan Huang, Xin Liu, Bing Yin and Tuo Zhao
Preprint available on arXiv [Link]
OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment
Tianci Liu, Ran Xu, Tony Yu, Ilgee Hong, Carl Yang, Tuo Zhao and Haoyu Wang
Preprint available on arXiv [Link]
NorMuon: Making Muon more efficient and scalable
Zichong Li, Liming Liu, Chen Liang, Weizhu Chen and Tuo Zhao
Preprint available on arXiv [Link]
IDEA Prune: An Integrated Enlarge-and-Prune Pipeline in Generative Language Model Pretraining
Yixiao Li^‡, Xianzhi Du, Ajay Jaiswal, Tao Lei, Tuo Zhao, Chong Wang and Jianyu Wang
Preprint available on arXiv [Link]

LLMs can generate a better answer by aggregating their own responses
Zichong Li^‡, Xinyu Feng, Yuheng Cai^‡, Zixuan Zhang^‡, Tianyi Liu, Chen Liang, Weizhu Chen, Haoyu Wang and Tuo Zhao
Preprint available on arXiv [Link]

Model Tells Itself Where to Attend: Faithfulness Meets Automatic Attention Steering
Qingru Zhang^‡, Xiaodong Yu, Chandan Singh, Xiaodong Liu, Liyuan Liu, Jianfeng Gao, Tuo Zhao, Dan Roth and Hao Cheng
Preprint available on arXiv [Link]

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs
Hao Kang, Qingru Zhang^‡, Souvik Kundu, Geonhwa Jeong, Zaoxing Liu, Tushar Krishna and Tuo Zhao
Preprint available on arXiv [Link]

Provable Benefits of Policy Learning from Human Preferences in Contextual Bandit Problems
Xiang Ji, Huazheng Wang, Minshuo Chen, Tuo Zhao and Mengdi Wang
Preprint available on arXiv [Link]

First-order Policy Optimization for Robust Markov Decision Process
George Lan, Yan Li^‡ and Tuo Zhao
Preprint available on arXiv [Link]

DiP-GNN: Discriminative Pre-Training of Graph Neural Networks
Simiao Zuo^‡, Haoming Jiang, Qingyu Yin, Xianfeng Tang, Bing Yin and Tuo Zhao
Preprint available on arXiv [Link]

Differentially Private Estimation of Hawkes Process
Simiao Zuo^‡, Tianyi Liu^‡, Tuo Zhao and Hongyuan Zha
Preprint available on arXiv [Link]

Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data
Yan Li^‡, Caleb Ju, Ethan Fang and Tuo Zhao
Preprint available on arXiv [Link]

Statistical Guarantees of Generative Adversarial Networks for Distribution Estimation
Minshuo Chen^‡, Wenjing Liao, Hongyuan Zha and Tuo Zhao (Alphabetical order)
Preprint available on arXiv [Link]


Publications (^* indicates equal contributions, ^# indicates alphabetical order, and ^‡ indicates advisees)

COSMOS: A hybrid adaptive optimizer for memory-efficient training of LLMs
Liming Liu^‡, Zhenghao Xu^‡, Zixuan Zhang^‡, Hao Kang, Zichong Li^‡, Chen Liang, Weizhu Chen and Tuo Zhao
International Conference on Learning Representations (ICLR), 2026 [arXiv]
ARMOR: High-Performance Semi-Structured Pruning via Adaptive Matrix Factorization
Lawrence Liu, Alexander Liu, Mengdi Wang, Tuo Zhao and Lin F. Yang
International Conference on Learning Representations (ICLR), 2026 [arXiv]
Tidal: Tackling Concept Drift in Provenance-based Advanced Persistent Threats Detection
Yuchen Zhou, Ning Yu, Tuo Zhao, Zhen Liu
New Ideas in Networked Systems (NINES), 2026 [Link]
A Minimalist Example of Edge-of-Stability and Progressive Sharpening
Liming Liu^‡, Zixuan Zhang^‡, Simon Du and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2025 [arXiv]
AdaSPEC: Selective Knowledge Distillation for Efficient Speculative Decoders
Yuezhou Hu^‡*, Jiaxin Guo^‡*, Xinyu Feng, and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2025
Think-RM: Enabling Long-Horizon Reasoning in Generative Reward Models
Ilgee Hong^‡, Changlong Yu, Liang Qiu, Weixiang Yan, Zhenghao Xu^‡, Haoming Jiang, Qingru Zhang^‡, Qin Lu, Xin Liu, Chao Zhang, and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2025 [arXiv]
Ask a Strong LLM Judge when Your Reward Model is Uncertain
Zhenghao Xu^‡, Qin Lu, Qingru Zhang^‡, Liang Qiu, Ilgee Hong^‡, Changlong Yu, Wenlin Yao, Yao Liu, Haoming Jiang, Lihong Li, Hyokun Yun, and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2025
Doubly Robust Off-Policy Learning on Low-Dimensional Manifolds by Deep Neural Networks
Minshuo Chen^‡*, Hao Liu*, Wenjing Liao and Tuo Zhao
Accepted by Mathematics of Operations Research (with minor revision) [arXiv]
Good regularity creates large learning rate implicit biases: edge of stability, balancing, and catapult
Yuqing Wang, Zhenghao Xu^‡, Tuo Zhao and Molei Tao
Accepted by Journal of Machine Learning Research (with minor revision) [arXiv]
DORM: Preference Data Weights Optimization for Reward Modeling in LLM Alignment
Rongzhi Zhang, Chenwei Zhang, Xinyang Zhang, Liang Qiu, Haoming Jiang, Yuchen Zhuang, Qingru Zhang, Hyokun Yun, Xian Li, Bing Yin, Tuo Zhao and Chao Zhang
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2025
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming and Distillation
Zichong Li^‡, Chen Liang, Zixuan Zhang^‡, Ilgee Hong^‡, Young Jin Kim, Weizhu Chen and Tuo Zhao
Conference on Language Modeling (COLM), 2025 [arXiv]
Adversarial Training of Reward Models
Alexander Bukharin^‡, Haifeng Qian, Shengyang Sun, Adithya Renduchintala, Soumye Singhal, Zhilin Wang, Oleksii Kuchaiev, Olivier Delalleau and Tuo Zhao
Conference on Language Modeling (COLM), 2025 [arXiv]
Self-Rewarding PPO: Aligning Large Language Models with Demonstrations Only
Qingru Zhang^‡, Liang Qiu, Ilgee Hong^‡, Zhenghao Xu^‡, Tianyi Liu, Shiyang Li, Rongzhi Zhang, Zheng Li, Lihong Li, Bing Yin, Chao Zhang, Jianshu Chen, Haoming Jiang and Tuo Zhao
Conference on Language Modeling (COLM), 2025
NoWag: A Unified Framework for Shape Preserving Compression of Large Language Models
Lawrence Ray Liu, Inesh Chakrabarti, Yixiao Li^‡, Mengdi Wang, Tuo Zhao, Lin Yang
Conference on Language Modeling (COLM), 2025 [arXiv]
RoseRAG: Robust retrieval-augmented generation with small-scale LLMs via margin-aware preference optimization
Tianci Liu, Haoxiang Jiang, Tianze Wang, Ran Xu, Yue Yu, Linjun Zhang, Tuo Zhao and Haoyu Wang
Annual Meeting of the Association for Computational Linguistics (ACL), 2025 [arXiv]
Deep Reinforcement Learning with Hierarchical Preference Design
Alexander Bukharin^‡, Yixiao Li^‡, Pengcheng He, Weizhu Chen and Tuo Zhao
International Conference on Machine Learning (ICML), 2025 [arXiv]
Discriminative Finetuning of Generative LLMs without Reward Models and Preference Data
Siqi Guo, Ilgee Hong^‡, Vicente Balmaseda, Tuo Zhao and Tianbao Yang
International Conference on Machine Learning (ICML), 2025 [arXiv]
Robust Reinforcement Learning from Corrupted Human Feedback
Alexander Bukharin^‡, Ilgee Hong^‡, Haoming Jiang, Zichong Li^‡, Qingru Zhang^‡, Zixuan Zhang^‡ and Tuo Zhao^#
Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv]
Nonparametric Classification on Low Dimensional Manifolds using Overparameterized Convolutional Residual Networks
Zixuan Zhang^*‡, Kaiqi Zhang*, Minshuo Chen, Mengdi Wang, Tuo Zhao and Yuxiang Wang
Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv]
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
Ilgee Hong^‡*, Zichong Li^‡*, Alexander Bukharin^‡, Yixiao Li^‡, Haoming Jiang, Tianbao Yang and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv]
Provable Acceleration of Nesterov's Accelerated Gradient for Asymmetric Matrix Factorization and Linear Neural Networks
Zhenghao Xu^‡, Yuqing Wang, Tuo Zhao, Rachel Ward and Molei Tao
Conference on Neural Information Processing Systems (NeurIPS), 2024 [arXiv]
RoseLoRA: Row and Column-wise Sparse Low-rank Adaptation of Pre-trained Language Model for Knowledge Editing and Fine-tuning
Haoyu Wang, Tianci Liu, Tuo Zhao and Jing Gao
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 [arXiv]
BlendFilter: Advancing Retrieval-Augmented LLMs via Query Generation Blending and Knowledge Filtering
Haoyu Wang, Tuo Zhao and Jing Gao
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 [arXiv]
Data Diversity Matters for Robust Instruction Tuning
Alexander Bukharin^‡ and Tuo Zhao
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024 [arXiv]
Efficient Long Sequence Modeling via State Space Augmented Transformer
Simiao Zuo^*‡, Xiaodong Liu*, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao and Jianfeng Gao
Conference on Language Modeling (COLM), 2024 [arXiv]
Sample Complexity of Neural Policy Mirror Descent for Policy Optimization on Low-Dimensional Manifolds
Zhenghao Xu^‡, Xiang Ji, Minshuo Chen^‡, Mengdi Wang and Tuo Zhao
Accepted by Journal of Machine Learning Research (JMLR), 2024+ [arXiv]
Learning Generalizable Vision-Tactile Robotic Grasping Strategy for Deformable Objects via Transformer
Yunhai Han, Rahul Batra, Nathan Boyd, Tuo Zhao, Yu She, Seth Hutchinson and Ye Zhao
IEEE/ASME Transactions on Mechatronics (TMECH), 2024 [arXiv]
International Conference on Advanced Intelligent Mechatronics (AIM), 2023 (short version)
Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification
Zichong Li^‡, Qunzhi Xu, Zhenghao Xu^‡, Yajun Mei, Tuo Zhao and Hongyuan Zha
International Conference on Machine Learning (ICML), 2024 [arXiv]
To Cool or not to Cool? Temperature Network Meets Large Foundation Models via DRO
Zi-Hao Qiu, Siqi Guo, Mao Xu, Tuo Zhao, Lijun Zhang and Tianbao Yang
International Conference on Machine Learning (ICML), 2024 [arXiv]
Tell Your Model Where to Attend: Post-hoc Attention Steering for LLMs
Qingru Zhang^‡, Chandan Singh, Liyuan Liu, Xiaodong Liu, Bin Yu, Jianfeng Gao, Tuo Zhao
International Conference on Learning Representations (ICLR), 2024 [arXiv]
LoftQ: LoRA-Fine-Tuning-Aware Quantization for LLMs
Yixiao Li^‡, Yifan Yu^‡, Chen Liang^‡, Pengcheng He, Nikos Karampatziakis, Weizhu Chen and Tuo Zhao
International Conference on Learning Representations (ICLR), 2024 [arXiv]
Deep Nonparametric Estimation of Operators between Infinite Dimensional Spaces
Hao Liu, Haizhao Yang, Minshuo Chen^‡, Tuo Zhao and Wenjing Liao
Journal of Machine Learning Research (JMLR), 2024 [arXiv]
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer
Qingru Zhang^‡, Dhananjay Ram, Cole Hawkins, Sheng Zha and Tuo Zhao
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023 [arXiv]
HadSkip: Homotopic and Adaptive Layer Skipping of Pre-trained Language Models for Efficient Inference
Haoyu Wang, Yaqing Wang, Tianci Liu, Tuo Zhao and Jing Gao
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Robust Multi-Agent Reinforcement Learning via Adversarial Regularization: Theoretical Foundation and Stable Algorithms
Alexander Bukharin^‡, Yan Li^‡, Yue Yu, Qingru Zhang^‡, Zhehui Chen^‡, Simiao Zuo^‡, Chao Zhang, Songan Zhang and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2023 [arXiv]
Module-wise Adaptive Distillation for Multimodality Foundation Models
Chen Liang^‡, Jiahui Yu, Ming-Hsuan Yang, Matthew Brown, Yin Cui, Tuo Zhao, Boqing Gong and Tianyi Zhou
Conference on Neural Information Processing Systems (NeurIPS), 2023 [arXiv]
Model-Based Reparameterization Policy Gradient Methods: Theory and Practical Algorithms
Shenao Zhang, Boyi Liu, Zhaoran Wang and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2023 [arXiv]
Pivotal Estimation of Linear Discriminant Analysis in High Dimensions
Ethan Fang, Yajun Mei, Yuyang Shi, Qunzhi Xu and Tuo Zhao
Journal of Machine Learning Research (JMLR), 2023 [arXiv]
High Dimensional Binary Classification under Label Shift: Phase Transition and Regularization
Jiahui Cheng*, Minshuo Chen^*‡, Hao Liu, Tuo Zhao and Wenjing Liao
Sampling Theory, Signal Processing, and Data Analysis, 2023 [arXiv]
Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity
Yan Li^‡, George Lan and Tuo Zhao
Mathematical Programming Series A, 2024 [arXiv]
LightToken: a Task and Model-agnostic Lightweight Token Embedding Framework for Pre-trained Language Models
Haoyu Wang, Ruirui Li, Haoming Jiang, Zhengyang Wang, Xianfeng Tang, Bin Bi, Monica Cheng, Bing Yin, Yaqing Wang, Tuo Zhao and Jing Gao
SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2023 [arXiv]
County augmented transformer for COVID‐19 state hospitalizations prediction
Siawpeng Er^‡, Shihao Yang and Tuo Zhao
Scientific Reports, 2023 [arXiv]
Context-Aware Query Rewriting for Improving Users' Search Experience on E-commerce Websites
Simiao Zuo^‡, Qingyu Yin, Haoming Jiang, Shaohui Xi, Bing Yin, Chao Zhang and Tuo Zhao
Annual Meeting of the Association for Computational Linguistics (ACL), 2023 [arXiv]
Effective Minkowski Dimension of Deep Nonparametric Regression: Function Approximation and Statistical Theories
Zixuan Zhang^‡, Minshuo Chen^‡, Mengdi Wang, Wenjing Liao and Tuo Zhao
International Conference on Machine Learning (ICML), 2023 [arXiv]
Machine Learning Force Fields with Data Cost Aware Training
Alexander Bukharin^‡, Tianyi Liu, Shengjie Wang, Simiao Zuo^‡, Weihao Gao, Wen Yan and Tuo Zhao
International Conference on Machine Learning (ICML), 2023 [arXiv]
SMURF-THP: Score Matching-based UnceRtainty quantiFication for Transformer Hawkes Process
Zichong Li^‡, Yanbo Xu, Simiao Zuo^‡, Haoming Jiang, Chao Zhang, Tuo Zhao and Hongyuan Zha
International Conference on Machine Learning (ICML), 2023 [arXiv]
LoSparse: Structured Compression of LLMs based on Low-Rank and Sparse Approximation
Yixiao Li*^‡, Yifan Yu*^‡, Qingru Zhang^‡, Chen Liang^‡, Pengcheng He, Weizhu Chen and Tuo Zhao
International Conference on Machine Learning (ICML), 2023 [arXiv]
Score Approximation, Estimation and Distribution Recovery of Diffusion Models on Low-Dimensional Data
Minshuo Chen^‡, Kaixuan Huang, Tuo Zhao and Mengdi Wang
International Conference on Machine Learning (ICML), 2023 [arXiv]
Less is More: Task-aware Layer-wise Distillation for Language Model Compression
Chen Liang^‡, Simiao Zuo^‡, Qingru Zhang^‡, Pengcheng He, Weizhu Chen and Tuo Zhao
International Conference on Machine Learning (ICML), 2023 [arXiv]
A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks
Jie Wang, Minshuo Chen^‡, Tuo Zhao, Wenjing Liao and Yao Xie
Information and Inference: A Journal of the IMA, 2023 [arXiv]
Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks
Xiang Ji, Minshuo Chen^‡, Mengdi Wang and Tuo Zhao
International Conference on Learning Representations (ICLR), 2023 [arXiv]
Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Qingru Zhang^‡, Minshuo Chen^‡, Alexander Bukharin^‡, Pengcheng He, Yu Cheng, Weizhu Chen and Tuo Zhao
International Conference on Learning Representations (ICLR), 2023 [arXiv]
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Chen Liang^‡, Haoming Jiang, Zheng Li, Xianfeng Tang, Bing Yin and Tuo Zhao
International Conference on Learning Representations (ICLR), 2023 [arXiv]
Reinforcement Learning for Adaptive Mesh Refinement
Jiachen Yang^‡, Tarik Dzanic, Brenden Petersen, Jun Kudo, Ketan Mittal, Vladimir Tomov, Jean-Sylvain Camier, Tuo Zhao, Hongyuan Zha, Tzanio Kolev, Robert Anderson and Daniel Faissol
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023 [arXiv]
Block Policy Mirror Descent
George Lan, Yan Li^‡ and Tuo Zhao
SIAM Journal on Optimization (SIOPT), 33(3):2341-2378, 2023 [arXiv]

On Deep Generative Models for Approximation and Estimation of Distributions on Manifolds
Biraj Dahal, Alexander Havrilla, Minshuo Chen^‡, Tuo Zhao and Wenjing Liao
Conference on Neural Information Processing Systems (NeurIPS), 2022 [arXiv]

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint
Hao Liu, Minshuo Chen^‡, Siawpeng Er^‡, Wenjing Liao, Tong Zhang and Tuo Zhao
International Conference on Machine Learning (ICML), 2022 [arXiv]

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance
Qingru Zhang^‡, Simiao Zuo^‡, Chen Liang^‡, Alex Bukharin^‡, Pengcheng He, Weizhu Chen and Tuo Zhao
International Conference on Machine Learning (ICML), 2022 [arXiv]

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo^‡, Qingru Zhang^‡, Chen Liang^‡, Pengcheng He, Tuo Zhao and Weizhu Chen
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022 [arXiv]

CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data
Rui Feng, Chen Luo, Qingyu Yin, Bing Yin, Tuo Zhao and Chao Zhang
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022 [arXiv]

Self-Training with Differentiable Teacher
Simiao Zuo^‡, Yue Yu, Chen Liang, Haoming Jiang, Siawpeng Er, Chao Zhang, Tuo Zhao and Hongyuan Zha
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022 [arXiv]

Adversarially Regularized Policy Learning Guided by Trajectory Optimization
Zhigen Zhao, Simiao Zuo^‡, Tuo Zhao and Ye Zhao
Annual Learning for Dynamics & Control Conference (L4DC), 2022 [arXiv]

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing
Chen Liang^‡, Pengcheng He, Yelong Shen, Weizhu Chen and Tuo Zhao
Annual Meeting of the Association for Computational Linguistics (ACL), 2022 [arXiv]

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models
Chen Liang^‡, Haoming Jiang^‡, Simiao Zuo^‡, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen and Tuo Zhao
International Conference on Learning Representations (ICLR), 2022 [arXiv]

Frequency-aware SGD for Efficient Embedding Learning with Provable Benefits
Yan Li^‡, Dhruv Choudhary, Xiaohan Wei, Baichuan Yuan, Bhargav Bhushanam, Tuo Zhao and Guanghui Lan
International Conference on Learning Representations (ICLR), 2022 [arXiv]

Taming Sparsely Activated Transformer with Stochastic Experts
Simiao Zuo^‡, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao and Jianfeng Gao
International Conference on Learning Representations (ICLR), 2022 [arXiv]

Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang, Minshuo Chen^‡, Tuo Zhao and Molei Tao
International Conference on Learning Representations (ICLR), 2022 [arXiv]

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably
Tianyi Liu^‡, Yan Li^‡, Enlu Zhou and Tuo Zhao
International Conference on Artificial Intelligence and Statistics (AISTATS), 2022 [arXiv]

Adaptive Incentive Design with Multi-Agent Meta-Gradient Reinforcement Learning
Jiachen Yang^‡, Ethan Wang^‡, Rakshit Trivedi, Tuo Zhao and Hongyuan Zha
International Conference on Autonomous Agents and Multiagent Systems, 2022 [arXiv]

Nonparametric Regression on Low-Dimensional Manifolds using Deep ReLU Networks
Minshuo Chen^‡, Haoming Jiang^‡, Wenjing Liao and Tuo Zhao^#
Information and Inference: A Journal of the IMA, 2022 [arXiv, Poster]

Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL
Minshuo Chen^‡, Yan Li^‡, Zhuoran Yang, Zhaoran Wang and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2021 [arXiv]

Towards Automatic Evaluation of Dialog Systems: A Model-Free Off-Policy Evaluation Approach
Haoming Jiang^‡, Bo Dai, Mengjiao Yang, Tuo Zhao and Wei Wei
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 [arXiv]

Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach
Simiao Zuo^‡, Chen Liang^‡, Haoming Jiang^‡, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao and Tuo Zhao
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021 [arXiv]

Besov Function Approximation and Binary Classification on Low-Dimensional Manifolds Using Convolutional Residual Networks
Hao Liu, Minshuo Chen^‡, Tuo Zhao and Wenjing Liao
International Conference on Machine Learning (ICML), 2021 [arXiv]

How Important is the Train-Validation Split in Meta-Learning?
Yu Bai, Minshuo Chen^‡, Pan Zhou, Tuo Zhao, Jason D. Lee, Sham Kakade, Huan Wang and Caiming Xiong
International Conference on Machine Learning (ICML), 2021 [arXiv]

Fine-Tuning Pre-trained Language Models with Weak Supervision: A Contrastive-Regularized Self-Training Approach
Yue Yu*, Simiao Zuo^‡*, Haoming Jiang^‡, Wendi Ren, Tuo Zhao and Chao Zhang
Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2021 [arXiv]

Deep Learning Assisted End-to-End Synthesis of mm-Wave Passive Networks with 3D EM Structures: A Study on A Transformer-Based Matching Network
Siawpeng Er^‡, Edward Liu, Minshuo Chen^‡, Yan Li^‡, Yuqi Liu, Tuo Zhao and Hua Wang
International Microwave Symposium (IMS), 2021
[The Finalist of IMS 2021 Best Student Paper Competition]

Why Do Deep Residual Networks Generalize Better than Deep Feedforward Networks? -- A Neural Tangent Kernel Perspective
Kaixuan Huang^*‡, Yuqing Wang*, Molei Tao and Tuo Zhao
Conference on Neural Information Processing Systems (NeurIPS), 2020 [arXiv, Poster]

Deep Reinforcement Learning with Smooth and Robust Policy
Qianli Shen^*‡, Yan Li^*‡, Haoming Jiang^‡, Zhaoran Wang and Tuo Zhao
International Conference on Machine Learning (ICML), 2020 [arXiv]

Transformer Hawkes Process
Simiao Zuo^‡, Haoming Jiang^‡, Zichong Li^‡, Tuo Zhao and Hongyuan Zha
International Conference on Machine Learning (ICML), 2020 [arXiv]

BOND: Bert-Assisted Open-Domain Named Entity Recognition with Distant Supervision
Chen Liang^*‡, Yue Yu^*, Haoming Jiang^*‡, Siawpeng Er, Ruijia Wang, Tuo Zhao and Chao Zhang
SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2020 [arXiv]

SMART: Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization
Haoming Jiang^‡, Pengcheng He, Weizhu Chen, Xiaodong Liu, Jianfeng Gao and Tuo Zhao
Annual Meeting of the Association for Computational Linguistics (ACL), 2020 [arXiv]

Residual Network Based Direct Synthesis of EM Structures: A Study on One-to-One Transformers
David Munzer, Siawpeng Er^‡, Minshuo Chen^‡, Yan Li^‡, Naga Mannem, Tuo Zhao and Hua Wang
IEEE Radio Frequency Integrated Circuits Symposium (RFIC), 2020 [arXiv]

Implicit Bias of Gradient Descent based Adversarial Training on Separable Data
Yan Li^‡, Ethan Fang, Huan Xu and Tuo Zhao
International Conference on Learning Representations (ICLR), 2020 [arXiv, Poster]

Towards Understanding Hierarchical Learning: Benefits of Neural Representations
Minshuo Chen^*‡, Yu Bai, Jason Lee, Tuo Zhao, Huan Wang, Caiming Xiong and Richard Socher
Annual Conference on Neural Information Processing Systems (NeurIPS), 2020 [arXiv, Poster]

Differentiable Top-k Operator with Optimal Transport
Yujia Xie^‡, Hanjun Dai, Minshuo Chen^‡, Bo Dai, Tuo Zhao, Hongyuan Zha, Wei Wei and Tomas Pfister
Annual Conference on Neural Information Processing Systems (NeurIPS), 2020 [arXiv]

Calibrated Fine-Tuning for Pre-trained Language Models via Manifold Smoothing
Lingkai Kong, Haoming Jiang^‡, Yuchen Zhuang, Jie Lyu, Tuo Zhao and Chao Zhang
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020

Multi-Domain Neural Machine Translation with Word-Level Adaptive Layer-wise Domain Mixing
Haoming Jiang^‡, Chen Liang^‡, Chong Wang and Tuo Zhao
Annual Meeting of the Association for Computational Linguistics (ACL), 2020 [arXiv, Poster]

On Generalization Bounds of a Family of Recurrent Neural Networks
Minshuo Chen^‡, Xingguo Li^‡ and Tuo Zhao
International Conference on Artificial Intelligence and Statistics (AISTATS), 2020 [arXiv, Poster]

On Computation and Generalization of Generative Adversarial Imitation Learning
Minshuo Chen^‡, Yizhou Wang^‡, Tianyi Liu^‡, Zhuoran Yang, Xingguo Li^‡, Zhaoran Wang and Tuo Zhao
International Conference on Learning Representations (ICLR), 2020 [arXiv, Poster]

Efficient Approximation of Deep ReLU Networks for Functions on Low Dimensional Manifolds
Minshuo Chen^‡, Haoming Jiang^‡, Wenjing Liao and Tuo Zhao (Alphabetical order)
Annual Conference on Neural Information Processing Systems (NeurIPS), 2019 [arXiv, Poster]

Meta Learning with Relational Information for Short Sequences
Yujia Xie^‡, Haoming Jiang^‡, Feng Liu^‡, Tuo Zhao and Hongyuan Zha
Annual Conference on Neural Information Processing Systems (NeurIPS), 2019 [arXiv, Poster]

Towards Understanding the Importance of Shortcut Connections in Residual Networks
Tianyi Liu^*‡, Minshuo Chen^*‡, Mo Zhou^‡, Simon Du, Enlu Zhou and Tuo Zhao
Annual Conference on Neural Information Processing Systems (NeurIPS), 2019 [arXiv, Poster]

Online Factorization and Partition of Complex Networks From Random Walks
Lin Yang^‡, Vladimir Braverman, Tuo Zhao and Mengdi Wang
Annual Conference on Uncertainty in Artificial Intelligence (UAI), 2019 [arXiv, Poster]

On Fast Convergence of Proximal Algorithms for SQRT-Lasso Optimization: Don't Worry About Its Nonsmooth Loss Function
Xingguo Li^‡, Haoming Jiang^‡, Jarvis Haupt, Raman Arora, Han Liu, Mingyi Hong and Tuo Zhao
Annual Conference on Uncertainty in Artificial Intelligence (UAI), 2019 [arXiv, Poster]

Towards Understanding the Importance of Noise in Training Neural Networks
Mo Zhou^*‡, Tianyi Liu^*‡, Yan Li^‡, Dachao Lin, Enlu Zhou and Tuo Zhao
International Conference on Machine Learning (ICML), 2019 [arXiv, Poster]

On Scalable and Efficient Computation of Large Scale Optimal Transport
Yujia Xie^‡, Minshuo Chen^‡, Haoming Jiang^‡, Tuo Zhao and Hongyuan Zha
International Conference on Machine Learning (ICML), 2019 [arXiv, Poster]

On Landscape of Lagrangian Functions and Stochastic Search for Constrained Nonconvex Optimization
Zhehui Chen^*‡, Xingguo Li^*‡, Lin Yang^‡, Jarvis Haupt and Tuo Zhao
International Conference on Artificial Intelligence and Statistics (AISTATS), 2019 [arXiv, Poster]

On Computation and Generalization of Generative Adversarial Networks under Spectrum Control
Haoming Jiang^‡, Zhehui Chen^‡, Minshuo Chen^‡, Feng Liu^‡, Dingding Wang and Tuo Zhao
International Conference on Learning Representations (ICLR), 2019 [arXiv, Poster]

Picasso: A Sparse Learning Library for High Dimensional Data Analysis in R and Python
Jason Ge^*‡, Xingguo Li^*‡, Haoming Jiang^‡, Han Liu, Tong Zhang, Mengdi Wang and Tuo Zhao
Journal of Machine Learning Research (JMLR), 20(44):1-5, 2019 [PDF, Software]
[2016 ASA Best Student Paper Award on Statistical Computing]

Misspecified Nonconvex Statistical Optimization for Sparse Phase Retrieval
Zhuoran Yang^*, Lin Yang^*‡, Ethan Fang, Tuo Zhao, Zhaoran Wang and Matey Neykov
Mathematical Programming Series B, 176(1-2):1-27, 2019 [arXiv]

Symmetry, Saddle Points and Global Optimization Landscape of Nonconvex Matrix Factorization
Xingguo Li^‡, Junwei Lu, Raman Arora, Jarvis Haupt, Han Liu, Zhaoran Wang and Tuo Zhao
IEEE Transactions on Information Theory, 65(6):3489-3514, 2019 [arXiv]

Dimensionality Reduction for Stationary Time Series via Stochastic Nonconvex Optimization
Minshuo Chen^‡, Lin Yang^‡, Mengdi Wang and Tuo Zhao
Annual Conference on Neural Information Processing Systems (NeurIPS), 2018 [arXiv]

Towards Understanding Acceleration Tradeoff between Momentum and Asynchrony in Distributed Nonconvex Stochastic Optimization
Tianyi Liu^‡, Shiyang Li^‡, Jianping Shi, Enlu Zhou and Tuo Zhao
Annual Conference on Neural Information Processing Systems (NeurIPS), 2018 [arXiv]

Physical Systems behind Optimization Algorithms
Lin Yang^‡, Raman Arora, Vladimir Braverman and Tuo Zhao
Annual Conference on Neural Information Processing Systems (NeurIPS), 2018 [arXiv]

Provable Gaussian Embedding with One Observation
Ming Yu, Zhuoran Yang, Tuo Zhao, Mladen Kolar and Zhaoran Wang
Annual Conference on Neural Information Processing Systems (NeurIPS), 2018 [arXiv]

On Local Quadratic Convergence of DC Proximal Newton Algorithm for Nonconvex Regularized Sparse Learning in High Dimensions
Xingguo Li^‡, Lin Yang^‡, Jason Ge^‡, Jarvis Haupt, Tong Zhang and Tuo Zhao
Annual Conference on Neural Information Processing Systems (NeurIPS), 2017 [arXiv]

Deep Hyperspherical Learning
Weiyang Liu, Yan-Ming Zhang, Xingguo Li, Zhiding Yu, Bo Dai, Tuo Zhao and Le Song
Annual Conference on Neural Information Processing Systems (NeurIPS), 2017 [arXiv]

Homotopy Parametric Simplex Method for Sparse Learning
Haotian Pang, Robert Vanderbei, Han Liu and Tuo Zhao
Annual Conference on Neural Information Processing Systems (NeurIPS), 2017 [arXiv]

On Faster Convergence of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization
Xingguo Li^*, Tuo Zhao^*, Raman Arora, Han Liu and Mingyi Hong
Journal of Machine Learning Research (JMLR), 18(4):1-24, 2018 [arXiv]
International Conference on Artificial Intelligence and Statistics (AISTATS), 2016 (short version)

Pathwise Coordinate Optimization for Nonconvex Sparse Learning: Algorithm and Theory
Tuo Zhao, Han Liu and Tong Zhang
The Annals of Statistics, 46(1):180-218, 2018 [arXiv, Software]

Online Multiview Learning: Dropping Convexity for Better Efficiency
Zhehui Chen^‡, Lin Yang^‡, Chris Li and Tuo Zhao
International Conference on Machine Learning (ICML), 2017 [arXiv]

NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization
Davood Hajinezhad, Mingyi Hong, Tuo Zhao and Zhaoran Wang
Annual Conference on Neural Information Processing Systems (NeurIPS), 2016 [arXiv]

Stochastic Variance Reduced Optimization for Nonconvex Sparse Learning
Xingguo Li^*, Tuo Zhao^*, Raman Arora, Han Liu and Jarvis Haupt
International Conference on Machine Learning (ICML), 2016 [arXiv]

Accelerated Path-following Iterative Shrinkage Thresholding Algorithm
Tuo Zhao and Han Liu
Journal of Computational and Graphical Statistics (JCGS), 25(4):1272-1296, 2016 [PDF]

A Nonconvex Optimization Framework for Low Rank Matrix Factorization
Tuo Zhao, Zhaoran Wang and Han Liu
Annual Conference on Neural Information Processing Systems (NeurIPS), 2015 [PDF]

Calibrated Multivariate Regression with Application to Neural Semantic Basis Discovery
Han Liu, Lie Wang and Tuo Zhao (Alphabetical order)
Journal of Machine Learning Research (JMLR), 16(8):1579-1606, 2015 [PDF, Software]
Annual Conference on Neural Information Processing Systems (NeurIPS), 2014 (short version)
[2016 INFORMS SAS Best Paper Award on Data Mining]

The "flare" Package for High-dimensional Sparse Linear Regression in R
Xingguo Li^*, Tuo Zhao^*, Xiaoming Yuan and Han Liu
Journal of Machine Learning Research (JMLR), 16(3):553-557, 2015 [Software, Vignette, PDF]

Accelerated Mini-batch Randomized Coordinate Descent Method
Tuo Zhao^*, Mo Yu^*, Yiming Wang, Raman Arora and Han Liu
Annual Conference on Neural Information Processing Systems (NeurIPS), 2014 [PDF]

Calibrated Precision Matrix Estimation for High Dimensional Elliptical Distributions
Tuo Zhao and Han Liu
IEEE Transactions on Information Theory, 60(12):7874-7887, 2014 [PDF]
Annual Conference on Neural Information Processing Systems (NeurIPS), 2013 (short version)

Positive Semidefinite Rank-based Correlation Matrix Estimation with Application to Semiparametric Graph Estimation
Tuo Zhao, Kathryn Roeder and Han Liu
Journal of Computational and Graphical Statistics (JCGS), 23(4):895-922, 2014 [PDF]
Annual Conference on Neural Information Processing Systems (NeurIPS), 2012 (short version)

Sparse Covariance Matrix Estimation with Eigenvalue Constraints
Han Liu, Lie Wang and Tuo Zhao (Alphabetical order)
Journal of Computational and Graphical Statistics (JCGS), 23(2):439-459, 2014 [PDF]

CODA: High Dimensional Copula Discriminant Analysis
Fang Han, Tuo Zhao and Han Liu
Journal of Machine Learning Research (JMLR), 14(2):629-671, 2013 [PDF]

Automated Diagnoses of Attention Deficit Hyperactive Disorder using Magnetic Resonance Imaging
Ani Eloyan, Tuo Zhao, et al.
Frontiers in Systems Neuroscience, 6(61):1-9, 2012 [PDF, Winner of INDI ADHD-200 Global Competition]

Patterns and rates of exonic de novo mutations in autism spectrum disorders
Benjamin Neale, Tuo Zhao, et al.
Nature, 485:242-245, 2012 [PDF, News from New York Times]

The "huge" Package for High-dimensional Undirected Graph Estimation in R
Tuo Zhao, Han Liu, Kathryn Roeder, John Lafferty and Larry Wasserman
Journal of Machine Learning Research (JMLR), 13(4):1059-1062, 2012 [PDF, Software, Vignette]

Sparse Additive Machine
Tuo Zhao and Han Liu
International Conference on Artificial Intelligence and Statistics (AISTATS), 2012 [PDF, Software]

Software Packages

Picasso: Pathwise Calibrated Sparse Shooting Algorithm
with Jason Ge, Xingguo Li, Haoming Jiang, Han Liu, Tong Zhang and Mengdi Wang
[GitHub (R & Python), Download (CRAN, R), Download (PyPI, Python)]
PRIMAL: PaRametric sImplex Method for spArse Learning
with Qianli Shen, Zichong Li and Yujia Xie
[GitHub (R), Download (CRAN, R)]
Flare: Family of Lasso Regression
with Xingguo Li, Lie Wang, Xiaoming Yuan and Han Liu
[Download (CRAN, R)]
Huge: High-dimensional Undirected Graph Estimation
with Haoming Jiang, Xinyu Fei, Xingguo Li, Han Liu, Kathryn Roeder, John Lafferty and Larry Wasserman
[GitHub (R & Python), Download (CRAN, R), Download (PyPI, Python)]
SAM: Sparse Additive Modeling
with Haoming Jiang, Yukun Ma, Xingguo Li, Han Liu and Kathryn Roeder
[GitHub (R), Download (CRAN, R)]

Selected Awards and Honors

Best Paper Finalist, Efficient Natural Language and Speech Processing (ENLSP) Workshop at NeurIPS [2024]
Best Student Paper Finalist, International Microwave Symposium [2021]
Google Faculty Research Award [2020]
2016 INFORMS SAS Best Paper Award on Data Mining [2016]
2016 ASA Best Student Paper Award on Statistical Computing [2016]
Baidu Fellowship [2015]
Siebel Scholarship [2014, Siebel Scholar Profile]
Google Summer of Code Award [2011-2013]
Winner of INDI ADHD-200 Global Competition [2011]

Alchemists in My Group

Yixiao Li -- Ph.D. Student, ISyE, Georgia Tech (2022.8--Present)
Zhenghao Xu -- Ph.D. Student, ISyE, Georgia Tech (2022.8--Present, Coadvised by Molei Tao)
Zixuan Zhang -- Ph.D. Student, ISyE, Georgia Tech (2022.8--Present)
Ilgee Hong -- Ph.D. Student, ISyE, Georgia Tech (2023.8--Present)
Zichong Li -- Ph.D. Student, ISyE, Georgia Tech (2023.8--Present)
Former Visiting Student, Georgia Tech (2019.7--2019.9)
Liming Liu -- Ph.D. Student, ISyE, Georgia Tech (2024.8--Present)

FLASH Alumni

Alexander Bukharin -- Ph.D. in Machine Learning, Georgia Tech (2021.8--2025.4)
Current Position: Research Scientist, NVIDIA
Qingru Zhang -- Ph.D. in Machine Learning, Georgia Tech (2021.8--2025.4)
Current Position: Research Scientist, Microsoft
Yan Li -- Ph.D. in Operations Research, Georgia Tech (2018.12--2024.7)
Current Position: Assistant Professor of ISE, Texas A&M University
Chen Liang -- Ph.D. in Machine Learning, Georgia Tech (2018.8--2023.11)
Current Position: Senior Research Scientist, Microsoft
Simiao Zuo -- Ph.D. in Machine Learning, Georgia Tech (2019.8--2023.4)
Current Position: Senior Research Scientist, Microsoft
Minshuo Chen -- Ph.D. in Machine Learning, Georgia Tech (2017.6--2022.7)
Current Position: Assistant Professor of IEMS, Northwestern University
Siawpeng Er -- Ph.D. in Bioinformatics, Georgia Tech (2019.8--2022.7)
Current Position: Lead Data Scientist, Home Depot
Jiachen Yang -- Ph.D. in Machine Learning, Georgia Tech (2020.1--2021.12)
Current Position: Co-Founder, Simular.ai
Yujia Xie -- Ph.D. in Computational Science and Engineering, Georgia Tech (2018.12--2021.8)
Current Position: Principal Research Scientist, Microsoft
Zhehui Chen -- Ph.D. in Industrial Engineering, Georgia Tech (2016.8--2021.4)
Current Position: Senior Software Development Engineer, Google
Haoming Jiang -- Ph.D. in Machine Learning, Georgia Tech (2017.8--2021.4)
Current Position: Member of Technical Staff, OpenAI
Tianyi Liu -- Ph.D. in Operations Research, Georgia Tech (2017.9--2021.4, Coadvised by Enlu Zhou)
Current Position: Senior Research Scientist, Amazon
Xingguo Li -- Visiting Student, Georgia Tech (2017.3--2018.6)
Current Position: Quantitative Researcher, Radix Trading LLC
Lin Yang -- Visiting Student, Georgia Tech (2017.3--2017.6)
Current Position: Associate Professor of ECE, University of California, Los Angeles
Yuheng Cai -- Graduate Researcher, Georgia Tech (2024.7--2025.4)
Current Position: Software Development Engineer, Google
Yuezhou Hu -- Visiting Student, Georgia Tech (2024.7--2024.9)
Current Position: Ph.D. Student, University of California, Berkeley
Jiaxin Guo -- Visiting Student, Georgia Tech (2024.7--2024.9)
Current Position: Undergraduate Student, Tsinghua University
Yifan Yu -- Undergraduate Student Researcher, Georgia Tech (2021.8--2024.5)
Current Position: Ph.D. Student, University of Illinois Urbana-Champaign
Ethan Wang -- Undergraduate Student Researcher, Georgia Tech (2020.1--2021.11, Coadvised by Hongyuan Zha)
Current Position: Software Development Engineer, Jane Street
Jie Lyu -- Undergraduate Student Researcher, Georgia Tech (2020.1--2020.5)
Current Position: Senior Machine Learning Engineer, Meta
Xinyu Fei -- Visiting Student, Georgia Tech (2018.7--2018.9)
Current Position: Research Scientist, Amazon
Mo Zhou -- Visiting Student, Georgia Tech (2018.7--2018.9)
Current Position: Postdoctoral Fellow, University of Washington
Yizhou Wang -- Visiting Student, Georgia Tech (2019.1--2019.5)
Current Position: Research Scientist, Adobe
Kaixuan Huang -- Visiting Student, Georgia Tech (2019.7--2019.9)
Current Position: Research Scientist, ByteDance
Qianli Shen -- Visiting Student, Georgia Tech (2019.7--2019.9)
Current Position: Research Scientist, Alibaba Tongyi Lab

About Alchemy

Back When We Were Kids
Ali Rahimi - NeurIPS 2017 Test-of-Time Award Presentation [Link]
My Take on Ali Rahimi's "Test of Time" Award Talk at NeurIPS
Quoted from Yann LeCun's Facebook [Link]
Ali Rahimi's Response to Yann LeCun
Quoted from Ali Rahimi's Facebook [Link]
An Addendum to Alchemy
Quoted from Ben Recht's Blog [Link]
The Role of Theory in Deep Learning
Quoted from David McAllester's Blog [Link]

Teaching

Basic Statistical Methods ISYE3030 -- 2019 Summer, 2019 Fall, 2020 Spring, 2020 Fall, Georgia Tech
Advanced Machine Learning ISYE8803 -- 2018 Spring, 2019 Spring, 2020 Fall, Georgia Tech
Introduction to Machine Learning ISYE4803 -- 2018 Fall, Georgia Tech
Machine Learning ISYE6740/CSE6740/CS7641 -- 2017 Spring, 2017 Fall, Georgia Tech

NSF Projects

IIS-1717916: Topics in Temporal Marked Point Processes: Granger Causality, Imperfect Observations and Intervention (2017.9-2021.8) [Link]
DMS-2012652: Deep Neural Networks for Structured Data: Regression, Distribution Estimation, and Optimal Transport (2020.9-2024.8) [Link]
IIS-2008334: Go Beyond Short-term Dependency and Homogeneity: A General-Purpose Transformer Recipe for Multi-Domain Sequential Data Analysis (2020.9-2025.8) [Link]
DMS-2134037: Bridging Statistical Hypothesis Tests and Deep Learning for Reliability and Computational Efficiency (2022.1-2024.12) [Link]
IIS-2226152: RI: Small: Taming Massive Pre-trained Models under Label Scarcity via an Optimization Lens (2022.9-2026.8) [Link]

Contact
Tuo Zhao
H. Milton Stewart School of Industrial and Systems Engineering
Groseclose 344
755 Ferst Dr. NW
Atlanta, GA 30332
Email: tourzhao (at) gatech (dot) edu