Lanxiang Hu

I'm a PhD student at UCSD, fortunately advised by Prof. Hao Zhang and Prof. Tajana Šimunić Rosing. My research focuses on building efficient and reliable AI models and systems at scale.

I am currently working on reasoning for text-to-SQL and science, as well as AI evaluation.

Before joining UCSD, I spent a wonderful time working with top-notch researchers in the field of ML systems, including a visiting research internship with Prof. Song Han. I completed my undergraduate degree at UC Berkeley with a double major in CS and Physics.

Email  /  Scholar  /  Twitter  /  Linkedin  /  Github

News

  • [2025/02] We are excited to announce Game Arena for evaluating AI while playing games. Our game, AI Space Escape, is now live on Roblox!
  • [2025/01] Our papers on Game Arena and Long Pack are accepted to ICLR 2025.
  • [2024/06] I joined Snowflake AI research for summer internship.
  • [2024/05] Our papers on CLLMs and OSD are accepted to ICML 2024.
  • [2024/03] A new family of parallel decoders, CLLMs, is released. Try our codebase for a 2~3x inference speedup!

Selected Publications

* denotes equal contribution.

GameArena: Evaluating LLM Reasoning through Live Computer Games


Lanxiang Hu*, Qiyu Li*, Anze Xie*, Nan Jiang, Ion Stoica, Haojian Jin, Hao Zhang
ICLR, 2025
arxiv / code / website /

We design and build an incentivized, dynamic benchmark to evaluate AI reasoning abilities extending beyond math and coding.

Scaling Long Context Training Data by Long-Distance Referrals


Yonghao Zhuang*, Lanxiang Hu*, Longfei Yun, Souvik Kundu, Zhengzhong Liu, Eric P. Xing, Hao Zhang
ICLR, 2025
arxiv /

We show that long-distance referrals are important for long-context training, and design a data pipeline to scale up the construction of such data.

TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs


Lanxiang Hu, Tajana Rosing, Hao Zhang
ACL in submission, 2025
arxiv / code /

We introduce an algorithm that progressively prunes MHA and MLP layers during domain-specific SFT, achieving up to 5.7x speedup and 60% less memory consumption compared with state-of-the-art model compression algorithms.

TurboSpec: Adaptive Speculation for Leveraging Both Inter- and Intra-Request Parallelisms in LLM Serving


Xiaoxuan Liu, Jongseok Park, Lanxiang Hu, Cade Daniel, Woosuk Kwon, Zhuohan Li, Chen Zhang, Kuntai Du, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
OSDI in submission, 2025
arxiv /

We introduce TurboSpec, an LLM serving system that dynamically adjusts speculation lengths in continuous batching, delivering latency reductions of up to 3.16x compared to non-speculative baselines.

SensorQA: A Question Answering Benchmark for Daily-Life Monitoring


Benjamin Reichman*, Xiaofan Yu*, Lanxiang Hu, Jack Truxal, Atishay Jain, Rushil Chandrupatla, Tajana Šimunić Rosing, Larry Heck
ACM SenSys, 2025
arxiv /

We introduce SensorQA, the first human-created question-answering (QA) dataset for long-term time-series sensor data for daily-life monitoring.

CLLMs: Consistency Large Language Models


Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang
ICML, 2024
arxiv / code / website /

We show that LLMs can be trained to operate as highly efficient parallel decoders, with 2.4x to 3.4x speedup across a variety of benchmarks.

Online Speculative Decoding


Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
ICML, 2024
arxiv /

We introduce an online speculative decoding (OSD) algorithm with improved responsiveness, speculation accuracy, and compatibility with LLM serving systems.

PockEngine: Sparse and Efficient Fine-tuning in a Pocket


Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), 2023
arxiv / website /

We introduce PockEngine, a tiny, sparse, and efficient engine that enables fine-tuning on various edge devices through sparse backpropagation and compile-time optimizations.





© 2024 Lanxiang Hu. Design and source code from Jon Barron's website, powered by Jekyll.