Lanxiang Hu
I'm a PhD student at UCSD, fortunately advised by Prof. Hao Zhang and Prof. Tajana Šimunić Rosing. My research interests are in building efficient and reliable AI models and systems at scale.
I am currently working on reasoning for text2SQL and science, as well as AI evaluations.
Before joining UCSD, I spent a wonderful time working with top-notch researchers in ML systems, including as a visiting research intern with Prof. Song Han. I completed my undergraduate degree at UC Berkeley with majors in CS and Physics.
Email /
Scholar /
Twitter /
LinkedIn /
GitHub
News
- [2025/02] We are excited to announce Game Arena for evaluating AI models through live gameplay. Our game, AI Space Escape, is now live on Roblox!
- [2025/01] Our papers on Game Arena and Long Pack were accepted to ICLR 2025.
- [2024/06] I joined Snowflake AI Research for a summer internship.
- [2024/05] Our papers on CLLMs and OSD were accepted to ICML 2024.
- [2024/03] CLLMs, a new family of parallel decoders, are released. Try our codebase for a 2-3x inference speedup!
* denotes equal contribution.
GameArena: Evaluating LLM Reasoning through Live Computer Games
Lanxiang Hu*, Qiyu Li*, Anze Xie*, Nan Jiang, Ion Stoica, Haojian Jin, Hao Zhang
ICLR, 2025
arxiv /
code /
website /
We design and build an incentivized, dynamic benchmark to evaluate AI reasoning abilities beyond math and coding.
Scaling Long Context Training Data by Long-Distance Referrals
Yonghao Zhuang*, Lanxiang Hu*, Longfei Yun, Souvik Kundu, Zhengzhong Liu, Eric P. Xing, Hao Zhang
ICLR, 2025
arxiv /
We show that long-distance referrals are important for long-context training, and design a data pipeline to scale up the construction of such data.
TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs
Lanxiang Hu, Tajana Rosing, Hao Zhang
ACL (in submission), 2025
arxiv /
code /
We introduce an algorithm that progressively prunes MHA and MLP layers during domain-specific SFT, achieving up to a 5.7x speedup and 60% lower memory consumption compared with state-of-the-art model compression algorithms.
TurboSpec: Adaptive Speculation for Leveraging Both Inter- and Intra-Request Parallelisms in LLM Serving
Xiaoxuan Liu, Jongseok Park, Lanxiang Hu, Cade Daniel, Woosuk Kwon, Zhuohan Li, Chen Zhang, Kuntai Du, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
OSDI (in submission), 2025
arxiv /
We introduce TurboSpec, an LLM serving system that dynamically adjusts speculation lengths during continuous batching, delivering latency reductions of up to 3.16x compared to non-speculative baselines.
SensorQA: A Question Answering Benchmark for Daily-Life Monitoring
Benjamin Reichman*, Xiaofan Yu*, Lanxiang Hu, Jack Truxal, Atishay Jain, Rushil Chandrupatla, Tajana Šimunić Rosing, Larry Heck
ACM SenSys, 2025
arxiv /
We introduce SensorQA, the first human-created question-answering (QA) dataset over long-term time-series sensor data for daily-life monitoring.
CLLMs: Consistency Large Language Models
Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang
ICML, 2024
arxiv /
code /
website /
We show that LLMs can be trained to operate as highly efficient parallel decoders, achieving 2.4x to 3.4x speedups across a variety of benchmarks.
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
ICML, 2024
arxiv /
We introduce the online speculative decoding (OSD) algorithm, which improves responsiveness, speculation accuracy, and compatibility with LLM serving systems.
PockEngine: Sparse and Efficient Fine-tuning in a Pocket
Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), 2023
arxiv /
website /
We introduce PockEngine, a tiny, sparse, and efficient engine that enables fine-tuning on various edge devices through sparse backpropagation and compile-time optimizations.