Lanxiang Hu

I'm a PhD student at UCSD, fortunately advised by Prof. Hao Zhang and Prof. Tajana Šimunić Rosing. My research focuses on building efficient and reliable AI models and systems at scale.

I am currently working on reasoning for text-to-SQL and science, as well as AI evaluation.

Before joining UCSD, I spent a wonderful time working with top-notch researchers in the field of ML systems, including a visiting research internship with Prof. Song Han. I completed my undergraduate degree at UC Berkeley with a double major in CS and Physics.

Email  /  Scholar  /  Twitter  /  Linkedin  /  Github

News

  • [2025/02] We are excited to announce Game Arena for evaluating AI while playing games. Our game, AI Space Escape, is now live on Roblox!
  • [2025/01] Our papers on Game Arena and Long Pack are accepted to ICLR 2025.
  • [2024/06] I joined Snowflake AI research for summer internship.
  • [2024/05] Our papers on CLLMs and OSD are accepted to ICML 2024.
  • [2024/03] A new family of parallel decoders, CLLMs, is released. Try our codebase for a 2~3x inference speedup!

Selected Publications

* denotes equal contribution.

GameArena: Evaluating LLM Reasoning through Live Computer Games


Lanxiang Hu*, Qiyu Li*, Anze Xie*, Nan Jiang, Ion Stoica, Haojian Jin, Hao Zhang
ICLR, 2025
arxiv / code / website /

We design and build an incentivized, dynamic benchmark to evaluate AI reasoning abilities extending beyond math and coding.

Scaling Long Context Training Data by Long-Distance Referrals


Yonghao Zhuang*, Lanxiang Hu*, Longfei Yun, Souvik Kundu, Zhengzhong Liu, Eric P. Xing, Hao Zhang
ICLR, 2025
arxiv /

We show that long-distance referrals are important for long-context training, and design a data pipeline to scale up the construction of such data.

TrimLLM: Progressive Layer Dropping for Domain-Specific LLMs


Lanxiang Hu, Tajana Rosing, Hao Zhang
ACL in submission, 2025
arxiv / code /

We introduce an algorithm that progressively prunes MHA and MLP layers during domain-specific SFT, achieving up to 5.7x speedup and 60% less memory consumption compared with state-of-the-art model compression algorithms.

TurboSpec: Adaptive Speculation for Leveraging Both Inter- and Intra-Request Parallelisms in LLM Serving


Xiaoxuan Liu, Jongseok Park, Lanxiang Hu, Cade Daniel, Woosuk Kwon, Zhuohan Li, Chen Zhang, Kuntai Du, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
OSDI in submission, 2025
arxiv /

We introduce TurboSpec, an LLM serving system that dynamically adjusts speculation lengths in continuous batching, delivering latency reductions of up to 3.16x compared to non-speculative baselines.

SensorQA: A Question Answering Benchmark for Daily-Life Monitoring


Benjamin Reichman*, Xiaofan Yu*, Lanxiang Hu, Jack Truxal, Atishay Jain, Rushil Chandrupatla, Tajana Šimunić Rosing, Larry Heck
ACM SenSys, 2025
arxiv /

We introduce SensorQA, the first human-created question-answering (QA) dataset for long-term time-series sensor data for daily-life monitoring.

CLLMs: Consistency Large Language Models


Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang
ICML, 2024
arxiv / code / website /

We show that LLMs can be trained to operate as highly efficient parallel decoders, with 2.4x to 3.4x speedup across a variety of benchmarks.

Online Speculative Decoding


Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
ICML, 2024
arxiv /

We introduce an online speculative decoding (OSD) algorithm with improved responsiveness, speculation accuracy, and compatibility with LLM serving systems.

PockEngine: Sparse and Efficient Fine-tuning in a Pocket


Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), 2023
arxiv / website /

We introduce PockEngine, a tiny, sparse, and efficient engine that enables fine-tuning on various edge devices through sparse backpropagation and compile-time optimizations.





© 2024 Lanxiang Hu. Design and source code from Jon Barron's website, powered by Jekyll.