
Lanxiang Hu

I'm a PhD student at UCSD, fortunately advised by Prof. Hao Zhang and Prof. Tajana Šimunić Rosing. My goal is to build efficient, reliable and secure large-scale ML systems and models using both today's and next-generation technology.

I am currently working on algorithms and systems for efficient LLM serving and inference, as well as large-model training in memory-constrained and heterogeneous settings.

Before joining UCSD, I spent a wonderful time working with top-notch researchers in the field of ML systems, including as a visiting research intern with Prof. Song Han. I completed my undergraduate degree at UC Berkeley with a double major in CS and Physics.

Email  /  Scholar  /  Twitter  /  LinkedIn  /  GitHub

News

  • [2024/03] CLLMs, a new family of parallel decoders, are released. Try our codebase for a 2–3x inference speedup!

Research


CLLMs: Consistency Large Language Models


Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang
submitted to ICML, 2024
arxiv / code / website /

We show that LLMs can be trained to operate as highly efficient parallel decoders, delivering 2.4x to 3.4x speedups across a variety of benchmarks.


Online Speculative Decoding


Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
submitted to ICML, 2024
arxiv /

We introduce the online speculative decoding (OSD) algorithm, which improves responsiveness, speculation accuracy, and compatibility with LLM serving systems.


PockEngine: Sparse and Efficient Fine-tuning in a Pocket


Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), 2023
arxiv / website /

We introduce PockEngine, a tiny, sparse, and efficient engine that enables fine-tuning on a variety of edge devices through sparse backpropagation and compile-time optimizations.





© 2024 Lanxiang Hu. Design and source code from Jon Barron's website, powered by Jekyll.