Lanxiang Hu
I'm a PhD student at UCSD, fortunately advised by Prof. Hao Zhang and Prof. Tajana Šimunić Rosing. My goal is to build efficient, reliable and secure large-scale ML systems and models using both today's and next-generation technology.
I am currently working on algorithms and systems for efficient LLM serving and inference, as well as large model training in memory-constrained scenarios and heterogeneous settings.
Before joining UCSD, I spent a wonderful time working with top-notch researchers in the field of ML systems, including as a visiting research intern with Prof. Song Han. I completed my undergraduate degree at UC Berkeley with majors in CS and Physics.
Email /
Scholar /
Twitter /
LinkedIn /
GitHub
News
- [2024/05] Our papers on CLLMs and OSD were accepted to ICML 2024.
- [2024/03] CLLMs, a new family of parallel decoders, is released. Try our codebase for a 2-3x inference speedup!
CLLMs: Consistency Large Language Models
Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang
ICML, 2024
arxiv /
code /
website /
We show LLMs can be trained to operate as highly efficient parallel decoders, achieving 2.4x to 3.4x speedups across a variety of benchmarks.
Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
ICML, 2024
arxiv /
We introduce the online speculative decoding (OSD) algorithm, with improved responsiveness, speculation accuracy, and compatibility with LLM serving systems.
PockEngine: Sparse and Efficient Fine-tuning in a Pocket
Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), 2023
arxiv /
website /
We introduce PockEngine: a tiny, sparse, and efficient engine that enables fine-tuning on various edge devices through sparse backpropagation and compile-time optimizations.