
Lanxiang Hu

I'm a PhD student at UCSD, fortunately advised by Prof. Hao Zhang and Prof. Tajana Šimunić Rosing. My goal is to build efficient, reliable and secure large-scale ML systems and models using both today's and next-generation technology.

I am currently working on algorithms and systems for efficient LLM serving and inference, as well as large-model training in memory-constrained and heterogeneous settings.

Before joining UCSD, I spent a wonderful time working with top-notch researchers in the field of ML systems, including as a visiting research intern with Prof. Song Han. I completed my undergraduate degree at UC Berkeley with a double major in CS and Physics.

Email  /  Scholar  /  Twitter  /  LinkedIn  /  GitHub

News

  • [2024/03] CLLMs, a new family of parallel decoders, are released. Try our codebase for a 2–3x inference speedup!

Research


CLLMs: Consistency Large Language Models


Siqi Kou*, Lanxiang Hu*, Zhezhi He, Zhijie Deng, Hao Zhang
submitted to ICML, 2024
arxiv / code / website /

We show that LLMs can be trained to operate as highly efficient parallel decoders, delivering 2.4x to 3.4x speedups across a variety of benchmarks.


Online Speculative Decoding


Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Ion Stoica, Zhijie Deng, Alvin Cheung, Hao Zhang
submitted to ICML, 2024
arxiv /

We introduce the online speculative decoding (OSD) algorithm, which improves responsiveness, speculation accuracy, and compatibility with LLM serving systems.


PockEngine: Sparse and Efficient Fine-tuning in a Pocket


Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
56th IEEE/ACM International Symposium on Microarchitecture (MICRO-56), 2023
arxiv / website /

We introduce PockEngine, a tiny, sparse, and efficient engine that enables fine-tuning on a variety of edge devices through sparse backpropagation and compile-time optimizations.





© 2024 Lanxiang Hu. Design and source code from Jon Barron's website, powered by Jekyll.