
PreFT accelerates multi-adapter LLM inference by restricting LoRA adapters to the prefill stage

It avoids decode-phase memory bottlenecks with minimal performance loss.



Original post

#678 @ARYAMAN2020 (OP)

Andrew Lanpouthakoun @ASLANPOUTHAKOUN

New paper!! Prefill and decode represent very different inference workloads; when we try to serve many LoRA adapters at once, inference slows down a ton during decode because we are memory bound :( What if we didn’t need those adapters at decode? We introduce Prefill-Only Fine Tuning (PreFT), adapters that are only trained and applied at prefill. We show that this speeds up multi-adapter serving with limited loss in performance!

9:52 AM · May 28, 2026
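To make the prefill/decode split concrete, here is a minimal toy sketch in plain Python, assuming a single linear projection stands in for the model and its prefill outputs stand in for the KV cache. It is not the PreFT implementation; `LoRAAdapter`, `project`, `prefill`, and `decode_step` are hypothetical names chosen for illustration. The LoRA update B(Ax) is added only while processing prompt tokens, and decode uses the base weights alone.

```python
from dataclasses import dataclass
from typing import Optional

Matrix = list[list[float]]
Vector = list[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


@dataclass
class LoRAAdapter:
    """Hypothetical per-request low-rank adapter: delta(x) = scale * B (A x)."""
    A: Matrix  # r x d down-projection
    B: Matrix  # d x r up-projection
    scale: float = 1.0

    def delta(self, x: Vector) -> Vector:
        return [self.scale * y for y in matvec(self.B, matvec(self.A, x))]


def project(W: Matrix, x: Vector, adapter: Optional[LoRAAdapter] = None) -> Vector:
    """Base projection W x, plus the LoRA update only when an adapter is passed."""
    y = matvec(W, x)
    if adapter is not None:
        y = [a + b for a, b in zip(y, adapter.delta(x))]
    return y


def prefill(W: Matrix, prompt: list[Vector], adapter: LoRAAdapter) -> list[Vector]:
    """Prefill: every prompt token uses the adapted projection.
    In this toy, the returned vectors play the role of the cached states."""
    return [project(W, x, adapter) for x in prompt]


def decode_step(W: Matrix, batch_tokens: list[Vector]) -> list[Vector]:
    """Decode: one new token per request, base weights only, so every
    request in the batch shares the same weight matrix regardless of adapter."""
    return [project(W, x) for x in batch_tokens]
```

For example, with `W = [[1.0, 0.0], [0.0, 1.0]]`, two requests prefilled with different adapters produce different cached states, while a batched `decode_step` over both requests needs no adapter at all. In a real transformer the decode tokens would presumably still attend to the adapter-shaped prefill cache, which is how the adapter's effect could carry into generation without adding per-adapter work on the memory-bound decode path.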

Reposted by

#1776 @INDUCTIONHEADS

