Machine Learning

Designing the Next-Generation Foundation Model Architecture for Edge AI



What if your phone could outpace bigger cloud models on the tasks you care about, while still keeping quality high? We walk through how we built LFM2, our family of “Liquid Foundation Models,” by starting from the hardware up. Phones face tight memory and bandwidth, laptops have more RAM but similar bottlenecks, and cloud GPUs offer scale at a cost. So instead of forcing one transformer to fit everything, we engineered inference-aware architectures that adapt to each device class.

We explain how our STAR evolutionary search explores a modular design space that mixes attention with state-space models, linear attention, and convolutional components. Every candidate is judged in the real world: hardware-in-the-loop measurements capture KV-cache size, activation memory, prefill throughput, and decode latency on a Galaxy S24; candidates are then trained and checked for perplexity and benchmark quality. Over generations, weak designs are pruned and strong ones are recombined, converging on an architecture that is not only smaller and faster but also competitive on MMLU, multilingual MMLU, and math.
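
To make that loop concrete, here is a minimal, illustrative Python sketch of a hardware-in-the-loop evolutionary search over block mixes. It is not the actual STAR implementation: the block names, scoring weights, and the profiling and training stubs are placeholder assumptions standing in for real on-device measurements and real training runs.

```python
import random

# Block types mixed by the search; these names are illustrative placeholders,
# not LFM2's actual operator set.
BLOCK_TYPES = ["attention", "ssm", "linear_attention", "conv"]


def random_candidate(depth: int = 16) -> list[str]:
    """Sample a candidate architecture as a sequence of block types."""
    return [random.choice(BLOCK_TYPES) for _ in range(depth)]


def measure_on_device(candidate: list[str]) -> dict[str, float]:
    """Placeholder for hardware-in-the-loop profiling. The real pipeline runs
    the candidate on the target phone and records KV-cache size, activation
    memory, prefill throughput, and decode latency; here we fake the numbers."""
    n_attn = candidate.count("attention")
    return {
        "decode_latency_ms": 5.0 + 2.0 * n_attn + random.random(),
        "kv_cache_mb": 8.0 * n_attn,
    }


def train_and_eval(candidate: list[str]) -> float:
    """Placeholder for a short training run plus evaluation (perplexity,
    MMLU-style benchmarks). Here we fake a quality signal."""
    return 10.0 * len(set(candidate)) + random.random()


def fitness(candidate: list[str]) -> float:
    """Combine measured efficiency and quality; the weights are made up."""
    hw = measure_on_device(candidate)
    return (
        train_and_eval(candidate)
        - 0.5 * hw["decode_latency_ms"]
        - 0.1 * hw["kv_cache_mb"]
    )


def evolve(generations: int = 10, population_size: int = 32) -> list[str]:
    """Toy evolutionary loop: score candidates, prune the weak, recombine the strong."""
    population = [random_candidate() for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        survivors = ranked[: population_size // 4]           # prune weak designs
        children = []
        while len(survivors) + len(children) < population_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, len(a))
            child = a[:cut] + b[cut:]                        # crossover
            if random.random() < 0.2:                        # occasional mutation
                child[random.randrange(len(child))] = random.choice(BLOCK_TYPES)
            children.append(child)
        population = survivors + children
    return max(population, key=fitness)


if __name__ == "__main__":
    print("best layer mix:", evolve())
```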

The results are tangible. LFM2 outperforms larger transformer baselines such as Llama variants on edge decode speed, and the gains transfer to laptops with AMD Ryzen CPUs. We’ve released open weights at multiple sizes (350M to 1.2B, plus 2.6B and an MoE variant), with llama.cpp integration and NPU paths for Qualcomm, and AMD support on the way. Beyond text, we extend the same efficient backbone to vision-language models for on-device video understanding and to a native audio stack that interleaves text and audio tokens, reducing latency and simplifying the classic STT→LLM→TTS chain.
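
If you want to kick the tires, the snippet below is one way to load an open-weight checkpoint with Hugging Face transformers. The repository id, version requirements, and generation settings are assumptions rather than details from the episode, so check the official model cards (or use the llama.cpp GGUF builds) for exact names.

```python
# Minimal sketch, assuming the weights are published on the Hugging Face Hub.
# "LiquidAI/LFM2-1.2B" is an assumed repo id; a recent transformers release
# may be required for native support of the architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # assumed; substitute the actual model card id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain why on-device inference helps latency and privacy."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```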

If you care about low-latency AI, local privacy, and real-world deployment on phones and PCs, this conversation is for you. Try the open-weight models, fine-tune them on your data, and let us know how they run on your hardware. Subscribe, share with a teammate who ships on-device, and leave a review to help more builders find the show.
