LFM2.5-350M is everywhere in LeemerChat. Before a message ever reaches a frontier model, it passes through Liquid: intent routing, tool selection, conversation title generation, safety gating, the tiny classifications that make the product feel fast. The model is capable of 40,400 tokens per second; on the public free tier we throttle it to 200 tokens per second, which is still faster than you can read and comfortably within our fair-use envelope for a free model.
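To make the throttle concrete, here is a minimal sketch of pacing a token stream to a fixed rate. It is an illustration under stated assumptions, not a description of how LeemerChat actually enforces the limit: the function name `stream_throttled` and its parameters are hypothetical, and only the 200 tokens-per-second figure comes from the text above.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_throttled(
    tokens: Iterable[str],
    rate_tok_per_s: float,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield tokens no faster than rate_tok_per_s (hypothetical helper).

    The clock and sleep functions are injectable so the pacing logic
    can be exercised without actually waiting.
    """
    interval = 1.0 / rate_tok_per_s  # seconds between token releases
    next_release = clock()           # the first token goes out immediately
    for tok in tokens:
        delay = next_release - clock()
        if delay > 0:
            sleep(delay)             # wait until this token's release slot
        yield tok
        next_release += interval     # schedule the following token


if __name__ == "__main__":
    FREE_TIER_RATE = 200.0  # tokens per second, the figure quoted above
    for token in stream_throttled(["Hello", ",", " world"], FREE_TIER_RATE):
        print(token, end="", flush=True)
    print()
```

At this rate each token is released 5 ms after the previous one, so a 100-token reply streams in roughly half a second.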
It is also the model we run on the edge. An offline mode for LeemerChat is coming: the user downloads the weights once, and the whole thing runs in the browser, on-device, with no round-trip to our servers. No cloud, no API key, no latency floor. A 350M model with an 81 MB minimum memory footprint is the only class of model where that is actually possible today.
We are honest about what it is not: not a writer, not a mathematician, not a coder. It is the fastest correct answer to a simple question, and the fastest correct routing decision for a hard one.
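The answer-or-route split can be sketched as a small dispatcher. Everything named here is hypothetical: the labels `"simple"` and `"frontier"`, the `route` function, and the stub classifier, which uses a trivial length check as a stand-in for the small model's classification. It shows the shape of the pattern, not LeemerChat's actual routing code.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]


def route(
    message: str,
    classify: Callable[[str], str],
    handlers: Dict[str, Handler],
    fallback: str = "frontier",
) -> str:
    """Dispatch a message to the handler chosen by the classifier.

    Unknown labels fall back to the larger model, on the assumption
    that escalating is the safer default than answering badly.
    """
    label = classify(message)
    handler = handlers.get(label, handlers[fallback])
    return handler(message)


# Stub classifier: a stand-in for the small model (assumption for illustration).
def stub_classify(message: str) -> str:
    return "simple" if len(message.split()) <= 6 else "frontier"


handlers: Dict[str, Handler] = {
    "simple": lambda m: f"[small model answers] {m}",
    "frontier": lambda m: f"[forwarded to frontier model] {m}",
}

if __name__ == "__main__":
    print(route("What time is it?", stub_classify, handlers))
    print(route("Write a detailed proof that the square root of two is irrational.",
                stub_classify, handlers))
```

The fallback matters more than the classifier in a sketch like this: when the small model is unsure, the request still reaches a model that can handle it.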