Models · Catalogue

Every open-weight model worth hosting.

Two different lists: the inference API (what you can call during open beta) and the Foundry inventory (bases and variants for fine-tuning in Europe). Hosted in Ireland on Nvidia H200s. OpenAI-compatible where the model allows it.

Inference API · open beta

Liquid and Gemma only — for now.

The public inference API is in open beta with two model IDs: a 350M Liquid speed tier and Gemma 4 26B A4B for depth. Both are free for the beta — request a key. The Foundry grid below is a separate list (fine-tuning); it is not what is live on the gateway today.

Request free access Read the full brief

Open beta · Free

Liquid LFM2.5-350M

alias · lfm2.5-350m-free

The speed tier. Routing, titles, and the offline mode coming to LeemerChat. Capable of 40,400 tok/s — we throttle the free tier to a still-blazing 200.

350M

params

32K

context

200 tok/s free

speed

Open beta · Free

Gemma 4 26B A4B

alias · gemma4-26b-a4b

The depth tier. Google DeepMind's MoE Gemma 4 — multimodal, reasoning-native, and more than ten times the active parameters of our speed model.

25.2B / 3.8B

params

256K

context

40+ tok/s

speed

Coming soon · Inference API

Qwen 3.6 and Granite 4.1.

After we stabilise the Liquid + Gemma open beta, the next public inference drop will include two Qwen 3.6 SKUs and IBM Granite-4.1-8B. Pricing below is planned for when they ship — not billed today.

Read the lab note

Coming soon · Inference API

Qwen3.6-27B

Model card · coming soon

Planned dense flagship in the Qwen 3.6 line: full 27B active parameters, aimed at agentic and repository-scale coding, with hybrid attention patterns for long prompts. Not on the public inference API today — the API open beta is Liquid and Gemma only.

Dense option: the full 27B stack activates every token — straightforward behaviour vs small-MoE routing.
Coding-forward release: long files, real repos, agentic “fix this build” work.
Public model cards: hybrid attention, ~256K-class context; we target 70–80 tok/s on our cluster profile.

Planned pricing

EUR · per 1M tokens

Input

€0.15

Output

€1.15

Target speed 70–80 tok/s · Context ~256K

Qwen 3.6 (GitHub)

Text / code-optimised · Apache 2.0 weights

Coming soon · Inference API · Multimodal

Qwen3.6-35B-A3B

Model card · coming soon

Sparse MoE with only ~3B parameters active per token, vision-language in the loop, and native long context at the 256K class. Planned for a future gateway release — not live on inference while we are in open beta with Liquid and Gemma.

Mixture of experts: ~3B active parameters per token — most of a 35B-class model in the weights, smaller inference bill.
Multimodal inputs and default context at the ~262K class (per upstream cards).
Soak tests on the same cluster profile land around 100–120 tok/s ahead of the dense 27B slot.

Planned pricing

EUR · per 1M tokens

Input

€0.15

Output

€1.00

Target speed 100–120 tok/s · Context ~256K

Model card (Hugging Face)

Vision + text · Apache 2.0 weights

Coming soon · Inference API + planned free tier

IBM Granite-4.1-8B

Model card · coming soon

IBM’s 8B instruct line: long-context, multilingual, tuned for RAG, agents, and OpenAI-style tool calling. A different trade-off from Qwen. Coming to the gateway after we stabilise the open beta on Liquid and Gemma.

Enterprise long-tail: RAG, classification, light agents, OpenAI-style function calling.
Dense 8B instruct with a long (≈128K) window and a multilingual post-training mix.
Planned free tier: 10M input + 5M output tokens / month; verified students get 5× those caps.

Planned pricing

EUR · per 1M tokens

Input

€0.10

Output

€0.50

Target speed — · Context ~128K

Model card (Hugging Face)

Free: 10M input + 5M output tokens / month. Verified students: 5× those limits. Details at request — see lab note.

Target speeds are for internal bring-up, not a public SLA. Want on the waitlist? Request access and name the model.

Foundry · fine-tuning

Model inventory.

Bases and variants you can work with in Foundry. This is not the list of what is live on the public inference API right now — that list is only Liquid and Gemma during the open beta. If you need a different base in production first, still talk to us.

Open Foundry Request access

LeemerLabs

In-house

leemerlabs/qwen3-4b-power4B · Reasoning-distilled · free EU endpoint · coming
LeemerGLM-106B-A22BMoE · 22B active · Vision · 96K ctx
Liquid-LeemerLabs-350M350M · LoRA-ready · Insane speed

Gemma

Google DeepMind · Dense · MoE

Gemma 4 E2B2.3B eff · Text, Image, Audio · 128K
Gemma 4 E4B4.5B eff · Text, Image, Audio · 128K
Gemma 4 26B A4B25.2B · 3.8B active · MoE · 256K · inference open beta
Gemma 4 31B30.7B dense · Vision · 256K

Qwen

Dense · MoE · Vision

Qwen3.6-27B27B dense · Agentic coding · ~256K ctx · coming
Qwen3.6-35B-A3B35B / 3B active · MoE-VL · ~256K ctx · coming
Qwen3-4B-Instruct-25074B · Instruct
Qwen3-8B-Base / Instruct8B
Qwen3-32B MoE32B · MoE
Qwen3-30B-A3B30B · 3B active · Base / Instruct
Qwen3-235B-A22B-Instruct-2507235B · 22B active · frontier
Qwen3-VL-30B-A3B-InstructVision · MoE
Qwen3-VL-235B-A22B-InstructVision · MoE · frontier

IBM Granite

IBM · Instruct

Granite-4.1-8B8B dense · Tool calling · ~128K ctx · free tier + paid · coming

LLaMA

Meta · Dense

Llama-3.2-1B / 3BSmall dense
Llama-3.1-8B / Instruct8B
Llama-3.1-70B70B
Llama-3.3-70B-Instruct70B · latest

GPT-OSS

OpenAI open-weights

GPT-OSS-20B20B · MoE
GPT-OSS-120B120B · MoE

DeepSeek

MoE reasoning

DeepSeek V3.1 BaseMoE
DeepSeek V3.1 InstructMoE

Moonshot AI

Frontier reasoning

Kimi K2 ThinkingReasoning
Kimi K2.5 BaseFrontier · Multimodal

Gateway capabilities

What the API does.

supported

OpenAI-compatible

Drop-in at api.leemerlabs.ie/v1. Existing SDKs just work.

supported

Streaming + tools

SSE streaming, tool calling, JSON mode, structured output.

supported

Vision inputs

For VL models, image uploads follow the OpenAI vision schema.

supported

Fine-tuning

Every open-weight base in the Foundry inventory below can be fine-tuned through Foundry — that list is not the same as the public inference API.

Served on

Nvidia H200

141 GB HBM3e · Waterford & Dublin

Every request served on European metal. Zero trans-Atlantic hops.

Request access Fine-tune on Foundry