Models · Catalogue

Every open-weight model worth hosting.

Two different lists: the inference API (what you can call during open beta) and the Foundry inventory (bases and variants for fine-tuning in Europe). Hosted in Ireland on Nvidia H200s. OpenAI-compatible where the model allows it.

Inference API · open beta

Liquid and Gemma only — for now.

The public inference API is in open beta with two model IDs: a 350M Liquid speed tier and Gemma 4 26B A4B for depth. Both are free for the beta — request a key. The Foundry grid below is a separate list (fine-tuning); it is not what is live on the gateway today.

Coming soon · Inference API

Qwen 3.6 and Granite 4.1.

After we stabilise the Liquid + Gemma open beta, the next public inference drop will include two Qwen 3.6 SKUs and IBM Granite-4.1-8B. Pricing below is planned for when they ship — not billed today.

Read the lab note
QwenQwen
Coming soon · Inference API

Qwen3.6-27B

Model card · coming soon

Planned dense flagship in the Qwen 3.6 line: full 27B active parameters, aimed at agentic and repository-scale coding, with hybrid attention patterns for long prompts. Not on the public inference API today — the API open beta is Liquid and Gemma only.

  • Dense option: the full 27B stack activates every token — straightforward behaviour vs small-MoE routing.
  • Coding-forward release: long files, real repos, agentic “fix this build” work.
  • Public model cards: hybrid attention, ~256K-class context; we target 70–80 tok/s on our cluster profile.

Planned pricing

EUR · per 1M tokens

Input

€0.15

Output

€1.15

Target speed 70–80 tok/s · Context ~256K

Text / code-optimised · Apache 2.0 weights

Qwen
Coming soon · Inference API · Multimodal

Qwen3.6-35B-A3B

Model card · coming soon

Sparse MoE with only ~3B parameters active per token, vision-language in the loop, and native long context at the 256K class. Planned for a future gateway release — not live on inference while we are in open beta with Liquid and Gemma.

  • Mixture of experts: ~3B active parameters per token — most of a 35B-class model in the weights, smaller inference bill.
  • Multimodal inputs and default context at the ~262K class (per upstream cards).
  • Soak tests on the same cluster profile land around 100–120 tok/s ahead of the dense 27B slot.

Planned pricing

EUR · per 1M tokens

Input

€0.15

Output

€1.00

Target speed 100–120 tok/s · Context ~256K

Vision + text · Apache 2.0 weights

IBM
Coming soon · Inference API + planned free tier

IBM Granite-4.1-8B

Model card · coming soon

IBM’s 8B instruct line: long-context, multilingual, tuned for RAG, agents, and OpenAI-style tool calling. A different trade-off from Qwen. Coming to the gateway after we stabilise the open beta on Liquid and Gemma.

  • Enterprise long-tail: RAG, classification, light agents, OpenAI-style function calling.
  • Dense 8B instruct with a long (≈128K) window and a multilingual post-training mix.
  • Planned free tier: 10M input + 5M output tokens / month; verified students get 5× those caps.

Planned pricing

EUR · per 1M tokens

Input

€0.10

Output

€0.50

Target speed · Context ~128K

Free: 10M input + 5M output tokens / month. Verified students: 5× those limits. Details at request — see lab note.

Target speeds are for internal bring-up, not a public SLA. Want on the waitlist? Request access and name the model.

Foundry · fine-tuning

Model inventory.

Bases and variants you can work with in Foundry. This is not the list of what is live on the public inference API right now — that list is only Liquid and Gemma during the open beta. If you need a different base in production first, still talk to us.

LeemerLabs

In-house

  • leemerlabs/qwen3-4b-power4B · Reasoning-distilled · free EU endpoint · coming
  • LeemerGLM-106B-A22BMoE · 22B active · Vision · 96K ctx
  • Liquid-LeemerLabs-350M350M · LoRA-ready · Insane speed

Gemma

Google DeepMind · Dense · MoE

  • Gemma 4 E2B2.3B eff · Text, Image, Audio · 128K
  • Gemma 4 E4B4.5B eff · Text, Image, Audio · 128K
  • Gemma 4 26B A4B25.2B · 3.8B active · MoE · 256K · inference open beta
  • Gemma 4 31B30.7B dense · Vision · 256K

Qwen

Dense · MoE · Vision

  • Qwen3.6-27B27B dense · Agentic coding · ~256K ctx · coming
  • Qwen3.6-35B-A3B35B / 3B active · MoE-VL · ~256K ctx · coming
  • Qwen3-4B-Instruct-25074B · Instruct
  • Qwen3-8B-Base / Instruct8B
  • Qwen3-32B MoE32B · MoE
  • Qwen3-30B-A3B30B · 3B active · Base / Instruct
  • Qwen3-235B-A22B-Instruct-2507235B · 22B active · frontier
  • Qwen3-VL-30B-A3B-InstructVision · MoE
  • Qwen3-VL-235B-A22B-InstructVision · MoE · frontier

IBM Granite

IBM · Instruct

  • Granite-4.1-8B8B dense · Tool calling · ~128K ctx · free tier + paid · coming

LLaMA

Meta · Dense

  • Llama-3.2-1B / 3BSmall dense
  • Llama-3.1-8B / Instruct8B
  • Llama-3.1-70B70B
  • Llama-3.3-70B-Instruct70B · latest

GPT-OSS

OpenAI open-weights

  • GPT-OSS-20B20B · MoE
  • GPT-OSS-120B120B · MoE

DeepSeek

MoE reasoning

  • DeepSeek V3.1 BaseMoE
  • DeepSeek V3.1 InstructMoE

Moonshot AI

Frontier reasoning

  • Kimi K2 ThinkingReasoning
  • Kimi K2.5 BaseFrontier · Multimodal

Gateway capabilities

What the API does.

supported

OpenAI-compatible

Drop-in at api.leemerlabs.ie/v1. Existing SDKs just work.

supported

Streaming + tools

SSE streaming, tool calling, JSON mode, structured output.

supported

Vision inputs

For VL models, image uploads follow the OpenAI vision schema.

supported

Fine-tuning

Every open-weight base in the Foundry inventory below can be fine-tuned through Foundry — that list is not the same as the public inference API.

Nvidia

Served on

Nvidia H200

141 GB HBM3e · Waterford & Dublin

Every request served on European metal. Zero trans-Atlantic hops.