Back to lab notes
Qwen
IBM

26 April 2026 · Roadmap

Qwen 3.6 and Granite 4.1: what is joining the EU gateway

Today the public inference API is in open beta with Liquid and Gemma only — free keys via request access. This post is about the next three inference SKUs, not the Foundry training catalogue, and the EUR pricing we are planning for when they ship.

Why these three, on European metal

EU customers keep asking for the same three things: long context without shipping prompts to the US, open weights they can audit and fine-tune, and price predictabilityin euro. The Qwen 3.6 line from Alibaba’s Qwen team and IBM’s Granite 4.1 family are Apache 2.0-class releases that make sense side by side: two different points on the Pareto frontier, plus a compact instruct model for the workloads where an 8B is enough.

Public model cards for Qwen3.6-27B and Qwen3.6-35B-A3B describe a coding-forward 3.6 generation with hybrid attention patterns and, on the 35B-A3B variant, roughly 3B active parameters per token, multimodal inputs, and a default context at the 262K class. Granite-4.1-8B is a different beast: a dense 8B instruct model with ~128K context, strong tool-calling, and a multilingual post-training mix — the sort of workhorse you drop behind a RAG or agent router when latency and cost matter as much as raw benchmark zingers.

The models in one glance

Qwen3.6-27B is the dense option: the full 27B stack activates every step. That is more compute per token than MoE, but the behaviour is easy to reason about, and the public release is aimed squarely at agentic coding— long files, real repos, “fix this build” work.

Qwen3.6-35B-A3B is a mixture of experts with a tiny active budget (~3B). You keep most of a 35B model’s capacity in the weights, but you pay a small-MoE inference bill — which is how we are getting into the 100–120 token/s range in soak tests, ahead of the dense 27B slot at 70–80 token/s on the same cluster profile.

IBM Granite-4.1-8B is our enterprise long-tail offer: RAG, classification, light agents, and OpenAI-style function calling, with a long (≈128K) window and a free tier that makes experiments cheap — including a allowance for verified students, because we would rather you broke things in our EU sandbox than in production without guardrails.

Planned pricing (when these ship, EUR)

Numbers below are target bands for when these models join the paid inference tier. They are not what you pay on the Liquid + Gemma open beta today. Everything is per 1M tokens.

ModelInputOutput
Qwen3.6-27B
€0.15
€1.15
Qwen3.6-35B-A3B
€0.15
€1.00
IBM Granite-4.1-8B
€0.10
€0.50

Granite-4.1-8B free tier (separate from the Qwen paid row): 10M input and 5M output tokens per month on the house. Verified students get five timesthose caps. Claiming the student bump is still manual for now — put “student tier” in the notes when you request access and we will walk you through verification.

How to get on the list

  1. Read the models catalogue for aliases when they go live.
  2. Request access with the workloads you want to run (context length, vision on/off, expected QPS). We are deliberately slow-rolling H200 capacity.
  3. If you are standardising on OpenAI clients, point them at the same EU endpoint you use today — when your key is allowlisted, the new model IDs will show up in the console email.

Roadmap

We will keep publishing numbers while the fleet burns in.

  • — Qwen 3.6 and Granite 4.1 sit on the same H200 story as the rest of the catalogue.
  • — Multimodal and long context are only useful if the bill and latency match — hence the two Qwen price bands.
  • — When preview pricing closes, you will get notice before the first production invoice.