Irish inference · EU GDPR · Nvidia H200
Free, local, European
inference.
LeemerLabs runs frontier open-weight models on Nvidia H200s in Ireland. An OpenAI-compatible gateway, generous free quotas, and a data path that never leaves the EU. Built to serve the Leemer Group — opened up for everyone else.
EU GDPR
Native, not bolted on
Ireland-hosted
Waterford + Dublin
Nvidia H200
Frontier inference
Free tier
No card required
POST /v1/chat/completions
{
"model": "lfm2.5-350m-free",
"messages": [
{ "role": "user",
"content": "Parse this JSON →" }
],
"stream": true
}
Throughput
200 tok/s free
Capable of 40,400 tok/s on an H100
Hardware
H200 · 141GB HBM3e
LFM2.5-350M
Our first hosted model
The Leemer Group
An ecosystem of products — built on one piece of infrastructure.
LeemerLabs exists because our own products outgrew generic APIs. Today the same infrastructure that powers LeemerChat, Foundry, and Critique is available to developers across Europe.
Flagship
LeemerChat
The flagship product of the Leemer Group. Every frontier model in one workspace — GPT-5.4, Claude Opus, Gemini 3, Kimi K2.5, GLM-5.1 — answering together through the KingLeemer consensus architecture.
10B
tokens processed
20+
frontier models
The reason for LeemerLabs
LeemerFoundry
Ireland's first custom LLM creation studio. Your data, your model, our GPUs. Fine-tune open-weights from 1B to 235B parameters on distributed infrastructure. We built LeemerLabs to serve Foundry at scale.
4 wks
data → deployment
235B
MoE supported
Newest from the Group
Critique.sh
GitHub-native AI pull request review with sandbox-backed analysis. The sandbox writes the final artifact. The app publishes it. Reviews become infrastructure — inspectable, repeatable, owned.
v3.1
sandbox-native
48h
release cadence
Why LeemerLabs
Inference built on the assumption that Europe matters.
Nvidia H200 · 141GB HBM3e
Hosted in Ireland · Zero trans-Atlantic routing
Primitive
Local inference in Ireland
Every request is served from European data centres. No trans-Atlantic hops, no silent routing through third-party clouds. Your data stays where the law says it should.
Primitive
EU GDPR protected by default
The architecture is GDPR-native. Data residency, purpose limitation, and deletion rights are primitives in the gateway — not policy documents.
Primitive
Powered by Nvidia H200
Frontier-grade accelerators with 141GB of HBM3e per card. Enough memory to run modern MoE models without the latency penalty of weight streaming.
Primitive
Free inference tier
A real free tier, not a trial. Generous limits, an OpenAI-compatible gateway, and a single public model alias to start against. Paid tiers are opt-in when scale demands it.
How a request flows
Five steps. Zero trans-Atlantic hops.
- 01
TLS 1.3 terminates in Dublin
Inside the EU boundary. Nothing cached at the edge.
- 02
Auth + rate limit
Metered record opens — token counts only, never body.
- 03
Dispatched to an H200 worker
Waterford or Dublin. Prompt is held in memory only.
- 04
Response streams back
Straight to your client. Worker memory is freed at end of request.
- 05
Record finalised
Zero-retention mode purges even the metered row after reconciliation.
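On the client side, step 04 arrives as OpenAI-style server-sent events: each chunk is a `data:` line carrying a JSON delta, terminated by a `data: [DONE]` sentinel. A minimal parser sketch, assuming the gateway mirrors the OpenAI chat-completions chunk shape exactly (the field names below are the OpenAI ones, not confirmed against the live gateway):

```python
import json

def extract_deltas(sse_lines):
    """Pull content fragments out of OpenAI-style SSE stream lines.

    Assumes the standard chat-completions chunk shape; field names
    here follow the OpenAI convention the gateway advertises.
    """
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alives and comment lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Synthetic chunks shaped like a streamed response:
lines = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(extract_deltas(lines)))  # → Hello
```

Because the worker frees its memory at end of request, the client's reassembled string is the only copy of the response that survives the flow.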
Founder note
“The future of AI is not a single super-intelligence. It is systems that coordinate intelligence well.”
— Repath ‘Ray’ Khan · KingLeemer launch, Feb 2026
LeemerLabs was founded by Ray in Waterford: an Ireland-based builder who has shipped seven AI products since 2023, and who argues that coordination, sovereignty, and open weights are how serious organisations should operate.
Founding lineup
Our starting models.
Two founding models, one for speed and one for depth. Liquid LFM2.5-350M handles every hot-path request on LeemerChat and will soon run fully offline in your browser. Gemma 4 26B A4B picks up where Liquid stops — more than 10× the active parameters, still served at 40+ tokens per second.
Liquid LFM2.5-350M
alias · lfm2.5-350m-free
Routing, titles, safety gating — every fast path on LeemerChat. Capable of 40,400 tok/s on a single H100; we serve the free tier at a throttled but still blazing 200 tok/s. Offline mode in the browser is next.
350M
params
32K
context
200
tok/s free
Gemma 4 26B A4B
alias · gemma4-26b-a4b
Google DeepMind's MoE Gemma 4. 25.2B total parameters, 3.8B active, more than 10× the active parameters of our speed model. Multimodal, 256K context, native thinking mode, sustained 40+ tok/s.
25.2B
total
3.8B
active
256K
context
Start building
OpenAI-compatible. EU-hosted. Free to start.
Point your client at https://api.leemerlabs.ie/v1, use your existing OpenAI SDK, and stay on European compute.
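For clients without an SDK, the same request can be composed with nothing but the standard library, assuming the gateway follows the OpenAI chat-completions wire format (as the example at the top of this page suggests). The `Authorization: Bearer` header is an assumption carried over from the OpenAI convention, not something this page specifies:

```python
import json
import urllib.request

BASE_URL = "https://api.leemerlabs.ie/v1"  # EU-hosted gateway from this page

def build_chat_request(prompt, api_key, model="lfm2.5-350m-free"):
    """Assemble a chat-completions request against the LeemerLabs gateway.

    The Bearer-token header is assumed from the OpenAI convention;
    check the real gateway docs before relying on it.
    """
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("Parse this JSON →", api_key="YOUR_KEY")
# urllib.request.urlopen(req) would send it; omitted so the sketch
# stays runnable without network access or a real key.
print(req.full_url)  # → https://api.leemerlabs.ie/v1/chat/completions
```

Swapping `"stream": False` for `True` and reading the response line by line gives the SSE stream described in the request-flow section.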