BornBench v4 · MIT licensed
A 150-question smoke test for coding agents.
One hundred fifty exact-scored original tasks across execution, hard-style algorithms, SWE patch judgment, repo discipline, tool use, terminal recovery, security, data systems, policy, web evidence, performance, and abstraction.
150
items
15
lanes
v4.0
exact score
MIT
license
Code execution
Python, Node, SQL, Go, Java, JavaScript, CSS, Ruby, TypeScript.
Algorithmic edge
Pagination, TTLs, races, graph rejection, heaps, rollback, sliding windows.
LeetCode hard style
Original hard-style exact-output tasks across dynamic programming, matching, graphs, and intervals.
Patch reasoning
SWE-style smallest safe fix under regressions and product constraints.
Repo context
Generated files, package managers, CI matrices, migrations, dirty worktrees.
Tool protocol
Valid calls, stale SHAs, retries, tool authority, tool-output injection.
Terminal ops
Shell semantics, permissions, Git, Docker layer cache, pipefail, and archive safety.
Security
JWTs, HMAC, SSRF, XSS, authz, dependency confusion, CSRF, prompt injection.
Data reasoning
Leaderboard math, calibration, token cost, leakage, latency, tie-breaks.
Database and distributed
SQL nulls, isolation, quorums, vector clocks, CRDTs, Redis, Kafka.
Web evidence
GAIA/BrowseComp-style source freshness, disambiguation, and citation traps.
Long context
Instruction precedence, release caveats, immutable policy, stale docs.
Agent policy
Refunds, airline changes, collateral damage, confirmation, API-policy conflict.
Performance and concurrency
Complexity, locks, cache semantics, backpressure, N+1s, stampedes.
Abstract reasoning
ARC-style symbolic transfer under concise constraints.
V4 runner · 150 items
BornScore first. Raw exact stays visible.
The runner below loads the BornBench v4 exact-answer suite: difficulty-weighted accuracy, lane balance, hard-item accuracy, and final-answer reliability. Raw exact score remains beside it.
OpenRouter runner
Leaderboard
Loading benchmark
| Model | BornScore | Done | Tokens | Cost | Time | State |
|---|---|---|---|---|---|---|
| Paste a key, confirm the model list, and run BornBench. | ||||||