Born
Applied Intelligence

Born-9B dossier · preview release status

The preview winner
is still v2.

This page is the single release record for Born-9B: what the current preview actually is, how it beat base Qwen, where later runs failed to replace it, and what still has to change before a general release deserves a stronger claim.

Public preview

rk500/Born-9B-Qwen3.5-9B-Preview

promoted

0.9244

local weighted score

23 / 25

fixed held-out pass count

0.8559

SWE proxy score

22 / 25

SWE proxy pass count

Born-9B is tuned for coding-agent closure, not decorative reasoning. The preferred behavior is simple: `Plan`, `Patch or Code`, `Checks`, and `Result`, with compact visible rationale and no hidden chain-of-thought targets.

The public preview is the v2 generated-expanded LoRA on top of Qwen/Qwen3.5-9B, not v3, v4, v2-recovery, or preview recovery.

The two numbers that matter most are 0.9244 on the fixed local gate and 0.8559 on the local SWE proxy. Both are project-local probes and should be read with that caveat.

Born-9B is optimized for coding-agent closure: Plan, Patch or Code, Checks, and Result, with compact visible rationale and no hidden <think> targets.

Later runs are transparent diagnostics, not silent upgrades. That honesty is part of the value of this page.

Full timeline

Every important turn,
in one sequence.

13 May 2026

v0 proved the loop, not the quality

The first public LoRA shipped on top of Qwen/Qwen3.5-9B after training on a small 225-row, roughly 48k-token proof mix on RTX 6000 Ada. It tied base Qwen on the tiny sanity probe. The tie was useful because it showed the pipeline existed, not because the score was strong.

15 May 2026

v1 got bigger and still lost

The training mix jumped to 6,673 SFT rows and roughly 8.45M tokens. A TeichAI continuation was added, but the held-out 25-task gate still came in below base Qwen: 0.8166 versus 0.8511, with both at 22/25 passes.

15–16 May 2026

v2 became the preview winner

Generated-expanded v2 reached 7,097 rows and roughly 9.11M tokens, plus 85 targeted recovery rows and 14 real DPO pairs. After fixing the evaluator to score the implementation in “Patch or Code:” instead of later fenced checks, v2 became the promoted preview at 0.9244 and 23/25.
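The evaluator fix can be pictured with a minimal sketch. It assumes answers carry literal `Plan:`, `Patch or Code:`, `Checks:`, and `Result:` headings at the start of a line and that the implementation is the first fenced block under `Patch or Code:`. Every name and regex here is hypothetical; this is not the project's evaluator, and the "last fenced block" function is only one plausible way the earlier mis-scoring could have happened.

```python
import re
from typing import Dict, List, Optional

# Hypothetical headings, assumed to start a line in the model's answer.
SECTION_HEADINGS = ("Plan", "Patch or Code", "Checks", "Result")
FENCE = "`" * 3  # built at runtime so this example contains no literal fence
FENCED_BLOCK = re.compile(FENCE + r"[^\n]*\n(.*?)" + FENCE, re.DOTALL)
HEADING = re.compile(
    r"^(%s):[ \t]*" % "|".join(re.escape(h) for h in SECTION_HEADINGS),
    re.MULTILINE,
)


def split_sections(response: str) -> Dict[str, str]:
    """Map each known heading to the text between it and the next heading."""
    matches = list(HEADING.finditer(response))
    sections: Dict[str, str] = {}
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(response)
        sections[m.group(1)] = response[m.end():end]
    return sections


def extract_implementation(response: str) -> Optional[str]:
    """Return the first fenced block under 'Patch or Code:', or None."""
    body = split_sections(response).get("Patch or Code")
    if body is None:
        return None
    match = FENCED_BLOCK.search(body)
    return match.group(1) if match else None


def extract_last_fenced_block(response: str) -> Optional[str]:
    """Contrast case (assumed): score whichever fenced block comes last."""
    blocks: List[str] = FENCED_BLOCK.findall(response)
    return blocks[-1] if blocks else None


sample = "\n".join([
    "Plan: add a helper.",
    "Patch or Code:",
    FENCE + "python",
    "def add(a, b):",
    "    return a + b",
    FENCE,
    "Checks:",
    FENCE + "python",
    "assert add(1, 2) == 3",
    FENCE,
    "Result: done.",
])

implementation = extract_implementation(sample)   # the function definition
mis_scored = extract_last_fenced_block(sample)    # the check, not the patch
```

On the sample answer, the section-aware extractor returns the `add` definition, while the contrast function returns the assertion from `Checks:`. An evaluator that scores the second string is grading the test, not the implementation.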

16 May 2026

The first post-v2 attempts failed promotion

v3 repair-focus regressed to 0.8003 and 20/25. v4 DPO recovery regressed to 0.8291 and 18/25. v2-recovery improved pass count to 24/25 but only reached 0.9119, so it was preserved and not promoted.
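The outcomes above are consistent with a simple rule: a candidate replaces the anchor only if a full run on the same gate beats the anchor's weighted score, and pass count alone does not decide it. The sketch below encodes that rule as an assumption inferred from the recorded decisions; `GateResult` and `should_promote` are hypothetical names, and the project's actual promotion logic may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    name: str
    weighted_score: float
    passes: int
    total: int


def should_promote(candidate: GateResult, anchor: GateResult) -> bool:
    """Assumed rule: a full run must beat the anchor's weighted score."""
    if candidate.total != anchor.total:
        return False  # partial or mismatched runs are not comparable
    return candidate.weighted_score > anchor.weighted_score


# Scores and pass counts as recorded on this page.
anchor = GateResult("v2", 0.9244, 23, 25)
candidates = [
    GateResult("v3 repair-focus", 0.8003, 20, 25),
    GateResult("v4 DPO recovery", 0.8291, 18, 25),
    GateResult("v2-recovery", 0.9119, 24, 25),
]

decisions = {c.name: should_promote(c, anchor) for c in candidates}
```

Under this rule all three candidates are rejected, including v2-recovery, whose 24/25 pass count is higher than the anchor's but whose weighted score is lower.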

16 May 2026

The exact-code hotfixes still missed the same narrow failure

v2.2 and v2.3 targeted the remaining exact-code contract miss directly, but the chunked iterable/string materialization behavior still failed. The repo keeps those runs as evidence that a same-style SFT patch was not enough.

16 May 2026

v2 also won on the local SWE proxy

On the project’s SWE-bench Verified issue-resolution proxy, Born-9B v2 scored 0.8559 and 22/25 versus base Qwen at 0.7561 and 19/25. This is explicitly a local proxy, not an official Docker-harness SWE result.

17 May 2026

Preview recovery trained cleanly and still stayed below v2

A targeted DeepSeek recovery pass trained cleanly on RTX 6000 Ada, but the interrupted held-out eval only reached 0.8703 over 22 completed tasks. It could not mathematically beat v2, so v2 remained the public preview.
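The "could not mathematically beat v2" claim can be checked with a best-case bound. The sketch below assumes that 0.8703 is the mean over the 22 completed tasks, that all 25 tasks carry equal weight, and that a task scores at most 1.0. The real gate is weighted, so treat this as an illustration of the reasoning, not a reproduction of the project's scoring.

```python
def best_case_mean(partial_mean: float, completed: int, total: int,
                   max_score: float = 1.0) -> float:
    """Upper bound on the final mean if every unfinished task scores max_score.

    Assumes equal task weights, which is a simplification of the real gate.
    """
    remaining = total - completed
    return (partial_mean * completed + max_score * remaining) / total


V2_SCORE = 0.9244  # promoted preview score from this page

ceiling = best_case_mean(0.8703, completed=22, total=25)
print(f"best case: {ceiling:.4f}")           # best case: 0.8859
print(f"can beat v2: {ceiling > V2_SCORE}")  # can beat v2: False
```

Even with perfect scores on the three unfinished tasks, the equal-weight ceiling is about 0.8859, which stays below 0.9244.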

17 May 2026

Public preview identity was locked

The release decision stayed simple: upload the proven v2 adapter as rk500/Born-9B-Qwen3.5-9B-Preview, keep later runs transparent, and shift the general-release plan toward narrow executable recovery work rather than another broad data swell.

Scoreboard

The page keeps the misses.

Read the model with its misses in view: later adapters can be useful diagnostics and still fail to replace the public preview.

Qwen3.5-9B base

Corrected internal baseline.

0.8511 · 22 / 25

Born-9B v0

Proof artifact, not a win claim.

tiny probe tie

Born-9B v1 + TeichAI

Larger, but still below base.

0.8166 · 22 / 25

Born-9B Preview v2

Current promoted preview checkpoint.

0.9244 · 23 / 25

Born-9B v2-recovery

Higher pass count, lower weighted score.

0.9119 · 24 / 25

Preview recovery

Clean train, non-promoted result.

0.8703 partial · 19 / 22

The SWE figure is a project-local proxy, not an official Docker-harness SWE-bench score. The internal gate is a fixed held-out pack. Both remain useful because they are stable enough to block fake promotion.

Remaining gaps

Where the preview
still needs work.

A narrow exact-code contract failure remains, especially around chunked iterable and string materialization behavior.

Tool-use closure is better than it was, but still weaker than the exact-code slice and still important for the general-release path.

Generic “more data” continuation is now considered a lower-value path than targeted executable recovery work.

General release path

Precision,
not scale.

Keep v2 as the immutable anchor instead of treating every later run as the new baseline.

Scale executable exact-code recovery rows with real tests, not broad style-heavy continuation data.

Scale tool-use and provider-closure rows around concrete commands, stop conditions, output parsing, and final-result reporting.

Keep rehearsal rows from the promoted v2 mix so recovery work does not erase the current win.

Cap visible-thinking and generic reasoning imports unless they prove themselves on the same held-out gate.
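One way to picture the rehearsal step in the plan above is a mix builder that keeps every new recovery row and adds a sampled slice of rows from the promoted v2 mix. This is a minimal sketch under stated assumptions: the function name, the row representation, and the 30% rehearsal share are all hypothetical placeholders, not values taken from the project.

```python
import random
from typing import List, Sequence, TypeVar

Row = TypeVar("Row")


def build_recovery_mix(recovery_rows: Sequence[Row],
                       rehearsal_pool: Sequence[Row],
                       rehearsal_fraction: float = 0.3,  # placeholder value
                       seed: int = 0) -> List[Row]:
    """Keep all recovery rows and add sampled rehearsal rows.

    rehearsal_fraction is the target share of the final mix that comes
    from the rehearsal pool (capped by the pool size).
    """
    if not 0.0 <= rehearsal_fraction < 1.0:
        raise ValueError("rehearsal_fraction must be in [0, 1)")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    wanted = round(len(recovery_rows) * rehearsal_fraction
                   / (1.0 - rehearsal_fraction))
    n_rehearsal = min(wanted, len(rehearsal_pool))
    mix = list(recovery_rows) + rng.sample(list(rehearsal_pool), n_rehearsal)
    rng.shuffle(mix)
    return mix


# Toy rows standing in for real training examples.
recovery = [f"recovery-{i}" for i in range(70)]
rehearsal = [f"v2-{i}" for i in range(500)]

mix = build_recovery_mix(recovery, rehearsal)
n_from_v2 = sum(row.startswith("v2-") for row in mix)
```

With 70 recovery rows and a 0.3 share, the mix holds 100 rows, 30 of them rehearsal rows. Whatever share is used, the resulting adapter would still have to prove itself on the same held-out gate.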

Origin story

The lore now lives
inside the release.

The first meaningful Born-9B story is not triumph. It is restraint: the project published the tie, kept the caveat visible, and treated that honesty as a prerequisite for the next run.

The second story is that the model got better when the work got narrower. v2 became the turning point not because the project found a grand theory, but because it found a better mix of executable, high-signal data and then fixed the evaluator bug that had been misreading the answer format.

The third story is that Born now behaves like a real release system. Later runs can train successfully, look interesting, and still fail promotion. That is the discipline the preview page is trying to show.