03 / benchmark

Evaluation.

A 210-task Solana / Anchor benchmark spanning 13 categories. It measures code generation quality for on-chain programs, CPI patterns, PDA derivation, and adversarial robustness.

91.4%

192 / 210 passed · +7.5 pts vs. previous 30B model · 100% on HumanEval (20/20)

Target: 90 (on a 0 to 100 scale)
Base model: Qwen2.5-Coder-7B-Instruct (7B dense)
Training: SFT on 270K records from 500+ repos
LoRA rank: 32 (alpha=64)
HumanEval: 20/20 (100%)
Eval date: Apr 22, 2026
Benchmark: 210-task Solana/Anchor · 13 categories

Anchor Constraints: 15/15 (100%)
SPL Token Ops: 10/10 (100%)
CPI Patterns: 10/10 (100%)
Error Handling: 10/10 (100%)
Tx Construction: 9/10 (90%)
PDA Derivation: 13/15 (87%)
Adversarial: 6/10 (60%)

The Adversarial category tests the model's resistance to deprecated APIs, hallucinated patterns, and known Solana anti-patterns. A lower score is expected here and indicates room for RLHF alignment.
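
The percentages above follow directly from the pass counts. As a minimal sketch (not the benchmark's actual scoring code), the arithmetic can be reproduced in a few lines of Python. The names `CATEGORY_RESULTS`, `OVERALL`, `pass_rate`, and `summarize` are hypothetical and introduced only for illustration; the numbers are copied from the figures reported in this section.

```python
# Figures copied from the section above: category -> (passed, total).
# Only the categories broken out in this section are included.
CATEGORY_RESULTS = {
    "Anchor Constraints": (15, 15),
    "SPL Token Ops": (10, 10),
    "CPI Patterns": (10, 10),
    "Error Handling": (10, 10),
    "Tx Construction": (9, 10),
    "PDA Derivation": (13, 15),
    "Adversarial": (6, 10),
}

# Headline result across the full benchmark: (passed, total).
OVERALL = (192, 210)


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * passed / total


def summarize(results: dict) -> tuple:
    """Sum the passed and total counts across the given categories."""
    passed = sum(p for p, _ in results.values())
    total = sum(t for _, t in results.values())
    return passed, total


if __name__ == "__main__":
    for name, (p, t) in CATEGORY_RESULTS.items():
        print(f"{name}: {p}/{t} ({pass_rate(p, t):.0f}%)")

    listed_passed, listed_total = summarize(CATEGORY_RESULTS)
    print(f"Listed categories: {listed_passed}/{listed_total}")
    print(f"Overall: {OVERALL[0]}/{OVERALL[1]} ({pass_rate(*OVERALL):.1f}%)")
```

Run as a script, this prints each category with its rounded percentage, then the subtotal for the listed categories, then the 91.4% headline figure. The seven categories listed here account for 73 of 80 tasks, so the per-category results for the rest of the 210 tasks are not broken out in this section.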