Evaluation.
A 210-task Solana/Anchor benchmark spanning 13 categories, measuring code-generation quality for on-chain programs, CPI patterns, PDA derivation, and adversarial robustness.
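A multi-category benchmark needs a rule for rolling task outcomes up into category scores and one headline number. The sketch below is a minimal, hypothetical harness: the names `TaskResult`, `summarize`, and `macro_average` are illustrative and not taken from the benchmark's code, and it assumes each task is graded as a binary pass/fail. It reports both a pooled rate over all tasks and an unweighted mean of category rates. The two differ whenever categories have different sizes, as they do here (10 vs. 15 tasks).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    category: str  # a benchmark category, e.g. "PDA Derivation"
    passed: bool   # assumed binary pass/fail outcome per task


def summarize(results):
    """Return per-category (passed, total) counts and the pooled pass rate in percent."""
    counts = defaultdict(lambda: [0, 0])
    for r in results:
        counts[r.category][1] += 1           # every task counts toward the total
        counts[r.category][0] += int(r.passed)
    passed = sum(p for p, _ in counts.values())
    total = sum(t for _, t in counts.values())
    pooled = 100.0 * passed / total if total else 0.0
    return {c: (p, t) for c, (p, t) in counts.items()}, pooled


def macro_average(per_category):
    """Unweighted mean of category pass rates, for comparison with the pooled rate."""
    rates = [100.0 * p / t for p, t in per_category.values()]
    return sum(rates) / len(rates) if rates else 0.0


# Usage with two rows of the category breakdown (PDA Derivation and Adversarial).
results = [TaskResult("PDA Derivation", i < 13) for i in range(15)]
results += [TaskResult("Adversarial", i < 6) for i in range(10)]
per_category, pooled = summarize(results)
print(per_category, round(pooled, 1), round(macro_average(per_category), 1))
```

For these two rows the pooled rate is 76.0% while the macro average is about 73.3%. The reported 192 / 210 headline (91.4%) corresponds to the pooled form.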
91.4%
192 / 210 passed · +7.5 pts vs. previous 30B model · HumanEval 20/20 (100%)
Score bar (0 to 100): target 90%
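The headline figures can be checked with a few lines of arithmetic. This is a sketch under two stated assumptions: that "+7.5 pts" means percentage points on the same 210-task benchmark, and that the score bar's target marker sits at 90%. The implied score of the earlier 30B model is therefore an inference, not a reported number, and `pass_rate` is a hypothetical helper name.

```python
def pass_rate(passed: int, total: int) -> float:
    """Task-level pass rate in percent, rounded to one decimal place."""
    return round(100.0 * passed / total, 1)


score = pass_rate(192, 210)
# Assumption: "+7.5 pts" means percentage points on the same 210-task benchmark.
previous_30b = round(score - 7.5, 1)
# Assumption: the target marker on the score bar sits at 90%.
TARGET = 90.0

print(f"score={score}% previous_30b~{previous_30b}% meets_target={score >= TARGET}")
# prints: score=91.4% previous_30b~83.9% meets_target=True
```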
Base model: Qwen2.5-Coder-7B-Instruct (7B dense)
Training: SFT on 270K records from 500+ repos
LoRA rank: 32 (alpha = 64)
HumanEval: 20/20 (100%)
Eval date: Apr 22, 2026
Benchmark: 210-task Solana/Anchor · 13 categories
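With standard LoRA, the frozen weight W is augmented by a low-rank update B A scaled by alpha / rank, so rank 32 with alpha 64 gives a scaling factor of 2. The sketch below shows that forward pass on plain nested lists, plus the per-layer trainable-parameter count r × (d_in + d_out). The class and function names are hypothetical, and the 3584 width is an assumed, illustrative projection size (we believe it matches the base model's hidden size, but check the model config). Some variants, such as rank-stabilized LoRA, scale by alpha / sqrt(rank) instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    rank: int = 32   # LoRA rank from the configuration listed above
    alpha: int = 64  # LoRA alpha from the configuration listed above

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / rank.
        return self.alpha / self.rank

    def adapter_params(self, d_in: int, d_out: int) -> int:
        # A has shape (rank, d_in) and B has shape (d_out, rank).
        return self.rank * (d_in + d_out)


def matvec(m, v):
    # Dense matrix-vector product on nested lists.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, scaling):
    # y = W x + scaling * B (A x); W stays frozen, only A and B are trained.
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    return [y0 + scaling * u for y0, u in zip(base, update)]


cfg = LoraSettings()
d = 3584  # hypothetical square projection width (assumed hidden size)
print(cfg.scaling)                      # 2.0
print(cfg.adapter_params(d, d), d * d)  # 229376 12845056
```

For a square d × d projection, the adapter adds 2r/d of the layer's weights as trainable parameters; at the assumed width that is 64 / 3584 = 1/56, under 2%.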
A · By category
Anchor Constraints: 15/15 (100%)
SPL Token Ops: 10/10 (100%)
CPI Patterns: 10/10 (100%)
Error Handling: 10/10 (100%)
Tx Construction: 9/10 (90%)
PDA Derivation: 13/15 (87%)
Adversarial: 6/10 (60%)
Adversarial tasks test the model's resistance to deprecated APIs, hallucinated patterns, and known Solana anti-patterns. Lower scores are expected in this category and indicate room for improvement through RLHF alignment.
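Scoring adversarial robustness requires some way to detect when a completion falls back on outdated or invented APIs. As a rough sketch only, the check below scans generated Rust source against a small regex denylist. Both patterns are assumptions for illustration: we believe `-> ProgramResult` handler signatures and `ctx.bumps.get(...)` lookups come from older Anchor releases (newer ones use `Result<()>` and struct-field bumps), but verify this against the Anchor changelog. `adversarial_findings` and `passes_adversarial` are hypothetical names, and a real grader would more likely compile the program and run its tests.

```python
import re

# Illustrative denylist (assumed, not the benchmark's actual rules). Each entry pairs a
# regex with the reason a completion containing it would be flagged.
DENYLIST = [
    (re.compile(r"->\s*ProgramResult\b"),
     "older Anchor handler signature; newer releases return Result<()>"),
    (re.compile(r"ctx\.bumps\.get\("),
     "older map-style bump lookup; newer releases expose bumps as struct fields"),
]


def adversarial_findings(completion: str) -> list[str]:
    """Return the reasons a generated program trips the denylist (empty list = clean)."""
    return [reason for pattern, reason in DENYLIST if pattern.search(completion)]


def passes_adversarial(completion: str) -> bool:
    # A completion passes only if no denylisted pattern appears in it.
    return not adversarial_findings(completion)


clean = "pub fn initialize(ctx: Context<Initialize>) -> Result<()> { Ok(()) }"
stale = 'let bump = *ctx.bumps.get("vault").unwrap();'
print(passes_adversarial(clean), adversarial_findings(stale))
```

A pattern scan like this only catches known strings; hallucinated APIs are open-ended, which is one reason a compile-and-test grader is the safer design.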