HELIX was validated against IBM Granite 4.0 Hybrid Small running on AMD EPYC 9254 at IBM Fusion. Every number is a measurement on identical hardware — the only variable is HELIX.
These benchmarks show what HELIX does to your model — not the model's own performance. The deltas are what you gain.
What this means: if you're running Granite 4.0 Small today without HELIX, adding HELIX gives you the gains measured below, on the same model and the same hardware.
| Metric | No HELIX | + HELIX | Delta |
|---|---|---|---|
| GSM8K Accuracy | 90.60% | 92.42% | +1.82pp |
| HumanEval pass@1 | ~83–84% | 93.90% | ~+10pp |
| MMLU STEM | n/a | 71.50% | n/a (no baseline; 18 domains) |
| Throughput | 6.8 tok/s | 14.4 tok/s | 2.1× faster |
| Active params/token | 9B (MoE) | 1.125B | 87.5% reduction |
| Completion errors | Present | 0 | Zero across all runs |
| Telemetry | None | 100% | Per-token UTS |
Hardware: AMD EPYC 9254 shared pod, IBM Fusion HPC. The pod ran multiple concurrent workloads, so the figures are conservative and production-realistic. Validated March 2026.
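
The Delta column can be re-derived from the two measured columns. The sketch below does that arithmetic in Python; the helper names (`pp_delta`, `speedup`, `reduction_pct`) are illustrative and not part of HELIX, and the inputs are copied from the table above.

```python
# Inputs are the figures from the benchmark table; helper names are illustrative.

def pp_delta(baseline_pct, helix_pct):
    """Difference between two percentages, in percentage points."""
    return round(helix_pct - baseline_pct, 2)

def speedup(baseline_tps, helix_tps):
    """Throughput ratio (tokens per second with HELIX / without)."""
    return round(helix_tps / baseline_tps, 1)

def reduction_pct(baseline, helix):
    """Relative reduction, as a percentage of the baseline."""
    return round(100 * (baseline - helix) / baseline, 1)

gsm8k_delta = pp_delta(90.60, 92.42)              # GSM8K accuracy
throughput_speedup = speedup(6.8, 14.4)           # tok/s
active_param_reduction = reduction_pct(9.0, 1.125)  # billions of active params/token

# The HumanEval baseline is a range (~83-84%), so the delta is a range as well.
humaneval_delta_range = (pp_delta(84.0, 93.90), pp_delta(83.0, 93.90))

print(gsm8k_delta, throughput_speedup, active_param_reduction, humaneval_delta_range)
```

The HumanEval baseline is reported as a range, which is why its delta is stated as roughly +10pp rather than as a single exact value.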
Accuracy improves because irrelevant parameter activations inject noise into generation. HELIX eliminates that noise — it does not change the model, only which parameters execute. Fewer irrelevant activations = cleaner signal = better output. This is "addition by subtraction."
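
The idea of leaving the weights alone and changing only which parameters execute can be shown with a toy sketch. This is not HELIX's selection logic, which is not described here; the eight parameter blocks, the `run_selected` function, and the choice of active block are all hypothetical. The sketch shows only that selection leaves the weights untouched, and it does not demonstrate any effect on accuracy.

```python
# Toy illustration only: NOT HELIX's actual selection logic.
# All names and values here are hypothetical.

def run_selected(blocks, x, active):
    """Apply only the blocks whose index is in `active`; weights are never modified."""
    total = 0.0
    for i, weight in enumerate(blocks):
        if i in active:
            total += weight * x
    return total

blocks = [0.5, -0.2, 1.0, 0.3, -0.7, 0.1, 0.9, -0.4]  # eight hypothetical parameter blocks
snapshot = list(blocks)                                # copy, to check weights are unchanged

dense_out = run_selected(blocks, 2.0, set(range(len(blocks))))  # every block executes
sparse_out = run_selected(blocks, 2.0, {2})                     # one of eight blocks executes

active_fraction = 1 / len(blocks)  # 0.125, the same ratio as 1.125B of 9B in the table
```

One block of eight is used here only because it matches the ratio in the table (1.125B of 9B active parameters per token).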
Note: The video shows HELIX with full per-token UTS telemetry logging enabled. Production deployments without logging run materially faster. The 2.1× figure was measured in the logged configuration, so unlogged throughput is higher.