Audited performance · speed + trust

Performance you can measure.
Trust as a new engine column.

Most inference engines publish only tok/s. HELIX adds a trust surface: a 0–100% confidence score that costs under 1% additional compute (near-zero added cost on GPU). The serving numbers below come from the official vLLM GuideLLM harness on an AMD EPYC 9254: 640/640 requests completed with zero errors.

HELIX runs across GPU, CPU, edge, and IoT. These CPU serving rows are one measured lane, not the whole product identity. When quoting trust results, prefer residual risk at a given review budget.

940 ms

p50 TTFT at c=1

12.9×

faster TTFT than Ollama (c=4)

100%

valid JSON, every tier

98.5%

of FP16 accuracy at 4-bit

Primary matrix

HELIX v1.7 vs stock Ollama

System	Conc.	TTFT p50	TTFT p95	Valid JSON	Tokens / req
HELIX v1.7 #58	c=1	940 ms	1,089 ms	100%	~85 complete
HELIX v1.7 #58	c=2	1,687 ms	1,946 ms	100%	~87 complete
HELIX v1.7 #58	c=4	2,469 ms	3,053 ms	100%	~87 complete
Stock Ollama	c=1	8,416 ms	10,622 ms	~84%	~16 fragment
Stock Ollama	c=2	16,262 ms	17,088 ms	~84%	~16 fragment
Stock Ollama	c=4	31,816 ms	34,531 ms	~84%	~16 fragment

Ollama returns ~16-token fragments at 32K (full Mamba2 SSM state rebuild); HELIX returns complete, schema-valid extractions.

TTFT advantage

Conc.	HELIX	Ollama	Advantage
c=1	940 ms	8,416 ms	8.95×
c=2	1,687 ms	16,262 ms	9.64×
c=4	2,469 ms	31,816 ms	12.89×
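
The Advantage column can be checked by hand: it is the Ollama p50 TTFT divided by the HELIX p50 TTFT at the same concurrency. The sketch below reproduces that arithmetic from the table values; the names `TTFT_P50_MS`, `ttft_advantage`, and `advantage_table` are illustrative and are not part of HELIX or GuideLLM.

```python
# TTFT p50 values in milliseconds, copied from the primary matrix above.
TTFT_P50_MS = {
    "c=1": {"helix": 940, "ollama": 8416},
    "c=2": {"helix": 1687, "ollama": 16262},
    "c=4": {"helix": 2469, "ollama": 31816},
}


def ttft_advantage(helix_ms: float, ollama_ms: float) -> float:
    """Return the ratio of Ollama TTFT to HELIX TTFT (higher favors HELIX)."""
    if helix_ms <= 0:
        raise ValueError("helix_ms must be positive")
    return ollama_ms / helix_ms


def advantage_table(rows: dict) -> dict:
    """Compute the advantage ratio per concurrency level, rounded to 2 places."""
    return {
        conc: round(ttft_advantage(v["helix"], v["ollama"]), 2)
        for conc, v in rows.items()
    }


if __name__ == "__main__":
    for conc, adv in advantage_table(TTFT_P50_MS).items():
        print(f"{conc}: {adv:.2f}x")
```

Running the same function on p50 values from your own benchmark gives a directly comparable ratio.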

Why it holds up

PARALLEL_SLOTS=4. Four KV-warmed inference slots serve concurrent requests in parallel — a 6.9× c=4 TTFT improvement over single-slot builds.
Strict NUMA pinning. All 22 threads pinned to one CCD and its local DDR5 controllers — no cross-die Infinity Fabric traffic.
100% JSON compliance. Logit-bias grammar-masked sampling constrains decoding to your schema — automated pipelines never halt.
Low-skew TTFT. At c=1 the TTFT mean ≈ p50 (942 ms ≈ 940 ms), so no long slow tail is pulling the average up; the OOM and serialization tails are eliminated.
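
To illustrate the general idea behind grammar-masked sampling from the list above, here is a minimal sketch. It is not the HELIX implementation: the toy vocabulary, the fixed five-step grammar, and the function names are all assumptions made for the example. At each step, tokens the grammar does not allow have their logits set to negative infinity, so the decoder can only pick a schema-valid token even when the raw logits favor free text.

```python
import json
import math

# Hypothetical toy vocabulary; a real tokenizer has tens of thousands of entries.
VOCAB = ["{", "}", '"name"', ":", '"Ada"', "hello"]

# Hypothetical toy grammar for the object {"name":"Ada"}:
# position in the output -> set of token ids allowed at that position.
GRAMMAR = [{0}, {2}, {3}, {4}, {1}]


def mask_logits(logits, allowed_ids):
    """Keep logits of allowed tokens; set every other logit to -inf."""
    if not allowed_ids:
        raise ValueError("grammar allows no token at this step")
    return [x if i in allowed_ids else -math.inf for i, x in enumerate(logits)]


def greedy_pick(logits):
    """Return the index of the largest logit."""
    return max(range(len(logits)), key=logits.__getitem__)


def constrained_greedy_decode(logits_per_step):
    """Greedy decoding where each step is masked by the toy grammar."""
    out = []
    for state, logits in enumerate(logits_per_step):
        masked = mask_logits(logits, GRAMMAR[state])
        out.append(VOCAB[greedy_pick(masked)])
    return "".join(out)


# Made-up raw logits that always prefer the free-text token "hello".
RAW_LOGITS = [[0.1, 0.0, 0.2, 0.0, 0.3, 5.0] for _ in range(5)]

if __name__ == "__main__":
    text = constrained_greedy_decode(RAW_LOGITS)
    print(text)
    print(json.loads(text))
```

Without the mask, greedy decoding on these logits would emit "hello" at every step; with it, the output parses as JSON. A production grammar tracks parser state rather than a fixed position index, but the masking step has the same shape.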

Run the same benchmark in your environment.

Book a call for the full technical report and a sovereign HELIX pod on your hardware.

