HELIX v1.2 — Inference Engine Technology

The HELIX Inference Engine

HELIX is not a model. It is an inference engine that integrates with your existing LLM, making it run faster and more accurately on CPU infrastructure you already own.

No model replacement · No retraining · No GPU dependency · Works with Granite, Llama, Mistral, Qwen, and other MoE architectures

Two Customer Scenarios

HELIX delivers value across both cloud migration and on-premises performance uplift.

Scenario 1

Cloud / API → On-Premises

You run inference via a cloud API or GPU-as-a-service and want to move on-premises, whether for data sovereignty, cost control, or regulatory compliance. HELIX makes your chosen model sovereign-CPU-ready without compromising performance.

  • Same model, no more API bills or data exposure
  • Air-gapped, non-root, sovereign deployment
  • Deploy on existing AMD EPYC / OpenShift estate

Scenario 2

Run a Bigger Model on the Same Hardware

You already run a small or tiny model on CPU and want better quality, but can't justify a GPU or more hardware. HELIX lets you step up (e.g. Granite Tiny → Granite Small) while staying on CPU, keeping the compute cost the same or lower.

  • Larger, smarter model — same CPU budget
  • ~90% active parameter reduction keeps it fast
  • Accuracy improves — noise reduction is a byproduct

Without HELIX

Your LLM on CPU — As-Is

  • Every parameter executes for every token — the vast majority are irrelevant noise for that specific input
  • Memory bandwidth saturated moving parameters that contribute nothing to this token
  • Slow throughput forces GPU dependency or quantization — permanent quality loss
  • Irrelevant activations inject noise — accuracy lower than it should be

With HELIX

Your LLM + HELIX Precision Filtering

  • HELIX selects only the most relevant ~10% of active parameters per token — signal, not noise
  • Model is never modified. Full parameter set preserved — filtering is at execution, not structure
  • ~90% less data movement → 2.1× throughput on same CPU — no hardware changes
  • Noise elimination improves accuracy — validated on Granite 4.0 Small at IBM Fusion

"The model is not made smaller. The execution is made more precise."
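HELIX's execution-time filtering is proprietary, but the general idea — keep the full parameter set resident and restrict each token's compute to a small, input-dependent subset — can be sketched. Everything below (the relevance scores, function names, and the 1,000-group granularity) is illustrative, not HELIX's actual method:

```python
import random

def precision_filter(relevance, keep_ratio=0.10):
    """Return the indices of the top `keep_ratio` fraction of parameter
    groups by per-token relevance. The full model stays resident in
    memory; only the selected groups take part in this token's compute."""
    k = max(1, int(len(relevance) * keep_ratio))
    ranked = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return set(ranked[:k])

# Illustrative: 1,000 parameter groups, each scored for relevance to
# the current token (scores here are random stand-ins).
random.seed(0)
scores = [random.random() for _ in range(1000)]
active = precision_filter(scores)

print(len(active))             # 100 -> only ~10% of groups execute
print(1 - len(active) / 1000)  # 0.9 -> ~90% of parameter movement avoided
```

Because the mask is recomputed per token, no weights are pruned from the stored model; the selection happens at execution time, which is why the quote above distinguishes a smaller model from more precise execution.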

Integration Process

HELIX integrates with your existing LLM through a structured mapping and rebuild process. Your model and infrastructure stay in place.

1. Model Mapping

HELIX analyzes your model's architecture — layer structure, attention heads, MoE routing — and builds a precision filter map specific to your model.

2. Rebuild & Package

The inference runtime is rebuilt with HELIX's precision filtering layer integrated. Output is a production-ready image — your model, HELIX-optimised, in your target format.

3. Deploy & Run

Drop the image into your OpenShift, Docker, or SIF environment. No code changes. No infrastructure rebuild. Immediate performance uplift.

20 Pre-Built Deployment Images

HELIX ships as a catalog of 20 pre-built images covering common enterprise deployment targets. No build pipeline required — pull the image, configure your model path, deploy in minutes.

OpenShift / Kubernetes

OCI-compatible. Non-root execution. Liveness, readiness, and startup probes. IBM Fusion HPC validated. Restricted PodSecurity compliant.

Docker / Podman

Standard Docker images for on-premise, edge, or hybrid environments. Multi-arch. Compose-compatible. Air-gap pull available.

Singularity / SIF

For HPC clusters without Docker. Single-file deployment. Verified on UNSW HPC infrastructure. Apptainer compatible.

Non-root execution · Air-gap compatible · Read-only config · IBM Fusion tested · UNSW HPC tested · AMD EPYC optimised · Kubernetes probes · Multi-arch

What's Under the Hood

Three patent-pending core systems working together.

Patent Pending

Unified Truth Score (UTS)

Per-token uncertainty metric combining semantic entropy with geometric distance from the truthfulness manifold. 100% telemetry coverage on every token generated.
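The exact UTS formula is patent-pending and not published here; the sketch below only illustrates the general shape described above — a per-token blend of semantic entropy and geometric distance from the manifold. The `alpha` weighting and both input values are assumptions for illustration:

```python
import math

def unified_truth_score(token_probs, manifold_distance, alpha=0.5):
    """Illustrative blend (not the patented formula): normalised
    semantic entropy of the next-token distribution combined with the
    token's geometric distance from a truthfulness manifold, both
    scaled to [0, 1]. Lower score = more confident and coherent."""
    entropy = -sum(p * math.log(p) for p in token_probs if p > 0)
    max_entropy = math.log(len(token_probs))  # entropy of a uniform distribution
    return alpha * (entropy / max_entropy) + (1 - alpha) * manifold_distance

# A peaked distribution near the manifold scores low (trustworthy token)...
confident = unified_truth_score([0.97, 0.01, 0.01, 0.01], manifold_distance=0.05)
# ...a flat distribution far from it scores high (flag for review).
uncertain = unified_truth_score([0.25, 0.25, 0.25, 0.25], manifold_distance=0.90)
print(confident < uncertain)  # True
```

Because a score like this is computed for every generated token, it yields the 100% telemetry coverage claimed above rather than sampled spot checks.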

Patent Pending

Truthfulness Manifold

Pre-computed geometric structure encoding structural coherence — syntax, logic, causality. Constructed once during model mapping, used at inference time for precision slice selection.

Patent Pending

MoE Precision Slicing

On MoE architectures, HELIX selects the optimal expert slice per token. Validated on Granite 4.0 (9B→1.125B active) and Qwen3-30B-A3B. Architecture-independent implementation.
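Standard MoE inference already routes each token through a few experts; per the text, HELIX refines which expert slice actually executes. The sketch below shows the generic top-k routing pattern plus the active-parameter arithmetic the Granite 4.0 figure implies — the router logits and `k` are illustrative, not HELIX's selection logic:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_expert_slice(router_logits, k=2):
    """Generic top-k MoE routing: run only the k experts the router
    scores highest for this token. HELIX's per-token slice selection
    refines this choice; the mechanism shown is the standard pattern."""
    probs = softmax(router_logits)
    return sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

# Route one token across 8 experts (logits are illustrative).
experts = select_expert_slice([0.1, 2.3, -1.0, 0.7, 3.1, -0.2, 0.4, 1.5], k=2)
print(experts)  # [4, 1] -> experts 4 and 1 execute; the other six are skipped

# Active-parameter arithmetic from the Granite 4.0 validation in the text:
print(f"{1.125e9 / 9.0e9:.1%}")  # 12.5% of parameters active per token
```

The 9B→1.125B figure means roughly one parameter in eight is active per token, which is where the ~90% reduction in data movement cited earlier comes from.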

Supported Model Architectures

  • Mixture-of-Experts (MoE)
  • Dense transformers
  • IBM Granite family
  • Llama / Meta
  • Mistral / Mixtral
  • Qwen / Alibaba
  • GGUF-compatible models

Hardware & Infrastructure

  • AMD EPYC (validated: 9254, 9374)
  • Intel Xeon Scalable
  • ARM Neoverse (NVIDIA Vera-class)
  • IBM Power / Fusion HPC
  • Any x86-64 Linux CPU estate
  • Kubernetes / OpenShift
  • HPC SLURM environments