ForceDream Research OS · FD-2026-001
Inference Arbitrage Router: Cost-Quality Optimisation Across Heterogeneous LLM Providers
WORM Access Seal · L828
fd2026001a3f7c2b
We present the ForceDream Inference Arbitrage Router (IAR), a dynamic multi-objective routing architecture that continuously arbitrages inference requests across heterogeneous large language model providers. The IAR evaluates four primary routing signals to construct a Pareto-optimal dispatch decision within a single request lifecycle. We demonstrate a 43.2% cost reduction while keeping quality scores within 4.1% of the theoretical maximum. All routing decisions are WORM-sealed at dispatch.
1. Introduction
Modern AI infrastructure requires routing inference requests across multiple providers to optimise for cost, quality, and latency simultaneously. Single-provider strategies expose systems to pricing volatility, capacity constraints, and quality variance across task types. The ForceDream Inference Arbitrage Router addresses this through continuous multi-objective optimisation at the routing layer, operating below all application-level business logic and above the raw provider APIs.
2. Architecture
The IAR maintains a provider state matrix P of dimensions n × 4, where n is the number of registered providers and the four columns hold cost-per-token, observed latency (an exponentially weighted moving average, EWMA, with alpha = 0.3), quality score (WORM-derived), and availability (binary, 200 ms intervals). At dispatch time, the router solves a lightweight Pareto optimisation over P, conditioned on the selected priority mode. Each priority mode maps to a target region on the Pareto frontier.
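As an illustration of the dispatch step described above, the following is a minimal sketch and not the IAR implementation. It assumes one row of P per provider, an EWMA latency update with alpha = 0.3, and a Pareto filter over cost, latency, and quality among available providers. The priority mode names (`cost`, `balanced`, `quality`), their weights, and the weighted-sum scalarisation used to pick a point on the frontier are hypothetical stand-ins, since the text does not specify how a mode maps to a frontier region.

```python
from dataclasses import dataclass

ALPHA = 0.3  # EWMA smoothing factor given in the text


@dataclass
class ProviderState:
    """One row of the provider state matrix P (four columns plus a label)."""
    name: str               # label only, not a column of P
    cost_per_token: float   # lower is better
    latency_ms: float       # EWMA of observed latency, lower is better
    quality: float          # quality score, higher is better
    available: bool         # binary availability flag


def update_latency(state, observed_ms, alpha=ALPHA):
    """Fold a new latency observation into the running EWMA."""
    state.latency_ms = alpha * observed_ms + (1 - alpha) * state.latency_ms


def dominates(a, b):
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = (a.cost_per_token <= b.cost_per_token
                and a.latency_ms <= b.latency_ms
                and a.quality >= b.quality)
    better = (a.cost_per_token < b.cost_per_token
              or a.latency_ms < b.latency_ms
              or a.quality > b.quality)
    return no_worse and better


def pareto_front(providers):
    """Available providers that no other available provider dominates."""
    live = [p for p in providers if p.available]
    return [p for p in live if not any(dominates(q, p) for q in live)]


# Hypothetical priority modes: weights for (cost, latency, quality).
PRIORITY_WEIGHTS = {
    "cost": (0.7, 0.1, 0.2),
    "balanced": (1 / 3, 1 / 3, 1 / 3),
    "quality": (0.2, 0.1, 0.7),
}


def dispatch(providers, mode):
    """Pick one provider from the Pareto front using the mode's weights."""
    front = pareto_front(providers)
    if not front:
        raise RuntimeError("no available provider")
    w_cost, w_lat, w_q = PRIORITY_WEIGHTS[mode]
    # Normalise each column by its maximum on the front (guard against zero).
    max_cost = max(p.cost_per_token for p in front) or 1.0
    max_lat = max(p.latency_ms for p in front) or 1.0
    max_q = max(p.quality for p in front) or 1.0

    def score(p):
        # Lower is better: penalise cost and latency, reward quality.
        return (w_cost * p.cost_per_token / max_cost
                + w_lat * p.latency_ms / max_lat
                - w_q * p.quality / max_q)

    return min(front, key=score)


# Illustrative data only; these providers and numbers are made up.
providers = [
    ProviderState("budget", 0.2, 400.0, 0.70, True),
    ProviderState("premium", 1.0, 300.0, 0.95, True),
    ProviderState("dominated", 1.2, 500.0, 0.90, True),
    ProviderState("offline", 0.1, 100.0, 0.99, False),
]

print(dispatch(providers, "cost").name)     # budget
print(dispatch(providers, "quality").name)  # premium
```

Filtering to the Pareto front before scoring means a dominated provider is never chosen, whatever weights a mode uses. The weights then only decide which trade-off among the remaining candidates is preferred.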