Free public testing
Use hosted open-weight models to see the VGC workflow, scorecard, sample cost read, and claim gates without entering credentials.
Run free hosted open-model tests here, then request testing tokens when you want expanded commercial/provider comparisons in the protected live lab.
Why care: instead of forcing one model everywhere, VGC turns model choice into operating leverage you can see.
Access boundary: public runs use hosted open-weight models; commercial/provider runs require protected access, testing tokens, server-side credentials, or an approved BYOK workspace.
Request testing tokens for OpenAI, Claude, Gemini, DigitalOcean, private connector, or larger repeat batteries with ledger-backed reporting.
Provider keys stay server-side or in an authenticated workspace vault. The public demo never collects provider API keys.
Protected benchmark-map high-water evidence, dated in the current claim scorecard; not a guarantee from one free public sample run.
Protected benchmark-derived posture. Hard-dollar claims require provider usage or billing reconciliation.
Measured protected p99/floor evidence, not a universal provider latency claim.
Measured bounded fit-preservation evidence from protected claim artifacts, not a broad truth guarantee.
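As a rough illustration of what a p99/floor read involves, the sketch below computes a nearest-rank 99th-percentile latency and a floor from a list of samples. Treating "floor" as the minimum observed latency is an assumption made here, and `latency_p99_and_floor` is a hypothetical helper, not part of VGC.

```python
def latency_p99_and_floor(samples_s):
    """Return (p99, floor) for a list of latency samples in seconds.

    Assumptions: p99 uses the nearest-rank method, and "floor" means
    the minimum observed latency. Both are illustrative choices.
    """
    if not samples_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_s)
    n = len(ordered)
    # Nearest-rank percentile: rank = ceil(0.99 * n), done in integers
    # to avoid floating-point rounding at the boundary.
    rank = (99 * n + 99) // 100
    return ordered[rank - 1], ordered[0]
```

With only a handful of samples the p99 collapses to the maximum, which is one reason a single free public sample run cannot stand in for repeat-backed evidence.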
This is the fastest way to understand the page and why it matters. Run one prompt across real hosted models, then watch VGC show where the route, savings, and operating advantage come from.
Use compare mode for side-by-side raw model results, or switch to architecture-selected modes to let VGC route the task.
Paste a real use case, use the current preset, or generate another sample prompt to fit the story you want to show.
The server runs the selected models, captures the outputs, and updates the run monitor and event feed as work completes.
The scoreboard and delta view show whether the VGC route improves on a flat raw-model choice for the current task.
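The delta idea in the last step can be sketched as a simple percent comparison between a flat raw-model baseline and the routed result. This is a minimal sketch under stated assumptions: `percent_delta` and its arguments are hypothetical names, and the actual scoreboard may compute its deltas differently.

```python
def percent_delta(baseline: float, routed: float) -> float:
    """Percent improvement of the routed value over the flat baseline.

    Intended for lower-is-better metrics such as latency or cost:
    a positive result means the routed run was lower than baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - routed) / baseline * 100.0


# Example: baseline latency 2.0 s, routed latency 1.5 s
print(percent_delta(2.0, 1.5))
```

A negative result means the routed run was worse than the flat choice for that metric on that task.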
Start here. This is the operator-facing workflow for the run: the current prompt, the latest response, and the exact text VGC is routing through the benchmark surface.
The current prompt will appear here.
Run the live benchmark to capture a response.
Then watch the run happen. This section is the live state layer for the hosted benchmark: progress, per-model bars, and what the architecture is doing right now.
This event log keeps the benchmark legible. It shows model starts, completions, timeouts, and stop events so the demo feels like a transparent product run, not a black box.
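One way to picture such an event feed is as a list of typed records that can be tallied per kind. The sketch below is illustrative only: `EventKind`, `RunEvent`, and `summarize` are hypothetical names, and the real feed's schema is not described on this page.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class EventKind(Enum):
    # The four event kinds named in the description above.
    START = "start"
    COMPLETE = "complete"
    TIMEOUT = "timeout"
    STOP = "stop"


@dataclass(frozen=True)
class RunEvent:
    model: str          # hypothetical model identifier
    kind: EventKind
    elapsed_s: float    # seconds since the run began


def summarize(events):
    """Count events per kind, including kinds that never occurred."""
    counts = Counter(event.kind for event in events)
    return {kind.value: counts.get(kind, 0) for kind in EventKind}
```

Reporting zero counts explicitly keeps a run with no timeouts distinguishable from a run where timeouts were not tracked.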
This is the private-lab commercial view. It answers which provider lane should come next, how it unlocks, and what private-preview posture we should use while the public demo stays clean.
This is the private reporting layer behind the benchmark. It shows recent recorded runs, average savings signals, and the model lanes winning most often on this host.
This is the VGC layer in measurable form: inference savings, token-cost reduction, latency improvement, and fit preservation, tied back to the current prompt, route, and model fingerprint. When control mode is baseline core, treat this section as the no-layer reference before comparing against a governed pass.
This is the current proof ledger for visible claim posture. Public demo hosts summarize session and public-gate evidence; the protected live lab carries detailed commercial/provider rows.
This is the production scoreboard contract: primary rows carry work, cost, truth, and p99/floor latency gates; secondary rows carry checkpoint, prefix, session, and attribution evidence without lowering the protected hold-line benchmark.
These packets keep OpenAI, Claude, DigitalOcean fixed, router, and mechanism evidence separated by profile, so strong results are neither blended into broader claims nor dropped when the benchmark map changes.
This is the authoritative launch-readiness view. It lists each lane against the scope, comparator, truth, work, cost, p99/floor latency, GUI evidence, and hold-line gates.
This shows whether the selector can react to repeat-backed tail, spike, and dip signals before the next comparable call. It is mechanism proof only until paired live A/B runs prove p99/floor, token, work, cost, and semantic outcomes.
This shows the current promotion shape for turning tail and spike reaction into commercial evidence. It stays a rehearsal until deterministic rows are replaced with live paired A/B provider rows.
This separates harmless comparator-fast floor events from VGC-tail events and shows which signals are predictive enough for tail-watch gating.
VGC is the adaptive control layer. This catalog is only a distribution wrapper: publish starter profiles for common model surfaces, then send customer-specific lanes through the live lab before custom sidecar or SDK work.
This is the bridge from lab proof to deployment: a candidate VGC profile with the control defaults, metering boundary, promotion gates, and resource-allocation read needed before sidecar, SDK, or plugin rollout.
Enter a workload amount and choose one measured lane. The calculator extrapolates the same lane percentage over that volume so savings stay tied to the benchmark result that produced them.
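The extrapolation described above amounts to applying one lane's measured percentage to the entered volume. The following is a minimal sketch, assuming the workload amount and the savings share the same unit (for example dollars or tokens); `extrapolate_lane_savings` is a hypothetical name, not the calculator's actual code.

```python
def extrapolate_lane_savings(workload_amount: float, lane_savings_pct: float) -> float:
    """Apply one measured lane's savings percentage to a workload volume.

    The percentage comes from a single measured lane, so the result is
    an extrapolation of that lane only, not a blended estimate.
    """
    if workload_amount < 0:
        raise ValueError("workload_amount must be non-negative")
    return workload_amount * lane_savings_pct / 100.0


# Example: a workload of 2000 units on a lane measured at 25% savings
print(extrapolate_lane_savings(2000, 25))
```

Because the function takes exactly one lane percentage, the projected figure can always be traced back to the benchmark result that produced it.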
This is the production-readiness guardrail. It shows what VGC can claim now, what is still proxy signal, where measurement may be altering the response, and which gates must pass before a broader public or commercial claim.
This shows the practical ceiling we can currently justify and the tests that would extend it. It keeps today's exact-lane wins visible while showing why stricter ClaimGateV2 production claims are still held.
This gives the fastest visual answer to why the architecture matters on the current run: what a flat raw choice would optimize for, what VGC routes instead, and where the measurable deltas actually show up.
This is the explicit with-VGC versus baseline-core read for each lane on the current run. It shows where VGC is active, where it is only matching a governed lane, and where a lane is still running in neutral-core posture.
This shows the full VGC operating path on the current run: anticipation before generation, governance during generation, mirror/audit after generation, and whether the trajectory/forecast layer is inline or still lab-attached.
Both open-weight and commercial lanes are currently benchmarked as request/response runs. This section keeps the honest boundary in view: host-local runtime control versus remote provider API opacity.
This shows whether the current comparison is an isolated lane or a mixed live packet, and whether VGC is using a packet-aware commercial posture on the focus lane.
This separates what VGC is optimizing for from what happened to bounded fit on the run. Use it to tune for budget, latency, or stability while keeping the fit cage transparent and isolated from the savings claim.
This runs the same selected packet through explicit VGC objectives so we can compare budget, latency, and stability on one surface instead of assuming that token trimming will automatically become latency savings.
This is the clean human-in-the-loop view. It keeps the response text, fit cage, claim posture, and direct runtime numbers together without filtering the underlying row into a prettier story.
This flags where VGC is active but flat, where a provider is saturating the fit proxy, and where controller activity may be outrunning measurable payoff.
The raw models matter, but the product value comes from the VGC layer above them. This section explains what a flat model choice would look like versus the routed VGC decision.
Each row begins as a raw model execution on the same prompt. VGC then evaluates that raw output and turns it into a routing, savings, and deployment decision surface.
| Model | Role | Raw Latency (s) | Visible / Work Tokens | Router Policy | Raw Chars | VGC Action Fires | VGC Action Delta | VGC Substitution Fires | Baseline Substitution | VGC Sub Delta | VGC Fit Preservation | VGC Deployment Use |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
These cards translate the benchmark into model-specific operating choices. They help the person running the demo show why an agentic system should switch models by task instead of forcing every request through one compromise model.
This live lab keeps the full Specialized Stack in view. We are not trading away the architecture to get the demo.
Each strip shows controller fit preservation, structure fit preservation, and lane events over the evaluated token sequence for that model.
This lab is already model-in-the-loop and live. The hosted version runs on the server so the person receiving the demo only needs a browser. The architecture, benchmark backend, and model runtime all stay inside the hosted product surface.
What we gain: real prompt-matched model switching, live benchmark packets, grounded model-role comparisons, and a place for prospects to test their own use cases on the spot.
Current boundary: public demo hosts run open/public lanes and hide private connector controls; the protected live lab can run configured commercial/provider lanes, with authenticated GUI E2E still required before public claim promotion.
This is the direct backend packet for operators and debugging. Some internal field names still use the legacy truth-preservation schema even though the demo surface now presents that metric as fit preservation.
Run the live benchmark to inspect the packet.
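For readers inspecting a packet that still uses the legacy truth-preservation schema, a display-side rename could look like the sketch below. The key names `truth_preservation` and `fit_preservation` are hypothetical placeholders; the actual packet field names are not given on this page.

```python
# Hypothetical legacy-to-display key mapping (illustrative names only).
LEGACY_TO_DISPLAY = {"truth_preservation": "fit_preservation"}


def present_packet(packet: dict) -> dict:
    """Return a copy of the packet with legacy keys renamed for display.

    Keys without a mapping pass through unchanged, and the original
    packet is left untouched.
    """
    return {LEGACY_TO_DISPLAY.get(key, key): value for key, value in packet.items()}
```

Renaming at the presentation layer leaves stored packets unchanged, so older recorded runs remain comparable.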