Multi-Model Mix-and-Match: Route by Cost, Latency, and Policy

Nova, Systems Architect at AgentLed

Model roulette is expensive. The fix isn’t “pick the biggest”; it’s routing by the job with guardrails.

Why this matters now

Different tasks have different SLOs and risks. Summarizing a short email? Cheap and fast is fine. Drafting an external announcement with legal language? Require quality and lineage. A control plane that understands cost, latency, policy, and quality lets you use small models for routine work, step up for high-stakes tasks, and fail over on errors, all without surprises.

How to think about routing

Set SLOs per task (e.g., p95 latency under 2.5 s and a minimum evaluator score of 0.72). Define policy (EU data residency, provider allowlist, PII handling). Add a tiny evaluator that grades each output against a task-specific rubric, and block or escalate any output that falls below the threshold. Then encode all of that in a simple policy file the router reads at runtime. Treat models as pluggable engines behind those rules.
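To make the SLO idea concrete, here is a minimal Python sketch of checking logged results against a per-task SLO. The names `TaskSLO`, `p95`, and `meets_slo` are hypothetical, and the sketch assumes that `min_eval` is a floor every output must clear; a team could just as reasonably apply it to an average.

```python
from dataclasses import dataclass


@dataclass
class TaskSLO:
    """Per-task targets (hypothetical structure mirroring the SLO fields)."""
    p95_latency_ms: float
    min_eval: float


def p95(values):
    """Nearest-rank 95th percentile, using integer math to avoid float rounding."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = -(-95 * len(ordered) // 100)  # ceil(0.95 * n), 1-based rank
    return ordered[rank - 1]


def meets_slo(slo, latencies_ms, eval_scores):
    """True when p95 latency is under target and every score clears the floor."""
    latency_ok = p95(latencies_ms) < slo.p95_latency_ms
    eval_ok = min(eval_scores) >= slo.min_eval  # assumption: floor per output
    return latency_ok and eval_ok


slo = TaskSLO(p95_latency_ms=2500, min_eval=0.72)
latencies = [100] * 19 + [3000]  # one slow outlier in 20 calls
print(meets_slo(slo, latencies, [0.80, 0.75]))
```

One slow call in twenty does not break a p95 target, which is the point of using a percentile rather than a maximum.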

Example / How-to (policy + evaluator)

Policy YAML (starter):

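task: "create_linkedin_post"
slo: { p95_latency_ms: 2500, min_eval: 0.72 }  # latency target and minimum evaluator score
policy:
  residency: "EU"
  providers_allow: ["openai-eu", "azure-eu", "local"]
  pii: "mask"  # PII handling mode
route:
  - when: "tokens<2000"  # short inputs go to the small, fast model
    model: "mini-fast"
  - when: "eval<0.72"  # output scored below min_eval: step up
    failover: "pro-accurate"
  - when: "provider_error || policy_violation"  # switch to a compliant backup
    failover: "backup-compliant"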
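One way a router might read this policy is sketched below in Python. The policy is written as a dict to keep the example dependency-free (loading the YAML with PyYAML's `yaml.safe_load` would give the same structure). The `when` semantics are an assumption: each term is either `name<number` or a boolean flag, terms are joined with `||`, and the first matching rule wins. The function names are hypothetical.

```python
POLICY = {
    "task": "create_linkedin_post",
    "slo": {"p95_latency_ms": 2500, "min_eval": 0.72},
    "policy": {
        "residency": "EU",
        "providers_allow": ["openai-eu", "azure-eu", "local"],
        "pii": "mask",
    },
    "route": [
        {"when": "tokens<2000", "model": "mini-fast"},
        {"when": "eval<0.72", "failover": "pro-accurate"},
        {"when": "provider_error || policy_violation", "failover": "backup-compliant"},
    ],
}


def term_matches(term, ctx):
    """Evaluate one term: 'name<number' or a bare boolean flag (assumed grammar)."""
    term = term.strip()
    if "<" in term:
        name, limit = term.split("<", 1)
        value = ctx.get(name.strip())
        return value is not None and value < float(limit)
    return bool(ctx.get(term, False))


def rule_matches(when, ctx):
    """A rule matches when any of its '||'-separated terms matches."""
    return any(term_matches(t, ctx) for t in when.split("||"))


def initial_model(policy, ctx):
    """First 'model' rule that matches the request context, else None."""
    for rule in policy["route"]:
        if "model" in rule and rule_matches(rule["when"], ctx):
            return rule["model"]
    return None


def failover_model(policy, ctx):
    """First 'failover' rule that matches the post-call context, else None."""
    for rule in policy["route"]:
        if "failover" in rule and rule_matches(rule["when"], ctx):
            return rule["failover"]
    return None


print(initial_model(POLICY, {"tokens": 1200}))
print(failover_model(POLICY, {"eval": 0.6}))
```

Note that the starter policy has no rule for inputs of 2000 tokens or more, so `initial_model` returns `None` there and the caller has to pick a default.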

Tiny evaluator (pattern):

  • Golden set (20–50 examples).
  • Scoring rubric → structure, tone match, hallucination check.
  • Thresholds per task: publish / needs_review / block.
  • Drift: compare the rolling average score with last week's; alert if it drops by more than a set Δ.
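The evaluator pattern above can be sketched in a few lines of Python. Everything specific here is an assumption for illustration: the rubric weights, the 0.5 block threshold, the drift Δ of 0.05, and the function names. The per-criterion checks (each scored 0 to 1) would come from whatever graders you run over the golden set.

```python
from statistics import mean

# Hypothetical weights for the three rubric criteria; they sum to 1.
RUBRIC_WEIGHTS = {"structure": 0.3, "tone": 0.3, "grounded": 0.4}


def rubric_score(checks):
    """Weighted sum of per-criterion scores, each expected in [0, 1]."""
    return sum(weight * checks[name] for name, weight in RUBRIC_WEIGHTS.items())


def verdict(score, publish_at=0.72, block_below=0.5):
    """Map a score to publish / needs_review / block (thresholds are per task)."""
    if score >= publish_at:
        return "publish"
    if score >= block_below:
        return "needs_review"
    return "block"


def drifted(this_week, last_week, delta=0.05):
    """True when the average score dropped by more than delta week over week."""
    return mean(last_week) - mean(this_week) > delta


score = rubric_score({"structure": 1.0, "tone": 0.5, "grounded": 1.0})
print(verdict(score))
print(drifted(this_week=[0.70, 0.72], last_week=[0.80, 0.82]))
```

Keeping the thresholds as parameters makes it easy to tune them per task as the weekly review suggests.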

Failover patterns:

  • Shadow eval: run mini + pro on 1 in N tasks; use delta to tune thresholds.
  • Retry semantics: on timeout/policy errors, auto-switch to compliant provider.
  • Rollback: on a post-publish alert, revert to the last approved artifact.
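As a rough illustration of the retry and shadow-eval patterns, the Python sketch below wraps a primary model call with a compliant backup and picks 1 in N tasks for shadow comparison. `ProviderError`, `PolicyViolation`, and the function names are hypothetical, and the model calls are plain callables standing in for real provider clients.

```python
from statistics import mean


class ProviderError(Exception):
    """Hypothetical error raised when a provider call fails."""


class PolicyViolation(Exception):
    """Hypothetical error raised when a call would break policy."""


def call_with_failover(prompt, primary, backup):
    """Try the primary model; on timeout, provider, or policy errors use the backup."""
    try:
        return primary(prompt), "primary"
    except (TimeoutError, ProviderError, PolicyViolation):
        return backup(prompt), "backup"


def should_shadow(task_index, n):
    """Select 1 in n tasks for a shadow run on both the mini and pro models."""
    return task_index % n == 0


def shadow_delta(mini_scores, pro_scores):
    """Average quality gap between pro and mini on the shadowed tasks."""
    return mean(pro_scores) - mean(mini_scores)


def slow_model(prompt):
    raise TimeoutError("simulated timeout")


def backup_model(prompt):
    return prompt.upper()


print(call_with_failover("draft post", slow_model, backup_model))
print([i for i in range(10) if should_shadow(i, 5)])
```

A small shadow delta suggests the mini model is good enough for that task; a large one is a signal to raise the escalation threshold.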

Next steps

  • Pick three tasks to route (summarize, draft post, extract entities).
  • Write the policy YAML and plug in a 50-example evaluator.
  • Add logging (model, latency, tokens, eval) and review weekly to tune thresholds.
  • Want a copy-paste evaluator harness? Grab the starter kit or book a working session.