Blueprint · AI Infrastructure

AI Model Routing Platform

A routing layer that chooses the right AI model based on task type, cost, latency, user plan, and required output.

Status: In Progress · Difficulty: Hard · Tags: OpenRouter, Model Routing, Usage Metering, AI SaaS, Prompt Precheck

Overview

A model gateway that turns a user's prompt into a policy decision: which assistant family should handle it, which model should be called, what budget applies, and how usage should be shown back to the user.

Problem

AI products often start as one chat box backed by one model. That setup breaks down once tasks start to vary in cost, latency, quality, modality, safety category, and user plan.

Core users

  • AI app builders who need predictable cost control
  • Premium users expecting high-quality assistant behavior
  • Admins configuring model access and plan limits
  • Support engineers investigating usage and provider failures

MVP scope

  • Prompt intake and prompt precheck
  • Assistant family mapping for task/product structure
  • Model policy engine with plan-aware routing
  • Provider adapter for OpenRouter or direct LLM APIs
  • Usage and cost tracker
  • Fallback when a provider fails
  • Admin model activation and plan request flow

Non-goals

  • Do not build custom model training in MVP
  • Do not expose raw provider cost directly as the product price
  • Do not route every request through an expensive classifier if simple rules work
  • Do not promise perfect task classification

Core system components

  • Prompt precheck and category filter
  • Assistant family mapping
  • Model policy engine
  • Provider adapters
  • Usage metering
  • Cost display multiplier
  • Plan limits and entitlements
  • Admin overrides and activation workflow
  • Safety/category filters
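The cost display multiplier is the component that separates raw provider spend from what the user sees. A minimal sketch of one way it could work; the multiplier value, unit scale, and function names are illustrative assumptions, not part of the blueprint:

```python
# Convert raw provider cost into user-facing display units ("credits").
# Both constants are illustrative: the multiplier covers margin and
# smooths provider price changes; the unit scale hides raw dollar cost.
DISPLAY_MULTIPLIER = 1.8   # assumed markup over raw provider cost
UNITS_PER_DOLLAR = 1000    # assumed credits per display dollar

def to_display_units(raw_cost_usd: float) -> int:
    """Map a raw provider cost to whole display credits, never below 1."""
    display_cost = raw_cost_usd * DISPLAY_MULTIPLIER
    return max(1, round(display_cost * UNITS_PER_DOLLAR))
```

A floor of 1 credit keeps even near-free calls visible in the usage display, which matters once credits are the unit users reason about.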

Suggested architecture

  • Flow: prompt intake -> prompt classifier/precheck -> assistant recommendation -> model policy engine -> provider adapter -> usage/cost tracker -> response stream.
  • Frontend: assistant picker, plan state, precheck feedback, streaming response UI, and usage display.
  • Backend: model gateway API that owns routing policies, provider calls, retries, usage records, and cost calculations.
  • Database: assistants, models, providers, plans, entitlements, prompt checks, model calls, usage records, costs, and conversations.
  • Workers: optional async jobs for analytics, usage aggregation, and failed-call reconciliation.
  • External APIs: OpenRouter or direct model providers.
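The flow above can be sketched as a small pipeline of stages. Every function here is a stub standing in for the real component, and all names are assumptions for illustration only:

```python
def precheck(prompt):
    # Stub classifier: a real version would score category and risk.
    is_code = "def " in prompt or "function" in prompt
    return {"category": "code" if is_code else "general", "risk": "low"}

def recommend_assistant(check):
    # Map the prompt category to an assistant family.
    return {"code": "code-assistant", "general": "general-assistant"}[check["category"]]

def choose_model(assistant, plan):
    # Stub policy: premium plans get the larger model.
    return "large-model" if plan == "premium" else "small-model"

def call_provider(model, prompt):
    # Stub provider adapter; a real one streams tokens and records usage.
    return {"model": model, "text": f"response to: {prompt[:20]}", "tokens": 42}

def handle_chat(prompt, plan):
    # The gateway owns the whole chain, so every stage is observable.
    check = precheck(prompt)
    assistant = recommend_assistant(check)
    model = choose_model(assistant, plan)
    result = call_provider(model, prompt)
    return {"assistant": assistant, **result}
```

Keeping each stage a plain function makes the later replacement of any one stage (for example, swapping the rule-based precheck for a classifier model) a local change.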

Data model

  • Assistant: id, name, family, defaultTaskTypes, enabled
  • Model: id, providerId, name, contextWindow, supportsImages, inputCost, outputCost, latencyTier
  • ModelProvider: id, name, apiBaseUrl, status
  • Plan: id, name, monthlyCredits, allowedModels, maxContext
  • UserEntitlement: id, userId, planId, overrides
  • PromptCheck: id, conversationId, category, risk, recommendedAssistant, createdAt
  • ModelCall: id, userId, conversationId, provider, model, status, latencyMs, tokens
  • UsageRecord: id, userId, type, quantity, displayUnits, createdAt
  • CostRecord: id, modelCallId, rawCost, displayCost, multiplier
  • Conversation: id, userId, assistantId, title, createdAt
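The entity fields above leave types open; this dataclass sketch makes one plausible reading explicit. The types, units in the comments, and defaults are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Model:
    id: str
    providerId: str
    name: str
    contextWindow: int
    supportsImages: bool
    inputCost: float    # raw provider cost per 1K input tokens (assumed unit)
    outputCost: float   # raw provider cost per 1K output tokens (assumed unit)
    latencyTier: str    # e.g. "fast" | "balanced" | "slow" (assumed values)

@dataclass
class Plan:
    id: str
    name: str
    monthlyCredits: int
    allowedModels: list = field(default_factory=list)
    maxContext: int = 8192  # assumed default cap

@dataclass
class CostRecord:
    id: str
    modelCallId: str
    rawCost: float      # internal only; never shown to users
    displayCost: float  # rawCost * multiplier, in display units
    multiplier: float   # stored per record so past bills survive repricing
```

Storing the multiplier on each CostRecord (rather than only in config) is the detail that keeps historical usage reports stable when pricing changes.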

API design

  • POST /api/ai/precheck - classify prompt and recommend assistant
  • POST /api/ai/chat - route and stream a model response
  • GET /api/ai/models - list available models for the user's plan
  • POST /api/admin/models/:id/activate - enable or disable a model
  • GET /api/usage - show user-facing usage
  • GET /api/admin/model-calls - inspect raw model calls and costs
  • POST /api/plan-requests - request access to a higher plan or model family
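POST /api/ai/precheck can start as cheap keyword rules before any classifier model is involved, in line with the non-goal of not routing every request through an expensive classifier. A hedged sketch; the keyword lists, category names, and response shape are illustrative:

```python
# Illustrative rule tables; a real deployment would tune and extend these.
CATEGORY_KEYWORDS = {
    "code": ("def ", "class ", "stack trace", "compile"),
    "image": ("draw", "generate an image", "picture of"),
}
BLOCKED_TERMS = ("credit card dump",)  # placeholder safety filter

def precheck(prompt: str) -> dict:
    """Classify a prompt and recommend an assistant, cheapest rules first."""
    lowered = prompt.lower()
    if any(term in lowered for term in BLOCKED_TERMS):
        return {"category": "blocked", "risk": "high", "recommendedAssistant": None}
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return {"category": category, "risk": "low",
                    "recommendedAssistant": f"{category}-assistant"}
    return {"category": "general", "risk": "low",
            "recommendedAssistant": "general-assistant"}
```

The same response shape maps directly onto the PromptCheck entity (category, risk, recommendedAssistant), so persisting the result is a straight insert.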

Key technical challenges

  • Cost control without making the product feel cheap
  • Matching task type to model capability
  • Fallbacks when providers fail or stream errors occur
  • Streaming output while usage is still being counted
  • Separating raw backend cost from user-facing usage
  • Abuse prevention and prompt category filtering
  • Keeping routing decisions explainable enough for debugging
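Provider fallback can begin as a simple ordered chain. A sketch under the assumption that each adapter is a callable that raises on failure; ProviderError and the adapter signature are illustrative:

```python
class ProviderError(Exception):
    """Raised by a provider adapter when a call or stream fails."""

def call_with_fallback(prompt, adapters):
    """Try each (name, adapter) pair in order; surface all failures at the end."""
    errors = []
    for name, adapter in adapters:
        try:
            return {"provider": name, "text": adapter(prompt)}
        except ProviderError as exc:
            # Record the failure so the fallback rate stays observable.
            errors.append((name, str(exc)))
    raise ProviderError(f"all providers failed: {errors}")
```

Returning the provider name alongside the response keeps the routing decision explainable, which the challenges list calls out as a debugging requirement.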

Tradeoffs

  • Start with rule-based routing and assistant families before adding learned routing.
  • Keep raw cost internal and expose user-facing usage units to avoid confusing users.
  • Use a precheck only when it improves the UX or policy decision.
  • Centralize provider calls so usage and failures are observable in one place.
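The first tradeoff, rule-based routing before learned routing, can be as small as "cheapest plan-allowed model that satisfies the task's requirements". A sketch with an illustrative model table; all names and numbers are assumptions:

```python
# Illustrative model catalog; costs are relative, not real provider prices.
MODELS = [
    {"name": "small-model", "inputCost": 0.2, "supportsImages": False, "contextWindow": 8192},
    {"name": "vision-model", "inputCost": 1.0, "supportsImages": True, "contextWindow": 32768},
    {"name": "large-model", "inputCost": 3.0, "supportsImages": False, "contextWindow": 128000},
]

def route(plan_allowed, needs_images=False, min_context=0):
    """Pick the cheapest allowed model meeting the task requirements."""
    candidates = [
        m for m in MODELS
        if m["name"] in plan_allowed
        and (m["supportsImages"] or not needs_images)
        and m["contextWindow"] >= min_context
    ]
    if not candidates:
        return None  # caller can surface an upgrade / plan-request prompt
    return min(candidates, key=lambda m: m["inputCost"])["name"]
```

A None result is itself useful product signal: it is exactly the point where the POST /api/plan-requests flow can be offered to the user.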

Security considerations

  • Rate-limit chat, precheck, and image requests.
  • Validate user entitlement before selecting premium models.
  • Store provider keys server-side only.
  • Log enough for debugging while avoiding sensitive prompt exposure in admin views.
  • Apply safety/category filters before model calls.
  • Add abuse detection for repeated high-cost or policy-breaking prompts.
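The rate-limiting item above can start as a fixed-window counter per user, kept in process memory. A sketch; the limit, window size, and class name are illustrative, and a real deployment would evict old windows or back this with a shared store:

```python
import time
from collections import defaultdict
from typing import Optional

class FixedWindowLimiter:
    """Per-user fixed-window request counter (in-memory sketch)."""

    def __init__(self, limit: int, window_s: float = 60.0):
        self.limit = limit
        self.window_s = window_s
        self._counts = defaultdict(int)  # (user_id, window) -> count

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Return True and count the request if the user is under the limit."""
        now = time.monotonic() if now is None else now
        window = int(now // self.window_s)
        key = (user_id, window)
        if self._counts[key] >= self.limit:
            return False
        self._counts[key] += 1
        return True
```

Separate limiter instances (or separate limits) for chat, precheck, and image requests match the bullet above, since their costs differ by orders of magnitude.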

Scaling path

  • Start with one provider adapter and rule-based policy.
  • Add multiple providers and fallback chains.
  • Add per-plan concurrency, budget caps, and model health checks.
  • Add usage aggregation and admin analytics.
  • Add learned routing from historical quality/cost outcomes.

Observability

  • Metrics for model latency, error rate, fallback rate, token volume, and raw cost.
  • Trace each chat request through precheck, route decision, provider call, and usage record.
  • Admin dashboards for provider health and expensive users.
  • Audit logs for admin model activation and plan overrides.
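Tracing each chat request through precheck, route decision, provider call, and usage record can begin as a per-request event list long before a full tracing stack exists. A minimal sketch; the class and stage names are illustrative:

```python
import time

class RequestTrace:
    """Accumulate timestamped stage events for one chat request."""

    def __init__(self, request_id: str):
        self.request_id = request_id
        self.events = []

    def record(self, stage: str, **details):
        # Monotonic timestamps make per-stage latency easy to derive later.
        self.events.append({"stage": stage, "t": time.monotonic(), **details})

    def stages(self):
        return [e["stage"] for e in self.events]
```

Emitting the finished trace as one structured log line per request is usually enough to answer "why did this prompt get this model at this cost" during support investigations.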

Future features

  • Learned routing
  • Provider health scoring
  • Per-assistant memory and retrieval
  • Model quality feedback loop
  • Budget simulation before model calls
  • Admin model marketplace