The Problem
Running every user query through high-end models (like GPT-4) is prohibitively expensive and slow. A simple “Hello” or “What is 2+2?” does not require a $20/month model.
The Solution
I built Project Omnitrix, an Intent-Aware AI Gateway (smart middleware) that dynamically routes traffic to the "Right Model for the Job."
Technology: Golang, Gin, Aho-Corasick Algorithm, Ollama, Groq.
- Reflex Layer (Near-Zero Latency): Uses the Aho-Corasick algorithm to resolve static intents (greetings, blocked words) in a single pass over the prompt, bypassing LLMs entirely.
- The Brain (Classifier): Uses a lightweight model (Phi-3 Mini) to classify prompts the reflex layer can't handle into intent buckets (Coding, Creative, Math, etc.).
- Business Engine: The Resolver Pattern automatically switches between providers based on an intent-and-tier logic matrix, decoupling intent detection from execution.
- Tiered Quality of Service:
  - Free Tier: Routes to efficient local models (e.g., Gemma-2B, Phi-3).
  - Premium Tier: Routes to state-of-the-art cloud models (e.g., Llama-3-70B on Groq) for superior performance.
Key Features
- Cost Optimization: Drastically reduces API bills by offloading simple queries to smaller models.
- Resiliency: Implemented the Circuit Breaker pattern to automatically fall back to robust models if the primary provider fails.