Model Routing
Intelligent routing between local LLMs and cloud APIs
Overview
Model Routing intelligently directs inference requests to the optimal model based on task complexity, hardware availability, and cost efficiency. It seamlessly switches between local Ollama models and cloud APIs.
Routing Strategy
Complexity-Based
Local Fallback
Always Available
Cloud Providers
4 Supported
Auto-Scaling
Dynamic
Routing Architecture
Task Analysis
Classify complexity
Hardware Check
VRAM availability
Router Decision
Select optimal model
Model Execution
Local or Cloud
Response
Return result
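The five stages above can be sketched as a chain of small functions. Everything here is illustrative: the function names, the stubbed VRAM probe, and the cutoff values are our own assumptions, not SingularCore's actual internals.

```python
from dataclasses import dataclass

@dataclass
class Decision:
    provider: str  # "local" or "cloud"
    model: str

def analyze_task(prompt: str) -> float:
    """Classify complexity as a score in [0, 1] (toy heuristic: prompt length)."""
    return min(len(prompt) / 2000, 1.0)

def vram_available_gb() -> float:
    """Stubbed hardware check; a real router would query the GPU driver."""
    return 8.0

def route(prompt: str) -> Decision:
    score = analyze_task(prompt)
    # Cloud-preferred above 0.8, local otherwise -- provided the model fits.
    if score >= 0.8 or vram_available_gb() < 4.0:
        return Decision("cloud", "gemini-2.5-flash")
    return Decision("local", "llama3.2")

print(route("What is 2 + 2?"))  # short prompt -> routed to the local model
```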
Complexity Classification
Trivial
Local Only
Simple Q&A
Low
Local Preferred
Basic tasks
Medium
Balanced
Code review
High
Cloud Preferred
Complex analysis
Critical
Cloud Only
Multi-step reasoning
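The five tiers correspond to the numeric boundaries in the `thresholds` block of config/models.yaml shown later on this page. A minimal classifier over those boundaries (the function name is ours, not part of the product) looks like:

```python
# Map a complexity score in [0, 1] to a routing tier, using the same
# boundaries as the `thresholds` block in config/models.yaml.
TIERS = [
    (0.2, "trivial"),   # local only
    (0.4, "low"),       # local preferred
    (0.6, "medium"),    # balanced
    (0.8, "high"),      # cloud preferred
    (1.0, "critical"),  # cloud only
]

def classify(score: float) -> str:
    """Return the first tier whose upper bound contains the score."""
    for upper, tier in TIERS:
        if score <= upper:
            return tier
    return "critical"

print(classify(0.15))  # trivial
print(classify(0.75))  # high
```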
Local LLM Setup (Ollama)
Auto-Managed
Ollama is automatically installed and managed by SingularCore. No manual setup required. Models are downloaded on-demand through the Model Hub interface.
Configure Local Models
Edit config/models.yaml:
local_models:
enabled: true
ollama_url: "http://localhost:11434"
default_model: "llama3.2"
auto_manage: true # Ollama is auto-installed
models:
- name: "llama3.2"
display_name: "Llama 3.2"
size: "2GB"
quant: "Q4_K_M"
context: 128000
- name: "deepseek-r1:14b"
display_name: "DeepSeek R1 14B"
size: "9GB"
quant: "Q4_K_M"
context: 128000
Download Models via Model Hub
- Navigate to /models in the dashboard
- Go to the Local Models tab
- Search for models like "llama3.2" or "deepseek-r1"
- Click Pull Model to download
- Monitor download progress in real-time
Verify Models
Check available models in the dashboard:
- Open Model Hub at /models
- View Installed Models section
- Verify models show as "Available"
- Test in chat interface
Cloud API Setup
Google Gemini
Get API key from Google AI Studio
Set environment variable:
export GOOGLE_API_KEY="your-api-key-here"
Anthropic Claude
Get API key from Anthropic Console
Set environment variable:
export ANTHROPIC_API_KEY="sk-ant-..."
OpenAI
Get API key from the OpenAI Platform
Set environment variable:
export OPENAI_API_KEY="sk-..."
OpenRouter
Get API key from OpenRouter
Set environment variable:
export OPENROUTER_API_KEY="sk-or-..."
Complete Configuration
Create or edit config/models.yaml:
# Model Routing Configuration
model_routing:
enabled: true
strategy: "complexity_based"
fallback_to_local: true
# Complexity thresholds
thresholds:
trivial: 0.2 # Local only
low: 0.4 # Local preferred
medium: 0.6 # Balanced
high: 0.8 # Cloud preferred
critical: 1.0 # Cloud only
# Provider priority
providers:
local:
priority: 1
enabled: true
google:
priority: 2
enabled: true
models:
- "gemini-2.5-flash"
- "gemini-2.5-pro"
openai:
priority: 3
enabled: true
models:
- "gpt-4o"
- "gpt-4o-mini"
anthropic:
priority: 4
enabled: true
models:
- "claude-3-5-sonnet-20241022"
openrouter:
priority: 5
enabled: true
# Local Ollama Configuration
local_models:
enabled: true
ollama_url: "http://localhost:11434"
default_model: "llama3.2"
timeout: 300
models:
- name: "llama3.2"
display_name: "Llama 3.2"
size: "2GB"
quant: "Q4_K_M"
context: 128000
max_tokens: 8192
- name: "deepseek-r1:14b"
display_name: "DeepSeek R1 14B"
size: "9GB"
quant: "Q4_K_M"
context: 128000
max_tokens: 8192
Environment Variables
Create .env file in project root:
# Google Gemini
GOOGLE_API_KEY=your-google-api-key

# OpenAI
OPENAI_API_KEY=sk-your-openai-key

# Anthropic
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# OpenRouter
OPENROUTER_API_KEY=sk-or-your-openrouter-key

# Local Ollama (optional, defaults to localhost:11434)
OLLAMA_URL=http://localhost:11434
Security Warning
Never commit API keys to version control. Add .env to your .gitignore file.
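If your process does not inherit these variables from the shell, a small stdlib-only loader can read the .env file at startup. This is a sketch of the general technique (libraries such as python-dotenv do this more robustly); the function name is ours:

```python
import os

def load_env(path: str = ".env") -> None:
    """Parse KEY=VALUE lines from a .env file into os.environ.

    Skips blank lines and comments; existing variables are not overwritten.
    """
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Call `load_env()` once at startup, then read keys as usual, e.g. `os.environ["GOOGLE_API_KEY"]`.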
Testing Your Setup
Test Local Model
curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}'
Test Cloud API
curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"
Verify in Dashboard
- Start SingularCore: python core/brain/main.py
- Open http://localhost:3000/models
- Check that configured models appear
- Test inference in chat interface
Troubleshooting
Ollama Connection Refused
- Ensure SingularCore has auto-started Ollama (check logs)
- Check port 11434 is not blocked
- Verify OLLAMA_URL in config
API Key Invalid
- Verify key is copied correctly (no extra spaces)
- Check key has not expired
- Ensure billing is enabled (for paid tiers)
Out of VRAM
- Use lower quantization (Q4 instead of Q8)
- Switch to cloud API for large models
- Close other GPU applications
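A rough rule of thumb for whether a model fits: the weights alone take about (parameters × bits-per-weight) / 8 bytes, before KV cache and activation overhead. This back-of-the-envelope helper is our own, not a SingularCore utility, and treats Q4/Q8 as exactly 4 and 8 bits per weight:

```python
def approx_weight_gb(params_b: float, bits: float) -> float:
    """Approximate VRAM for model weights alone (no KV cache or activations).

    params_b: parameter count in billions; bits: bits per weight.
    """
    return params_b * bits / 8  # billions of params * bytes per param = GB

# A 14B model: Q8 needs ~14 GB for weights; Q4 roughly halves that.
print(f"14B @ Q8: ~{approx_weight_gb(14, 8):.1f} GB")
print(f"14B @ Q4: ~{approx_weight_gb(14, 4):.1f} GB")
```

This is why dropping from Q8 to Q4 can make a model that overflows your GPU fit comfortably, at some cost in output quality.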
Next Steps
Now that your models are configured, explore the Model Hub to download and manage models.