
Model Routing

Intelligent routing between local LLMs and cloud APIs

Overview

Model Routing intelligently directs inference requests to the optimal model based on task complexity, hardware availability, and cost efficiency. It seamlessly switches between local Ollama models and cloud APIs.

Routing Strategy

  • Strategy: Complexity-Based
  • Local Fallback: Always Available
  • Cloud Providers: 4 Supported
  • Auto-Scaling: Dynamic

Routing Architecture

  1. Task Analysis: classify task complexity
  2. Hardware Check: determine VRAM availability
  3. Router Decision: select the optimal model
  4. Model Execution: run locally or in the cloud
  5. Response: return the result
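The flow above can be sketched as a minimal Python pipeline. The function names and heuristics here are illustrative only; SingularCore's actual router internals may differ:

```python
from dataclasses import dataclass

@dataclass
class Decision:
    provider: str   # "local" or "cloud"
    model: str

def classify_complexity(prompt: str) -> float:
    """Toy heuristic: longer, multi-step prompts score higher (0.0-1.0)."""
    score = min(len(prompt) / 2000, 1.0)
    if any(k in prompt.lower() for k in ("step by step", "analyze", "prove")):
        score = max(score, 0.8)
    return score

def free_vram_gb() -> float:
    """Stub: a real router would query the GPU (e.g. via nvidia-smi)."""
    return 8.0

def route(prompt: str) -> Decision:
    score = classify_complexity(prompt)   # 1. Task Analysis
    vram = free_vram_gb()                 # 2. Hardware Check
    if score < 0.6 and vram >= 4.0:       # 3. Router Decision
        return Decision("local", "llama3.2")
    return Decision("cloud", "gemini-2.5-flash")

print(route("What is 2 + 2?"))  # routes to the local model
```

Simple prompts stay on the cheap local path; anything scoring above the threshold, or arriving when VRAM is scarce, falls through to a cloud model.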

Complexity Classification

  Level      Routing            Example Tasks
  Trivial    Local Only         Simple Q&A
  Low        Local Preferred    Basic tasks
  Medium     Balanced           Code review
  High       Cloud Preferred    Complex analysis
  Critical   Cloud Only         Multi-step reasoning
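The tiers map onto a 0.0-1.0 complexity score. A minimal sketch of that mapping, using the threshold values from config/models.yaml (the actual scoring logic is internal to SingularCore):

```python
def tier(score: float) -> str:
    """Map a 0.0-1.0 complexity score to a routing tier,
    using the thresholds from config/models.yaml."""
    thresholds = [
        (0.2, "trivial"),   # local only
        (0.4, "low"),       # local preferred
        (0.6, "medium"),    # balanced
        (0.8, "high"),      # cloud preferred
    ]
    for limit, name in thresholds:
        if score <= limit:
            return name
    return "critical"       # cloud only

assert tier(0.1) == "trivial"
assert tier(0.5) == "medium"
assert tier(0.95) == "critical"
```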

Local LLM Setup (Ollama)

Auto-Managed

Ollama is automatically installed and managed by SingularCore. No manual setup required. Models are downloaded on-demand through the Model Hub interface.

1. Configure Local Models

Edit config/models.yaml:

local_models:
  enabled: true
  ollama_url: "http://localhost:11434"
  default_model: "llama3.2"
  auto_manage: true  # Ollama is auto-installed
  
  models:
    - name: "llama3.2"
      display_name: "Llama 3.2"
      size: "2GB"
      quant: "Q4_K_M"
      context: 128000
      
    - name: "deepseek-r1:14b"
      display_name: "DeepSeek R1 14B"
      size: "9GB"
      quant: "Q4_K_M"
      context: 128000

2. Download Models via Model Hub

  1. Navigate to /models in the dashboard
  2. Go to the Local Models tab
  3. Search for models like "llama3.2" or "deepseek-r1"
  4. Click Pull Model to download
  5. Monitor download progress in real-time

3. Verify Models

Check available models in the dashboard:

  1. Open Model Hub at /models
  2. View Installed Models section
  3. Verify models show as "Available"
  4. Test in chat interface
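Installed models can also be checked programmatically: Ollama exposes GET /api/tags, which returns the installed models as JSON. A small sketch that extracts the model names from such a response, shown here on a sample payload rather than a live call:

```python
import json

def installed_models(tags_json: str) -> list[str]:
    """Extract model names from an Ollama GET /api/tags response body."""
    return [m["name"] for m in json.loads(tags_json)["models"]]

# Sample payload in the shape /api/tags returns
sample = '{"models": [{"name": "llama3.2:latest"}, {"name": "deepseek-r1:14b"}]}'
print(installed_models(sample))  # ['llama3.2:latest', 'deepseek-r1:14b']
```

Against a running instance, the same data is available with `curl http://localhost:11434/api/tags`.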

Cloud API Setup

Google Gemini

  1. Get an API key from Google AI Studio
  2. Set the environment variable:

export GOOGLE_API_KEY="your-api-key-here"

OpenAI

  1. Get an API key from the OpenAI Dashboard
  2. Set the environment variable:

export OPENAI_API_KEY="sk-..."

Anthropic Claude

  1. Get an API key from the Anthropic Console
  2. Set the environment variable:

export ANTHROPIC_API_KEY="sk-ant-..."

OpenRouter

  1. Get an API key from OpenRouter
  2. Set the environment variable:

export OPENROUTER_API_KEY="sk-or-..."

Complete Configuration

Create or edit config/models.yaml:

# Model Routing Configuration
model_routing:
  enabled: true
  strategy: "complexity_based"
  fallback_to_local: true
  
  # Complexity thresholds
  thresholds:
    trivial: 0.2    # Local only
    low: 0.4        # Local preferred
    medium: 0.6     # Balanced
    high: 0.8       # Cloud preferred
    critical: 1.0   # Cloud only
  
  # Provider priority
  providers:
    local:
      priority: 1
      enabled: true
    google:
      priority: 2
      enabled: true
      models:
        - "gemini-2.5-flash"
        - "gemini-2.5-pro"
    openai:
      priority: 3
      enabled: true
      models:
        - "gpt-4o"
        - "gpt-4o-mini"
    anthropic:
      priority: 4
      enabled: true
      models:
        - "claude-3-5-sonnet-20241022"
    openrouter:
      priority: 5
      enabled: true

# Local Ollama Configuration
local_models:
  enabled: true
  ollama_url: "http://localhost:11434"
  default_model: "llama3.2"
  timeout: 300
  
  models:
    - name: "llama3.2"
      display_name: "Llama 3.2"
      size: "2GB"
      quant: "Q4_K_M"
      context: 128000
      max_tokens: 8192
      
    - name: "deepseek-r1:14b"
      display_name: "DeepSeek R1 14B"
      size: "9GB"
      quant: "Q4_K_M"
      context: 128000
      max_tokens: 8192
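To illustrate how the provider priorities resolve, here is a sketch that picks the first enabled provider from a config shaped like the one above. For brevity it uses an inline dict; a real loader would parse config/models.yaml with yaml.safe_load:

```python
def pick_provider(routing: dict, prefer_cloud: bool) -> str:
    """Return the highest-priority enabled provider,
    skipping 'local' when the task prefers the cloud."""
    candidates = [
        (cfg["priority"], name)
        for name, cfg in routing["providers"].items()
        if cfg["enabled"] and not (prefer_cloud and name == "local")
    ]
    return min(candidates)[1]  # lowest priority number wins

# Mirrors the model_routing.providers block above
routing = {"providers": {
    "local":     {"priority": 1, "enabled": True},
    "google":    {"priority": 2, "enabled": True},
    "openai":    {"priority": 3, "enabled": True},
    "anthropic": {"priority": 4, "enabled": True},
}}
print(pick_provider(routing, prefer_cloud=False))  # local
print(pick_provider(routing, prefer_cloud=True))   # google
```

Low-complexity tasks resolve to the local provider (priority 1); when the complexity tier prefers the cloud, the router falls through to the next enabled provider in priority order.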

Environment Variables

Create .env file in project root:

# Google Gemini
GOOGLE_API_KEY=your-google-api-key

# OpenAI
OPENAI_API_KEY=sk-your-openai-key

# Anthropic
ANTHROPIC_API_KEY=sk-ant-your-anthropic-key

# OpenRouter
OPENROUTER_API_KEY=sk-or-your-openrouter-key

# Local Ollama (optional, defaults to localhost:11434)
OLLAMA_URL=http://localhost:11434
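A quick way to see which cloud providers are usable is to check which keys are set. This is a small sketch (not a SingularCore API); the key names match the .env file above:

```python
import os

PROVIDER_KEYS = {
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
}

def configured_providers(env=os.environ) -> list[str]:
    """List providers whose API key is set and non-empty."""
    return [p for p, key in PROVIDER_KEYS.items() if env.get(key)]

print(configured_providers({"OPENAI_API_KEY": "sk-test"}))  # ['openai']
```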

Security Warning

Never commit API keys to version control. Add .env to your .gitignore file.

Testing Your Setup

Test Local Model

curl http://localhost:11434/api/generate -d '{"model":"llama3.2","prompt":"Hello"}'

Test Cloud API

curl "https://generativelanguage.googleapis.com/v1beta/models?key=$GOOGLE_API_KEY"

Verify in Dashboard

  1. Start SingularCore: python core/brain/main.py
  2. Open http://localhost:3000/models
  3. Check that configured models appear
  4. Test inference in chat interface

Troubleshooting

Ollama Connection Refused

  • Ensure SingularCore has auto-started Ollama (check logs)
  • Check port 11434 is not blocked
  • Verify OLLAMA_URL in config

API Key Invalid

  • Verify key is copied correctly (no extra spaces)
  • Check key has not expired
  • Ensure billing is enabled (for paid tiers)

Out of VRAM

  • Use lower quantization (Q4 instead of Q8)
  • Switch to cloud API for large models
  • Close other GPU applications

Next Steps

Now that your models are configured, explore the Model Hub to download and manage models.