Inference System

Oxyde’s inference engine transforms internal agent state into high-quality LLM prompts — and routes them intelligently across multiple providers. This is the heart of Oxyde’s autonomy.

🔧 Prompt Construction

Every prompt is built dynamically based on:

  • 💬 Current input (e.g. player message)

  • 🧠 Recalled memories (ranked by importance)

  • 😶‍🌫️ Emotional state (6D vector)

  • 🎯 Active goals

  • 🧍 Agent name/persona

Example (simplified):

Agent: Velma
Emotional state: curious, calm
Top memory: "Marcus gave me the map"
Current goal: explore ruins

Prompt: You are Velma, a curious NPC currently exploring ruins. Remember that Marcus gave you the map. The player just asked: “What’s down that tunnel?”
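
The assembly step amounts to a template fill over the agent's state. The sketch below is illustrative only; the type and field names (AgentState, top_memories, mood, and so on) are assumptions for this example, not Oxyde's actual internals.

// Illustrative only: type and field names are assumptions, not Oxyde's real types.
struct AgentState {
    name: String,              // agent name/persona
    mood: String,              // short label derived from the 6D emotional vector
    top_memories: Vec<String>, // memories already ranked by importance
    active_goal: String,       // current active goal
}

fn build_prompt(agent: &AgentState, player_input: &str) -> String {
    // The highest-ranked memory gets a slot in the prompt; others could be appended.
    let memory = agent
        .top_memories
        .first()
        .map(String::as_str)
        .unwrap_or("nothing notable");
    format!(
        "You are {}, a {} NPC currently {}. Remember that {}. The player just asked: \"{}\"",
        agent.name, agent.mood, agent.active_goal, memory, player_input
    )
}

With the Velma example above, build_prompt yields roughly the prompt shown.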

🔀 Multi-LLM Routing

Oxyde supports dynamic routing to:

| Provider     | Status      |
|--------------|-------------|
| OpenAI       | ✅ Supported |
| Groq         | ✅ Supported |
| Anthropic    | ✅ Supported |
| xAI          | ✅ Supported |
| Local (GGUF) | ✅ Supported |

Routing logic lives in llm_service.rs. It selects a provider based on the criteria below; a simplified selection sketch follows the list.

  • ⚡ Latency benchmarks

  • 💸 Cost settings

  • 🎯 Prompt type or agent profile

  • 🛠️ API key availability
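
A reduced version of that decision might look like the following. The names (ProviderKind, ProviderInfo, pick_provider) and the weighting are illustrative assumptions, not the actual llm_service.rs internals.

// Illustrative sketch of provider selection; names and weighting are assumptions,
// not the real llm_service.rs logic.
#[derive(Clone, Copy, PartialEq)]
enum ProviderKind { OpenAi, Groq, Anthropic, XAi, Local }

struct ProviderInfo {
    kind: ProviderKind,
    avg_latency_ms: u32,     // from latency benchmarks
    cost_per_1k_tokens: f64, // from cost settings
    has_api_key: bool,       // API key availability
}

fn pick_provider(
    candidates: &[ProviderInfo],
    max_cost_per_1k: f64,
    preferred: Option<ProviderKind>, // from the prompt type or agent profile
) -> Option<ProviderKind> {
    candidates
        .iter()
        .filter(|p| p.has_api_key && p.cost_per_1k_tokens <= max_cost_per_1k)
        // Prefer the profile's provider; break ties by lowest benchmarked latency.
        .min_by_key(|p| (if Some(p.kind) == preferred { 0 } else { 1 }, p.avg_latency_ms))
        .map(|p| p.kind)
}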


🧩 Plug-and-Play Model Interface

To add a new model:

  1. Implement the LLMProvider trait:

trait LLMProvider {
    // Takes the fully built prompt and returns the model's completion text.
    fn generate(&self, prompt: &str) -> String;
}

  2. Register it in the router dispatch map

Oxyde does not hardcode OpenAI logic — you can drop in anything.
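
As a rough sketch of both steps (the EchoProvider type and the shape of the dispatch map are assumptions for illustration, not Oxyde's actual registration code):

use std::collections::HashMap;

// Hypothetical provider used only to illustrate the trait; a real one would call an API.
struct EchoProvider;

impl LLMProvider for EchoProvider {
    fn generate(&self, prompt: &str) -> String {
        format!("echo: {prompt}")
    }
}

// Register the new provider under a name the router can dispatch on.
fn build_dispatch_map() -> HashMap<&'static str, Box<dyn LLMProvider>> {
    let mut providers: HashMap<&'static str, Box<dyn LLMProvider>> = HashMap::new();
    providers.insert("echo", Box::new(EchoProvider));
    providers
}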


🔁 Async Inference Pipeline

  • Batched, cancellable, and thread-safe

  • Uses Rust's tokio + reqwest (a minimal sketch follows this list)

  • Future-proofed for streaming output and live voice
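
A minimal sketch of a cancellable call with tokio and reqwest (the endpoint URL, payload shape, and timeout are placeholders, not Oxyde's actual pipeline; assumes reqwest's json feature and tokio's rt/macros features):

use std::time::Duration;

// Placeholder endpoint and payload; only the tokio + reqwest plumbing is the point here.
async fn call_llm(client: &reqwest::Client, prompt: &str) -> Result<String, reqwest::Error> {
    client
        .post("https://example.com/v1/generate")
        .json(&serde_json::json!({ "prompt": prompt }))
        .send()
        .await?
        .text()
        .await
}

#[tokio::main]
async fn main() {
    let client = reqwest::Client::new();
    // Wrapping the future in a timeout makes the call cancellable from the caller's side.
    match tokio::time::timeout(Duration::from_secs(5), call_llm(&client, "Hello")).await {
        Ok(Ok(reply)) => println!("{reply}"),
        Ok(Err(e)) => eprintln!("request failed: {e}"),
        Err(_) => eprintln!("inference cancelled after timeout"),
    }
}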


⚙️ Configuring Inference

Set routing preferences in config.json:

"llm_profile": {
  "preferred_provider": "groq",
  "fallbacks": ["openai", "local"]
}
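
If you load this profile yourself, a serde-based mirror of the block above would look roughly like this (the struct names and loader are assumptions, not Oxyde's exact config code):

use serde::Deserialize;

// Illustrative mirror of the "llm_profile" block above.
#[derive(Debug, Deserialize)]
struct LlmProfile {
    preferred_provider: String,
    fallbacks: Vec<String>,
}

#[derive(Debug, Deserialize)]
struct Config {
    llm_profile: LlmProfile,
}

fn load_config(path: &str) -> Result<Config, Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string(path)?;
    Ok(serde_json::from_str(&raw)?)
}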

📊 Benchmarks (placeholder)

| Provider | Latency (avg) | Token cost ($/1k) |
|----------|---------------|-------------------|
| Groq     | 35 ms         | $0.003            |
| OpenAI   | 250 ms        | $0.006            |
| Local    | ~100 ms       | Free              |

(Benchmarks depend on prompt length and model. Replace with real data if needed.)


See also:

  • 4. Configuring Agents → llm_profile

  • 8. API Reference → llm_service.rs