candela-local
candela-local is a lightweight binary that runs on a developer’s machine. It provides:
- Unified model discovery — one endpoint for local and cloud models
- Smart routing — automatically sends requests to the right backend
- Runtime management — start/stop Ollama, pull models, manage state
- Local observability — capture every LLM call to SQLite with zero cloud dependencies
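For example, unified model discovery means one request lists everything. A minimal sketch, assuming the default LM-compat listener on :1234 (described below) and jq installed for readability:

```sh
# List every model candela-local exposes -- local and any configured cloud models -- in one place.
curl -s http://localhost:1234/v1/models | jq -r '.data[].id'
```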
Operating Modes
🏠 Solo Mode
For: Individual developers who want to run local models with full observability and zero cloud dependencies.
```yaml
# ~/.candela.yaml — Solo Mode
port: 8181
lm_studio_port: 1234
runtime_backend: ollama
```

What you get:
- Local models via Ollama/vLLM/LM Studio on :1234
- Embedded observability — every call traced to ~/.candela/traces.db
- Management UI at http://localhost:8181/_local/
- Model pulling, health monitoring, backend discovery
- No cloud account, no authentication, no remote server needed
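A minimal Solo Mode smoke test, assuming a small model such as llama3.2:3b has already been pulled into Ollama:

```sh
# Chat with a local model through the LM-compat endpoint.
# The request is served by Ollama and traced to ~/.candela/traces.db.
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2:3b",
    "messages": [{"role": "user", "content": "Say hello in one word."}]
  }'
```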
☁️ Solo + Cloud Mode
For: Individual developers who want local and cloud models (Gemini, Claude) without deploying a Candela server. Uses Google ADC — the same identity you already have.
```yaml
# ~/.candela.yaml — Solo + Cloud
runtime_backend: ollama
providers:
  - name: google
    models: [gemini-2.5-pro, gemini-2.0-flash]
  - name: anthropic
    models: [claude-sonnet-4-20250514, claude-3-haiku]
vertex_ai:
  project: my-gcp-project
  region: us-central1
```

Prerequisites:
```sh
gcloud auth application-default login
```

What you get: Everything from Solo Mode, plus cloud models merged into /v1/models, smart routing (local stays local, cloud routes to Vertex AI), and all calls traced to SQLite.
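Before routing to Vertex AI, you can confirm ADC resolves to a credential (a standard gcloud check, nothing Candela-specific):

```sh
# Prints a token only if Application Default Credentials are configured.
gcloud auth application-default print-access-token >/dev/null && echo "ADC OK"
```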
Architecture:
```
JetBrains / Cline / curl
          │
          ▼
LM Compat (:1234)
  /v1/models → local + cloud models
  /v1/chat/completions
          │
          ├── local model ──▶ Ollama / vLLM
          │                       │
          │                  spanCapture
          │                       │
          └── cloud model ──▶ pkg/proxy ──▶ Vertex AI (Google ADC)
                                  │
                                  ▼
                     SpanProcessor → SQLite (traces.db)
```
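Because traces land in a plain SQLite file, they can be inspected with the sqlite3 CLI. The schema is not documented here, so a safe first step is simply listing the tables:

```sh
# Peek at the local trace store before writing queries against it.
sqlite3 ~/.candela/traces.db '.tables'
```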
🌐 Team Mode
For: Teams that need budgeting, governance, and RBAC via a shared Candela cloud backend.
```yaml
# ~/.candela.yaml — Team Mode
port: 8181
lm_studio_port: 1234
runtime_backend: ollama
remote: https://candela-xxx.a.run.app
audience: "12345678.apps.googleusercontent.com"
```

What you get: Everything from Solo Mode, plus cloud models routed through the Candela server with automatic OIDC auth injection via ADC, team-wide cost tracking, and budget enforcement.
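From the client's point of view a Team Mode request looks the same as any other. The sketch below uses gpt-4o, which per the routing table further down goes through the Candela server; the OIDC token is attached by candela-local, not by your editor:

```sh
# Same OpenAI-compatible call as always; candela-local injects the OIDC token
# (via ADC) and forwards the request to the remote Candela server.
curl -s http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello from Team Mode!"}]
  }'
```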
Installation
```sh
go install github.com/candelahq/candela/cmd/candela-local@latest
```

Or run from source:

```sh
git clone https://github.com/candelahq/candela.git
cd candela
nix develop
go run ./cmd/candela-local
```
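A quick post-install check, assuming a ~/.candela.yaml is in place and the default port of 8181 (startup flags are not covered here):

```sh
# Start the proxy and confirm the management UI answers on the documented URL.
candela-local &
sleep 2
curl -sf http://localhost:8181/_local/ >/dev/null && echo "candela-local is up"
```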
Full Config Reference

```yaml
# ── Required ──
runtime_backend: ollama                  # ollama | vllm | lmstudio

# ── Optional: Network ──
port: 8181                               # main proxy port (default: 8181)
lm_studio_port: 1234                     # LM compat listener (default: 1234)

# ── Optional: Direct Cloud (Solo + Cloud) ──
providers:                               # omit for local-only solo mode
  - name: google
    models: [gemini-2.5-pro]
  - name: anthropic
    models: [claude-sonnet-4-20250514]
vertex_ai:
  project: my-gcp-project                # required when providers is set
  region: us-central1                    # default: us-central1

# ── Optional: Team Mode (omit for Solo) ──
remote: https://candela-xxx.run.app      # Candela server URL
audience: "12345678.apps..."             # IAP audience for OIDC auth

# ── Optional: Advanced ──
local_upstream: http://localhost:11434   # explicit local runtime URL
state_db_path: ~/.candela/state.db       # runtime state persistence
```

Smart Routing
| Request model | Mode | Where it runs |
|---|---|---|
| llama3.2:3b | Any | Local (Ollama) — always preferred |
| gemini-2.5-pro | Solo + Cloud | Vertex AI (direct, via ADC) |
| claude-sonnet-4-20250514 | Solo + Cloud | Vertex AI Anthropic |
| gpt-4o | Team | Cloud (via Candela server) |
Management UI
Access at http://localhost:8181/_local/:
| Card | Description |
|---|---|
| Health | Runtime status, start/stop controls, uptime |
| Models | Loaded models with size, family, quantization |
| Pull Model | Download new models with progress tracking |
| Traces | Recent LLM calls with tokens, cost, duration |
| Backends | Auto-detected runtimes with install hints |
| Settings | State DB path, reset |
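To jump straight to the UI from a terminal (standard OS openers, nothing Candela-specific):

```sh
open http://localhost:8181/_local/       # macOS
xdg-open http://localhost:8181/_local/   # Linux
```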
IDE Integration
In JetBrains AI Assistant:
- Settings → AI Assistant → Enable “LM Studio”
- URL is pre-configured to http://localhost:1234 — just works!
- Select any model from the dropdown (local + cloud)
{ "models": [{ "title": "Candela Local", "provider": "openai", "apiBase": "http://localhost:1234/v1", "model": "llama3.2:3b" }]}curl http://localhost:1234/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "gemini-2.5-pro", "messages": [{"role": "user", "content": "Hello!"}] }'Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| “model not found locally and no remote server configured” | Solo Mode + unknown model | Add providers for cloud models |
| “vertex_ai.project is required” | providers set but no project | Add vertex_ai.project to config |
| “failed to get Google ADC” | ADC not configured | Run gcloud auth application-default login |
| “audience is required when remote is set” | Missing audience | Add IAP audience to config |
| Traces card shows “Traces not available” | Team Mode | Expected — check cloud dashboard |
| No models in /v1/models | Runtime not started | Start Ollama: ollama serve |
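A small diagnostic sketch for the two most common failure modes above; the Ollama /api/tags endpoint (default port 11434) and the gcloud commands are standard, nothing here is Candela-specific:

```sh
# Is the local runtime up? Ollama lists its installed models at /api/tags.
curl -sf http://localhost:11434/api/tags >/dev/null \
  && echo "Ollama: OK" \
  || echo "Ollama: not running (try: ollama serve)"

# Is Google ADC configured? Required for Solo + Cloud and Team Mode.
gcloud auth application-default print-access-token >/dev/null 2>&1 \
  && echo "ADC: OK" \
  || echo "ADC: missing (run: gcloud auth application-default login)"
```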