Quick Start
1. Start candela-local

   ```sh
   candela-local
   ```

   This starts two listeners:

   - `:8181` — Management UI + proxy
   - `:1234` — OpenAI-compatible endpoint (works with JetBrains, Cline, and Continue)
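   To confirm both listeners are up, here is a minimal sketch in Python, assuming the default ports and the `/_local/` and `/v1/models` paths that appear later in this guide:

   ```python
   import urllib.request

   # Probe both default listeners; each should answer with HTTP 200.
   for url in (
       "http://localhost:8181/_local/",    # management UI + proxy
       "http://localhost:1234/v1/models",  # OpenAI-compatible endpoint
   ):
       with urllib.request.urlopen(url, timeout=5) as resp:
           print(url, "->", resp.status)
   ```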
2. Send a request through Candela

   Route LLM requests through candela-local instead of calling providers directly:

   curl:

   ```sh
   curl http://localhost:1234/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
       "model": "llama3.2:3b",
       "messages": [{"role": "user", "content": "Hello, Candela!"}]
     }'
   ```

   Python (OpenAI SDK):

   ```python
   from openai import OpenAI

   client = OpenAI(
       base_url="http://localhost:1234/v1",
       api_key="candela",  # placeholder — candela-local handles auth
   )

   response = client.chat.completions.create(
       model="llama3.2:3b",
       messages=[{"role": "user", "content": "Hello, Candela!"}],
   )
   print(response.choices[0].message.content)
   ```

   Python (Google ADK):

   ```python
   from google.adk.agents import Agent
   from google.adk.models import Gemini

   agent = Agent(
       model=Gemini(
           model="gemini-2.0-flash",
           base_url="http://localhost:8181/proxy/google",
       ),
       name="my_agent",
       instruction="You are a helpful assistant.",
   )
   ```

   Go (openai-go):

   ```go
   client := openai.NewClient(
       option.WithBaseURL("http://localhost:1234/v1"),
       option.WithAPIKey("candela"),
   )
   ```
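   Streaming goes through the same endpoint. A minimal sketch with the OpenAI SDK, assuming the same `llama3.2:3b` model as above:

   ```python
   from openai import OpenAI

   client = OpenAI(base_url="http://localhost:1234/v1", api_key="candela")

   # Request a streamed completion and print tokens as they arrive.
   stream = client.chat.completions.create(
       model="llama3.2:3b",
       messages=[{"role": "user", "content": "Hello, Candela!"}],
       stream=True,
   )
   for chunk in stream:
       # Some chunks carry no content delta (e.g. the final one), so guard first.
       if chunk.choices and chunk.choices[0].delta.content:
           print(chunk.choices[0].delta.content, end="", flush=True)
   print()
   ```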
3. View the trace

   Open http://localhost:8181/_local/ and check the Traces card:

   - ⏱️ Latency — End-to-end request duration and TTFB
   - 📊 Token usage — Input and output token counts
   - 💰 Cost estimate — Based on provider pricing
   - 🔗 Trace context — W3C Trace Context propagation (see the sketch below)
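   If your application already creates spans, you can pass a `traceparent` header so the proxied request joins your trace. A hedged sketch, assuming candela-local honors incoming W3C Trace Context headers; the trace ID below is an arbitrary example:

   ```python
   from openai import OpenAI

   client = OpenAI(base_url="http://localhost:1234/v1", api_key="candela")

   response = client.chat.completions.create(
       model="llama3.2:3b",
       messages=[{"role": "user", "content": "Hello, Candela!"}],
       # W3C Trace Context format: version-trace_id-parent_span_id-flags
       extra_headers={
           "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
       },
   )
   print(response.choices[0].message.content)
   ```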
4. Discover models

   ```sh
   curl http://localhost:1234/v1/models | jq '.data[].id'
   ```

   Returns all available models — local (Ollama) and cloud (Gemini, Claude) — merged into one list.
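   The same discovery works through the OpenAI SDK; a short sketch, assuming the default endpoint from the steps above:

   ```python
   from openai import OpenAI

   client = OpenAI(base_url="http://localhost:1234/v1", api_key="candela")

   # The list mixes local (Ollama) and cloud (Gemini, Claude) model IDs.
   for model in client.models.list():
       print(model.id)
   ```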