Quick Start

  1. Start candela-local

    candela-local

    This starts two listeners:

    • :8181 — Management UI + proxy
    • :1234 — OpenAI-compatible endpoint (works with JetBrains, Cline, and Continue)
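
    To route an existing tool through Candela, point its OpenAI-compatible client at the :1234 listener. As a minimal sketch, the official OpenAI SDKs read the OPENAI_BASE_URL environment variable; other clients expose an equivalent base-URL setting. The placeholder key is an assumption, for clients that insist on one:

    # Point OpenAI SDK-based tools at the local Candela endpoint
    export OPENAI_BASE_URL="http://localhost:1234/v1"
    # Placeholder key for clients that require one (assumption: Candela does not validate it)
    export OPENAI_API_KEY="local-placeholder"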
  2. Send a request through Candela

    Route LLM requests through candela-local instead of calling providers directly:

    curl http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama3.2:3b",
        "messages": [{"role": "user", "content": "Hello, Candela!"}]
      }'
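
    Since the endpoint follows the OpenAI chat-completions shape, streaming should work through the same route. A sketch, assuming the standard "stream": true flag is passed through to the provider (-N disables curl's output buffering):

    curl -N http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "llama3.2:3b",
        "stream": true,
        "messages": [{"role": "user", "content": "Stream a haiku."}]
      }'

    If streaming is supported, chunks arrive as server-sent events in the usual "data: ..." format.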
  3. View the trace

    Open http://localhost:8181/_local/ and check the Traces card:

    • ⏱️ Latency — End-to-end request duration and time to first byte (TTFB)
    • 📊 Token usage — Input and output token counts
    • 💰 Cost estimate — Based on provider pricing
    • 🔗 Trace context — W3C Trace Context propagation
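
    Because Candela speaks W3C Trace Context, a request can join an existing distributed trace via a traceparent header. A sketch, assuming inbound headers are honored; the IDs below are the W3C spec's example values:

    # traceparent = <version>-<trace-id>-<parent-id>-<flags>
    curl http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "traceparent: 00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" \
      -d '{"model": "llama3.2:3b", "messages": [{"role": "user", "content": "Hello, Candela!"}]}'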
  4. Discover models

    curl -s http://localhost:1234/v1/models | jq '.data[].id'

    Returns all available models — local (Ollama) and cloud (Gemini, Claude) merged into one list.
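
    The merged list is convenient for scripting. For example, a sketch that grabs the first listed model and sends it a request (assumes jq is installed and at least one model is available):

    # Grab the first model ID, then use it in a chat request
    MODEL=$(curl -s http://localhost:1234/v1/models | jq -r '.data[0].id')
    curl http://localhost:1234/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d "{\"model\": \"$MODEL\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello, Candela!\"}]}"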