Operations Runbook

Day-to-day operations guide for running Candela in production on Google Cloud.

# Local
curl http://localhost:8181/healthz
# Production (requires auth)
curl -H "Authorization: Bearer $(gcloud auth print-identity-token)" \
  https://candela-xxx.a.run.app/healthz

Response when healthy:

{"status": "ok"}

Response when unhealthy:

{"status": "error", "detail": "..."}
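For scripted checks, such as a post-deploy gate, the response body can be tested rather than read by eye. The following is a minimal sketch, assuming the flat JSON shape shown above. The function names `health_status`, `is_healthy`, and `wait_healthy` are illustrative and not part of Candela, and `wait_healthy` sends no auth header, so as written it only suits the local endpoint.

```shell
# Print the value of the "status" field from a /healthz response read on stdin.
# Assumes the flat JSON shape shown above; use a real JSON parser for anything richer.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Succeed only when the response on stdin reports "ok".
is_healthy() {
  [ "$(health_status)" = "ok" ]
}

# Poll a health URL until it reports "ok" or the attempts run out.
# Arguments: URL, attempts (default 10), delay in seconds (default 3).
# The body runs in a subshell so its variables do not leak into the caller.
wait_healthy() (
  url="$1"; attempts="${2:-10}"; delay="${3:-3}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if curl -fsS "$url" 2>/dev/null | is_healthy; then
      exit 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  exit 1
)
```

Example use: `wait_healthy http://localhost:8181/healthz 5 2 || echo "Candela is not healthy"`.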

| Metric | Source | Alert Threshold |
|---|---|---|
| Request latency (p99) | Cloud Run metrics | > 5s |
| Error rate (5xx) | Cloud Run metrics | > 5% |
| Container startup time | Cloud Run metrics | > 30s |
| BigQuery write errors | Application logs | Any |
| Auth failures | "all auth strategies failed" | > 10/min |
| Circuit breaker trips | "circuit breaker tripped" | Any |
| Budget thresholds | "🔔 budget alert" | At 80%, 90%, 100% |
| Span buffer full | "span processor buffer full" | Any |
| Tetragon audit stream | "tetragon audit stream" | Disconnected |
| gRPC audit sink errors | "audit sink write failed" | Any |

# Budget threshold alert
gcloud logging metrics create candela-budget-alert \
  --description="Candela budget threshold reached" \
  --log-filter='resource.type="cloud_run_revision"
    AND textPayload=~"budget alert"'

# Circuit breaker alert
gcloud logging metrics create candela-circuit-breaker \
  --description="Candela circuit breaker tripped" \
  --log-filter='resource.type="cloud_run_revision"
    AND textPayload=~"circuit breaker tripped"'

Candela uses slog with JSON output:

| Field | Description |
|---|---|
| provider | LLM provider name |
| model | Model name |
| tokens | Total token count |
| cost_usd | Calculated cost |
| latency | Request duration |
| user_id | Authenticated user |
| request_id | Unique request ID |

PROJECT=your-gcp-project
REGION=us-central1
# Build and push
gcloud builds submit --project $PROJECT --config deploy/cloudbuild.yaml .
# Deploy
gcloud run services update candela \
  --project $PROJECT --region $REGION \
  --image $REGION-docker.pkg.dev/$PROJECT/candela/candela-server:latest

# List revisions
gcloud run revisions list --project $PROJECT --region $REGION --service candela
# Route 100% traffic to a previous revision
gcloud run services update-traffic candela \
  --project $PROJECT --region $REGION \
  --to-revisions=candela-00042-abc=100

-- Total cost by user, last 7 days
SELECT
  user_id,
  SUM(gen_ai_cost_usd) AS total_cost,
  COUNT(*) AS call_count,
  SUM(gen_ai_total_tokens) AS total_tokens
FROM `candela.spans`
WHERE start_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY user_id
ORDER BY total_cost DESC

| Optimization | Impact | Status |
|---|---|---|
| Time partitioning (start_time, DAY) | ~70% scan cost reduction | ✅ Configured |
| Clustering (project_id, trace_id) | ~50% for filtered queries | ✅ Configured |
| Partition expiration | Storage savings | Set in Terraform |
| BI Engine reservation | Sub-second dashboards | Enable in BQ console |

  1. Check Cloud Run logs: gcloud run services logs read candela --project $PROJECT --region $REGION
  2. Common causes:
    • Missing env vars → check entrypoint.sh substitution
    • Firestore connection failed → check project ID and IAM
    • BigQuery auth failed → check service account roles

  1. Filter logs by provider to identify the slow upstream
  2. Check circuit breaker state in logs
  3. Check BigQuery slot usage (if using BQ as reader)
  4. Check Cloud Run instance count (may need min-instances > 0)

  1. Check the Firestore budgets/{userId} document
  2. Verify period_start is in the current period
  3. Inspect the grants/ subcollection for grant absorption
  4. Search logs for "failed to deduct spend"

  1. Check upstream provider status (OpenAI, Vertex AI, Anthropic)
  2. Look for "circuit breaker tripped" logs
  3. Check ADC token refresh: search logs for "failed to get ADC token"
  4. Verify vertex_ai.project_id and region in config

  1. Verify Tetragon is running: kubectl get pods -n kube-system -l app.kubernetes.io/name=tetragon
  2. Check gRPC audit stream connection: search logs for "tetragon audit stream"
  3. Inspect MultiSink routing: each audit event should fan out to all configured sinks
  4. If events are missing, check CloseSend() / graceful shutdown logs for premature stream termination
  5. Verify TracingPolicy is applied: kubectl get tracingpolicies
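
Several of the steps above come down to searching logs for known messages. As a first-pass triage aid, the sketch below counts how many log lines contain each message string quoted in this runbook. The function name `triage_logs` is hypothetical, and it treats its input as plain text, so it works on any log dump piped or redirected to it.

```shell
# Count log lines containing each message string quoted in this runbook.
# Reads log text on stdin; prints "<count><TAB><message>" for every signature.
# The body runs in a subshell so its variables do not leak into the caller.
triage_logs() (
  input="$(cat)"
  for sig in \
    "all auth strategies failed" \
    "circuit breaker tripped" \
    "span processor buffer full" \
    "audit sink write failed" \
    "failed to deduct spend" \
    "failed to get ADC token"
  do
    # grep -c exits non-zero when the count is 0, so its exit status is ignored.
    count="$(printf '%s\n' "$input" | grep -cF -- "$sig" || true)"
    printf '%s\t%s\n' "$count" "$sig"
  done
)
```

Example use against a saved log file: `triage_logs < candela.log`. A non-zero count only narrows down which section of this runbook to follow; it does not replace reading the surrounding log lines.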

Updating Model Pricing & Adding New Models

When you want to add new models (like Gemini 3.5 Flash) or update built-in model pricing, you have two options:

Option A: Update Code Defaults (Requires Build & Redeploy)

This is the recommended approach for adding new models long-term so that the proxy ships with correct built-in default rates.

  1. Modify Defaults: Update the list of models in pkg/costcalc/calculator.go within loadDefaults().
  2. Write Tests: Add test cases checking the pricing calculation logic in pkg/costcalc/calculator_test.go.
  3. Run Tests: Verify correctness locally:

    go test ./pkg/costcalc -v

  4. Build and Redeploy: Run the build pipeline and redeploy to Cloud Run:

    # Build the container image
    gcloud builds submit --project $PROJECT --config deploy/cloudbuild.yaml .
    # Redeploy the Cloud Run service to apply the update
    gcloud run services update candela \
      --project $PROJECT --region $REGION \
      --image $REGION-docker.pkg.dev/$PROJECT/candela/candela-server:latest
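
When writing the test cases in step 2, it helps to compute expected costs independently of the Go code. The sketch below assumes cost is linear in the per-million rates, that is, cost = (input tokens × input rate + output tokens × output rate) / 1,000,000. The `estimate_cost` helper is illustrative and not part of Candela, whose calculator may apply further rules not covered here.

```shell
# Estimate the USD cost of one call from token counts and per-million-token rates.
# Arguments: input_tokens output_tokens input_per_million output_per_million
# Assumption: linear per-token pricing with no discounts or tiers.
estimate_cost() {
  awk -v it="$1" -v ot="$2" -v ir="$3" -v orate="$4" \
    'BEGIN { printf "%.6f\n", (it * ir + ot * orate) / 1000000 }'
}

# 1,000,000 input tokens at $0.50 and 500,000 output tokens at $3.00 per million:
# 0.50 + 1.50 = 2.00
estimate_cost 1000000 500000 0.50 3.00   # prints 2.000000
```

The printed value can then be used as the expected result in a table-driven test case for the same token counts and rates.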

Option B: Configure Runtime Overrides (No Code Changes Required)

You can override model pricing or add temporary support for a new model without rebuilding/redeploying code by modifying your active configuration:

  1. Config File (config.yaml): Add per-model overrides under the pricing.models block:

    pricing:
      models:
        - provider: google
          model: gemini-3.5-flash
          input_per_million: 0.40   # Negotiated rate (list: $0.50)
          output_per_million: 2.40  # Negotiated rate (list: $3.00)

    Note: If you update config.yaml for a deployed service, redeploy or restart the Cloud Run service to load the new config.

  2. Runtime Configuration Endpoint: Update configuration parameters and pricing overrides at runtime, without restarting the service:

    curl -X POST http://localhost:8181/_local/api/config \
      -H "Content-Type: application/json" \
      -d '{"pricing": {"models": [{"provider": "google", "model": "gemini-3.5-flash", "input_per_million": 0.40, "output_per_million": 2.40}]}}'
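
Hand-editing a one-line JSON payload is error-prone. As a sketch, a small helper can build the override body from its four values. `pricing_override_payload` is a hypothetical name, and the JSON shape simply mirrors the request above.

```shell
# Build the JSON body for a single-model pricing override.
# Arguments: provider model input_per_million output_per_million
# Values are inserted as-is, so provider and model must not contain
# double quotes or backslashes, and the rates must be plain numbers.
pricing_override_payload() {
  printf '{"pricing": {"models": [{"provider": "%s", "model": "%s", "input_per_million": %s, "output_per_million": %s}]}}\n' \
    "$1" "$2" "$3" "$4"
}

# Example use with the endpoint shown above:
#   curl -X POST http://localhost:8181/_local/api/config \
#     -H "Content-Type: application/json" \
#     -d "$(pricing_override_payload google gemini-3.5-flash 0.40 2.40)"
```

Building the body in one place keeps the rates used in the request consistent with whatever is recorded in config.yaml.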

All backends auto-provision their schema on startup:

| Backend | Strategy | Notes |
|---|---|---|
| DuckDB | Auto CREATE TABLE | No manual migrations |
| SQLite | Auto CREATE TABLE | No manual migrations |
| BigQuery | Auto schema update | Column additions are backward-compatible |
| Firestore | Schema-less | Field additions are backward-compatible |