
Budget Enforcement

Budget enforcement is Candela’s most mature governance capability. A real-time budget gate checks, deducts, and alerts on every LLM request, so that no user or automation exceeds its approved spending limit. This is not a dashboard metric; it is an active enforcement control that blocks requests at the proxy layer.

Every LLM request passes through a two-phase budget check:

Request arrives
┌─────────────┐   Budget exhausted?
│ Pre-flight  │────────────────────────▶ HTTP 402
│ Budget Gate │                          "budget exhausted"
└──────┬──────┘
       │ ✅ Allowed
┌─────────────┐
│  LLM Call   │  tokens counted,
│  (proxied)  │  cost calculated
└──────┬──────┘
┌─────────────┐   Threshold crossed?
│  Deduct &   │────────────────────────▶ 🔔 Alert
│   Notify    │                          (80%, 90%, 100%)
└─────────────┘

Key design decisions:

  • Pre-flight check is soft — it checks if any budget remains but doesn’t estimate cost (impossible before the call)
  • Deduction is synchronous — prevents billing bypass on crash
  • Grants are spent first (waterfall: earliest-expiring grant → daily budget)
  • Service accounts skip budget checks — they have no budget entries
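The design decisions above can be illustrated with a minimal Python sketch. It is not Candela's implementation: the `Account` class, the `preflight` and `handle_request` functions, and the `call_llm` callback are hypothetical names chosen for illustration, and returning 200 on success is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Account:
    """Hypothetical per-user budget state (not Candela's schema)."""
    remaining_usd: Decimal
    is_service_account: bool = False


def preflight(account: Account) -> int:
    """Soft check: is any budget left? No cost estimate is attempted."""
    if account.is_service_account:
        return 200  # service accounts have no budget entries to check
    return 200 if account.remaining_usd > 0 else 402


def handle_request(account: Account, call_llm) -> int:
    status = preflight(account)
    if status != 200:
        return status  # blocked before the LLM is called
    cost = call_llm()  # the cost is only known once the call completes
    if not account.is_service_account:
        # Deduct synchronously, before the response is returned.
        account.remaining_usd -= cost
    return 200
```

In this sketch a user with $0.10 left is still allowed a $0.50 call, because the pre-flight check only asks whether any budget remains; the next request then receives 402. Whether Candela lets the balance go negative in the same way is not stated in this document.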

Set a default daily budget for all new users in your server config:

# config.yaml (candela-server)
users:
  default_daily_budget_usd: 5.00 # applied to auto-provisioned users

Admins can manage budgets via the ConnectRPC UserService:

# Set a $10/day budget for a user
buf curl --protocol connect \
  https://candela.example.com/candela.v1.UserService/SetBudget \
  -d '{
    "user_id": "alice@example.com",
    "limit_usd": 10.0
  }'

Developers can check their own budget:

# Authenticated as the current user
buf curl --protocol connect \
  https://candela.example.com/candela.v1.UserService/GetMyBudget

The response reports the budget remaining across grants and the daily allocation.
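For scripts that cannot shell out to `buf`, the same call can be made over plain HTTP, since a Connect unary RPC is a POST with a JSON body. The sketch below only builds the request. The helper name `connect_request` is hypothetical, the host is the placeholder used in the examples here, and how credentials are passed is an assumption (shown as optional extra headers), because this page does not describe the authentication scheme or the response fields.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://candela.example.com"  # placeholder host from the examples


def connect_request(method: str, payload: Optional[dict] = None,
                    extra_headers: Optional[dict] = None) -> urllib.request.Request:
    """Build a Connect-protocol unary request: HTTP POST with a JSON body."""
    body = json.dumps(payload or {}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Connect-Protocol-Version": "1",
    }
    # Assumption: credentials, if required, travel as ordinary HTTP headers.
    headers.update(extra_headers or {})
    return urllib.request.Request(
        f"{BASE_URL}/candela.v1.UserService/{method}",
        data=body,
        headers=headers,
        method="POST",
    )


req = connect_request("GetMyBudget")
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```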


Grants are one-time budget bonuses with expiration dates. They’re consumed before the daily budget (waterfall order: earliest-expiring grant first).

buf curl --protocol connect \
  https://candela.example.com/candela.v1.UserService/CreateGrant \
  -d '{
    "user_id": "alice@example.com",
    "amount_usd": 50.0,
    "reason": "hackathon sprint",
    "starts_at": "2026-05-04T00:00:00Z",
    "expires_at": "2026-05-11T00:00:00Z"
  }'

When a $0.50 LLM call completes:

  1. Check active grants (sorted by expires_at ascending)
  2. Deduct from the earliest-expiring grant until the cost is covered or the grant is exhausted, then move on to the next grant
  3. Remaining cost → daily budget
  4. All updates are transactional

Cost: $0.50
├── Grant A ($0.30 remaining, expires May 5) → deduct $0.30
├── Grant B ($2.00 remaining, expires May 10) → deduct $0.20
└── Daily budget → $0.00 (grants covered it)
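The waterfall can be sketched in a few lines of Python. This is an illustration under assumptions, not Candela's code: the `Grant` class and `deduct` function are hypothetical names, the year 2026 is borrowed from the grant example, and the transactional wrapper and the filtering of not-yet-active or expired grants are left out.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Grant:
    name: str
    remaining_usd: Decimal
    expires_at: date


def deduct(cost: Decimal, grants: list[Grant]) -> tuple[dict[str, Decimal], Decimal]:
    """Spend grants earliest-expiring first.

    Returns (amount taken from each grant, amount left for the daily budget).
    """
    deductions: dict[str, Decimal] = {}
    left = cost
    for grant in sorted(grants, key=lambda g: g.expires_at):
        if left <= 0:
            break
        take = min(grant.remaining_usd, left)
        if take > 0:
            grant.remaining_usd -= take
            deductions[grant.name] = take
            left -= take
    return deductions, left


grants = [
    Grant("B", Decimal("2.00"), date(2026, 5, 10)),
    Grant("A", Decimal("0.30"), date(2026, 5, 5)),
]
per_grant, daily = deduct(Decimal("0.50"), grants)
print(per_grant, daily)
```

With the two grants from the example, the $0.50 cost takes $0.30 from Grant A and $0.20 from Grant B, and nothing is charged to the daily budget. `Decimal` is used instead of `float` so that small per-call amounts do not accumulate rounding error.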

Alerts fire when a user’s daily spend crosses configurable thresholds. Defaults:

| Threshold | When |
|-----------|------|
| 80% | Warning: approaching limit |
| 90% | Critical: nearly exhausted |
| 100% | Blocked: budget fully spent |

| Channel | Status | How |
|---------|--------|-----|
| Structured logs | ✅ Built-in | Cloud Logging → alert policy |
| Slack | 🔜 Planned | Webhook integration |
| Microsoft Teams | 🔜 Planned | Webhook integration |

Alerts are deduplicated — each threshold fires at most once per period per user.
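One way to get this once-per-period behavior is to remember which thresholds have already fired for a user and report only the new ones. The following Python sketch assumes that approach; `crossed_thresholds` and the `already_fired` set are hypothetical names, and the caller is assumed to clear the set when a new period starts.

```python
from decimal import Decimal

THRESHOLDS = (80, 90, 100)  # default percentages listed above


def crossed_thresholds(spent: Decimal, limit: Decimal, already_fired: set[int]) -> list[int]:
    """Return thresholds newly reached by `spent`, recording them so each fires once."""
    if limit <= 0:
        return []
    percent = spent * 100 / limit
    fired = [t for t in THRESHOLDS if percent >= t and t not in already_fired]
    already_fired.update(fired)
    return fired


fired_today: set[int] = set()
limit = Decimal("10.00")
for spent in ("7.90", "8.50", "8.70", "10.20"):
    print(spent, crossed_thresholds(Decimal(spent), limit, fired_today))
```

In this run the 80% alert fires at $8.50, stays silent at $8.70, and a single large call that jumps to $10.20 fires both 90% and 100% at once.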

The log-based notifier emits structured warnings that can trigger GCP alert policies:

jsonPayload.message = "🔔 budget alert: 90% threshold reached"

Per-user rate limiting prevents runaway automation from draining budgets:

| Setting | Default | Scope |
|---------|---------|-------|
| rate_limit | 60 req/min | Per user |

Rate limits use minute-window counters with a 2-minute TTL.
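A minute-window counter with a TTL can be sketched as follows. This is an in-memory stand-in written under assumptions: the class name `MinuteWindowLimiter` is hypothetical, and Candela presumably keeps its counters in a shared store whose own expiry mechanism replaces the manual cleanup shown here.

```python
import time
from typing import Optional


class MinuteWindowLimiter:
    """Fixed one-minute windows keyed by (user, minute); counters expire after ttl_s."""

    def __init__(self, limit_per_min: int = 60, ttl_s: int = 120):
        self.limit = limit_per_min
        self.ttl_s = ttl_s
        self._counters: dict[tuple[str, int], int] = {}

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        window = int(now // 60)
        # Drop counters whose window started more than ttl_s seconds ago
        # (stand-in for a store-side TTL).
        self._counters = {
            key: count
            for key, count in self._counters.items()
            if key[1] * 60 + self.ttl_s > now
        }
        key = (user_id, window)
        count = self._counters.get(key, 0)
        if count >= self.limit:
            return False  # over the per-minute limit
        self._counters[key] = count + 1
        return True
```

A TTL longer than the window means a counter is never discarded while its minute is still in progress, and stale counters clean themselves up without a separate sweep.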

buf curl --protocol connect \
  https://candela.example.com/candela.v1.UserService/UpdateUser \
  -d '{
    "id": "alice@example.com",
    "rate_limit": 120
  }'

Every admin action is logged to an immutable audit collection:

| Action | Logged |
|--------|--------|
| create_user | ✅ |
| set_budget | ✅ |
| reset_spend | ✅ |
| create_grant | ✅ |
| revoke_grant | ✅ |
| deactivate_user | ✅ |
| reactivate_user | ✅ |
| delete_user | ✅ (global collection; survives deletion) |

buf curl --protocol connect \
  https://candela.example.com/candela.v1.UserService/ListAuditLog \
  -d '{"user_id": "alice@example.com", "limit": 20}'