Errors & limits

Error responses use an OpenAI-style JSON envelope:

```json
{
  "error": {
    "message": "Human-readable description",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
```
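Client code should read the three envelope fields defensively. Below is a minimal Python sketch that assumes the response body is available as a string. `ApiError` and `parse_error` are hypothetical names and are not part of any Steinkauz SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    status: int
    message: str
    error_type: Optional[str]  # the envelope's "type" field
    code: Optional[str]


def parse_error(status: int, body: str) -> ApiError:
    """Extract message, type, and code from an OpenAI-style error envelope.

    Falls back to the raw body as the message when it is not JSON or has no
    "error" object (e.g. an HTML page returned by a proxy).
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return ApiError(status, body, None, None)
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        return ApiError(status, body, None, None)
    return ApiError(
        status=status,
        message=str(err.get("message", "")),
        error_type=err.get("type"),
        code=err.get("code"),
    )


err = parse_error(401, '{"error": {"message": "Human-readable description", '
                       '"type": "invalid_request_error", "code": "invalid_api_key"}}')
print(err.code)  # invalid_api_key
```

Branch on `code` rather than `message`. The message text is meant for humans and may change.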

HTTP status matrix

| Status | `code` | When |
|---|---|---|
| `400` | `invalid_request_error` | Invalid JSON or unsupported parameters |
| `400` | `model_not_found` | Unknown or disabled model ID |
| `401` | `invalid_api_key` | Missing, malformed, or revoked API key |
| `402` | `insufficient_quota` | Inference budget exhausted for the billing period (Gateway plans) |
| `403` | `account_disabled` | Account is not permitted to use inference |
| `403` | `insufficient_permissions` | Plan or feature not enabled |
| `403` | `insufficient_security_classification` | The model's provider is classified below the security level required by the API key |
| `429` | `rate_limit_exceeded` | Per-minute request limit or concurrent streaming cap exceeded |
| `500` | `internal_error` | Unexpected server error |
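A client can act on the matrix by mapping each status to its own exception type. Where one status covers several codes, as with `403`, the client can then branch on `code`. Below is a minimal Python sketch. All class names are hypothetical, and mapping unlisted `5xx` statuses to `InternalServerError` is an assumption.

```python
from typing import Dict, Optional, Type


class SteinkauzError(Exception):
    """Base exception carrying the fields of the error envelope."""

    def __init__(self, status: int, code: Optional[str], message: str) -> None:
        super().__init__(f"HTTP {status} ({code}): {message}")
        self.status = status
        self.code = code
        self.message = message


class BadRequestError(SteinkauzError):         # 400: invalid_request_error, model_not_found
    pass


class AuthenticationError(SteinkauzError):     # 401: invalid_api_key
    pass


class InsufficientQuotaError(SteinkauzError):  # 402: insufficient_quota
    pass


class PermissionDeniedError(SteinkauzError):   # 403: three codes, inspect .code
    pass


class RateLimitError(SteinkauzError):          # 429: rate_limit_exceeded
    pass


class InternalServerError(SteinkauzError):     # 500: internal_error
    pass


_BY_STATUS: Dict[int, Type[SteinkauzError]] = {
    400: BadRequestError,
    401: AuthenticationError,
    402: InsufficientQuotaError,
    403: PermissionDeniedError,
    429: RateLimitError,
    500: InternalServerError,
}


def error_for(status: int, code: Optional[str], message: str) -> SteinkauzError:
    """Build the exception for a status; other statuses fall back as noted below."""
    cls = _BY_STATUS.get(status)
    if cls is None:
        # Assumption: treat any unlisted 5xx like 500, anything else as the base type.
        cls = InternalServerError if status >= 500 else SteinkauzError
    return cls(status, code, message)
```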

Rate limits

Per-account limits apply to keep the service stable for all users:

- Requests per minute: the maximum number of completion and model-list requests in a rolling one-minute window.
- Concurrent streaming: the maximum number of streaming completions in flight at once.
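You can also respect the rolling-window limit on the client side, so that fewer requests hit `429` in the first place. Below is a minimal Python sketch of a sliding-window counter. This page does not state the actual limits, so `max_requests` is a placeholder to set from your plan. The class name is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` sends in any rolling `window` seconds."""

    def __init__(self, max_requests: int, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent sends, oldest first

    def wait_time(self) -> float:
        """Seconds until the next request fits in the window (0.0 if it fits now)."""
        now = self.clock()
        # Drop timestamps that have left the rolling window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        # Full: wait until the oldest send ages out of the window.
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Call once for every request actually sent."""
        self._sent.append(self.clock())
```

Before each completion or model-list request, call `wait_time()`. If the result is positive, sleep for that long, then call `record()`. You can enforce the concurrent-streaming cap separately, for example with an `asyncio.Semaphore` sized to your plan's limit.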

When either limit is exceeded, the API returns `429` with code `rate_limit_exceeded`. Wait briefly before retrying, and honor the `Retry-After` header when the response includes it.
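HTTP allows `Retry-After` to carry either a number of seconds or an HTTP date (RFC 9110). Below is a minimal Python sketch that turns either form into a delay. The function name is hypothetical. It returns `None` when the header is absent or unparseable, which tells the caller to fall back to its own backoff.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After value (delta-seconds or HTTP-date) to seconds."""
    if value is None:
        return None
    value = value.strip()
    if re.fullmatch(r"[0-9]+", value):  # delta-seconds form, e.g. "30"
        return float(value)
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):  # older Pythons raise TypeError on bad input
        return None
    if when.tzinfo is None:  # treat a date without a zone as UTC
        when = when.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    # A date in the past means "retry now", never a negative delay.
    return max(0.0, (when - now).total_seconds())
```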

Budget (402)

On Gateway plans, API Access and the web chat share one inference budget. When the remaining budget reaches zero, completion requests return `402` with code `insufficient_quota`.

On BYOK plans, Steinkauz does not enforce an inference budget. You may still receive errors from your upstream providers if their quotas or billing limits are reached.

CORS

API Access is intended for server-to-server use. Do not expose API keys in front-end code or call the API directly from a browser.

Retries

- Retry `429` with exponential backoff, and honor `Retry-After` when present.
- Do not retry `401`, `402`, or `403`. Fix the credentials, budget, or configuration first.
- `500` may be retried, but sparingly.
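The retry rules above can be combined into one loop. Below is a minimal Python sketch. It assumes a hypothetical `send()` callable that performs a single request and returns `(status, headers)`. The retry counts, base delay, and cap are illustrative defaults, not documented values. For brevity, the sketch only understands the delta-seconds form of `Retry-After`.

```python
import random
import re
import time
from typing import Callable, Dict, Mapping, Optional, Tuple

# Hypothetical response shape: (status, headers); the body is omitted here.
Response = Tuple[int, Mapping[str, str]]


def _retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Read Retry-After as delta-seconds (case-insensitive); HTTP-dates are ignored."""
    for name, value in headers.items():
        if name.lower() == "retry-after" and re.fullmatch(r"[0-9]+", value.strip()):
            return float(value.strip())
    return None


def call_with_retries(
    send: Callable[[], Response],
    max_rate_limit_retries: int = 5,
    max_server_retries: int = 1,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry 429 with backoff and 5xx sparingly; return the final response."""
    retries: Dict[str, int] = {"rate": 0, "server": 0}
    while True:
        status, headers = send()
        if status == 429:
            kind, limit = "rate", max_rate_limit_retries
        elif status >= 500:
            kind, limit = "server", max_server_retries
        else:
            # Success, 400/401/402/403, or anything else: no retry.
            return status, headers
        attempt = retries[kind]
        if attempt >= limit:
            return status, headers  # retry allowance for this error class used up
        retries[kind] += 1
        # Exponential backoff with full jitter, capped at max_delay.
        backoff = random.uniform(0, min(max_delay, base_delay * 2 ** attempt))
        hint = _retry_after(headers)
        sleep(hint if hint is not None else backoff)
```

Injecting `sleep` keeps the loop testable. In production, the default `time.sleep` applies. Full jitter spreads retries from many clients so they do not all return at the same moment after a `429`.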