
Errors & limits

Errors use an OpenAI-style envelope:

{ "error": { "message": "Human-readable description", "type": "invalid_request_error", "code": "invalid_api_key" } }
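Here is a minimal Python sketch for reading this envelope on the client. `ApiError` and `parse_error` are hypothetical names and are not part of any official SDK. The sketch assumes the body arrives as a JSON string, and it treats `type` and `code` as optional in case a response omits them.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Fields of the OpenAI-style error envelope."""
    message: str
    type: Optional[str]
    code: Optional[str]


def parse_error(body: str) -> Optional[ApiError]:
    """Return the error fields, or None if the body is not an error envelope."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    return ApiError(
        message=str(error.get("message", "")),
        type=error.get("type"),
        code=error.get("code"),
    )


body = '{ "error": { "message": "Human-readable description", "type": "invalid_request_error", "code": "invalid_api_key" } }'
err = parse_error(body)
print(err.code)  # invalid_api_key
```

Returning `None` for non-envelope bodies lets the caller fall back to the bare HTTP status, for example when a proxy in front of the API returns HTML.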

HTTP status matrix

| Status | Code | When |
|---|---|---|
| 401 | invalid_api_key | Missing, malformed, or revoked API key |
| 402 | insufficient_quota | Inference budget exhausted for the billing period (Gateway plans) |
| 403 | account_disabled | Account not allowed to use inference |
| 403 | insufficient_permissions | Plan or feature not enabled |
| 403 | insufficient_security_classification | The model's provider has a lower security classification than the API key requires |
| 429 | rate_limit_exceeded | Per-minute request limit or concurrent stream cap exceeded |
| 400 | invalid_request_error | Invalid JSON or unsupported parameters |
| 400 | model_not_found | Unknown or disabled model ID |
| 500 | internal_error | Unexpected server error |
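One way to use the matrix in client code is to map each status to an exception class, so callers can catch whole categories. The following Python sketch does this. The class names and `error_for` are hypothetical and are not an official SDK. Because it keys on status, the three 403 codes share one class, and callers check `.code` to tell them apart.

```python
from typing import Optional


class SteinkauzAPIError(Exception):
    """Base class carrying the HTTP status and the envelope's code."""

    def __init__(self, status: int, code: Optional[str], message: str) -> None:
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code


class BadRequestError(SteinkauzAPIError):
    """400: invalid_request_error or model_not_found."""


class AuthenticationError(SteinkauzAPIError):
    """401: invalid_api_key."""


class InsufficientQuotaError(SteinkauzAPIError):
    """402: insufficient_quota."""


class PermissionDeniedError(SteinkauzAPIError):
    """403: account_disabled, insufficient_permissions, or
    insufficient_security_classification."""


class RateLimitError(SteinkauzAPIError):
    """429: rate_limit_exceeded."""


class InternalServerError(SteinkauzAPIError):
    """500: internal_error."""


_BY_STATUS = {
    400: BadRequestError,
    401: AuthenticationError,
    402: InsufficientQuotaError,
    403: PermissionDeniedError,
    429: RateLimitError,
    500: InternalServerError,
}


def error_for(status: int, code: Optional[str], message: str) -> SteinkauzAPIError:
    """Build the exception for a row of the status matrix (base class if unknown)."""
    cls = _BY_STATUS.get(status, SteinkauzAPIError)
    return cls(status, code, message)


err = error_for(403, "insufficient_permissions", "Plan or feature not enabled")
print(type(err).__name__, err.code)  # PermissionDeniedError insufficient_permissions
```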

Rate limits

Per-account limits apply to keep the service stable for all users:

  • Requests per minute: the maximum number of completion and model-list requests in a rolling one-minute window.
  • Concurrent streaming: the maximum number of streaming completions in flight at the same time.
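The following Python sketch shows client-side self-throttling against both limits. The limit values are placeholders, because this page does not publish them, and `ClientSideLimiter` is a hypothetical helper. The limits apply per account, so a single-process limiter like this only helps when one process sends all of the account's traffic. As written, it is also not thread-safe.

```python
import time
from collections import deque
from typing import Callable, Deque


class ClientSideLimiter:
    """Best-effort throttle that stays under per-account limits.

    max_per_minute and max_streams are placeholders: set them to the
    limits that apply to your account.
    """

    def __init__(self, max_per_minute: int, max_streams: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_per_minute = max_per_minute
        self.max_streams = max_streams
        self.clock = clock
        self._sent: Deque[float] = deque()  # timestamps of recent requests
        self._open_streams = 0

    def try_acquire(self, streaming: bool = False) -> bool:
        """Record a request and return True if it fits both limits."""
        now = self.clock()
        # Drop timestamps that have left the rolling 60-second window.
        while self._sent and now - self._sent[0] >= 60.0:
            self._sent.popleft()
        if len(self._sent) >= self.max_per_minute:
            return False
        if streaming and self._open_streams >= self.max_streams:
            return False
        self._sent.append(now)
        if streaming:
            self._open_streams += 1
        return True

    def release_stream(self) -> None:
        """Call when a streaming completion finishes or is cancelled."""
        self._open_streams = max(0, self._open_streams - 1)
```

When `try_acquire` returns `False`, the caller can wait briefly and try again instead of sending a request that would come back as 429.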

When a limit is exceeded, the API returns 429 with rate_limit_exceeded. Retry after a short delay, and honor the Retry-After header when the response includes it.
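HTTP permits `Retry-After` to carry either a number of seconds or an HTTP date (RFC 9110). This page does not say which form the API sends. The small Python helper below handles both. The function name is hypothetical, and it returns `None` for a missing or unparseable value so the caller can fall back to its own backoff.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value into a delay in seconds."""
    if value is None:
        return None
    value = value.strip()
    # Form 1: delay-seconds, e.g. "2".
    try:
        return float(max(0, int(value)))
    except ValueError:
        pass
    # Form 2: HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Pythons raise TypeError
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry now".
    return max(0.0, (when - now).total_seconds())
```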

Budget (402)

On Gateway plans, API Access and the web chat share one inference budget. When remaining budget is zero, completions return 402 with insufficient_quota.

On BYOK plans, Steinkauz does not enforce an inference budget. You may still receive errors from your upstream providers if their quotas or billing limits are reached.

CORS

API Access is intended for server-to-server use. Do not expose API keys in front-end code or call the API directly from a browser.
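One way to keep the key on the server is to read it from an environment variable in backend code, and to have the browser call your own backend instead of the API. In this Python sketch, the variable name `STEINKAUZ_API_KEY` and the `Authorization: Bearer` scheme are assumptions; the latter is by analogy with OpenAI-style APIs. Check the API reference for the actual authentication header.

```python
import os
from typing import Dict, Mapping, Optional


def auth_headers(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build request headers on the server from an environment variable.

    STEINKAUZ_API_KEY and the Bearer scheme are assumptions for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("STEINKAUZ_API_KEY")
    if not key:
        raise RuntimeError("STEINKAUZ_API_KEY is not set on the server")
    return {"Authorization": f"Bearer {key}", "Content-Type": "application/json"}
```

Your backend endpoint attaches these headers and forwards the request, so the key never appears in code served to the browser.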

Retries

  • Retry 429 with exponential backoff and respect Retry-After when present.
  • Do not retry 401, 402, or 403 without fixing credentials, budget, or configuration.
  • 500 may be retried sparingly.
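The rules above can be sketched as a Python retry loop. `send` is a hypothetical zero-argument callable that performs one request and returns `(status, headers)`. The retry counts and delays are illustrative defaults, not documented values. The loop uses exponential backoff with full jitter and honors only the delay-seconds form of `Retry-After`. It returns every other status, including 401, 402, and 403, to the caller immediately.

```python
import random
import time
from typing import Callable, Mapping, Tuple

# Hypothetical response shape: (HTTP status, response headers).
Response = Tuple[int, Mapping[str, str]]


def send_with_retries(
    send: Callable[[], Response],
    max_429_retries: int = 5,
    max_500_retries: int = 1,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Retry 429 with exponential backoff and 500 sparingly; return anything else."""
    retries_429 = 0
    retries_500 = 0
    while True:
        status, headers = send()
        if status == 429 and retries_429 < max_429_retries:
            attempt = retries_429
            retries_429 += 1
        elif status == 500 and retries_500 < max_500_retries:
            attempt = retries_500
            retries_500 += 1
        else:
            # Success, 401/402/403 and other errors that need a fix
            # rather than a retry, or the retry allowance is used up.
            return status, headers
        # Assumes header names are normalized to "Retry-After"; an HTTP-date
        # value is not parsed here and falls back to backoff.
        try:
            delay = float(max(0, int(headers.get("Retry-After", ""))))
        except ValueError:
            # Full jitter: a random delay up to base_delay * 2**attempt, capped.
            delay = random.uniform(0.0, min(max_delay, base_delay * 2 ** attempt))
        sleep(delay)
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. Full jitter spreads out retries from many clients that hit the limit at the same moment.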