Errors & limits
Errors use an OpenAI-style envelope:
{
  "error": {
    "message": "Human-readable description",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}
HTTP status matrix
| Status | code | When |
|---|---|---|
| 401 | invalid_api_key | Missing, malformed, or revoked API key |
| 402 | insufficient_quota | Inference budget exhausted for the billing period (Gateway plans) |
| 403 | account_disabled | Account not allowed to use inference |
| 403 | insufficient_permissions | Plan or feature not enabled |
| 403 | insufficient_security_classification | Model provider below key classification |
| 429 | rate_limit_exceeded | Per-minute request limit or concurrent stream cap |
| 400 | invalid_request_error | Invalid JSON or unsupported parameters |
| 400 | model_not_found | Unknown or disabled model ID |
| 500 | internal_error | Unexpected server error |
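Because several rows share a status (403 and 400 each map to more than one code), client code usually needs the code field as well as the HTTP status. The following is a minimal sketch of reading the envelope, not an official client: the ApiError class and parse_error function are hypothetical names, and the fallback for non-JSON bodies is an assumption about what an intermediary might return.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    """Fields of the error envelope plus the HTTP status (hypothetical helper type)."""
    status: int
    code: Optional[str]
    type: Optional[str]
    message: str


def parse_error(status: int, body: str) -> ApiError:
    """Extract code, type, and message from an error response body."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        payload = None
    err = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(err, dict):
        # Assumption: a body that does not match the envelope (for example an
        # HTML page from a proxy) is kept, truncated, as the message.
        return ApiError(status, None, None, body[:200])
    return ApiError(status, err.get("code"), err.get("type"), err.get("message", ""))
```

With this in place, a caller can branch on err.code to tell account_disabled apart from insufficient_permissions, both of which arrive as 403.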
Rate limits
Per-account limits apply to keep the service stable for all users:
- Requests per minute — maximum completion and model-list requests in a rolling window.
- Concurrent streaming — maximum in-flight streaming completions at once.
When a limit is exceeded, the API returns 429 with rate_limit_exceeded. Retry after a short delay; respect Retry-After when the response includes it.
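This page does not say which form the Retry-After value takes. HTTP permits either a number of seconds or an HTTP date, so the sketch below accepts both. The function name retry_after_seconds is hypothetical, and returning None for a missing or unparseable header is an assumption that leaves the fallback delay to the caller.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str],
                        now: Optional[datetime] = None) -> Optional[float]:
    """Convert a Retry-After header value to seconds to wait, or None if unusable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "5"
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        # Treat a date without zone information as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means no further waiting is needed
    return max(0.0, (when - now).total_seconds())
```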
Budget (402)
On Gateway plans, API Access and the web chat share one inference budget. When remaining budget is zero, completions return 402 with insufficient_quota.
On BYOK plans, Steinkauz does not enforce an inference budget. You may still receive errors from your upstream providers if their quotas or billing limits are reached.
CORS
API Access is intended for server-to-server use. Do not expose API keys in front-end code or call the API directly from a browser.
Retries
- Retry 429 with exponential backoff and respect Retry-After when present.
- Do not retry 401, 402, or 403 without fixing credentials, budget, or configuration.
- 500 may be retried sparingly.
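The retry guidance above can be sketched as a single decision function. This is one possible reading rather than a prescribed policy: next_delay and its parameters are hypothetical, the attempt limits (five for 429, two for 500 as an interpretation of "sparingly"), the 1-second base, and the 30-second cap are assumptions, and the random jitter is a common addition that this page does not require.

```python
import random
from typing import Optional

MAX_RETRIES_429 = 5  # assumption: not specified on this page
MAX_RETRIES_500 = 2  # assumption: one reading of "sparingly"


def next_delay(status: int, attempt: int, retry_after: Optional[float] = None,
               base: float = 1.0, cap: float = 30.0) -> Optional[float]:
    """Return seconds to wait before retry number `attempt` (1-based), or None to stop."""
    if status == 429:
        limit = MAX_RETRIES_429
    elif status == 500:
        limit = MAX_RETRIES_500
    else:
        # 401, 402, 403 (and 400) need a fix to credentials, budget,
        # configuration, or the request itself; retrying will not help.
        return None
    if attempt > limit:
        return None
    if status == 429 and retry_after is not None:
        # The server's hint takes precedence over the computed backoff
        return retry_after
    # Exponential backoff: base, 2*base, 4*base, ... bounded by cap,
    # with a random delay drawn from [0, backoff] to spread out clients
    backoff = min(cap, base * 2 ** (attempt - 1))
    return random.uniform(0.0, backoff)
```

A caller would sleep for the returned number of seconds and resend the request, stopping and surfacing the error as soon as the function returns None.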