This is the authoritative reference for client-side error handling on Corvex. The error envelope shape and the code values below are stable for the Alpha contract. New codes may be added in an additive, non-breaking way as the platform evolves.

Error envelope

The endpoint family decides the envelope shape (no content negotiation): the OpenAI-compatible endpoints return the OpenAI-style envelope below, and the Anthropic-compatible endpoints return the Anthropic-style envelope. The stable code catalog is shared by both.

OpenAI endpoints

All Corvex OpenAI-family inference endpoints (/v1/chat/completions, /v1/completions, /v1/embeddings, /v1/models, /v1/responses) return errors as Content-Type: application/json with this shape:
{
  "error": {
    "code": "invalid_api_key",
    "message": "Invalid API key",
    "type": "authentication_error"
  }
}
For invalid_request_errors where the failing field is identifiable, the gateway additionally emits an optional param:
{
  "error": {
    "code": "bad_request",
    "message": "model field is required",
    "type": "invalid_request_error",
    "param": "model"
  }
}
  • code — stable, snake_case, machine-readable slug. Match on this in your client code. Codes do not change once published.
  • message — human-readable string safe to surface in UIs or logs. The wording may change between releases — do not parse it.
  • type — OpenAI-style category enum: invalid_request_error, authentication_error, permission_error, not_found_error, rate_limit_error, server_error, service_unavailable. Used by the OpenAI Python and TypeScript SDKs to pick the exception class (AuthenticationError, RateLimitError, etc.). Safe to ignore if you match on code directly.
  • param (optional) — for invalid_request_errors, the request field at fault ("model", "input", "messages", etc.). Omitted when no specific field is identifiable, and omitted on non-validation errors. Mirrors OpenAI’s error-object shape so the OpenAI Python and TypeScript SDKs surface it on BadRequestError.param.
  • HTTP status — conveys the broad category (auth, throttling, server, etc.). The code is the precise identity within that category.
  • x-request-id (response header) — present on every response, including all errors (401/403/404 and every 4xx/5xx). Requests rejected at the gateway carry a minted req-gw-<hex> id; responses that reach an engine pass its id through. Log it and quote it in support requests so a failure can be correlated end-to-end (RD-919).
Not RFC 7807 / application/problem+json. Despite the “problem+json” wording in the originating ticket (REQ-025), Corvex emits an OpenAI-compatible envelope under application/json, matching the shape used by OpenAI, Fireworks, Together.ai, and Baseten. See ADR-039 for the rationale.
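As an illustration of the fields described above, here is a minimal Python sketch that pulls code, type, message, the optional param, and the x-request-id header out of an OpenAI-style error response. The GatewayError dataclass and the parse_openai_error function are hypothetical names chosen for this example; they are not part of any Corvex SDK.

```python
import json
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GatewayError:
    """Hypothetical container for the documented envelope fields."""
    status: int
    code: str
    type: str
    message: str
    param: Optional[str]
    request_id: Optional[str]


def parse_openai_error(status: int, headers: Mapping[str, str], body: str) -> GatewayError:
    """Extract the documented fields from an OpenAI-style error response."""
    err = json.loads(body)["error"]
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return GatewayError(
        status=status,
        code=err["code"],          # stable slug: branch on this
        type=err["type"],          # broad category; safe to ignore
        message=err["message"],    # display or log only, never parse
        param=err.get("param"),    # optional: only on some invalid_request_errors
        request_id=lowered.get("x-request-id"),
    )
```

Client code would then branch on the code attribute and log request_id alongside the message.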

Anthropic endpoints (/v1/messages)

The Anthropic-compatible endpoints (POST /v1/messages, POST /v1/messages/count_tokens) return every 4xx/5xx as Content-Type: application/json with the Anthropic-style envelope, extended with the same stable code catalog:
{
  "type": "error",
  "error": {
    "type": "authentication_error",
    "message": "Invalid API key",
    "code": "invalid_api_key"
  },
  "request_id": "req_abc123"
}
  • error.type — Anthropic’s published category enum: invalid_request_error, authentication_error, permission_error, not_found_error, request_too_large, rate_limit_error, api_error, overloaded_error. The Anthropic Python and TypeScript SDKs use it (with the HTTP status) to pick the exception class (AuthenticationError, RateLimitError, etc.). Relative to the OpenAI-style type, two values map: server_error → api_error and service_unavailable → overloaded_error; the rest are identical.
  • error.code — the SAME stable, snake_case catalog documented on this page. Match on this for precise handling — e.g. rate_limited vs token_limited both surface as rate_limit_error, and only code distinguishes them.
  • error.message — human-readable; wording may change between releases. There is no param field on this surface — the failing request field is named in the message text instead.
  • request_id — echoes your X-Request-Id request header (also returned as a response header) so you can correlate to gateway logs. Omitted when the request carried no X-Request-Id.
  • HTTP status — unchanged from the table below; Corvex does not adopt Anthropic’s 529 (overloaded_error ships on 503) or 413 (oversized requests surface as 400 bad_request).
Errors raised by an upstream engine on these endpoints are normalized into this same envelope before reaching you — clients never see a raw engine error body (on either endpoint family). For example, an upstream 502/504 surfaces as api_error and a 503 as overloaded_error:
{
  "type": "error",
  "error": {
    "type": "api_error",
    "message": "upstream service error",
    "code": "server_error"
  },
  "request_id": "req_abc123"
}
For stream: true requests, failures that occur before the stream starts return this JSON envelope with the error status (not an SSE body).
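The Anthropic-style envelope and the two category renames described above can be handled with a small amount of code. The following Python sketch is illustrative only: parse_anthropic_error, to_anthropic_type, and ParsedError are hypothetical helper names, and the sketch assumes the body is one of the JSON envelopes shown in this section.

```python
import json
from typing import Optional, TypedDict

# The only two category names that differ between the envelope styles.
OPENAI_TO_ANTHROPIC_TYPE = {
    "server_error": "api_error",
    "service_unavailable": "overloaded_error",
}


class ParsedError(TypedDict):
    code: str
    type: str
    message: str
    request_id: Optional[str]


def parse_anthropic_error(body: str) -> ParsedError:
    """Read an Anthropic-style error envelope into a flat record."""
    doc = json.loads(body)
    if doc.get("type") != "error":
        raise ValueError("not an Anthropic-style error envelope")
    err = doc["error"]
    return {
        "code": err["code"],        # shared stable catalog: branch on this
        "type": err["type"],        # Anthropic category enum
        "message": err["message"],
        # request_id is omitted when the request carried no X-Request-Id.
        "request_id": doc.get("request_id"),
    }


def to_anthropic_type(openai_type: str) -> str:
    """Translate an OpenAI-style category to its Anthropic-style name."""
    return OPENAI_TO_ANTHROPIC_TYPE.get(openai_type, openai_type)
```

Because error.code is the same on both endpoint families, a client that branches on code can share one handler for both surfaces and use the type mapping only for logging or SDK interop.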

Categories

Each code below belongs to one of these categories:
Category | What it means
---|---
Auth | Authentication or authorization failed (no valid key, key disabled, key not allowed for this scope).
Throttling | Rate or token quota was exceeded; back off and try again.
Billing | Credits or quota are exhausted; top up to continue.
Request | The request itself is malformed or invalid.
Catalog | The requested model or resource is not available.
Server | Something went wrong on the platform; retry with backoff.
Capacity | Platform is temporarily over capacity; retry with backoff.

Quick reference

HTTP | Code | Category | Retryable?
---|---|---|---
400 | bad_request | Request | No
401 | invalid_api_key | Auth | No
403 | virtual_key_blocked | Auth | No
403 | model_blocked | Auth | No
404 | model_unavailable | Catalog | No
404 | not_found | Request | No
429 | rate_limited | Throttling | Yes (with Retry-After)
429 | token_limited | Throttling | Yes (with Retry-After)
402 | insufficient_credits | Billing | No (top up first)
500 | server_error | Server | Yes (exponential backoff)
502 | service_unavailable | Capacity | Yes (exponential backoff)
502 | server_error | Server | Yes (exponential backoff)
503 | service_unavailable | Capacity | Yes (exponential backoff)
A 502 carries one of two codes depending on the source: service_unavailable when the gateway could not reach the upstream engine at all (capacity-class — the engine is unreachable), or server_error when the upstream itself answered 502 and the gateway normalized the error body (server-class — the upstream failed internally). Both are retryable with exponential backoff.
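The quick-reference table can be carried in client code as a lookup keyed on code rather than on HTTP status, which handles the two-code 502 case without special logic. The sketch below is one possible encoding in Python; CODE_TABLE, CodeInfo, and is_retryable are hypothetical names, and the fallback for unknown codes is an assumption based on the note that new codes may be added.

```python
from typing import NamedTuple


class CodeInfo(NamedTuple):
    category: str
    retryable: bool


# One entry per code in the quick-reference table.
CODE_TABLE = {
    "bad_request": CodeInfo("Request", False),
    "invalid_api_key": CodeInfo("Auth", False),
    "virtual_key_blocked": CodeInfo("Auth", False),
    "model_blocked": CodeInfo("Auth", False),
    "model_unavailable": CodeInfo("Catalog", False),
    "not_found": CodeInfo("Request", False),
    "rate_limited": CodeInfo("Throttling", True),
    "token_limited": CodeInfo("Throttling", True),
    "insufficient_credits": CodeInfo("Billing", False),
    "server_error": CodeInfo("Server", True),
    "service_unavailable": CodeInfo("Capacity", True),
}


def is_retryable(status: int, code: str) -> bool:
    """Decide retryability from the code, falling back to the HTTP status."""
    info = CODE_TABLE.get(code)
    if info is not None:
        return info.retryable
    # Assumption: for a code this client does not know yet, use the broad
    # HTTP category (429 and 5xx are the retryable classes in the table).
    return status == 429 or 500 <= status <= 599
```

Keying on code means a 502 is classified as Server or Capacity by its code alone, and both outcomes are retryable.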

Codes

Each entry below documents:
  • Cause — what triggered this error.
  • Remediation — what the API consumer should do.
  • Retry — whether the SDK retries this automatically, and how.
  • Where to look — dashboards, settings, or support flow to investigate.

invalid_api_key (401)

Cause. No Authorization: Bearer <key> header, malformed header, or the key value is unknown to the gateway. Also returned when an admin key (sk-admin-*) is used on an inference endpoint such as /v1/chat/completions or /v1/embeddings.
Remediation. Mint or rotate a virtual key (sk-corvex-*) for inference calls, or an admin key (sk-admin-*) for /api/* management calls. See Authentication for the key types.
Retry. No. This is a non-retryable client error.
Where to look. Dashboard → API Keys. Revoked keys appear with status disabled; expired keys must be re-minted.
{ "error": { "code": "invalid_api_key", "message": "Invalid API key", "type": "authentication_error" } }

virtual_key_blocked (403)

Cause. The virtual key was valid but has been deactivated (set is_active = false) by an admin or by automated policy.
Remediation. Ask your team admin to re-enable the key, or mint a new virtual key.
Retry. No.
Where to look. Dashboard → API Keys → key status. Audit log will show the deactivation event.
{ "error": { "code": "virtual_key_blocked", "message": "Virtual key is deactivated", "type": "permission_error" } }

model_blocked (403)

Cause. The key is active, but the requested model is not in the key’s allowed_models list (or the team-level allow-list).
Remediation. Ask your team admin to add the model to allowed_models on the virtual key or team, or use a different model that is allowed.
Retry. No.
Where to look. Dashboard → API Keys → key detail → Allowed models.
{ "error": { "code": "model_blocked", "message": "Model not allowed for this virtual key", "type": "permission_error" } }

model_unavailable (404)

The exact HTTP status mapping may move between 404 (not in catalog) and 503 (in catalog but not currently deployed) once the gateway catalog hook lands. The code value is stable; treat 404/503 with this code as equivalent for client-side handling.
Cause. The requested model identifier (e.g. meta-llama/Llama-3-8B-Instruct) is not in the catalog, or is in the catalog but no engine workers are currently registered to serve it.
Remediation. List available models via GET /v1/models and pick a model returned by the gateway. If you expect the model to be available, contact your team admin to verify the release status.
Retry. No, unless you have just deployed a new release; in that case poll GET /api/v1/release for status == "deployed" before retrying (see the Quickstart).
Where to look. Dashboard → Workloads → active release. Models list endpoint: GET /v1/models.
{ "error": { "code": "model_unavailable", "message": "Model not available", "type": "not_found_error" } }
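For the just-deployed case, the polling step might look like the Python sketch below. It is deliberately transport-agnostic: fetch_status is a hypothetical callable, supplied by you, that performs GET /api/v1/release and returns the status string. The exact response shape of that endpoint is not specified on this page, and the timeout and interval defaults are illustrative assumptions.

```python
import time
from typing import Callable


def wait_until_deployed(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,   # assumption: overall time budget
    interval_s: float = 5.0,    # assumption: pause between polls
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until fetch_status() returns "deployed" or the budget runs out."""
    waited = 0.0
    while True:
        if fetch_status() == "deployed":
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Only retry the original inference request once this returns True; if it returns False, the release has not reached the deployed state within the budget and the model_unavailable error is not transient.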

not_found (404)

Cause. The request reached a valid inference surface prefix (/v1/… or /openai/v1/…) but no endpoint matched, e.g. GET /v1/models/{id} or a typo like GET /v1/bogus. Distinct from model_unavailable, which means the endpoint matched but the model is not serveable.
Remediation. Check the path against the API reference. Unknown /v1/* paths return this structured envelope (RD-919) rather than a bare string, so SDK error parsing stays intact.
Retry. No. This is a non-retryable client error.
Where to look. Your request URL. The x-request-id response header correlates the 404 in the gateway logs.
{ "error": { "code": "not_found", "message": "unknown endpoint: /v1/bogus", "type": "not_found_error" } }

rate_limited (429)

Cause. The virtual key’s request-rate limit was exceeded. Limits are per key, per team, and may further be per model.
Remediation. Slow down. If the response includes a Retry-After header (in seconds), wait that long before retrying. Otherwise back off exponentially.
Retry. Yes. The Corvex Python and TypeScript SDKs retry automatically (FR-078): up to 3 attempts by default, exponential backoff starting at 1 s, and they honor Retry-After when present.
Where to look. Dashboard → Usage → request rate per key. Adjust the key’s request_max_limit / request_reset_duration via the Governance API or ask your admin.
{ "error": { "code": "rate_limited", "message": "Request rate limit exceeded", "type": "rate_limit_error" } }

token_limited (429)

Cause. The virtual key’s token-rate limit was exceeded (independent of the request-rate limit; both run in parallel).
Remediation. Same as rate_limited. The Corvex SDKs surface this as RateLimitError with type = "token_limited" and a retry_after field (FR-079) so you can branch on the limit category.
Retry. Yes, same retry policy as rate_limited. Honor Retry-After.
Where to look. Dashboard → Usage → token throughput per key. Adjust token_max_limit / token_reset_duration via the Governance API.
{ "error": { "code": "token_limited", "message": "Token rate limit exceeded", "type": "rate_limit_error" } }

insufficient_credits (402)

Billing is gated behind Stripe (FR-212). During Alpha you may not see this code unless prepaid credits are enabled on your workspace. The code value is stable; expect this surface to be emitted by the gateway as billing-enforcement lands.
Cause. One of: the workspace’s prepaid credit balance has reached zero; a hard-enforcement quota was exhausted for the current reset window; or the workspace’s monthly spend cap has been reached (see Monthly spend caps). The dollar spend cap resets at the start of each calendar month.
Remediation. Top up credits in the dashboard (Billing → Add credits); for quota-based exhaustion, wait for the next reset window or have your admin raise the quota; for a spend cap, have the workspace owner raise the monthly cap (which unblocks requests immediately) or wait for the month to reset.
Retry. No. Retrying without topping up will keep returning the same error.
Where to look. Dashboard → Billing → Credit balance. Dashboard → Usage → quota burn-down per key.
{ "error": { "code": "insufficient_credits", "message": "Workspace has insufficient credits", "type": "permission_error" } }

bad_request (400)

Cause. The request body is malformed, missing required fields, or fails server-side validation. Examples: missing model, missing messages for chat completions, invalid JSON, or a key-management operation missing a required field. The same code is also returned in the following cases:
  • POST /v1/completions with a model that is not a text-completion model. To match OpenAI, which rejects chat models on the legacy completions surface and points them at /v1/chat/completions, the gateway serves /v1/completions only for catalog models that advertise a text-completion capability. Chat/instruct models and embedding-only models are rejected at the gateway before the request reaches an engine, with a message pointing to /v1/chat/completions. Models outside the catalog are forwarded unchanged (the engine remains the source of truth). This applies to streaming and non-streaming requests alike.
  • POST /v1/embeddings with a model that is not an embedding model (e.g. a chat/instruct-only model). The gateway rejects it with param: "model" before the request reaches an engine, rather than letting the engine return a 5xx. Dual-purpose models that serve embeddings alongside chat are accepted; check GET /v1/models for embedding-capable entries.
  • An image content block sent to a text-only model (one that does not advertise the vision capability). Rather than let the engine silently fabricate a description, the gateway rejects the request with param: "messages" (RD-911). Check a model’s capabilities via GET /v1/models or its model card before sending images.
  • A numeric generation parameter that is out of range or the wrong type, for example a negative or non-integer max_tokens (on /v1/chat/completions, /v1/completions, /v1/messages) or max_output_tokens (on /v1/responses). The gateway validates these before the request reaches the router, so the client gets a clean field-level 400 (the param names the field) rather than the upstream deserializer’s raw error string (FR-023, RD-965). Integer-valued floats (5.0, 1e3) and 0 are accepted; an absent value is accepted (the cap is optional, except where the endpoint requires it).
Remediation. Read the message for the specific field at fault and correct the request. For the legacy-completions case, either switch to a text-completion-capable model, or call /v1/chat/completions (recommended) or /v1/embeddings instead, depending on what you’re trying to do. The OpenAPI spec at API Reference is the authoritative schema.
Retry. No. Retrying the same payload will return the same error.
Where to look. Validate your request against the OpenAPI schema. The SDKs do this client-side for most endpoints. To see which models support which endpoint, call GET /v1/models: only text-completion-capable models are valid model values for /v1/completions (chat/instruct and embedding-only models are rejected), and only embedding-capable models are valid for /v1/embeddings.
{ "error": { "code": "bad_request", "message": "Model is required", "type": "invalid_request_error" } }
{ "error": { "code": "bad_request", "message": "model 'Qwen/Qwen3-Embedding-0.6B' does not support /v1/completions; use /v1/chat/completions or /v1/embeddings instead", "type": "invalid_request_error", "param": "model" } }
{ "error": { "code": "bad_request", "message": "model 'Qwen/Qwen2.5-0.5B-Instruct' does not support /v1/completions; use /v1/chat/completions or /v1/embeddings instead", "type": "invalid_request_error", "param": "model" } }
{ "error": { "code": "bad_request", "message": "model 'MiniMaxAI/MiniMax-M2.5' does not support the /v1/embeddings endpoint", "type": "invalid_request_error", "param": "model" } }
{ "error": { "code": "bad_request", "message": "messages: required", "type": "invalid_request_error", "param": "messages" } }
{ "error": { "code": "bad_request", "message": "model 'MiniMaxAI/MiniMax-M2.5' does not support image input", "type": "invalid_request_error", "param": "messages" } }
{ "error": { "code": "bad_request", "message": "max_tokens must be a non-negative integer", "type": "invalid_request_error", "param": "max_tokens" } }
{ "error": { "code": "bad_request", "message": "max_output_tokens must be a non-negative integer", "type": "invalid_request_error", "param": "max_output_tokens" } }
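The numeric rule for max_tokens and max_output_tokens can be checked client-side before sending a request. The Python sketch below mirrors the acceptance rule as stated in this section; is_valid_token_cap is a hypothetical helper, and rejecting booleans is an assumption (Python treats bool as a subclass of int, and the page does not say how JSON true/false is handled).

```python
import math
from typing import Any


def is_valid_token_cap(value: Any) -> bool:
    """Return True if value would pass the documented non-negative-integer rule."""
    if value is None:
        # Absent is accepted. This does not cover endpoints that require the cap.
        return True
    if isinstance(value, bool):
        # Assumption: a JSON boolean is not a valid number here.
        return False
    if isinstance(value, int):
        return value >= 0
    if isinstance(value, float):
        # Integer-valued floats such as 5.0 or 1e3 are accepted.
        return math.isfinite(value) and value.is_integer() and value >= 0
    return False
```

A check like this only saves a round trip; the gateway remains the authority, and the 400 response's param field names the offending field if the request is sent anyway.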

server_error (500, 502)

Cause. An unexpected server-side failure: a downstream component crashed, a database call failed, or the gateway hit an internal panic. On HTTP 502, this code means the upstream engine itself answered 502 and the gateway normalized its error body into the standard envelope.
Remediation. Retry with exponential backoff. If the error persists beyond ~30 s of backoff, capture a sample response (including the x-request-id header if present) and reach out to support; the request ID lets us correlate to gateway logs.
Retry. Yes. The Corvex SDKs retry 5xx by default (FR-078): up to 3 attempts, exponential backoff starting at 1 s. The SDK raises APIError with status_code = 500 if all retries are exhausted.
Where to look. Status page (when available). For self-hosted clusters, check gateway logs for the matching x-request-id.
{ "error": { "code": "server_error", "message": "Internal server error", "type": "server_error" } }

service_unavailable (502, 503)

Cause. The platform is temporarily over capacity, an upstream engine has not finished registering with the router, or a queue-depth timeout fired before a worker became available (FR-205). On HTTP 502, this code means the gateway could not reach the upstream engine at all (connect failure).
Remediation. Retry with exponential backoff. Honor Retry-After when present.
Retry. Yes, same policy as server_error.
Where to look. Status page. For self-hosted clusters, check that the release status is deployed (GET /api/v1/release) and that workers are registered with the router (/v1/models returns the expected list).
{ "error": { "code": "service_unavailable", "message": "Service temporarily unavailable", "type": "service_unavailable" } }

Retry semantics

The Corvex Python and TypeScript SDKs implement automatic retry per FR-078:
  • Retry on: HTTP 429 (rate_limited, token_limited) and any HTTP 5xx (server_error, service_unavailable).
  • Do not retry on: Other 4xx codes (bad_request, invalid_api_key, virtual_key_blocked, model_blocked, model_unavailable, not_found, insufficient_credits). These are raised immediately as APIError.
  • Backoff. Starts at 1 s and doubles per attempt, up to a configurable maximum. Default is 3 retries.
  • Retry-After header. When the server returns Retry-After, the SDK waits at least that long before retrying. The header takes precedence over the default exponential backoff.
Rate-limit errors are surfaced as RateLimitError (FR-079), a subclass of APIError exposing:
  • retry_after — value (in seconds) from the last response’s Retry-After header, when present.
  • type — the SDK’s RateLimitError.type property, either "token_limited" or "request_limited", so callers can distinguish which limit was hit. This is the SDK-side property and is distinct from the wire envelope’s error.type field (which is the OpenAI category enum "rate_limit_error").
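For clients that do not use the Corvex SDKs, the backoff rule above can be approximated in a few lines. This Python sketch is not the SDK implementation: retry_delay is a hypothetical function, the 30 s ceiling is an assumed value for the "configurable maximum", and letting Retry-After bypass that ceiling is also an assumption.

```python
from typing import Optional


def retry_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base_s: float = 1.0,    # documented starting delay
    max_s: float = 30.0,    # assumption: ceiling for the doubling backoff
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # Retry-After takes precedence over the computed backoff.
        return max(retry_after, 0.0)
    # Exponential backoff: 1 s, 2 s, 4 s, ... capped at max_s.
    return min(base_s * (2 ** attempt), max_s)
```

With the default of 3 retries, a caller would sleep for retry_delay(0), retry_delay(1), and retry_delay(2) between attempts, passing the parsed Retry-After value whenever the last response carried one.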

Prompt caching (cache_control)

The Anthropic Messages surface (POST /v1/messages, POST /v1/messages/count_tokens) accepts cache_control annotations on system and message content blocks — they never cause a 4xx error, so Claude Code and Anthropic-SDK clients that send them work unchanged. Prompt caching is not supported, however: the annotation carries no caching semantics on this platform. Every /v1/messages response therefore reports the cache usage fields explicitly as zero — on the non-streaming response’s top-level usage, and on the streaming message_start event’s message.usage:
"usage": {
  "input_tokens": 32,
  "output_tokens": 10,
  "cache_creation_input_tokens": 0,
  "cache_read_input_tokens": 0
}
In the spec-faithful shape the Anthropic SDKs read, a value of 0 is interpreted as “caching ran and cached nothing”; it does not indicate an error. This is a deliberate scope decision (FR-602 / RD-887), not a defect; real prompt caching is tracked separately. See ADR-039 Amendment 2026-06-11.

See also

  • API Reference — endpoint-by-endpoint request/response schemas, including error responses per endpoint.
  • Authentication — virtual keys (sk-corvex-*) for inference, admin keys (sk-admin-*) for /api/* management.
  • Quickstart — first inference call end-to-end.