This is the authoritative reference for client-side error handling on
Corvex. The error envelope shape and the code values below are
stable for the Alpha contract. New codes may be added in an additive,
non-breaking way as the platform evolves.
Error envelope
The endpoint family decides the envelope shape (no content negotiation): the OpenAI-compatible endpoints return the OpenAI-style envelope below, and the Anthropic-compatible endpoints return the Anthropic-style envelope. The stable code catalog is shared by both.
OpenAI endpoints
All Corvex OpenAI-family inference endpoints (/v1/chat/completions,
/v1/completions, /v1/embeddings, /v1/models, /v1/responses) return errors as
Content-Type: application/json. For invalid_request_errors where the
failing field is identifiable, the gateway additionally emits an optional
param. Each error response carries:
- code: stable, snake_case, machine-readable slug. Match on this in your client code. Codes do not change once published.
- message: human-readable string safe to surface in UIs or logs. The wording may change between releases, so do not parse it.
- type: OpenAI-style category enum, one of invalid_request_error, authentication_error, permission_error, not_found_error, rate_limit_error, server_error, service_unavailable. Used by the OpenAI Python and TypeScript SDKs to pick the exception class (AuthenticationError, RateLimitError, etc.). Safe to ignore if you match on code directly.
- param (optional): for invalid_request_errors, the request field at fault ("model", "input", "messages", etc.). Omitted when no specific field is identifiable, and omitted on non-validation errors. Mirrors OpenAI’s error-object shape so the OpenAI Python and TypeScript SDKs surface it on BadRequestError.param.
- HTTP status: conveys the broad category (auth, throttling, server, etc.). The code is the precise identity within that category.
- x-request-id (response header): present on every response, including all errors (401/403/404 and every 4xx/5xx). Requests rejected at the gateway carry a minted req-gw-<hex> id; responses that reach an engine pass its id through. Log it and quote it in support requests so a failure can be correlated end-to-end (RD-919).
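The fields above can be consumed with a small parser. The following is a minimal sketch, not SDK code: it assumes the fields sit under a top-level "error" object (as in OpenAI's own API), and the names GatewayError and parse_openai_error, along with the sample body and request id, are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class GatewayError:
    """Hypothetical container for the documented envelope fields."""
    status: int
    code: str
    type: str
    message: str
    param: Optional[str] = None
    request_id: Optional[str] = None


def parse_openai_error(status: int, body: str, headers: dict) -> GatewayError:
    """Parse an OpenAI-style error body.

    Assumption: the fields are nested under a top-level "error" key.
    """
    err = json.loads(body).get("error", {})
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    return GatewayError(
        status=status,
        code=err.get("code", ""),
        type=err.get("type", ""),
        message=err.get("message", ""),
        param=err.get("param"),  # omitted on non-validation errors
        request_id=lowered.get("x-request-id"),
    )


# Hypothetical response, made up for illustration.
sample_body = json.dumps({
    "error": {
        "message": "model is required",
        "type": "invalid_request_error",
        "code": "bad_request",
        "param": "model",
    }
})
err = parse_openai_error(400, sample_body, {"X-Request-Id": "req-gw-0123abcd"})

# Branch on the stable code, never on the message wording.
if err.code == "bad_request":
    print(f"fix field {err.param!r} (request {err.request_id})")
```

Matching on code keeps the handler stable even if the message wording changes between releases.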
Not RFC 7807 / application/problem+json. Despite the “problem+json”
wording in the originating ticket (REQ-025), Corvex emits an
OpenAI-compatible envelope under application/json, matching the shape
used by OpenAI, Fireworks, Together.ai, and Baseten. See ADR-039 for the
rationale.
Anthropic endpoints (/v1/messages)
The Anthropic-compatible endpoints (POST /v1/messages,
POST /v1/messages/count_tokens) return every 4xx/5xx as
Content-Type: application/json with the Anthropic-style envelope,
extended with the same stable code catalog:
- error.type: Anthropic’s published category enum, one of invalid_request_error, authentication_error, permission_error, not_found_error, request_too_large, rate_limit_error, api_error, overloaded_error. The Anthropic Python and TypeScript SDKs use it (with the HTTP status) to pick the exception class (AuthenticationError, RateLimitError, etc.). Relative to the OpenAI-style type, two values map: server_error → api_error and service_unavailable → overloaded_error; the rest are identical.
- error.code: the same stable, snake_case catalog documented on this page. Match on this for precise handling: for example, rate_limited and token_limited both surface as rate_limit_error, and only code distinguishes them.
- error.message: human-readable; wording may change between releases. There is no param field on this surface; the failing request field is named in the message text instead.
- request_id: echoes your X-Request-Id request header (also returned as a response header) so you can correlate to gateway logs. Omitted when the request carried no X-Request-Id.
- HTTP status: unchanged from the table below; Corvex does not adopt Anthropic’s 529 (overloaded_error ships on 503) or 413 (oversized requests surface as 400 bad_request).
A 502/504 surfaces as api_error and a 503 as overloaded_error.
For stream: true requests, failures that occur before the stream starts
return this JSON envelope with the error status (not an SSE body).
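The type mapping and the role of error.code on the Anthropic surface can be sketched as follows. This is an illustration under assumptions: the envelope layout (a top-level "type": "error", a nested "error" object, and a top-level request_id) follows Anthropic's published error shape, and the function names and sample values are hypothetical.

```python
from typing import Optional

# The two category values that differ between the envelopes; all others are identical.
OPENAI_TO_ANTHROPIC_TYPE = {
    "server_error": "api_error",
    "service_unavailable": "overloaded_error",
}


def to_anthropic_type(openai_type: str) -> str:
    """Map an OpenAI-style category to its Anthropic-style equivalent."""
    return OPENAI_TO_ANTHROPIC_TYPE.get(openai_type, openai_type)


def describe_anthropic_error(envelope: dict) -> str:
    """Summarize an Anthropic-style error envelope, branching on error.code."""
    err = envelope.get("error", {})
    code = err.get("code", "")
    # request_id is omitted when the request carried no X-Request-Id header.
    request_id: Optional[str] = envelope.get("request_id")
    if code == "rate_limited":
        summary = "request-rate limit hit"
    elif code == "token_limited":
        summary = "token-rate limit hit"
    else:
        summary = f"{err.get('type', 'unknown')}: {code}"
    return f"{summary} (request_id={request_id})"


# Hypothetical envelope, made up for illustration.
sample = {
    "type": "error",
    "error": {
        "type": "rate_limit_error",
        "code": "token_limited",
        "message": "Token limit exceeded.",
    },
    "request_id": "req-123",
}
print(describe_anthropic_error(sample))
```

Both throttling codes share the rate_limit_error category, so the branch has to key on code rather than on type.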
Categories
Each code below belongs to one of these categories:
| Category | What it means |
|---|---|
| Auth | Authentication or authorization failed (no valid key, key disabled, key not allowed for this scope). |
| Throttling | Rate or token quota was exceeded; back off and try again. |
| Billing | Credits or quota are exhausted; top up to continue. |
| Request | The request itself is malformed or invalid. |
| Catalog | The requested model or resource is not available. |
| Server | Something went wrong on the platform; retry with backoff. |
| Capacity | Platform is temporarily over capacity; retry with backoff. |
Quick reference
| HTTP | Code | Category | Retryable? |
|---|---|---|---|
| 400 | bad_request | Request | No |
| 401 | invalid_api_key | Auth | No |
| 403 | virtual_key_blocked | Auth | No |
| 403 | model_blocked | Auth | No |
| 404 | model_unavailable | Catalog | No |
| 404 | not_found | Request | No |
| 429 | rate_limited | Throttling | Yes (with Retry-After) |
| 429 | token_limited | Throttling | Yes (with Retry-After) |
| 402 | insufficient_credits | Billing | No (top up first) |
| 500 | server_error | Server | Yes (exponential backoff) |
| 502 | service_unavailable | Capacity | Yes (exponential backoff) |
| 502 | server_error | Server | Yes (exponential backoff) |
| 503 | service_unavailable | Capacity | Yes (exponential backoff) |
HTTP 502 carries one of two codes: service_unavailable when the gateway
could not reach the upstream engine at all (capacity-class: the engine is
unreachable), or server_error when the upstream itself answered 502 and
the gateway normalized the error body (server-class: the upstream failed
internally). Both are retryable with exponential backoff.
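The quick-reference table reduces to a small lookup on the client side. A minimal sketch follows; the helper names is_retryable and classify_502 are hypothetical, and the set of codes is copied from the table above.

```python
# Codes the quick-reference table marks as retryable.
RETRYABLE_CODES = frozenset(
    {"rate_limited", "token_limited", "server_error", "service_unavailable"}
)


def is_retryable(code: str) -> bool:
    """Return True if the table marks this code as retryable."""
    return code in RETRYABLE_CODES


def classify_502(code: str) -> str:
    """Explain which of the two 502 variants a response represents."""
    if code == "service_unavailable":
        return "gateway could not reach the upstream engine"
    if code == "server_error":
        return "upstream engine answered 502"
    return "unexpected code for a 502"


print(is_retryable("token_limited"), is_retryable("bad_request"))
print(classify_502("server_error"))
```

Keying on the code rather than the status keeps the 502 case unambiguous, since the same status maps to two different categories.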
Codes
Each entry below documents:
- Cause: what triggered this error.
- Remediation: what the API consumer should do.
- Retry: whether the SDK retries this automatically, and how.
- Where to look: dashboards, settings, or support flow to investigate.
invalid_api_key (401)
Cause. No Authorization: Bearer <key> header, malformed header, or
the key value is unknown to the gateway. Also returned when an admin key
(sk-admin-*) is used on an inference endpoint such as
/v1/chat/completions or /v1/embeddings.
Remediation. Mint or rotate a virtual key (sk-corvex-*) for inference
calls, or an admin key (sk-admin-*) for /api/* management calls. See
Authentication for the key
types.
Retry. No. This is a non-retryable client error.
Where to look. Dashboard → API Keys. Revoked keys appear with status
disabled; expired keys must be re-minted.
virtual_key_blocked (403)
Cause. The virtual key was valid but has been deactivated (set is_active = false) by an admin or by automated policy.
Remediation. Ask your team admin to re-enable the key, or mint a new
virtual key.
Retry. No.
Where to look. Dashboard → API Keys → key status. Audit log will show
the deactivation event.
model_blocked (403)
Cause. The key is active, but the requested model is not in the key’s allowed_models list (or the team-level allow-list).
Remediation. Ask your team admin to add the model to allowed_models
on the virtual key or team, or use a different model that is allowed.
Retry. No.
Where to look. Dashboard → API Keys → key detail → Allowed models.
model_unavailable (404)
The exact status mapping may move between 404 (not in catalog) and 503 (in
catalog but not currently deployed) once the gateway catalog hook lands.
The code value is stable; treat 404/503 with this code as equivalent
for client-side handling.
Cause. The requested model (e.g. meta-llama/Llama-3-8B-Instruct) is not
in the catalog, or is in the catalog but no engine workers are currently
registered to serve it.
Remediation. List available models via GET /v1/models and pick a
model returned by the gateway. If you expect the model to be available,
contact your team admin to verify the release status.
Retry. No, unless you have just deployed a new release — in that case
poll GET /api/v1/release for status == "deployed" before retrying
(see the Quickstart).
Where to look. Dashboard → Workloads → active release. Models list
endpoint: GET /v1/models.
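The post-deployment polling described under Retry can be sketched as a small loop. This is an illustration only: wait_until_deployed and its parameters are hypothetical, the timeout and interval defaults are arbitrary, and the status source is injected as a callable because the response shape of GET /api/v1/release beyond a status of "deployed" is not documented here.

```python
import time
from typing import Callable


def wait_until_deployed(
    get_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_status() until it returns "deployed" or the timeout elapses.

    get_status is expected to wrap GET /api/v1/release and return the
    release status string (an assumption about the caller's HTTP client).
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if get_status() == "deployed":
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


# Simulated status sequence; "pending" is a made-up placeholder value.
statuses = iter(["pending", "pending", "deployed"])
print(wait_until_deployed(lambda: next(statuses), sleep=lambda s: None))
```

Only after this returns True is it worth retrying a request that failed with model_unavailable.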
not_found (404)
Cause. The request reached a valid inference surface prefix (/v1/…
or /openai/v1/…) but no endpoint matched — e.g. GET /v1/models/{id}
or a typo like GET /v1/bogus. Distinct from model_unavailable, which
means the endpoint matched but the model is not serveable.
Remediation. Check the path against the API
reference. Unknown /v1/* paths return
this structured envelope (RD-919) rather than a bare string, so SDK error
parsing stays intact.
Retry. No. This is a non-retryable client error.
Where to look. Your request URL. The x-request-id response header
correlates the 404 in the gateway logs.
rate_limited (429)
Cause. The virtual key’s request-rate limit was exceeded. Limits are per key, per team, and may further be per model.
Remediation. Slow down. If the response includes a Retry-After
header (in seconds), wait that long before retrying. Otherwise back off
exponentially.
Retry. Yes. The Corvex Python and TypeScript SDKs retry
automatically (FR-078): up to 3 attempts by default, exponential backoff
starting at 1 s, and they honor Retry-After when present.
Where to look. Dashboard → Usage → request rate per key. Adjust the
key’s request_max_limit / request_reset_duration via the Governance
API or ask your admin.
token_limited (429)
Cause. The virtual key’s token-rate limit was exceeded (independent of the request-rate limit; both run in parallel).
Remediation. Same as rate_limited. The Corvex SDKs surface this as
RateLimitError with type = "token_limited" and a retry_after field
(FR-079) so you can branch on the limit category.
Retry. Yes, same retry policy as rate_limited. Honor
Retry-After.
Where to look. Dashboard → Usage → token throughput per key. Adjust
token_max_limit / token_reset_duration via the Governance API.
insufficient_credits (402)
Billing is gated behind Stripe (FR-212). During Alpha you may not see this
code unless prepaid credits are enabled on your workspace. The code
value is stable; expect this surface to be emitted by the gateway as
billing enforcement lands.
bad_request (400)
Cause. The request body is malformed, missing required fields, or fails server-side validation. Examples: missing model, missing
messages for chat completions, invalid JSON, or a key-management
operation missing a required field.
Also returned on POST /v1/completions when the requested model is not a
text-completion model. To match OpenAI — which rejects chat models on the
legacy completions surface and points them at /v1/chat/completions — the
gateway serves /v1/completions only for catalog models that advertise a
text-completion capability. Chat/instruct models and embedding-only models
are rejected at the gateway before the request reaches an engine, with a
message pointing to /v1/chat/completions. Models outside the catalog are
forwarded unchanged (the engine remains the source of truth). This applies to
streaming and non-streaming requests alike.
Also returned on POST /v1/embeddings when the requested model is not an
embedding model (e.g. a chat/instruct-only model). The gateway rejects it
with param: "model" before the request reaches an engine, rather than
letting the engine return a 5xx. Dual-purpose models that serve embeddings
alongside chat are accepted; check GET /v1/models for embedding-capable
entries.
Also returned when a request carries an image content block to a
text-only model (one that does not advertise the vision capability).
Rather than let the engine silently fabricate a description, the gateway
rejects the request with param: "messages" (RD-911). Check a model’s
capabilities via GET /v1/models or its model card before sending images.
Also returned when a numeric generation parameter is out of range or the
wrong type — for example a negative or non-integer max_tokens (on
/v1/chat/completions, /v1/completions, /v1/messages) or max_output_tokens
(on /v1/responses). The gateway validates these before the request reaches the
router, so the client gets a clean field-level 400 (the param names the field)
rather than the upstream deserializer’s raw error string (FR-023, RD-965).
Integer-valued floats (5.0, 1e3) and 0 are accepted; an absent value is
accepted (the cap is optional, except where the endpoint requires it).
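The numeric rule in the preceding paragraph can be mirrored client-side before sending a request. The sketch below is an approximation, not the gateway's validator: is_valid_token_cap is a hypothetical name, rejecting booleans is an assumption, and it treats an absent value as valid without modeling endpoints that require the cap.

```python
from typing import Any


def is_valid_token_cap(value: Any) -> bool:
    """Return True if value would pass the max_tokens rule described above."""
    if value is None:
        # Absent value: accepted, since the cap is optional on most endpoints.
        return True
    if isinstance(value, bool):
        # bool is a subclass of int in Python; treated here as the wrong type
        # (an assumption, the source does not mention booleans).
        return False
    if isinstance(value, int):
        return value >= 0
    if isinstance(value, float):
        # Integer-valued floats such as 5.0 or 1e3 are accepted.
        return value.is_integer() and value >= 0
    return False


for candidate in (0, 5.0, 1e3, -1, 2.5, "10", None):
    print(repr(candidate), is_valid_token_cap(candidate))
```

A check like this only saves a round trip; the gateway's 400 with param naming the field remains the authoritative answer.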
Remediation. Read the message for the specific field at fault and
correct the request. For the legacy-completions case, either switch the
model to a chat- or text-generation model, or call
/v1/chat/completions (recommended) or /v1/embeddings instead,
depending on what you’re trying to do. The OpenAPI spec at
API Reference is the authoritative
schema.
Retry. No. Retrying the same payload will return the same error.
Where to look. Validate your request against the OpenAPI schema. The
SDKs do this client-side for most endpoints. To see which models support
which endpoint, call GET /v1/models — only text-completion-capable models
are valid model values for /v1/completions (chat/instruct and
embedding-only models are rejected), and only embedding-capable models are
valid for /v1/embeddings.
server_error (500, 502)
Cause. An unexpected server-side failure: a downstream component crashed, a database call failed, or the gateway hit an internal panic. On HTTP 502, this code means the upstream engine itself answered 502 and the gateway normalized its error body into the standard envelope.
Remediation. Retry with exponential backoff. If the error persists beyond ~30 s of backoff, capture a sample response (including the x-request-id header if present) and reach out to support. The request
ID lets us correlate to gateway logs.
Retry. Yes. The Corvex SDKs retry 5xx by default (FR-078): up to
3 attempts, exponential backoff starting at 1 s. The SDK raises
APIError with status_code = 500 if all retries are exhausted.
Where to look. Status page (when available). For self-hosted clusters,
check gateway logs for the matching x-request-id.
service_unavailable (502, 503)
Cause. The platform is temporarily over capacity, an upstream engine has not finished registering with the router, or a queue-depth timeout fired before a worker became available (FR-205). On HTTP 502, this code means the gateway could not reach the upstream engine at all (connect failure).
Remediation. Retry with exponential backoff. Honor Retry-After when
present.
Retry. Yes, same policy as server_error.
Where to look. Status page. For self-hosted clusters, check that the
release status is deployed (GET /api/v1/release) and that workers are
registered with the router (/v1/models returns the expected list).
Retry semantics
The Corvex Python and TypeScript SDKs implement automatic retry per FR-078:
- Retry on: HTTP 429 (rate_limited, token_limited) and any HTTP 5xx (server_error, service_unavailable).
- Do not retry on: Other 4xx codes (bad_request, invalid_api_key, virtual_key_blocked, model_blocked, model_unavailable, insufficient_credits). These are raised immediately as APIError.
- Backoff. Starts at 1 s and doubles per attempt, up to a configurable maximum. Default is 3 retries.
- Retry-After header. When the server returns Retry-After, the SDK waits at least that long before retrying. The header takes precedence over the default exponential backoff.
Rate-limit errors (HTTP 429) are raised as RateLimitError (FR-079), a
subclass of APIError exposing:
- retry_after: value (in seconds) from the last response’s Retry-After header, when present.
- type: the SDK’s RateLimitError.type property, either "token_limited" or "request_limited", so callers can distinguish which limit was hit. This is the SDK-side property and is distinct from the wire envelope’s error.type field (which is the OpenAI category enum "rate_limit_error").
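For clients that do not use the Corvex SDKs, the retry policy in this section can be approximated as below. This is a sketch, not the SDK implementation: should_retry and next_delay are hypothetical names, and the 30 s cap is an assumed value standing in for the configurable maximum.

```python
from typing import Optional


def should_retry(status: int) -> bool:
    """Retry on HTTP 429 and on any 5xx; every other status is final."""
    return status == 429 or 500 <= status <= 599


def next_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base_s: float = 1.0,
    max_s: float = 30.0,  # assumed cap; the source only says "configurable"
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        # Retry-After takes precedence over the exponential schedule.
        return max(retry_after, 0.0)
    # Starts at base_s and doubles per attempt, up to the cap.
    return min(base_s * (2 ** attempt), max_s)


# Default schedule for the three retries.
print([next_delay(n) for n in range(3)])
# A server-provided Retry-After overrides the schedule.
print(next_delay(0, retry_after=7.0))
```

A caller would loop up to the retry limit, sleep for next_delay(attempt, retry_after) between attempts, and stop as soon as should_retry returns False.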
Prompt caching (cache_control)
The Anthropic Messages surface (POST /v1/messages,
POST /v1/messages/count_tokens) accepts cache_control annotations on
system and message content blocks — they never cause a 4xx error, so Claude
Code and Anthropic-SDK clients that send them work unchanged. Prompt caching
is not supported, however: the annotation carries no caching semantics on this
platform. Every /v1/messages response therefore reports the cache usage fields
explicitly as zero — on the non-streaming response’s top-level usage, and on
the streaming message_start event’s message.usage. An explicit
0 means “caching ran and cached nothing” in the spec-faithful shape
the Anthropic SDKs read; it does not indicate an error. This is a deliberate
scope decision (FR-602 / RD-887), not a defect — real prompt caching is tracked
separately. See ADR-039 Amendment 2026-06-11.
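A client that reads cache usage can treat the zeros as ordinary data. The sketch below assumes the two fields use Anthropic's published names, cache_creation_input_tokens and cache_read_input_tokens; this page does not name them, and the sample usage object and the cache_tokens helper are hypothetical.

```python
from typing import Tuple


def cache_tokens(usage: dict) -> Tuple[int, int]:
    """Return (cache_creation, cache_read) token counts from a usage object.

    Field names follow Anthropic's published usage shape (an assumption here).
    """
    created = usage.get("cache_creation_input_tokens", 0)
    read = usage.get("cache_read_input_tokens", 0)
    return created, read


# Hypothetical non-streaming usage object, made up for illustration.
sample_usage = {
    "input_tokens": 42,
    "output_tokens": 7,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 0,
}
created, read = cache_tokens(sample_usage)
if created == 0 and read == 0:
    print("no tokens were written to or read from a cache")
```

Zero in both fields is the expected steady state on this platform, so it should not trigger error handling or alerts.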
See also
- API Reference: endpoint-by-endpoint request/response schemas, including error responses per endpoint.
- Authentication: virtual keys (sk-corvex-*) for inference, admin keys (sk-admin-*) for /api/* management.
- Quickstart: first inference call end-to-end.