
OpenAI Compatible

OpenAI/OpenRouter-style inference enqueue endpoints.

Queue a chat completion request

POST
/v1/chat/completions

Authorization

BearerAuth
Authorization: Bearer <token>

Use Authorization: Bearer <api-key>.

In: header

Request Body

application/json

TypeScript Definitions

Use the request body type in TypeScript.

Response Body

application/json

application/json

curl -X POST "https://loading/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <api-key>" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "Summarize this document in 5 bullets."
      }
    ]
  }'
{
  "id": "f3de27f8-61d5-4d58-aad1-a7d63f8a6e0f",
  "object": "chat.completion.enqueue",
  "created": 1773504000,
  "status": "queued",
  "surface": "chat",
  "model": "Llama3_8B_Instruct",
  "endpoint_id": "o6r4i5q9j8k7l6",
  "runpod": {
    "id": "f3de27f8-61d5-4d58-aad1-a7d63f8a6e0f",
    "status": "IN_QUEUE"
  }
}
{
  "error": {
    "code": "missing_endpoint_id",
    "message": "No RunPod endpoint configured for surface chat"
  }
}
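All four queue endpoints return the same enqueue envelope shown in the example above. A minimal TypeScript sketch of that shape with a guard for it follows; the field names are taken from the sample responses on this page, and the guard itself is an illustrative helper, not part of any official SDK:

```typescript
// Shape of the enqueue envelope returned by the /v1/* queue endpoints,
// as seen in the sample responses above.
interface EnqueueResponse {
  id: string;
  object: string;            // e.g. "chat.completion.enqueue"
  created: number;           // unix timestamp in seconds
  status: "queued";
  surface: string;           // e.g. "chat", "images", "transcribe", "embeddings"
  model: string;
  endpoint_id: string;
  runpod: { id: string; status: string }; // e.g. "IN_QUEUE"
}

// Narrow an unknown JSON payload to the envelope. A top-level "error"
// key means the request was rejected (see the missing_endpoint_id example).
function asEnqueueResponse(json: unknown): EnqueueResponse | null {
  const r = json as Record<string, unknown>;
  if (!r || typeof r !== "object" || "error" in r) return null;
  if (r.status !== "queued" || typeof r.id !== "string") return null;
  return r as unknown as EnqueueResponse;
}
```

The `runpod.id` in the envelope is what you would hand to the status routes described under RunPod Jobs below.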

Queue an image generation request

POST
/v1/images/generations

Authorization

BearerAuth
Authorization: Bearer <token>

Use Authorization: Bearer <api-key>.

In: header

Request Body

application/json

TypeScript Definitions

Use the request body type in TypeScript.

Response Body

application/json

curl -X POST "https://loading/v1/images/generations" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <api-key>" \
  -d '{
    "prompt": "A moody cyberpunk alley at night with rain reflections, 35mm film look"
  }'
{
  "id": "f3de27f8-61d5-4d58-aad1-a7d63f8a6e0f",
  "object": "image.generation.enqueue",
  "created": 1773504000,
  "status": "queued",
  "surface": "images",
  "model": "Flux1schnell",
  "endpoint_id": "o6r4i5q9j8k7l6",
  "runpod": {
    "id": "f3de27f8-61d5-4d58-aad1-a7d63f8a6e0f",
    "status": "IN_QUEUE"
  }
}
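The image request schema expects `size` as a `WIDTHxHEIGHT` string and `n` between 1 and 8. A hedged TypeScript sketch that validates and builds such a body (the field names follow the request-body table on this page; the helpers themselves are illustrative, and the endpoint is not called here):

```typescript
// Parse a "WIDTHxHEIGHT" size string as described by the request schema.
function parseSize(size: string): { width: number; height: number } | null {
  const m = /^(\d+)x(\d+)$/.exec(size);
  if (!m) return null;
  return { width: Number(m[1]), height: Number(m[2]) };
}

// Build a minimal /v1/images/generations body; n must stay within 1..8
// per the schema, and size is validated before it is attached.
function buildImageRequest(prompt: string, size?: string, n = 1) {
  if (prompt.length < 1) throw new Error("prompt must be non-empty");
  if (n < 1 || n > 8) throw new Error("n must be between 1 and 8");
  if (size !== undefined && parseSize(size) === null) {
    throw new Error("size must be WIDTHxHEIGHT, e.g. 1024x768");
  }
  return { prompt, ...(size ? { size } : {}), n };
}
```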

Queue an audio transcription request

POST
/v1/audio/transcriptions

Authorization

BearerAuth
Authorization: Bearer <token>

Use Authorization: Bearer <api-key>.

In: header

Request Body

application/json

TypeScript Definitions

Use the request body type in TypeScript.

Response Body

application/json

curl -X POST "https://loading/v1/audio/transcriptions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <api-key>" \
  -d '{
    "audioUrl": "https://cdn.example.com/audio/customer-call-2026-03-15.mp3"
  }'
{
  "id": "f3de27f8-61d5-4d58-aad1-a7d63f8a6e0f",
  "object": "audio.transcription.enqueue",
  "created": 1773504000,
  "status": "queued",
  "surface": "transcribe",
  "model": "WhisperLargeV3",
  "endpoint_id": "o6r4i5q9j8k7l6",
  "runpod": {
    "id": "f3de27f8-61d5-4d58-aad1-a7d63f8a6e0f",
    "status": "IN_QUEUE"
  }
}
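Unlike the OpenAI file-upload API, this endpoint takes a publicly reachable `audioUrl` plus an optional BCP-47 `language` hint. A small illustrative builder that enforces those two constraints from the request-body table (nothing here performs the actual request):

```typescript
// Build a /v1/audio/transcriptions body. audioUrl must be a well-formed
// URL (Format: uri in the schema) and language, when given, is a short
// BCP-47 hint such as "en" or "es" (at least 2 characters).
function buildTranscriptionRequest(audioUrl: string, language?: string) {
  new URL(audioUrl); // throws on malformed URLs
  if (language !== undefined && language.length < 2) {
    throw new Error("language hint must be at least 2 characters");
  }
  return { audioUrl, ...(language ? { language } : {}) };
}
```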

Queue an embeddings request

POST
/v1/embeddings

Authorization

BearerAuth
Authorization: Bearer <token>

Use Authorization: Bearer <api-key>.

In: header

Request Body

application/json

TypeScript Definitions

Use the request body type in TypeScript.

Response Body

application/json

curl -X POST "https://loading/v1/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <api-key>" \
  -d '{
    "input": "How to optimize cold starts for serverless GPUs"
  }'
{
  "id": "f3de27f8-61d5-4d58-aad1-a7d63f8a6e0f",
  "object": "embedding.enqueue",
  "created": 1773504000,
  "status": "queued",
  "surface": "embeddings",
  "model": "BGE_Large",
  "endpoint_id": "o6r4i5q9j8k7l6",
  "runpod": {
    "id": "f3de27f8-61d5-4d58-aad1-a7d63f8a6e0f",
    "status": "IN_QUEUE"
  }
}
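Per the request-body table, `input` accepts either a single string or an array of strings. A hedged normalization helper so downstream batching code only ever deals with one shape (an illustrative sketch, not part of any official SDK):

```typescript
// The embeddings endpoint accepts a single string or an array of strings.
// Normalize to an array and reject empty inputs early, before enqueueing.
function normalizeEmbeddingInput(input: string | string[]): string[] {
  const items = Array.isArray(input) ? input : [input];
  if (items.length === 0 || items.some((s) => s.length === 0)) {
    throw new Error("input must contain at least one non-empty string");
  }
  return items;
}
```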
Last updated on 21 March 2026

Jobs

Provider-agnostic async status, download, and websocket streaming routes.

RunPod Jobs

Direct provider operational routes for run, runsync, status, and queue control.
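The enqueue envelope carries `runpod.status` (for example `IN_QUEUE`), and a job is only finished once the provider reports a terminal state via the status routes. A minimal sketch of that classification; the status names beyond `IN_QUEUE` follow RunPod's commonly documented job states and are assumptions here, not values confirmed by this page:

```typescript
// Job states as reported by the provider's status route. IN_QUEUE appears
// in the enqueue examples above; the rest are assumed RunPod job states.
type RunPodStatus =
  | "IN_QUEUE"
  | "IN_PROGRESS"
  | "COMPLETED"
  | "FAILED"
  | "CANCELLED"
  | "TIMED_OUT";

// Terminal states: a polling loop can stop once one of these is reached.
function isTerminal(status: RunPodStatus): boolean {
  return (
    status === "COMPLETED" ||
    status === "FAILED" ||
    status === "CANCELLED" ||
    status === "TIMED_OUT"
  );
}
```

In a polling loop you would call the status route with the `runpod.id` from the enqueue response, sleep between attempts, and exit when `isTerminal` returns true.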

Chat completions request body fields:

model? string
Active chat model slug used for OpenAI-compatible chat routing.
Length: 1 <= length

messages* array
Conversation message list in OpenAI chat format.
Items: 1 <= items

temperature? number
Sampling temperature. Lower values are more deterministic.
Range: 0 <= value <= 2

max_tokens? integer
Maximum number of output tokens.
Range: 1 <= value <= 8192

stream? boolean
Request token streaming from the upstream provider when supported.
Default: false

user? string
Caller-provided user identifier for traceability and abuse monitoring.
Length: 1 <= length

webhook_url? string
Optional HTTPS webhook URL for async status updates.
Format: uri

[key: string]? any
Additional provider-specific properties are passed through.
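The numeric constraints in the chat field table above can be enforced client-side before enqueueing. A hedged TypeScript sketch of such a validator (the field names and ranges come from the table; the validator itself is illustrative):

```typescript
interface ChatMessage { role: string; content: string; }

// Validate a /v1/chat/completions body against the constraints in the
// field table: at least one message, temperature in [0, 2],
// max_tokens in [1, 8192].
function validateChatRequest(body: {
  messages: ChatMessage[];
  temperature?: number;
  max_tokens?: number;
}): void {
  if (body.messages.length < 1) {
    throw new Error("messages must contain at least one item");
  }
  if (body.temperature !== undefined &&
      (body.temperature < 0 || body.temperature > 2)) {
    throw new Error("temperature must be in [0, 2]");
  }
  if (body.max_tokens !== undefined &&
      (body.max_tokens < 1 || body.max_tokens > 8192)) {
    throw new Error("max_tokens must be in [1, 8192]");
  }
}
```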
Image generations request body fields:

model? string
Active image model slug used for OpenAI-compatible image routing.
Value in: "Ben2" | "Flux_2_Klein_4B_BF16" | "Ltx2_3_22B_Dist_INT8" | "RealESRGAN_x4" | "ZImageTurbo_INT8"

prompt* string
Natural-language prompt used for image generation.
Length: 1 <= length

negative_prompt? string
Optional content to discourage in the generated output.
Length: 1 <= length

n? integer
Number of images to generate.
Default: 1
Range: 1 <= value <= 8

size? string
Requested output size in WIDTHxHEIGHT format.
Length: 3 <= length

response_format? string
Preferred image payload format when provider supports it.

webhook_url? string
Optional HTTPS webhook URL for async status updates.
Format: uri

[key: string]? any
Additional provider-specific properties are passed through.
Audio transcriptions request body fields:

model? string
Active transcription model slug used for OpenAI-compatible transcription routing.
Value in: "WhisperLargeV3"

audioUrl* string
Publicly accessible audio URL to transcribe.
Format: uri

language? string
Optional BCP-47 language hint, such as en or es.
Length: 2 <= length

prompt? string
Optional prompt to bias transcription output.
Length: 1 <= length

response_format? string
Preferred output format when provider supports custom transcript renderers.

webhook_url? string
Optional HTTPS webhook URL for async status updates.
Format: uri

[key: string]? any
Additional provider-specific properties are passed through.
Embeddings request body fields:

model? string
Active embeddings model slug used for OpenAI-compatible embedding routing.
Value in: "Bge_M3_INT8"

input* string | array
Text to embed: a single string or an array of strings.

dimensions? integer
Optional embedding dimensionality override when supported by the model.
Range: 1 <= value

encoding_format? string
Vector encoding preference when supported by the provider.

user? string
Caller-provided user identifier for tracing and abuse monitoring.
Length: 1 <= length

webhook_url? string
Optional HTTPS webhook URL for async status updates.
Format: uri

[key: string]? any
Additional provider-specific properties are passed through.