Text-to-Speech (TTS)

Endpoint for requesting text-to-audio (txt2audio) inference.

Text-to-Speech converts text into natural-sounding audio. The endpoint supports three TTS modes via the mode parameter:

custom_voice (default) — Use a preset voice from the model's voice library. Requires the voice parameter.
voice_clone — Clone a voice from a short reference audio clip. Requires the ref_audio parameter (3–10 seconds, max 10 MB). Optionally provide ref_text with a transcript of the reference audio for improved accuracy.
voice_design — Create a new voice from a natural language description. Requires the instruct parameter (e.g. "A warm female voice with a British accent").

NOTE

Prerequisite: Before sending a request, query the Model Selection endpoint to find a valid model slug, check the model's limits and supported features, and confirm which languages and voices it offers.

WARNING

Mode-specific required fields:

custom_voice — voice is required.

voice_clone — ref_audio is required. ref_text is optional but recommended.

voice_design — instruct is required.

If mode is omitted, the API defaults to custom_voice.
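The mode-specific rules above can be checked on the client before a request is sent. The sketch below is a minimal, hypothetical helper (the TtsFields type and missingFieldsFor() are not part of any official SDK); it only encodes the documented required field per mode and the custom_voice default.

```typescript
// Hypothetical client-side check of the mode-specific required fields.
type TtsMode = "custom_voice" | "voice_clone" | "voice_design";

interface TtsFields {
  mode?: TtsMode;
  voice?: string;
  ref_audio?: unknown; // a file/Blob in practice
  ref_text?: string;
  instruct?: string;
}

// The one field each mode requires, as listed in the WARNING.
const REQUIRED_BY_MODE: Record<TtsMode, keyof TtsFields> = {
  custom_voice: "voice",
  voice_clone: "ref_audio",
  voice_design: "instruct",
};

// Returns the names of missing required fields (an empty array means OK).
function missingFieldsFor(fields: TtsFields): string[] {
  const mode: TtsMode = fields.mode ?? "custom_voice"; // documented default
  const required = REQUIRED_BY_MODE[mode];
  const value = fields[required];
  const isMissing = value === undefined || value === null || value === "";
  return isMissing ? [required] : [];
}

const okExample = missingFieldsFor({ voice: "example-voice" }); // []
const badExample = missingFieldsFor({ mode: "voice_design" }); // ["instruct"]
```

ref_text is deliberately not checked: it is optional in voice_clone mode.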

Request Txt2Audio (POST https://api.dryapi.dev/api/v1/client/txt2audio)

Authorization

bearerAuth

Authorization: Bearer <token>

In: header

Header Parameters

Accept (string, required)

Default: "application/json"

Allowed value: "application/json"
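The authorization and header requirements above amount to two headers. This is a minimal sketch assuming the caller supplies the bearer token (for example from an environment variable); buildHeaders() is a hypothetical helper.

```typescript
// Hypothetical helper producing the headers this endpoint documents.
function buildHeaders(token: string): Record<string, string> {
  if (token.trim() === "") {
    throw new Error("A bearer token is required");
  }
  return {
    Authorization: `Bearer ${token}`, // bearerAuth, sent in the header
    Accept: "application/json", // the only allowed Accept value
  };
}
```

Content-Type is intentionally left out: when the request body is a FormData object, fetch sets multipart/form-data together with the correct boundary on its own.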

Request Body

multipart/form-data

Audio generation parameters. Supports three TTS modes: custom_voice (default, preset speakers), voice_clone (clone from reference audio), voice_design (create voice from description).

text (string, required)

Text to convert to speech.

model (string, required)

The model to use for speech generation. Available models can be retrieved via the GET /api/v1/client/models endpoint.

mode (optional)

TTS mode: custom_voice (default), voice_clone, or voice_design. Determines which fields are required.

Allowed values: "custom_voice" | "voice_clone" | "voice_design"

voice (optional)

Name of the voice to use. Required in custom_voice mode.

lang (string, required)

Language used for audio generation (e.g. "en-us").

speed (number, required)

Speech speed of the generated audio (e.g. 1).

format (string, required)

Output audio format (e.g. "flac").

sample_rate (number, required)

Sample rate of the generated audio, in Hz (e.g. 24000).

ref_audio (optional)

Reference audio file for voice cloning. Supported formats: mp3, wav, flac, ogg, m4a. Maximum size: 10 MB. Duration must be between 3 and 10 seconds (model-specific limits may apply). Required in voice_clone mode.

Format: binary
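The ref_audio limits can be partly enforced before upload. This sketch checks only the file extension and size; duration cannot be read without decoding the audio, so it is left to the server. It assumes "10 MB" means 10,000,000 bytes (the stricter reading), and checkRefAudio() is a hypothetical helper.

```typescript
// Documented formats; matched against the file extension only.
const ALLOWED_REF_AUDIO_EXTENSIONS = ["mp3", "wav", "flac", "ogg", "m4a"];
// Assumption: 10 MB read as 10,000,000 bytes, the stricter interpretation.
const MAX_REF_AUDIO_BYTES = 10 * 1000 * 1000;

// Returns a list of problems; an empty list means these checks passed.
// Duration (3 to 10 seconds) is NOT checked here.
function checkRefAudio(fileName: string, sizeBytes: number): string[] {
  const problems: string[] = [];
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  if (!ALLOWED_REF_AUDIO_EXTENSIONS.includes(ext)) {
    problems.push(`unsupported format: "${ext || "(none)"}"`);
  }
  if (sizeBytes > MAX_REF_AUDIO_BYTES) {
    problems.push(`file too large: ${sizeBytes} bytes`);
  }
  return problems;
}
```

An extension check does not prove the content is valid audio; the server remains the authority on format and duration.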

ref_text (optional)

Transcript of the reference audio. Providing it improves voice-cloning accuracy in voice_clone mode.

instruct (optional)

Natural-language voice description for voice_design mode (e.g. "A warm female voice with a British accent"), or style/emotion control in custom_voice mode. Required in voice_design mode.

webhook_url (optional)

HTTPS URL that receives webhook notifications when the job status changes (processing, completed, failed). Maximum 2048 characters. See the Webhook Documentation for the payload structure and authentication details.

Format: uri

Maximum length: 2048
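The two documented webhook_url constraints (HTTPS only, at most 2048 characters) can be checked with the standard URL class. isValidWebhookUrl() is a hypothetical helper, and the server may apply further rules.

```typescript
// Hypothetical check of the documented webhook_url constraints.
function isValidWebhookUrl(raw: string): boolean {
  if (raw.length > 2048) return false; // documented maximum length
  try {
    const url = new URL(raw); // throws on malformed input
    return url.protocol === "https:"; // HTTPS is required
  } catch {
    return false;
  }
}
```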

Response Body

`application/json`

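Example request (assumes the API token is in the API_TOKEN environment variable; replace VOICE_NAME with a voice that the Model Selection endpoint lists for the chosen model, because voice is required in the default custom_voice mode):

curl -X POST "https://api.dryapi.dev/api/v1/client/txt2audio" \
  -H "Authorization: Bearer $API_TOKEN" \
  -H "Accept: application/json" \
  -F text="A beautiful sunset over mountains" \
  -F model="Kokoro" \
  -F voice="VOICE_NAME" \
  -F lang="en-us" \
  -F speed="1" \
  -F format="flac" \
  -F sample_rate="24000"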
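The example curl request above can be sketched in TypeScript with the fetch and FormData globals (Node.js 18+ or a browser). Field values mirror the curl example; VOICE_NAME is a hypothetical placeholder added because custom_voice mode requires a voice, and buildTxt2AudioForm() and requestTxt2Audio() are illustrative helpers, not an official SDK.

```typescript
const TXT2AUDIO_URL = "https://api.dryapi.dev/api/v1/client/txt2audio";

// Builds the multipart/form-data body; every value is sent as a text field.
function buildTxt2AudioForm(fields: Record<string, string | number>): FormData {
  const form = new FormData();
  for (const [name, value] of Object.entries(fields)) {
    form.append(name, String(value));
  }
  return form;
}

// Sends the request and returns the HTTP status with the parsed JSON body.
async function requestTxt2Audio(
  token: string,
  fields: Record<string, string | number>,
): Promise<{ status: number; body: unknown }> {
  const res = await fetch(TXT2AUDIO_URL, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}`, Accept: "application/json" },
    body: buildTxt2AudioForm(fields), // fetch sets the multipart Content-Type
  });
  return { status: res.status, body: await res.json() };
}

// Same fields as the curl example, plus a voice.
const exampleFields = {
  text: "A beautiful sunset over mountains",
  model: "Kokoro",
  lang: "en-us",
  speed: 1,
  format: "flac",
  sample_rate: 24000,
  voice: "VOICE_NAME", // hypothetical placeholder: use a voice listed for the model
};
// requestTxt2Audio(process.env.API_TOKEN ?? "", exampleFields).then(console.log);
```

For voice_clone mode, the reference clip would be appended as a file rather than a string, for example form.append("ref_audio", blob, "reference.wav"), where blob holds the audio bytes.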

200 OK

{
  "data": {
    "request_id": "c08a339c-73e5-4d67-a4d5-231302fbff9a"
  }
}
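A successful response only acknowledges the job and returns its request_id. This sketch reads that ID with a type guard; the shape { data: { request_id: string } } is taken from the 200 example, and the helper names are hypothetical.

```typescript
interface Txt2AudioAccepted {
  data: { request_id: string };
}

// Type guard: true when body has the documented data.request_id string.
function isTxt2AudioAccepted(body: unknown): body is Txt2AudioAccepted {
  if (typeof body !== "object" || body === null) return false;
  const data = (body as { data?: unknown }).data;
  if (typeof data !== "object" || data === null) return false;
  return typeof (data as { request_id?: unknown }).request_id === "string";
}

// Extracts the request_id or throws if the response has an unexpected shape.
function extractRequestId(body: unknown): string {
  if (!isTxt2AudioAccepted(body)) {
    throw new Error("Unexpected response shape: missing data.request_id");
  }
  return body.data.request_id;
}
```

The request_id can then be matched against job status notifications if a webhook_url was supplied.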

401 Unauthorized

{
  "data": {},
  "message": "string",
  "errors": [
    null
  ],
  "statusCode": 0
}

404 Not Found

{
  "data": {},
  "message": "string",
  "errors": [
    null
  ],
  "statusCode": 0
}

422 Unprocessable Entity

{
  "message": "The selected model does not support Text To Image.",
  "errors": {
    "model": [
      "The selected model does not support Text To Image."
    ]
  }
}
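The 422 body maps each invalid field to a list of messages. This sketch flattens it into readable lines; the shape is taken from the 422 example, the other error examples use a different shape, and formatValidationErrors() is a hypothetical helper.

```typescript
interface ValidationErrorBody {
  message: string;
  errors: Record<string, string[]>; // field name -> error messages
}

// Produces one "field: message" line per error message.
function formatValidationErrors(body: ValidationErrorBody): string[] {
  const lines: string[] = [];
  for (const [field, messages] of Object.entries(body.errors)) {
    for (const msg of messages) {
      lines.push(`${field}: ${msg}`);
    }
  }
  return lines;
}
```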

429 Too Many Requests

{
  "message": "Too Many Attempts."
}
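A 429 response means the client should slow down. The sketch below retries with exponential backoff and full jitter. The base delay, cap, and attempt limit are assumptions rather than documented limits, and the helpers are hypothetical. If the API also returns a Retry-After header (not documented here), that value should take precedence.

```typescript
// Delay before retry number `attempt` (0-based): uniform in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 500,
  capMs = 30000,
  rand: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(rand() * ceiling); // full jitter
}

// Calls send() until it returns a non-429 status or maxAttempts is reached.
async function withRetryOn429(
  send: () => Promise<{ status: number; body: unknown }>,
  maxAttempts = 5,
): Promise<{ status: number; body: unknown }> {
  let attempt = 0;
  let result = await send();
  while (result.status === 429 && attempt + 1 < maxAttempts) {
    await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    attempt++;
    result = await send();
  }
  return result;
}
```

Full jitter spreads retries from many clients over the whole window instead of letting them fire in synchronized bursts.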

Last updated on 21 March 2026

Response status codes (all application/json):

200 OK

401 Unauthorized

404 Not Found

422 Unprocessable Entity

429 Too Many Requests
