POST /audio/speech
curl --request POST \
  --url https://api.electronhub.ai/v1/audio/speech \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "tts-1",
  "input": "Hello, world! This is a text-to-speech example.",
  "voice": "alloy",
  "response_format": "mp3",
  "speed": 1,
  "temperature": 1,
  "top_p": 0.5,
  "instructions": "<string>",
  "speaker_transcript": "<string>",
  "cfg_filter_top_k": 32,
  "cfg_scale": 3,
  "speech_rate": 0,
  "pitch_adjustment": 0,
  "emotional_style": "<string>"
}'

Generate speech audio from text using text-to-speech models from several providers, including OpenAI, ElevenLabs, PlayAI, and Microsoft.

Create Speech

POST /audio/speech

Convert text to natural-sounding speech audio.

Request Body

model
string
required

The TTS model to use for speech generation (e.g., “tts-1”, “tts-1-hd”, “elevenlabs”)

Text Input Parameters

input
string

The text to convert to speech (1-4096 characters)
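Text longer than the 4096-character limit has to be sent as several requests. The sketch below shows one way to split it, preferring sentence boundaries. The helper name `splitForTts` is hypothetical, and it measures length with JavaScript's `String.length` (UTF-16 code units), which may not match how the API counts characters.

```javascript
// Limit taken from the `input` parameter description above.
const MAX_INPUT_CHARS = 4096;

// Hypothetical helper: splits text into chunks of at most maxLen characters,
// breaking after sentence-ending punctuation where possible.
function splitForTts(text, maxLen = MAX_INPUT_CHARS) {
  const chunks = [];
  let rest = text.trim();
  while (rest.length > maxLen) {
    const head = rest.slice(0, maxLen);
    // Find the last ., ! or ? that is followed by whitespace inside the head.
    let cut = -1;
    for (const match of head.matchAll(/[.!?]\s/g)) {
      cut = match.index + 1; // keep the punctuation in the current chunk
    }
    // Fall back to the last space, then to a hard cut at maxLen.
    if (cut <= 0) cut = head.lastIndexOf(' ');
    if (cut <= 0) cut = maxLen;
    chunks.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

Each chunk can then be sent as its own request and the resulting audio files played or concatenated in order.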

Voice Parameters

voice
string

Voice name or ID for speech generation

Common Parameters

response_format
string

The audio format for the generated speech (“mp3”, “opus”, “aac”, “flac”, “wav”, “pcm”)

speed
number

The speed of the generated audio (0.25 to 4.0 for most models, default: 1.0)

temperature
number

Temperature for randomness in speech generation (0.0 to 2.0)

Advanced Parameters

instructions
string

GPT-4o Mini TTS: Additional instructions to control voice characteristics

speaker_transcript
string

Dia model: Speaker transcript for enhanced voice control (max 1000 chars)

cfg_filter_top_k
integer

Dia model: CFG filter top k value (15-50)

cfg_scale
integer

Dia model: CFG scale value for generation control (1-5)

speech_rate
integer

Microsoft TTS: Speech rate adjustment (-100 to 100, default: 0)

pitch_adjustment
integer

Microsoft TTS: Pitch adjustment (-100 to 100, default: 0)

emotional_style
string

Microsoft TTS: Emotional style (e.g., “cheerful”, “sad”, “angry”)
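The ranges listed above can be checked on the client before a request is sent. This is a minimal sketch, not the API's own validation: `validateSpeechRequest` is a hypothetical helper, the limits are copied from the parameter descriptions, and individual models may enforce different bounds.

```javascript
// Ranges copied from the parameter descriptions; actual limits may vary by model.
const NUMERIC_RANGES = {
  speed: [0.25, 4.0],
  temperature: [0.0, 2.0],
  cfg_filter_top_k: [15, 50],
  cfg_scale: [1, 5],
  speech_rate: [-100, 100],
  pitch_adjustment: [-100, 100],
};
// Parameters documented with type "integer".
const INTEGER_PARAMS = ['cfg_filter_top_k', 'cfg_scale', 'speech_rate', 'pitch_adjustment'];

// Hypothetical helper: returns a list of problems (empty when none are found).
function validateSpeechRequest(body) {
  const problems = [];
  if (typeof body.model !== 'string' || body.model.length === 0) {
    problems.push('model is required');
  }
  if (typeof body.input === 'string' && (body.input.length < 1 || body.input.length > 4096)) {
    problems.push('input must be 1-4096 characters');
  }
  if (typeof body.speaker_transcript === 'string' && body.speaker_transcript.length > 1000) {
    problems.push('speaker_transcript must be at most 1000 characters');
  }
  for (const [name, [min, max]] of Object.entries(NUMERIC_RANGES)) {
    const value = body[name];
    if (value === undefined) continue; // optional parameter not set
    if (!Number.isFinite(value) || value < min || value > max) {
      problems.push(`${name} must be a number between ${min} and ${max}`);
    } else if (INTEGER_PARAMS.includes(name) && !Number.isInteger(value)) {
      problems.push(`${name} must be an integer`);
    }
  }
  return problems;
}
```

Calling `validateSpeechRequest({ model: 'tts-1', input: 'Hi', speed: 1 })` returns an empty list, so the body can be sent as-is.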

Response

Returns an audio file in the specified format.

Basic Example

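// Run as an ES module (for example speech.mjs) so top-level await is available.
import { writeFile } from 'node:fs/promises';

const response = await fetch('https://api.electronhub.ai/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'tts-1',
    input: 'Hello! Welcome to Electron Hub text-to-speech.',
    voice: 'alloy',
    response_format: 'mp3'
  })
});

// Stop early on a non-2xx status instead of saving an error body as audio.
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

// The response body is the raw audio; write it to disk unchanged.
const audioBuffer = await response.arrayBuffer();
await writeFile('speech.mp3', Buffer.from(audioBuffer));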

Provider-Specific Examples

OpenAI Models (TTS-1, TTS-1 HD, GPT-4o Mini TTS)

{
  "model": "tts-1",
  "input": "Hello, world!",
  "voice": "alloy",
  "speed": 1.0
}

ElevenLabs Models

{
  "model": "elevenlabs",
  "input": "Hello, this is ElevenLabs TTS.",
  "voice": "Will (US male)"
}

Kokoro 82M Model

{
  "model": "kokoro-82m",
  "input": "Hello from Kokoro TTS!",
  "voice": "af_alloy",
  "speed": 1.2
}

NariLabs Dia Model (Advanced)

{
  "model": "dia-1.6b",
  "input": "Hello, this is a conversational speech example.",
  "speaker_transcript": "Speaking in a calm, professional tone",
  "cfg_scale": 3,
  "cfg_filter_top_k": 25,
  "temperature": 1.2,
  "speed": 0.9
}

MeloTTS Multilingual

{
  "model": "melotts",
  "input": "Bonjour, comment allez-vous?",
  "voice": "fr"
}

PlayAI Dialog Models

{
  "model": "playai-tts",
  "input": "This is a conversational AI speaking.",
  "voice": "Celeste-PlayAI",
  "temperature": 0.8
}

Microsoft TTS

{
  "model": "microsoft-tts",
  "input": "Hello, this is Microsoft Azure Text-to-Speech.",
  "voice": "en-US-JennyNeural"
}

Available Models

OpenAI Models

TTS-1 (tts-1)

  • Optimized for real-time text-to-speech
  • Cost-effective for most applications
  • 11 available voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

TTS-1 HD (tts-1-hd)

  • High-quality text-to-speech
  • Best audio quality with slower generation
  • Same 11 voices as TTS-1

GPT-4o Mini TTS (gpt-4o-mini-tts)

  • Advanced TTS with instruction following
  • Supports voice control via instructions
  • Same 11 voices as TTS-1

Premium Models

ElevenLabs (elevenlabs)

  • Ultra-realistic human-like voices
  • 40+ multilingual voices
  • Supports English, Spanish, French, German, Arabic, Chinese, Hindi, Polish

PlayAI Dialog (playai-tts)

  • Specialized for conversational content
  • 27 expressive voices
  • Optimized for dialogue and storytelling

Specialized Models

Kokoro 82M (kokoro-82m)

  • Lightweight but high-quality
  • 80+ voices in multiple languages
  • Open-source Apache-licensed model

Microsoft TTS (microsoft-tts)

  • Enterprise-grade quality
  • 100+ neural voices
  • Extensive language support

Voice Examples

OpenAI Voices

// Sample of the voices available for OpenAI models (the full list of 11 appears under Available Models)
const openaiVoices = [
  'alloy',    // Neutral, balanced
  'echo',     // Clear, professional
  'fable',    // Warm, storytelling
  'onyx',     // Deep, authoritative
  'nova',     // Bright, energetic
  'shimmer'   // Gentle, soothing
];

ElevenLabs Voices

// Sample of ElevenLabs voices by language
const elevenlabsVoices = {
  english: ['Will (US male)', 'Jessica (US female)', 'George (UK male)', 'Lily (UK female)'],
  spanish: ['Juan (Spanish male)', 'Gabriela (Spanish female)'],
  french: ['Guillaume (French male)', 'Darine (French female)'],
  german: ['Kurt (German male)', 'Leonie (German female)']
};

Use with the voice parameter and text input:
{
  "model": "elevenlabs",
  "input": "Hello world",
  "voice": "Will (US male)"
}

Microsoft TTS Voices

// Sample of Microsoft Azure Neural voices
const microsoftVoices = {
  english: [
    'en-US-JennyNeural',    // Friendly female
    'en-US-GuyNeural',      // Casual male
    'en-US-AriaNeural',     // News anchor style
    'en-US-DavisNeural',    // Professional male
    'en-GB-SoniaNeural',    // British female
    'en-AU-NatashaNeural'   // Australian female
  ],
  chinese: [
    'zh-CN-XiaoxiaoNeural',  // Standard female
    'zh-CN-YunyangNeural',   // Professional male
    'zh-HK-HiuMaanNeural',   // Hong Kong Cantonese
    'zh-TW-HsiaoChenNeural'  // Taiwan Mandarin
  ],
  multilingual: [
    'es-ES-ElviraNeural',    // Spanish
    'fr-FR-DeniseNeural',    // French
    'de-DE-KatjaNeural',     // German
    'ja-JP-NanamiNeural',    // Japanese
    'ko-KR-SunHiNeural'      // Korean
  ]
};

Use with text input and advanced controls:
{
  "model": "microsoft-tts",
  "input": "Hello world",
  "voice": "en-US-AriaNeural",
  "emotional_style": "cheerful",
  "speech_rate": 10
}

Model-Specific Parameters

Provider        | Special Parameters                              | Usage
GPT-4o Mini TTS | instructions                                    | Natural language voice control
Dia 1.6B        | speaker_transcript, cfg_scale, cfg_filter_top_k | Advanced voice conditioning
Microsoft TTS   | speech_rate, pitch_adjustment, emotional_style  | Voice modulation and emotions
MeloTTS         | lang                                            | Language selection (en, fr, es, etc.)
All Models      | speed, temperature, top_p                       | Common generation controls
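When one code path serves several models, it can help to drop parameters that the selected model does not list. The sketch below encodes the table as a lookup. `pickSupportedParams` is a hypothetical helper, the model IDs are taken from the examples on this page, and whether the API ignores or rejects unsupported parameters is not stated here, so treat the filtering as a precaution.

```javascript
// Fields every request may carry, per the "All Models" row plus the core fields.
const COMMON_PARAMS = ['model', 'input', 'voice', 'response_format', 'speed', 'temperature', 'top_p'];

// Model-specific parameters, keyed by the model IDs used in this page's examples.
const MODEL_PARAMS = {
  'gpt-4o-mini-tts': ['instructions'],
  'dia-1.6b': ['speaker_transcript', 'cfg_scale', 'cfg_filter_top_k'],
  'microsoft-tts': ['speech_rate', 'pitch_adjustment', 'emotional_style'],
  'melotts': ['lang'],
};

// Hypothetical helper: splits a request body into supported and dropped fields.
function pickSupportedParams(body) {
  const allowed = new Set([...COMMON_PARAMS, ...(MODEL_PARAMS[body.model] ?? [])]);
  const kept = {};
  const dropped = [];
  for (const [key, value] of Object.entries(body)) {
    if (allowed.has(key)) {
      kept[key] = value;
    } else {
      dropped.push(key);
    }
  }
  return { kept, dropped };
}
```

Logging the `dropped` list makes it easier to notice when a parameter was intended for a different model.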

Advanced Features

Speed Control

Adjust playback speed for different use cases:

{
  "model": "tts-1",
  "input": "This text will be spoken faster.",
  "voice": "alloy",
  "speed": 1.5
}

Audio Formats

Choose the optimal format for your application:

  • MP3: Standard, widely compatible
  • WAV: Uncompressed, highest quality
  • OGG/Opus: Efficient compression
  • FLAC: Lossless compression
  • AAC: Good balance of quality and size
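The chosen `response_format` should also decide the name of the saved file. The following sketch pairs each documented format with a file extension and uses it when saving a response. The helper names are hypothetical and the extensions are conventional choices, not something the API prescribes; `pcm` output is assumed to be raw samples without a header.

```javascript
// Conventional file extensions for the documented response_format values.
const FORMAT_EXTENSIONS = new Map([
  ['mp3', 'mp3'],
  ['opus', 'opus'],
  ['aac', 'aac'],
  ['flac', 'flac'],
  ['wav', 'wav'],
  ['pcm', 'pcm'],
]);

// Hypothetical helper: builds an output file name for a given format.
function outputFileName(baseName, responseFormat) {
  const ext = FORMAT_EXTENSIONS.get(responseFormat);
  if (ext === undefined) {
    throw new RangeError(`Unsupported response_format: ${responseFormat}`);
  }
  return `${baseName}.${ext}`;
}

// Hypothetical helper: requests speech and writes it to a matching file.
// Requires Node.js 18+ for the global fetch; body.response_format must be set.
async function saveSpeech(body, apiKey, baseName) {
  const fileName = outputFileName(baseName, body.response_format); // fail before any network call
  const response = await fetch('https://api.electronhub.ai/v1/audio/speech', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  if (!response.ok) {
    throw new Error(`TTS request failed with status ${response.status}`);
  }
  const { writeFile } = await import('node:fs/promises');
  await writeFile(fileName, Buffer.from(await response.arrayBuffer()));
  return fileName;
}
```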

Instruction-Based Control (GPT-4o Mini TTS)

Control voice characteristics with natural language:

{
  "model": "gpt-4o-mini-tts",
  "input": "Welcome to our store!",
  "voice": "alloy",
  "instructions": "Speak with enthusiasm and excitement, like a friendly shopkeeper"
}

Microsoft TTS Advanced Controls

Microsoft TTS offers fine-grained control over speech characteristics:

Speech Rate Control

{
  "model": "microsoft-tts",
  "input": "This text will be spoken faster.",
  "voice": "en-US-JennyNeural",
  "speech_rate": 50
}

Pitch Adjustment

{
  "model": "microsoft-tts",
  "input": "This text has a higher pitch.",
  "voice": "en-US-AriaNeural",
  "pitch_adjustment": 25
}

Emotional Styles

Different voices support different emotional styles:

{
  "model": "microsoft-tts",
  "input": "I'm so excited about this news!",
  "voice": "en-US-AriaNeural",
  "emotional_style": "cheerful",
  "speech_rate": 10
}

Available Emotional Styles (voice-dependent):

  • cheerful - Happy and upbeat
  • sad - Melancholic tone
  • angry - Frustrated or upset
  • fearful - Nervous or scared
  • calm - Relaxed and peaceful
  • gentle - Soft and caring
  • newscast - Professional news anchor
  • customerservice - Helpful and polite
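A misspelled style name is easy to miss, so a request builder can check it against the list above before sending. This is a sketch under the assumption that the list is complete; `buildMicrosoftRequest` is a hypothetical helper, and a style accepted here may still be unsupported by a particular voice.

```javascript
// Style names copied from the list above.
const EMOTIONAL_STYLES = new Set([
  'cheerful', 'sad', 'angry', 'fearful',
  'calm', 'gentle', 'newscast', 'customerservice',
]);

// Hypothetical helper: builds a microsoft-tts request body.
function buildMicrosoftRequest({ input, voice, emotionalStyle, speechRate = 0, pitchAdjustment = 0 }) {
  if (emotionalStyle !== undefined && !EMOTIONAL_STYLES.has(emotionalStyle)) {
    throw new RangeError(`Unknown emotional_style: ${emotionalStyle}`);
  }
  const body = {
    model: 'microsoft-tts',
    input,
    voice,
    speech_rate: speechRate,         // documented default: 0
    pitch_adjustment: pitchAdjustment, // documented default: 0
  };
  // Only send emotional_style when one was requested.
  if (emotionalStyle !== undefined) {
    body.emotional_style = emotionalStyle;
  }
  return body;
}
```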

Best Practices

Text Optimization

  • Use clear punctuation for natural pauses
  • Spell out numbers and abbreviations
  • Use SSML tags for fine-grained control (model-dependent)

Voice Selection

  • Customer Service: Professional voices (echo, George)
  • Storytelling: Warm voices (fable, nova)
  • Educational: Clear voices (alloy, shimmer)
  • Gaming: Character voices (onyx, sage)

Performance Tips

  • Cache generated audio when possible
  • Use appropriate audio formats for your platform
  • Consider real-time vs. high-quality models based on use case
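Caching can be as simple as keying stored audio by the request parameters. The sketch below wraps any async function that turns a request body into audio with an in-memory cache; `cacheKey` and `withAudioCache` are hypothetical names, and the request body is assumed to be a flat object. Because `temperature` adds randomness, a cached result is one sample of the possible outputs rather than the only one.

```javascript
// Builds a stable key: the same parameters in any order give the same key.
function cacheKey(body) {
  const sorted = Object.keys(body).sort().map((key) => [key, body[key]]);
  return JSON.stringify(sorted);
}

// Wraps synthesize(body) => Promise<audio> with an in-memory cache.
function withAudioCache(synthesize, cache = new Map()) {
  return function cachedSynthesize(body) {
    const key = cacheKey(body);
    if (!cache.has(key)) {
      // Store the promise so concurrent identical requests share one call.
      const pending = Promise.resolve().then(() => synthesize(body));
      // Remove failed requests so they can be retried later.
      pending.catch(() => cache.delete(key));
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

For long-running services, a persistent store or a size-limited cache would be a better fit than an unbounded `Map`.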

Error Handling

Common error scenarios:

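// Example input; replace with your own text.
const text = 'Hello, world!';

try {
  const response = await fetch('https://api.electronhub.ai/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'tts-1',
      input: text,
      voice: 'alloy'
    })
  });

  if (!response.ok) {
    // Non-2xx status: the API rejected the request.
    const error = await response.json();
    console.error('TTS Error:', response.status, error);
  }
} catch (error) {
  // fetch rejects when the request cannot be completed (for example, a network failure).
  console.error('Network Error:', error);
}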
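An error body is not guaranteed to be JSON (a proxy or gateway may return plain text or HTML), and some failures are worth retrying while others are not. The sketch below handles both cases. The helper names are hypothetical, the `message` wrapper is an illustrative shape rather than the API's documented error format, and the retry classification follows general HTTP conventions (429 and 5xx), not behavior stated on this page.

```javascript
// Parses an error body, falling back to the raw text when it is not JSON.
function parseErrorBody(rawText) {
  try {
    return JSON.parse(rawText);
  } catch {
    return { message: rawText };
  }
}

// Rate limiting (429) and server errors (5xx) are usually transient.
function isRetryableStatus(status) {
  return status === 429 || (status >= 500 && status <= 599);
}

// Summarizes a failed fetch Response for logging or retry decisions.
async function describeFailure(response) {
  const details = parseErrorBody(await response.text());
  return {
    status: response.status,
    retryable: isRetryableStatus(response.status),
    details,
  };
}
```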

Use Cases

  • Voice Assistants: Natural conversation interfaces
  • Audiobooks: Long-form content narration
  • E-learning: Educational content delivery
  • Accessibility: Screen reader alternatives
  • Gaming: Character voice generation
  • Customer Service: Automated phone systems
  • Content Creation: Podcast and video narration

Authorizations

Authorization
string
header
required

Your API key (starts with 'ek-'), sent as a Bearer token: Authorization: Bearer <token>
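A small helper can build the request headers and catch an obviously wrong key before any request is made. This is a sketch: `authHeaders` is a hypothetical name, and the prefix check relies only on the 'ek-' prefix mentioned above.

```javascript
// Hypothetical helper: builds the headers used by the examples on this page.
function authHeaders(apiKey) {
  if (typeof apiKey !== 'string' || !apiKey.startsWith('ek-')) {
    throw new Error("API key is expected to start with 'ek-'");
  }
  return {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  };
}
```

The result can be passed directly as the `headers` option of `fetch`.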

Body

application/json

Response

200
audio/mpeg

Audio file

The response is of type file.