Generate high-quality speech audio from text using our collection of state-of-the-art text-to-speech models from leading AI providers.
Create Speech
POST /audio/speech
Convert text to natural-sounding speech audio.
Request Body
model : The TTS model to use for speech generation (e.g., "tts-1", "tts-1-hd", "elevenlabs")
Text Input Parameters
input : The text to convert to speech (1-4096 characters)
Voice Parameters
voice : Voice name or ID for speech generation
Common Parameters
response_format : The audio format for the generated speech ("mp3", "opus", "aac", "flac", "wav", "pcm")
speed : The speed of the generated audio (0.25 to 4.0 for most models, default: 1.0)
temperature : Temperature for randomness in speech generation (0.0 to 2.0)
Advanced Parameters
instructions : GPT-4o Mini TTS only. Additional instructions to control voice characteristics
speaker_transcript : Dia model only. Speaker transcript for enhanced voice control (max 1000 chars)
cfg_filter_top_k : Dia model only. CFG filter top k value (15-50)
cfg_scale : Dia model only. CFG scale value for generation control (1-5)
speech_rate : Microsoft TTS only. Speech rate adjustment (-100 to 100, default: 0)
pitch_adjustment : Microsoft TTS only. Pitch adjustment (-100 to 100, default: 0)
emotional_style : Microsoft TTS only. Emotional style (e.g., "cheerful", "sad", "angry")
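The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `validateSpeechRequest` is a hypothetical helper, and it assumes the field names used in the examples on this page (`input`, `speed`, `temperature`). The speed range is documented as applying to "most models", so treat that check as a default rather than a guarantee.

```javascript
// Hypothetical client-side check based on the ranges documented above.
// Returns a list of problems; an empty list means the body looks valid.
function validateSpeechRequest(body) {
  const problems = [];
  if (typeof body.input !== 'string' || body.input.length < 1 || body.input.length > 4096) {
    problems.push('input must be 1-4096 characters');
  }
  // Range documented for most models; individual models may differ.
  if (body.speed !== undefined && (body.speed < 0.25 || body.speed > 4.0)) {
    problems.push('speed must be between 0.25 and 4.0');
  }
  if (body.temperature !== undefined && (body.temperature < 0 || body.temperature > 2)) {
    problems.push('temperature must be between 0.0 and 2.0');
  }
  return problems;
}
```

Returning a list instead of throwing lets the caller report every problem at once.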
Response
Returns an audio file in the specified format.
Basic Example
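const fs = require('fs');

async function main() {
  const response = await fetch('https://api.electronhub.ai/v1/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'tts-1',
      input: 'Hello! Welcome to Electron Hub text-to-speech.',
      voice: 'alloy',
      response_format: 'mp3'
    })
  });

  // Stop here on a failed request so an error body is not saved as audio.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }

  // The response body is the raw audio; write it to disk unchanged.
  const audioBuffer = await response.arrayBuffer();
  fs.writeFileSync('speech.mp3', Buffer.from(audioBuffer));
}

main().catch(console.error);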
Provider-Specific Examples
OpenAI Models (TTS-1, TTS-1 HD, GPT-4o Mini TTS)
{
  "model": "tts-1",
  "input": "Hello, world!",
  "voice": "alloy",
  "speed": 1.0
}
ElevenLabs Models
{
  "model": "elevenlabs",
  "input": "Hello, this is ElevenLabs TTS.",
  "voice": "Will (US male)"
}
Kokoro 82M Model
{
  "model": "kokoro-82m",
  "input": "Hello from Kokoro TTS!",
  "voice": "af_alloy",
  "speed": 1.2
}
NariLabs Dia Model (Advanced)
{
  "model": "dia-1.6b",
  "input": "Hello, this is a conversational speech example.",
  "speaker_transcript": "Speaking in a calm, professional tone",
  "cfg_scale": 3,
  "cfg_filter_top_k": 25,
  "temperature": 1.2,
  "speed": 0.9
}
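The Dia-specific fields have documented limits (`speaker_transcript` up to 1000 characters, `cfg_filter_top_k` 15-50, `cfg_scale` 1-5). As a rough sketch, a client could clamp values into those ranges before sending. `buildDiaRequest` and its option names are hypothetical, and rounding `cfg_filter_top_k` to an integer is an assumption about the expected type.

```javascript
// Hypothetical helper: keeps a value inside [min, max].
function clampToRange(value, min, max) {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical helper: builds a Dia request body, clamping the
// Dia-specific values to the ranges documented on this page.
function buildDiaRequest(input, options = {}) {
  const body = { model: 'dia-1.6b', input };
  if (options.speakerTranscript !== undefined) {
    // speaker_transcript is documented as at most 1000 characters.
    body.speaker_transcript = options.speakerTranscript.slice(0, 1000);
  }
  if (options.cfgScale !== undefined) {
    body.cfg_scale = clampToRange(options.cfgScale, 1, 5);
  }
  if (options.cfgFilterTopK !== undefined) {
    // Assumption: top-k is an integer, so round before clamping.
    body.cfg_filter_top_k = clampToRange(Math.round(options.cfgFilterTopK), 15, 50);
  }
  return body;
}
```

Silently clamping is convenient for sliders and other UI input; for server-side code, rejecting out-of-range values may be the safer choice.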
MeloTTS Multilingual
{
  "model": "melotts",
  "input": "Bonjour, comment allez-vous?",
  "voice": "fr"
}
PlayAI Dialog Models
{
  "model": "playai-tts",
  "input": "This is a conversational AI speaking.",
  "voice": "Celeste-PlayAI",
  "temperature": 0.8
}
Microsoft TTS
{
  "model": "microsoft-tts",
  "input": "Hello, this is Microsoft Azure Text-to-Speech.",
  "voice": "en-US-JennyNeural"
}
Available Models
OpenAI Models
TTS-1 (tts-1)
Optimized for real-time text-to-speech
Cost-effective for most applications
11 available voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
TTS-1 HD (tts-1-hd)
High-quality text-to-speech
Best audio quality with slower generation
Same 11 voices as TTS-1
GPT-4o Mini TTS (gpt-4o-mini-tts)
Advanced TTS with instruction following
Supports voice control via instructions
Same 11 voices as TTS-1
Premium Models
ElevenLabs (elevenlabs)
Ultra-realistic human-like voices
40+ multilingual voices
Supports English, Spanish, French, German, Arabic, Chinese, Hindi, Polish
PlayAI Dialog (playai-tts)
Specialized for conversational content
27 expressive voices
Optimized for dialogue and storytelling
Specialized Models
Kokoro 82M (kokoro-82m)
Lightweight but high-quality
80+ voices in multiple languages
Open-source Apache-licensed model
Microsoft TTS (microsoft-tts)
Enterprise-grade quality
100+ neural voices
Extensive language support
Voice Examples
OpenAI Voices
// Sample of the 11 voices available for OpenAI models
const openaiVoices = [
  'alloy',   // Neutral, balanced
  'echo',    // Clear, professional
  'fable',   // Warm, storytelling
  'onyx',    // Deep, authoritative
  'nova',    // Bright, energetic
  'shimmer'  // Gentle, soothing
];
ElevenLabs Voices
// Sample of ElevenLabs voices by language
const elevenlabsVoices = {
  english: ['Will (US male)', 'Jessica (US female)', 'George (UK male)', 'Lily (UK female)'],
  spanish: ['Juan (Spanish male)', 'Gabriela (Spanish female)'],
  french: ['Guillaume (French male)', 'Darine (French female)'],
  german: ['Kurt (German male)', 'Leonie (German female)']
};
Use with the voice parameter and text input:
{
  "model": "elevenlabs",
  "input": "Hello world",
  "voice": "Will (US male)"
}
Microsoft TTS Voices
// Sample of Microsoft Azure Neural voices
const microsoftVoices = {
  english: [
    'en-US-JennyNeural',    // Friendly female
    'en-US-GuyNeural',      // Casual male
    'en-US-AriaNeural',     // News anchor style
    'en-US-DavisNeural',    // Professional male
    'en-GB-SoniaNeural',    // British female
    'en-AU-NatashaNeural'   // Australian female
  ],
  chinese: [
    'zh-CN-XiaoxiaoNeural', // Standard female
    'zh-CN-YunyangNeural',  // Professional male
    'zh-HK-HiuMaanNeural',  // Hong Kong Cantonese
    'zh-TW-HsiaoChenNeural' // Taiwan Mandarin
  ],
  multilingual: [
    'es-ES-ElviraNeural',   // Spanish
    'fr-FR-DeniseNeural',   // French
    'de-DE-KatjaNeural',    // German
    'ja-JP-NanamiNeural',   // Japanese
    'ko-KR-SunHiNeural'     // Korean
  ]
};
Use with text input and advanced controls:
{
  "model": "microsoft-tts",
  "input": "Hello world",
  "voice": "en-US-AriaNeural",
  "emotional_style": "cheerful",
  "speech_rate": 10
}
Model-Specific Parameters
Provider | Special Parameters | Usage
GPT-4o Mini TTS | instructions | Natural language voice control
Dia 1.6B | speaker_transcript, cfg_scale, cfg_filter_top_k | Advanced voice conditioning
Microsoft TTS | speech_rate, pitch_adjustment, emotional_style | Voice modulation and emotions
MeloTTS | lang | Language selection (en, fr, es, etc.)
All Models | speed, temperature, top_p | Common generation controls
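One way to use this table in client code is to drop fields the selected model is not documented to accept. The sketch below is illustrative only: `pickSupportedParams` is a hypothetical helper, the model IDs are taken from the examples on this page, and whether the API ignores or rejects unsupported fields is not stated here.

```javascript
// Fields every request can carry, per the table and examples on this page.
const COMMON_PARAMS = ['model', 'input', 'voice', 'response_format', 'speed', 'temperature', 'top_p'];

// Model-specific fields, transcribed from the table above.
const MODEL_PARAMS = {
  'gpt-4o-mini-tts': ['instructions'],
  'dia-1.6b': ['speaker_transcript', 'cfg_scale', 'cfg_filter_top_k'],
  'microsoft-tts': ['speech_rate', 'pitch_adjustment', 'emotional_style'],
  'melotts': ['lang'],
};

// Hypothetical helper: returns a copy of the body containing only the
// fields documented for body.model.
function pickSupportedParams(body) {
  const allowed = new Set([...COMMON_PARAMS, ...(MODEL_PARAMS[body.model] || [])]);
  return Object.fromEntries(Object.entries(body).filter(([key]) => allowed.has(key)));
}
```

For example, passing a body with `model: 'tts-1'` and `speech_rate: 10` returns the body without `speech_rate`, while the same field is kept for `microsoft-tts`.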
Advanced Features
Speed Control
Adjust playback speed for different use cases:
{
  "model": "tts-1",
  "input": "This text will be spoken faster.",
  "voice": "alloy",
  "speed": 1.5
}
Audio Formats
Choose the optimal format for your application:
MP3 : Standard, widely compatible
WAV : Uncompressed, highest quality
OGG/Opus : Efficient compression
FLAC : Lossless compression
AAC : Good balance of quality and size
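When saving or serving the audio, it helps to derive the file extension and MIME type from the requested `response_format`. The mapping below is a sketch: the format names come from this page, but the MIME types are common conventions and an assumption here, and `outputFileName` is a hypothetical helper.

```javascript
// Maps each documented response_format to a file extension and MIME type.
// Assumption: these MIME types are conventional values, not taken from the
// API; check the Content-Type header of the real response if it matters.
const FORMAT_INFO = {
  mp3: { extension: '.mp3', mimeType: 'audio/mpeg' },
  opus: { extension: '.opus', mimeType: 'audio/ogg' },
  aac: { extension: '.aac', mimeType: 'audio/aac' },
  flac: { extension: '.flac', mimeType: 'audio/flac' },
  wav: { extension: '.wav', mimeType: 'audio/wav' },
  pcm: { extension: '.pcm', mimeType: 'application/octet-stream' },
};

// Hypothetical helper: builds an output file name for a given format.
function outputFileName(baseName, responseFormat) {
  const info = FORMAT_INFO[responseFormat];
  if (!info) {
    throw new Error(`Unsupported response_format: ${responseFormat}`);
  }
  return baseName + info.extension;
}
```

Failing early on an unknown format avoids writing audio under a misleading extension.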
Instruction-Based Control (GPT-4o Mini TTS)
Control voice characteristics with natural language:
{
  "model": "gpt-4o-mini-tts",
  "input": "Welcome to our store!",
  "voice": "alloy",
  "instructions": "Speak with enthusiasm and excitement, like a friendly shopkeeper"
}
Microsoft TTS Advanced Controls
Microsoft TTS offers fine-grained control over speech characteristics:
Speech Rate Control
{
  "model": "microsoft-tts",
  "input": "This text will be spoken faster.",
  "voice": "en-US-JennyNeural",
  "speech_rate": 50
}
Pitch Adjustment
{
  "model": "microsoft-tts",
  "input": "This text has a higher pitch.",
  "voice": "en-US-AriaNeural",
  "pitch_adjustment": 25
}
Emotional Styles
Different voices support different emotional styles:
{
  "model": "microsoft-tts",
  "input": "I'm so excited about this news!",
  "voice": "en-US-AriaNeural",
  "emotional_style": "cheerful",
  "speech_rate": 10
}
Available Emotional Styles (voice-dependent):
cheerful - Happy and upbeat
sad - Melancholic tone
angry - Frustrated or upset
fearful - Nervous or scared
calm - Relaxed and peaceful
gentle - Soft and caring
newscast - Professional news anchor
customerservice - Helpful and polite
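The three Microsoft controls can be combined in one request body. The sketch below assumes the style list above is complete, which this page does not guarantee, and `buildMicrosoftRequest` with its option names is hypothetical. Rate and pitch are clamped to the documented -100 to 100 range.

```javascript
// Styles listed above; actual support depends on the chosen voice.
const EMOTIONAL_STYLES = [
  'cheerful', 'sad', 'angry', 'fearful',
  'calm', 'gentle', 'newscast', 'customerservice',
];

// Hypothetical helper: builds a Microsoft TTS request body.
function buildMicrosoftRequest(input, voice, options = {}) {
  const body = { model: 'microsoft-tts', input, voice };
  // speech_rate and pitch_adjustment are documented as -100 to 100.
  const clampPercent = (value) => Math.min(100, Math.max(-100, value));
  if (options.speechRate !== undefined) {
    body.speech_rate = clampPercent(options.speechRate);
  }
  if (options.pitchAdjustment !== undefined) {
    body.pitch_adjustment = clampPercent(options.pitchAdjustment);
  }
  if (options.emotionalStyle !== undefined) {
    // Assumption: styles outside the list above are rejected client-side.
    if (!EMOTIONAL_STYLES.includes(options.emotionalStyle)) {
      throw new Error(`Unknown emotional_style: ${options.emotionalStyle}`);
    }
    body.emotional_style = options.emotionalStyle;
  }
  return body;
}
```

A call such as `buildMicrosoftRequest('Great news!', 'en-US-AriaNeural', { emotionalStyle: 'cheerful', speechRate: 10 })` produces a body equivalent to the emotional style example above.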
Best Practices
Text Optimization
Use clear punctuation for natural pauses
Spell out numbers and abbreviations
Use SSML tags for fine-grained control (model-dependent)
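Because `input` is limited to 4096 characters, longer texts such as book chapters need to be split into several requests. The following is one possible approach, not an official utility: `splitForSpeech` is a hypothetical helper that prefers to break after sentence-ending punctuation, so pauses stay natural. It counts JavaScript string length (UTF-16 code units), which may differ from how the API counts characters.

```javascript
// Splits text into chunks of at most maxLength characters, preferring to
// break after sentence-ending punctuation (. ! ?). A single sentence longer
// than maxLength is hard-split. Joining the chunks restores the input.
function splitForSpeech(text, maxLength = 4096) {
  // Each match is one sentence with its trailing punctuation and whitespace,
  // or a final fragment without closing punctuation.
  const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
  const chunks = [];
  let current = '';
  for (let sentence of sentences) {
    while (sentence.length > maxLength) {
      // Flush what we have, then cut the oversized sentence into pieces.
      if (current) {
        chunks.push(current);
        current = '';
      }
      chunks.push(sentence.slice(0, maxLength));
      sentence = sentence.slice(maxLength);
    }
    if (current.length + sentence.length > maxLength) {
      chunks.push(current);
      current = '';
    }
    current += sentence;
  }
  if (current) {
    chunks.push(current);
  }
  return chunks;
}
```

Each chunk can then be sent as its own request and the resulting audio files played or concatenated in order.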
Voice Selection
Customer Service : Professional voices (echo, George)
Storytelling : Warm voices (fable, nova)
Educational : Clear voices (alloy, shimmer)
Gaming : Character voices (onyx, sage)
Performance
Cache generated audio when possible
Use appropriate audio formats for your platform
Consider real-time vs. high-quality models based on use case
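Caching can be as simple as keying the generated audio by the request parameters. The sketch below is an in-memory illustration under stated assumptions: `cacheKey` and `createSpeechCache` are hypothetical names, and `synthesize` stands for any async function that sends the request (for example, a wrapper around the fetch call in the Basic Example) and resolves to the audio bytes.

```javascript
// Builds a stable cache key: the same parameters in any key order
// produce the same string.
function cacheKey(body) {
  const sorted = Object.keys(body).sort().map((key) => [key, body[key]]);
  return JSON.stringify(sorted);
}

// Hypothetical in-memory cache around a caller-supplied synthesize function.
function createSpeechCache(synthesize) {
  const cache = new Map();
  return async function getSpeech(body) {
    const key = cacheKey(body);
    if (!cache.has(key)) {
      // Only successful results are stored; a rejected call is retried
      // the next time the same body is requested.
      cache.set(key, await synthesize(body));
    }
    return cache.get(key);
  };
}
```

This keeps everything in process memory with no eviction, so a long-running service would likely want a size limit or a persistent store instead. Note that requests with a non-zero `temperature` may be intentionally non-deterministic, in which case caching changes the behavior.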
Error Handling
Common error scenarios:
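async function synthesize(text) {
  try {
    const response = await fetch('https://api.electronhub.ai/v1/audio/speech', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        model: 'tts-1',
        input: text,
        voice: 'alloy'
      })
    });
    if (!response.ok) {
      // Non-2xx status: log the error body returned by the API.
      const error = await response.json();
      console.error('TTS Error:', error);
      return null;
    }
    return await response.arrayBuffer();
  } catch (error) {
    // fetch rejects on network-level failures (DNS, dropped connection).
    console.error('Network Error:', error);
    return null;
  }
}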
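Some failures are transient and may succeed on a second attempt. This page does not document the API's error codes, so the sketch below rests on a general HTTP convention: it assumes 429 and 5xx responses are worth retrying and everything else is not. All function names are hypothetical, and `send` is any async function that performs the request and resolves to a fetch `Response`.

```javascript
// Assumption: HTTP 429 and 5xx are transient; other statuses are not.
function isRetryable(status) {
  return status === 429 || (status >= 500 && status < 600);
}

// Exponential backoff: 500 ms, 1000 ms, 2000 ms, ... capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Calls send() up to maxAttempts times, waiting between retryable failures.
// Throws on a non-retryable status or once the attempts are used up.
async function requestWithRetry(send, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const response = await send();
    if (response.ok) {
      return response;
    }
    const lastAttempt = attempt === maxAttempts - 1;
    if (!isRetryable(response.status) || lastAttempt) {
      throw new Error(`TTS request failed with status ${response.status}`);
    }
    await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
  }
}
```

Passing the request as a function keeps the retry logic independent of the model, voice, and other parameters being sent.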
Use Cases
Voice Assistants : Natural conversation interfaces
Audiobooks : Long-form content narration
E-learning : Educational content delivery
Accessibility : Screen reader alternatives
Gaming : Character voice generation
Customer Service : Automated phone systems
Content Creation : Podcast and video narration
Authorization
Authorization : API key sent as a Bearer token (starts with 'ek-')
Request Body
model : The TTS model to use for speech generation
input : The text to convert to speech (OpenAI models). Required string length: 1 - 4096. Example: "Hello, world! This is a text-to-speech example."
voice : The voice to use for speech generation (OpenAI, Orpheus, PlayAI, ElevenLabs models)
response_format : The audio format for the generated speech. Available options: mp3, opus, aac, flac, wav, pcm
speed : The speed of the generated audio. Required range: 0.25 <= x <= 4
temperature : Temperature for randomness in speech generation. Required range: 0 <= x <= 2
top_p : Top-p value for nucleus sampling. Required range: 0 <= x <= 1
instructions : Additional instructions to control voice generation (GPT-4o Mini TTS)
speaker_transcript : Speaker transcript for Dia model. Maximum length: 1000
cfg_filter_top_k : CFG filter top k value (Dia model). Required range: 15 <= x <= 50
cfg_scale : CFG scale value (Dia model). Required range: 1 <= x <= 5
speech_rate : Speech rate adjustment for Microsoft TTS. Required range: -100 <= x <= 100
pitch_adjustment : Pitch adjustment for Microsoft TTS. Required range: -100 <= x <= 100
emotional_style : Emotional style for Microsoft TTS (e.g., 'cheerful', 'sad', 'angry')
Response
The response is of type file.