Text-to-Speech - Electron Hub API

Generate high-quality speech audio from text using our collection of state-of-the-art text-to-speech models from leading AI providers.

Create Speech

POST /audio/speech Convert text to natural-sounding speech audio.

Request Body

model

string

required

The TTS model to use for speech generation (e.g., “tts-1”, “tts-1-hd”, “elevenlabs”)

Text Input Parameters

input

string

The text to convert to speech (1-4096 characters)

Voice Parameters

voice

string

Voice name or ID for speech generation

Common Parameters

response_format

string

The audio format for the generated speech (“mp3”, “opus”, “aac”, “flac”, “wav”, “pcm”)

speed

number

The speed of the generated audio (0.25 to 4.0 for most models, default: 1.0)

temperature

number

Temperature for randomness in speech generation (0.0 to 2.0)

Advanced Parameters

instructions

string

GPT-4o Mini TTS: Additional instructions to control voice characteristics

speaker_transcript

string

Dia model: Speaker transcript for enhanced voice control (max 1000 chars)

cfg_filter_top_k

integer

Dia model: CFG filter top k value (15-50)

cfg_scale

integer

Dia model: CFG scale value for generation control (1-5)

speech_rate

integer

Microsoft TTS: Speech rate adjustment (-100 to 100, default: 0)

pitch_adjustment

integer

Microsoft TTS: Pitch adjustment (-100 to 100, default: 0)

emotional_style

string

Microsoft TTS: Emotional style (e.g., “cheerful”, “sad”, “angry”)

Response

Returns an audio file in the specified format.

Basic Example

const response = await fetch('https://api.electronhub.ai/v1/audio/speech', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer YOUR_API_KEY',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'tts-1',
    input: 'Hello! Welcome to Electron Hub text-to-speech.',
    voice: 'alloy',
    response_format: 'mp3'
  })
});

const audioBuffer = await response.arrayBuffer();
const fs = require('fs');
fs.writeFileSync('speech.mp3', Buffer.from(audioBuffer));

Provider-Specific Examples

OpenAI Models (TTS-1, TTS-1 HD, GPT-4o Mini TTS)

{
  "model": "tts-1",
  "input": "Hello, world!",
  "voice": "alloy",
  "speed": 1.0
}

ElevenLabs Models

{
  "model": "elevenlabs",
  "input": "Hello, this is ElevenLabs TTS.",
  "voice": "Will (US male)"
}

Kokoro 82M Model

{
  "model": "kokoro-82m",
  "input": "Hello from Kokoro TTS!",
  "voice": "af_alloy",
  "speed": 1.2
}

NariLabs Dia Model (Advanced)

{
  "model": "dia-1.6b",
  "input": "Hello, this is a conversational speech example.",
  "speaker_transcript": "Speaking in a calm, professional tone",
  "cfg_scale": 3,
  "cfg_filter_top_k": 25,
  "temperature": 1.2,
  "speed": 0.9
}

MeloTTS Multilingual

{
  "model": "melotts",
  "input": "Bonjour, comment allez-vous?",
  "voice": "fr"
}

PlayAI Dialog Models

{
  "model": "playai-tts",
  "input": "This is a conversational AI speaking.",
  "voice": "Celeste-PlayAI",
  "temperature": 0.8
}

Microsoft TTS

{
  "model": "microsoft-tts",
  "input": "Hello, this is Microsoft Azure Text-to-Speech.",
  "voice": "en-US-JennyNeural"
}

Available Models

OpenAI Models

TTS-1 (tts-1)

Optimized for real-time text-to-speech
Cost-effective for most applications
11 available voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse

TTS-1 HD (tts-1-hd)

High-quality text-to-speech
Best audio quality with slower generation
Same 11 voices as TTS-1

GPT-4o Mini TTS (gpt-4o-mini-tts)

Advanced TTS with instruction following
Supports voice control via instructions
Same 11 voices as TTS-1

Premium Models

ElevenLabs (elevenlabs)

Ultra-realistic human-like voices
40+ multilingual voices
Supports English, Spanish, French, German, Arabic, Chinese, Hindi, Polish

PlayAI Dialog (playai-tts)

Specialized for conversational content
27 expressive voices
Optimized for dialogue and storytelling

Specialized Models

Kokoro 82M (kokoro-82m)

Lightweight but high-quality
80+ voices in multiple languages
Open-source Apache-licensed model

Microsoft TTS (microsoft-tts)

Enterprise-grade quality
100+ neural voices
Extensive language support

Voice Examples

OpenAI Voices

// Available voices for OpenAI models
const openaiVoices = [
  'alloy',    // Neutral, balanced
  'echo',     // Clear, professional
  'fable',    // Warm, storytelling
  'onyx',     // Deep, authoritative
  'nova',     // Bright, energetic
  'shimmer'   // Gentle, soothing
];

ElevenLabs Voices

// Sample of ElevenLabs voices by language
const elevenlabsVoices = {
  english: ['Will (US male)', 'Jessica (US female)', 'George (UK male)', 'Lily (UK female)'],
  spanish: ['Juan (Spanish male)', 'Gabriela (Spanish female)'],
  french: ['Guillaume (French male)', 'Darine (French female)'],
  german: ['Kurt (German male)', 'Leonie (German female)']
};

// Use with voice parameter and text input
{
  "model": "elevenlabs",
  "input": "Hello world",
  "voice": "Will (US male)"
}

Microsoft TTS Voices

// Sample of Microsoft Azure Neural voices
const microsoftVoices = {
  english: [
    'en-US-JennyNeural',    // Friendly female
    'en-US-GuyNeural',      // Casual male
    'en-US-AriaNeural',     // News anchor style
    'en-US-DavisNeural',    // Professional male
    'en-GB-SoniaNeural',    // British female
    'en-AU-NatashaNeural'   // Australian female
  ],
  chinese: [
    'zh-CN-XiaoxiaoNeural',  // Standard female
    'zh-CN-YunyangNeural',   // Professional male
    'zh-HK-HiuMaanNeural',   // Hong Kong Cantonese
    'zh-TW-HsiaoChenNeural'  // Taiwan Mandarin
  ],
  multilingual: [
    'es-ES-ElviraNeural',    // Spanish
    'fr-FR-DeniseNeural',    // French
    'de-DE-KatjaNeural',     // German
    'ja-JP-NanamiNeural',    // Japanese
    'ko-KR-SunHiNeural'      // Korean
  ]
};

// Use with text input and advanced controls
{
  "model": "microsoft-tts",
  "input": "Hello world",
  "voice": "en-US-AriaNeural",
  "emotional_style": "cheerful",
  "speech_rate": 10
}

Model-Specific Parameters

Provider	Special Parameters	Usage
GPT-4o Mini TTS	`instructions`	Natural language voice control
Dia 1.6B	`speaker_transcript`, `cfg_scale`, `cfg_filter_top_k`	Advanced voice conditioning
Microsoft TTS	`speech_rate`, `pitch_adjustment`, `emotional_style`	Voice modulation and emotions
MeloTTS	`lang`	Language selection (en, fr, es, etc.)
All Models	`speed`, `temperature`, `top_p`	Common generation controls

Advanced Features

Speed Control

Adjust playback speed for different use cases:

{
  "model": "tts-1",
  "input": "This text will be spoken faster.",
  "voice": "alloy",
  "speed": 1.5
}

Audio Formats

Choose the optimal format for your application:

MP3: Standard, widely compatible
WAV: Uncompressed, highest quality
OGG/Opus: Efficient compression
FLAC: Lossless compression
AAC: Good balance of quality and size

Instruction-Based Control (GPT-4o Mini TTS)

Control voice characteristics with natural language:

{
  "model": "gpt-4o-mini-tts",
  "input": "Welcome to our store!",
  "voice": "alloy",
  "instructions": "Speak with enthusiasm and excitement, like a friendly shopkeeper"
}

Microsoft TTS Advanced Controls

Microsoft TTS offers fine-grained control over speech characteristics:

Speech Rate Control

{
  "model": "microsoft-tts",
  "input": "This text will be spoken faster.",
  "voice": "en-US-JennyNeural",
  "speech_rate": 50
}

Pitch Adjustment

{
  "model": "microsoft-tts",
  "input": "This text has a higher pitch.",
  "voice": "en-US-AriaNeural",
  "pitch_adjustment": 25
}

Emotional Styles

Different voices support different emotional styles:

{
  "model": "microsoft-tts",
  "input": "I'm so excited about this news!",
  "voice": "en-US-AriaNeural",
  "emotional_style": "cheerful",
  "speech_rate": 10
}

Available Emotional Styles (voice-dependent):

cheerful - Happy and upbeat
sad - Melancholic tone
angry - Frustrated or upset
fearful - Nervous or scared
calm - Relaxed and peaceful
gentle - Soft and caring
newscast - Professional news anchor
customerservice - Helpful and polite

Best Practices

Text Optimization

Use clear punctuation for natural pauses
Spell out numbers and abbreviations
Use SSML tags for fine-grained control (model-dependent)

Voice Selection

Customer Service: Professional voices (echo, George)
Storytelling: Warm voices (fable, nova)
Educational: Clear voices (alloy, shimmer)
Gaming: Character voices (onyx, sage)

Performance Tips

Cache generated audio when possible
Use appropriate audio formats for your platform
Consider real-time vs. high-quality models based on use case

Error Handling

Common error scenarios:

try {
  const response = await fetch('/audio/speech', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      model: 'tts-1',
      input: text,
      voice: 'alloy'
    })
  });

  if (!response.ok) {
    const error = await response.json();
    console.error('TTS Error:', error);
  }
} catch (error) {
  console.error('Network Error:', error);
}

Use Cases

Voice Assistants: Natural conversation interfaces
Audiobooks: Long-form content narration
E-learning: Educational content delivery
Accessibility: Screen reader alternatives
Gaming: Character voice generation
Customer Service: Automated phone systems
Content Creation: Podcast and video narration

Authorizations

Authorization

string

header

required

Enter your API key (starts with 'ek-')

Body

application/json

Response

200

audio/mpeg

Audio file

The response is of type file.

Getting Started

API Reference

Guides

Examples

​Create Speech

​Request Body

​Text Input Parameters

​Voice Parameters

​Common Parameters

​Advanced Parameters

​Response

​Basic Example

​Provider-Specific Examples

​OpenAI Models (TTS-1, TTS-1 HD, GPT-4o Mini TTS)

​ElevenLabs Models

​Kokoro 82M Model

​NariLabs Dia Model (Advanced)

​MeloTTS Multilingual

​PlayAI Dialog Models

​Microsoft TTS

​Available Models

​OpenAI Models

​Premium Models

​Specialized Models

​Voice Examples

​OpenAI Voices

​ElevenLabs Voices

​Microsoft TTS Voices

​Model-Specific Parameters

​Advanced Features

​Speed Control

​Audio Formats

​Instruction-Based Control (GPT-4o Mini TTS)

​Microsoft TTS Advanced Controls

​Speech Rate Control

​Pitch Adjustment

​Emotional Styles

​Best Practices

​Text Optimization

​Voice Selection

​Performance Tips

​Error Handling

​Use Cases

Authorizations

Body

Response

Create Speech

Request Body

Text Input Parameters

Voice Parameters

Common Parameters

Advanced Parameters

Response

Basic Example

Provider-Specific Examples

OpenAI Models (TTS-1, TTS-1 HD, GPT-4o Mini TTS)

ElevenLabs Models

Kokoro 82M Model

NariLabs Dia Model (Advanced)

MeloTTS Multilingual

PlayAI Dialog Models

Microsoft TTS

Available Models

OpenAI Models

Premium Models

Specialized Models

Voice Examples

OpenAI Voices

ElevenLabs Voices

Microsoft TTS Voices

Model-Specific Parameters

Advanced Features

Speed Control

Audio Formats

Instruction-Based Control (GPT-4o Mini TTS)

Microsoft TTS Advanced Controls

Speech Rate Control

Pitch Adjustment

Emotional Styles

Best Practices

Text Optimization

Voice Selection

Performance Tips

Error Handling

Use Cases