Overview

Electron Hub implements rate limiting to ensure fair usage and maintain service quality for all users. Rate limits are measured in Requests Per Minute (RPM) and vary by subscription plan.

You can check your current rate limits and usage anytime in the API Keys section of your dashboard.

Rate Limit Tiers

Subscription Plan   Requests per minute (RPM)   Best For
Free                7                           Testing and prototypes
Starter             10                          Small applications
Plus                15                          Growing projects
Pro                 20                          Production applications
Business            40                          High-volume usage
Enterprise          60+                         Custom enterprise needs

How Rate Limiting Works

Request Counting

  • Each API call counts as one request regardless of model or complexity
  • Both successful and failed requests count toward your limit
  • Streaming requests count as a single request when initiated

Time Windows

  • Rate limits are calculated using a sliding window of 60 seconds
  • If you exceed your limit, you’ll receive a 429 error
  • The limit resets continuously, not at fixed intervals
  • Important: exceeding your limit also triggers a 3-minute cooldown (see below)

Cooldown Period: If you exceed your rate limit, you must wait 3 minutes (180 seconds) before making any new requests. This cooldown is enforced even if your normal rate limit window would have reset.
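
A minimal sketch of honoring that cooldown with the openai SDK, which raises RateLimitError on a 429; the 180-second constant simply mirrors the cooldown documented above:

import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.electronhub.ai/v1",
    api_key="ek-your-api-key"
)

COOLDOWN_SECONDS = 180  # documented cooldown after exceeding the limit

def complete_with_cooldown(**kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except RateLimitError:
        # Wait out the full cooldown; retrying earlier would fail anyway
        time.sleep(COOLDOWN_SECONDS)
        return client.chat.completions.create(**kwargs)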

Rate Limit Headers

Every API response includes headers showing your current status:

X-RateLimit-Limit: 20
X-RateLimit-Remaining: 15
X-RateLimit-Reset: 1640995200
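
In this example, X-RateLimit-Limit is your plan's RPM ceiling, X-RateLimit-Remaining is the number of requests left in the current window, and X-RateLimit-Reset is a Unix timestamp indicating when the window resets.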

Handling Rate Limits

Error Responses

When you encounter rate limits, you’ll receive different error codes depending on the type of limit:

Rate Limit Exceeded (429)

{
  "error": {
    "message": "Rate limit exceeded. Try again in 180 seconds.",
    "type": "error",
    "param": null,
    "code": 429
  }
}

IP Address Limit (403)

{
  "error": {
    "message": "IP address limit reached. Maximum 3 API keys for IP 192.168.1.1. Please upgrade to a premium plan to bypass this limit.",
    "type": "error",
    "param": null,
    "code": 403
  }
}

Insufficient Balance (402)

{
  "error": {
    "message": "Insufficient balance.\n\nPlease wait for the next daily reset or purchase more tokens in Electron Hub.",
    "type": "error",
    "param": null,
    "code": 402
  }
}

Proxy Key Limit Exceeded (402)

{
  "error": {
    "message": "Exceeded allocated amount for this key.",
    "type": "error",
    "param": null,
    "code": 402
  }
}
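
A sketch of distinguishing these cases with the openai SDK: RateLimitError and PermissionDeniedError map to 429 and 403, while 402 has no dedicated exception class and falls through to the generic APIStatusError. It reuses the client configured in the earlier example:

import time
from openai import APIStatusError, PermissionDeniedError, RateLimitError

def safe_complete(**kwargs):
    # `client` is the OpenAI client configured earlier
    try:
        return client.chat.completions.create(**kwargs)
    except RateLimitError:
        # 429: wait out the 3-minute cooldown, then retry once
        time.sleep(180)
        return client.chat.completions.create(**kwargs)
    except PermissionDeniedError:
        # 403: IP address key limit; needs a plan change, not a retry
        raise
    except APIStatusError as e:
        if e.status_code == 402:
            # 402: insufficient balance or exhausted proxy key allocation
            raise
        raise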

Special Model Limits

Some models have additional daily limits beyond the standard rate limits:

Google Models (Daily Limits)

Google experimental and preview models have daily usage limits that reset at 21:00 UTC:

Subscription Plan   Daily Requests   Models Affected
Free                50               Experimental/Preview models only
Starter             50               Experimental/Preview models only
Plus                75               Experimental/Preview models only
Pro                 100              Experimental/Preview models only
Business            200              Experimental/Preview models only
Enterprise          300              Experimental/Preview models only

Experimental/Preview models include models with -exp or -preview in their names, such as:

  • gemini-1.5-flash-exp
  • gemini-2.0-flash-exp
  • gemini-2.0-flash-thinking-exp-1219
  • gemini-2.5-flash-preview-05-20
  • gemini-2.5-pro-preview-05-06

Regular Google models (like gemini-1.5-flash, gemini-1.5-pro, gemma-3-27b-it) are not subject to these daily limits.

Midjourney Models (Daily Limits)

Midjourney image generation has a separate daily limit:

  • Daily Limit: 20 requests per day
  • Reset Time: 21:00 UTC
  • Applies to: All Midjourney image generation endpoints
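
These daily caps are enforced server-side, but a client-side guard can keep you from burning requests into a known wall. A minimal sketch, assuming only what the tables above state (a fixed daily limit and a 21:00 UTC reset); DailyQuota and next_daily_reset are illustrative names, not part of any SDK:

from datetime import datetime, time as dtime, timedelta, timezone

def next_daily_reset(now=None):
    # Daily model limits reset at 21:00 UTC
    now = now or datetime.now(timezone.utc)
    reset = datetime.combine(now.date(), dtime(21, 0), tzinfo=timezone.utc)
    if now >= reset:
        reset += timedelta(days=1)
    return reset

class DailyQuota:
    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0
        self.resets_at = next_daily_reset()

    def try_consume(self) -> bool:
        now = datetime.now(timezone.utc)
        if now >= self.resets_at:
            # A reset has passed; start a fresh daily window
            self.used = 0
            self.resets_at = next_daily_reset(now)
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

# e.g. 20 Midjourney requests per day
midjourney_quota = DailyQuota(limit=20)
if midjourney_quota.try_consume():
    pass  # safe to send the request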

Implementing Retry Logic
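
A common pattern is exponential backoff with jitter, shown below. Note that the short waits here may retry before Electron Hub's 3-minute cooldown has elapsed; in production, consider a larger base delay or waiting out the full cooldown as in the earlier sketch.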

import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.electronhub.ai/v1",
    api_key="ek-your-api-key"
)

def make_request_with_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Hello!"}]
            )
            return response
            
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff with jitter
                wait_time = (2 ** attempt) + random.uniform(0, 1)
                print(f"Rate limited. Waiting {wait_time:.2f} seconds...")
                time.sleep(wait_time)
            else:
                raise
    
    return None

# Usage
response = make_request_with_retry()

Request Queuing

For applications with variable load, implement a request queue:

import asyncio
import time
from typing import List, Callable

class RateLimitedClient:
    def __init__(self, client, rpm_limit: int):
        self.client = client
        self.rpm_limit = rpm_limit
        self.request_times: List[float] = []
        self.lock = asyncio.Lock()
    
    async def make_request(self, request_func: Callable):
        async with self.lock:
            now = time.time()
            
            # Drop timestamps that have left the 60-second sliding window
            cutoff = now - 60
            self.request_times = [t for t in self.request_times if t > cutoff]
            
            # If the window is full, wait until the oldest request ages out
            if len(self.request_times) >= self.rpm_limit:
                wait_time = 60 - (now - self.request_times[0])
                if wait_time > 0:
                    await asyncio.sleep(wait_time)
                    now = time.time()
            
            self.request_times.append(now)
        
        # Run the request outside the lock so responses can overlap
        return await request_func()

# Usage
rate_limited_client = RateLimitedClient(client, rpm_limit=20)

async def my_request():
    # The sync client would block the event loop, so run it in a thread
    return await asyncio.to_thread(
        client.chat.completions.create,
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )

async def main():
    return await rate_limited_client.make_request(my_request)

response = asyncio.run(main())

Optimization Strategies

1. Batch Processing

Group multiple operations when possible:

# Instead of multiple separate requests
results = []
for text in texts:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Analyze: {text}"}]
    )
    results.append(response)

# Combine into fewer requests
combined_text = "\n---\n".join(texts)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user", 
        "content": f"Analyze each section:\n{combined_text}"
    }]
)
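
Batching trades request count for tokens per request, so keep the combined prompt within the model's context window and remember that a single failed call now affects every item in the batch.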

2. Efficient Model Selection

Choose the right model for your task:

  • GPT-3.5-turbo: Fast and cost-effective for simple tasks
  • GPT-4o: Best for complex reasoning and analysis
  • Claude-3-haiku: Fastest for quick responses
  • Claude-3-5-sonnet: Balanced performance and capability

3. Request Optimization

Optimize your requests to reduce unnecessary calls:

# Use system messages efficiently
system_message = "You are a helpful assistant. Always respond concisely."

# Reuse one conversation instead of starting fresh each time
conversation = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": "First question"},
]
response = client.chat.completions.create(model="gpt-4o", messages=conversation)

# Append to the same conversation instead of issuing unrelated new requests
conversation.append({"role": "assistant", "content": response.choices[0].message.content})
conversation.append({"role": "user", "content": "Follow-up question"})

4. Caching Results

Cache responses for repeated queries:

import json
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_completion(model, messages_json, **kwargs):
    # Messages arrive as a JSON string so lru_cache can hash the arguments
    return client.chat.completions.create(
        model=model,
        messages=json.loads(messages_json),
        **kwargs
    )

def make_cached_request(model, messages, **kwargs):
    # Serialize deterministically so identical requests share a cache entry
    messages_json = json.dumps(messages, sort_keys=True)
    return cached_completion(model, messages_json, **kwargs)
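
Keep in mind that lru_cache only matches exact repeats and any extra kwargs must be hashable; since completions can vary between calls, caching makes the most sense for deterministic requests (for example, temperature=0).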

Monitoring Usage

Check Current Usage

Monitor your API usage programmatically. The standard SDK response object doesn't expose HTTP headers, so with the official openai SDK request the raw response via with_raw_response:

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
completion = raw.parse()  # the usual ChatCompletion object

def check_rate_limit_status(headers):
    limit = int(headers.get('x-ratelimit-limit', 0))
    remaining = int(headers.get('x-ratelimit-remaining', 0))
    reset_time = int(headers.get('x-ratelimit-reset', 0))
    
    print(f"Rate Limit: {limit}")
    print(f"Remaining: {remaining}")
    print(f"Resets at: {reset_time}")
    
    if remaining < 5:
        print("⚠️  Warning: Approaching rate limit!")

check_rate_limit_status(raw.headers)

Dashboard Monitoring

Use the dashboard to:

  • View real-time usage statistics
  • Set up usage alerts
  • Monitor trends over time
  • Identify peak usage periods

Upgrading Your Plan

When to Upgrade

Consider upgrading when you:

  • Consistently hit rate limits
  • Need higher throughput for production
  • Want to reduce latency from queuing
  • Require dedicated support

How to Upgrade

  1. Visit your dashboard
  2. Go to “Billing”
  3. Select your desired plan
  4. Complete the upgrade process
  5. New limits take effect immediately
