Overview

Electron Hub implements rate limiting to ensure fair usage and maintain service quality for all users. Rate limits are measured in Requests Per Minute (RPM) and vary by subscription plan.

You can check your current rate limits and usage anytime in the API Keys section of your dashboard.

Rate Limit Tiers

Subscription Plan   Requests per minute (RPM)   Best For
Free                7                           Testing and prototypes
Starter             10                          Small applications
Plus                15                          Growing projects
Pro                 20                          Production applications
Business            40                          High-volume usage
Enterprise          60+                         Custom enterprise needs

How Rate Limiting Works

Request Counting

  • Each API call counts as one request regardless of model or complexity
  • Both successful and failed requests count toward your limit
  • Streaming requests count as a single request when initiated

Time Windows

  • Rate limits are calculated using a sliding window of 60 seconds
  • If you exceed your limit, you’ll receive a 429 error
  • The limit resets continuously, not at fixed intervals
  • Important: exceeding your limit also triggers a 3-minute cooldown (see below)

Cooldown Period: If you exceed your rate limit, you must wait 3 minutes (180 seconds) before making any new requests. This cooldown is enforced even if your normal rate limit window would have reset.
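
A minimal sketch of honoring that cooldown with the openai SDK, which raises RateLimitError on a 429; the 180-second constant simply mirrors the cooldown documented above:

import time
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.electronhub.ai/v1",
    api_key="ek-your-api-key"
)

COOLDOWN_SECONDS = 180  # documented cooldown after exceeding the limit

def complete_with_cooldown(**kwargs):
    try:
        return client.chat.completions.create(**kwargs)
    except RateLimitError:
        # Wait out the full cooldown; retrying earlier would fail anyway
        time.sleep(COOLDOWN_SECONDS)
        return client.chat.completions.create(**kwargs)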

Rate Limit Headers

Every API response includes headers showing your current status:

X-RateLimit-Limit: 20
X-RateLimit-Remaining: 15
X-RateLimit-Reset: 1640995200
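
In this example, X-RateLimit-Limit is your plan's RPM ceiling, X-RateLimit-Remaining is the number of requests left in the current window, and X-RateLimit-Reset is a Unix timestamp indicating when the window resets.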

Handling Rate Limits

Error Responses

When you encounter rate limits, you’ll receive different error codes depending on the type of limit:

Rate Limit Exceeded (429)

{
  "error": {
    "message": "Rate limit exceeded. Try again in 180 seconds.",
    "type": "error",
    "param": null,
    "code": 429
  }
}

IP Address Limit (403)

{
  "error": {
    "message": "IP address limit reached. Maximum 3 API keys for IP 192.168.1.1. Please upgrade to a premium plan to bypass this limit.",
    "type": "error",
    "param": null,
    "code": 403
  }
}

Insufficient Balance (402)

{
  "error": {
    "message": "Insufficient balance.\n\nPlease wait for the next daily reset or purchase more tokens in Electron Hub.",
    "type": "error",
    "param": null,
    "code": 402
  }
}

Proxy Key Limit Exceeded (402)

{
  "error": {
    "message": "Exceeded allocated amount for this key.",
    "type": "error",
    "param": null,
    "code": 402
  }
}
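
A sketch of distinguishing these cases with the openai SDK: RateLimitError and PermissionDeniedError map to 429 and 403, while 402 has no dedicated exception class and falls through to the generic APIStatusError. It reuses the client configured in the earlier example:

import time
from openai import APIStatusError, PermissionDeniedError, RateLimitError

def safe_complete(**kwargs):
    # `client` is the OpenAI client configured earlier
    try:
        return client.chat.completions.create(**kwargs)
    except RateLimitError:
        # 429: wait out the 3-minute cooldown, then retry once
        time.sleep(180)
        return client.chat.completions.create(**kwargs)
    except PermissionDeniedError:
        # 403: IP address key limit; needs a plan change, not a retry
        raise
    except APIStatusError as e:
        if e.status_code == 402:
            # 402: insufficient balance or exhausted proxy key allocation
            raise
        raise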

Special Model Limits

Some models have additional daily limits beyond the standard rate limits:

Google Models (Daily Limits)

Google experimental and preview models have daily usage limits that reset at 21:00 UTC:

Subscription Plan   Daily Requests   Models Affected
Free                50               Experimental/Preview models only
Starter             50               Experimental/Preview models only
Plus                75               Experimental/Preview models only
Pro                 100              Experimental/Preview models only
Business            200              Experimental/Preview models only
Enterprise          300              Experimental/Preview models only

Experimental/Preview models include models with -exp or -preview in their names, such as:

  • gemini-1.5-flash-exp
  • gemini-2.0-flash-exp
  • gemini-2.0-flash-thinking-exp-1219
  • gemini-2.5-flash-preview-05-20
  • gemini-2.5-pro-preview-05-06

Regular Google models (like gemini-1.5-flash, gemini-1.5-pro, gemma-3-27b-it) are not subject to these daily limits.

Midjourney Models (Daily Limits)

Midjourney image generation has a separate daily limit:

  • Daily Limit: 20 requests per day
  • Reset Time: 21:00 UTC
  • Applies to: All Midjourney image generation endpoints
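
These daily caps are enforced server-side, but a client-side guard can keep you from burning requests into a known wall. A minimal sketch, assuming only what the tables above state (a fixed daily limit and a 21:00 UTC reset); DailyQuota and next_daily_reset are illustrative names, not part of any SDK:

from datetime import datetime, time as dtime, timedelta, timezone

def next_daily_reset(now=None):
    # Daily model limits reset at 21:00 UTC
    now = now or datetime.now(timezone.utc)
    reset = datetime.combine(now.date(), dtime(21, 0), tzinfo=timezone.utc)
    if now >= reset:
        reset += timedelta(days=1)
    return reset

class DailyQuota:
    def __init__(self, limit: int):
        self.limit = limit
        self.used = 0
        self.resets_at = next_daily_reset()

    def try_consume(self) -> bool:
        now = datetime.now(timezone.utc)
        if now >= self.resets_at:
            # A reset has passed; start a fresh daily window
            self.used = 0
            self.resets_at = next_daily_reset(now)
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

# e.g. 20 Midjourney requests per day
midjourney_quota = DailyQuota(limit=20)
if midjourney_quota.try_consume():
    pass  # safe to send the request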

Implementing Retry Logic
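
A common pattern is exponential backoff with jitter, shown below. Note that the short waits here may retry before Electron Hub's 3-minute cooldown has elapsed; in production, consider a larger base delay or waiting out the full cooldown as in the earlier sketch.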

import time
import random
from openai import OpenAI, RateLimitError

client = OpenAI(
    base_url="https://api.electronhub.ai/v1",
    api_key="ek-your-api-key"
)

def make_request_with_retry(max_retries=3):
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": "Hello!"}]
            )
            return response
            
        except RateLimitError:
            if attempt < max_retries - 1:
                # Exponential backoff with jitter
                wait_time = (2 ** attempt) + random.uniform(0, 1)
                print(f"Rate limited. Waiting {wait_time:.2f} seconds...")
                time.sleep(wait_time)
            else:
                raise
    
    return None

# Usage
response = make_request_with_retry()

Request Queuing

For applications with variable load, implement a request queue:

import asyncio
import time
from typing import List, Callable

class RateLimitedClient:
    def __init__(self, client, rpm_limit: int):
        self.client = client
        self.rpm_limit = rpm_limit
        self.request_times: List[float] = []
        self.lock = asyncio.Lock()
    
    async def make_request(self, request_func: Callable):
        async with self.lock:
            now = time.time()
            
            # Drop timestamps that have left the 60-second sliding window
            cutoff = now - 60
            self.request_times = [t for t in self.request_times if t > cutoff]
            
            # If the window is full, wait until the oldest request ages out
            if len(self.request_times) >= self.rpm_limit:
                wait_time = 60 - (now - self.request_times[0])
                if wait_time > 0:
                    await asyncio.sleep(wait_time)
                    now = time.time()
            
            self.request_times.append(now)
        
        # Run the request outside the lock so responses can overlap
        return await request_func()

# Usage
rate_limited_client = RateLimitedClient(client, rpm_limit=20)

async def my_request():
    # The sync client would block the event loop, so run it in a thread
    return await asyncio.to_thread(
        client.chat.completions.create,
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}],
    )

async def main():
    return await rate_limited_client.make_request(my_request)

response = asyncio.run(main())

Optimization Strategies

1. Batch Processing

Group multiple operations when possible:

# Instead of multiple separate requests
results = []
for text in texts:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"Analyze: {text}"}]
    )
    results.append(response)

# Combine into fewer requests
combined_text = "\n---\n".join(texts)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{
        "role": "user", 
        "content": f"Analyze each section:\n{combined_text}"
    }]
)
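
Batching trades request count for tokens per request, so keep the combined prompt within the model's context window and remember that a single failed call now affects every item in the batch.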

2. Efficient Model Selection

Choose the right model for your task:

  • GPT-3.5-turbo: Fast and cost-effective for simple tasks
  • GPT-4o: Best for complex reasoning and analysis
  • Claude-3-haiku: Fastest for quick responses
  • Claude-3-5-sonnet: Balanced performance and capability

3. Request Optimization

Optimize your requests to reduce unnecessary calls:

# Use system messages efficiently
system_message = "You are a helpful assistant. Always respond concisely."

# Reuse one conversation instead of starting fresh each time
conversation = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": "First question"},
]
response = client.chat.completions.create(model="gpt-4o", messages=conversation)

# Append to the same conversation instead of issuing unrelated new requests
conversation.append({"role": "assistant", "content": response.choices[0].message.content})
conversation.append({"role": "user", "content": "Follow-up question"})

4. Caching Results

Cache responses for repeated queries:

import json
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_completion(model, messages_json, **kwargs):
    # Messages arrive as a JSON string so lru_cache can hash the arguments
    return client.chat.completions.create(
        model=model,
        messages=json.loads(messages_json),
        **kwargs
    )

def make_cached_request(model, messages, **kwargs):
    # Serialize deterministically so identical requests share a cache entry
    messages_json = json.dumps(messages, sort_keys=True)
    return cached_completion(model, messages_json, **kwargs)
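
Keep in mind that lru_cache only matches exact repeats and any extra kwargs must be hashable; since completions can vary between calls, caching makes the most sense for deterministic requests (for example, temperature=0).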

Monitoring Usage

Check Current Usage

Monitor your API usage programmatically. The standard SDK response object doesn't expose HTTP headers, so with the official openai SDK request the raw response via with_raw_response:

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
completion = raw.parse()  # the usual ChatCompletion object

def check_rate_limit_status(headers):
    limit = int(headers.get('x-ratelimit-limit', 0))
    remaining = int(headers.get('x-ratelimit-remaining', 0))
    reset_time = int(headers.get('x-ratelimit-reset', 0))
    
    print(f"Rate Limit: {limit}")
    print(f"Remaining: {remaining}")
    print(f"Resets at: {reset_time}")
    
    if remaining < 5:
        print("⚠️  Warning: Approaching rate limit!")

check_rate_limit_status(raw.headers)

Dashboard Monitoring

Use the dashboard to:

  • View real-time usage statistics
  • Set up usage alerts
  • Monitor trends over time
  • Identify peak usage periods

Upgrading Your Plan

When to Upgrade

Consider upgrading when you:

  • Consistently hit rate limits
  • Need higher throughput for production
  • Want to reduce latency from queuing
  • Require dedicated support

How to Upgrade

  1. Visit your dashboard
  2. Go to “Billing”
  3. Select your desired plan
  4. Complete the upgrade process
  5. New limits take effect immediately
