Rate Limits

API rate limits by endpoint.

Rate limits prevent abuse and ensure fair access. When you exceed a limit, the API returns 429 Too Many Requests.

Default Limits

Most endpoints share a default rate limit per API key:

Tier                   Requests per minute
Standard               60
Elevated (on request)  300

Endpoint-Specific Limits

Some endpoints have stricter limits:

Endpoint                                Limit      Reason
POST /agents/{id}/eval/trigger          30/minute  Evaluation is compute-intensive
POST /agents/{id}/knowledge             10/minute  Knowledge ingestion is resource-heavy
POST /agents/{id}/runs/{id}/spans       60/minute  Caps high-volume span uploads
GET /marketplace/listings/{id}/preview  20/minute  Prevents preview abuse
POST /auth/otp/send                     3/minute   Prevents OTP spam

Rate Limit Headers

Rate-limited responses include headers indicating your current status:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1711814400
Retry-After: 30

Header                 Description
X-RateLimit-Limit      Maximum requests allowed in the window
X-RateLimit-Remaining  Requests remaining in the current window
X-RateLimit-Reset      Unix timestamp when the window resets
Retry-After            Seconds to wait before retrying
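
These headers can be read programmatically to decide how long to wait before the next request. A minimal sketch (the function name is illustrative; header names follow the table above):

```python
import time

def seconds_until_allowed(headers: dict) -> float:
    """Return how long to wait before the next request, based on
    the rate-limit headers documented above."""
    # If the server said exactly how long to wait, honor it.
    if "Retry-After" in headers:
        return float(headers["Retry-After"])
    # Otherwise, wait until the window resets once the budget is spent.
    if int(headers.get("X-RateLimit-Remaining", 1)) <= 0:
        reset = int(headers.get("X-RateLimit-Reset", 0))
        return max(0.0, reset - time.time())
    return 0.0  # budget remains; no need to wait

# Using the example header values shown above:
print(seconds_until_allowed({"X-RateLimit-Remaining": "0", "Retry-After": "30"}))  # 30.0
```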

Handling Rate Limits

import asyncio
import httpx

async def call_with_retry(client: httpx.AsyncClient, url: str, max_retries: int = 3) -> httpx.Response:
    """GET `url`, waiting and retrying whenever the API returns 429."""
    for attempt in range(max_retries):
        response = await client.get(url)
        if response.status_code == 429:
            # Honor the server's Retry-After hint; fall back to 5 seconds.
            retry_after = int(response.headers.get("Retry-After", 5))
            await asyncio.sleep(retry_after)
            continue
        return response
    raise RuntimeError(f"Rate limited after {max_retries} retries: {url}")
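
The retry loop can be exercised without a live API by substituting a stub client. A sketch (it repeats the helper so the snippet is self-contained; `FakeClient` and `FakeResponse` stand in for `httpx.AsyncClient` and its response):

```python
import asyncio

async def call_with_retry(client, url, max_retries=3):
    for attempt in range(max_retries):
        response = await client.get(url)
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 5))
            await asyncio.sleep(retry_after)
            continue
        return response
    raise RuntimeError(f"Rate limited after {max_retries} retries: {url}")

class FakeResponse:
    def __init__(self, status_code, headers=None):
        self.status_code = status_code
        self.headers = headers or {}

class FakeClient:
    """Returns 429 on the first call, then 200."""
    def __init__(self):
        self.calls = 0
    async def get(self, url):
        self.calls += 1
        if self.calls == 1:
            return FakeResponse(429, {"Retry-After": "0"})
        return FakeResponse(200)

client = FakeClient()
response = asyncio.run(call_with_retry(client, "/agents"))
print(response.status_code)  # 200
```

The stub confirms the important behavior: a 429 consumes a retry and the loop continues, while any other status is returned immediately.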

Tips

  • Batch operations where possible instead of many individual requests
  • Cache responses for data that doesn't change frequently
  • Use webhooks (when available) instead of polling
  • Spread requests evenly rather than bursting
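
Spreading requests evenly can be as simple as a fixed-interval pacer. A sketch (the `paced` and `fetch` names are illustrative; the demo uses a high `per_minute` so it finishes quickly, while the Standard tier would use `per_minute=60`):

```python
import asyncio
import time

async def paced(coros, per_minute=60):
    """Await the given coroutines sequentially, starting at most
    `per_minute` of them per minute."""
    interval = 60.0 / per_minute  # one call per second at the Standard tier
    results = []
    for coro in coros:
        start = time.monotonic()
        results.append(await coro)
        # Sleep off the remainder of this slot before the next call.
        elapsed = time.monotonic() - start
        await asyncio.sleep(max(0.0, interval - elapsed))
    return results

async def fetch(n):
    return n  # placeholder for a real API call

results = asyncio.run(paced([fetch(i) for i in range(3)], per_minute=6000))
print(results)  # [0, 1, 2]
```

Pacing subtracts the time each call took from the interval, so slow calls don't compound the delay.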
