Rate limiting with a single Lua script
We needed rate limiting across all our services: FastAPI, Django, you name it. The usual approach is to grab a library, wire it up per-framework, and accept the slight differences in behavior between them. We went a different way: one Lua script that runs atomically in Redis, with everything else being thin wrappers around it.
This is how the rate limiting system in application-kit works, including per-project overrides, element-based counting, and a monitor mode for gradual rollouts.
The problem with two rate limiters
Early on, we had two Lua scripts: one for basic path-based rate limiting and another for per-project limits with override support. They did almost the same thing but diverged just enough to be annoying. Bug fixes had to land in two places. Behavior wasn’t always consistent. The classic duplication trap.
The fix was to collapse both into a single script (PROJECT_RATE_LIMITER_LUA) with optional parameters that fall back to sensible defaults. No override key? It behaves like a simple limiter. Pass one? It checks per-project overrides atomically.
One Lua script to rule them all
The core idea: every rate limit check is a single atomic Redis operation. No extra round-trips, no race conditions between reading the count and incrementing it.

    local rate_limit_key = KEYS[1]
    local override_key = KEYS[2]
    local default_max_requests = tonumber(ARGV[1])
    local default_expiry = tonumber(ARGV[2])
    local increment_amount = tonumber(ARGV[3]) or 1

    -- Check for per-project override
    local max_requests = default_max_requests
    local expiry = default_expiry
    if override_key ~= "" then
        local override = redis.call('HGETALL', override_key)
        if #override > 0 then
            -- Parse hash fields into max_requests and expiry
            for i = 1, #override, 2 do
                if override[i] == "max_requests" then
                    max_requests = tonumber(override[i + 1])
                elseif override[i] == "expiry" then
                    expiry = tonumber(override[i + 1])
                end
            end
        end
    end

    -- Get current count
    local count = tonumber(redis.call('GET', rate_limit_key)) or -1
    local is_over_limit = 0
    if count == -1 then
        -- First request in this window
        redis.call('SET', rate_limit_key, increment_amount)
        redis.call('EXPIRE', rate_limit_key, expiry)
        count = increment_amount
        if count > max_requests then
            is_over_limit = 1
        end
    elseif count + increment_amount > max_requests then
        is_over_limit = 1
    else
        count = redis.call('INCRBY', rate_limit_key, increment_amount)
    end

    local ttl = redis.call('TTL', rate_limit_key)
    return {is_over_limit, count, ttl, max_requests}

The script takes two keys and three arguments:
- KEYS[1]: the counter key (e.g. ratelimit:/api/search:1:42)
- KEYS[2]: the override key (empty string means “no overrides”)
- ARGV[1..3]: default max requests, expiry in seconds, and increment amount
Everything after that is Redis doing its thing. One EVALSHA, one atomic operation, four values back.
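To make the counting branch easier to reason about, here is a minimal pure-Python model of it. This is a sketch for illustration, not part of application-kit: `check_limit` and the plain dict standing in for Redis are assumptions, and TTL handling and overrides are deliberately left out.

```python
def check_limit(
    counters: dict[str, int],
    key: str,
    max_requests: int,
    increment: int = 1,
) -> tuple[int, int]:
    """Mirror the script's counting branch; returns (is_over_limit, count)."""
    count = counters.get(key, -1)
    if count == -1:
        # First request in this window: the counter is created
        # even when the increment alone already exceeds the limit.
        counters[key] = increment
        return (1 if increment > max_requests else 0, increment)
    if count + increment > max_requests:
        # A rejected request leaves the stored counter untouched.
        return (1, count)
    counters[key] = count + increment
    return (0, counters[key])


counters: dict[str, int] = {}
print(check_limit(counters, "demo", max_requests=100, increment=50))  # (0, 50)
print(check_limit(counters, "demo", max_requests=100, increment=60))  # (1, 50)
print(check_limit(counters, "demo", max_requests=100, increment=50))  # (0, 100)
```

Note the second call: because a rejected request does not increment the counter, a smaller request arriving later in the same window can still succeed.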
Redis key layout
The key format puts the endpoint first and IDs at the end:

    Counter:  ratelimit:{endpoint}:{org_id}:{project_id}
    Override: ratelimit_override:{endpoint}:{org_id}:{project_id}

This ordering matters. Endpoint names can contain colons (like api:v1:search), so we parse keys from the right: the last two segments are always org_id:project_id. The parse_override_key() function handles this reliably, stripping the known prefix and extracting integers from the tail.
Counters auto-expire via Redis TTL. When the window ends, the key vanishes and the next request starts fresh.
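The right-to-left parsing can be sketched with `str.rsplit`. The body below is an assumption about how a function like parse_override_key() might work, not the actual application-kit code; `build_override_key` and `OVERRIDE_PREFIX` are hypothetical names added so the round trip can be shown.

```python
OVERRIDE_PREFIX = "ratelimit_override:"


def build_override_key(endpoint: str, org_id: int, project_id: int) -> str:
    """Hypothetical helper: endpoint first, IDs at the end."""
    return f"{OVERRIDE_PREFIX}{endpoint}:{org_id}:{project_id}"


def parse_override_key(key: str) -> tuple[str, int, int]:
    """Split an override key into (endpoint, org_id, project_id)."""
    if not key.startswith(OVERRIDE_PREFIX):
        raise ValueError(f"not an override key: {key!r}")
    rest = key[len(OVERRIDE_PREFIX):]
    # Split from the right so colons inside the endpoint are preserved.
    # Unpacking raises ValueError if fewer than three segments remain.
    endpoint, org_id, project_id = rest.rsplit(":", 2)
    return endpoint, int(org_id), int(project_id)


key = build_override_key("api:v1:search", 1, 42)
print(parse_override_key(key))  # ('api:v1:search', 1, 42)
```

Splitting from the left would break on the first colon inside the endpoint, which is why the IDs sit at the tail of the key.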
Per-project overrides
The override system uses Redis hashes. An override key like ratelimit_override:/api/search:1:42 stores:

    max_requests = "500"
    expiry = "120"

Because the Lua script checks for overrides inside the same atomic call, there’s no window where a request could slip through with stale limits. If an override exists, it replaces the defaults. If it doesn’t, the defaults apply. Zero ambiguity.
Overrides have their own optional TTL (independent of the rate limit window), so you can set a temporary elevated limit that auto-expires:

    await set_rate_limit_override(
        redis, org_id=1, project_id=42,
        endpoint="/api/search",
        max_requests=500, expiry=120,
        override_ttl=86400  # Override expires in 24h, limit window is still 120s
    )

Management functions (set, get, delete, list, clear) are provided but designed for admin/provisioning services. Not every consumer needs them.
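The override lookup in the Lua script treats each hash field independently, which is easy to miss. A small Python model of that resolution step, assuming the hash has already been read into a plain dict of strings (`resolve_limits` is a hypothetical name, not a library function):

```python
def resolve_limits(
    override: dict[str, str],
    default_max_requests: int,
    default_expiry: int,
) -> tuple[int, int]:
    """Return (max_requests, expiry) after applying any override fields."""
    # Each field replaces its default on its own, mirroring the loop
    # in the Lua script; a missing field keeps the default value.
    max_requests = int(override.get("max_requests", default_max_requests))
    expiry = int(override.get("expiry", default_expiry))
    return max_requests, expiry


print(resolve_limits({}, 100, 60))                                       # (100, 60)
print(resolve_limits({"max_requests": "500", "expiry": "120"}, 100, 60))  # (500, 120)
print(resolve_limits({"max_requests": "500"}, 100, 60))                   # (500, 60)
```

The last call shows the partial case: a hash that only sets max_requests keeps the default window length.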
Element-based counting
Not all requests are equal. A distance matrix call with 10 origins and 5 destinations should count as 50 elements, not 1 request.
The increment_amount parameter in the Lua script makes this trivial. Instead of INCR, we use INCRBY:

    # Request: 10 origins x 5 destinations = 50 elements
    rows, cols = len(request.origins), len(request.destinations)
    await apply_element_rate_limit(
        request, response,
        max_requests=1000,  # 1000 elements per minute
        expiry=60,
        increment_amount=rows * cols,  # This request costs 50
        redis_client=redis,
    )

Element counters use a key_suffix (default: /elements) to maintain separate counters from regular request counting. Same endpoint, two independent limits:

    ratelimit:/api/matrix:1:42            → request counter
    ratelimit:/api/matrix:/elements:1:42  → element counter

Monitor mode
Rolling out rate limits on existing APIs is nerve-wracking. You want to know who would be affected before actually blocking anyone.
Monitor mode (RATE_LIMIT_MODE=monitor) runs the full rate limit logic (counting, checking, setting headers) but never returns a 429. Instead, it tags the Datadog root span:

    if result.is_over_limit and mode == RateLimitMode.monitor:
        span = tracer.current_root_span()
        if span:
            span.set_tag("ratelimit.over_limit", endpoint_path)

Clients still see RateLimit-Remaining: 0 in response headers, so they can self-throttle if they’re polite. But the server won’t enforce it.
Three modes, set via the Bender manifest:
| Mode | Counts | Headers | Blocks | Datadog tag |
|---|---|---|---|---|
| on | Yes | Yes | Yes (429) | No |
| off | No | No | No | No |
| monitor | Yes | Yes | No | Yes |
The setting is read dynamically on every request. No restart needed to switch modes.
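The table above can be read as a small decision function. The sketch below is an assumption about the shape of that logic, not the library's code: `Decision` and `decide` are hypothetical names, and `RateLimitMode` is redefined here only so the example is self-contained.

```python
from dataclasses import dataclass
from enum import Enum


class RateLimitMode(str, Enum):
    on = "on"
    off = "off"
    monitor = "monitor"


@dataclass(frozen=True)
class Decision:
    count: bool         # run the Lua script at all
    send_headers: bool  # attach RateLimit-* headers
    block: bool         # respond with 429
    tag_span: bool      # tag the root span for observability


def decide(mode: RateLimitMode, is_over_limit: bool) -> Decision:
    if mode is RateLimitMode.off:
        # Nothing is counted, so nothing else can happen either.
        return Decision(False, False, False, False)
    return Decision(
        count=True,
        send_headers=True,
        block=is_over_limit and mode is RateLimitMode.on,
        tag_span=is_over_limit and mode is RateLimitMode.monitor,
    )


print(decide(RateLimitMode.monitor, is_over_limit=True))
# Decision(count=True, send_headers=True, block=False, tag_span=True)
```

Blocking and tagging only apply to requests that are actually over the limit; under-limit requests look the same in on and monitor modes.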
Framework wrappers
FastAPI: three entry points
FastAPI gets the most complete support with three ways to rate limit:
1. ProjectRateLimiter: the recommended default. Uses dependency injection, extracts project identity from the authenticated request, supports overrides and monitor mode:

    @router.get(
        "/search",
        dependencies=[Depends(ProjectRateLimiter(max_requests=100, expiry=60))],
    )
    async def search():
        ...

It follows the template method pattern: make_key() and make_override_key() can be overridden without touching the rate limit logic itself:

    class GeoRateLimiter(ProjectRateLimiter):
        def make_key(self, request: Request) -> str | None:
            region = request.headers.get("X-Region", "default")
            base = super().make_key(request)
            return f"{base}:{region}" if base else None

2. PathRateLimiter: simpler, no project isolation. Useful for public or webhook endpoints where there’s no authenticated project:

    @router.post(
        "/webhook",
        dependencies=[Depends(PathRateLimiter(max_requests=10, expiry=60))],
    )
    async def webhook():
        ...

3. Programmatic API: apply_rate_limit() and apply_element_rate_limit() for when limits depend on request content:

    async def process_batch(request: Request, response: Response):
        # Request.json() is a coroutine in FastAPI/Starlette, so it must be awaited
        payload = await request.json()
        batch_size = len(payload["items"])
        await apply_element_rate_limit(
            request, response,
            max_requests=500, expiry=60,
            increment_amount=batch_size,
            redis_client=redis,
        )

Django: decorator-based
Django uses a @rate_limit decorator that auto-detects sync vs. async views:

    @authenticate_key()
    @rate_limit(max_requests=100, expiry=60)
    def my_view(request):
        return JsonResponse({"status": "ok"})

The endpoint name is auto-detected from the URL pattern (e.g. /api/v1/datasets/<int:dataset_id>/search). Django middleware catches RateLimitExceeded exceptions and returns a 429 with the proper rate limit headers.
Both frameworks share the same Lua script, the same key generation functions, and the same result handling logic. The wrappers are genuinely thin.
The result object
Every rate limit check produces a RateLimitResult:

    @dataclass(frozen=True)
    class RateLimitResult:
        is_over_limit: bool
        request_count: int
        ttl: int
        max_requests: int

        @property
        def remaining(self) -> int:
            return max(0, self.max_requests - self.request_count)

        def to_headers(self) -> dict[str, str]:
            return {
                "RateLimit-Limit": str(self.max_requests),
                "RateLimit-Remaining": str(self.remaining),
                "RateLimit-Reset": str(self.ttl),
            }

Frozen dataclass, immutable, with a convenience method for standard rate limit headers. Every response (except when mode is off) gets these three headers so clients can implement their own backoff.
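The four values returned by the Lua script line up with the four fields of the result object. A minimal sketch of that mapping, assuming the reply arrives as a list of integers; `result_from_reply` is a hypothetical helper, and the dataclass is repeated in trimmed form only to keep the example self-contained:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitResult:
    is_over_limit: bool
    request_count: int
    ttl: int
    max_requests: int

    @property
    def remaining(self) -> int:
        return max(0, self.max_requests - self.request_count)


def result_from_reply(reply: list[int]) -> RateLimitResult:
    """Map {is_over_limit, count, ttl, max_requests} onto the dataclass."""
    is_over_limit, count, ttl, max_requests = reply
    # Redis returns the flag as 0 or 1; convert it to a real bool.
    return RateLimitResult(bool(is_over_limit), int(count), int(ttl), int(max_requests))


print(result_from_reply([0, 3, 57, 100]).remaining)     # 97
print(result_from_reply([1, 990, 30, 1000]).remaining)  # 10
```

The second reply models a rejected element-based request: with the script as shown, the counter stays at 990, so remaining is computed as 10 even though the request was over the limit.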
What made this work
A few design decisions that paid off:
Atomic override resolution. The Lua script checks overrides inside the same call that does the counting. No separate Redis round-trip means no race condition between “check the override” and “apply the limit.”
Optional parameters with backwards-compatible defaults. override_key="" and increment_amount=1 mean the same script handles simple path limiting, per-project limiting, and element counting. No script duplication.
Dynamic configuration. get_rate_limit_mode() calls into Bender on every request. No cached settings, no restarts. Switch from monitor to on when you’re confident.
Separate counters via key suffix. Element counting doesn’t interfere with request counting. Same endpoint, independent limits, same Lua script.
The whole thing is tested with fakeredis[lua], which actually executes the Lua script, so the tests are as close to production behavior as you can get without a real Redis.