Rate limiting with a single Lua script

February 10, 2026

We needed rate limiting across all our services: FastAPI, Django, you name it. The usual approach is to grab a library, wire it up per-framework, and accept the slight differences in behavior between them. We went a different way: one Lua script that runs atomically in Redis, with everything else being thin wrappers around it.

This is how the rate limiting system in application-kit works, including per-project overrides, element-based counting, and a monitor mode for gradual rollouts.

The problem with two rate limiters

Early on, we had two Lua scripts: one for basic path-based rate limiting and another for per-project limits with override support. They did almost the same thing but diverged just enough to be annoying. Bug fixes had to land in two places. Behavior wasn’t always consistent. The classic duplication trap.

The fix was to collapse both into a single script (PROJECT_RATE_LIMITER_LUA) with optional parameters that fall back to sensible defaults. No override key? It behaves like a simple limiter. Pass one? It checks per-project overrides atomically.

One Lua script to rule them all

The core idea: every rate limit check is a single atomic Redis operation. One round-trip per check, and no race condition between reading the count and incrementing it.

The unified Lua script (simplified)

local rate_limit_key = KEYS[1]
local override_key = KEYS[2]
local default_max_requests = tonumber(ARGV[1])
local default_expiry = tonumber(ARGV[2])
local increment_amount = tonumber(ARGV[3]) or 1

-- Check for per-project override
local max_requests = default_max_requests
local expiry = default_expiry

-- Guard against a missing KEYS[2] (nil) as well as the empty string
if override_key and override_key ~= "" then
    local override = redis.call('HGETALL', override_key)
    if #override > 0 then
        -- Parse hash fields into max_requests and expiry
        for i = 1, #override, 2 do
            if override[i] == "max_requests" then
                max_requests = tonumber(override[i + 1])
            elseif override[i] == "expiry" then
                expiry = tonumber(override[i + 1])
            end
        end
    end
end

-- Get current count
local count = tonumber(redis.call('GET', rate_limit_key)) or -1
local is_over_limit = 0

if count == -1 then
    -- First request in this window
    redis.call('SET', rate_limit_key, increment_amount)
    redis.call('EXPIRE', rate_limit_key, expiry)
    count = increment_amount
    if count > max_requests then
        is_over_limit = 1
    end
elseif count + increment_amount > max_requests then
    is_over_limit = 1
else
    count = redis.call('INCRBY', rate_limit_key, increment_amount)
end

local ttl = redis.call('TTL', rate_limit_key)
return {is_over_limit, count, ttl, max_requests}

The script takes two keys and three arguments:

KEYS[1]: the counter key (e.g. ratelimit:/api/search:1:42)
KEYS[2]: the override key (empty string means “no overrides”)
ARGV[1..3]: default max requests, expiry in seconds, and increment amount

Everything after that is Redis doing its thing. One EVALSHA, one atomic operation, four values back.
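To make the three branches easier to follow, here is a pure-Python model of the script's control flow. It is a sketch for reading, not the production code: InMemoryStore and check_rate_limit are hypothetical names, the store is a plain dict rather than Redis, and the TTL does not tick down.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for Redis: just enough state to mirror the script."""
    counters: dict[str, int] = field(default_factory=dict)
    ttls: dict[str, int] = field(default_factory=dict)
    hashes: dict[str, dict[str, str]] = field(default_factory=dict)


def check_rate_limit(store, rate_limit_key, override_key,
                     default_max_requests, default_expiry, increment_amount=1):
    """Mirror the Lua branches; returns (is_over_limit, count, ttl, max_requests)."""
    max_requests, expiry = default_max_requests, default_expiry

    # A per-project override replaces the defaults when the hash exists.
    if override_key:
        override = store.hashes.get(override_key, {})
        if "max_requests" in override:
            max_requests = int(override["max_requests"])
        if "expiry" in override:
            expiry = int(override["expiry"])

    count = store.counters.get(rate_limit_key)
    is_over_limit = 0
    if count is None:
        # First request in this window: create the counter and start the TTL.
        count = increment_amount
        store.counters[rate_limit_key] = count
        store.ttls[rate_limit_key] = expiry
        if count > max_requests:
            is_over_limit = 1
    elif count + increment_amount > max_requests:
        # Rejected requests leave the stored counter untouched.
        is_over_limit = 1
    else:
        count += increment_amount
        store.counters[rate_limit_key] = count

    return (is_over_limit, count, store.ttls[rate_limit_key], max_requests)


store = InMemoryStore()
key = "ratelimit:/api/search:1:42"
replies = [check_rate_limit(store, key, "", 2, 60) for _ in range(3)]
# The third call is rejected and the count stays at 2.
```

One thing the model makes visible: in the rejecting branch the counter is not incremented, so a blocked request does not use up any budget.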

Redis key layout

The key format puts the endpoint first and IDs at the end:

Counter:  ratelimit:{endpoint}:{org_id}:{project_id}
Override: ratelimit_override:{endpoint}:{org_id}:{project_id}

This ordering matters. Endpoint names can contain colons (like api:v1:search), so we parse keys from the right: the last two segments are always org_id:project_id. The parse_override_key() function handles this reliably, stripping the known prefix and extracting integers from the tail.
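A minimal sketch of right-to-left parsing, assuming the prefix and layout shown above. The real parse_override_key() may differ in its return type and error handling; this version only illustrates why splitting from the right keeps colons in the endpoint intact.

```python
from typing import Optional

OVERRIDE_PREFIX = "ratelimit_override:"


def parse_override_key(key: str) -> Optional[tuple[str, int, int]]:
    """Split an override key into (endpoint, org_id, project_id), or None if malformed."""
    if not key.startswith(OVERRIDE_PREFIX):
        return None
    body = key[len(OVERRIDE_PREFIX):]
    # Split at most twice from the right, so colons inside the endpoint survive.
    parts = body.rsplit(":", 2)
    if len(parts) != 3:
        return None
    endpoint, org_id, project_id = parts
    ids_ok = all(p.isascii() and p.isdigit() for p in (org_id, project_id))
    if not endpoint or not ids_ok:
        return None
    return endpoint, int(org_id), int(project_id)


parsed = parse_override_key("ratelimit_override:api:v1:search:1:42")
# parsed == ("api:v1:search", 1, 42)
```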

Counters auto-expire via Redis TTL. When the window ends, the key vanishes and the next request starts fresh.

Per-project overrides

The override system uses Redis hashes. An override key like ratelimit_override:/api/search:1:42 stores:

max_requests = "500"
expiry = "120"

Because the Lua script checks for overrides inside the same atomic call, there’s no window where a request could slip through with stale limits. If an override exists, it replaces the defaults. If it doesn’t, the defaults apply. Zero ambiguity.

Overrides have their own optional TTL (independent of the rate limit window), so you can set a temporary elevated limit that auto-expires:

Setting a temporary override

await set_rate_limit_override(
    redis, org_id=1, project_id=42,
    endpoint="/api/search",
    max_requests=500, expiry=120,
    override_ttl=86400  # Override expires in 24h, limit window is still 120s
)
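Under the hood, a call like this plausibly comes down to one HSET plus an optional EXPIRE on the override key. The sketch below assumes exactly that; set_override_sketch and RecordingRedis are hypothetical names, and the recording client stands in for a redis.asyncio client so the example runs without a server.

```python
import asyncio
from typing import Any, Optional


async def set_override_sketch(redis: Any, *, org_id: int, project_id: int,
                              endpoint: str, max_requests: int, expiry: int,
                              override_ttl: Optional[int] = None) -> str:
    """Write the override hash, then give the hash its own TTL if requested."""
    key = f"ratelimit_override:{endpoint}:{org_id}:{project_id}"
    await redis.hset(key, mapping={"max_requests": str(max_requests),
                                   "expiry": str(expiry)})
    if override_ttl is not None:
        # This TTL belongs to the override itself, not to the counter window.
        await redis.expire(key, override_ttl)
    return key


class RecordingRedis:
    """Hypothetical test double that records calls instead of talking to Redis."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    async def hset(self, name, mapping):
        self.calls.append(("HSET", name, mapping))

    async def expire(self, name, time):
        self.calls.append(("EXPIRE", name, time))


client = RecordingRedis()
override_key = asyncio.run(set_override_sketch(
    client, org_id=1, project_id=42, endpoint="/api/search",
    max_requests=500, expiry=120, override_ttl=86400,
))
```

The values are stored as strings, which matches the hash layout above and the tonumber() calls in the Lua script.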

Management functions (set, get, delete, list, clear) are provided, but they are designed for admin and provisioning services. Not every consumer needs them.

Element-based counting

Not all requests are equal. A distance matrix call with 10 origins and 5 destinations should count as 50 elements, not 1 request.

The increment_amount parameter in the Lua script makes this trivial. Instead of INCR, we use INCRBY:

Element-based rate limiting

# Request: 10 origins x 5 destinations = 50 elements
rows, cols = len(request.origins), len(request.destinations)

await apply_element_rate_limit(
    request, response,
    max_requests=1000,        # 1000 elements per minute
    expiry=60,
    increment_amount=rows * cols,  # This request costs 50
    redis_client=redis,
)

Element counters use a key_suffix (default: /elements) to maintain separate counters from regular request counting. Same endpoint, two independent limits:

ratelimit:/api/matrix:1:42           → request counter
ratelimit:/api/matrix:/elements:1:42 → element counter
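The two keys above can be produced by one builder with an optional suffix. This is a sketch that assumes the key layout shown in this post; make_counter_key and element_cost are hypothetical helper names, not the library's API.

```python
def make_counter_key(endpoint: str, org_id: int, project_id: int,
                     key_suffix: str = "") -> str:
    """Build a counter key; a non-empty suffix selects a separate counter."""
    name = f"{endpoint}:{key_suffix}" if key_suffix else endpoint
    # IDs always go last so keys can be parsed from the right.
    return f"ratelimit:{name}:{org_id}:{project_id}"


def element_cost(origins: list, destinations: list) -> int:
    """A matrix request costs one element per origin/destination pair."""
    return len(origins) * len(destinations)


request_key = make_counter_key("/api/matrix", 1, 42)
element_key = make_counter_key("/api/matrix", 1, 42, key_suffix="/elements")
cost = element_cost(["o"] * 10, ["d"] * 5)
```

Because the suffix sits between the endpoint and the IDs, the last two segments are still org_id:project_id.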

Monitor mode

Rolling out rate limits on existing APIs is nerve-wracking. You want to know who would be affected before actually blocking anyone.

Monitor mode (RATE_LIMIT_MODE=monitor) runs the full rate limit logic (counting, checking, setting headers) but never returns a 429. Instead, it tags the Datadog root span:

Monitor mode behavior

if result.is_over_limit and mode == RateLimitMode.monitor:
    span = tracer.current_root_span()
    if span:
        span.set_tag("ratelimit.over_limit", endpoint_path)

Clients still see RateLimit-Remaining: 0 in response headers, so they can self-throttle if they’re polite. But the server won’t enforce it.

Three modes, set via the Bender manifest:

| Mode | Counts | Headers | Blocks | Datadog tag |
|---|---|---|---|---|
| `on` | Yes | Yes | Yes (429) | No |
| `off` | No | No | No | No |
| `monitor` | Yes | Yes | No | Yes |

The setting is read dynamically on every request. No restart needed to switch modes.
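The table can be read as a small decision function. The sketch below is one way to encode it: the Outcome type and decide() are hypothetical, and the RateLimitMode enum is redefined here only to keep the example self-contained.

```python
from dataclasses import dataclass
from enum import Enum


class RateLimitMode(str, Enum):
    on = "on"
    off = "off"
    monitor = "monitor"


@dataclass(frozen=True)
class Outcome:
    counts: bool        # run the Lua script and update the counter
    sends_headers: bool # attach RateLimit-* headers to the response
    blocks: bool        # answer with a 429
    tags_span: bool     # tag the Datadog root span


def decide(mode: RateLimitMode, is_over_limit: bool) -> Outcome:
    """Map a mode and a check result to what the wrapper should do."""
    if mode is RateLimitMode.off:
        return Outcome(False, False, False, False)
    enforce = mode is RateLimitMode.on
    return Outcome(
        counts=True,
        sends_headers=True,
        blocks=enforce and is_over_limit,
        tags_span=(not enforce) and is_over_limit,
    )


monitor_hit = decide(RateLimitMode.monitor, is_over_limit=True)
```

Blocking and tagging only happen when the request is actually over the limit; the Yes/No columns in the table describe what each mode is allowed to do.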

Framework wrappers

FastAPI: three entry points

FastAPI gets the most complete support with three ways to rate limit:

1. ProjectRateLimiter: the recommended default. Uses dependency injection, extracts project identity from the authenticated request, supports overrides and monitor mode:

FastAPI ProjectRateLimiter

@router.get(
    "/search",
    dependencies=[Depends(ProjectRateLimiter(max_requests=100, expiry=60))],
)
async def search():
    ...

It follows the template method pattern: make_key() and make_override_key() can be overridden without touching the rate limit logic itself:

Custom rate limiter via subclassing

class GeoRateLimiter(ProjectRateLimiter):
    def make_key(self, request: Request) -> str | None:
        region = request.headers.get("X-Region", "default")
        base = super().make_key(request)
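        # Note: the region is appended after the IDs, so this counter key
        # no longer ends in org_id:project_id.
        return f"{base}:{region}" if base else None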

2. PathRateLimiter: simpler, no project isolation. Useful for public or webhook endpoints where there’s no authenticated project:

FastAPI PathRateLimiter

@router.post(
    "/webhook",
    dependencies=[Depends(PathRateLimiter(max_requests=10, expiry=60))],
)
async def webhook():
    ...

3. Programmatic API: apply_rate_limit() and apply_element_rate_limit() for when limits depend on request content:

Runtime-determined limits

async def process_batch(request: Request, response: Response):
    body = await request.json()  # Request.json() is a coroutine and must be awaited
    batch_size = len(body["items"])
    await apply_element_rate_limit(
        request, response,
        max_requests=500, expiry=60,
        increment_amount=batch_size,
        redis_client=redis,
    )

Django: decorator-based

Django uses a @rate_limit decorator that auto-detects sync vs. async views:

Django rate limiting

@authenticate_key()
@rate_limit(max_requests=100, expiry=60)
def my_view(request):
    return JsonResponse({"status": "ok"})

The endpoint name is auto-detected from the URL pattern (e.g. /api/v1/datasets/<int:dataset_id>/search). Django middleware catches RateLimitExceeded exceptions and returns a 429 with the proper rate limit headers.

Both frameworks share the same Lua script, the same key generation functions, and the same result handling logic. The wrappers are genuinely thin.
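Auto-detecting sync vs. async views usually comes down to inspect.iscoroutinefunction. The sketch below shows that pattern in isolation, with no Django involved: rate_limit_sketch and its check argument are hypothetical, and the check is synchronous here to keep the example short (an async view would more likely await an async Redis call).

```python
import asyncio
import functools
import inspect


class RateLimitExceeded(Exception):
    """Stand-in for the exception the middleware turns into a 429."""


def rate_limit_sketch(check):
    """Decorator factory; `check(request)` raises RateLimitExceeded when over limit."""
    def decorator(view):
        if inspect.iscoroutinefunction(view):
            @functools.wraps(view)
            async def async_wrapper(request, *args, **kwargs):
                check(request)
                return await view(request, *args, **kwargs)
            return async_wrapper

        @functools.wraps(view)
        def sync_wrapper(request, *args, **kwargs):
            check(request)
            return view(request, *args, **kwargs)
        return sync_wrapper
    return decorator


seen = []


@rate_limit_sketch(seen.append)
def sync_view(request):
    return "sync ok"


@rate_limit_sketch(seen.append)
async def async_view(request):
    return "async ok"


sync_result = sync_view("r1")
async_result = asyncio.run(async_view("r2"))
```

The wrapper keeps the view's own flavor: a sync view stays sync, an async view stays a coroutine function, so the framework treats it the same way as before decoration.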

The result object

Every rate limit check produces a RateLimitResult:

RateLimitResult

@dataclass(frozen=True)
class RateLimitResult:
    is_over_limit: bool
    request_count: int
    ttl: int
    max_requests: int

    @property
    def remaining(self) -> int:
        return max(0, self.max_requests - self.request_count)

    def to_headers(self) -> dict[str, str]:
        return {
            "RateLimit-Limit": str(self.max_requests),
            "RateLimit-Remaining": str(self.remaining),
            "RateLimit-Reset": str(self.ttl),
        }

A frozen (immutable) dataclass with a convenience method for the standard rate limit headers. Every response gets these three headers unless the mode is off, so clients can implement their own backoff.
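To connect this back to the Lua script: the four-element reply maps onto the four fields in order. The sketch below assumes that mapping; result_from_reply is a hypothetical helper, and the dataclass is repeated only so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitResult:
    is_over_limit: bool
    request_count: int
    ttl: int
    max_requests: int

    @property
    def remaining(self) -> int:
        return max(0, self.max_requests - self.request_count)

    def to_headers(self) -> dict[str, str]:
        return {
            "RateLimit-Limit": str(self.max_requests),
            "RateLimit-Remaining": str(self.remaining),
            "RateLimit-Reset": str(self.ttl),
        }


def result_from_reply(reply) -> RateLimitResult:
    """Convert the script's {is_over_limit, count, ttl, max_requests} reply."""
    is_over_limit, count, ttl, max_requests = (int(v) for v in reply)
    return RateLimitResult(bool(is_over_limit), count, ttl, max_requests)


headers = result_from_reply([0, 58, 41, 100]).to_headers()
# {"RateLimit-Limit": "100", "RateLimit-Remaining": "42", "RateLimit-Reset": "41"}

rejected = result_from_reply([1, 990, 12, 1000])
# rejected.is_over_limit is True while rejected.remaining is 10
```

The second reply shows a detail of element counting with the script as shown: a request costing 50 elements can be rejected while 10 elements of budget remain, so RateLimit-Remaining is not always 0 on a rejected element request.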

What made this work

A few design decisions that paid off:

Atomic override resolution. The Lua script checks overrides inside the same call that does the counting. No separate Redis roundtrip means no race condition between “check the override” and “apply the limit.”

Optional parameters with backwards-compatible defaults. override_key="" and increment_amount=1 mean the same script handles simple path limiting, per-project limiting, and element counting. No script duplication.

Dynamic configuration. get_rate_limit_mode() calls into Bender on every request. No cached settings, no restarts. Switch from monitor to on when you’re confident.

Separate counters via key suffix. Element counting doesn’t interfere with request counting. Same endpoint, independent limits, same Lua script.

The whole thing is tested with fakeredis[lua], which actually executes the Lua script, so the tests are as close to production behavior as you can get without a real Redis.