Rate Limiting in Go: Token Bucket, Sliding Window, and Redis
Uncontrolled traffic kills backends. Rate limiting is not optional in production — it’s the difference between a degraded service and a complete outage. Here’s how to implement it correctly in Go.
The Wrong Way: In-Memory Single Instance
```go
// Looks fine, breaks in production
var (
	mu       sync.Mutex
	requests = make(map[string]int)
)

func isAllowed(clientID string) bool {
	mu.Lock()
	defer mu.Unlock()
	requests[clientID]++
	return requests[clientID] <= 100 // 100 per... when? This has no window.
}
```
Problems: there is no time window (the counter never resets, so a client that hits 100 requests is blocked forever), it doesn’t coordinate across multiple instances, and the map grows without bound — a memory leak.
Token Bucket with golang.org/x/time/rate
The `golang.org/x/time/rate` package — maintained by the Go team, though not technically part of the standard library — is the go-to choice for single-instance rate limiting:
```go
import (
	"sync"

	"golang.org/x/time/rate"
)

type RateLimiter struct {
	limiters sync.Map // key -> *rate.Limiter
	r        rate.Limit
	b        int
}

func NewRateLimiter(rps float64, burst int) *RateLimiter {
	return &RateLimiter{r: rate.Limit(rps), b: burst}
}

func (rl *RateLimiter) getLimiter(key string) *rate.Limiter {
	if v, ok := rl.limiters.Load(key); ok {
		return v.(*rate.Limiter)
	}
	// LoadOrStore avoids the check-then-act race where two goroutines
	// each create a limiter and one keeps using an orphaned copy.
	v, _ := rl.limiters.LoadOrStore(key, rate.NewLimiter(rl.r, rl.b))
	return v.(*rate.Limiter)
}

func (rl *RateLimiter) Allow(key string) bool {
	return rl.getLimiter(key).Allow()
}
```
```go
// Middleware
func RateLimitMiddleware(rl *RateLimiter) func(http.Handler) http.Handler {
	return func(next http.Handler) http.Handler {
		return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
			// RemoteAddr is "host:port"; strip the port so every connection
			// from the same client maps to the same key.
			clientIP, _, err := net.SplitHostPort(r.RemoteAddr)
			if err != nil {
				clientIP = r.RemoteAddr
			}
			if !rl.Allow(clientIP) {
				w.Header().Set("Retry-After", "1")
				http.Error(w, "rate limit exceeded", http.StatusTooManyRequests)
				return
			}
			next.ServeHTTP(w, r)
		})
	}
}
```
Distributed Rate Limiting with Redis
For multi-instance deployments, Redis becomes the source of truth. The sliding window log algorithm fits naturally in a Lua script, which Redis executes atomically — no race between the cleanup, the count, and the insert:
```go
const rateLimitScript = `
local key = KEYS[1]
local window = tonumber(ARGV[1]) -- window in seconds
local limit = tonumber(ARGV[2])
local now = tonumber(ARGV[3])    -- current timestamp in ms

-- Remove entries outside the window
redis.call('ZREMRANGEBYSCORE', key, '-inf', now - window * 1000)

-- Count current entries
local count = redis.call('ZCARD', key)
if count < limit then
	-- Add this request. Caveat: sorted-set members must be unique, so two
	-- requests in the same millisecond collide here; production code should
	-- append a random suffix to the member.
	redis.call('ZADD', key, now, now)
	redis.call('EXPIRE', key, window)
	return 1 -- allowed
else
	return 0 -- denied
end
`
```
```go
type RedisRateLimiter struct {
	rdb    *redis.Client
	script *redis.Script
	window time.Duration
	limit  int
}

func NewRedisRateLimiter(rdb *redis.Client, window time.Duration, limit int) *RedisRateLimiter {
	return &RedisRateLimiter{
		rdb:    rdb,
		script: redis.NewScript(rateLimitScript), // runs via EVALSHA, falls back to EVAL
		window: window,
		limit:  limit,
	}
}

func (r *RedisRateLimiter) Allow(ctx context.Context, key string) (bool, error) {
	now := time.Now().UnixMilli()
	result, err := r.script.Run(ctx, r.rdb,
		[]string{"ratelimit:" + key},
		int(r.window.Seconds()),
		r.limit,
		now,
	).Int()
	if err != nil {
		// Fail open: if Redis is down, allow the request rather than
		// turning a Redis outage into a full API outage.
		return true, err
	}
	return result == 1, nil
}
```
Fixed Window vs. Sliding Window
```
Fixed Window: 100 req/min

[ 0s ........ 59s ] [ 60s ........ 119s ]
  100 requests → OK   100 more → OK

But: 100 at t=59, 100 at t=61 = 200 requests in 2 seconds!

Sliding Window: always looks back exactly 60 seconds
```
The sliding window is much fairer, at a slightly higher cost per request.
Adaptive Rate Limiting
The smartest approach: back off when the system is stressed.
```go
// MetricsCollector is whatever exposes live health signals
// (Prometheus, an internal stats package, etc.).
type MetricsCollector interface {
	P99Latency() time.Duration
	ErrorRate() float64
}

type AdaptiveRateLimiter struct {
	base    *RedisRateLimiter
	metrics MetricsCollector
}

func (a *AdaptiveRateLimiter) effectiveLimit() int {
	p99Latency := a.metrics.P99Latency()
	errorRate := a.metrics.ErrorRate()
	switch {
	case errorRate > 0.05 || p99Latency > 2*time.Second:
		return a.base.limit / 4 // severe throttle
	case errorRate > 0.01 || p99Latency > 500*time.Millisecond:
		return a.base.limit / 2 // moderate throttle
	default:
		return a.base.limit
	}
}
```
Response Headers (Do This)
Always tell clients when to retry:
```go
func writeRateLimitHeaders(w http.ResponseWriter, limit, remaining int, reset time.Time) {
	w.Header().Set("X-RateLimit-Limit", strconv.Itoa(limit)) // pass the limit in; don't hardcode it
	w.Header().Set("X-RateLimit-Remaining", strconv.Itoa(remaining))
	w.Header().Set("X-RateLimit-Reset", strconv.FormatInt(reset.Unix(), 10))
	w.Header().Set("Retry-After", strconv.Itoa(int(time.Until(reset).Seconds())))
}
Summary
| Algorithm | Pros | Cons | Use When |
|---|---|---|---|
| Token Bucket | Smooth, allows bursts | In-memory as shown here | Single instance |
| Fixed Window | Simple, fast | Boundary burst problem | Coarse limits |
| Sliding Window Log | Accurate | O(N) memory per client | Strict fairness |
| Sliding Window Counter | Approximate, cheap | Slight inaccuracy | High-volume APIs |
| Adaptive | Self-healing | Complex | Production SLO protection |
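The sliding window counter row deserves one note: it approximates the log with just two fixed-window counters, weighting the previous window by how much of it still overlaps the trailing window. A sketch of the estimate (function name is illustrative):

```go
package main

import "fmt"

// slidingCounterEstimate approximates requests in the trailing window by
// weighting the previous fixed-window count by its remaining overlap.
// elapsedFrac is how far we are into the current window (0..1).
func slidingCounterEstimate(prevCount, currCount int, elapsedFrac float64) float64 {
	return float64(prevCount)*(1-elapsedFrac) + float64(currCount)
}

func main() {
	// 100 requests in the previous minute, 20 so far, 25% into this minute:
	// estimate = 100*0.75 + 20 = 95 requests in the trailing 60s.
	fmt.Println(slidingCounterEstimate(100, 20, 0.25)) // 95
}
```

Two integers per client instead of one timestamp per request — that is the "approximate, cheap" trade-off from the table.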