Design a distributed rate limiting system that supports 100,000 QPS globally with precise limiting (error ≤1%), supporting multiple algorithms (Token Bucket, Leaky Bucket, Sliding Window, Fixed Window), dynamic threshold adjustment, and hot-key isolation. Explain core data structures, storage choices, consistency guarantees, fault tolerance, and performance optimization strategies.
分类: technical
难度: hard
标签:
答题技巧
["Explain why Redis+Lua is insufficient for ultra-precise global rate limiting at this scale","Compare multi-level caching (local + distributed), CRDT, database vs Redis+HyperLogLog+Bloom filter combinations","Discuss boundary issues and memory explosion in sliding window implementations","Common approaches to hot-key isolation and their trade-offs","Control plane design for dynamic threshold adjustment (config center, push vs pull)","Fault tolerance: single point of failure, split-brain, network partition behavior","Performance testing focus: QPS, P99 latency, memory, CPU, network bandwidth"]
参考答案
Recommended approach: local token bucket/sliding window + Redis Bitmap/HyperLogLog for coarse limiting + centralized precise counter (multi-replica + CRDT or small Raft strong-consistent cluster). Hot-key isolation via consistent hashing + local cache + request coloring. Dynamic thresholds via config center + long polling/change listening for second-level effectiveness. Fault tolerance with multi-active deployment + read-write separation + fallback to local limiting.