The system is suffering from severe slow queries: P99 latency has jumped from 300 ms to 12 s. Logs show that most of the slow SQL statements hit the same orders table (~1.8 billion rows). Provide a complete diagnosis and optimization plan covering, among other things: slow SQL identification, execution plan analysis, index selection, table structure optimization, sharding strategy, read-write separation, caching design, and SQL rewriting. Rank the steps by priority and estimate the benefit of each.
Category: technical
Difficulty: hard
Tags:
Answering tips
["Start with monitoring tools (slow query log, PMM, tracing) to locate the Top 5 slow SQLs","How to read the key signals in EXPLAIN ANALYZE output (rows examined, key used, filesort, temporary table, etc.)","Relative priority of covering indexes, index condition pushdown, JOIN order, and subquery-to-JOIN rewrites","When to introduce sharding versus trying hot-cold separation or historical table archiving first","Suitable sharding keys for different business scenarios (user vs. time vs. order status)","Value of multi-level caching (local + distributed + hot-key preloading) in this scenario","Examples of order-of-magnitude improvements from SQL rewriting"]
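The multi-level caching tip can be made concrete with a small sketch. The Python below is an illustration under stated assumptions, not a production design: `TwoLevelCache`, its `loader` callback, and the TTL value are hypothetical names, and the shared tier is a plain dictionary standing in for Redis so the block runs without external services.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class TwoLevelCache:
    """Cache-aside read path: local memory, then shared tier, then the database."""

    def __init__(self, loader: Callable[[str], str], local_ttl: float = 5.0):
        self.loader = loader          # hypothetical DB lookup, e.g. SELECT by order id
        self.local_ttl = local_ttl    # short TTL keeps per-process copies fresh
        self.local: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.shared: Dict[str, str] = {}  # assumption: stand-in for Redis GET/SET
        self.db_reads = 0             # counts how often the database is hit

    def _load(self, key: str) -> str:
        self.db_reads += 1
        return self.loader(key)

    def get(self, key: str) -> str:
        now = time.monotonic()
        hit = self.local.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                      # level 1: in-process hit
        value = self.shared.get(key)           # level 2: shared tier
        if value is None:
            value = self._load(key)            # miss on both tiers: read the database
            self.shared[key] = value
        self.local[key] = (now + self.local_ttl, value)
        return value

    def preload(self, keys: Iterable[str]) -> None:
        """Warm the shared tier with known hot keys before traffic arrives."""
        for key in keys:
            self.shared[key] = self._load(key)

    def invalidate(self, key: str) -> None:
        """Call after a write so later reads go back to the database."""
        self.local.pop(key, None)
        self.shared.pop(key, None)


cache = TwoLevelCache(loader=lambda order_id: f"order-{order_id}")
cache.preload(["7"])           # one database read to warm the hot key
first = cache.get("42")        # miss on both tiers, one more database read
second = cache.get("42")       # served from the local tier
print(first, second, cache.db_reads)
```

The short local TTL is a deliberate trade-off: it bounds how stale a per-process copy can get without requiring cross-process invalidation, while the shared tier absorbs most of the load that would otherwise reach the orders table.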
Reference answer
Priority:
1. Enable the slow query log and tracing to locate the Top 10 slow SQLs (surfaces the problem immediately).
2. Add or optimize covering indexes and make use of index condition pushdown (expect P99 to fall to about 2-4s).
3. Put a cache-aside layer in front of hot data (local cache + Redis, with hot-key preloading; expect a further 50-70% reduction).
4. Serve high-frequency queries from materialized views or summary tables (a further 30-60% reduction).
5. Separate hot and cold data and archive historical orders (shrinks the table over the long term).
6. Only then consider composite sharding by user + time (reduces the size of each physical table).
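Step 2 of the priority order, the covering index, can be illustrated with a small runnable sketch. It uses Python's built-in `sqlite3` purely as a stand-in so the example needs no database server; the column names (`user_id`, `created_at`, `status`, `amount`), the index name, and the query are assumptions, since the question does not give the real schema. On MySQL the equivalent check would be to run `EXPLAIN` on the real statement and look for `Using index` in the `Extra` column.

```python
import sqlite3

# Hypothetical hot query: latest orders for one user.
QUERY = (
    "SELECT status, amount FROM orders "
    "WHERE user_id = ? ORDER BY created_at DESC LIMIT 20"
)


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan detail


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER,"
    " status TEXT,"
    " amount REAL,"
    " created_at TEXT)"
)

# Without a suitable index the planner has to scan the whole table.
plan_before = query_plan(conn, QUERY, (42,))

# Equality column first, then the sort column, then the selected columns,
# so the query can be answered from the index alone.
conn.execute(
    "CREATE INDEX idx_orders_user_created "
    "ON orders (user_id, created_at, status, amount)"
)
plan_after = query_plan(conn, QUERY, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The column order in the index is the design choice worth explaining in an answer: the equality filter leads, the sort column follows so no separate sort is needed, and the selected columns trail so the base table rows are never read.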