
SQL Query Optimization — Write Faster Queries with Indexes and Explain


Mian Ali Khalid · 7 min read

Slow SQL queries are one of the most common causes of application performance problems. Most slow queries share the same root causes: missing indexes, unnecessary full table scans, N+1 query patterns, or inefficient joins. Here’s how to identify and fix them.

Use the SQL Formatter to format and clean up your queries before analyzing them.

EXPLAIN: understand what the database does

EXPLAIN shows the query execution plan — how the database plans to retrieve your data.

PostgreSQL EXPLAIN ANALYZE

EXPLAIN ANALYZE
SELECT users.name, COUNT(orders.id) AS order_count
FROM users
LEFT JOIN orders ON users.id = orders.user_id
WHERE users.created_at > '2024-01-01'
GROUP BY users.id, users.name
ORDER BY order_count DESC
LIMIT 10;

Output (simplified):

Limit  (cost=2834.67..2834.69 rows=10 width=48) (actual time=112.3..112.3 rows=10 loops=1)
  -> Sort  (cost=2834.67..2844.67 rows=4000 width=48) (actual time=112.3..112.3 rows=10 loops=1)
       Sort Key: (count(orders.id)) DESC
       -> HashAggregate  (cost=2600.00..2640.00 rows=4000 width=48)
            -> Hash Left Join  (cost=400.00..2400.00 rows=40000 width=16)
                 Hash Cond: (orders.user_id = users.id)
                 -> Seq Scan on orders  (cost=0..1200 rows=40000 width=8)
                 -> Hash  (cost=300..300 rows=8000 width=44)
                      -> Seq Scan on users  (cost=0..300 rows=8000 width=44)
                           Filter: (created_at > '2024-01-01')

Key things to look for:

  • Seq Scan: Full table scan — usually needs an index
  • cost= — estimated cost (lower is better)
  • actual time= — real execution time in milliseconds
  • rows= — estimated vs actual rows (a large discrepancy usually means stale statistics; run ANALYZE to refresh them)
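You can get a feel for plans without a full PostgreSQL setup. Here's a minimal sketch using Python's built-in SQLite, whose EXPLAIN QUERY PLAN output is much simpler than PostgreSQL's but shows the same scan-vs-index distinction (the schema is illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# No index on email yet: the plan reports a full table scan.
before = plan("SELECT * FROM users WHERE email = 'alice@example.com'")

conn.execute("CREATE INDEX idx_users_email ON users(email)")

# Same query now searches the index instead.
after = plan("SELECT * FROM users WHERE email = 'alice@example.com'")
```

The detail string flips from SCAN (SQLite's equivalent of a Seq Scan) to SEARCH ... USING INDEX once the index exists.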

MySQL EXPLAIN

EXPLAIN SELECT id, email FROM users WHERE email = 'alice@example.com';
id | select_type | table | type  | key          | rows | Extra
---|-------------|-------|-------|--------------|------|-------
1  | SIMPLE      | users | ref   | idx_email    | 1    | Using index

Key columns:

  • type: const > eq_ref > ref > range > index > ALL (ALL = full scan, bad)
  • key: which index is used (NULL = no index)
  • rows: estimated rows scanned
  • Extra: "Using index" means a covering index: the query is answered from the index alone, without touching the table

Adding indexes

An index is the single highest-impact optimization for most queries.

When to add an index

-- WHERE clause on a column with many distinct values:
SELECT * FROM users WHERE email = 'alice@example.com';
-- → Create index on email

-- JOIN condition columns:
SELECT * FROM users JOIN orders ON users.id = orders.user_id;
-- → orders.user_id should be indexed (users.id is the primary key, already indexed)

-- ORDER BY column (avoids sort):
SELECT * FROM posts ORDER BY created_at DESC LIMIT 10;
-- → Create index on created_at

-- Frequent filter combinations:
SELECT * FROM orders WHERE user_id = 1 AND status = 'pending';
-- → Create composite index on (user_id, status)

Creating indexes

-- Single column index:
CREATE INDEX idx_users_email ON users(email);
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_posts_created_at ON posts(created_at DESC);

-- Unique index (also enforces uniqueness):
CREATE UNIQUE INDEX idx_users_email_unique ON users(email);

-- Composite index (order matters — most selective first):
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Partial index (only index a subset):
CREATE INDEX idx_orders_pending ON orders(user_id) WHERE status = 'pending';
-- Only indexes pending orders — much smaller index

-- Expression index:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- For: WHERE LOWER(email) = 'alice@example.com'
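Partial indexes are easy to sanity-check locally, since SQLite supports them too. A sketch with Python's built-in sqlite3 module, reusing the table and index names from the snippet above (illustrative, not PostgreSQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY,"
             " user_id INTEGER, status TEXT)")
# Partial index: only rows with status = 'pending' are indexed.
conn.execute("CREATE INDEX idx_orders_pending ON orders(user_id)"
             " WHERE status = 'pending'")

# The query's WHERE clause implies the index's WHERE clause,
# so the planner can use the partial index.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders"
    " WHERE user_id = 1 AND status = 'pending'"
).fetchone()[3]
```

The plan detail names idx_orders_pending; a query without the status = 'pending' condition could not use it.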

Composite index column order

For WHERE user_id = 1 AND status = 'pending' AND created_at > '2024-01-01':

-- Best: equality-tested columns first (most selective leading), range column last
CREATE INDEX idx_orders_user_status_date ON orders(user_id, status, created_at);

-- This index also helps for:
-- WHERE user_id = 1
-- WHERE user_id = 1 AND status = 'pending'
-- But NOT for: WHERE status = 'pending' alone (can't skip the first column)
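The leftmost-prefix rule can be checked against a real query planner. A sketch using Python's built-in SQLite (table and index names mirror the example; the extra total column just keeps SELECT * from being index-covered):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY,"
             " user_id INTEGER, status TEXT, total REAL)")
conn.execute("CREATE INDEX idx_orders_user_status ON orders(user_id, status)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows are (id, parent, notused, detail).
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchone()[3]

# Filtering on the leading column can use the composite index...
with_prefix = plan("SELECT * FROM orders WHERE user_id = 1")

# ...but filtering on status alone cannot skip user_id: full scan.
without_prefix = plan("SELECT * FROM orders WHERE status = 'pending'")
```

The first plan searches idx_orders_user_status; the second falls back to scanning the table.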

Common slow query patterns

N+1 query problem

// WRONG: 1 query for users + N queries for each user's orders:
const users = await db.query('SELECT * FROM users LIMIT 10');
for (const user of users) {
  user.orders = await db.query('SELECT * FROM orders WHERE user_id = ?', [user.id]);
}
// Result: 11 queries

// CORRECT: 2 queries (or 1 with JOIN):
const users = await db.query('SELECT * FROM users LIMIT 10');
const userIds = users.map(u => u.id);
const orders = await db.query('SELECT * FROM orders WHERE user_id IN (?)', [userIds]);
// Map orders to users in JS
// Result: 2 queries
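The same two-query fix sketched in Python against an in-memory SQLite database (schema and row counts are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(i, f"user{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO orders (user_id) VALUES (?)",
                 [(i,) for i in range(1, 11) for _ in range(3)])  # 3 orders each

# Query 1: the users.
users = conn.execute("SELECT id, name FROM users LIMIT 10").fetchall()
user_ids = [u[0] for u in users]

# Query 2: all their orders at once, via one IN (...) list.
placeholders = ",".join("?" * len(user_ids))
rows = conn.execute(
    f"SELECT id, user_id FROM orders WHERE user_id IN ({placeholders})",
    user_ids,
).fetchall()

# Group orders by user in application code instead of per-user queries.
orders_by_user = {uid: [] for uid in user_ids}
for order_id, uid in rows:
    orders_by_user[uid].append(order_id)
```

Two round trips total, regardless of how many users are on the page.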

SELECT * performance

-- SLOW: fetches all columns including large TEXT/BLOB columns:
SELECT * FROM posts;

-- FAST: only fetch what you need:
SELECT id, title, summary, author_id, created_at FROM posts;

Missing pagination

-- WRONG: returns all rows (millions on large tables):
SELECT * FROM orders WHERE status = 'completed';

-- CORRECT: paginate with LIMIT and OFFSET:
SELECT * FROM orders WHERE status = 'completed' ORDER BY id LIMIT 100 OFFSET 0;

-- BETTER: keyset pagination (fast for large offsets):
SELECT * FROM orders WHERE status = 'completed' AND id > 50000 ORDER BY id LIMIT 100;
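Keyset pagination sketched end to end with Python's built-in SQLite (250 illustrative rows): each page filters on id > last_id instead of OFFSET, so the database never re-skips rows it already returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("completed",) for _ in range(250)])

def fetch_page(last_id, page_size=100):
    # Resume strictly after the last id seen on the previous page.
    return conn.execute(
        "SELECT id FROM orders WHERE status = 'completed' AND id > ? "
        "ORDER BY id LIMIT ?",
        (last_id, page_size),
    ).fetchall()

pages, last_id = [], 0
while True:
    page = fetch_page(last_id)
    if not page:
        break
    pages.append(page)
    last_id = page[-1][0]   # keyset cursor for the next page
```

250 rows come back as pages of 100, 100, and 50; an index on the keyset column keeps every page equally fast.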

Function in WHERE clause breaks indexes

-- WRONG: function on column prevents index use:
SELECT * FROM users WHERE YEAR(created_at) = 2024;
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';

-- CORRECT: use range condition instead:
SELECT * FROM users WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

-- Or create a function-based index:
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
SELECT * FROM users WHERE LOWER(email) = 'alice@example.com';
-- Now the function index is used
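Expression indexes also exist in SQLite, so the fix can be verified with Python's built-in sqlite3 module (illustrative schema; the query's expression must match the indexed expression):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email_lower ON users(LOWER(email))")

# The WHERE expression matches the indexed expression, so the planner
# can search the expression index instead of scanning the table.
detail = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM users"
    " WHERE LOWER(email) = 'alice@example.com'"
).fetchone()[3]
```

The plan detail reports a SEARCH using idx_users_email_lower rather than a SCAN.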

Query caching and connection pooling

// Connection pooling — reuse connections (mysql2 client assumed):
const mysql = require('mysql2/promise');

const pool = mysql.createPool({
  host: 'localhost',
  user: 'root',
  database: 'mydb',
  connectionLimit: 10  // Max 10 concurrent connections
});

// Results caching for expensive queries (Redis, via the ioredis client):
const Redis = require('ioredis');
const redis = new Redis();

async function getUserStats(userId) {
  const cacheKey = `user:stats:${userId}`;
  const cached = await redis.get(cacheKey);
  if (cached) return JSON.parse(cached);
  
  const stats = await db.query(
    'SELECT COUNT(*) as orders, SUM(total) as revenue FROM orders WHERE user_id = ?',
    [userId]
  );
  
  await redis.setex(cacheKey, 300, JSON.stringify(stats));  // Cache 5 minutes
  return stats;
}
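The same cache-aside pattern can be sketched without Redis at all, using an in-process dict with expiry times (the fetch_user_stats function is a stand-in for the real aggregate query):

```python
import time

_cache = {}      # key -> (expires_at, value)
_db_calls = 0    # counts how often we hit the "database"

def fetch_user_stats(user_id):
    """Stand-in for the expensive aggregate query."""
    global _db_calls
    _db_calls += 1
    return {"orders": 3, "revenue": 120.0}

def get_user_stats(user_id, ttl=300):
    key = f"user:stats:{user_id}"
    hit = _cache.get(key)
    if hit and hit[0] > time.monotonic():
        return hit[1]                           # cache hit: no DB round trip
    stats = fetch_user_stats(user_id)
    _cache[key] = (time.monotonic() + ttl, stats)
    return stats

first = get_user_stats(1)    # miss: queries the "database"
second = get_user_stats(1)   # hit: served from cache within the TTL
calls_after_two_lookups = _db_calls
```

Two lookups, one database call. An in-process cache is per-server and lost on restart, which is why shared stores like Redis are used in practice.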

Written by Mian Ali Khalid. Part of the Dev Productivity pillar.