SQL Subqueries — Correlated, Scalar, and IN vs EXISTS
SQL subqueries nest one query inside another. Learn the difference between correlated and non-correlated subqueries, scalar subqueries, and when to use IN vs EXISTS vs JOIN for...
Subqueries allow you to use the result of one query inside another. They’re flexible but can be slow if used where a JOIN would work better. Understanding when to use each form is key to performant SQL.
Use the SQL Formatter to format and indent complex queries with nested subqueries.
Types of subqueries
Non-correlated (independent) subquery
Runs once, its result is used by the outer query:
-- Find customers who have ever placed an order:
SELECT name, email
FROM customers
WHERE id IN (
SELECT DISTINCT customer_id FROM orders
);
-- Inner SELECT runs once, outer SELECT filters using its result
Correlated subquery
References the outer query and runs once per outer row:
-- Find customers whose most recent order was over $1000:
SELECT c.name, c.email
FROM customers c
WHERE (
SELECT MAX(total)
FROM orders o
WHERE o.customer_id = c.id -- references outer c.id
) > 1000;
-- Inner SELECT runs once per customer row — can be slow on large tables
Scalar subquery
Returns exactly one row and one column — can be used anywhere a single value is expected:
-- Show each product with the count of orders containing it:
SELECT
p.name,
p.price,
(SELECT COUNT(*) FROM order_items oi WHERE oi.product_id = p.id) AS order_count
FROM products p
ORDER BY order_count DESC;
IN vs EXISTS
IN
-- IN: executes inner query, collects values, checks membership
SELECT * FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 100);
-- Problem: IN(NULL) always returns FALSE (can cause unexpected results)
-- If orders had NULL customer_ids, those rows are silently excluded
EXISTS
-- EXISTS: returns TRUE as soon as first matching row is found
-- Usually faster than IN for large result sets
SELECT * FROM customers c
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.customer_id = c.id AND o.total > 100
);
-- EXISTS handles NULLs correctly
-- Also use NOT EXISTS instead of NOT IN (NOT IN with NULLs is buggy):
-- WRONG (can return empty result if any order has NULL customer_id):
SELECT * FROM customers WHERE id NOT IN (SELECT customer_id FROM orders);
-- RIGHT:
SELECT * FROM customers c
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
Subquery vs JOIN
-- Subquery approach:
SELECT name FROM customers
WHERE id IN (SELECT customer_id FROM orders WHERE total > 500);
-- JOIN approach (often faster, optimizer handles it better):
SELECT DISTINCT c.name
FROM customers c
JOIN orders o ON o.customer_id = c.id
WHERE o.total > 500;
-- When subquery is cleaner (aggregation):
SELECT c.name, c.email
FROM customers c
WHERE (SELECT SUM(total) FROM orders WHERE customer_id = c.id) > 10000;
-- vs. GROUP BY approach:
SELECT c.name, c.email
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.id, c.name, c.email
HAVING SUM(o.total) > 10000;
Subqueries in FROM (derived tables)
-- Aggregate first, then filter on the aggregate:
SELECT category, avg_price
FROM (
SELECT category, AVG(price) AS avg_price
FROM products
GROUP BY category
) AS category_stats
WHERE avg_price > 50;
-- Common pattern: get top-N per group
SELECT *
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rn
FROM products
) ranked
WHERE rn <= 3; -- Top 3 most expensive per category
WITH clause (CTE) — cleaner than nested subqueries
-- Complex nested subquery:
SELECT c.name
FROM customers c
WHERE c.id IN (
SELECT customer_id
FROM orders
WHERE total > (SELECT AVG(total) FROM orders)
);
-- Same query as CTE (much cleaner):
WITH avg_order AS (
SELECT AVG(total) AS avg_total FROM orders
),
high_value_orders AS (
SELECT customer_id
FROM orders, avg_order
WHERE total > avg_total
)
SELECT c.name
FROM customers c
JOIN high_value_orders h ON h.customer_id = c.id;
Lateral joins (PostgreSQL)
A lateral join lets the subquery reference the outer query, like a correlated subquery, but can return multiple rows:
-- Get the 3 most recent orders for each customer (PostgreSQL):
SELECT c.name, o.total, o.created_at
FROM customers c
CROSS JOIN LATERAL (
SELECT total, created_at
FROM orders
WHERE customer_id = c.id
ORDER BY created_at DESC
LIMIT 3
) o;
Related tools
- SQL Formatter — format and indent SQL queries
- SQL Window Functions — ROW_NUMBER, RANK, LAG/LEAD
Related posts
- SQL Style Guide: The Things That Actually Matter — Keyword casing, indentation, naming conventions, CTEs vs subqueries — the practi…
- SQL CTEs (Common Table Expressions) — WITH Clause Guide — Common Table Expressions (CTEs) use the WITH clause to create named temporary re…
- SQL Formatter Online — Format and Beautify SQL Queries — A SQL formatter adds consistent indentation and line breaks to SQL queries, maki…
- SQL Joins Explained — INNER, LEFT, RIGHT, and FULL JOIN — SQL joins combine rows from multiple tables. INNER JOIN returns matching rows, L…
- SQL Query Optimization — Write Faster Queries with Indexes and Explain — Slow SQL queries are usually missing indexes, using table scans, or written inef…
Related tool
Format and beautify SQL queries. 12 dialect options (PostgreSQL, MySQL, SQLite, MSSQL, BigQuery, Snowflake, and more). Keyword casing control.
Written by Mian Ali Khalid. Part of the Dev Productivity pillar.