Subqueries
Learn how to use nested queries (subqueries) to solve complex data problems.
What is a Subquery?
A subquery is a query nested inside another query. It can return:
- A single value (scalar)
- A single column of values (list)
- A full result set (table)
Subquery Types
| Type | Returns | Used In |
|---|---|---|
| Scalar | Single value | SELECT, WHERE |
| List | Column of values | IN, NOT IN |
| Table | Multiple columns/rows | FROM clause |
| Correlated | Depends on outer query | WHERE, SELECT |
Scalar Subqueries
Returns a single value. Can be used anywhere a single value is expected.
Test Case: Compare to Average
-- Find invoices above average amount
SELECT
id,
tranid,
amount,
(SELECT AVG(amount) FROM transaction WHERE type = 'CustInvc') AS avg_amount
FROM transaction
WHERE type = 'CustInvc'
AND amount > (SELECT AVG(amount) FROM transaction WHERE type = 'CustInvc')
ORDER BY amount DESC
LIMIT 20
Sample Result:
| id | tranid | amount | avg_amount |
|---|---|---|---|
| 501 | INV-501 | 25,000 | 1,960 |
| 502 | INV-502 | 18,500 | 1,960 |
| 503 | INV-503 | 15,200 | 1,960 |
Test Case: Latest Order Date per Customer
-- Customer with their latest order date
SELECT
c.id,
c.companyname,
(SELECT MAX(trandate)
FROM transaction t
WHERE t.entity = c.id AND t.type = 'SalesOrd'
) AS last_order_date
FROM customer c
WHERE c.isinactive = 'F'
ORDER BY c.companyname
LIMIT 20
Subqueries with IN
Returns a list of values for comparison.
Test Case: Customers with Orders
-- Find customers who have placed at least one order
SELECT id, companyname, email
FROM customer
WHERE id IN (
SELECT DISTINCT entity
FROM transaction
WHERE type = 'SalesOrd'
)
ORDER BY companyname
Test Case: Items Never Sold
-- Find items that have never been sold
SELECT id, itemid, displayname
FROM item
WHERE isinactive = 'F'
AND id NOT IN (
SELECT DISTINCT item
FROM transactionline tl
INNER JOIN transaction t ON tl.transaction = t.id
WHERE t.type = 'SalesOrd'
AND tl.item IS NOT NULL
)
ORDER BY itemid
Test Case: Multi-Level IN
-- Find customers in states where we have sales reps
SELECT id, companyname, state
FROM customer
WHERE state IN (
SELECT DISTINCT state
FROM customer c2
INNER JOIN employee e ON c2.salesrep = e.id
WHERE e.isinactive = 'F'
)
ORDER BY state, companyname
Subqueries with EXISTS
Tests whether a subquery returns any rows. Often more efficient than IN.
Test Case: Customers WITH Orders (EXISTS)
-- Customers who have placed orders (using EXISTS)
SELECT c.id, c.companyname
FROM customer c
WHERE EXISTS (
SELECT 1
FROM transaction t
WHERE t.entity = c.id
AND t.type = 'SalesOrd'
)
ORDER BY c.companyname
Test Case: Customers WITHOUT Orders (NOT EXISTS)
-- Customers who have never ordered (using NOT EXISTS)
SELECT c.id, c.companyname, c.datecreated
FROM customer c
WHERE NOT EXISTS (
SELECT 1
FROM transaction t
WHERE t.entity = c.id
AND t.type = 'SalesOrd'
)
ORDER BY c.datecreated DESC
EXISTS vs IN: When to Use
| Use EXISTS When | Use IN When |
|---|---|
| Subquery returns many rows | Subquery returns few rows |
| You only need to check existence | You need the actual values |
| Subquery is correlated | Subquery is independent |
| Better for large datasets | Simpler syntax |
Correlated Subqueries
A subquery that references columns from the outer query. Executes once per outer row.
Illustration
Test Case: Customer's Order Count
-- Each customer with their order count (correlated)
SELECT
c.id,
c.companyname,
(SELECT COUNT(*)
FROM transaction t
WHERE t.entity = c.id
AND t.type = 'SalesOrd'
) AS order_count
FROM customer c
WHERE c.isinactive = 'F'
ORDER BY order_count DESC
LIMIT 20
Test Case: Above Average for Their Category
-- Items priced above their category average
SELECT
i.id,
i.itemid,
i.displayname,
i.class AS category,
i.baseprice,
(SELECT AVG(baseprice)
FROM item i2
WHERE i2.class = i.class
AND i2.isinactive = 'F'
) AS category_avg
FROM item i
WHERE i.isinactive = 'F'
AND i.baseprice > (
SELECT AVG(baseprice)
FROM item i2
WHERE i2.class = i.class
AND i2.isinactive = 'F'
)
ORDER BY i.class, i.baseprice DESC
Test Case: Latest Order for Each Customer
-- Each customer's most recent order details
SELECT
c.companyname,
t.tranid,
t.trandate,
t.amount
FROM transaction t
INNER JOIN customer c ON t.entity = c.id
WHERE t.type = 'SalesOrd'
AND t.trandate = (
SELECT MAX(t2.trandate)
FROM transaction t2
WHERE t2.entity = t.entity
AND t2.type = 'SalesOrd'
)
ORDER BY t.trandate DESC
LIMIT 20
Subqueries in FROM (Derived Tables)
Use a subquery as a virtual table.
Test Case: Customer Summary Report
-- Join customer data with order aggregates
SELECT
c.companyname,
c.email,
order_stats.order_count,
order_stats.total_sales,
order_stats.last_order
FROM customer c
INNER JOIN (
SELECT
entity,
COUNT(*) AS order_count,
SUM(amount) AS total_sales,
MAX(trandate) AS last_order
FROM transaction
WHERE type = 'SalesOrd'
GROUP BY entity
) order_stats ON c.id = order_stats.entity
ORDER BY order_stats.total_sales DESC
LIMIT 20
Test Case: Ranked Results
-- Top 5 items per category (derived table approach)
SELECT *
FROM (
SELECT
i.class AS category,
i.itemid,
i.displayname,
SUM(tl.amount) AS total_sales,
ROW_NUMBER() OVER (PARTITION BY i.class ORDER BY SUM(tl.amount) DESC) AS rank
FROM item i
INNER JOIN transactionline tl ON i.id = tl.item
INNER JOIN transaction t ON tl.transaction = t.id
WHERE t.type = 'SalesOrd'
GROUP BY i.id, i.class, i.itemid, i.displayname
) ranked
WHERE rank <= 5
ORDER BY category, rank
Subqueries in SELECT
Return a calculated value for each row.
Test Case: Multiple Calculated Columns
-- Customer dashboard with multiple metrics
SELECT
c.id,
c.companyname,
c.balance,
(SELECT COUNT(*) FROM transaction t
WHERE t.entity = c.id AND t.type = 'SalesOrd') AS total_orders,
(SELECT COUNT(*) FROM transaction t
WHERE t.entity = c.id AND t.type = 'CustInvc'
AND t.status != 'C') AS open_invoices,
(SELECT MAX(trandate) FROM transaction t
WHERE t.entity = c.id AND t.type = 'SalesOrd') AS last_order
FROM customer c
WHERE c.isinactive = 'F'
ORDER BY c.balance DESC
LIMIT 20
Test Case: Percentage Calculations
-- Each customer's share of total sales
SELECT
c.companyname,
SUM(t.amount) AS customer_sales,
ROUND(
SUM(t.amount) * 100.0 /
(SELECT SUM(amount) FROM transaction WHERE type = 'SalesOrd'),
2
) AS pct_of_total
FROM transaction t
INNER JOIN customer c ON t.entity = c.id
WHERE t.type = 'SalesOrd'
GROUP BY c.id, c.companyname
ORDER BY customer_sales DESC
LIMIT 10
Practical Examples
Example 1: Find Duplicate Records
-- Find customers with duplicate names
SELECT id, companyname, email
FROM customer
WHERE companyname IN (
SELECT companyname
FROM customer
GROUP BY companyname
HAVING COUNT(*) > 1
)
ORDER BY companyname
Example 2: Gap Analysis
-- Find missing order numbers (gaps in sequence)
SELECT seq_num AS missing_number
FROM (
SELECT LEVEL AS seq_num
FROM DUAL
CONNECT BY LEVEL <= (SELECT MAX(CAST(SUBSTR(tranid, 4) AS INTEGER)) FROM transaction WHERE type = 'SalesOrd')
) all_numbers
WHERE seq_num NOT IN (
SELECT CAST(SUBSTR(tranid, 4) AS INTEGER)
FROM transaction
WHERE type = 'SalesOrd'
AND tranid LIKE 'SO-%'
)
Example 3: Compare Current vs Previous Period
-- Sales comparison: current month vs previous month
SELECT
c.companyname,
(SELECT NVL(SUM(amount), 0) FROM transaction
WHERE entity = c.id AND type = 'SalesOrd'
AND trandate >= TRUNC(SYSDATE, 'MM')) AS current_month,
(SELECT NVL(SUM(amount), 0) FROM transaction
WHERE entity = c.id AND type = 'SalesOrd'
AND trandate >= ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -1)
AND trandate < TRUNC(SYSDATE, 'MM')) AS previous_month
FROM customer c
WHERE c.isinactive = 'F'
AND EXISTS (
SELECT 1 FROM transaction t
WHERE t.entity = c.id
AND t.trandate >= ADD_MONTHS(SYSDATE, -2)
)
ORDER BY current_month DESC
LIMIT 20
Example 4: Customers Needing Follow-up
-- Customers with no orders in last 90 days but had orders before
SELECT c.id, c.companyname, c.email, c.salesrep
FROM customer c
WHERE c.isinactive = 'F'
AND EXISTS (
-- Had orders more than 90 days ago
SELECT 1 FROM transaction t
WHERE t.entity = c.id
AND t.type = 'SalesOrd'
AND t.trandate < SYSDATE - 90
)
AND NOT EXISTS (
-- No orders in last 90 days
SELECT 1 FROM transaction t
WHERE t.entity = c.id
AND t.type = 'SalesOrd'
AND t.trandate >= SYSDATE - 90
)
ORDER BY c.companyname
Performance Tips
Avoid Correlated When Possible
-- SLOWER: Correlated subquery (runs for each row)
SELECT c.companyname,
(SELECT SUM(amount) FROM transaction WHERE entity = c.id) AS total
FROM customer c
-- FASTER: JOIN with aggregation (runs once)
SELECT c.companyname, NVL(SUM(t.amount), 0) AS total
FROM customer c
LEFT JOIN transaction t ON c.id = t.entity
GROUP BY c.id, c.companyname
Use EXISTS Instead of COUNT
-- SLOWER: Counts all matching rows
SELECT * FROM customer c
WHERE (SELECT COUNT(*) FROM transaction t WHERE t.entity = c.id) > 0
-- FASTER: Stops at first match
SELECT * FROM customer c
WHERE EXISTS (SELECT 1 FROM transaction t WHERE t.entity = c.id)
Use CTE for Reused Subqueries
-- BAD: Same subquery twice
SELECT *
FROM customer
WHERE balance > (SELECT AVG(balance) FROM customer)
AND creditlimit < (SELECT AVG(balance) FROM customer) * 2
-- GOOD: Calculate once with CTE
WITH AvgBalance AS (
SELECT AVG(balance) AS avg_bal FROM customer
)
SELECT c.*
FROM customer c, AvgBalance ab
WHERE c.balance > ab.avg_bal
AND c.creditlimit < ab.avg_bal * 2
Quick Reference
Subquery Locations
-- In WHERE clause
SELECT * FROM table1
WHERE column IN (SELECT column FROM table2)
-- In FROM clause (derived table)
SELECT * FROM (SELECT ... FROM table) AS derived
-- In SELECT clause (scalar)
SELECT column, (SELECT ... FROM table2) AS calc FROM table1
-- In HAVING clause
SELECT category, SUM(amount)
FROM table1
GROUP BY category
HAVING SUM(amount) > (SELECT AVG(total) FROM summary)
Common Patterns
-- Find in list
WHERE id IN (SELECT id FROM other_table)
-- Find not in list
WHERE id NOT IN (SELECT id FROM other_table WHERE column IS NOT NULL)
-- Check existence
WHERE EXISTS (SELECT 1 FROM other_table WHERE condition)
-- Compare to aggregate
WHERE amount > (SELECT AVG(amount) FROM same_table)