SQL & Performance

SQL Query Optimization: 7 Techniques for Faster Queries

Learn practical SQL optimization techniques that can dramatically improve query performance in production databases. From indexing strategies to query rewriting patterns.

mdatool Team · February 24, 2026 · 10 min read
Tags: SQL, Performance, Optimization

Introduction

Slow SQL queries cost enterprises millions in lost productivity, degraded user experience, and wasted cloud infrastructure. A single unoptimized query can consume 1000x more resources than necessary, bringing entire applications to their knees during peak load.

This comprehensive guide explores seven battle-tested techniques for optimizing SQL query performance—covering indexing strategies, query rewriting, execution plan analysis, and database tuning patterns used by high-scale applications processing billions of queries daily.


Why SQL Query Optimization Matters

Business Impact (commonly cited industry figures):

  • 1-second delay = 7% conversion loss
  • Page load times > 3 seconds lose 40% of users
  • Slow queries can increase cloud costs by 300-500%
  • Poor database performance costs Fortune 500 companies $300K+ annually per application

Technical Consequences:

  • Database CPU saturation (100% utilization)
  • Connection pool exhaustion (applications can't connect)
  • Lock contention and deadlocks
  • Disk I/O bottlenecks (swapping to disk)
  • Query timeouts and user-facing errors

Resource Waste:

  • Unindexed queries scan entire tables (millions of rows)
  • N+1 query problems generate thousands of redundant queries
  • Inefficient joins create massive intermediate result sets
  • Missing query caching results in repeated expensive calculations

Technique 1: Strategic Index Design

Understanding Index Types

B-Tree Indexes (Default):

  • Best for: Equality searches, range queries, sorting
  • Structure: Balanced tree with sorted values
  • Use case: Primary keys, foreign keys, frequently filtered columns

Example:

-- Poor: Full table scan on 10M rows
SELECT * FROM orders 
WHERE customer_id = 12345;

-- Execution: Seq Scan on orders (cost=0.00..180256.00 rows=100)

-- Optimized: Add B-tree index
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Execution: Index Scan using idx_orders_customer_id (cost=0.42..8.44 rows=100)
-- 20,000x faster!
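The same before/after effect can be observed end to end with a short script. This is a minimal sketch using Python's built-in sqlite3 as a stand-in for PostgreSQL (the toy schema and row counts are mine); SQLite's EXPLAIN QUERY PLAN plays the role of EXPLAIN:

```python
import sqlite3

# Illustrative stand-in: a small SQLite table instead of the 10M-row
# PostgreSQL table above. The principle is the same.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i % 100, 9.99) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN is SQLite's rough analogue of PostgreSQL's EXPLAIN.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer_id = 42"
before_plan = plan(query)  # full table scan ("SCAN orders")

conn.execute("CREATE INDEX idx_orders_customer_id ON orders(customer_id)")
after_plan = plan(query)   # index search ("SEARCH orders USING INDEX idx_orders_customer_id ...")
print(before_plan)
print(after_plan)
```

Creating the index is the only change between the two plans; the query text stays identical.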

Composite Indexes (Multi-Column)

Index Column Order Matters:

-- Query pattern
SELECT * FROM orders 
WHERE customer_id = 12345 
  AND order_date >= '2024-01-01'
ORDER BY order_date DESC;

-- WRONG index order
CREATE INDEX idx_orders_date_customer ON orders(order_date, customer_id);
-- Can't use index for customer_id filtering efficiently

-- CORRECT index order (equality column first, then range/sort column)
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
-- Uses index for both filtering AND sorting

Rule of Thumb: Put equality conditions first, then range conditions, then sort columns.


Covering Indexes (Index-Only Scans)

-- Query needs: customer_id, order_date, total_amount
SELECT customer_id, order_date, total_amount
FROM orders
WHERE customer_id = 12345;

-- Regular index: Must access table heap
CREATE INDEX idx_orders_customer ON orders(customer_id);
-- Execution: Index Scan + Heap Fetches (slower)

-- Covering index: all needed columns in the index (INCLUDE requires PostgreSQL 11+)
CREATE INDEX idx_orders_customer_covering ON orders(customer_id) 
INCLUDE (order_date, total_amount);
-- Execution: Index-Only Scan (faster, no heap access!)

Benefit: Avoids expensive heap lookups, reads only index.
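SQLite has no INCLUDE clause, but the same index-only effect appears when every selected column is part of the index key. In this sketch with Python's sqlite3 (toy schema is mine), EXPLAIN QUERY PLAN reports a COVERING INDEX, SQLite's version of an index-only scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date TEXT,
    total_amount REAL)""")
conn.executemany(
    "INSERT INTO orders (customer_id, order_date, total_amount) VALUES (?, ?, ?)",
    [(i % 50, "2024-01-01", 10.0) for i in range(500)])

# All three selected columns live in the index key, so the table heap
# never has to be touched for this query.
conn.execute("CREATE INDEX idx_orders_covering ON orders(customer_id, order_date, total_amount)")

plan_text = " ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT customer_id, order_date, total_amount "
    "FROM orders WHERE customer_id = 7"))
print(plan_text)  # mentions a COVERING INDEX search
```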


Partial Indexes (Filtered)

-- Only 5% of orders are 'PENDING'
-- Full index wastes 95% of space

-- Wasteful: Index all rows
CREATE INDEX idx_orders_status ON orders(status);

-- Efficient: Index only relevant rows
CREATE INDEX idx_orders_pending ON orders(order_id) 
WHERE status = 'PENDING';

-- Query must match filter
SELECT * FROM orders WHERE status = 'PENDING';
-- Uses partial index!

Benefit: Smaller index size, faster updates, targeted queries.
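SQLite also supports partial indexes, so the pattern can be tried directly. A sketch with Python's sqlite3, where the 5%-PENDING skew is simulated:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, status TEXT)")
# Simulate the skew: 1 in 20 orders is PENDING, the rest are SHIPPED.
conn.executemany("INSERT INTO orders (status) VALUES (?)",
                 [("PENDING" if i % 20 == 0 else "SHIPPED",) for i in range(1000)])

# Index only the rare rows; the index stays tiny and cheap to maintain.
conn.execute("CREATE INDEX idx_orders_pending ON orders(order_id) WHERE status = 'PENDING'")

pending = conn.execute("SELECT COUNT(*) FROM orders WHERE status = 'PENDING'").fetchone()[0]
plan_text = " ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE status = 'PENDING'"))
print(pending)    # 50 of 1000 rows qualify
print(plan_text)  # the planner can reference idx_orders_pending here
```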


When NOT to Index

Anti-Patterns:

  • Low cardinality columns (e.g., gender with 2-3 values)
  • Small tables (<1000 rows)
  • Columns updated frequently (index maintenance overhead)
  • Columns never used in WHERE/JOIN/ORDER BY

-- BAD: Indexing boolean flag
CREATE INDEX idx_users_is_active ON users(is_active);
-- 50% TRUE, 50% FALSE = useless index

-- BETTER: Partial index for minority case
CREATE INDEX idx_users_inactive ON users(user_id) 
WHERE is_active = false;
-- Assuming inactive users are rare

Technique 2: Query Rewriting for Performance

Avoid SELECT *

-- BAD: Retrieves all 50 columns
SELECT * FROM products WHERE product_id = 123;

-- GOOD: Retrieve only needed columns
SELECT product_id, product_name, price 
FROM products 
WHERE product_id = 123;

Impact:

  • Reduces network transfer by 90%
  • Enables covering indexes
  • Lower memory consumption
  • Faster serialization in application layer

Use EXISTS Instead of COUNT

-- BAD: Counts all matching rows (expensive)
SELECT COUNT(*) FROM orders WHERE customer_id = 12345;
IF count > 0 THEN ...

-- GOOD: Stops at first match
SELECT EXISTS(SELECT 1 FROM orders WHERE customer_id = 12345);

-- Even better with LIMIT
SELECT 1 FROM orders WHERE customer_id = 12345 LIMIT 1;

Performance: EXISTS stops at first match, COUNT scans all rows.
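The semantics are easy to verify. A minimal sqlite3 sketch (toy data is mine) shows EXISTS collapsing to a 1/0 flag instead of a full count:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER)")
conn.executemany("INSERT INTO orders (customer_id) VALUES (?)",
                 [(12345,), (12345,), (12345,), (999,)])

# EXISTS can stop at the first matching row; no need to count them all.
has_orders = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM orders WHERE customer_id = ?)", (12345,)).fetchone()[0]
no_orders = conn.execute(
    "SELECT EXISTS(SELECT 1 FROM orders WHERE customer_id = ?)", (777,)).fetchone()[0]
print(has_orders, no_orders)  # 1 0
```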


Rewrite Subqueries as JOINs

-- SLOW: Correlated subquery (runs for EACH row)
SELECT 
    c.customer_name,
    (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.customer_id) as order_count
FROM customers c;

-- FAST: JOIN with GROUP BY (single scan)
SELECT 
    c.customer_name,
    COUNT(o.order_id) as order_count
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name;

Speedup: often 100-1000x on large datasets, since the correlated subquery executes once per customer row.


Use UNION ALL Instead of UNION

-- SLOW: UNION removes duplicates (requires sort)
SELECT customer_id FROM orders_2023
UNION
SELECT customer_id FROM orders_2024;

-- FAST: UNION ALL keeps duplicates (no sort)
SELECT customer_id FROM orders_2023
UNION ALL
SELECT customer_id FROM orders_2024;

Use UNION ALL when:

  • You know there are no duplicates
  • Duplicates are acceptable
  • Tables are partitions (inherently distinct)

Avoid Functions on Indexed Columns

-- BAD: Function prevents index usage
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
-- Execution: Seq Scan (index on email not used)

-- GOOD: Use expression index
CREATE INDEX idx_users_email_upper ON users(UPPER(email));
SELECT * FROM users WHERE UPPER(email) = 'JOHN@EXAMPLE.COM';
-- Execution: Index Scan

-- BETTER: Store uppercase version (generated column, PostgreSQL 12+)
ALTER TABLE users ADD COLUMN email_upper VARCHAR(255) 
GENERATED ALWAYS AS (UPPER(email)) STORED;

CREATE INDEX idx_users_email_upper ON users(email_upper);
SELECT * FROM users WHERE email_upper = 'JOHN@EXAMPLE.COM';

Technique 3: JOIN Optimization

JOIN Order Matters

-- Database joins in this order:
-- 1. customers (1M rows)
-- 2. orders (100M rows) 
-- 3. order_items (500M rows)

-- SLOW: Joins largest tables first
SELECT c.customer_name, oi.product_id, oi.quantity
FROM order_items oi
JOIN orders o ON oi.order_id = o.order_id
JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id = 12345;

-- FAST: Filter early, join smaller result sets
SELECT c.customer_name, oi.product_id, oi.quantity
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
WHERE c.customer_id = 12345;

Modern optimizers usually reorder JOINs themselves, but filtering early and joining from the most selective table keeps intent clear and helps when planner statistics are stale.


Use Appropriate JOIN Types

JOIN Type       | Use Case                 | Performance
----------------|--------------------------|------------
INNER JOIN      | Only matching rows       | Fastest
LEFT JOIN       | All left rows + matches  | Moderate
RIGHT JOIN      | All right rows + matches | Moderate
FULL OUTER JOIN | All rows from both       | Slowest
CROSS JOIN      | Cartesian product        | Avoid!

-- BAD: LEFT JOIN when INNER JOIN suffices
SELECT o.order_id, c.customer_name
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id;
-- Returns orders with NULL customers (probably unintended)

-- GOOD: INNER JOIN (faster, correct)
SELECT o.order_id, c.customer_name
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;

Avoid Cartesian Products

-- DISASTER: No JOIN condition = Cartesian product
SELECT * FROM orders, customers;
-- 1M orders × 100K customers = 100 BILLION rows!

-- CORRECT: Proper JOIN condition
SELECT * FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id;

Technique 4: Execution Plan Analysis

Reading EXPLAIN Output

-- Get query plan
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;

-- Get plan with actual execution stats
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345;

Sample Output:

Seq Scan on orders  (cost=0.00..180256.00 rows=100 width=120)
  Filter: (customer_id = 12345)
Planning Time: 0.123 ms
Execution Time: 2341.567 ms

Key Metrics:

  • Seq Scan: Full table scan (BAD for large tables)
  • Index Scan: Uses index (GOOD)
  • cost=0.00..180256.00: Estimated cost (higher = slower)
  • rows=100: Estimated rows returned
  • Execution Time: Actual runtime

Identifying Problems

Red Flags in Execution Plans:

  1. Seq Scan on large tables

     Seq Scan on orders  (cost=0.00..180256.00 rows=100000)

     Fix: Add index

  2. High loop counts

     Nested Loop  (cost=0.00..50000.00 rows=1000 loops=10000)

     Fix: Rewrite as JOIN or use a different index

  3. Large sort operations

     Sort  (cost=145000.00..148000.00 rows=1000000)

     Fix: Add an index matching the ORDER BY to avoid the sort

  4. Hash joins on small tables

     Hash Join  (cost=1000.00..2000.00 rows=100)

     Fix: A nested loop can be faster for small result sets


Using mdatool SQL Linter

The [mdatool SQL Linter](https://www.mdatool.com/sql-linter) automatically detects common query anti-patterns:

Checks for:

  • SELECT * usage
  • Missing indexes on JOIN columns
  • Missing indexes on WHERE columns
  • Functions on indexed columns
  • Cartesian products
  • Subquery optimization opportunities

Example:

-- Paste this into SQL Linter:
SELECT * FROM orders WHERE YEAR(order_date) = 2024;

-- Detected issues:
-- ⚠️ Avoid SELECT * - specify columns
-- ⚠️ Function on indexed column prevents index usage
-- ✅ Suggested fix: WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01'

Try the SQL Linter for free at mdatool.com


Technique 5: Database Statistics and Maintenance

Update Statistics Regularly

-- PostgreSQL: Analyze tables
ANALYZE orders;
ANALYZE customers;

-- Or all tables
ANALYZE;

-- Check when last analyzed
SELECT 
    schemaname,
    relname AS tablename,
    last_analyze,
    last_autoanalyze
FROM pg_stat_user_tables
ORDER BY last_analyze NULLS FIRST;

Why it matters: Query planner needs accurate statistics to choose optimal execution plans.


Vacuum to Reclaim Space

-- PostgreSQL: Vacuum to remove dead rows
VACUUM orders;

-- Vacuum with analyze
VACUUM ANALYZE orders;

-- Check table bloat
SELECT 
    schemaname,
    relname AS tablename,
    pg_size_pretty(pg_total_relation_size(relid)) AS size,
    n_dead_tup
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;

Index Maintenance

-- Rebuild fragmented indexes (PostgreSQL)
REINDEX INDEX idx_orders_customer_id;

-- Or entire table
REINDEX TABLE orders;

-- Find unused indexes
SELECT 
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
  AND indexrelname NOT LIKE '%pkey'
ORDER BY pg_relation_size(indexrelid) DESC;

Action: Drop unused indexes to improve INSERT/UPDATE performance.


Technique 6: Caching Strategies

Application-Level Caching

// Cache query results in Redis
const cacheKey = `customer:${customerId}:orders`;

// Check cache first
const cached = await redis.get(cacheKey);
if (cached) {
    return JSON.parse(cached);  // cache hit: skip the database entirely
}

// Cache miss - query database
const orders = await db.query(
    'SELECT * FROM orders WHERE customer_id = $1',
    [customerId]
);

// Store in cache (5 minute TTL)
await redis.setex(cacheKey, 300, JSON.stringify(orders));

return orders;

Benefit: at a 99% cache hit rate, only 1% of reads reach the database, roughly a 100x load reduction.


Prepared Statements (Cached Execution Plans)

-- PostgreSQL: Prepared statements (cached execution plans)
PREPARE get_customer_orders (INT) AS
SELECT * FROM orders WHERE customer_id = $1;

EXECUTE get_customer_orders(12345);

Materialized Views

-- Expensive aggregation query
SELECT 
    DATE_TRUNC('day', order_date) as day,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('day', order_date);

-- Create materialized view (pre-computed results)
CREATE MATERIALIZED VIEW daily_sales AS
SELECT 
    DATE_TRUNC('day', order_date) as day,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
GROUP BY DATE_TRUNC('day', order_date);

-- Query materialized view (instant results)
SELECT * FROM daily_sales WHERE day >= '2024-01-01';

-- Refresh periodically
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_sales;

Speedup: 1000x faster for complex aggregations.
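SQLite has no materialized views, but the same pre-computation pattern can be sketched with a summary table that a scheduled job rebuilds (the table names and data below are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, total_amount REAL)")
conn.executemany("INSERT INTO orders (order_date, total_amount) VALUES (?, ?)",
                 [("2024-01-01", 10.0)] * 3 + [("2024-01-02", 20.0)] * 2)

def refresh_daily_sales():
    # Rebuild the pre-computed summary, mimicking REFRESH MATERIALIZED VIEW.
    conn.execute("DROP TABLE IF EXISTS daily_sales")
    conn.execute("""CREATE TABLE daily_sales AS
                    SELECT order_date AS day,
                           COUNT(*) AS order_count,
                           SUM(total_amount) AS revenue
                    FROM orders
                    GROUP BY order_date""")

refresh_daily_sales()  # in production this would run on a schedule
rows = list(conn.execute("SELECT day, order_count, revenue FROM daily_sales ORDER BY day"))
print(rows)
```

Reads then hit the small pre-aggregated table instead of re-scanning orders on every request.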


Technique 7: Batch Operations and Pagination

Batch Inserts

-- SLOW: Individual inserts (10,000 round trips)
FOR each row:
    INSERT INTO orders (customer_id, total) VALUES (123, 99.99);

-- FAST: Batch insert (1 round trip)
INSERT INTO orders (customer_id, total) VALUES
    (123, 99.99),
    (124, 149.99),
    (125, 79.99),
    ... -- 10,000 rows
    (10123, 199.99);

Speedup: 100-1000x faster.
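From application code, the batch shape usually means one parameterized statement executed over all rows inside a single transaction. A sqlite3 sketch (driver and row counts are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

rows = [(100 + i, 9.99 + i) for i in range(10_000)]

# One transaction and one prepared statement reused for every row,
# instead of 10,000 separate statements each paying commit overhead.
with conn:
    conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)", rows)

inserted = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(inserted)  # 10000
```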


Efficient Pagination

-- BAD: OFFSET becomes slower as offset increases
SELECT * FROM orders 
ORDER BY order_id 
LIMIT 100 OFFSET 1000000;
-- Scans and discards 1M rows!

-- GOOD: Keyset pagination (cursor-based)
SELECT * FROM orders 
WHERE order_id > 1000000  -- Last seen ID
ORDER BY order_id 
LIMIT 100;
-- Uses index, always fast!
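A keyset-paginated reader then becomes a simple loop that remembers the last key it saw. A sqlite3 sketch (page size and data are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO orders (order_id) VALUES (?)", [(i,) for i in range(1, 1001)])

def fetch_page(last_seen_id, page_size=100):
    # Keyset pagination: seek past the last seen key instead of using OFFSET.
    return [r[0] for r in conn.execute(
        "SELECT order_id FROM orders WHERE order_id > ? ORDER BY order_id LIMIT ?",
        (last_seen_id, page_size))]

pages, last_id = 0, 0
while True:
    page = fetch_page(last_id)
    if not page:
        break
    last_id = page[-1]  # cursor for the next request
    pages += 1
print(pages, last_id)  # 10 1000
```

Every page is an index seek plus a 100-row scan, regardless of how deep into the result set the reader is.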

Bulk Updates

-- SLOW: Row-by-row updates
UPDATE orders SET status = 'SHIPPED' WHERE order_id = 1;
UPDATE orders SET status = 'SHIPPED' WHERE order_id = 2;
...

-- FAST: Single batch update
UPDATE orders 
SET status = 'SHIPPED'
WHERE order_id IN (1, 2, 3, ..., 10000);

-- FASTEST: Set-based update
UPDATE orders 
SET status = 'SHIPPED'
WHERE order_date < CURRENT_DATE - INTERVAL '7 days'
  AND status = 'PENDING';

Common Query Anti-Patterns

Anti-Pattern 1: N+1 Query Problem

-- BAD: 1 query + N queries (101 total queries)
SELECT * FROM customers LIMIT 100;
-- Then for each customer:
SELECT * FROM orders WHERE customer_id = ?;

-- GOOD: 1 query with JOIN
SELECT 
    c.customer_id,
    c.customer_name,
    o.order_id,
    o.total_amount
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
LIMIT 100;
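The difference in round trips can be counted directly. This sqlite3 sketch uses a trace callback to tally every statement the N+1 loop issues versus the single JOIN (schema and sizes are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
conn.executemany("INSERT INTO customers VALUES (?, ?)", [(i, "c%d" % i) for i in range(1, 101)])
conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                 [(i, 5.0) for i in range(1, 101)])

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement executed

# N+1 pattern: 1 query for the customers, then one more per customer.
for (cid,) in conn.execute("SELECT customer_id FROM customers").fetchall():
    conn.execute("SELECT total FROM orders WHERE customer_id = ?", (cid,)).fetchall()
n_plus_one = len(statements)

statements.clear()
# Single JOIN: the same data in one statement.
conn.execute("""SELECT c.customer_id, o.total
                FROM customers c
                LEFT JOIN orders o ON c.customer_id = o.customer_id""").fetchall()
joined = len(statements)
print(n_plus_one, joined)  # 101 1
```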

Anti-Pattern 2: SELECT COUNT(*) on Large Tables

-- SLOW: Full table scan
SELECT COUNT(*) FROM orders;
-- Scans all 100M rows!

-- FAST: Approximate count
SELECT reltuples::BIGINT FROM pg_class WHERE relname = 'orders';
-- Instant, uses statistics (acceptable for dashboards)

-- MEDIUM: Count with WHERE clause uses index
SELECT COUNT(*) FROM orders WHERE created_at >= '2024-01-01';

Anti-Pattern 3: OR Conditions on Different Columns

-- SLOW: Can't use index efficiently
SELECT * FROM products 
WHERE category = 'Electronics' OR brand = 'Samsung';

-- FAST: Rewrite as UNION
SELECT * FROM products WHERE category = 'Electronics'
UNION
SELECT * FROM products WHERE brand = 'Samsung';

Real-World Case Studies

Case Study 1: Product Search

Problem: Product search taking 8 seconds on 10M products

Before:

SELECT * FROM products 
WHERE LOWER(product_name) LIKE '%laptop%'
  OR LOWER(description) LIKE '%laptop%';
-- Full table scan: 8 seconds

After:

-- Add full-text search index
CREATE INDEX idx_products_search ON products 
USING GIN(to_tsvector('english', product_name || ' ' || description));

-- Use full-text search
SELECT * FROM products 
WHERE to_tsvector('english', product_name || ' ' || description) 
      @@ to_tsquery('english', 'laptop');
-- Index scan: 45ms

Results:

  • 180x faster (8s → 45ms)
  • Supports stemming and relevance ranking
  • Handles 10,000 concurrent searches

Case Study 2: Analytics Dashboard

Problem: Daily sales report taking 2 minutes

Before:

SELECT 
    DATE(order_date),
    COUNT(*),
    SUM(total_amount)
FROM orders
WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY DATE(order_date);
-- 120 seconds

After:

-- Materialized view refreshed hourly
CREATE MATERIALIZED VIEW daily_sales_summary AS
SELECT 
    DATE(order_date) as sale_date,
    COUNT(*) as order_count,
    SUM(total_amount) as revenue
FROM orders
GROUP BY DATE(order_date);

-- Query materialized view
SELECT * FROM daily_sales_summary
WHERE sale_date >= CURRENT_DATE - INTERVAL '30 days';
-- 0.2 seconds

Results:

  • 600x faster (120s → 0.2s)
  • Dashboard loads instantly
  • Hourly refresh acceptable for business

Case Study 3: API Response Time

Problem: Customer order history API timing out at 30 seconds

Before:

-- N+1 query: 1 + 1000 queries
SELECT * FROM customers WHERE customer_id = 12345;
-- Then 1000 separate queries for orders

After:

-- Single query with proper JOINs
SELECT 
    c.*,
    json_agg(o.*) as orders
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id = 12345
GROUP BY c.customer_id;
-- 150ms

Results:

  • 200x faster (30s → 150ms)
  • No timeouts
  • Simplified application code

Optimization Checklist

Query Review:

  • Avoid SELECT * - specify needed columns
  • Use EXISTS instead of COUNT for existence checks
  • Rewrite correlated subqueries as JOINs
  • Use UNION ALL instead of UNION when duplicates acceptable
  • Avoid functions on indexed columns

Index Strategy:

  • Create indexes on WHERE clause columns
  • Create indexes on JOIN columns
  • Use composite indexes for multi-column filters
  • Consider covering indexes for frequently queried columns
  • Use partial indexes for filtered queries
  • Drop unused indexes

Execution Plan Analysis:

  • Run EXPLAIN ANALYZE on slow queries
  • Identify sequential scans on large tables
  • Check for high-cost sort operations
  • Verify index usage on JOINs
  • Use SQL Linter for automated analysis

Database Maintenance:

  • Run ANALYZE regularly (daily for active tables)
  • VACUUM to reclaim space
  • Monitor table bloat
  • Rebuild fragmented indexes
  • Review and drop unused indexes

Application Design:

  • Implement query result caching (Redis/Memcached)
  • Use connection pooling
  • Batch INSERT/UPDATE operations
  • Implement cursor-based pagination
  • Avoid N+1 query problems

Tools and Resources

mdatool Query Optimization Tools: the SQL Linter (https://www.mdatool.com/sql-linter) flags common anti-patterns automatically.

Database-Specific Tools:

  • PostgreSQL: pg_stat_statements, EXPLAIN ANALYZE
  • MySQL: EXPLAIN, slow query log
  • SQL Server: Query Store, Execution Plans
  • Oracle: AWR reports, SQL Tuning Advisor

Monitoring:

  • Application Performance Monitoring (APM): New Relic, DataDog
  • Database monitoring: pgAdmin, MySQL Workbench
  • Query analyzers: EverSQL, SolarWinds Database Performance Analyzer

Conclusion

SQL query optimization is not a one-time task but an ongoing discipline that compounds performance improvements. The seven techniques covered—strategic indexing, query rewriting, JOIN optimization, execution plan analysis, statistics maintenance, caching, and batch operations—form the foundation of high-performance database applications.

Key Takeaways:

  1. Index strategically - Not every column, focus on WHERE/JOIN/ORDER BY
  2. Read execution plans - Trust data, not intuition
  3. Rewrite for efficiency - Small query changes = massive speedups
  4. Maintain statistics - ANALYZE regularly for accurate query plans
  5. Cache aggressively - Avoid repeated expensive queries
  6. Batch operations - Single query beats N queries every time
  7. Use automation - SQL Linter catches issues early

Organizations that prioritize query optimization see 10-100x performance improvements, reduced infrastructure costs, and happier users. Start optimizing your queries today with mdatool's automated analysis tools.

Optimize your SQL queries now with mdatool's SQL Linter and performance tools.

Try our free tools at mdatool.com


mdatool Team

Data modeling experts helping enterprises build better databases and data architectures.

Ready to improve your data architecture?

Free tools for DDL conversion, SQL analysis, naming standards, and more.

Get Started Free