Postgres_missing_index_check

Snippet Name: Postgres_missing_index_check

Tags:

Created Date: 2025-06-16 13:12:42
Last Modified Date: 2025-05-31 10:32:20

Dependencies:

DBA Notes:

Sample Output:

Title: Postgres missing index check
Author: user
With PostgreSQL, is there an equivalent way to get the "missing indexes" report that SQL Server has? I have this code example that uses pg_stat_all_tables and seems to imply it finds table scans. Does that look right, or is there a better way to look at all indexes/missing indexes, instead of individual queries: SELECT 
    relname, 
    seq_scan-idx_scan AS too_much_seq, 
    CASE when seq_scan-idx_scan>0 THEN 'Missing Index?' ELSE 'OK' END,  
    pg_relation_size(relid::regclass) AS rel_size, 
    seq_scan, 
    idx_scan  
FROM pg_stat_all_tables 
WHERE schemaname='public' 
AND pg_relation_size(('"' || relname || '"')::regclass) > 80000 
ORDER BY too_much_seq DESC NULLS LAST;

Author: ChatGPT
Great question, Lowell.

You're on the right track: in PostgreSQL, there's no direct built-in "missing index" DMV like SQL Server's `sys.dm_db_missing_index_details`, but you **can approximate** missing indexes using statistics views like `pg_stat_all_tables`, as your query is doing.

### What your query is doing:
```sql
SELECT 
    relname, 
    seq_scan - idx_scan AS too_much_seq, 
    CASE WHEN seq_scan - idx_scan > 0 THEN 'Missing Index?' ELSE 'OK' END,  
    pg_relation_size(relid::regclass) AS rel_size, 
    seq_scan, 
    idx_scan  
FROM pg_stat_all_tables 
WHERE schemaname = 'public' 
  AND pg_relation_size(('"' || relname || '"')::regclass) > 80000 
ORDER BY too_much_seq DESC NULLS LAST;
```

#### Key Concepts:
- `seq_scan`: how many times a sequential scan was done on the table.
- `idx_scan`: how many times indexes were used.
- If `seq_scan` >> `idx_scan`, and the table is large (here, size > 80,000 bytes), that **might** indicate a missing or unused index.

However:
- **False Positives**: Some queries might still do seq scans intentionally (e.g., reading the whole table).
- **Size check**: You're coercing a table name string into `regclass`, which works but can fail on non-standard names (mixed case, special characters, duplicate names across schemas). You can simplify with `pg_relation_size(relid)` instead, which the SELECT list already uses.

---

### Recommended Improved Query:
```sql
SELECT 
    schemaname,
    relname, 
    seq_scan,
    idx_scan,
    seq_scan - idx_scan AS too_much_seq,
    pg_size_pretty(pg_relation_size(relid)) AS rel_size,
    CASE 
        WHEN seq_scan - idx_scan > 0 AND pg_relation_size(relid) > 80000 THEN 'Investigate Indexing'
        ELSE 'OK'
    END AS index_status
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY too_much_seq DESC NULLS LAST;
```

---

### Other Helpful Queries

#### 1. **Unused Indexes**
```sql
SELECT 
    schemaname,
    relname,
    indexrelname,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
JOIN pg_index ON pg_index.indexrelid = pg_stat_user_indexes.indexrelid
WHERE idx_scan = 0        -- never used
  AND NOT indisunique     -- skip unique/PK indexes, which back constraints
ORDER BY pg_relation_size(indexrelid) DESC;
```

#### 2. **Index Bloat**
For analyzing *bloat* (inefficient index storage), you can use [pgstattuple](https://www.postgresql.org/docs/current/pgstattuple.html), or use tools like [pg_bloat_check](https://github.com/ioguix/pgsql-bloat-estimation).
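
As a minimal sketch (assuming `pgstattuple` is installable on your server, and using a hypothetical index name), you can check B-tree density and fragmentation directly:

```sql
-- pgstattuple ships as a contrib extension; needs superuser or pg_stat_scan_tables
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- 'orders_customer_id_idx' is a placeholder; substitute one of your own indexes.
-- Low avg_leaf_density or high leaf_fragmentation suggests a REINDEX may help.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('orders_customer_id_idx');
```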

---

### Helpful Tools and Extensions
- **[pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html)**: Track queries that do lots of reads; heavy readers often correlate with missing indexes (see the sketch after this list).
- **[HypoPG](https://github.com/HypoPG/hypopg)**: Simulate hypothetical indexes and see whether they'd help.
- **auto_explain** module: Logs execution plans for slow queries, so sequential scans show up in the server log.
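
For example, here's a rough starting point with `pg_stat_statements` (assuming the extension is preloaded and created in your database; note the timing column is `mean_exec_time` on PostgreSQL 13+, `mean_time` on older versions):

```sql
-- Statements doing the most block reads are good candidates for index review
SELECT queryid,
       calls,
       shared_blks_read,
       mean_exec_time,
       left(query, 80) AS query_snippet
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 10;
```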

---

### Summary
PostgreSQL lacks SQL Server's precise missing index DMVs, but you can:
- Use `pg_stat_all_tables` for tables with high seq scans.
- Combine with `pg_stat_user_indexes` for index usage.
- Analyze query plans for hotspots using `pg_stat_statements`.
- Optional: consider using `HypoPG` to simulate the benefits of potential indexes.

Let me know if you want a script to combine all of the above into one indexed optimization report.

Author: user
is this something where we could create a job that gathers information into a table for later review, to create the equivalent of sys.dm_db_missing_index_details? or are the queries from pg_stat_all_tables already cumulative, so there is no need to capture over time?

Author: ChatGPT
Excellent follow-up, and you're absolutely thinking in the right direction.

### Short Answer:
Yes: **`pg_stat_all_tables` and related views are cumulative**, **but** they reset when:
- The server crashes or is shut down immediately (counters are saved on a clean shutdown, but lost after a crash)
- Statistics are manually reset via `pg_stat_reset_*()` functions

So if you want a **long-term record like SQL Server's `sys.dm_db_missing_index_*`**, then yes, you **should** create a job that periodically snapshots this data into your own historical table for analysis over time.

---

### What's Cumulative in PostgreSQL?

The following views collect cumulative counters **since the last reset**:
- `pg_stat_all_tables`
- `pg_stat_user_indexes`
- `pg_stat_statements` (if enabled)

These are **analogous to SQL Server's dynamic management views (DMVs)**: great for real-time insight, but **ephemeral** over restarts.
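
Before trusting the counters, it's worth checking when they were last reset. A quick sanity check:

```sql
-- If stats_reset is recent, low seq_scan/idx_scan numbers may just mean young data
SELECT datname, stats_reset
FROM pg_stat_database
WHERE datname = current_database();
```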

---

### Suggested Strategy

1. **Create a table** to store snapshots:
```sql
CREATE SCHEMA IF NOT EXISTS monitoring;

CREATE TABLE monitoring.index_usage_snapshot (
    snapshot_time      timestamptz DEFAULT now(),
    schemaname         text,
    relname            text,
    seq_scan           bigint,
    idx_scan           bigint,
    too_much_seq       bigint,
    rel_size_bytes     bigint
);
```

2. **Create a job** (via cron, `pg_cron`, or an external scheduler) that runs this insert regularly:
```sql
INSERT INTO monitoring.index_usage_snapshot (schemaname, relname, seq_scan, idx_scan, too_much_seq, rel_size_bytes)
SELECT 
    schemaname,
    relname,
    seq_scan,
    idx_scan,
    seq_scan - idx_scan AS too_much_seq,
    pg_relation_size(relid) AS rel_size_bytes
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
```

3. **Use this historical table** to identify patterns:
   - Tables with an increasing `seq_scan - idx_scan` delta (see the sketch after this list)
   - Tables that are growing and never getting indexed
   - Times of high sequential scan activity
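
Here's one way to compute those deltas (a sketch against the snapshot table defined above; a negative delta means the counters were reset between snapshots, e.g. by a crash or `pg_stat_reset()`):

```sql
SELECT
    snapshot_time,
    schemaname,
    relname,
    seq_scan - lag(seq_scan) OVER w AS seq_scan_delta,
    idx_scan - lag(idx_scan) OVER w AS idx_scan_delta
FROM monitoring.index_usage_snapshot
WINDOW w AS (PARTITION BY schemaname, relname ORDER BY snapshot_time)
ORDER BY seq_scan_delta DESC NULLS LAST
LIMIT 20;
```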

---

### Bonus: Resetting Stats (if needed)
You can optionally reset stats at specific times to segment snapshots (like doing a delta per day/week):
```sql
SELECT pg_stat_reset();  -- resets all statistics counters for the current database
-- or, more surgically (replace 'my_table' with a real table name):
SELECT pg_stat_reset_single_table_counters('my_table'::regclass);
```

---

### Final Tip:
Also consider capturing **query fingerprints** from `pg_stat_statements` (if enabled) alongside this data; that would be like collecting the `dm_exec_sql_text` part of SQL Server's missing index data.

---

Would you like me to generate a full SQL job script (with DDL + job + retention policy) or a PowerShell/Linux shell script that runs the insert remotely and logs it?

Author: user
yes please create the full SQL job script (with DDL + job + retention policy)

Author: ChatGPT
Here's a **complete PostgreSQL SQL job script** that:

1. Creates a schema and table to store index usage snapshots  
2. Creates a stored procedure to insert snapshot data  
3. Adds a simple retention cleanup policy  
4. (Optionally) schedules the job using [pg_cron](https://github.com/citusdata/pg_cron)

> **Assumptions**:  
> - You have `pg_cron` installed (for native scheduling in PostgreSQL)  
> - You want to retain 30 days of history  
> - You're running this in the `postgres` database, but this can be adjusted

---

### Step 1: Create the schema and table

```sql
CREATE SCHEMA IF NOT EXISTS monitoring;

CREATE TABLE IF NOT EXISTS monitoring.index_usage_snapshot (
    snapshot_time     timestamptz DEFAULT now(),
    schemaname        text NOT NULL,
    relname           text NOT NULL,
    seq_scan          bigint,
    idx_scan          bigint,
    too_much_seq      bigint,
    rel_size_bytes    bigint
);
```
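
Optionally (the index name here is arbitrary), add an index on `snapshot_time` so the daily retention delete stays cheap as history grows:

```sql
CREATE INDEX IF NOT EXISTS index_usage_snapshot_time_idx
    ON monitoring.index_usage_snapshot (snapshot_time);
```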

---

### Step 2: Create a stored procedure to collect the snapshot

```sql
CREATE OR REPLACE PROCEDURE monitoring.collect_index_usage_snapshot()
LANGUAGE plpgsql
AS $$
BEGIN
    INSERT INTO monitoring.index_usage_snapshot (
        schemaname, relname, seq_scan, idx_scan, too_much_seq, rel_size_bytes
    )
    SELECT 
        schemaname,
        relname,
        seq_scan,
        idx_scan,
        seq_scan - idx_scan AS too_much_seq,
        pg_relation_size(relid) AS rel_size_bytes
    FROM pg_stat_all_tables
    WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
END;
$$;
```
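
You can smoke-test the procedure manually before scheduling it:

```sql
CALL monitoring.collect_index_usage_snapshot();

-- Confirm the snapshot landed
SELECT count(*) AS rows_captured, max(snapshot_time) AS latest_snapshot
FROM monitoring.index_usage_snapshot;
```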

---

### Step 3: Retention policy (delete rows older than 30 days)

```sql
CREATE OR REPLACE PROCEDURE monitoring.cleanup_old_index_usage_snapshots()
LANGUAGE plpgsql
AS $$
BEGIN
    DELETE FROM monitoring.index_usage_snapshot
    WHERE snapshot_time < now() - interval '30 days';
END;
$$;
```

---

### Step 4: Schedule the job (using pg_cron)

If you're using `pg_cron`, schedule it like so:

```sql
-- Run every 6 hours
SELECT cron.schedule(
    'index_usage_snapshot_job',
    '0 */6 * * *',  -- every 6 hours
    $$CALL monitoring.collect_index_usage_snapshot();$$
);

-- Run cleanup job daily
SELECT cron.schedule(
    'index_usage_cleanup_job',
    '0 2 * * *',  -- daily at 2 AM
    $$CALL monitoring.cleanup_old_index_usage_snapshots();$$
);
```

> To see your jobs in `pg_cron`:  
```sql
SELECT * FROM cron.job;
```
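
And if you need to remove or replace a job (unscheduling by name works on pg_cron 1.3+; older versions take the `jobid` from `cron.job` instead):

```sql
SELECT cron.unschedule('index_usage_snapshot_job');
```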

---

### Optional: Query recent problem tables
```sql
SELECT *
FROM monitoring.index_usage_snapshot
WHERE snapshot_time > now() - interval '1 day'
  AND too_much_seq > 100
ORDER BY too_much_seq DESC;
```

---

Let me know if you'd prefer to run this **outside the DB** using PowerShell, Bash, or cron jobs hitting `psql`. I can adapt the script accordingly.

Author: user
That was excellent and very enlightening. I will test it now.

Author: ChatGPT
Awesome! I'm really glad it was helpful; sounds like you're about to make this a slick part of your toolbox.

Once you've had a chance to test it out, feel free to share how it's working or if you'd like to:
- Expand it to include index-level detail (like unused indexes),
- Tie it in with `pg_stat_statements` for query context,
- Or export/report it for review by others (HTML, CSV, etc.).

Happy tuning, Lowell. Let me know what you find!

