Snippet Name: Postgres_missing_index_check
Tags:
Created Date: 2025-06-16 13:12:42
Last Modified Date: 2025-05-31 10:32:20
Dependencies:
DBA Notes:
Sample Output:
Title: Postgres missing index check
Author: user
With PostgreSQL, is there an equivalent way to get the "missing indexes" information that SQL Server has? I have this code example that uses pg_stat_all_tables and seems to be looking for table scans. Does that look right, or is there a better way to look at all indexes/missing indexes instead of individual queries:
SELECT
relname,
seq_scan-idx_scan AS too_much_seq,
CASE when seq_scan-idx_scan>0 THEN 'Missing Index?' ELSE 'OK' END,
pg_relation_size(relid::regclass) AS rel_size,
seq_scan,
idx_scan
FROM pg_stat_all_tables
WHERE schemaname='public'
AND pg_relation_size('"' || relname || '"')::regclass>80000
ORDER BY too_much_seq DESC NULLS LAST;
Author: ChatGPT
Great question, Lowell.
You're on the right track: in PostgreSQL, there's no direct built-in "missing index DMVs" like SQL Server's `sys.dm_db_missing_index_details`, but you **can approximate** missing indexes using statistics views like `pg_stat_all_tables`, as your query is doing.
### What your query is doing:
```sql
SELECT
relname,
seq_scan - idx_scan AS too_much_seq,
CASE WHEN seq_scan - idx_scan > 0 THEN 'Missing Index?' ELSE 'OK' END,
pg_relation_size(relid::regclass) AS rel_size,
seq_scan,
idx_scan
FROM pg_stat_all_tables
WHERE schemaname = 'public'
AND pg_relation_size('"' || relname || '"')::regclass > 80000
ORDER BY too_much_seq DESC NULLS LAST;
```
#### Key Concepts:
- `seq_scan`: how many times a sequential scan was done on the table.
- `idx_scan`: how many times indexes were used.
- If `seq_scan` >> `idx_scan`, and the table is large (here, size > 80,000 bytes), that **might** indicate a missing or unused index.
However:
- **False Positives**: Some queries might still do seq scans intentionally (e.g., reading the whole table).
- **Size check**: As written, the `::regclass` cast binds to the bigint returned by `pg_relation_size`, not to the concatenated table name, so the predicate doesn't filter by size the way it appears to; the cast would need parentheses around the name expression. It's simpler to use `pg_relation_size(relid)`, which is already used above.
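For reference, if you ever do need to resolve a name string to a table, the cast belongs on the name expression, not the result. A minimal sketch using `format('%I.%I', ...)`, which quotes identifiers safely:
```sql
SELECT relname,
       pg_relation_size(format('%I.%I', schemaname, relname)::regclass) AS rel_size
FROM pg_stat_all_tables
WHERE schemaname = 'public'
  -- The cast applies to the formatted name, so the size comparison works:
  AND pg_relation_size(format('%I.%I', schemaname, relname)::regclass) > 80000;
```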
---
### Recommended Improved Query:
```sql
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
seq_scan - idx_scan AS too_much_seq,
pg_size_pretty(pg_relation_size(relid)) AS rel_size,
CASE
WHEN seq_scan - idx_scan > 0 AND pg_relation_size(relid) > 80000 THEN 'Investigate Indexing'
ELSE 'OK'
END AS index_status
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY too_much_seq DESC NULLS LAST;
```
---
### Other Helpful Queries
#### 1. **Unused Indexes**
```sql
SELECT
    s.schemaname,
    s.relname,
    s.indexrelname,
    s.idx_scan,
    pg_size_pretty(pg_relation_size(s.indexrelid)) AS index_size
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0      -- never used by the planner
  AND NOT i.indisunique   -- keep unique/PK indexes: they enforce constraints
ORDER BY pg_relation_size(s.indexrelid) DESC;
```
#### 2. **Index Bloat**
For analyzing *bloat* (inefficient index storage), you can use [pgstattuple](https://www.postgresql.org/docs/current/pgstattuple.html), or use tools like [pg_bloat_check](https://github.com/ioguix/pgsql-bloat-estimation).
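As a quick sketch with `pgstattuple` (it ships with PostgreSQL's contrib modules; `pgstatindex()` covers B-tree indexes, and `'my_index'` below is a placeholder for one of your own index names):
```sql
CREATE EXTENSION IF NOT EXISTS pgstattuple;

-- Low avg_leaf_density or high leaf_fragmentation suggests a bloated B-tree.
SELECT avg_leaf_density, leaf_fragmentation
FROM pgstatindex('my_index');
```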
---
### Third-party Tools
- **[pg_stat_statements](https://www.postgresql.org/docs/current/pgstatstatements.html)**: Track queries that do lots of reads; these can correlate with missing indexes.
- **[HypoPG](https://github.com/HypoPG/hypopg)**: Simulate hypothetical indexes and see if they'd help.
- **auto_explain** module: Can log plans with seq scans.
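A minimal auto_explain sketch for the current session (for server-wide logging you would set these in `postgresql.conf` via `session_preload_libraries` instead):
```sql
LOAD 'auto_explain';
SET auto_explain.log_min_duration = '500ms';  -- log plans for statements slower than this
SET auto_explain.log_analyze = on;            -- include actual row counts (adds overhead)
```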
---
### Summary
PostgreSQL lacks SQL Server's precise missing index DMVs, but you can:
- Use `pg_stat_all_tables` for tables with high seq scans.
- Combine with `pg_stat_user_indexes` for index usage.
- Analyze query plans for hotspots using `pg_stat_statements`.
- Optional: consider using `HypoPG` to simulate the benefits of potential indexes.
Let me know if you want a script to combine all of the above into one index-optimization report.
Author: user
is this something where we could create a job that gathers information to a table for later review, to create the equivalent of sys.dm_db_missing_index_details? Or are the queries from pg_stat_all_tables already cumulative, so there is no need to capture over time?
Author: ChatGPT
Excellent follow-up, and you're absolutely thinking in the right direction.
### Short Answer:
Yes, **`pg_stat_all_tables` and related views are cumulative**, **but** the counters are lost when:
- The server crashes or is shut down uncleanly (they survive a clean shutdown and restart)
- Statistics are manually reset via the `pg_stat_reset*()` functions

So if you want a **long-term record like SQL Server's `sys.dm_db_missing_index_*`** views, then yes, you **should** create a job that periodically snapshots this data into your own historical table for analysis over time.
---
### What's Cumulative in PostgreSQL?
The following views collect cumulative counters **since the last reset**:
- `pg_stat_all_tables`
- `pg_stat_user_indexes`
- `pg_stat_statements` (if enabled)
These are **analogous to SQL Server's dynamic management views (DMVs)**: great for real-time insight, but **ephemeral**, since the counters can disappear after a crash or a manual reset.
---
### Suggested Strategy
1. **Create a table** to store snapshots:
```sql
CREATE SCHEMA IF NOT EXISTS monitoring;

CREATE TABLE monitoring.index_usage_snapshot (
snapshot_time timestamptz DEFAULT now(),
schemaname text,
relname text,
seq_scan bigint,
idx_scan bigint,
too_much_seq bigint,
rel_size_bytes bigint
);
```
2. **Create a job** (via cron, `pg_cron`, or an external scheduler) that runs this insert regularly:
```sql
INSERT INTO monitoring.index_usage_snapshot (schemaname, relname, seq_scan, idx_scan, too_much_seq, rel_size_bytes)
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
seq_scan - idx_scan AS too_much_seq,
pg_relation_size(relid) AS rel_size_bytes
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
```
3. **Use this historical table** to identify patterns:
- Tables whose `seq_scan - idx_scan` delta keeps growing between snapshots (see the sketch after this list)
- Tables that are growing and never getting indexed
- Times of high sequential scan activity
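Because the counters are cumulative, the interesting number is the change between consecutive snapshots. A sketch against the `monitoring.index_usage_snapshot` table defined above:
```sql
-- Per-interval growth in scans, computed from consecutive snapshots per table.
SELECT
    snapshot_time,
    schemaname,
    relname,
    seq_scan - lag(seq_scan) OVER w AS seq_scan_delta,
    idx_scan - lag(idx_scan) OVER w AS idx_scan_delta
FROM monitoring.index_usage_snapshot
WINDOW w AS (PARTITION BY schemaname, relname ORDER BY snapshot_time)
ORDER BY seq_scan_delta DESC NULLS LAST;
```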
---
### Bonus: Resetting Stats (if needed)
You can optionally reset stats at specific times to segment snapshots (like doing a delta per day/week):
```sql
SELECT pg_stat_reset(); -- resets all statistics counters for the current database
-- or:
SELECT pg_stat_reset_shared('bgwriter'); -- more granular options exist
```
---
### Final Tip:
Also consider capturing **query fingerprints** from `pg_stat_statements` (if enabled) alongside this data; that would be like collecting the `dm_exec_sql_text` part of SQL Server's missing index data.
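As a sketch of what you might capture (column names here assume PostgreSQL 13 or later, where `total_time` became `total_exec_time`):
```sql
-- Queries doing the most shared-buffer reads: likely indexing candidates.
SELECT queryid,
       calls,
       total_exec_time,
       rows,
       shared_blks_read,
       left(query, 80) AS query_sample
FROM pg_stat_statements
ORDER BY shared_blks_read DESC
LIMIT 20;
```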
---
Would you like me to generate a full SQL job script (with DDL + job + retention policy) or a PowerShell/Linux shell script that runs the insert remotely and logs it?
Author: user
yes please create the full SQL job script (with DDL + job + retention policy)
Author: ChatGPT
Here's a **complete PostgreSQL SQL job script** that:
1. Creates a schema and table to store index usage snapshots
2. Creates a stored procedure to insert snapshot data
3. Adds a simple retention cleanup policy
4. (Optionally) schedules the job using [pg_cron](https://github.com/citusdata/pg_cron)
> **Assumptions**:
> - You have `pg_cron` installed (for native scheduling in PostgreSQL)
> - You want to retain 30 days of history
> - You're running this in the `postgres` database, but this can be adjusted
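If `pg_cron` is not installed yet, setup is roughly as follows (package details vary by platform, and changing `shared_preload_libraries` requires a server restart):
```sql
-- In postgresql.conf, then restart:
--   shared_preload_libraries = 'pg_cron'
--   cron.database_name = 'postgres'

-- Then, connected to that database:
CREATE EXTENSION IF NOT EXISTS pg_cron;
```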
---
### Step 1: Create the schema and table
```sql
CREATE SCHEMA IF NOT EXISTS monitoring;
CREATE TABLE IF NOT EXISTS monitoring.index_usage_snapshot (
snapshot_time timestamptz DEFAULT now(),
schemaname text NOT NULL,
relname text NOT NULL,
seq_scan bigint,
idx_scan bigint,
too_much_seq bigint,
rel_size_bytes bigint
);
```
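Optionally, an index on `snapshot_time` keeps the retention delete and the time-windowed queries below cheap as history accumulates:
```sql
CREATE INDEX IF NOT EXISTS idx_index_usage_snapshot_time
    ON monitoring.index_usage_snapshot (snapshot_time);
```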
---
### Step 2: Create a stored procedure to collect the snapshot
```sql
CREATE OR REPLACE PROCEDURE monitoring.collect_index_usage_snapshot()
LANGUAGE plpgsql
AS $$
BEGIN
INSERT INTO monitoring.index_usage_snapshot (
schemaname, relname, seq_scan, idx_scan, too_much_seq, rel_size_bytes
)
SELECT
schemaname,
relname,
seq_scan,
idx_scan,
seq_scan - idx_scan AS too_much_seq,
pg_relation_size(relid) AS rel_size_bytes
FROM pg_stat_all_tables
WHERE schemaname NOT IN ('pg_catalog', 'information_schema');
END;
$$;
```
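You can verify the procedure works before scheduling anything:
```sql
CALL monitoring.collect_index_usage_snapshot();

SELECT count(*) FROM monitoring.index_usage_snapshot;
```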
---
### Step 3: Retention policy (delete rows older than 30 days)
```sql
CREATE OR REPLACE PROCEDURE monitoring.cleanup_old_index_usage_snapshots()
LANGUAGE plpgsql
AS $$
BEGIN
DELETE FROM monitoring.index_usage_snapshot
WHERE snapshot_time < now() - interval '30 days';
END;
$$;
```
---
### Step 4: Schedule the job (using pg_cron)
If you're using `pg_cron`, schedule it like so:
```sql
-- Run every 6 hours
SELECT cron.schedule(
'index_usage_snapshot_job',
'0 */6 * * *', -- every 6 hours
$$CALL monitoring.collect_index_usage_snapshot();$$
);
-- Run cleanup job daily
SELECT cron.schedule(
'index_usage_cleanup_job',
'0 2 * * *', -- daily at 2 AM
$$CALL monitoring.cleanup_old_index_usage_snapshots();$$
);
```
> To see your jobs in `pg_cron`:
```sql
SELECT * FROM cron.job;
```
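Recent `pg_cron` versions also record run history in `cron.job_run_details` (this table may not exist on older installs):
```sql
SELECT jobid, status, return_message, start_time
FROM cron.job_run_details
ORDER BY start_time DESC
LIMIT 10;
```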
---
### Optional: Query recent problem tables
```sql
SELECT *
FROM monitoring.index_usage_snapshot
WHERE snapshot_time > now() - interval '1 day'
AND too_much_seq > 100
ORDER BY too_much_seq DESC;
```
Note that `too_much_seq` stores the cumulative counter as of each snapshot; for activity within a window, diff consecutive snapshots with the `lag()` query shown earlier.
---
Let me know if you'd prefer to run this **outside the DB** using PowerShell, Bash, or cron jobs hitting `psql`. I can adapt the script accordingly.
Author: user
that was excellent and very enlightening. i will test it now
Author: ChatGPT
Awesome! I'm really glad it was helpful; sounds like you're about to make this a slick part of your toolbox.
Once you've had a chance to test it out, feel free to share how it's working or if you'd like to:
- Expand it to include index-level detail (like unused indexes),
- Tie it in with `pg_stat_statements` for query context,
- Or export/report it for review by others (HTML, CSV, etc.).
Happy tuning, Lowell. Let me know what you find!